Linear to Logistic Regression — Classification (Part — II)

Vishal Chaudhary, PhD
4 min read · Aug 1, 2024


Referring to Eq. (3) from the previous article (Part — I), reproduced below for convenience, the sigmoid function applied to the linear equation gives a value between 0 and 1.
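Assuming the single-feature linear model w0 + w1·x from Part — I, Eq. (3) is the sigmoid applied to that linear output:

$$p = \sigma(w_0 + w_1 x) = \frac{1}{1 + e^{-(w_0 + w_1 x)}} \tag{3}$$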

Where w0 and w1 represent the weights, and x is the input variable.

Let’s visualize the sigmoid function with binary-class data points, refer Fig. (5). The data points are covered very nicely by the S-curve, and it is now convenient to define a threshold (say, 0.5) and classify the data, e.g., Covid positive if the output is greater than 0.5, otherwise Covid negative. Adding outliers also has little impact on the threshold value.
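A minimal sketch of this decision rule in Python (the weights below are illustrative, not fitted values):

```python
import numpy as np

def sigmoid(z):
    # Maps any real value into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative weights; in practice these come from training
w0, w1 = -4.0, 1.5

x = np.array([0.5, 2.0, 3.5, 5.0])   # input feature values
p = sigmoid(w0 + w1 * x)             # Eq. (3): probability of class 1
y_pred = (p > 0.5).astype(int)       # threshold at 0.5: 1 = Covid positive

print(p)       # probabilities in (0, 1)
print(y_pred)  # hard class labels
```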

Fig. (5) (source — [1])

The next step is training the model, i.e., optimizing the weights. Following the traditional supervised-learning procedure, the weights are initialized and the data is passed through the model. The output of the linear equation is processed with the sigmoid function and the corresponding value is calculated. Based on the rule defined above, the output is classified as 1 or 0.

These classified outputs are compared with the actual outputs. The error is then used to correct the weights through backpropagation, with gradient descent as the optimization technique. In the next iteration, the data is passed through the model with the modified weights, and the cycle continues.
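A minimal sketch of this training loop, assuming a single feature and the log-loss cost discussed below (whose gradient with respect to the linear output simplifies to p − y); the learning rate and iteration count are arbitrary choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(x, y, lr=0.1, n_iters=1000):
    # x: 1-D feature array, y: 0/1 labels
    w0, w1 = 0.0, 0.0                  # initialize weights
    for _ in range(n_iters):
        p = sigmoid(w0 + w1 * x)       # forward pass, Eq. (3)
        error = p - y                  # gradient of log loss w.r.t. the linear output
        w0 -= lr * error.mean()        # gradient-descent updates
        w1 -= lr * (error * x).mean()
    return w0, w1

# Toy data: larger x tends to mean class 1
x = np.array([0.5, 1.0, 1.5, 3.0, 3.5, 4.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(train_logistic(x, y))
```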

The next task is to select the cost function. In linear regression, MSE was the best choice of cost function, but can we use it for logistic regression too?

Can we use ‘MSE’ for logistic regression too? Let’s evaluate.

Referring to the MSE equation below (Eq. (4)) and replacing ŷ with Eq. (3), the resulting cost surface has multiple local minima, refer Fig. (6). Such a situation would not allow the optimization algorithm to identify the global minimum.
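Assuming the usual mean-squared-error definition over n samples, with the prediction taken from Eq. (3):

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^2, \qquad \hat{y}_i = \frac{1}{1 + e^{-(w_0 + w_1 x_i)}} \tag{4}$$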

Fig. (6) (source — [2])

Log loss is a better choice of cost function for logistic regression. Log loss is the natural logarithm of the probability from Eq. (3) and is given below.
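For a single prediction with probability p, this reads:

$$\text{log loss} = \ln(p)$$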

Where p varies between 0 and 1.

Let’s calculate the log loss for various values of p.

Log loss varies from minus infinity (as p approaches 0) to 0 (at p = 1), so all the values are negative. It is more convenient to analyze the negative of the log loss, refer Table — 2. The corresponding graph is shown below.
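The same calculation can be sketched quickly in Python (the p values below are illustrative, not the exact entries of Table — 2):

```python
import numpy as np

p = np.array([0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99])

for pi in p:
    # ln(p) is always negative on (0, 1); -ln(p) tends to 0 as p -> 1
    print(f"p = {pi:4.2f}   ln(p) = {np.log(pi):7.3f}   -ln(p) = {-np.log(pi):6.3f}")
```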

Fig. (7)

Similarly, we can observe the behavior of the log loss with respect to (1 − p). The equation is defined below, with data points in Table — 3 and the graph in Fig. (8).
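Following the same convention as above (taking the negative of the logarithm for analysis):

$$-\ln(1 - p)$$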

Fig. (8)

Now consider the log loss and the log-loss invert together and observe the graph. The data points and the graph are given below in Table — 4 and Fig. (9).

Fig. (9)

In Fig. (9), the blue line represents −ln(p): the loss decreases as the probability increases from 0 to 1. Similarly, the orange line represents −ln(1 − p): the loss increases as the probability increases from 0 to 1. We can now relate the blue and orange lines to penalization in terms of the loss function. Let’s take the two cases of the actual output (y).

Case — I: y = 1. The blue line fits this case. When the sigmoid gives p = 1, the log loss is 0, since the actual and predicted outputs are the same. Similarly, when p = 0, the log loss is very high, since the outputs differ; the penalty grows with the difference between the two.

Case — II: y = 0. The orange line fits this case. When the sigmoid gives p = 0, the log loss is 0, since the actual and predicted outputs are the same. Similarly, when p = 1, the log loss is very high, since the outputs differ; the penalty grows with the difference between the two.
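As a quick worked example for Case — I (y = 1): a confident correct prediction of p = 0.9 is penalized by only −ln(0.9) ≈ 0.105, a poor prediction of p = 0.1 costs −ln(0.1) ≈ 2.303, and as p approaches 0 the penalty grows without bound.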

To cover both cases, we combine the log-loss and log-loss-invert expressions into a single cost function, weighting each term so that only the one matching the actual label is active. It is defined below.
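Assuming the standard binary cross-entropy form implied by the two cases above, averaged over n samples:

$$J(w_0, w_1) = -\frac{1}{n}\sum_{i=1}^{n}\Bigl[\,y_i \ln(p_i) + (1 - y_i)\ln(1 - p_i)\Bigr], \qquad p_i = \sigma(w_0 + w_1 x_i)$$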

Where w0 and w1 represent the weights, y represents the actual output, and p represents the output of the sigmoid function.
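A minimal sketch of this cost function in Python (the epsilon clipping is added here only to avoid evaluating ln(0)):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(w0, w1, x, y, eps=1e-12):
    # Binary cross-entropy: heavily penalizes confident wrong predictions
    p = np.clip(sigmoid(w0 + w1 * x), eps, 1 - eps)   # keep p away from 0 and 1
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

x = np.array([0.5, 1.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1])
print(log_loss(-4.0, 1.5, x, y))   # illustrative weights, not fitted values
```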

References

[1] https://static.javatpoint.com/tutorial/machine-learning/images/logistic-regression-in-machine-learning.png

[2] https://postimg.cc/9R5VBqBk
