Linear to Logistic Regression — Classification (Part III)
Welcome to the final part of the Logistic Regression series. In this article, I will cover the optimization process using the loss function discussed in Part II.
The loss function for binary classification, also known as cross-entropy, is defined below.
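A standard form, assuming N training samples indexed by i, is:

L(w_0, w_1) = -\frac{1}{N}\sum_{i=1}^{N}\big[\,y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\,\big] \quad (7)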
where w0 and w1 represent the weights, y represents the actual label, and p represents the output of the sigmoid function (Part I).
The loss function for multi-class classification is an extension of Eq. (7). It is also described as categorical cross-entropy and is given below.
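Assuming one-hot encoded labels y over C classes and continuing the equation numbering (a notation assumption), a standard form is:

L(\bar{w}) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\,\log(p_{i,c}) \quad (8)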
where p represents the output of the SoftMax function and w̄ represents the weight vectors.
SoftMax is a non-linear function, similar to the sigmoid, used for classification with more than two classes. The equation is given below.
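Writing z for the input vector (a notation assumption) and indexing classes by c, a standard form is:

p_c = \frac{e^{z_c}}{\sum_{k=1}^{C} e^{z_k}} \quad (9)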
where p is the output, z is the input vector, and C is the number of classes.
Coming to the optimization of binary classification: gradient descent is used to calculate the correction required to the weights, update them accordingly, and thus find the minimum of the loss function. Cross-entropy fulfills the conditions for use with gradient descent, since it is differentiable and convex in the weights; for more details, refer to the Gradient Descent article.
As per the gradient descent algorithm, the partial derivatives of the loss function with respect to w0 and w1 are required, along with the weight update equations; refer to Eq. (10)–(13).
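For the sigmoid model p_i = \sigma(w_0 + w_1 x_i), the derivatives of Eq. (7) take a well-known simple form; writing \alpha for the learning rate (a notation assumption), the four equations are:

\frac{\partial L}{\partial w_0} = \frac{1}{N}\sum_{i=1}^{N}(p_i - y_i) \quad (10)

\frac{\partial L}{\partial w_1} = \frac{1}{N}\sum_{i=1}^{N}(p_i - y_i)\,x_i \quad (11)

w_0 \leftarrow w_0 - \alpha \frac{\partial L}{\partial w_0} \quad (12)

w_1 \leftarrow w_1 - \alpha \frac{\partial L}{\partial w_1} \quad (13)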
Let's take some data points and use the above four equations to demonstrate the process; a small code sketch of the full loop follows the steps below.
For reference, the relevant equations are Eq. (7) for the loss and Eq. (10)–(13) for the gradients and weight updates.
Calculate the loss function and move to the next iteration if required, i.e., until the loss stops decreasing meaningfully.
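To make the loop concrete, here is a minimal sketch in Python (NumPy), assuming a single input feature, a learning rate of 0.1, a fixed number of iterations, and made-up data points chosen purely for illustration:

```python
import numpy as np

# Illustrative data: one feature x, binary labels y (assumed, not from the article)
x = np.array([0.5, 1.5, 2.0, 3.0, 3.5, 4.5])
y = np.array([0, 0, 0, 1, 1, 1])

w0, w1 = 0.0, 0.0   # initial weights
alpha = 0.1         # learning rate (assumed value)

for iteration in range(1000):
    # Sigmoid output from Part I: p = 1 / (1 + e^-(w0 + w1*x))
    p = 1.0 / (1.0 + np.exp(-(w0 + w1 * x)))

    # Binary cross-entropy, Eq. (7)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    # Partial derivatives, Eq. (10) and Eq. (11)
    dw0 = np.mean(p - y)
    dw1 = np.mean((p - y) * x)

    # Weight updates, Eq. (12) and Eq. (13)
    w0 -= alpha * dw0
    w1 -= alpha * dw1

print(f"w0 = {w0:.4f}, w1 = {w1:.4f}, final loss = {loss:.4f}")
```

Each iteration computes the sigmoid outputs, evaluates the loss from Eq. (7), and applies Eq. (10)–(13); in practice you would stop once the loss stops decreasing rather than run a fixed number of iterations.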