Linear to Logistic Regression — Classification (Part III)
Welcome to the final part of the Logistic Regression series. In this article, I will cover the optimization process using the loss function discussed in Part II.
The loss function for binary classification, also known as cross-entropy, is defined below.
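A standard form, assuming N training samples indexed by i, is:

L(w_0, w_1) = -\frac{1}{N}\sum_{i=1}^{N}\big[\,y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\,\big] \quad (7)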
where w0 and w1 represent the weights, y represents the actual label, and p represents the output of the sigmoid function (Part I).
The loss function for multi-class classification is an extension of Eq. (7). It is also described as categorical cross-entropy and is given below.
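Assuming one-hot encoded labels y over C classes and continuing the equation numbering (a notation assumption), a standard form is:

L(\bar{w}) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\,\log(p_{i,c}) \quad (8)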
where p represents the output of the SoftMax function and w̄ represents the weight vectors.
SoftMax is a non-linear function, similar to the sigmoid, used for classification with more than two classes. The equation is given below.
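Writing z for the input vector (a notation assumption) and indexing classes by c, a standard form is:

p_c = \frac{e^{z_c}}{\sum_{k=1}^{C} e^{z_k}} \quad (9)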
where p is the output, z is the input vector, and C is the number of classes.
Coming to the optimization of binary classification: gradient descent is used to calculate the correction required to the weights, update them accordingly, and thus find the minimum of the loss function. Cross-entropy fulfills the conditions for use with gradient descent, since it is differentiable and convex in the weights; for more details, refer to the Gradient Descent article.
As per the gradient descent algorithm, the partial derivatives of the loss function with respect to w0 and w1 are required, along with the weight update equations; refer to Eq. (10)–(13).
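For the sigmoid model p_i = \sigma(w_0 + w_1 x_i), the derivatives of Eq. (7) take a well-known simple form; writing \alpha for the learning rate (a notation assumption), the four equations are:

\frac{\partial L}{\partial w_0} = \frac{1}{N}\sum_{i=1}^{N}(p_i - y_i) \quad (10)

\frac{\partial L}{\partial w_1} = \frac{1}{N}\sum_{i=1}^{N}(p_i - y_i)\,x_i \quad (11)

w_0 \leftarrow w_0 - \alpha \frac{\partial L}{\partial w_0} \quad (12)

w_1 \leftarrow w_1 - \alpha \frac{\partial L}{\partial w_1} \quad (13)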
Let's take some data points and use the above four equations to demonstrate the process; a small code sketch of the full loop follows the steps below.
For reference, the relevant equations are Eq. (7) for the loss and Eq. (10)–(13) for the gradients and weight updates.
Calculate the loss function and move to the next iteration if required, i.e., until the loss stops decreasing meaningfully.
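To make the loop concrete, here is a minimal sketch in Python (NumPy), assuming a single input feature, a learning rate of 0.1, a fixed number of iterations, and made-up data points chosen purely for illustration:

```python
import numpy as np

# Illustrative data: one feature x, binary labels y (assumed, not from the article)
x = np.array([0.5, 1.5, 2.0, 3.0, 3.5, 4.5])
y = np.array([0, 0, 0, 1, 1, 1])

w0, w1 = 0.0, 0.0   # initial weights
alpha = 0.1         # learning rate (assumed value)

for iteration in range(1000):
    # Sigmoid output from Part I: p = 1 / (1 + e^-(w0 + w1*x))
    p = 1.0 / (1.0 + np.exp(-(w0 + w1 * x)))

    # Binary cross-entropy, Eq. (7)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    # Partial derivatives, Eq. (10) and Eq. (11)
    dw0 = np.mean(p - y)
    dw1 = np.mean((p - y) * x)

    # Weight updates, Eq. (12) and Eq. (13)
    w0 -= alpha * dw0
    w1 -= alpha * dw1

print(f"w0 = {w0:.4f}, w1 = {w1:.4f}, final loss = {loss:.4f}")
```

Each iteration computes the sigmoid outputs, evaluates the loss from Eq. (7), and applies Eq. (10)–(13); in practice you would stop once the loss stops decreasing rather than run a fixed number of iterations.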