Gradient Descent — Regression
Gradient Descent (GD), in the context of regression, is an optimization algorithm that calculates the weight values that best fit the data. It uses an error (loss) function as the reference for judging how good the current weight values are.
Before moving forward, let's understand the meaning of both terms. Gradient means the rate of change of a function's value, and Descent means moving downward. Together, Gradient and Descent represent changing a value in the direction that moves the function downward. This algorithm therefore calculates the gradient of the target function and updates the target variable so as to decrease the function's value.
There are two main requirements for Gradient Descent to work well.
a. Differentiable
b. Convex
The first requirement is that the target function must be differentiable, meaning it has a gradient at each point (note that differentiability is stronger than mere continuity; for example, |w| is continuous but not differentiable at 0). The second is that it must be convex in shape, meaning it has a single minimum, so any local minimum is also the global minimum.
Once the requirements are fulfilled, Gradient Descent can be applied to the target function to calculate the final value of the target variable. It is an iterative process, and the steps are as follows:
(i) Calculate the gradient ∇L(w) of the function L(w) (say) at the current point w
(ii) Update the target variable w (say) by subtracting the gradient scaled by a learning rate η, i.e. w ← w − η·∇L(w)
(iii) Calculate the new value of L(w)
(iv) Repeat steps (i) to (iii) until either: a. a sufficient number of iterations is reached, or b. the function value stops decreasing (see the sketch below)
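A minimal sketch of this loop, assuming a simple convex target L(w) = (w − 3)² with gradient dL/dw = 2(w − 3); the function, starting point, and learning rate are illustrative choices, not from the text:

```python
def L(w):
    return (w - 3) ** 2

def grad_L(w):
    return 2 * (w - 3)

w = 0.0             # (i) start from an initial point
lr = 0.1            # learning rate (step size)
prev = L(w)
for step in range(1000):          # (iv-a) cap on the number of iterations
    w = w - lr * grad_L(w)        # (ii) subtract the scaled gradient
    curr = L(w)                   # (iii) re-evaluate the function
    if abs(prev - curr) < 1e-9:   # (iv-b) stop when L(w) stops decreasing
        break
    prev = curr

print(w)  # converges toward the minimum at w = 3
```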
The same theory extends to multiple variables: the ordinary first derivative is used for a single variable, whereas partial derivatives, stacked into a gradient vector, are used for multiple variables.
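As a small illustration of the multi-variable case, here is a sketch using an example function chosen for this write-up (not one from the text), L(w0, w1) = (w0 − 1)² + (w1 + 2)², where the gradient simply stacks the two partial derivatives:

```python
import numpy as np

def grad(w):
    w0, w1 = w
    dL_dw0 = 2 * (w0 - 1)   # partial derivative w.r.t. w0
    dL_dw1 = 2 * (w1 + 2)   # partial derivative w.r.t. w1
    return np.array([dL_dw0, dL_dw1])

w = np.zeros(2)
for _ in range(500):
    w -= 0.1 * grad(w)      # one vector update replaces two scalar updates

print(w)  # approaches the minimum at (1, -2)
```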
Mean Squared Error (MSE) fulfils both requirements, whereas Mean Absolute Error (MAE) does not: the absolute value is not differentiable at zero. To use MAE with Gradient Descent, please refer to other methods of calculating the gradient, such as subgradients. Let's take an example of regression with MSE.
Recalling the regression equation and substituting it into the MSE equation:

ŷ = w0 + w1·x … (1)

L(w0, w1) = (1/n) · Σᵢ (yᵢ − (w0 + w1·xᵢ))²
Since L has two variables, partial derivatives have to be used w.r.t. w0 and w1. The partial derivatives are

∂L/∂w0 = −(2/n) · Σᵢ (yᵢ − (w0 + w1·xᵢ))

∂L/∂w1 = −(2/n) · Σᵢ xᵢ·(yᵢ − (w0 + w1·xᵢ))
Update each weight by subtracting the learning rate times its partial derivative (w0 ← w0 − η·∂L/∂w0 and w1 ← w1 − η·∂L/∂w1), and repeat the process until either of the stopping conditions mentioned above in (iv) is fulfilled. Once the weights are finalized, use them in equation (1) to predict the value of ŷ.
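Putting the pieces together, here is a sketch of gradient descent on the MSE of a simple linear regression, following the partial derivatives above. The synthetic data, learning rate, and iteration count are illustrative assumptions, not from the original:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 4.0 + 2.5 * x + rng.normal(0, 1, size=100)   # true w0 = 4.0, w1 = 2.5

w0, w1 = 0.0, 0.0
lr = 0.01
n = len(x)
prev_loss = np.inf
for _ in range(10_000):
    ybar = w0 + w1 * x                    # equation (1)
    err = y - ybar
    loss = np.mean(err ** 2)              # MSE
    if prev_loss - loss < 1e-12:          # stop when loss stops decreasing
        break
    prev_loss = loss
    dw0 = -2.0 / n * np.sum(err)          # partial derivative w.r.t. w0
    dw1 = -2.0 / n * np.sum(err * x)      # partial derivative w.r.t. w1
    w0 -= lr * dw0                        # update both weights together
    w1 -= lr * dw1

print(w0, w1)  # close to the true values (4.0, 2.5)
```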
Next: Classification