What Is a Gradient in Machine Learning? Definition and Examples

In simple terms, a gradient is a slope. You get a function’s gradient by calculating its first-order derivative. This tutorial explains gradients with examples and plots. For instance, the following is a function of x, which can be plotted in a two-dimensional x-y space.

$\ y=x^2+6x+10$

Calculating its first-order derivative gives the following function.

$\ \frac{dy}{dx}=2x+6$

• For x = -3, the slope is 0. That is, the gradient is 0 at the point (-3, 1).
• For x = 5, the slope is 16. That is, the gradient is 16 at the point (5, 65).
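
The two bullet points above can be checked with a few lines of Python. This is only a minimal sketch: the function names `f` and `df` are chosen here for illustration and simply encode the function and its derivative from the formulas above.

```python
def f(x):
    """The example function y = x^2 + 6x + 10."""
    return x**2 + 6 * x + 10


def df(x):
    """Its first-order derivative dy/dx = 2x + 6."""
    return 2 * x + 6


# Evaluate the function and its slope at the two points discussed above.
for x in (-3, 5):
    print(f"x = {x}: y = {f(x)}, slope = {df(x)}")
```

With these definitions, `f(-3)` is 1 and `df(-3)` is 0, while `f(5)` is 65 and `df(5)` is 16.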

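What does a slope of 16 mean? It means that at x = 5, y is changing at a rate of 16 units per 1-unit change in x. In other words, the tangent line at that point rises 16 units for every unit moved along the x-axis. This is the definition of slope: the ratio of the change in y to the change in x.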

$\ \frac{\Delta y}{\Delta x}=16$
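
One way to see this ratio numerically is to take a small step h along the x-axis and divide the change in y by h. The sketch below assumes a simple forward difference; `slope_estimate` and the step sizes are illustrative choices, not part of the original example. For this particular function the ratio works out algebraically to 16 + h, so it approaches 16 as h shrinks.

```python
def f(x):
    """The example function y = x^2 + 6x + 10."""
    return x**2 + 6 * x + 10


def slope_estimate(func, x, h):
    """Forward difference: change in y divided by change in x."""
    return (func(x + h) - func(x)) / h


# Smaller steps give estimates closer to the exact slope of 16 at x = 5.
for h in (1.0, 0.1, 0.001):
    print(f"h = {h}: estimated slope = {slope_estimate(f, 5.0, h)}")
```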

For a function with 2 independent variables (x, y) and 1 dependent variable (z), the plot is 3-dimensional (x-y-z). In this case, the gradient is still a slope, but that slope is determined by 2 variables (i.e., x and y). The following is an example of such a function.

$\ z=f(x, y) = x^2+y^2$

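If we plot it, it looks like a bowl-shaped surface (a paraboloid). Its gradient is the vector of its partial derivatives with respect to x and y.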

$\nabla f(x, y)=\left[ \begin{array}{c} 2x \\ 2y \end{array} \right]$

Once we have the gradient function, we can evaluate it at any point. For instance, at the point (4, -6), the gradient is as follows.

$\nabla f(4, -6)=\left[ \begin{array}{c} 8 \\ -12 \end{array} \right]$
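
The gradient at the point (4, -6) can be reproduced in code. The following is a minimal sketch, assuming plain Python without any library; `grad_f` is a name introduced here for the analytic gradient, and the central-difference check (with an arbitrarily chosen step `h`) is added only to confirm the result.

```python
def f(x, y):
    """The example function z = x^2 + y^2."""
    return x**2 + y**2


def grad_f(x, y):
    """Analytic gradient: the partial derivatives [2x, 2y]."""
    return [2 * x, 2 * y]


x0, y0 = 4, -6
print("analytic gradient:", grad_f(x0, y0))

# Numerical check: vary one variable at a time while holding the other fixed.
h = 1e-5  # illustrative step size
dz_dx = (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)
dz_dy = (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)
print("numerical gradient:", [dz_dx, dz_dy])
```

The analytic call returns `[8, -12]`, and the numerical estimates should agree with it up to floating-point rounding.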

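However, in a 3-dimensional space, the slope is harder to visualize than in the 2-dimensional case shown earlier, because the surface has a different slope in each direction. The gradient vector points in the direction of steepest ascent, and its length is the slope in that direction.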

From a more general perspective (i.e., with more than 2 independent variables), we can write the gradient as a vector that collects all the partial derivatives. To calculate the gradient at a point p = (x, y, w, ...), you just need to substitute the coordinates of p into the vector.

$\nabla f(p)=\left[ \begin{array}{c} \frac{\partial f}{\partial x}(p) \\ \frac{\partial f}{\partial y}(p) \\ \frac{\partial f}{\partial w}(p) \\ \vdots \end{array} \right]$

The following is an example.

$\ z=f(x, y, w) = x^2+y^2+w^2$

$\nabla f(p)=\left[ \begin{array}{c} 2x \\ 2y \\ 2w \end{array} \right]$
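
The same idea extends to any number of variables: take one partial derivative per variable and stack them into a vector. The sketch below approximates each partial derivative with a central difference. The helper `numerical_gradient`, the step size `h`, and the sample point (1, -2, 3) are assumptions made for illustration and do not come from the example above.

```python
def numerical_gradient(func, point, h=1e-6):
    """Approximate the gradient of func at point, one variable at a time."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h   # nudge only the i-th variable up
        backward[i] -= h  # and down, keeping the others fixed
        grad.append((func(forward) - func(backward)) / (2 * h))
    return grad


def f(p):
    """The example function z = x^2 + y^2 + w^2."""
    x, y, w = p
    return x**2 + y**2 + w**2


p = [1.0, -2.0, 3.0]  # hypothetical sample point
print(numerical_gradient(f, p))  # should be close to [2x, 2y, 2w] = [2, -4, 6]
```

In practice, machine learning libraries compute gradients with automatic differentiation rather than finite differences, but the vector they return has the same meaning.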