Machine Learning: Linear Regression in Python (Code Example)

1. Introduction to the Cost Function in Machine Learning

Gradient descent can be used to estimate the slope (b1) and intercept (b0) of a linear regression model. The criterion for selecting b0 and b1 is to minimize the difference between the estimated y and the observed y.

Example of Linear Regression in Machine Learning (using Gradient Descent)

We can write this criterion as follows. In the machine learning context, it is called the cost function.

\[ C=\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y_i})^2 \]

We can write out the predicted y as follows.

\[ \hat{y_i} = b_0 +b_1 x_i \]

Thus, the cost function can be rewritten as follows.

\[ C=\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y_i})^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \]
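To make the formula concrete, here is a minimal sketch of the cost function in NumPy. The function name `cost` and the toy data are illustrative assumptions, not part of the original code.

```python
import numpy as np

def cost(x, y, b_0, b_1):
    """Mean squared error C for the line y_hat = b_0 + b_1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    y_hat = b_0 + b_1 * x              # predicted values
    return np.mean((y - y_hat) ** 2)   # average squared residual

# Made-up toy data lying exactly on the line y = 1 + 2x
x_toy = [0.0, 1.0, 2.0, 3.0]
y_toy = [1.0, 3.0, 5.0, 7.0]

print(cost(x_toy, y_toy, b_0=1.0, b_1=2.0))  # perfect fit, so the cost is 0.0
print(cost(x_toy, y_toy, b_0=0.0, b_1=2.0))  # every residual is 1, so the cost is 1.0
```

The further the candidate line is from the data, the larger the cost, which is why minimizing C selects b0 and b1.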

2. Iteration Process in Machine Learning for Linear Regression

Given the cost function, we can calculate its partial derivatives with respect to b0 and b1 as follows.

\[ \frac{\partial C}{\partial b_0}= \frac{-2}{n} \sum_{i=1}^{n} (y_i-b_0 -b_1 x_i) \]

\[ \frac{\partial C}{\partial b_1}= \frac{-2}{n} \sum_{i=1}^{n} (y_i-b_0 -b_1 x_i) x_i \]
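One way to sanity-check these derivatives is to compare them with numerical finite differences of the cost. The sketch below does that on made-up data. The helper names (`cost`, `gradients`) and the step size `h` are assumptions chosen for illustration.

```python
import numpy as np

def cost(x, y, b_0, b_1):
    """Mean squared error for the line b_0 + b_1 * x."""
    return np.mean((y - b_0 - b_1 * x) ** 2)

def gradients(x, y, b_0, b_1):
    """Analytic partial derivatives of the cost with respect to b_0 and b_1."""
    n = len(x)
    residual = y - b_0 - b_1 * x
    d_b0 = -2.0 / n * np.sum(residual)
    d_b1 = -2.0 / n * np.sum(residual * x)
    return d_b0, d_b1

# Made-up toy data and an arbitrary starting point
x_toy = np.array([0.0, 1.0, 2.0, 3.0])
y_toy = np.array([1.0, 2.5, 5.5, 6.0])
b_0, b_1 = 0.5, 1.0

d_b0, d_b1 = gradients(x_toy, y_toy, b_0, b_1)

# Central finite differences as an independent numerical check
h = 1e-6
num_b0 = (cost(x_toy, y_toy, b_0 + h, b_1) - cost(x_toy, y_toy, b_0 - h, b_1)) / (2 * h)
num_b1 = (cost(x_toy, y_toy, b_0, b_1 + h) - cost(x_toy, y_toy, b_0, b_1 - h)) / (2 * h)

print("d C / d b_0:", d_b0, "numerical:", num_b0)
print("d C / d b_1:", d_b1, "numerical:", num_b1)
```

If the analytic and numerical values agree to several decimal places, the formulas are very likely implemented correctly.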

At each iteration, the algorithm computes the gradient at the current position and multiplies it by a learning rate, η, which controls the size of the step to the next point.

It then subtracts the result (i.e., gradient × learning rate) from the current position. Moving from the current position (n) to the new position (n+1) is called making a step. Note that in the subscripts below, n denotes the iteration number, whereas in the sums n is the number of observations. This process can be written as follows.

\[ b_{0 (n+1)} = b_{0 (n)} - \eta \frac{\partial C}{\partial b_{0 (n)}} \]

\[ b_{1 (n+1)} = b_{1 (n)} - \eta \frac{\partial C}{\partial b_{1 (n)}} \]

Substituting the partial derivatives, they can be rewritten as follows.

\[ b_{0 (n+1)} = b_{0 (n)} - \eta \frac{\partial C}{\partial b_{0 (n)}} = b_{0 (n)} - \eta \left( \frac{-2}{n} \sum_{i=1}^{n} (y_i - b_{0 (n)} - b_{1 (n)} x_i) \right) \]

\[ b_{1 (n+1)} = b_{1 (n)} - \eta \frac{\partial C}{\partial b_{1 (n)}} = b_{1 (n)} - \eta \left( \frac{-2}{n} \sum_{i=1}^{n} (y_i - b_{0 (n)} - b_{1 (n)} x_i) x_i \right) \]
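As a worked illustration of a single step, the sketch below applies the update rule once to three made-up points, starting from b0 = 0 and b1 = 0 with an assumed learning rate of 0.1. All names and values here are illustrative only.

```python
import numpy as np

# Made-up toy data lying on y = 2x
x_toy = np.array([1.0, 2.0, 3.0])
y_toy = np.array([2.0, 4.0, 6.0])

b_0, b_1 = 0.0, 0.0   # current position
eta = 0.1             # learning rate (assumed value for illustration)
n = len(x_toy)

residual = y_toy - b_0 - b_1 * x_toy            # [2, 4, 6]
grad_b0 = -2.0 / n * np.sum(residual)           # -2/3 * 12 = -8
grad_b1 = -2.0 / n * np.sum(residual * x_toy)   # -2/3 * 28, about -18.667

# One step: subtract gradient * learning rate from the current position
b_0_next = b_0 - eta * grad_b0                  # 0.8
b_1_next = b_1 - eta * grad_b1                  # about 1.867

print(b_0_next, b_1_next)
```

Both gradients are negative at the starting point, so subtracting them moves b0 and b1 upward, toward values that reduce the cost.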

3. Python Code for Linear Regression in Machine Learning

# import numpy
import numpy as np

# defining the function to update the steps
def updating_steps(x, y, b_1, b_0, learning_rate):
    n_number = len(x)

    # predictions at the current coefficients
    y_predicted = b_0 + b_1*x
    # sums inside the partial derivatives (divided by n_number below)
    b0_deriv = -2*np.sum(y - y_predicted)
    b1_deriv = -2*np.dot((y - y_predicted), x)

    # move each coefficient against its gradient
    b_1 -= (b1_deriv/n_number)*learning_rate
    b_0 -= (b0_deriv/n_number)*learning_rate
    return b_0, b_1

# iteration process of finding the coefficients
def prediction(x, y, b_1, b_0, learning_rate, iters):
    b_0_history = []
    b_1_history = []
    for i in range(iters):
        b_0, b_1 = updating_steps(x, y, b_1, b_0, learning_rate)
        # record the coefficients after every step
        b_0_history.append(b_0)
        b_1_history.append(b_1)
        if i % 100 == 0:
            print(i, "b_0=", b_0, "b_1=", b_1)
    return b_0_history, b_1_history


We can then download the data and apply the prediction function. We use the following model, which uses the radio variable to predict sales.

sales = b0 + b1 × radio

Thus, we use the following Python code to estimate b0 and b1. Note that this dataset is originally from the book An Introduction to Statistical Learning.

# download data from GitHub
import pandas as pd

# apply the prediction function
b_0_history,b_1_history=prediction(x=df_train['radio'], y=df_train['sales'], b_1=0,b_0=4, learning_rate=0.001,iters=10000)

The following is the partial output:

0 b_0= 4.020045 b_1= 0.5551519
100 b_0= 4.310919241910213 b_1= 0.3555198433855029
200 b_0= 4.591006211887993 b_1= 0.3469490665522249
300 b_0= 4.85540569511566 b_1= 0.33885833328548937
[intermediate output omitted to save space]
9600 b_0= 9.290698724706752 b_1= 0.2031365367669492
9700 b_0= 9.291871525057088 b_1= 0.2031006485923814
9800 b_0= 9.292978637632094 b_1= 0.20306677049083868
9900 b_0= 9.294023741560824 b_1= 0.20303478987945422

4. Plot Iteration Process for Linear Regression in Machine Learning

# import matplotlib
from matplotlib import pyplot as plt

# set the size of the figure
plt.rcParams['figure.figsize'] = [10, 6]

# plot the iteration process for b_1


Iteration Process for b_1 for Linear Regression in Machine Learning
# plot the iteration process for b_0


Iteration Process for b_0 for Linear Regression in Machine Learning
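Plots like these can be produced by recording the coefficients at every step and passing the recorded lists to matplotlib. The following is a minimal sketch under stated assumptions: it uses synthetic stand-in data rather than the advertising dataset, and the helper names `run_gradient_descent` and `plot_history` are hypothetical, not the original plotting code.

```python
import numpy as np
from matplotlib import pyplot as plt

def run_gradient_descent(x, y, b_0, b_1, learning_rate, iters):
    """Plain gradient descent that records b_0 and b_1 after every step."""
    n = len(x)
    b_0_history, b_1_history = [], []
    for _ in range(iters):
        residual = y - b_0 - b_1 * x
        grad_b0 = -2.0 / n * np.sum(residual)
        grad_b1 = -2.0 / n * np.sum(residual * x)
        b_0 -= learning_rate * grad_b0
        b_1 -= learning_rate * grad_b1
        b_0_history.append(b_0)
        b_1_history.append(b_1)
    return b_0_history, b_1_history

def plot_history(history, label):
    """Line plot of one coefficient against the iteration number."""
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.plot(history)
    ax.set_xlabel("iteration")
    ax.set_ylabel(label)
    ax.set_title(f"Iteration process for {label}")
    return fig

# Synthetic stand-in data (assumption): y = 3 + 0.5x plus noise
rng = np.random.default_rng(0)
x_demo = rng.uniform(0, 10, size=200)
y_demo = 3.0 + 0.5 * x_demo + rng.normal(0, 0.5, size=200)

b_0_hist, b_1_hist = run_gradient_descent(
    x_demo, y_demo, b_0=0.0, b_1=0.0, learning_rate=0.01, iters=2000)

fig_b1 = plot_history(b_1_hist, "b_1")
fig_b0 = plot_history(b_0_hist, "b_0")
# plt.show()  # uncomment to display the figures in an interactive session
```

A curve that flattens out suggests the coefficient has settled. A curve that is still sloping at the last iteration suggests more iterations or a different learning rate may be needed.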

5. Conclusion and Comparison to Ordinary Least Squares (OLS)

We can see from the final iterations of the output above that b0 approaches 9.294 and b1 approaches 0.203. We can write the model statement below for linear regression using gradient descent.

sales = b0 + b1 × radio = 9.294 + 0.203 × radio

We can compare this to the Ordinary Least Squares (OLS) approach. Below, we use the least-squares function in NumPy (np.linalg.lstsq) to calculate the regression coefficients.

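# add 1s into the array (a column of 1s for the intercept, next to radio)
x_array = np.column_stack((np.ones(len(df_train)), df_train['radio']))

# use the least-squares function in NumPy
results_1 = np.linalg.lstsq(x_array, df_train['sales'], rcond=None)[0]

# print out the result
print(results_1)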

[9.3116381 0.20249578]

Thus, OLS gives b0 = 9.312 and b1 = 0.202, which are close to the estimates from the gradient descent method.
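The two approaches agree because they minimize the same squared-error cost. For a single predictor, setting both partial derivatives to zero gives a closed-form solution: b1 is the sum of cross-deviations of x and y divided by the sum of squared deviations of x, and b0 is the mean of y minus b1 times the mean of x. The sketch below checks this against np.linalg.lstsq on synthetic data. The data and variable names are assumptions for illustration, not the advertising dataset.

```python
import numpy as np

# Synthetic stand-in data (assumption): y = 9 + 0.2x plus noise
rng = np.random.default_rng(1)
x_demo = rng.uniform(0, 50, size=100)
y_demo = 9.0 + 0.2 * x_demo + rng.normal(0, 1.0, size=100)

# Closed-form OLS estimates for simple linear regression
x_mean, y_mean = x_demo.mean(), y_demo.mean()
b_1_closed = (np.sum((x_demo - x_mean) * (y_demo - y_mean))
              / np.sum((x_demo - x_mean) ** 2))
b_0_closed = y_mean - b_1_closed * x_mean

# The same fit through np.linalg.lstsq, with a column of 1s for the intercept
design = np.column_stack((np.ones_like(x_demo), x_demo))
b_0_lstsq, b_1_lstsq = np.linalg.lstsq(design, y_demo, rcond=None)[0]

print("closed form:", b_0_closed, b_1_closed)
print("lstsq:      ", b_0_lstsq, b_1_lstsq)
```

Gradient descent reaches the same minimum only in the limit, so a small remaining gap after a fixed number of iterations is expected.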
