This tutorial shows how to conduct linear regression from scratch using Python and NumPy.
1. Math and Matrix of Linear Regression
We can use pure matrix calculations to estimate the regression coefficients of a linear regression model. The process is shown below.
\[ Y= \left[ \begin{array} {} y_{11} \\ y_{12} \\ y_{13} \\ ..\\y_{1n} \end{array} \right] = \left[ \begin{array} {} b_0+b_1 x_{11} + b_2 x_{21} \\ b_0+b_1 x_{12}+b_2 x_{22} \\ b_0+b_1 x_{13}+ b_2 x_{23} \\..\\b_0+b_1 x_{1n} + b_2 x_{2n} \end{array} \right] = \left[ \begin{array} {} 1& x_{11} & x_{21} \\ 1 & x_{12} & x_{22} \\ 1 & x_{13} & x_{23} \\..\\1 & x_{1n} & x_{2n} \end{array} \right] \begin{bmatrix} b_0\\ b_1\\ b_2\end{bmatrix} = X B \]
Thus, we can write the system above compactly as the matrix equation below.
\[ Y = XB \]
We can multiply both sides on the left by the transpose of X, written X^T, and get the following.
\[ X^TY = X^TXB \]
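X^T X is always a square matrix, and it is invertible when the columns of X are linearly independent. In that case, we can calculate its inverse and multiply both sides on the left by it.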
\[ (X^T X)^{-1} X^TY =(X^T X)^{-1} X^T X B\]
Since (X^T X)^(-1) X^T X is the identity matrix, the right side reduces to B, and we can write it as follows.
\[ (X^T X)^{-1} X^TY = B\]
Swapping the left and right sides gives the formula below, which we can use to calculate the regression coefficients of the linear model.
\[B =(X^TX)^{-1}X^TY\]
Where,
\[ B = \begin{bmatrix} b_0\\ b_1\\ b_2\end{bmatrix} \]
\[ X= \left[ \begin{array} {} 1& x_{11} & x_{21} \\ 1 & x_{12} & x_{22} \\ 1 & x_{13} & x_{23} \\..\\1 & x_{1n} & x_{2n} \end{array} \right] \]
\[ Y= \left[ \begin{array} {} y_{11} \\ y_{12} \\ y_{13} \\ ..\\y_{1n} \end{array} \right] \]
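The derivation above treats Y = XB as if it held exactly. With real data the fit is rarely exact, and the same formula is usually motivated by least squares instead. The following is a brief sketch of that argument, where the residual vector e and the objective S(B) are notation introduced here and not used elsewhere in this tutorial. We choose B to minimize the sum of squared residuals, set the gradient to zero, and arrive at the same formula, provided X^T X is invertible.

```latex
\[ e = Y - XB, \qquad S(B) = e^T e = (Y - XB)^T (Y - XB) \]
\[ \frac{\partial S}{\partial B} = -2X^T (Y - XB) = 0 \quad \Rightarrow \quad X^TXB = X^TY \quad \Rightarrow \quad B = (X^TX)^{-1}X^TY \]
```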
2. Sample Data for Linear Regression
The following is a linear regression model with price and household income as the independent variables (IVs) and purchase intention as the dependent variable (DV).
\[f(x)=b_0 +b_1 \times Price+b_2 \times Household \ Income \]
The following hypothetical data include purchase intention as the DV and price and household income as the IVs.
Prices | Household Income | Purchase Intention |
---|---|---|
5 | 7 | 7 |
6 | 5 | 6 |
7 | 4 | 5 |
8 | 6 | 5 |
9 | 3 | 3 |
10 | 3 | 4 |
3. Steps of Doing Linear Regression with Python NumPy
Below are six steps for using NumPy to estimate the regression coefficients of a linear regression model.
Step 1: Prepare the X matrix and Y vector
# Generate the X matrix
import numpy as np
X_rawdata = np.array([np.ones(6),[5,6,7,8,9,10], [7,5,4,6,3,3]])
X_matrix=X_rawdata.T
print("X Matrix:\n", X_matrix)
Output:
X Matrix:
 [[ 1.  5.  7.]
 [ 1.  6.  5.]
 [ 1.  7.  4.]
 [ 1.  8.  6.]
 [ 1.  9.  3.]
 [ 1. 10.  3.]]
# Generate the Y vector
Y_rawdata = np.array([[7,6,5,5,3,4]])
Y_vector=Y_rawdata.T
print("Y Vector:\n",Y_vector)
Output:
Y Vector:
 [[7]
 [6]
 [5]
 [5]
 [3]
 [4]]
Step 2: Calculate X^T and X^T X
The following Python code calculates X^T.
# calculates X^T
X_matrix_T=X_matrix.transpose()
print("X Matrix Transpose:\n",X_matrix_T)
Output:
X Matrix Transpose:
 [[ 1.  1.  1.  1.  1.  1.]
 [ 5.  6.  7.  8.  9. 10.]
 [ 7.  5.  4.  6.  3.  3.]]
The following Python code calculates X^T X.
# calculates X^T X
X_T_X=np.matmul(X_matrix_T,X_matrix)
print(X_T_X)
Output:
[[  6.  45.  28.]
 [ 45. 355. 198.]
 [ 28. 198. 144.]]
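Before inverting X^T X in the next step, it can be worth checking that the inverse exists, which is the case when the columns of X are linearly independent. The following is a minimal sketch of one such check using np.linalg.matrix_rank and np.linalg.cond. The check is an addition to the six steps, and the block rebuilds the Step 1 matrix so that it runs on its own.

```python
import numpy as np

# Rebuild the design matrix from Step 1 so this block is self-contained
X_matrix = np.array([np.ones(6), [5, 6, 7, 8, 9, 10], [7, 5, 4, 6, 3, 3]]).T
X_T_X = X_matrix.T @ X_matrix

# Full column rank (rank equal to the number of columns) means X^T X is invertible
rank = np.linalg.matrix_rank(X_matrix)
print("Rank of X:", rank, "out of", X_matrix.shape[1], "columns")

# A very large condition number would signal a nearly singular matrix
print("Condition number of X^T X:", np.linalg.cond(X_T_X))
```

If the rank were smaller than the number of columns, np.linalg.inv would either raise an error or return numerically unreliable values.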
Step 3: Calculate (X^T X)^(-1)
The following Python code calculates (X^T X)^(-1).
# calculates (X^T X)^(-1)
X_T_X_Inv=np.linalg.inv(X_T_X)
print(X_T_X_Inv)
Output:
[[22.23134328 -1.74626866 -1.92164179]
 [-1.74626866  0.14925373  0.13432836]
 [-1.92164179  0.13432836  0.19589552]]
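As a quick sanity check on the inverse, we can multiply X^T X by the computed inverse and compare the product with the identity matrix. This is a small sketch that is not part of the original steps; the names product and is_identity are introduced here for illustration.

```python
import numpy as np

# Rebuild X^T X and its inverse so this block is self-contained
X_matrix = np.array([np.ones(6), [5, 6, 7, 8, 9, 10], [7, 5, 4, 6, 3, 3]]).T
X_T_X = X_matrix.T @ X_matrix
X_T_X_Inv = np.linalg.inv(X_T_X)

# A matrix times its inverse should equal the identity,
# up to floating-point rounding
product = X_T_X @ X_T_X_Inv
is_identity = np.allclose(product, np.eye(3))
print("X^T X times its inverse is the identity:", is_identity)
```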
Step 4: Calculate (X^T X)^(-1) X^T Y
The following code calculates (X^T X)^(-1) X^T Y.
# calculates (X^T X)^(-1) X^T Y
X_T_X_Inv@X_matrix_T@Y_vector
Output:
array([[ 6.73880597],
       [-0.44776119],
       [ 0.34701493]])
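The explicit inverse follows the formula literally, but it is not the only way to obtain B. As an alternative sketch (a different technique from the one used in this tutorial), np.linalg.solve can solve the system (X^T X) B = X^T Y directly, which is generally considered more numerically stable than forming the inverse. The name B_solve is introduced here for illustration.

```python
import numpy as np

# Rebuild the data from Step 1 so this block is self-contained
X_matrix = np.array([np.ones(6), [5, 6, 7, 8, 9, 10], [7, 5, 4, 6, 3, 3]]).T
Y_vector = np.array([[7, 6, 5, 5, 3, 4]]).T

# Solve (X^T X) B = X^T Y without computing the inverse explicitly
B_solve = np.linalg.solve(X_matrix.T @ X_matrix, X_matrix.T @ Y_vector)
print(B_solve)
```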
Step 5: Write out the linear regression model
Rounding to two decimal places, we can see that b0 = 6.74, b1 = -0.45, and b2 = 0.35. We can write the estimated regression function below.
\[f(x)=b_0 +b_1x_1+b_2x_2=6.74-0.45 \times Price+0.35 \times Household \ Income\]
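Once the coefficients are known, the estimated function can be used to compute fitted values, residuals, and predictions. The following is a minimal sketch; the new case (price 7, household income 5) is a made-up example, and the names Y_hat, residuals, new_case, and prediction are introduced here for illustration.

```python
import numpy as np

# Rebuild the data and coefficients so this block is self-contained
X_matrix = np.array([np.ones(6), [5, 6, 7, 8, 9, 10], [7, 5, 4, 6, 3, 3]]).T
Y_vector = np.array([[7, 6, 5, 5, 3, 4]]).T
B = np.linalg.inv(X_matrix.T @ X_matrix) @ X_matrix.T @ Y_vector

# Fitted values for the six observations and their residuals
Y_hat = X_matrix @ B
residuals = Y_vector - Y_hat
print("Fitted values:\n", Y_hat)
print("Residuals:\n", residuals)

# Prediction for a hypothetical new case: price 7, household income 5
new_case = np.array([[1, 7, 5]])  # the leading 1 matches the intercept column
prediction = (new_case @ B).item()
print("Predicted purchase intention:", prediction)
```

Because the model includes an intercept, the residuals should sum to approximately zero, which is another quick check on the fit.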
Step 6: Use numpy.linalg.lstsq to verify
We can use the NumPy function numpy.linalg.lstsq to verify our calculation above. Below is the Python code that fits the same linear regression model.
# Use numpy.linalg.lstsq to verify
results=np.linalg.lstsq(X_matrix, Y_vector, rcond=None)[0]
print(results)
Output:
[[ 6.73880597]
 [-0.44776119]
 [ 0.34701493]]
As we can see, the result matches the coefficients from the matrix calculation shown above. Thus, we know that we applied the matrix method correctly.
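Instead of comparing the printed numbers by eye, the two results can also be compared in code. The following is a small sketch using np.allclose, which tests equality within a floating-point tolerance; the names B_normal, B_lstsq, and match are introduced here for illustration.

```python
import numpy as np

# Rebuild the data from Step 1 so this block is self-contained
X_matrix = np.array([np.ones(6), [5, 6, 7, 8, 9, 10], [7, 5, 4, 6, 3, 3]]).T
Y_vector = np.array([[7, 6, 5, 5, 3, 4]]).T

# Coefficients from the matrix formula and from numpy.linalg.lstsq
B_normal = np.linalg.inv(X_matrix.T @ X_matrix) @ X_matrix.T @ Y_vector
B_lstsq = np.linalg.lstsq(X_matrix, Y_vector, rcond=None)[0]

# Compare within floating-point tolerance rather than with ==
match = np.allclose(B_normal, B_lstsq)
print("Coefficients agree within tolerance:", match)
```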
4. Conclusion
This tutorial showed how to conduct linear regression from scratch using Python and NumPy. With the matrix formula, we do not need a built-in regression function to estimate the coefficients. Further, we used numpy.linalg.lstsq to verify the matrix method, and the result confirmed that our NumPy code was correct.