# Linear Regression: Python NumPy Implementation from Scratch

This tutorial shows how to perform linear regression from scratch using Python and NumPy.

## 1. Math and Matrix of Linear Regression

We can use pure matrix calculation to estimate the regression coefficients in a linear regression model. The process is shown below for a model with two independent variables and n observations.

$Y = \begin{bmatrix} y_{11} \\ y_{12} \\ y_{13} \\ \vdots \\ y_{1n} \end{bmatrix} = \begin{bmatrix} b_0 + b_1 x_{11} + b_2 x_{21} \\ b_0 + b_1 x_{12} + b_2 x_{22} \\ b_0 + b_1 x_{13} + b_2 x_{23} \\ \vdots \\ b_0 + b_1 x_{1n} + b_2 x_{2n} \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{21} \\ 1 & x_{12} & x_{22} \\ 1 & x_{13} & x_{23} \\ \vdots & \vdots & \vdots \\ 1 & x_{1n} & x_{2n} \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix} = XB$

Thus, we can write the model compactly in matrix form.

$Y = XB$

We can multiply both sides on the left by the transpose of X, written $X^T$, and get the following.

$X^TY = X^TXB$

Since $X^TX$ is a square matrix, we can multiply both sides on the left by its inverse, provided the inverse exists (that is, the columns of $X$ are linearly independent).

$(X^T X)^{-1} X^TY =(X^T X)^{-1} X^T X B$

Since $(X^TX)^{-1}X^TX$ is an identity matrix, the right-hand side reduces to $B$.

$(X^T X)^{-1} X^TY = B$

Swapping the left and right sides gives the formula below, which we can use to calculate the regression coefficients of the linear model. This is the ordinary least squares (OLS) estimate of $B$.

$B =(X^TX)^{-1}X^TY$

Where,

$B = \begin{bmatrix} b_0\\ b_1\\ b_2\end{bmatrix}$

$X = \begin{bmatrix} 1 & x_{11} & x_{21} \\ 1 & x_{12} & x_{22} \\ 1 & x_{13} & x_{23} \\ \vdots & \vdots & \vdots \\ 1 & x_{1n} & x_{2n} \end{bmatrix}$

$Y = \begin{bmatrix} y_{11} \\ y_{12} \\ y_{13} \\ \vdots \\ y_{1n} \end{bmatrix}$
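
The formula $B=(X^TX)^{-1}X^TY$ can be wrapped in a small helper. The sketch below is only an illustration: the function name `ols_normal_equation` and the three-point demo data are hypothetical and not part of the tutorial's dataset, and the code assumes $X^TX$ is invertible.

```python
import numpy as np

def ols_normal_equation(X, Y):
    """Return B = (X^T X)^{-1} X^T Y (hypothetical helper, assumes X^T X is invertible)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    XtX = X.T @ X          # square matrix, shape (k, k)
    XtY = X.T @ Y          # shape (k, 1)
    return np.linalg.inv(XtX) @ XtY

# Hypothetical demo data that follows y = 1 + 2x exactly
X_demo = np.array([[1.0, 0.0],
                   [1.0, 1.0],
                   [1.0, 2.0]])   # first column of ones is the intercept
Y_demo = np.array([[1.0], [3.0], [5.0]])

B_demo = ols_normal_equation(X_demo, Y_demo)
print(B_demo)   # intercept first, then slope
```

Because the demo points lie exactly on a line, the recovered intercept and slope should be very close to 1 and 2.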

## 2. Sample Data for Linear Regression

The following is a linear regression model with price and household income as the independent variables (IVs) and purchase intention as the dependent variable (DV).

$f(x)=b_0 +b_1 \times Price+b_2 \times Household \ Income$

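The following is the hypothetical data, with purchase intention as the DV and price and household income as the IVs.

| Observation | Price | Household Income | Purchase Intention |
|---|---|---|---|
| 1 | 5 | 7 | 7 |
| 2 | 6 | 5 | 6 |
| 3 | 7 | 4 | 5 |
| 4 | 8 | 6 | 5 |
| 5 | 9 | 3 | 3 |
| 6 | 10 | 3 | 4 |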

## 3. Steps of Doing Linear Regression with Python Numpy

Below are the six steps for using NumPy to estimate the regression coefficients in a linear regression model.

### Step 1: Prepare the X matrix and Y vector

    # Generate the X matrix: a column of ones (intercept), price, household income
    import numpy as np

    X_rawdata = np.array([np.ones(6), [5, 6, 7, 8, 9, 10], [7, 5, 4, 6, 3, 3]])
    X_matrix = X_rawdata.T
    print("X Matrix:\n", X_matrix)

Output:

    X Matrix:
     [[ 1.  5.  7.]
     [ 1.  6.  5.]
     [ 1.  7.  4.]
     [ 1.  8.  6.]
     [ 1.  9.  3.]
     [ 1. 10.  3.]]

The following Python code generates the Y vector.

    # Generate the Y vector (purchase intention) as a column vector
    Y_rawdata = np.array([[7, 6, 5, 5, 3, 4]])
    Y_vector = Y_rawdata.T
    print("Y Vector:\n", Y_vector)

Output:

    Y Vector:
     [[7]
     [6]
     [5]
     [5]
     [3]
     [4]]

### Step 2: Calculate $X^T$ and $X^TX$

The following Python code calculates $X^T$.

    # Calculate X^T
    X_matrix_T = X_matrix.transpose()
    print("X Matrix Transpose:\n", X_matrix_T)

Output:

    X Matrix Transpose:
     [[ 1.  1.  1.  1.  1.  1.]
     [ 5.  6.  7.  8.  9. 10.]
     [ 7.  5.  4.  6.  3.  3.]]

The following Python code calculates $X^TX$.

    # Calculate X^T X
    X_T_X = np.matmul(X_matrix_T, X_matrix)
    print(X_T_X)

Output:

    [[  6.  45.  28.]
     [ 45. 355. 198.]
     [ 28. 198. 144.]]
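
To make the entries of $X^TX$ less opaque, the sketch below rebuilds the matrix from the same sample data and checks that each entry is a sum of products of two columns of $X$. The variable names (`price`, `income`, `XtX`) are illustrative choices, not names used elsewhere in this tutorial.

```python
import numpy as np

# Same sample data as in Step 1
price = np.array([5, 6, 7, 8, 9, 10], dtype=float)
income = np.array([7, 5, 4, 6, 3, 3], dtype=float)
X = np.column_stack([np.ones(6), price, income])

XtX = X.T @ X   # the @ operator is equivalent to np.matmul here

# Each entry is a sum of products of two columns of X
n_obs = XtX[0, 0]               # ones . ones   = number of observations
sum_price = XtX[0, 1]           # ones . price  = sum of prices
sum_price_income = XtX[1, 2]    # price . income

print("Symmetric:", np.allclose(XtX, XtX.T))
print("n:", n_obs, "sum(price):", sum_price, "sum(price*income):", sum_price_income)
```

The top-left entry is the number of observations, and the matrix is symmetric because the product of column i with column j equals the product of column j with column i.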

### Step 3: Calculate $(X^TX)^{-1}$

The following Python code calculates $(X^TX)^{-1}$.

    # Calculate (X^T X)^(-1)
    X_T_X_Inv = np.linalg.inv(X_T_X)
    print(X_T_X_Inv)

Output:

    [[22.23134328 -1.74626866 -1.92164179]
     [-1.74626866  0.14925373  0.13432836]
     [-1.92164179  0.13432836  0.19589552]]
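
The inverse only exists when $X^TX$ has full rank, so it can be worth checking that before trusting the result. The sketch below, which is an optional addition rather than part of the six steps, verifies that the computed inverse multiplies back to the identity and also shows `np.linalg.solve` as an alternative that avoids forming the inverse explicitly. The variable names are illustrative.

```python
import numpy as np

# Same sample data as in Step 1
X = np.column_stack([np.ones(6), [5, 6, 7, 8, 9, 10], [7, 5, 4, 6, 3, 3]])
Y = np.array([[7], [6], [5], [5], [3], [4]], dtype=float)

XtX = X.T @ X
XtX_inv = np.linalg.inv(XtX)

# (X^T X)(X^T X)^{-1} should be the identity, up to floating-point error
is_identity = np.allclose(XtX @ XtX_inv, np.eye(3))

# Full rank (3 here) means the inverse exists
rank = np.linalg.matrix_rank(XtX)

# Alternative: solve (X^T X) B = X^T Y directly instead of inverting
B_solve = np.linalg.solve(XtX, X.T @ Y)

print("Inverse check passed:", is_identity)
print("Rank of X^T X:", rank)
print("Coefficients via solve:\n", B_solve)
```

Solving the linear system directly is generally preferred over explicit inversion for numerical stability, although for a small, well-behaved example like this one both approaches give the same coefficients.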

### Step 4: Calculate $(X^TX)^{-1}X^TY$

The following code calculates $(X^TX)^{-1}X^TY$.

    # Calculate (X^T X)^(-1) X^T Y
    X_T_X_Inv @ X_matrix_T @ Y_vector

Output:

    array([[ 6.73880597],
           [-0.44776119],
           [ 0.34701493]])

### Step 5: Write out the linear regression model

Rounded to two decimals, $b_0 = 6.74$, $b_1 = -0.45$, and $b_2 = 0.35$. We can write the estimated regression function below.

$f(x)=b_0 +b_1x_1+b_2x_2=6.74-0.45 \times Price+0.35 \times Household \ Income$
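
Once the coefficients are known, the regression function can be used to predict purchase intention. The sketch below recomputes the coefficients from the sample data and applies them to a new observation. The new price of 8 and household income of 5 are hypothetical values chosen only for illustration.

```python
import numpy as np

# Same sample data as in Step 1
X = np.column_stack([np.ones(6), [5, 6, 7, 8, 9, 10], [7, 5, 4, 6, 3, 3]])
Y = np.array([[7], [6], [5], [5], [3], [4]], dtype=float)

# Estimated coefficients: b0 (intercept), b1 (price), b2 (household income)
B = np.linalg.inv(X.T @ X) @ X.T @ Y

# Hypothetical new observation: leading 1 for the intercept, price = 8, income = 5
x_new = np.array([[1.0, 8.0, 5.0]])
prediction = (x_new @ B).item()

# Fitted values and residuals for the original six observations
fitted = X @ B
residuals = Y - fitted

print("Predicted purchase intention:", round(prediction, 2))
print("Residuals:\n", residuals)
```

The leading 1 in `x_new` plays the same role as the column of ones in $X$: it multiplies the intercept $b_0$.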

### Step 6: Use numpy.linalg.lstsq to verify

We can use the NumPy function numpy.linalg.lstsq to verify our calculation above. Below is the Python code that fits the same linear regression model.

    # Use numpy.linalg.lstsq to verify
    # lstsq returns a tuple; its first element holds the coefficients
    results = np.linalg.lstsq(X_matrix, Y_vector, rcond=None)
    print(results[0])

Output:

    [[ 6.73880597]
     [-0.44776119]
     [ 0.34701493]]

As we can see, the coefficients match those from the matrix calculation shown above, which confirms that the matrix method was applied correctly.
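
Rather than comparing printed numbers by eye, the two results can be compared programmatically. The sketch below unpacks the four values that `numpy.linalg.lstsq` returns and uses `np.allclose` for the comparison. The variable names are illustrative.

```python
import numpy as np

# Same sample data as in Step 1
X = np.column_stack([np.ones(6), [5, 6, 7, 8, 9, 10], [7, 5, 4, 6, 3, 3]])
Y = np.array([[7], [6], [5], [5], [3], [4]], dtype=float)

# Matrix method from Steps 2 to 4
B_matrix = np.linalg.inv(X.T @ X) @ X.T @ Y

# lstsq returns: coefficients, sum of squared residuals, rank of X, singular values of X
coeffs, sq_residuals, rank, singular_values = np.linalg.lstsq(X, Y, rcond=None)

print("Coefficients from lstsq:\n", coeffs)
print("Rank of X:", rank)
print("Methods agree:", np.allclose(B_matrix, coeffs))
```

`np.allclose` compares within a small tolerance, which is the appropriate test here because the two methods use different floating-point operations and may differ in the last few digits.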

## 4. Conclusion

This tutorial showed how to perform linear regression from scratch using Python and NumPy. With only basic matrix operations (transpose, multiplication, and inversion), we can estimate the regression coefficients without a dedicated regression function. We also used numpy.linalg.lstsq to verify the matrix method, and the matching coefficients confirmed that our NumPy code was correct.