This tutorial shows how to conduct linear regression from scratch in Python using NumPy.

## 1. Math and Matrix of Linear Regression

We can estimate the regression coefficients of a linear regression model using nothing but matrix calculations. For a model with two predictors and n observations, the process is as follows.

\[ Y= \left[ \begin{array} {} y_{11} \\ y_{12} \\ y_{13} \\ ..\\y_{1n} \end{array} \right] = \left[ \begin{array} {} b_0+b_1 x_{11} + b_2 x_{21} \\ b_0+b_1 x_{12}+b_2 x_{22} \\ b_0+b_1 x_{13}+ b_2 x_{23} \\..\\b_0+b_1 x_{1n} + b_2 x_{2n} \end{array} \right] = \left[ \begin{array} {} 1& x_{11} & x_{21} \\ 1 & x_{12} & x_{22} \\ 1 & x_{13} & x_{23} \\..\\1 & x_{1n} & x_{2n} \end{array} \right] \begin{bmatrix} b_0\\ b_1\\ b_2\end{bmatrix} = X B \]

Thus, we can write the system above in compact matrix form.

\[ Y = XB \]

We can multiply both sides on the left by the transpose of X and get the following.

\[ X^TY = X^TXB \]

X^{T}X is a square matrix, and it is invertible as long as the columns of X are linearly independent. In that case, we can multiply both sides on the left by its inverse.

\[ (X^T X)^{-1} X^TY =(X^T X)^{-1} X^T X B\]

Since (X^{T}X)^{-1}X^{T}X is the identity matrix, the right-hand side reduces to B.

\[ (X^T X)^{-1} X^TY = B\]

Swapping the two sides gives the formula below, which we can use to calculate the regression coefficients of the linear model.

\[B =(X^TX)^{-1}X^TY\]

Where,

\[ B = \begin{bmatrix} b_0\\ b_1\\ b_2\end{bmatrix} \]

\[ X= \left[ \begin{array} {} 1& x_{11} & x_{21} \\ 1 & x_{12} & x_{22} \\ 1 & x_{13} & x_{23} \\..\\1 & x_{1n} & x_{2n} \end{array} \right] \]

\[ Y= \left[ \begin{array} {} y_{11} \\ y_{12} \\ y_{13} \\ ..\\y_{1n} \end{array} \right] \]
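The derivation above treats Y = XB as an exact equality. With more observations than coefficients, the system usually has no exact solution, and the same formula is normally obtained by minimizing the sum of squared residuals. The following is a sketch of that standard least-squares argument. The objective S(B) is notation introduced here for illustration and is not part of the original derivation.

\[ S(B) = (Y - XB)^T (Y - XB) = Y^TY - 2B^TX^TY + B^TX^TXB \]

\[ \frac{\partial S}{\partial B} = -2X^TY + 2X^TXB = 0 \]

\[ X^TXB = X^TY \quad \Rightarrow \quad B = (X^TX)^{-1}X^TY \]

Under this reading, B is the coefficient vector that makes XB as close as possible to Y in the squared-error sense, rather than one that reproduces Y exactly.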

## 2. Sample Data for Linear regression

The following is a linear regression model with price and household income as independent variables (`IV`s) and purchase intention as the dependent variable (`DV`).

\[f(x)=b_0 +b_1 \times Price+b_2 \times Household \ Income \]

The following is the hypothetical data, with purchase intention as the `DV` and price and household income as the `IV`s.

Price | Household Income | Purchase Intention |
---|---|---|
5 | 7 | 7 |
6 | 5 | 6 |
7 | 4 | 5 |
8 | 6 | 5 |
9 | 3 | 3 |
10 | 3 | 4 |

## 3. Steps of Doing Linear Regression with Python Numpy

Below are six steps for estimating the regression coefficients of a linear regression model with NumPy.

### Step 1: Prepare the X matrix and Y vector

```
# Generate the X matrix
import numpy as np
X_rawdata = np.array([np.ones(6),[5,6,7,8,9,10], [7,5,4,6,3,3]])
X_matrix=X_rawdata.T
print("X Matrix:\n", X_matrix)
```

Output:

X Matrix: [[ 1. 5. 7.] [ 1. 6. 5.] [ 1. 7. 4.] [ 1. 8. 6.] [ 1. 9. 3.] [ 1. 10. 3.]]

```
# Generate the Y vector
Y_rawdata = np.array([[7,6,5,5,3,4]])
Y_vector=Y_rawdata.T
print("Y Vector:\n",Y_vector)
```

Output:

Y Vector: [[7] [6] [5] [5] [3] [4]]

### Step 2: Calculate X^{T} and X^{T}X

The following Python code calculates X^{T}.

```
# calculates X^T
X_matrix_T=X_matrix.transpose()
print("X Matrix Transpose:\n",X_matrix_T)
```

Output:

X Matrix Transpose: [[ 1. 1. 1. 1. 1. 1.] [ 5. 6. 7. 8. 9. 10.] [ 7. 5. 4. 6. 3. 3.]]

The following Python code calculates X^{T} X.

```
# calculates X^T X
X_T_X=np.matmul(X_matrix_T,X_matrix)
print(X_T_X)
```

Output:

[[ 6. 45. 28.] [ 45. 355. 198.] [ 28. 198. 144.]]

### Step 3: Calculate (X^{T}X)^{-1}

The following Python code calculates (X^{T} X)^{-1}.

```
# calculates (X^T X)^(-1)
X_T_X_Inv=np.linalg.inv(X_T_X)
print(X_T_X_Inv)
```

Output:

[[22.23134328 -1.74626866 -1.92164179] [-1.74626866 0.14925373 0.13432836] [-1.92164179 0.13432836 0.19589552]]
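As a sanity check on this step, a computed inverse multiplied by the original matrix should give the identity matrix, up to floating-point rounding. The sketch below assumes the X^{T}X values printed in Step 2 and re-enters them by hand so that it runs on its own. The names `product` and `is_identity` are introduced here for illustration.

```python
import numpy as np

# X^T X for the sample data, copied from the Step 2 output
X_T_X = np.array([[6.0, 45.0, 28.0],
                  [45.0, 355.0, 198.0],
                  [28.0, 198.0, 144.0]])

X_T_X_Inv = np.linalg.inv(X_T_X)

# Inverse times the original matrix should be the 3x3 identity,
# apart from tiny rounding errors
product = X_T_X_Inv @ X_T_X
is_identity = np.allclose(product, np.eye(3))
print("Inverse check passed:", is_identity)

# The condition number hints at how sensitive the inverse is to rounding;
# very large values would indicate nearly collinear predictors
print("Condition number:", np.linalg.cond(X_T_X))
```

`np.allclose` is used instead of `==` because the product typically differs from the exact identity by very small rounding errors.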

### Step 4: Calculate (X^{T}X)^{-1}X^{T}Y

The following code calculates (X^{T}X)^{-1}X^{T}Y.

```
# calculates (X^T X)^(-1) X^T Y
X_T_X_Inv@X_matrix_T@Y_vector
```

Output:

array([[ 6.73880597], [-0.44776119], [ 0.34701493]])
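Forming the inverse explicitly mirrors the formula, but it is not the only way to get B. One possible alternative, shown here as a sketch rather than as part of the original method, is to solve the linear system (X^{T}X)B = X^{T}Y directly with `np.linalg.solve`. The variable names `X`, `Y`, and `B_solve` are chosen for this example, and the data is rebuilt from the table in Section 2.

```python
import numpy as np

# Rebuild the design matrix (intercept, price, household income) and the outcome
X = np.column_stack([np.ones(6), [5, 6, 7, 8, 9, 10], [7, 5, 4, 6, 3, 3]])
Y = np.array([[7], [6], [5], [5], [3], [4]], dtype=float)

# Solve (X^T X) B = X^T Y without computing the inverse explicitly
B_solve = np.linalg.solve(X.T @ X, X.T @ Y)
print(B_solve)
```

Solving the system avoids one matrix inversion and is generally considered the more numerically stable choice, although for a small, well-behaved dataset like this one both approaches should agree to many decimal places.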

### Step 5: Write out the linear regression model

Rounded to two decimal places, b_{0} = 6.74, b_{1} = -0.45, and b_{2} = 0.35. We can write the estimated regression function below.

\[f(x)=b_0 +b_1x_1+b_2x_2=6.74-0.45 \times Price+0.35 \times Household \ Income\]
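Once the coefficients are known, the model can be used to compute fitted values, residuals, and predictions. The following is a minimal sketch under a few assumptions: the data is rebuilt from the table in Section 2, the new observation (price 7, household income 5) is hypothetical, and R-squared is added only as a common summary of fit that the tutorial itself does not compute.

```python
import numpy as np

# Design matrix (intercept, price, household income) and outcome as a 1-D array
X = np.column_stack([np.ones(6), [5, 6, 7, 8, 9, 10], [7, 5, 4, 6, 3, 3]])
Y = np.array([7, 6, 5, 5, 3, 4], dtype=float)

# Coefficients from the normal equation
B = np.linalg.inv(X.T @ X) @ X.T @ Y

# Fitted values and residuals for the six observations
Y_hat = X @ B
residuals = Y - Y_hat

# R-squared: share of the variance in Y explained by the model
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((Y - Y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print("R-squared:", r_squared)

# Prediction for a hypothetical new case: price = 7, household income = 5
new_x = np.array([1.0, 7.0, 5.0])  # leading 1.0 is the intercept term
prediction = new_x @ B
print("Predicted purchase intention:", prediction)
```

Because the model includes an intercept, the residuals should sum to approximately zero, which is a quick way to check that the coefficients were estimated correctly.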

### Step 6: Use numpy.linalg.lstsq to verify

We can use the NumPy function `numpy.linalg.lstsq` to verify our calculation above. It computes the least-squares solution directly from X and Y. Below is the Python code.

```
# Use numpy.linalg.lstsq to verify
results=np.linalg.lstsq(X_matrix, Y_vector, rcond=None)[0]
print(results)
```

Output:

[[ 6.73880597] [-0.44776119] [ 0.34701493]]

As we can see, the result matches the coefficients from the matrix calculation above to the printed precision. This confirms that we applied the matrix method correctly.
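Comparing printed output by eye works for three coefficients, but the comparison can also be done in code. The sketch below assumes the same sample data and uses `np.allclose`, since two floating-point computations of the same quantity rarely agree bit for bit. The names `B_normal`, `B_lstsq`, and `match` are introduced for this example.

```python
import numpy as np

# Rebuild the design matrix and outcome vector from the sample data
X = np.column_stack([np.ones(6), [5, 6, 7, 8, 9, 10], [7, 5, 4, 6, 3, 3]])
Y = np.array([[7], [6], [5], [5], [3], [4]], dtype=float)

# Method 1: normal equation with an explicit inverse
B_normal = np.linalg.inv(X.T @ X) @ X.T @ Y

# Method 2: NumPy's least-squares solver, which returns four values
B_lstsq, residual_ss, rank, singular_values = np.linalg.lstsq(X, Y, rcond=None)

# Compare within floating-point tolerance rather than with ==
match = np.allclose(B_normal, B_lstsq)
print("Methods agree:", match)

# A rank equal to the number of columns means X^T X is invertible
print("Rank of X:", rank)
```

The rank returned by `lstsq` is a useful side check: the normal-equation formula relies on X having full column rank.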

## 4. Conclusion

This tutorial showed how to conduct linear regression from scratch in Python using NumPy. No dedicated regression function is needed: transposition, matrix multiplication, and matrix inversion are enough to estimate the coefficients. We also used `numpy.linalg.lstsq` to verify the matrix method, and the result confirmed that our calculation was correct.