This tutorial shows how to calculate the predicted Y (also called the estimated Y) from a linear regression in Python.
Steps of Calculating Predicted Y in a Linear Model in Python
Step 1: Prepare data, X and Y
# Import numpy
import numpy as np
# Create a numpy array of data:
X = np.array([5, 2, 3, 4, 10, 11, 14]).reshape(-1, 1)
Y = np.array([3, 1, 2, 5, 14, 15, 16]).reshape(-1, 1)
# Print out X and Y
print('X:\n', X)
print('Y:\n', Y)
Output:
X:
 [[ 5]
 [ 2]
 [ 3]
 [ 4]
 [10]
 [11]
 [14]]
Y:
 [[ 3]
 [ 1]
 [ 2]
 [ 5]
 [14]
 [15]
 [16]]
Step 2: Apply LinearRegression() from sklearn
We can then apply LinearRegression() from sklearn to estimate the linear model. Specifically, it estimates the intercept and the slope of the linear model.
# Import LinearRegression from sklearn
from sklearn.linear_model import LinearRegression
# Create a LinearRegression model object and give it a short name
lm = LinearRegression()
# Fit the model to X and Y and save the fitted model to 'result'
result = lm.fit(X, Y)
# Print out intercept:
print('Intercept=', result.intercept_)
# Print out slope:
print('Slope=', result.coef_)
Output:
Intercept= [-1.84375]
Slope= [[1.40625]]
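To see what fit() is estimating, the intercept and slope can also be reproduced by hand. The following is a minimal sketch that assumes a single predictor and uses the closed-form least-squares formulas (slope = sum of cross-deviations divided by sum of squared X deviations, intercept = mean of Y minus slope times mean of X). The names x_mean, y_mean, slope and intercept are illustrative and do not come from sklearn.

```python
# Import numpy
import numpy as np

# Same data as in Step 1, kept one-dimensional for the hand calculation
X = np.array([5, 2, 3, 4, 10, 11, 14], dtype=float)
Y = np.array([3, 1, 2, 5, 14, 15, 16], dtype=float)

# Means of X and Y
x_mean = X.mean()
y_mean = Y.mean()

# Closed-form least-squares estimates for one predictor
slope = np.sum((X - x_mean) * (Y - y_mean)) / np.sum((X - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# Print out intercept and slope
print('Intercept=', intercept)
print('Slope=', slope)
```

The two printed values should agree with the intercept and slope reported by LinearRegression(), up to floating-point rounding.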
Step 3: Calculate the predicted Y (or, estimated Y)
We can use predict() on the fitted model to calculate the predicted Y. The following is the code.
# Calculate predicted Y
predicted_Y = result.predict(X)
# Print out predicted Y
print('Predicted Y:', predicted_Y, sep='\n')
Output:
Predicted Y:
[[ 5.1875 ]
 [ 0.96875]
 [ 2.375  ]
 [ 3.78125]
 [12.21875]
 [13.625  ]
 [17.84375]]
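As a check on what predict() does, the predicted Y can be rebuilt directly from the fitted intercept and slope. This is a sketch, not part of the original steps: manual_Y, same, new_X and new_Y are names chosen for illustration, and the new X value of 6 is a hypothetical input.

```python
# Import numpy and LinearRegression
import numpy as np
from sklearn.linear_model import LinearRegression

# Same data as in Step 1
X = np.array([5, 2, 3, 4, 10, 11, 14]).reshape(-1, 1)
Y = np.array([3, 1, 2, 5, 14, 15, 16]).reshape(-1, 1)

# Fit the linear model
result = LinearRegression().fit(X, Y)

# Rebuild the predictions by hand: intercept + slope * X for each row
manual_Y = result.intercept_ + X @ result.coef_.T

# Compare the hand calculation with predict()
same = np.allclose(manual_Y, result.predict(X))
print('Manual calculation matches predict():', same)

# Predict Y for a new, hypothetical X value (must be 2-D, like the training X)
new_X = np.array([[6]])
new_Y = result.predict(new_X)
print('Predicted Y for X=6:', new_Y)
```

Note that predict() expects a two-dimensional array with one column per predictor, which is why the new value is written as [[6]].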
Step 4: Combine observed data and predicted Y
We can combine the observed X and Y and the predicted Y into a single DataFrame. This step is optional.
# import pandas
import pandas as pd
# combine observed X and Y and predicted Y into the same dataframe (optional step)
df = pd.DataFrame({'X': X.ravel(), 'Y': Y.ravel(), 'predicted_Y': predicted_Y.ravel()})
# print out the dataframe
print(df)
Output:
    X   Y  predicted_Y
0   5   3      5.18750
1   2   1      0.96875
2   3   2      2.37500
3   4   5      3.78125
4  10  14     12.21875
5  11  15     13.62500
6  14  16     17.84375
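Once observed and predicted values sit side by side, it is easy to see how far each prediction is from the observed Y. The sketch below adds that difference as an extra column. The column name 'residual' is an assumption made for this example, and the step is optional.

```python
# Import numpy, pandas and LinearRegression
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same data as in Step 1
X = np.array([5, 2, 3, 4, 10, 11, 14]).reshape(-1, 1)
Y = np.array([3, 1, 2, 5, 14, 15, 16]).reshape(-1, 1)

# Fit the model and calculate predicted Y
predicted_Y = LinearRegression().fit(X, Y).predict(X)

# Combine observed X and Y and predicted Y into one dataframe
df = pd.DataFrame({'X': X.ravel(), 'Y': Y.ravel(), 'predicted_Y': predicted_Y.ravel()})

# Add the difference between observed Y and predicted Y
df['residual'] = df['Y'] - df['predicted_Y']

# Print out the dataframe
print(df)
```

For a least-squares fit with an intercept, these differences should sum to zero up to floating-point rounding, which gives a quick sanity check on the predictions.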