This tutorial shows how to use sklearn for logistic regression in Python.
Logistic regression models the relationship between a binary outcome Y and one or more predictors X. It is also called logit regression. The basic syntax is as follows.
LogisticRegression().fit(x, y.values.ravel())
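The `.values.ravel()` part of the syntax flattens a single-column DataFrame into a 1-D array, which is the shape sklearn expects for the target; passing the 2-D column directly typically triggers a conversion warning. A minimal sketch of the shapes involved, using a hypothetical three-row target that is not part of the tutorial data:

```python
import pandas as pd

# Hypothetical three-row target, used only to show the shapes involved
y = pd.DataFrame({'Buy_or_not': [1, 0, 1]})

# .values gives a 2-D NumPy array with one column
print(y.values.shape)   # (3, 1)

# .ravel() flattens it to the 1-D array that fit() expects for y
y_flat = y.values.ravel()
print(y_flat.shape)     # (3,)
```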
Step 1: Data Sample
Suppose we would like to predict how age and household income impact whether consumers buy a certain brand (1 = bought it before vs. 0 = never bought it).
We can ask 10 people about their age and household income (7 = much more than the average, 4 = the average, 1 = much lower than the average), as well as whether they have bought this brand.
The following is the hypothetical data.
Buy or Not | Household Income | Age |
---|---|---|
1 | 7 | 26 |
1 | 6 | 23 |
0 | 5 | 29 |
1 | 5 | 28 |
0 | 3 | 50 |
0 | 4 | 60 |
1 | 2 | 45 |
1 | 2 | 19 |
0 | 2 | 36 |
0 | 0 | 45 |
The following Python code reproduces the data shown above.
import pandas as pd
Buy_or_not=(1,1,0,1,0,0,1,1,0,0)
HouseholdIncome=(7,6,5,5,3,4,2,2,2,0)
Age=(26,23,29,28,50,60,45,19,36,45)
x_df = pd.DataFrame(
{'HouseholdIncome':HouseholdIncome,
'Age': Age})
print(x_df)
y_df = pd.DataFrame(
{'Buy_or_not': Buy_or_not})
print(y_df)
Output:
   HouseholdIncome  Age
0                7   26
1                6   23
2                5   29
3                5   28
4                3   50
5                4   60
6                2   45
7                2   19
8                2   36
9                0   45
   Buy_or_not
0           1
1           1
2           0
3           1
4           0
5           0
6           1
7           1
8           0
9           0
Step 2: Use sklearn for Logistic Regression In Python
With the data sample in place, we can use sklearn to fit a logistic regression in Python. The following is the Python code.
import pandas as pd
from sklearn.linear_model import LogisticRegression
# Use sklearn for Logistic Regression
model = LogisticRegression().fit(x_df, y_df.values.ravel())
# print the intercept
print(model.intercept_)
# print the regression coefficients
print(model.coef_)
The following is the output. The first line is the intercept, and the second line shows the regression coefficients for household income and age, in that order.
[3.95803785]
[[ 0.11845729 -0.12390002]]
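Once the model is fitted, it can also score new observations. The following is a minimal sketch that rebuilds the sample data, refits the model, and predicts for one new consumer. The new consumer (household income 5, age 30) is a hypothetical value chosen only for illustration.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Rebuild the sample data from Step 1
x_df = pd.DataFrame({
    'HouseholdIncome': (7, 6, 5, 5, 3, 4, 2, 2, 2, 0),
    'Age': (26, 23, 29, 28, 50, 60, 45, 19, 36, 45),
})
y_df = pd.DataFrame({'Buy_or_not': (1, 1, 0, 1, 0, 0, 1, 1, 0, 0)})

model = LogisticRegression().fit(x_df, y_df.values.ravel())

# Hypothetical new consumer (values chosen only for illustration)
new_consumer = pd.DataFrame({'HouseholdIncome': [5], 'Age': [30]})

# predict_proba returns one row per sample: [P(class 0), P(class 1)]
proba = model.predict_proba(new_consumer)
print(proba)

# predict returns the predicted class label (0 or 1)
label = model.predict(new_consumer)
print(label)
```

The column order of `predict_proba` follows `model.classes_`, so here the second column is the estimated probability of having bought the brand.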
We can also write the fitted model as a logistic regression function, shown below.
\[\log\frac{p(y=1)}{1-p(y=1)}=\beta_0+\beta_1x_1+\beta_2x_2\]
\[\log\frac{p(\text{bought it before})}{1-p(\text{bought it before})}=3.96+0.12\,\text{Household Income}-0.12\,\text{Age}\]
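To see how the equation turns into a probability, the log-odds on the right-hand side can be passed through the inverse logit, p = 1 / (1 + exp(-log-odds)). The following is a small sketch using the rounded estimates from the equation; the helper name `predicted_probability` and the example consumer (household income 5, age 30) are assumptions made for illustration.

```python
import math

# Rounded estimates taken from the fitted equation
intercept = 3.96
b_income = 0.12
b_age = -0.12

def predicted_probability(household_income, age):
    """Convert the log-odds from the fitted equation into a probability."""
    log_odds = intercept + b_income * household_income + b_age * age
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical consumer: log-odds = 3.96 + 0.60 - 3.60 = 0.96
p = predicted_probability(5, 30)
print(round(p, 2))  # 0.72

# exp(coefficient) is the odds ratio for a one-unit increase in that predictor
odds_ratio_income = math.exp(b_income)
odds_ratio_age = math.exp(b_age)
print(round(odds_ratio_income, 2), round(odds_ratio_age, 2))  # 1.13 0.89
```

Read this way, each one-point increase in household income multiplies the odds of having bought the brand by about 1.13, and each additional year of age multiplies them by about 0.89, holding the other predictor fixed.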