Use sklearn for Logistic Regression In Python

This tutorial shows how to use sklearn for logistic regression in Python.

Logistic regression models the relationship between a binary outcome Y and one or more predictors X. It is also called logit regression. The basic sklearn syntax is as follows.

LogisticRegression().fit(x, y.values.ravel())
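The `.values.ravel()` part of the syntax flattens a one-column DataFrame into a 1-D array, which is the shape scikit-learn expects for the target. The short sketch below illustrates only that reshaping step; the DataFrame `y` and its values are made up for illustration and are not the tutorial's data.

```python
import pandas as pd

# Hypothetical one-column target stored as a DataFrame (illustration only)
y = pd.DataFrame({"y": [1, 0, 1, 0]})

# .values gives a 2-D NumPy array with one column
print(y.values.shape)          # (4, 1)

# .ravel() flattens it into the 1-D array that fit() expects for the target
print(y.values.ravel().shape)  # (4,)
```

If the target is already a pandas Series or a 1-D array, this step should not be needed.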

Step 1: Data Sample

Suppose we would like to predict how age and household income impact whether consumers buy a certain brand (1 = bought it before vs. 0 = never bought it).

We can ask 10 people about their age and household income (7 = much higher than the average, 4 = the average, 1 = much lower than the average), as well as whether they have bought this brand.

The following is the hypothetical data.

Buy or Not  Household Income  Age
1           7                 26
1           6                 23
0           5                 29
1           5                 28
0           3                 50
0           4                 60
1           2                 45
1           2                 19
0           2                 36
0           0                 45

Logistic Regression Data

The following is the Python code to reproduce the data shown above.

import pandas as pd

Buy_or_not=(1,1,0,1,0,0,1,1,0,0)
HouseholdIncome=(7,6,5,5,3,4,2,2,2,0)
Age=(26,23,29,28,50,60,45,19,36,45)


x_df = pd.DataFrame(
    {'HouseholdIncome':HouseholdIncome,
        'Age': Age})
print(x_df)
y_df = pd.DataFrame(
    {'Buy_or_not': Buy_or_not})
print(y_df)

Output:

   HouseholdIncome  Age
0                7   26
1                6   23
2                5   29
3                5   28
4                3   50
5                4   60
6                2   45
7                2   19
8                2   36
9                0   45
   Buy_or_not
0           1
1           1
2           0
3           1
4           0
5           0
6           1
7           1
8           0
9           0

Step 2: Use sklearn for Logistic Regression In Python

With the data sample ready, we can use sklearn to fit a logistic regression in Python. The following code reuses x_df and y_df from Step 1.

import pandas as pd
from sklearn.linear_model import LogisticRegression

# Use sklearn for Logistic Regression
model = LogisticRegression().fit(x_df, y_df.values.ravel())
# print the intercept
print(model.intercept_)
# print the regression coefficients
print(model.coef_)

The following is the output. The first line is the intercept, and the second line holds the regression coefficients for HouseholdIncome and Age, in that order.

[3.95803785]
[[ 0.11845729 -0.12390002]]
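Once the model is fitted, it can also score consumers who were not in the sample. The sketch below rebuilds the data from Step 1 so that it runs on its own, then calls `predict` and `predict_proba` on one new row. The new consumer (income rating 6, age 30) is a hypothetical example, not part of the original data.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Same data as in Step 1
x_df = pd.DataFrame({
    "HouseholdIncome": (7, 6, 5, 5, 3, 4, 2, 2, 2, 0),
    "Age": (26, 23, 29, 28, 50, 60, 45, 19, 36, 45),
})
y_df = pd.DataFrame({"Buy_or_not": (1, 1, 0, 1, 0, 0, 1, 1, 0, 0)})

model = LogisticRegression().fit(x_df, y_df.values.ravel())

# Hypothetical new consumer (illustration only): income rating 6, age 30
new_consumer = pd.DataFrame({"HouseholdIncome": [6], "Age": [30]})

# Predicted class label (0 = never bought, 1 = bought before)
predicted_class = model.predict(new_consumer)

# Predicted probabilities; columns follow the order in model.classes_
predicted_proba = model.predict_proba(new_consumer)

print(model.classes_)
print(predicted_class)
print(predicted_proba)
```

Keeping the column names and order of the new DataFrame identical to the training data avoids feature-name warnings and mismatched inputs.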

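We can also write the fitted model as a logistic regression equation, first in general form and then with the estimated intercept and coefficients (rounded to two decimals).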

\[\log\frac{p(y=1)}{1-p(y=1)}=\beta_0+\beta_1x_1+\beta_2x_2\]

\[\log\frac{p(\text{bought it before})}{1-p(\text{bought it before})}=3.96+0.12\,\text{Household Income}-0.12\,\text{Age}\]
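The equation gives log-odds, so turning it into a probability requires the inverse of the logit, p = 1 / (1 + exp(-log-odds)). The following sketch, which assumes the default `LogisticRegression()` settings used in this tutorial, computes the log-odds by hand from `intercept_` and `coef_` and compares the resulting probabilities with `predict_proba`. It also exponentiates the coefficients to read them as odds ratios. The variable names `log_odds`, `p_manual`, `p_sklearn`, and `odds_ratios` are introduced here for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Same data as in Step 1
x_df = pd.DataFrame({
    "HouseholdIncome": (7, 6, 5, 5, 3, 4, 2, 2, 2, 0),
    "Age": (26, 23, 29, 28, 50, 60, 45, 19, 36, 45),
})
y_df = pd.DataFrame({"Buy_or_not": (1, 1, 0, 1, 0, 0, 1, 1, 0, 0)})

model = LogisticRegression().fit(x_df, y_df.values.ravel())

# Log-odds for each row: beta_0 + beta_1 * x_1 + beta_2 * x_2
log_odds = model.intercept_[0] + x_df.to_numpy() @ model.coef_[0]

# Inverse logit converts log-odds into p(y = 1)
p_manual = 1.0 / (1.0 + np.exp(-log_odds))

# Probability of class 1 as reported by sklearn
p_sklearn = model.predict_proba(x_df)[:, 1]
print(np.allclose(p_manual, p_sklearn))

# exp(coefficient): multiplicative change in the odds for a one-unit increase
odds_ratios = np.exp(model.coef_[0])
print(dict(zip(x_df.columns, odds_ratios)))
```

An odds ratio above 1 means the odds of having bought the brand rise as that predictor increases, and a value below 1 means they fall.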

