The function of Poission() from statsmodels can be used to do Poisson regression in Python. The key Python code is as follows.

import statsmodels.api as sm sm.GLM(Y, X, family=sm.families.Poisson()).fit()

This tutorial uses a dataset posted by Paul Roback and Julie Legler. Two variables are the focus, namely **age** and **total**. The following are detailed explanations for these two variables.

= the age of the head of household**age**

= the number of people in the household other than the head**total**

We are going to see if ** age** can be used to predict the number of people in a household (i.e.,

**).**

*total*```
import pandas as pd
# read the csv file from Github
df=pd.read_csv("https://raw.githubusercontent.com/proback/BeyondMLR/master/data/fHH1.csv",index_col=0)
# print out the first 6 row of the data frame
print(df.head(6))
```

The following shows sample rows of the data that we are going to use in this tutorial.

location age total numLT5 roof 1 CentralLuzon 65 0 0 Predominantly Strong Material 2 MetroManila 75 3 0 Predominantly Strong Material 3 DavaoRegion 54 4 0 Predominantly Strong Material 4 Visayas 49 3 0 Predominantly Strong Material 5 MetroManila 74 3 0 Predominantly Strong Material 6 Visayas 59 6 0 Predominantly Strong Material

## Step 1: Prepare data

The following Python code prepares X (**age** with constant) and Y (**total**, renamed with to household size).

```
# add constant to the column of age and save it as a new column
age = sm.add_constant(data_HH['age'])
# save column 'column' as a new column, and renamed as household_size
household_size=df['total']
# print out the first 6 rows for both age and household_size
print(age.head(6))
print(household_size.head(6))
```

The following is the output. The first part is **age with constant**, and the second part is **household-size**.

# age with constant 1s const age 0 1.0 65 1 1.0 75 2 1.0 54 3 1.0 49 4 1.0 74 5 1.0 59 # household size 1 0 2 3 3 4 4 3 5 3 6 6 Name: total, dtype: int64

## Step 2: Use statsmodels

The following uses statsmodels to do the Poisson regression and then print out the result.

```
import statsmodels.api as sm
# use GLM() function to do the Poisson regression
result_Poisson=sm.GLM(household_size, age, family=sm.families.Poisson()).fit()
# print out the result
print(result_Poisson.summary())
```

The following is the output of Poisson regression in Python.

Generalized Linear Model Regression Results ============================================================================== Dep. Variable: total No. Observations: 1500 Model: GLM Df Residuals: 1498 Model Family: Poisson Df Model: 1 Link Function: log Scale: 1.0000 Method: IRLS Log-Likelihood: -3355.0 Date: Tue, 24 Jan 2023 Deviance: 2337.1 Time: 23:57:22 Pearson chi2: 2.27e+03 No. Iterations: 4 Covariance Type: nonrobust ============================================================================== coef std err z P>|z| [0.025 0.975] ------------------------------------------------------------------------------ const 1.5499 0.050 30.829 0.000 1.451 1.648 age -0.0047 0.001 -5.026 0.000 -0.007 -0.003 ==============================================================================

## Step 3: Interpret the output

Based on the output above, we can write out the Poisson regression statement.

\[ log (\hat{\lambda}) =b_0+b_1 Age =1.55 -0.0047 Age \]

We can get the following.

\[ \frac{\lambda_{Age+1}}{\lambda_{Age}} =e^{-0.0047}=0.995 \]

We can further get the following.

\[ \lambda_{Age+1} – \lambda_{Age}=0.995 \lambda_{Age}- \lambda_{Age}=-0.005 \lambda_{Age}\]

If age increases by 1 unit, for instance, from 80 to 81, the change in household size is -0.005*80 =-0.4.