You can set family = poisson in the glm() function to do Poisson regression in R.
glm(model_statement, family = poisson, data = data_file_name)
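To see the syntax above in action without any external file, here is a minimal sketch on simulated data. The names sim_data, sim_fit, x, and y, and the true coefficients 1.5 and -0.005, are assumptions made up for illustration and are not part of the tutorial's dataset.

```r
# Simulated data: every name and value here is an illustrative assumption
set.seed(1)
n <- 1000
sim_data <- data.frame(x = runif(n, min = 20, max = 90))

# True model used to generate counts: log(lambda) = 1.5 - 0.005 * x
sim_data$y <- rpois(n, lambda = exp(1.5 - 0.005 * sim_data$x))

# Fit the Poisson regression; family = poisson uses the log link by default
sim_fit <- glm(y ~ x, family = poisson, data = sim_data)

# The estimates should land close to the true values 1.5 and -0.005
coef(sim_fit)
```

Because the counts are generated from a known model, comparing coef(sim_fit) with the true values is a quick sanity check that the call is specified correctly.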
Data Example
This tutorial will use a dataset for Poisson regression. The following shows the key variables in this dataset.
location = where the house is located
age = the age of the head of household
total = the number of people in the household other than the head
numLT5 = the number in the household under 5 years of age
roof = the type of roof in the household
We are going to see if age can predict the number of people in a household (i.e., total).
We can first read the data from GitHub.
data_HH <- read.csv("https://raw.githubusercontent.com/proback/BeyondMLR/master/data/fHH1.csv")
The following prints the first few rows of the data frame that we read from GitHub.
> head(data_HH)
  X     location age total numLT5                          roof
1 1 CentralLuzon  65     0      0 Predominantly Strong Material
2 2  MetroManila  75     3      0 Predominantly Strong Material
3 3  DavaoRegion  54     4      0 Predominantly Strong Material
4 4      Visayas  49     3      0 Predominantly Strong Material
5 5  MetroManila  74     3      0 Predominantly Strong Material
6 6      Visayas  59     6      0 Predominantly Strong Material
R Code
The following is the key R code to do the Poisson regression.
result_1 = glm(total ~ age, family = poisson, data = data_HH)
Since the p-value for age is well below 0.05 (p = 5.01e-07) in the output below, age is a statistically significant predictor of household size.
> summary(result_1)

Call:
glm(formula = total ~ age, family = poisson, data = data_HH)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.9079  -0.9637  -0.2155   0.6092   4.9561

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.5499422  0.0502754  30.829  < 2e-16 ***
age         -0.0047059  0.0009363  -5.026 5.01e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 2362.5  on 1499  degrees of freedom
Residual deviance: 2337.1  on 1498  degrees of freedom
AIC: 6714

Number of Fisher Scoring iterations: 5
Interpretation
Based on the output above, we can write the following Poisson regression equation. \( \hat{\lambda} \) is the mean of the household size.
\[ \log (\hat{\lambda}) = b_0 + b_1 Age = 1.55 - 0.0047 Age \]
We can do a simple math transformation and get the following.
\[ \frac{\lambda_{Age+1}}{\lambda_{Age}} = e^{b_1} = e^{-0.0047} = 0.995 \]
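The ratio above can be checked numerically. The sketch below uses the slope copied from the summary output; the names b1, rate_ratio, and percent_change are illustrative, not from the original tutorial.

```r
# Slope for age, copied from the summary() output (b1 is an illustrative name)
b1 <- -0.0047059

# Multiplicative change in the expected count for a one-year increase in age
rate_ratio <- exp(b1)

# The same quantity expressed as a percent change in the expected count
percent_change <- (rate_ratio - 1) * 100

round(rate_ratio, 4)
round(percent_change, 2)
```

If the fitted model object is still in the session, exp(coef(result_1)) should give the same ratio for age without retyping the coefficient.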
We can further make some transformations and get the following.
\[ \lambda_{Age+1} =0.995 \lambda_{Age} \]
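\[ \lambda_{Age+1} - \lambda_{Age} = 0.995 \lambda_{Age} - \lambda_{Age} = -0.005 \lambda_{Age} \]
Thus, a one-year increase in age changes the expected household size by \( -0.005 \lambda_{Age} \), where \( \lambda_{Age} \) is the expected household size at the starting age (not the age itself).
Let’s use the change from 80 to 81 years old as an example. The expected household size at age 80 is \( \lambda_{80} = e^{1.55 - 0.0047 \times 80} \approx 3.23 \). The equation above therefore means that, on average, the change in household size when going from 80 to 81 years old is about -0.005 * 3.23 = -0.016. Note that the age refers to the head of household, and household size counts the people other than the head.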
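As a numeric check on the 80-to-81 example, the sketch below evaluates the fitted equation at both ages using the unrounded coefficients from the summary output. The helper expected_size() and the variable names are hypothetical, introduced only for this illustration.

```r
# Coefficients copied from the summary() output (names are illustrative)
b0 <- 1.5499422
b1 <- -0.0047059

# Expected household size (excluding the head) at a given age of the head
expected_size <- function(age) exp(b0 + b1 * age)

lambda_80 <- expected_size(80)
lambda_81 <- expected_size(81)

# Absolute change in expected household size from age 80 to age 81
change_80_to_81 <- lambda_81 - lambda_80

round(c(lambda_80 = lambda_80, lambda_81 = lambda_81,
        change = change_80_to_81), 4)
```

With these coefficients the expected size at age 80 is about 3.23 people and the one-year change is about -0.015. With the fitted model in the session, predict(result_1, newdata = data.frame(age = c(80, 81)), type = "response") should return the same two means.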
Further Reading
This tutorial shows how to do Poisson regression in R. If you are interested in learning how to do it in Python, you can check my other tutorial on Poisson Regression in Python.