Poisson Regression in R

To run a Poisson regression in R, set family = poisson in the glm() function. The data argument is a data frame, not a file name:

glm(formula, family = poisson, data = data_frame_name)
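As a minimal sketch of the template, the following R code simulates counts from a known Poisson model and checks that glm() recovers the coefficients. The variable names (x, y, sim_data, fit_sim) and the true values 1.5 and -0.005 are hypothetical and chosen only for illustration. They are not part of the household data used below.

```r
# Sketch: fit a Poisson regression to simulated counts.
# The data are made up for illustration, not taken from the household dataset.
set.seed(42)                                   # reproducible simulation

n <- 1000
x <- runif(n, min = 20, max = 80)              # hypothetical predictor
true_b0 <- 1.5                                 # hypothetical true intercept
true_b1 <- -0.005                              # hypothetical true slope

# Counts generated with a log link: log(lambda) = b0 + b1 * x
y <- rpois(n, lambda = exp(true_b0 + true_b1 * x))
sim_data <- data.frame(x = x, y = y)

# family = poisson uses the log link by default
fit_sim <- glm(y ~ x, family = poisson, data = sim_data)

# Estimates should land close to the true values (1.5 and -0.005)
print(coef(fit_sim))
```

Because family = poisson uses the log link by default, the coefficients are on the log scale, which is why the interpretation section exponentiates them.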

Data Example

This tutorial uses a dataset of households in the Philippines from the BeyondMLR repository on GitHub. The key variables are:

  • location = where the house is located
  • age = the age of the head of household
  • total = the number of people in the household other than the head
  • numLT5 = the number in the household under 5 years of age
  • roof = the type of roof in the household

We will test whether the age of the head of household (age) predicts the number of people in the household other than the head (total). Below, total is referred to as household size.

First, read the data from GitHub.

data_HH <- read.csv("https://raw.githubusercontent.com/proback/BeyondMLR/master/data/fHH1.csv")

The following prints the first six rows of the data frame read from GitHub.

> head(data_HH)
  X     location age total numLT5                          roof
1 1 CentralLuzon  65     0      0 Predominantly Strong Material
2 2  MetroManila  75     3      0 Predominantly Strong Material
3 3  DavaoRegion  54     4      0 Predominantly Strong Material
4 4      Visayas  49     3      0 Predominantly Strong Material
5 5  MetroManila  74     3      0 Predominantly Strong Material
6 6      Visayas  59     6      0 Predominantly Strong Material
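Before fitting, it can help to confirm that total looks like count data. This sketch re-reads the same CSV, prints its structure, checks that total holds non-negative whole numbers, and compares the mean and variance of total. A Poisson model assumes the mean equals the variance, so a variance much larger than the mean would hint at overdispersion. These checks are a suggestion and not part of the original analysis.

```r
# Sketch: sanity checks on the response before fitting a Poisson model.
data_HH <- read.csv("https://raw.githubusercontent.com/proback/BeyondMLR/master/data/fHH1.csv")

str(data_HH)  # column names and types

# Poisson regression expects non-negative whole-number counts
is_count <- all(data_HH$total >= 0 & data_HH$total == round(data_HH$total),
                na.rm = TRUE)
print(is_count)

# Poisson assumes mean == variance; a much larger variance hints at overdispersion
print(c(mean = mean(data_HH$total, na.rm = TRUE),
        var  = var(data_HH$total, na.rm = TRUE)))

# Frequency table of the count response
print(table(data_HH$total))
```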

R Code

The following R code fits the Poisson regression.

result_1 <- glm(total ~ age, family = poisson, data = data_HH)
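To get the numbers used in the interpretation without reading them off the printed summary, standard stats accessors can extract them. This is a sketch that re-reads the data and refits the same model. confint.default() gives Wald intervals (estimate plus or minus about 1.96 standard errors), which may differ slightly from the profile-likelihood intervals that confint() returns for glm objects.

```r
# Sketch: extract coefficients, rate ratios, intervals, and the p-value.
data_HH <- read.csv("https://raw.githubusercontent.com/proback/BeyondMLR/master/data/fHH1.csv")
result_1 <- glm(total ~ age, family = poisson, data = data_HH)

b <- coef(result_1)   # b0 and b1 on the log scale
print(b)

# exp(b1): multiplicative change in the mean per one-year increase in age
print(exp(b))

# Wald 95% intervals on the log scale, exponentiated to the ratio scale
print(exp(confint.default(result_1)))

# p-value for age taken from the coefficient table
p_age <- coef(summary(result_1))["age", "Pr(>|z|)"]
print(p_age)
```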

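Based on the output below, the p-value for age is 5.01e-07, well below 0.05, so age is a significant predictor of household size. The estimate for age is negative (-0.0047), so households tend to be smaller when the head of household is older.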

> summary(result_1)

Call:
glm(formula = total ~ age, family = poisson, data = data_HH)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.9079  -0.9637  -0.2155   0.6092   4.9561  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.5499422  0.0502754  30.829  < 2e-16 ***
age         -0.0047059  0.0009363  -5.026 5.01e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 2362.5  on 1499  degrees of freedom
Residual deviance: 2337.1  on 1498  degrees of freedom
AIC: 6714

Number of Fisher Scoring iterations: 5
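The drop in deviance from the null model to the model with age is a likelihood-ratio chi-square statistic, which gives a second check on the significance of age. The sketch below plugs in the two deviances printed above. With the fitted object available, anova(result_1, test = "Chisq") should report the same test.

```r
# Sketch: drop-in-deviance (likelihood-ratio) test for age,
# using the deviances printed in the summary output.
null_dev  <- 2362.5   # Null deviance on 1499 df
resid_dev <- 2337.1   # Residual deviance on 1498 df

lr_stat <- null_dev - resid_dev   # about 25.4
lr_df   <- 1499 - 1498            # one extra parameter (age)

# Upper-tail chi-square probability
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(statistic = lr_stat, df = lr_df, p = p_value))
```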

Interpretation

Based on the output above, we can write the following Poisson regression equation, where \( \hat{\lambda} \) is the predicted mean household size (the number of people other than the head) for a given age.

\[ \log(\hat{\lambda}) = b_0 + b_1 Age = 1.55 - 0.0047 Age \]
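As a worked example, plugging in an age of 40 gives the predicted mean household size for a 40-year-old head of household. The rounded coefficients are used, so the result is approximate.

\[ \log(\hat{\lambda}_{40}) = 1.55 - 0.0047 \times 40 = 1.362 \]

\[ \hat{\lambda}_{40} = e^{1.362} \approx 3.90 \]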

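Exponentiating both sides and dividing the predicted mean at Age + 1 by the predicted mean at Age cancels \( b_0 \) and gives the following.

\[ \frac{\lambda_{Age+1}}{\lambda_{Age}} = \frac{e^{b_0+b_1 (Age+1)}}{e^{b_0+b_1 Age}} = e^{b_1} = e^{-0.0047} = 0.995 \]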
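The ratio does not depend on the starting age, which is what makes \( e^{b_1} \) a convenient summary. The sketch below checks this numerically with the estimates copied from the summary output. lambda_hat() is a hypothetical helper name, not a function from R or the original tutorial.

```r
# Sketch: check that lambda(Age + 1) / lambda(Age) equals exp(b1) at any age.
b0 <- 1.5499422   # intercept from summary(result_1)
b1 <- -0.0047059  # age coefficient from summary(result_1)

# Hypothetical helper: fitted mean household size at a given age
lambda_hat <- function(age) exp(b0 + b1 * age)

ages  <- c(30, 50, 80)
ratio <- lambda_hat(ages + 1) / lambda_hat(ages)

print(ratio)    # the same value at every age
print(exp(b1))  # equals e^{b1}, about 0.995
```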

Rearranging gives the following.

\[ \lambda_{Age+1} =0.995 \lambda_{Age} \]

\[ \lambda_{Age+1} - \lambda_{Age} = 0.995 \lambda_{Age} - \lambda_{Age} = -0.005 \lambda_{Age} \]

Thus, a one-year increase in age changes the expected household size by \( -0.005 \lambda_{Age} \), a decrease of about 0.5% of the predicted mean at that age.

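Let’s use the change from an 80-year-old to an 81-year-old head of household as an example. The multiplier in the equation above is \( \lambda_{Age} \), the predicted mean household size at that age, not the age itself. At age 80, \( \lambda_{80} = e^{1.55 - 0.0047 \times 80} \approx 3.23 \), so moving from 80 to 81 changes the expected household size by about \( -0.005 \times 3.23 \approx -0.016 \) people on average.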
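The same calculation can be reproduced in R with the unrounded estimates from the summary output. This sketch compares the exact change, \( \lambda_{81} - \lambda_{80} \), with the approximation \( -0.005 \lambda_{80} \) that uses the rounded ratio 0.995. lambda_hat() is a hypothetical helper name, not part of R.

```r
# Sketch: predicted change in household size from age 80 to age 81.
b0 <- 1.5499422   # intercept from summary(result_1)
b1 <- -0.0047059  # age coefficient from summary(result_1)

# Hypothetical helper: fitted mean household size at a given age
lambda_hat <- function(age) exp(b0 + b1 * age)

lambda_80     <- lambda_hat(80)               # about 3.23
exact_change  <- lambda_hat(81) - lambda_80   # exact difference
approx_change <- -0.005 * lambda_80           # using the rounded ratio 0.995

print(c(lambda_80 = lambda_80, exact = exact_change, approx = approx_change))
```

The exact and approximate values differ only in the third decimal place because 0.995 is a rounded version of \( e^{b_1} \).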


Further Reading

This tutorial showed how to do Poisson regression in R. If you are interested in doing it in Python, see my other tutorial, Poisson Regression in Python.