Python: Type I, Type II, and Type III ANOVA

1. Introduction

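Type I, Type II, and Type III ANOVA are three different ways of calculating sums of squares in ANOVA. The lists below assume a model with two factors, A and B, and their interaction AB. In this notation, SS(B | A) is the sum of squares for B after A is already in the model.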

Type I ANOVA:

  • SS(A) for factor A
  • SS(B | A) for factor B
  • SS(AB | A, B) for interaction AB

Type II ANOVA:

  • SS(A | B) for factor A
  • SS(B | A) for factor B
  • SS(AB | A, B) for interaction AB

Type III ANOVA:

  • SS(A | B, AB) for factor A
  • SS(B | A, AB) for factor B
  • SS(AB | A, B) for interaction AB

2. Hypothetical Data

The hypothetical data has two categorical independent variables (IVs), cities and stores, and one dependent variable (DV), sales. Cities has two levels (City1 and City2), and stores also has two levels (store1 and store2), so there are 4 cells. To test whether mean sales differ by city, by store, or by the combination of the two, we can run a two-way ANOVA.

[Figure: Two-way ANOVA in Python]
Cities  Stores  Sales
City1   store1     10
City1   store2     20
City1   store1     20
City1   store2     50
City1   store1     30
City2   store2     10
City2   store1      5
City2   store2      4
City2   store1     12
City2   store2      4

Hypothetical Data for two-way ANOVA

The following Python code reproduces the data.

import numpy as np
import pandas as pd

# generate the data used in the rest of the tutorial
x_1 = np.repeat(['City1', 'City2'], 5)   # 5 rows for City1, then 5 rows for City2
x_2 = np.tile(['store1', 'store2'], 5)   # alternate store1, store2 across the 10 rows
sales = np.array([10, 20, 20, 50, 30, 10, 5, 4, 12, 4])
df_x = pd.DataFrame({'cities': x_1, 'stores': x_2, 'sales': sales})
print(df_x)

Output:

  cities  stores  sales
0  City1  store1     10
1  City1  store2     20
2  City1  store1     20
3  City1  store2     50
4  City1  store1     30
5  City2  store2     10
6  City2  store1      5
7  City2  store2      4
8  City2  store1     12
9  City2  store2      4
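
The three types of sums of squares differ from each other only when the design is unbalanced, so it helps to check how many observations fall in each cell. The following is a minimal sketch that rebuilds the same data and summarizes each city-by-store cell; the name cell_summary is introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Rebuild the hypothetical data so this snippet runs on its own
df_x = pd.DataFrame({
    'cities': np.repeat(['City1', 'City2'], 5),
    'stores': np.tile(['store1', 'store2'], 5),
    'sales': [10, 20, 20, 50, 30, 10, 5, 4, 12, 4],
})

# Number of observations and mean sales in each city-by-store cell
cell_summary = df_x.groupby(['cities', 'stores'])['sales'].agg(['count', 'mean'])
print(cell_summary)
```

The cell counts are 3, 2, 2, and 3 rather than equal, so the design is unbalanced.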

3. Type I ANOVA in Python

We use statsmodels for all the analyses in this tutorial. The following Python code calculates Type I ANOVA. Type I sums of squares are sequential, so they depend on the order in which the terms appear in the model formula.

import statsmodels.api as sm
from statsmodels.formula.api import ols
# the following model statement will be used for Type I, Type II, and Type III ANOVA
model = ols('sales ~ C(cities) + C(stores) + C(cities):C(stores)', data=df_x).fit()

# setting typ as Type I ANOVA in Python
aov_table = sm.stats.anova_lm(model, typ=1)
print(aov_table)

Output:

                      df  sum_sq     mean_sq         F    PR(>F)
C(cities)            1.0  902.50  902.500000  7.752326  0.031816
C(stores)            1.0   93.75   93.750000  0.805297  0.404083
C(cities):C(stores)  1.0  183.75  183.750000  1.578382  0.255694
Residual             6.0  698.50  116.416667       NaN       NaN
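
The Type I table can be checked against the definitions in the Introduction. The following sketch, which is not part of the original analysis, fits a sequence of nested models and takes the drop in residual sum of squares (the ssr attribute of a fitted OLS model) each time a term is added. The names m_0, m_a, m_ab, and m_full are introduced here for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Rebuild the hypothetical data so this snippet runs on its own
df_x = pd.DataFrame({
    'cities': np.repeat(['City1', 'City2'], 5),
    'stores': np.tile(['store1', 'store2'], 5),
    'sales': [10, 20, 20, 50, 30, 10, 5, 4, 12, 4],
})

# Nested models, adding one term at a time in the order used by the formula
m_0 = ols('sales ~ 1', data=df_x).fit()
m_a = ols('sales ~ C(cities)', data=df_x).fit()
m_ab = ols('sales ~ C(cities) + C(stores)', data=df_x).fit()
m_full = ols('sales ~ C(cities) + C(stores) + C(cities):C(stores)', data=df_x).fit()

# Each sequential sum of squares is the reduction in residual sum of squares
ss_cities = m_0.ssr - m_a.ssr        # SS(A)
ss_stores = m_a.ssr - m_ab.ssr       # SS(B | A)
ss_inter = m_ab.ssr - m_full.ssr     # SS(AB | A, B)
print(ss_cities, ss_stores, ss_inter)
```

Up to floating-point rounding, the three differences should match the sum_sq column of the Type I table, and m_full.ssr should match the Residual row.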

4. Type II ANOVA in Python

The following Python code calculates Type II ANOVA. The only difference in the code between Type I and Type II is the value of the typ argument.

# setting typ as Type II ANOVA in Python
aov_table = sm.stats.anova_lm(model, typ=2)
print(aov_table)

Output:

                     sum_sq   df         F    PR(>F)
C(cities)            984.15  1.0  8.453686  0.027068
C(stores)             93.75  1.0  0.805297  0.404083
C(cities):C(stores)  183.75  1.0  1.578382  0.255694
Residual             698.50  6.0       NaN       NaN

5. Type III ANOVA in Python

The following is the Python code to calculate Type III ANOVA.

# setting typ as Type III ANOVA in Python
aov_table = sm.stats.anova_lm(model, typ=3)
print(aov_table)

Output:

                      sum_sq   df          F    PR(>F)
Intercept            1200.00  1.0  10.307802  0.018354
C(cities)             158.70  1.0   1.363207  0.287277
C(stores)             270.00  1.0   2.319256  0.178611
C(cities):C(stores)   183.75  1.0   1.578382  0.255694
Residual              698.50  6.0        NaN       NaN

Further Reading