1. Introduction
Type I, Type II, and Type III ANOVA are three different ways of calculating sums of squares in ANOVA. They give the same results for balanced designs and can differ when cell sizes are unequal.
Type I ANOVA:
SS(A) for factor A
SS(B | A) for factor B
SS(AB | A, B) for interaction AB
Type II ANOVA:
SS(A | B) for factor A
SS(B | A) for factor B
SS(AB | A, B) for interaction AB
Type III ANOVA:
SS(A | B, AB) for factor A
SS(B | A, AB) for factor B
SS(AB | A, B) for interaction AB
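The conditional notation can be read as a drop in residual sum of squares between two nested models. As a sketch, writing RSS(·) for the residual sum of squares of the model that contains an intercept plus the listed terms (RSS is notation introduced here for illustration, not part of the list above), the Type I quantities are:

```latex
\begin{aligned}
\mathrm{SS}(A) &= \mathrm{RSS}(1) - \mathrm{RSS}(A) \\
\mathrm{SS}(B \mid A) &= \mathrm{RSS}(A) - \mathrm{RSS}(A, B) \\
\mathrm{SS}(AB \mid A, B) &= \mathrm{RSS}(A, B) - \mathrm{RSS}(A, B, AB)
\end{aligned}
```

Read this way, SS(B | A) is the extra variation explained by B once A is already in the model.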
2. Hypothetical Data
The hypothetical data has two categorical independent variables (IVs), cities and stores, and one dependent variable (DV), sales. Cities has two levels (City1 and City2) and stores also has two levels (store1 and store2), so the design has 4 cells. To test whether mean sales differ between cities, between stores, and whether the two factors interact, we can run a two-way ANOVA. The cells contain unequal numbers of observations (3, 2, 2, and 3), so the three types of sums of squares do not all agree.

Cities | Stores | Sales |
---|---|---|
City1 | store1 | 10 |
City1 | store2 | 20 |
City1 | store1 | 20 |
City1 | store2 | 50 |
City1 | store1 | 30 |
City2 | store2 | 10 |
City2 | store1 | 5 |
City2 | store2 | 4 |
City2 | store1 | 12 |
City2 | store2 | 4 |
The following Python code reproduces the data.
import numpy as np
import pandas as pd
# generate the data used in the rest of the tutorial
x_1 = np.repeat(['City1', 'City2'], 5)  # City1 five times, then City2 five times
x_2 = np.tile(['store1', 'store2'], 5)  # store1, store2 alternating
sales = np.array([10, 20, 20, 50, 30, 10, 5, 4, 12, 4])
df_x = pd.DataFrame({'cities': x_1, 'stores': x_2, 'sales': sales})
print(df_x)
Output:
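| | cities | stores | sales |
|---|---|---|---|
| 0 | City1 | store1 | 10 |
| 1 | City1 | store2 | 20 |
| 2 | City1 | store1 | 20 |
| 3 | City1 | store2 | 50 |
| 4 | City1 | store1 | 30 |
| 5 | City2 | store2 | 10 |
| 6 | City2 | store1 | 5 |
| 7 | City2 | store2 | 4 |
| 8 | City2 | store1 | 12 |
| 9 | City2 | store2 | 4 |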
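Before fitting anything, it can help to look at the four cells directly. The sketch below rebuilds the same data and summarizes each cell with a pandas groupby; the name `cell_summary` is only illustrative.

```python
import numpy as np
import pandas as pd

# same data as above, rebuilt so this snippet runs on its own
df_x = pd.DataFrame({
    'cities': np.repeat(['City1', 'City2'], 5),
    'stores': np.tile(['store1', 'store2'], 5),
    'sales': [10, 20, 20, 50, 30, 10, 5, 4, 12, 4],
})

# number of observations and mean sales in each city-by-store cell
cell_summary = df_x.groupby(['cities', 'stores'])['sales'].agg(['count', 'mean'])
print(cell_summary)
```

The counts are 3, 2, 2, and 3, so the design is unbalanced.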
3. Type I ANOVA in Python
We use statsmodels for all the analyses in this tutorial. The following Python code calculates Type I ANOVA. Type I sums of squares are sequential, so each term is adjusted only for the terms listed before it in the formula.
import statsmodels.api as sm
from statsmodels.formula.api import ols
# the following model statement will be used for Type I, Type II, and Type III ANOVA
model = ols('sales ~ C(cities) + C(stores) + C(cities):C(stores)', data=df_x).fit()
# setting typ as Type I ANOVA in Python
aov_table = sm.stats.anova_lm(model, typ=1)
print(aov_table)
Output:
| | df | sum_sq | mean_sq | F | PR(>F) |
|---|---|---|---|---|---|
| C(cities) | 1.0 | 902.50 | 902.500000 | 7.752326 | 0.031816 |
| C(stores) | 1.0 | 93.75 | 93.750000 | 0.805297 | 0.404083 |
| C(cities):C(stores) | 1.0 | 183.75 | 183.750000 | 1.578382 | 0.255694 |
| Residual | 6.0 | 698.50 | 116.416667 | NaN | NaN |
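The Type I sums of squares can also be obtained by hand, as differences in residual sum of squares between successively larger models. The following is a minimal sketch of that idea, not how statsmodels computes the table internally; the helper `rss` and the variable names are illustrative.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# same data as above, rebuilt so this snippet runs on its own
df_x = pd.DataFrame({
    'cities': np.repeat(['City1', 'City2'], 5),
    'stores': np.tile(['store1', 'store2'], 5),
    'sales': [10, 20, 20, 50, 30, 10, 5, 4, 12, 4],
})

def rss(formula):
    """Residual sum of squares of an OLS fit (hypothetical helper)."""
    return ols(formula, data=df_x).fit().ssr

# fit a sequence of nested models, adding one term at a time
rss_null = rss('sales ~ 1')
rss_a = rss('sales ~ C(cities)')
rss_a_b = rss('sales ~ C(cities) + C(stores)')
rss_full = rss('sales ~ C(cities) + C(stores) + C(cities):C(stores)')

ss_cities = rss_null - rss_a        # SS(A)
ss_stores = rss_a - rss_a_b         # SS(B | A)
ss_interaction = rss_a_b - rss_full # SS(AB | A, B)

print(round(ss_cities, 2), round(ss_stores, 2), round(ss_interaction, 2))
# → 902.5 93.75 183.75
```

These match the sum_sq column of the Type I table, and `rss_full` is the Residual row.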
4. Type II ANOVA in Python
The following Python code calculates Type II ANOVA. The only change from Type I is the value passed to typ.
# setting typ as Type II ANOVA in Python
aov_table = sm.stats.anova_lm(model, typ=2)
print(aov_table)
Output:
| | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| C(cities) | 984.15 | 1.0 | 8.453686 | 0.027068 |
| C(stores) | 93.75 | 1.0 | 0.805297 | 0.404083 |
| C(cities):C(stores) | 183.75 | 1.0 | 1.578382 | 0.255694 |
| Residual | 698.50 | 6.0 | NaN | NaN |
5. Type III ANOVA in Python
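The following Python code calculates Type III ANOVA. Note that the model above uses the default treatment (dummy) coding, so the C(cities) and C(stores) rows of this table test each factor at the reference level of the other factor, and their values depend on the contrast coding chosen.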
# setting typ as Type III ANOVA in Python
aov_table = sm.stats.anova_lm(model, typ=3)
print(aov_table)
Output:
| | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| Intercept | 1200.00 | 1.0 | 10.307802 | 0.018354 |
| C(cities) | 158.70 | 1.0 | 1.363207 | 0.287277 |
| C(stores) | 270.00 | 1.0 | 2.319256 | 0.178611 |
| C(cities):C(stores) | 183.75 | 1.0 | 1.578382 | 0.255694 |
| Residual | 698.50 | 6.0 | NaN | NaN |