# Python: Type I, Type II, and Type III ANOVA

## 1. Introduction

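Type I, Type II, and Type III ANOVA are three different ways of calculating the sums of squares in ANOVA. They differ in which other terms each effect is adjusted for, written below as `SS(effect | terms adjusted for)` for a model with factors A and B and their interaction AB.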

Type I ANOVA:

• `SS(A) for factor A`
• `SS(B | A) for factor B`
• `SS(AB | A, B) for interaction AB`

Type II ANOVA:

• `SS(A | B) for factor A`
• `SS(B | A) for factor B`
• `SS(AB | A, B) for interaction AB`

Type III ANOVA:

• `SS(A | B, AB) for factor A`
• `SS(B | A, AB) for factor B`
• `SS(AB | A, B) for interaction AB`
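
The conditional notation above can be read as a comparison of nested models. As a sketch, write RSS(·) for the residual sum of squares of a linear model containing an intercept plus the listed terms (RSS is notation introduced here for illustration, not part of the original lists). The sequential (Type I) quantities are then differences in RSS:

```latex
\begin{aligned}
\mathrm{SS}(A) &= \mathrm{RSS}(1) - \mathrm{RSS}(A) \\
\mathrm{SS}(B \mid A) &= \mathrm{RSS}(A) - \mathrm{RSS}(A, B) \\
\mathrm{SS}(AB \mid A, B) &= \mathrm{RSS}(A, B) - \mathrm{RSS}(A, B, AB)
\end{aligned}
```

In the same way, the Type II quantity for factor A is SS(A | B) = RSS(B) − RSS(A, B). In a balanced design (equal cell sizes) the three types give the same sums of squares; they differ only when the design is unbalanced.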

## 2. Hypothetical Data

The hypothetical data set has two categorical independent variables (IVs), cities and stores, and one dependent variable (DV), sales. Cities has two levels (city 1 and city 2) and stores also has two levels (store 1 and store 2), so together the two variables define 4 cells. To test whether mean sales differ across cities, across stores, and across their combinations, we can run a two-way ANOVA.

The following Python code builds the data set.

```python
import numpy as np
import pandas as pd

# generate the data used in the rest of the tutorial
x_1 = np.repeat(['City1', 'City2'], 5)   # 5 consecutive rows per city
x_2 = np.tile(['store1', 'store2'], 5)   # stores alternate row by row
sales = np.array([10, 20, 20, 50, 30, 10, 5, 4, 12, 4])
df_x = pd.DataFrame({'cities': x_1, 'stores': x_2, 'sales': sales})
print(df_x)
```

Output:

```
  cities  stores  sales
0  City1  store1     10
1  City1  store2     20
2  City1  store1     20
3  City1  store2     50
4  City1  store1     30
5  City2  store2     10
6  City2  store1      5
7  City2  store2      4
8  City2  store1     12
9  City2  store2      4
```
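
Because the three ANOVA types only differ for unbalanced designs, it helps to check the cell sizes first. The following is a minimal sketch that rebuilds the same data frame (so the snippet runs on its own) and tabulates the number of observations and the mean sales per cell; the names `cell_counts` and `cell_means` are illustrative.

```python
import numpy as np
import pandas as pd

# same data as above, rebuilt so this snippet is self-contained
df_x = pd.DataFrame({
    'cities': np.repeat(['City1', 'City2'], 5),
    'stores': np.tile(['store1', 'store2'], 5),
    'sales': [10, 20, 20, 50, 30, 10, 5, 4, 12, 4],
})

# number of observations in each city-by-store cell
cell_counts = pd.crosstab(df_x['cities'], df_x['stores'])

# mean sales in each city-by-store cell
cell_means = df_x.pivot_table(index='cities', columns='stores',
                              values='sales', aggfunc='mean')

print(cell_counts)
print(cell_means)
```

The cells hold 3, 2, 2, and 3 observations, so the design is unbalanced and the three types of sums of squares need not agree.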

## 3. Type I ANOVA in Python

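We use `statsmodels` for all the analyses in this tutorial: `ols` from `statsmodels.formula.api` fits the linear model, and `sm.stats.anova_lm` builds the ANOVA table from it. The following Python code computes the Type I ANOVA. Type I sums of squares are sequential, so each term is adjusted only for the terms listed before it in the formula.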

```python
import statsmodels.api as sm
from statsmodels.formula.api import ols

# this fitted model is reused for the Type I, Type II, and Type III ANOVA
model = ols('sales ~ C(cities) + C(stores) + C(cities):C(stores)', data=df_x).fit()

# typ=1 requests Type I (sequential) sums of squares
aov_table = sm.stats.anova_lm(model, typ=1)
print(aov_table)
```

Output:

```
                      df  sum_sq     mean_sq         F    PR(>F)
C(cities)            1.0  902.50  902.500000  7.752326  0.031816
C(stores)            1.0   93.75   93.750000  0.805297  0.404083
C(cities):C(stores)  1.0  183.75  183.750000  1.578382  0.255694
Residual             6.0  698.50  116.416667       NaN       NaN
```
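
To see where the Type I numbers come from, the sums of squares can be rebuilt as differences in residual sums of squares between nested least-squares fits. The following is a minimal sketch using only NumPy. The helper `rss` and the 0/1 dummy coding are illustrative assumptions, not what `anova_lm` does internally.

```python
import numpy as np

sales = np.array([10, 20, 20, 50, 30, 10, 5, 4, 12, 4], dtype=float)
# 0/1 dummy codes in the same row order as the data frame: City2 = 1, store2 = 1
city = np.repeat([0.0, 1.0], 5)
store = np.tile([0.0, 1.0], 5)
ones = np.ones_like(sales)

def rss(*columns):
    """Residual sum of squares of a least-squares fit of sales on the given columns."""
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, sales, rcond=None)
    resid = sales - X @ beta
    return float(resid @ resid)

# cities entered first, as in the model formula above
ss_cities = rss(ones) - rss(ones, city)                      # SS(A)
ss_stores_after = rss(ones, city) - rss(ones, city, store)   # SS(B | A)
ss_inter = (rss(ones, city, store)
            - rss(ones, city, store, city * store))          # SS(AB | A, B)

# stores entered first: the main-effect sums of squares change
ss_stores = rss(ones) - rss(ones, store)                     # SS(B)
ss_cities_after = rss(ones, store) - rss(ones, store, city)  # SS(A | B)

print(round(ss_cities, 2), round(ss_stores_after, 2), round(ss_inter, 2))
print(round(ss_stores, 2), round(ss_cities_after, 2))
```

The first line reproduces 902.50, 93.75, and 183.75 from the table. With stores entered first, the main effects become 12.10 for stores and 984.15 for cities, which shows that Type I results depend on the order of the terms when the design is unbalanced.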

## 4. Type II ANOVA in Python

The following Python code computes the Type II ANOVA. The only change in the code from Type I is the value of `typ`. In the output, only the `C(cities)` row differs from the Type I table, because cities is now adjusted for stores.

```python
# typ=2 requests Type II sums of squares
aov_table = sm.stats.anova_lm(model, typ=2)
print(aov_table)
```

Output:

```
                     sum_sq   df         F    PR(>F)
C(cities)            984.15  1.0  8.453686  0.027068
C(stores)             93.75  1.0  0.805297  0.404083
C(cities):C(stores)  183.75  1.0  1.578382  0.255694
Residual             698.50  6.0       NaN       NaN
```

## 5. Type III ANOVA in Python

The following Python code computes the Type III ANOVA. Unlike the Type I and Type II tables, the Type III table also includes an `Intercept` row.

```python
# typ=3 requests Type III sums of squares
aov_table = sm.stats.anova_lm(model, typ=3)
print(aov_table)
```

Output:

```
                      sum_sq   df          F    PR(>F)
Intercept            1200.00  1.0  10.307802  0.018354
C(cities)             158.70  1.0   1.363207  0.287277
C(stores)             270.00  1.0   2.319256  0.178611
C(cities):C(stores)   183.75  1.0   1.578382  0.255694
Residual              698.50  6.0        NaN       NaN
```