## 1. Introduction

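Type I, Type II, and Type III ANOVA are three different ways of calculating sums of squares in ANOVA. For a model with two factors, A and B, and their interaction AB, they differ in which other terms each effect is adjusted for. Below, `SS(B | A)` denotes the sum of squares for B after A is already in the model.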

**Type I ANOVA**:

`SS(A) for factor A`

`SS(B | A) for factor B`

`SS(AB | A, B) for interaction AB`

**Type II ANOVA**:

`SS(A | B) for factor A`

`SS(B | A) for factor B`

`SS(AB | A, B) for interaction AB`

**Type III ANOVA**:

`SS(A | B, AB) for factor A`

`SS(B | A, AB) for factor B`

`SS(AB | A, B) for interaction AB`

## 2. Hypothetical Data

The hypothetical data has two categorical independent variables (IVs), cities and stores, and one dependent variable (DV), sales. Cities has two levels (city 1 and city 2) and stores also has two levels (store 1 and store 2), so together they define 2 × 2 = 4 cells. Suppose we want to know whether mean sales differ across these 4 cells; a two-way ANOVA tests this through the effects of cities, stores, and their interaction.

Cities | Stores | Sales |
---|---|---|
City1 | store1 | 10 |
City1 | store2 | 20 |
City1 | store1 | 20 |
City1 | store2 | 50 |
City1 | store1 | 30 |
City2 | store2 | 10 |
City2 | store1 | 5 |
City2 | store2 | 4 |
City2 | store1 | 12 |
City2 | store2 | 4 |

The following Python code reproduces the data.

```
import numpy as np
import pandas as pd
# build the two factors and the response used throughout this tutorial
x_1 = np.repeat(['City1', 'City2'], 5)   # City1 five times, then City2 five times
x_2 = np.tile(['store1', 'store2'], 5)   # store1, store2 alternating
sales = np.array([10, 20, 20, 50, 30, 10, 5, 4, 12, 4])
df_x = pd.DataFrame({'cities': x_1, 'stores': x_2, 'sales': sales})
print(df_x)
```

Output:

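| | cities | stores | sales |
|---|---|---|---|
| 0 | City1 | store1 | 10 |
| 1 | City1 | store2 | 20 |
| 2 | City1 | store1 | 20 |
| 3 | City1 | store2 | 50 |
| 4 | City1 | store1 | 30 |
| 5 | City2 | store2 | 10 |
| 6 | City2 | store1 | 5 |
| 7 | City2 | store2 | 4 |
| 8 | City2 | store1 | 12 |
| 9 | City2 | store2 | 4 |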
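Before fitting any model, it can help to check how many observations fall in each of the 4 cells. The sketch below rebuilds the same data frame and tabulates the cell counts and cell means; the name `cell_summary` is introduced here purely for illustration. Unequal cell counts mean the design is unbalanced, which is the situation in which the three types of sums of squares can disagree.

```python
import numpy as np
import pandas as pd

# Same data as above, rebuilt so this snippet runs on its own
df_x = pd.DataFrame({
    'cities': np.repeat(['City1', 'City2'], 5),
    'stores': np.tile(['store1', 'store2'], 5),
    'sales': [10, 20, 20, 50, 30, 10, 5, 4, 12, 4],
})

# Number of observations and mean sales in each city x store cell
# (cell_summary is an illustrative name, not part of the original tutorial)
cell_summary = df_x.groupby(['cities', 'stores'])['sales'].agg(['count', 'mean'])
print(cell_summary)
```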

## 3. Type I ANOVA in Python

We use `statsmodels` for all the analyses in this tutorial. The following Python code calculates Type I ANOVA. Because Type I sums of squares are sequential, the result depends on the order of the terms in the formula.

```
import statsmodels.api as sm
from statsmodels.formula.api import ols
# the following model statement will be used for Type I, Type II, and Type III ANOVA
model = ols('sales ~ C(cities) + C(stores) + C(cities):C(stores)', data=df_x).fit()
# setting typ as Type I ANOVA in Python
aov_table = sm.stats.anova_lm(model, typ=1)
print(aov_table)
```

Output:

| | df | sum_sq | mean_sq | F | PR(>F) |
|---|---|---|---|---|---|
| C(cities) | 1.0 | 902.50 | 902.500000 | 7.752326 | 0.031816 |
| C(stores) | 1.0 | 93.75 | 93.750000 | 0.805297 | 0.404083 |
| C(cities):C(stores) | 1.0 | 183.75 | 183.750000 | 1.578382 | 0.255694 |
| Residual | 6.0 | 698.50 | 116.416667 | NaN | NaN |
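To connect this table to the definitions in Section 1, the Type I sums of squares can be recomputed as drops in the residual sum of squares between nested models, adding one term at a time. The following is a minimal NumPy sketch under that reading, not a description of how statsmodels computes the table internally; the helper `rss` and the 0/1 dummy columns `city2` and `store2` are assumptions made for this illustration.

```python
import numpy as np

sales = np.array([10, 20, 20, 50, 30, 10, 5, 4, 12, 4], dtype=float)
ones = np.ones_like(sales)
city2 = np.repeat([0.0, 1.0], 5)    # hypothetical dummy: 1 for City2
store2 = np.tile([0.0, 1.0], 5)     # hypothetical dummy: 1 for store2

def rss(*columns):
    """Residual sum of squares after regressing sales on the given columns."""
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, sales, rcond=None)
    resid = sales - X @ beta
    return float(resid @ resid)

# Nested models, each adding one term in formula order
rss_null = rss(ones)
rss_a = rss(ones, city2)
rss_ab = rss(ones, city2, store2)
rss_full = rss(ones, city2, store2, city2 * store2)

ss_cities = rss_null - rss_a    # SS(A)
ss_stores = rss_a - rss_ab      # SS(B | A)
ss_inter = rss_ab - rss_full    # SS(AB | A, B)
print(ss_cities, ss_stores, ss_inter, rss_full)
```

The four printed values should line up with the `sum_sq` column above, with `rss_full` playing the role of the Residual row.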

## 4. Type II ANOVA in Python

The following Python code calculates Type II ANOVA. The only difference from the Type I call is the value of `typ`.

    # setting typ as Type II ANOVA in Python
    aov_table = sm.stats.anova_lm(model, typ=2)
    print(aov_table)

Output:

| | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| C(cities) | 984.15 | 1.0 | 8.453686 | 0.027068 |
| C(stores) | 93.75 | 1.0 | 0.805297 | 0.404083 |
| C(cities):C(stores) | 183.75 | 1.0 | 1.578382 | 0.255694 |
| Residual | 698.50 | 6.0 | NaN | NaN |
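The cities row changes between Type I and Type II because Type II adjusts each main effect for the other one, whichever was listed first in the formula. A rough NumPy sketch of that comparison follows; as before, `rss`, `city2`, and `store2` are illustrative names assumed here, not part of statsmodels.

```python
import numpy as np

sales = np.array([10, 20, 20, 50, 30, 10, 5, 4, 12, 4], dtype=float)
ones = np.ones_like(sales)
city2 = np.repeat([0.0, 1.0], 5)    # hypothetical dummy: 1 for City2
store2 = np.tile([0.0, 1.0], 5)     # hypothetical dummy: 1 for store2

def rss(*columns):
    """Residual sum of squares after regressing sales on the given columns."""
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, sales, rcond=None)
    resid = sales - X @ beta
    return float(resid @ resid)

rss_a = rss(ones, city2)              # cities only
rss_b = rss(ones, store2)             # stores only
rss_ab = rss(ones, city2, store2)     # both main effects, no interaction

ss_cities_type2 = rss_b - rss_ab      # SS(A | B)
ss_stores_type2 = rss_a - rss_ab      # SS(B | A)
ss_cities_type1 = rss(ones) - rss_a   # SS(A), cities entered first

print(ss_cities_type1, ss_cities_type2, ss_stores_type2)
```

Swapping the order of the two factors in a Type I analysis would move the adjustment from stores to cities, whereas the Type II values do not depend on that order.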

## 5. Type III ANOVA in Python

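The following Python code calculates Type III ANOVA. Note that Type III sums of squares for the main effects depend on how the factors are coded; the model above uses the default treatment (dummy) coding of `C()`.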

```
# setting typ as Type III ANOVA in Python
aov_table = sm.stats.anova_lm(model, typ=3)
print(aov_table)
```

Output:

| | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| Intercept | 1200.00 | 1.0 | 10.307802 | 0.018354 |
| C(cities) | 158.70 | 1.0 | 1.363207 | 0.287277 |
| C(stores) | 270.00 | 1.0 | 2.319256 | 0.178611 |
| C(cities):C(stores) | 183.75 | 1.0 | 1.578382 | 0.255694 |
| Residual | 698.50 | 6.0 | NaN | NaN |
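Since `SS(A | B, AB)` removes A from a model that still contains the interaction, its value depends on the contrast coding of the factors. The sketch below, which is an illustration in plain NumPy rather than the statsmodels implementation, computes the two main-effect sums of squares under 0/1 treatment coding and under +1/-1 sum-to-zero coding. The helpers `rss` and `type3_main_effects` are hypothetical names. In statsmodels formulas, sum-to-zero coding is typically requested as `C(cities, Sum)` and `C(stores, Sum)`.

```python
import numpy as np

sales = np.array([10, 20, 20, 50, 30, 10, 5, 4, 12, 4], dtype=float)
ones = np.ones_like(sales)

def rss(*columns):
    """Residual sum of squares after regressing sales on the given columns."""
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, sales, rcond=None)
    resid = sales - X @ beta
    return float(resid @ resid)

def type3_main_effects(a, b):
    """Return SS(A | B, AB) and SS(B | A, AB) for coded factor columns a and b."""
    ab = a * b
    rss_full = rss(ones, a, b, ab)
    ss_a = rss(ones, b, ab) - rss_full   # drop A, keep B and AB
    ss_b = rss(ones, a, ab) - rss_full   # drop B, keep A and AB
    return ss_a, ss_b

# Treatment (dummy) coding: City1 and store1 are the reference levels
treatment = type3_main_effects(np.repeat([0.0, 1.0], 5), np.tile([0.0, 1.0], 5))

# Sum-to-zero (effect) coding: the two levels are coded +1 and -1
sum_to_zero = type3_main_effects(np.repeat([1.0, -1.0], 5), np.tile([1.0, -1.0], 5))

print('treatment coding:  ', treatment)
print('sum-to-zero coding:', sum_to_zero)
```

The treatment-coded pair should match the C(cities) and C(stores) rows above, while the sum-to-zero pair differs. Under treatment coding each main-effect row compares the two levels only at the reference level of the other factor, so sum-to-zero coding is the usual choice when Type III main effects are meant to average over the other factor.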