This tutorial introduces the basic idea of interaction effects in data analysis. It covers what an interaction effect is, two examples of interaction effects, and the statistical methods used to analyze them.

## What are interaction effects? (The definition)

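An interaction effect occurs when the effect of one variable (e.g., X) on another variable (e.g., Y) depends on a third variable (e.g., M). It can be written as the following regression equation, where the coefficient on the product term X × M captures the interaction.

$$Y = \beta_0 + \beta_1 X + \beta_2 M + \beta_3 (X \times M)$$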
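To see why the product term captures the interaction, it may help to group the terms that involve X. This is only a rearrangement of the equation above, not an extra assumption:

$$Y = (\beta_0 + \beta_2 M) + (\beta_1 + \beta_3 M)\,X$$

$$\frac{\partial Y}{\partial X} = \beta_1 + \beta_3 M$$

So a one-unit change in X shifts Y by an amount that itself depends on M. Only when the coefficient on X × M is zero is the effect of X the same at every value of M.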

## Examples of Interaction Effects

I am going to use two examples to illustrate interaction effects.

## Example 1

Suppose that you would like to know how Brand A and Brand B differ in sales. Thus, Brand (Brand A vs. Brand B) is X, and sales are Y.

You calculate and find that Brand A has $45M in sales and Brand B has $101M in sales. Thus, you see a difference in sales. (Such a difference can be called **an effect**.)

However, you realize that there is another variable you need to consider: region (West Coast vs. East Coast).

In particular, the difference between Brand A and Brand B occurs mainly on the East Coast (25M vs. 80M). In contrast, the sales numbers on the West Coast are roughly the same (20M vs. 21M).

Thus, you can see the importance of considering the third variable M (here, region), as it provides further insight into the basic effect of X on Y.

| | East Coast | West Coast | Total |
|---|---|---|---|
| Brand A | sales = 25M | sales = 20M | Brand A sales = 25M + 20M = 45M |
| Brand B | sales = 80M | sales = 21M | Brand B sales = 80M + 21M = 101M |
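The pattern in the table can be checked with a few lines of arithmetic. The sketch below computes the Brand B minus Brand A difference within each region, and then the difference between those two differences. The variable names are my own choice for illustration; the figures are the ones from the table (in $M).

```python
# Sales in $M, taken from the table above: sales[brand][region]
sales = {
    "Brand A": {"East Coast": 25, "West Coast": 20},
    "Brand B": {"East Coast": 80, "West Coast": 21},
}

# Effect of brand (Brand B minus Brand A) within each region
brand_effect = {
    region: sales["Brand B"][region] - sales["Brand A"][region]
    for region in ("East Coast", "West Coast")
}

# Overall effect of brand, ignoring region
total_effect = sum(sales["Brand B"].values()) - sum(sales["Brand A"].values())

# Interaction contrast: how much the brand effect changes across regions
interaction = brand_effect["East Coast"] - brand_effect["West Coast"]

print(brand_effect)  # {'East Coast': 55, 'West Coast': 1}
print(total_effect)  # 56
print(interaction)   # 54
```

The overall gap of 56 hides the fact that almost all of it (55) comes from the East Coast. A large interaction contrast like this is what signals that the effect of brand depends on region.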

## Example 2

Suppose that you find a relationship between temperature and ice cream sales, such that higher temperatures lead to more ice cream sales. Thus, temperature is X, and ice cream sales are Y.

However, you then find out that Brand A's ice cream sales increase much more than Brand B's when the temperature goes up. Thus, Brand is the third variable M.
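One way to see this kind of interaction in data is to fit a separate line for each brand and compare the slopes. The numbers below are made up purely for illustration (they are not from any real dataset), and `slope` is a small helper written for this sketch rather than a library function.

```python
def slope(xs, ys):
    """Least-squares slope of ys on xs (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data: temperature (degrees Celsius) and daily sales (units)
temperature = [15, 20, 25, 30, 35]
sales_brand_a = [100, 150, 200, 250, 300]  # rises 10 units per degree
sales_brand_b = [100, 110, 120, 130, 140]  # rises 2 units per degree

slope_a = slope(temperature, sales_brand_a)
slope_b = slope(temperature, sales_brand_b)

# If Brand is coded M = 0 for Brand B and M = 1 for Brand A, the gap between
# the two slopes corresponds to the coefficient on the X × M term.
slope_gap = slope_a - slope_b

print(slope_a, slope_b, slope_gap)  # 10.0 2.0 8.0
```

With these made-up numbers, each extra degree adds 10 units for Brand A but only 2 for Brand B, so the effect of temperature on sales depends on brand.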

## Statistical Methods to Analyze Interaction Effects

Depending on the data types of X, M, and Y, different methods can be used to estimate the coefficients in the following equation.

$$Y = \beta_0 + \beta_1 X + \beta_2 M + \beta_3 (X \times M)$$

The following table summarizes different statistical methods to estimate those coefficients.

| Y | X | M | Statistical method |
|---|---|---|---|
| Continuous | Continuous | Continuous | Linear regression |
| Continuous | Continuous | Categorical | Linear regression or ANCOVA |
| Continuous | Categorical | Continuous | Linear regression or ANCOVA |
| Continuous | Categorical | Categorical | ANOVA |
| Categorical | Continuous or Categorical | Continuous or Categorical | Logistic regression |
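For the case where X and M are both categorical, the link between the sales figures in Example 1 and the regression equation can be made concrete. The sketch below assumes 0/1 dummy coding (my choice for illustration: X = 1 for Brand B, M = 1 for East Coast). With four cells and four coefficients, the coefficients can be read directly off the cell values. In practice you would let a statistics package estimate them from row-level data; this only illustrates what each coefficient means.

```python
# Sales in $M from Example 1, keyed by (X, M) under an assumed dummy coding:
#   X: 0 = Brand A, 1 = Brand B
#   M: 0 = West Coast, 1 = East Coast
cell = {
    (0, 0): 20,  # Brand A, West Coast
    (0, 1): 25,  # Brand A, East Coast
    (1, 0): 21,  # Brand B, West Coast
    (1, 1): 80,  # Brand B, East Coast
}

b0 = cell[(0, 0)]                 # baseline cell: Brand A, West Coast
b1 = cell[(1, 0)] - cell[(0, 0)]  # effect of X when M = 0
b2 = cell[(0, 1)] - cell[(0, 0)]  # effect of M when X = 0
b3 = cell[(1, 1)] - cell[(1, 0)] - cell[(0, 1)] + cell[(0, 0)]  # interaction

def predict(x, m):
    """Evaluate Y = b0 + b1*X + b2*M + b3*X*M."""
    return b0 + b1 * x + b2 * m + b3 * x * m

# The equation reproduces every cell exactly
assert all(predict(x, m) == y for (x, m), y in cell.items())

print(b0, b1, b2, b3)  # 20 1 5 54
```

Here the interaction coefficient of 54 is the amount by which the Brand B minus Brand A gap on the East Coast (55) exceeds the gap on the West Coast (1).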

I have provided tutorials on conducting these analyses in R, Python, and SPSS. The following shows a few examples.