This tutorial is about how to calculate means by group in R. There are two methods of calculating means by groups in R, either `aggregate()`

or `dplyr`

.

**Method 1: Use base R**

aggregate(df$col_being_aggregated, list(df$col_1_groupby,df$col_2_groupby,…), FUN=mean)

**Method 2: Use package of dplyr**

library(dplyr)

df%>% group_by(col_1_groupby,col_2_groupby,…) %>% summarise_at(vars(col_being_aggregated), list(name = mean))

## Example 1 for Method 1

The following is the data being used for the examples. It has two categorical variables (i.e., City and Brand) and one continuous variable (i.e., sales).

```
# download data from Github
df <- read.csv("https://raw.githubusercontent.com/TidyPython/interactions/main/city_brand_sales.csv")
# print out the dataframe
print(df)
```

Output:

City Brand sales 1 City1 brand1 70 2 City1 brand2 10 3 City1 brand1 100 4 City1 brand2 2 5 City1 brand1 30 6 City1 brand2 2 7 City1 brand1 20 8 City1 brand2 10 9 City1 brand1 20 10 City1 brand2 10 11 City2 brand1 9 12 City2 brand2 10 13 City2 brand1 5 14 City2 brand2 4 15 City2 brand1 4 16 City2 brand2 4 17 City2 brand1 5 18 City2 brand2 4 19 City2 brand1 12 20 City2 brand2 11

The following uses the `aggregate()`

from base R to calculate the means. We group by two categorical variables, City and Brand.

```
# use aggregate() in base R to calculate means grouped by City and Brand
aggregate(df$sales, list(df$City,df$Brand), FUN=mean)
```

The following output shows the four means of all 4 cells.

Group.1 Group.2 x 1 City1 brand1 48.0 2 City2 brand1 7.0 3 City1 brand2 6.8 4 City2 brand2 6.6

## Example 2 for Method 2

We are going to use the same data frame, namely `df`

. Thus, I am not going to print it out again here.

The following uses `dplyr`

to calculate means grouped by two categorical variables, City and Brand.

```
# use dplyr to calculate means grouped by City and Brand
library(dplyr)
df%>% group_by(City,Brand) %>% summarise_at(vars(sales), list(name = mean))
```

The following is the output. We can see that it prints out 4 means for all cells.

# A tibble: 4 x 3 # Groups: City [2] City Brand name <chr> <chr> <dbl> 1 City1 brand1 48 2 City1 brand2 6.8 3 City2 brand1 7 4 City2 brand2 6.6