Suppose we have gathered independent samples from more than two populations and we wish to test the null hypothesis that all population means are equal against the alternative that they are not all equal. The appropriate test is ANOVA (analysis of variance).

This page provides some R commands to help us through the ANOVA worksheet.

Load the data

To import the stress data set, run this command:

stress <- read.csv("https://mphitchman.com/stats/data/stress.csv")

Click on the stress data frame in the Environment tab to get a look at how the data has been formatted. Having the data in this long form (as opposed to having separate columns for each of the three groups) makes it much easier to use R commands to run ANOVA and visualize the data.
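If your own data arrive in wide form, with one column per group, a reshape along the following lines may help. This is only a sketch: the data frame wide, its column names groupA, groupB and groupC, and all of the values are made up for illustration, and the output column names are chosen to match the stress data.

```r
library(tidyr)

# Hypothetical wide-format data: one column per group (values are made up)
wide <- data.frame(
  groupA = c(70, 72, 68),
  groupB = c(80, 79, 83),
  groupC = c(75, 77, 74)
)

# Stack the three columns into two: a group label and the measurement
long <- pivot_longer(wide,
                     cols = everything(),
                     names_to = "treatment",
                     values_to = "heart.rate")
long
```

Each row of long now holds one observation together with its group label, which is the layout the commands on this page expect.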

Load the tidyverse

The code below makes use of packages in the tidyverse, so be sure to load it into your session:

library(tidyverse)

Side-by-side Boxplots

The following code makes boxplots of the heart rates for each of the three treatments and colors them by treatment. Feel free to add axis labels and a plot title.

ggplot(stress,aes(y=heart.rate,x=treatment))+
  geom_boxplot(aes(fill=treatment),show.legend=FALSE)+
  ggtitle("Plot title would go here")
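One way to set the title and both axis labels in a single step is labs(). The sketch below uses a small made-up data frame, toy, as a stand-in for stress (same column names, hypothetical group names and values); the label text is only a suggestion.

```r
library(ggplot2)

# Hypothetical stand-in for the stress data (made-up values, same column names)
toy <- data.frame(
  treatment  = rep(c("A", "B", "C"), each = 4),
  heart.rate = c(70, 72, 68, 71, 80, 79, 83, 81, 75, 77, 74, 76)
)

# labs() sets the title and axis labels together
p <- ggplot(toy, aes(x = treatment, y = heart.rate)) +
  geom_boxplot(aes(fill = treatment), show.legend = FALSE) +
  labs(title = "Heart rate by treatment",
       x = "Treatment",
       y = "Heart rate")
p
```

To use this with the real data, replace toy with stress and adjust the label text to suit the worksheet.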

Heart Rate Summary Statistics by Treatment Group

stress %>%
  group_by(treatment) %>%
  summarise(sample_size = n(),
            mean = mean(heart.rate),
            stdev = sd(heart.rate))

Run ANOVA

anova(lm(heart.rate ~ treatment, data = stress))
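The ANOVA table reports the F statistic in its "F value" column and the p-value in "Pr(>F)". To see where the F statistic comes from, the sketch below pulls both numbers out of the table and then recomputes F by hand as the between-group mean square divided by the within-group mean square. The data frame toy and its values are made up for illustration; only the column names match the stress data.

```r
# Hypothetical data (made-up values) with the same column names as stress
toy <- data.frame(
  treatment  = rep(c("A", "B", "C"), each = 4),
  heart.rate = c(70, 72, 68, 71, 80, 79, 83, 81, 75, 77, 74, 76)
)

tab <- anova(lm(heart.rate ~ treatment, data = toy))

# Pull the F statistic and p-value from the treatment row of the table
f_stat  <- tab[["F value"]][1]
p_value <- tab[["Pr(>F)"]][1]

# Recompute F by hand: between-group mean square / within-group mean square
k <- length(unique(toy$treatment))   # number of groups
n <- nrow(toy)                       # total sample size
grand_mean  <- mean(toy$heart.rate)
group_means <- tapply(toy$heart.rate, toy$treatment, mean)
group_sizes <- tapply(toy$heart.rate, toy$treatment, length)

ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((toy$heart.rate - group_means[toy$treatment])^2)

f_by_hand <- (ss_between / (k - 1)) / (ss_within / (n - k))

# Check that the hand computation agrees with the table
all.equal(f_stat, f_by_hand)
```

A small p-value is evidence against the null hypothesis that all population means are equal.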