In the start-of-term survey, MATH 140 students are asked several questions, including these two:
Is there an association between these two categorical variables? That is, is a student’s answer to the achievement question independent of where at Linfield they intend to earn their major?
In this page we provide code to load the raw data into an RStudio session and conduct a chi-square test of independence on these hypotheses:
Run the following code to load the data into your session as a data frame called df:
df <- read.csv("https://mphitchman.com/stats/data/achieve-degree.csv")
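This data frame has three columns: a timestamp (when the survey was submitted), the student’s response to the achievement question, and the student’s response to the degree question. The table() command builds a two-way table of counts from the two response columns: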
table(df$achieve,df$degree)
Use the addmargins() command on the table to add the row and column totals:
addmargins(table(df$achieve,df$degree))
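To see what these two commands produce without downloading anything, here is a minimal sketch on a tiny made-up data frame. The name toy and its category labels are hypothetical and invented purely for illustration; they are not the actual survey responses.

```r
# Hypothetical toy responses, invented only for illustration
# (these are NOT the categories or counts from the real survey)
toy <- data.frame(
  achieve = c("A", "B", "A", "C", "B", "A"),
  degree  = c("Arts", "Science", "Science", "Arts", "Arts", "Science")
)

# Two-way table of counts: rows are achieve, columns are degree
tab <- table(toy$achieve, toy$degree)

# Same table with a "Sum" row and a "Sum" column appended
tab_m <- addmargins(tab)
print(tab_m)
```

The bottom-right cell of the result is the total number of responses, and each entry in the Sum row or Sum column is the total for that column or row.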
If we provide R with a two-way table, it will gladly crank out all the relevant calculations associated with a chi-square test of independence and store them for us in a big list of things. We use the chisq.test() command to do this, and we store the results by giving it a name such as X:
X <- chisq.test(table(df$achieve,df$degree))
Note: If you run this code you may get a warning notice in RStudio. Perhaps the expected counts are too low for the chi-square model to be reliable?
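One way to investigate such a warning is to look at the expected counts directly. The sketch below uses a made-up table of counts (toy_tab and X_toy are hypothetical names, and the numbers are invented), and it assumes the common rule of thumb that expected counts below 5 are a concern.

```r
# Hypothetical 2x3 table of counts, invented for illustration only
toy_tab <- matrix(c(12, 3, 8, 2, 10, 4), nrow = 2,
                  dimnames = list(c("yes", "no"), c("a", "b", "c")))

# Run the test; suppressWarnings() hides the warning so we can inspect it ourselves
X_toy <- suppressWarnings(chisq.test(toy_tab))

# Expected counts under the assumption of independence
print(X_toy$expected)

# Is any expected count below 5, and how many cells are affected?
any_low <- any(X_toy$expected < 5)
n_low   <- sum(X_toy$expected < 5)
print(any_low)
print(n_low)
```

The same check can be run on the survey data by replacing X_toy with X.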
What information is stored in the object X? Click on X in the environment tab to check it out. Among the information you can find:
| Item in X | Description |
|---|---|
| X$observed | the observed counts (the original 2-way table) |
| X$expected | the expected counts (computed assuming \(H_o\) is true) |
| X$statistic | the chi-square test statistic |
| X$parameter | the degrees of freedom of the \(\chi^2\) test statistic |
| X$p.value | the p-value of the test of independence. This should equal 1-pchisq(X$statistic,X$parameter) |
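The items in this table fit together, and the relationships can be checked by hand. The following is a rough sketch on a made-up table (toy_tab, X_toy, and the counts are hypothetical), chosen to be larger than 2x2 so that, as far as I know, R applies no continuity correction and the by-hand statistic matches.

```r
# Hypothetical 2x3 table of counts, invented for illustration only
toy_tab <- matrix(c(20, 15, 30, 25, 10, 20), nrow = 2,
                  dimnames = list(c("yes", "no"), c("a", "b", "c")))
X_toy <- chisq.test(toy_tab)

# Test statistic: sum over all cells of (observed - expected)^2 / expected
stat_by_hand <- sum((X_toy$observed - X_toy$expected)^2 / X_toy$expected)

# Degrees of freedom: (number of rows - 1) * (number of columns - 1)
df_by_hand <- (nrow(toy_tab) - 1) * (ncol(toy_tab) - 1)

# p-value: area to the right of the statistic under the chi-square curve
p_by_hand <- 1 - pchisq(stat_by_hand, df_by_hand)

# Each comparison should print TRUE
print(isTRUE(all.equal(unname(X_toy$statistic), stat_by_hand)))
print(isTRUE(all.equal(unname(X_toy$parameter), df_by_hand)))
print(isTRUE(all.equal(unname(X_toy$p.value), p_by_hand)))
```

Here unname() strips the labels R attaches to the stored values so that only the numbers are compared.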