The question

In the start-of-term survey, MATH 140 students are asked several questions, including these two:

  1. Where at Linfield do you intend to earn your major (School of Nursing, School of Business, or the College of Arts and Sciences)?
  2. If you could choose one of the following accomplishments for your life, which would you choose: to win an Olympic gold medal, to win a Nobel Prize, to win an Academy Award, or to become President of the United States?

Is there an association between these two categorical variables? That is, is a student’s answer to the achievement question independent of where at Linfield they intend to earn their major?

On this page we provide code to load the raw data into an RStudio session and to conduct a chi-square test of independence for these hypotheses:

  • \(H_o\): There is no association between the location of a person’s intended major at Linfield and their answer to the achievement question.
  • \(H_a\): There is an association between these categorical variables.
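For reference, here is a sketch of the standard formulas behind this test. The symbols are our own shorthand, not taken from the activity: \(O_{ij}\) and \(E_{ij}\) are the observed and expected counts in row \(i\), column \(j\) of the two-way table, \(r\) and \(c\) are the numbers of rows and columns, and \(n\) is the total number of responses.

```latex
E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{n}

\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}

\text{df} = (r - 1)(c - 1)
```

Large values of \(\chi^2\) count as evidence against \(H_o\); the p-value is the area to the right of the observed \(\chi^2\) under a chi-square curve with df degrees of freedom. If all four achievement answers and all three major locations appear in the data, the table is 4 × 3 and df = (4 - 1)(3 - 1) = 6.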

Code for the chi-square test of independence activity

Load the data

Run the following code to load the data into your session as a data frame called df:

df <- read.csv("https://mphitchman.com/stats/data/achieve-degree.csv")

This data frame has three columns: a timestamp (when the survey was submitted), the student’s answer to the achievement question (achieve), and their answer to the degree question (degree).

A two-way table of observed counts

table(df$achieve,df$degree)
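To see what table() is doing without downloading the survey, here is a minimal sketch on a small made-up data frame. The column names achieve and degree match the survey file, but the data frame toy, its category labels, and its 24 responses are hypothetical and are not the MATH 140 data.

```r
# Hypothetical toy data (NOT the survey data): 24 made-up responses.
# counts[k] says how many times the k-th (achieve, degree) pair is repeated.
counts <- c(6, 2, 3, 5, 4, 4)
toy <- data.frame(
  achieve = rep(rep(c("Nobel Prize", "Olympic gold medal"), times = 3), times = counts),
  degree  = rep(rep(c("Arts and Sciences", "Business", "Nursing"), each = 2), times = counts)
)

# table() counts how many rows fall in each achieve/degree combination
tab <- table(toy$achieve, toy$degree)
tab
```

Each cell of tab is a count; for example, tab["Nobel Prize", "Arts and Sciences"] is 6 here because six toy rows have that pair of answers.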

Adding row and column totals

Use the addmargins() command on the table:

addmargins(table(df$achieve,df$degree))
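Here is a sketch of what addmargins() adds, again using hypothetical toy responses as a stand-in for the survey data. By default it appends a row and a column named Sum that hold the row totals, the column totals, and the grand total.

```r
# Hypothetical toy data (NOT the survey data): 24 made-up responses
counts <- c(6, 2, 3, 5, 4, 4)
toy <- data.frame(
  achieve = rep(rep(c("Nobel Prize", "Olympic gold medal"), times = 3), times = counts),
  degree  = rep(rep(c("Arts and Sciences", "Business", "Nursing"), each = 2), times = counts)
)
tab <- table(toy$achieve, toy$degree)

# Append a "Sum" row and a "Sum" column
totals <- addmargins(tab)
totals

# Individual totals can be pulled out by name
totals["Nobel Prize", "Sum"]  # row total for Nobel Prize
totals["Sum", "Sum"]          # grand total (number of responses)
```

In this toy table the Nobel Prize row totals 13, each degree column totals 8, and the grand total is 24.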

Chi-square Test Calculations

If we provide R with a two-way table, it will gladly crank out all the relevant calculations associated with a chi-square test of independence and store them for us in a single list. We use the chisq.test() command to do this, and we store the results by giving them a name such as X:

X <- chisq.test(table(df$achieve,df$degree))

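Note: If you run this code, you may get the warning "Chi-squared approximation may be incorrect" in RStudio. R issues this warning whenever at least one expected count is less than 5, a sign that the expected counts may be too low for the chi-square model to be reliable.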
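One way to see which cells trigger this warning is to compare the expected counts with 5. The sketch below does this on hypothetical toy responses used as a stand-in for the survey data; with the real data you would inspect X$expected from your own X instead. The tryCatch() call is only there to capture the warning text so it can be displayed.

```r
# Hypothetical toy data (NOT the survey data): 24 made-up responses
counts <- c(6, 2, 3, 5, 4, 4)
toy <- data.frame(
  achieve = rep(rep(c("Nobel Prize", "Olympic gold medal"), times = 3), times = counts),
  degree  = rep(rep(c("Arts and Sciences", "Business", "Nursing"), each = 2), times = counts)
)
tab <- table(toy$achieve, toy$degree)

# Capture the warning message chisq.test() gives for this table, if any
msg <- tryCatch({ chisq.test(tab); "no warning" },
                warning = function(w) conditionMessage(w))
msg

# Rerun without printing the warning so the results can be kept
X_toy <- suppressWarnings(chisq.test(tab))
round(X_toy$expected, 2)  # expected counts under H_o
X_toy$expected < 5        # TRUE marks cells with an expected count below 5
sum(X_toy$expected < 5)   # how many such cells there are
```

In this toy table every expected count is below 5 (each is 13 × 8 / 24 ≈ 4.33 or 11 × 8 / 24 ≈ 3.67), so the warning appears.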

What information is stored in the object X? Click on X in the Environment tab to check it out. Among the information you will find:

Item in X       Description
X$observed      the observed counts (the original 2-way table)
X$expected      the expected counts (computed assuming \(H_o\) is true)
X$statistic     the chi-square test statistic
X$parameter     the degrees of freedom of the \(\chi^2\) test statistic
X$p.value       the p-value of the test of independence. This should equal 1-pchisq(X$statistic,X$parameter)
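To check that these items match the usual formulas, here is a sketch that recomputes each one by hand and compares it with what chisq.test() stored. It again uses hypothetical toy responses rather than the survey data, and names(X_toy) lists every stored component (there are a few more than the five above). The toy table is 2 × 3, so the continuity correction, which chisq.test() applies by default only to 2 × 2 tables, does not come into play.

```r
# Hypothetical toy data (NOT the survey data): 24 made-up responses
counts <- c(6, 2, 3, 5, 4, 4)
toy <- data.frame(
  achieve = rep(rep(c("Nobel Prize", "Olympic gold medal"), times = 3), times = counts),
  degree  = rep(rep(c("Arts and Sciences", "Business", "Nursing"), each = 2), times = counts)
)
tab <- table(toy$achieve, toy$degree)

# The toy table has small expected counts, so silence the warning here
X_toy <- suppressWarnings(chisq.test(tab))
names(X_toy)  # every component stored in the result

# Expected counts: (row total * column total) / grand total
E <- outer(rowSums(tab), colSums(tab)) / sum(tab)
# Chi-square statistic: sum over all cells of (observed - expected)^2 / expected
stat <- sum((tab - E)^2 / E)
# Degrees of freedom: (rows - 1) * (columns - 1)
dof <- (nrow(tab) - 1) * (ncol(tab) - 1)
# p-value: area to the right of stat under the chi-square curve
p <- 1 - pchisq(stat, dof)

# Compare the hand calculations with the stored values (TRUE means they agree)
all.equal(as.vector(X_toy$expected), as.vector(E))
all.equal(unname(X_toy$statistic), stat)
unname(X_toy$parameter) == dof
all.equal(unname(X_toy$p.value), p)
```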