Lesson 4.2: Testing Goodness of Fit in One-way Tables

Software Lab 4.2

Chi-Square Goodness-of-Fit Test

As you work through the lab, answer the ungraded exercises in the shaded boxes. Check your answers by consulting the Software Lab 4.2 Solutions.

Remember to complete the graded Software Lab Questions for this section in Moodle.

Is a Die Fair?

Consider the example in Supplementary Notes 4.2. A fair six-sided die was rolled 600 times. Download the data frame dice600 [CSV file] and open the data in jamovi. The variable outcome records which of the outcomes one to six occurred in each roll.

Visualize the distribution of the variable outcome using a frequency table and a bar plot by selecting Analyses > Exploration > Descriptives and then selecting Frequency tables and Plots > Bar plot. Confirm that the observed cell frequencies match those in Supplementary Notes 4.2, i.e., 96 for “1,” 94 for “2,” etc.
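If you want to cross-check jamovi's frequency table outside jamovi, the table is simply a count per category. Below is a minimal Python sketch using only the standard library. It is not part of the lab's required workflow, and it rests on assumptions: the function names are hypothetical, the file is assumed to be saved as dice600.csv, and the outcome column is assumed to store the faces as the digits 1 to 6.

```python
import csv
from collections import Counter


def read_column(path, column):
    """Return one column of a CSV file (with a header row) as a list of strings."""
    with open(path, newline="") as f:
        return [row[column] for row in csv.DictReader(f)]


def frequency_table(values, categories):
    """Count each category in `values`; categories that never occur are reported as 0."""
    counts = Counter(str(v) for v in values)
    return {c: counts[c] for c in categories}
```

For example, `frequency_table(read_column("dice600.csv", "outcome"), ["1", "2", "3", "4", "5", "6"])` should report 96 for "1" and 94 for "2", matching the jamovi table, provided the file layout matches the assumptions above. Listing the categories explicitly keeps a face with zero rolls in the table instead of silently dropping it.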

1. We’ll conduct a chi-square goodness-of-fit test to check whether the die is fair. If the die is fair, how many times would you expect each of the outcomes one to six to occur? Check the condition that every expected cell frequency is at least five. Check your answer by consulting the Software Lab 4.2 Solutions.
2. Calculate the test statistic, \chi^2 = \sum \dfrac{(Obs-Exp)^2}{Exp}.
3. Select Analyses > R > Rj Editor and use the R function pchisq to calculate the p-value. Then evaluate the hypothesis test at a significance level of \alpha = 0.05 and draw a conclusion in the context of the problem.
4. We can also use one of jamovi’s built-in tests to conduct the test automatically. Select Analyses > Frequencies > One Sample Proportion Tests > N Outcomes χ² Goodness of fit and move outcome to the Variable box. Confirm that jamovi reports the test statistic from question 2 and the p-value from question 3.
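The arithmetic in questions 1 to 3 can also be sketched in Python with only the standard library. The function `chi2_upper_tail` plays the role of R's `pchisq(q, df, lower.tail = FALSE)`: for an integer number of degrees of freedom, the upper-tail chi-square probability has a closed form (the regularized upper incomplete gamma function Q(df/2, x/2)), so no statistics package is needed. The function names are hypothetical, and the degrees of freedom use the usual goodness-of-fit rule df = (number of cells) − 1, which gives 5 for a die. No observed die counts are filled in; enter your own from the frequency table.

```python
import math


def chi2_upper_tail(x, df):
    """P(X >= x) for X ~ chi-square(df), with df a positive integer.

    Serves the same purpose as R's pchisq(x, df, lower.tail = FALSE), using the
    closed forms of the regularized upper incomplete gamma function Q(df/2, x/2).
    """
    if not isinstance(df, int) or df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: Q(m, y) = exp(-y) * sum_{i=0}^{m-1} y**i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df = 2m + 1:
    # Q(m + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{m} y**(i - 1/2) / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(y))
    for i in range(1, df // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def goodness_of_fit(observed, expected):
    """Chi-square goodness-of-fit test; returns (statistic, df, p_value)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    if min(expected) < 5:
        raise ValueError("every expected cell frequency must be at least 5")
    # Sum of (Obs - Exp)^2 / Exp over all cells.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df, chi2_upper_tail(statistic, df)


# Under H0 (a fair die), each of the six faces is expected n / 6 times.
n_rolls = 600
expected_die = [n_rolls / 6] * 6
```

With your six observed counts in a list `obs` (faces 1 to 6 in order), `goodness_of_fit(obs, expected_die)` returns the statistic, the degrees of freedom and the p-value to compare with α = 0.05. As a sanity check, `chi2_upper_tail(11.0705, 5)` is about 0.05, because 11.07 is the 5% critical value of the chi-square distribution with 5 degrees of freedom.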

Are Trading Days Independent?

Consider the stock market example in Section 6.3.5 of the textbook. Download the data frame sp500seq [CSV file] (OpenIntro, n.d.) and open it in jamovi. The variable days records waiting times for the Standard and Poor’s 500 (S&P 500) stock market index over a 10-year period: each value is the number of trading days until the index had a positive (“up”) trading day, counting that up day itself.

Visualize the distribution of the variable days using a frequency table and a bar plot by selecting Analyses > Exploration > Descriptives and then selecting Frequency tables and Plots > Bar plot. Confirm that the observed cell frequencies match those in the textbook, i.e., 717 for “1,” 369 for “2,” etc.
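Questions 5 to 10 work with seven cells: waits of 1 to 6 days and a pooled “7+” cell. A minimal Python sketch of that pooling step is below; the function name is hypothetical. The lab does not say whether the CSV stores long waits as individual numbers or already as a “7+” label, so the sketch accepts either (an assumption), and it expects the days values to have been read as text or whole numbers, for example with Python's csv module.

```python
def pooled_waiting_table(days, last=7):
    """Tally waiting times 1, ..., last - 1 and pool every wait of `last` or more days into one cell."""
    pooled = f"{last}+"
    table = {str(d): 0 for d in range(1, last)}
    table[pooled] = 0
    for value in days:
        text = str(value).strip()
        if text.endswith("+"):
            # Value already recorded as a pooled label such as "7+" (assumed possible in the CSV).
            if int(text[:-1]) < last:
                raise ValueError(f"cannot split pooled label {text!r} into cells below {last}")
            table[pooled] += 1
            continue
        d = int(text)
        if d < 1:
            raise ValueError(f"waiting times must be at least 1 day, got {d}")
        table[str(d) if d < last else pooled] += 1
    return table
```

If the file holds one row per waiting time, the seven counts should sum to 1,362, the total used in question 6, with 717 in the “1” cell and 369 in the “2” cell.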

5. We’ll conduct a chi-square goodness-of-fit test to check whether the S&P 500 being up or down on a given day is independent of all other days. If it is, the number of days until an “up” day should follow a geometric distribution. Calculate the expected cell frequencies using the following formula: Expected frequency for D days = 1362 \times (1-0.545)^{D-1}(0.545), where D \in \{1, 2, 3, 4, 5, 6\}, 1362 is the total number of observations, and 0.545 is the probability of an up day. Note: Your calculated values should match all but one of the values in Figure 6.11 in the textbook. One of the values in the textbook is a mistake, perhaps due to a rounding error in its calculation.
6. Calculate the expected frequency for 7+ days by subtracting the sum of the expected frequencies for 1 to 6 days from 1,362.
7. Check the condition that every expected cell frequency is at least five.
8. Calculate the test statistic, \chi^2 = \sum \dfrac{(Obs-Exp)^2}{Exp}. Your calculated value should be close to the one in the textbook, but the textbook value is inaccurate because of the mistake in one of the expected cell frequencies.
9. Select Analyses > R > Rj Editor and use the R function pchisq to calculate the p-value. Then evaluate the hypothesis test at a significance level of \alpha = 0.05 and draw a conclusion in the context of the problem. Again, your p-value should be close to the one in the textbook, but it won’t match exactly.
10. We can also use one of jamovi’s built-in tests to conduct the test automatically. Select Analyses > Frequencies > One Sample Proportion Tests > N Outcomes χ² Goodness of fit and move days to the Variable box. Then click Expected Proportions and type in the expected cell frequencies you calculated in questions 5 and 6 in the “Ratio” boxes. Confirm that jamovi reports the test statistic from question 8 and the p-value from question 9.
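Questions 5 to 9 can be sketched in Python as well. The block below builds the geometric expected counts from the formula in question 5, appends the 7+ remainder from question 6, checks the at-least-five condition from question 7, and includes a standard-library stand-in for R's `pchisq(q, df, lower.tail = FALSE)`. The function names are hypothetical, and the degrees of freedom are assumed to follow the same rule as for the die, df = (number of cells) − 1, which gives 6 for seven cells. The observed counts are left for you to enter.

```python
import math


def chi2_upper_tail(x, df):
    """P(X >= x) for X ~ chi-square(df) with integer df; stand-in for R's pchisq(x, df, lower.tail = FALSE)."""
    if not isinstance(df, int) or df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-y) * sum_{i=0}^{m-1} y**i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df = 2m + 1: erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{m} y**(i - 1/2) / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(y))
    for i in range(1, df // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def geometric_expected(total, p, max_days=6):
    """Expected counts for D = 1..max_days under a geometric model, plus a remainder cell."""
    # Question 5: total * (1 - p)^(D - 1) * p for each D.
    expected = [total * (1 - p) ** (d - 1) * p for d in range(1, max_days + 1)]
    # Question 6: the "7+" cell (when max_days = 6) is the total minus the first cells.
    expected.append(total - sum(expected))
    return expected


def goodness_of_fit(observed, expected):
    """Return (statistic, df, p_value), using df = number of cells - 1."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df, chi2_upper_tail(statistic, df)


expected_days = geometric_expected(1362, 0.545)
# Question 7: every expected cell frequency should be at least 5.
all_at_least_five = min(expected_days) >= 5
```

Printing `[round(e, 2) for e in expected_days]` gives the values to compare with Figure 6.11 in the textbook, and `goodness_of_fit(obs, expected_days)`, with your seven observed counts in the order 1, 2, ..., 6, 7+, gives the statistic and p-value for questions 8 and 9. Computing the 7+ cell as a remainder, as question 6 does, guarantees the expected counts sum to 1,362 exactly.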

References

OpenIntro. (n.d.). Data sets [Data sets]. https://openintro.org/data/

License


Introduction to Probability and Statistics Copyright © 2023 by Thompson Rivers University is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
