Lesson 4.1: Inference for Proportions
Software Lab 4.1: Inference for Proportions
This software lab is adapted from Inference for Categorical Data (OpenIntro, n.d.-b), licensed under CC BY-SA 4.0, at OpenIntro Labs for jamovi.
As you work through the lab, answer the ungraded exercises in the shaded boxes. Check your answers by consulting the Software Lab 4.1 Solutions.
Remember to complete the graded Software Lab Questions for this section in Moodle.
Getting Started: The Data
Every two years, the United States’ Centers for Disease Control and Prevention (CDC) conducts the Youth Risk Behavior Surveillance System (YRBSS) survey, collecting data from high-schoolers in grades 9 through 12 to analyze health patterns. In this lab, you will work with a selected group of variables from a random sample of observations from one of the years the YRBSS was conducted.
Download the data from yrbss_text [CSV file] (OpenIntro, n.d.-a), and open it in jamovi. The variables are:
- gender: Gender of participant; can be male or female.
- text_while_driving_30d: During the 30 days preceding the survey, the frequency that participants texted or emailed while driving.
- text_ind: Whether the individual texted while driving for six or more days over the preceding 30 days.
We can quickly visualize the distribution of the variable text_while_driving_30d using a frequency table and a bar plot. Do this by selecting Analyses > Exploration > Descriptives and then selecting Frequency tables and Plots > Bar plot. Confirm that the frequency counts are 4,788 for “0 days,” 925 for “1–2 days,” etc.
Confidence Interval for a Single Proportion
First, we’ll focus on the variable text_ind to make inferences about the proportion of US high-schoolers who texted while driving for six or more days over the preceding 30 days. A confidence interval for a single proportion based on a normal model is:

$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}},$$

where $\hat{p}$ is the sample proportion, $n$ is the sample size, and $z^*$ is the critical value for the chosen confidence level, assuming the following conditions are satisfied:
- Independence: The individual responses in the sample are independent of each other.
- Random: The sample is random.
- Success/Failure Condition: $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$.
- 10% Condition: The sample size $n$ is no more than 10% of the population size.
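As a rough illustration, the normal-model interval (the sample proportion plus or minus a critical value times the standard error sqrt(p̂(1 − p̂)/n)) can be sketched in Python using only the standard library. The function name `proportion_ci` and the 95% confidence level are assumptions made for this sketch and are not part of the lab; the counts 1,807 and 12,654 are taken from the binomial test table shown later in this lab.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-model confidence interval for a single proportion.

    `proportion_ci` is a hypothetical helper written for this sketch.
    """
    # Success/failure condition: at least 10 observed successes and failures.
    if successes < 10 or n - successes < 10:
        raise ValueError("success/failure condition not met")
    p_hat = successes / n
    # Critical value z* for the requested confidence level (assumed 95% here).
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se


# Counts from the lab's binomial test table: 1,807 "yes" out of 12,654.
lower, upper = proportion_ci(1807, 12654)
print(f"{lower:.3f} to {upper:.3f}")
```

The check inside the function mirrors the success/failure condition, so the sketch refuses to return an interval when the normal model is not appropriate.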
Create a frequency table for text_ind and use it to check the success/failure condition for the confidence interval. Check your answer by consulting Software Lab 4.1 Solutions.
Hypothesis Test for a Single Proportion
A journalist claims 15% of US high-schoolers texted while driving for six or more days over the preceding 30 days. We’ll conduct a two-sided hypothesis test for a single proportion to test the journalist’s claim. The hypotheses are H0: p = 0.15 (15%) versus HA: p ≠ 0.15. We’ll use a normal model for the sampling distribution of $\hat{p}$ that has a mean of $p_0 = 0.15$ and a standard deviation of $\sqrt{\frac{p_0(1-p_0)}{n}}$, assuming the following conditions are satisfied:
- Independence: The individual responses in the sample are independent of each other.
- Random: The sample is random.
- Success/Failure Condition: $np_0 \ge 10$ and $n(1-p_0) \ge 10$.
- 10% Condition: The sample size $n$ is no more than 10% of the population size.
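The two-sided test can be sketched outside jamovi as well. The Python snippet below is a minimal illustration, assuming the null value p0 = 0.15 from the journalist's claim and the counts from the binomial test table later in this lab (1,807 of 12,654). The function name `one_proportion_z_test` is hypothetical, and `NormalDist().cdf` from Python's standard library plays the role that R's `pnorm` plays in the Rj Editor.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-sided z test of H0: p = p0 under the normal model.

    `one_proportion_z_test` is a hypothetical helper written for this sketch.
    """
    # For a test, the success/failure condition uses the null value p0.
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("success/failure condition not met")
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)  # standard deviation of p-hat under H0
    z = (p_hat - p0) / se
    # Two-sided p-value: twice the lower-tail area beyond -|z|.
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


z, p_value = one_proportion_z_test(1807, 12654, p0=0.15)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

Comparing `p_value` with the significance level then gives the decision: reject H0 when the p-value is smaller than the level, and fail to reject otherwise.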
Calculate the test statistic, $Z = \dfrac{\text{point estimate} - \text{null value}}{SE}$.
Select Analyses > R > Rj Editor and use the R function pnorm to calculate the p-value. Then, evaluate the hypothesis test based on the significance level $\alpha$ and draw a conclusion in the context of the problem.
Inference for a Single Proportion Using the Binomial Model
Section 6.1.4 in the textbook mentions alternate methods for making inferences for a single proportion when the conditions aren’t met and we can’t use the normal model. One such method is to use the binomial model, as described in the online notes Small Sample Hypothesis Testing for a Proportion (Diez et al., 2019), licensed under CC BY-SA 3.0.
We can use one of jamovi’s built-in tests to conduct a hypothesis test for a proportion using the binomial model as follows. Select Analyses > Frequencies > One Sample Proportion Tests > 2 Outcomes Binomial test and select the text_ind variable. Change Test value to 0.15, and click the Confidence intervals box. The p-value based on the binomial model is in the second row of the resulting binomial test table in the column labeled “p.” The bounds of the corresponding confidence interval are in the last two columns of the table. In this case, the values are the same (to three decimal places) as those based on the normal model (questions 2 and 5 above). They need not match exactly, however, particularly for applications with relatively small sample sizes.

Binomial Test

| | Level | Count | Total | Proportion | p |
|---|---|---|---|---|---|
| text_ind | no | 10847 | 12654 | 0.857 | < .001 |
| | yes | 1807 | 12654 | 0.143 | < .001 |

Note. Hₐ is proportion ≠ 0.5
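To show what a binomial-model p-value involves, here is a minimal Python sketch that sums the binomial probabilities of every outcome no more likely than the observed count. This is one common definition of a two-sided exact p-value; it is an assumption that jamovi's computation follows the same convention in every detail. The helper names `binom_pmf` and `binomial_two_sided_p` are hypothetical, and the test value 0.15 is the one the lab asks you to enter.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """Binomial probability P(X = k), computed on the log scale for large n."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log(1 - p))


def binomial_two_sided_p(successes, n, p0):
    """Two-sided exact p-value: total probability of all outcomes that are
    no more likely under H0 than the observed count (requires 0 < p0 < 1)."""
    observed = binom_pmf(successes, n, p0)
    # A small relative tolerance guards against floating-point ties.
    cutoff = observed * (1 + 1e-7)
    total = 0.0
    for k in range(n + 1):
        prob = binom_pmf(k, n, p0)
        if prob <= cutoff:
            total += prob
    return min(1.0, total)


# Counts from the table above, tested against the lab's value of 0.15.
exact_p = binomial_two_sided_p(1807, 12654, 0.15)
print(f"exact binomial p-value = {exact_p:.4f}")
```

Working on the log scale avoids overflow in the binomial coefficient, which becomes astronomically large for a sample of this size.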
Confidence Interval for the Difference Between Two Proportions
Next, we’ll examine whether there is a difference between males and females with respect to the proportion of US high-schoolers who texted while driving for six or more days over the preceding 30 days. A confidence interval for the difference between two proportions based on the central limit theorem is:

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}},$$

where $\hat{p}_1$ and $\hat{p}_2$ are the two sample proportions and $n_1$ and $n_2$ are the two sample sizes, assuming the following conditions are satisfied:
- Independence: The two groups are independent of each other. Within each group, the individual responses are independent of each other.
- Random: Each of the two samples is randomly drawn from its respective population.
- Success/Failure Condition: $n_1\hat{p}_1 \ge 10$, $n_1(1-\hat{p}_1) \ge 10$, $n_2\hat{p}_2 \ge 10$, and $n_2(1-\hat{p}_2) \ge 10$.
- 10% Condition: Each of the two sample sizes, $n_1$ and $n_2$, is no more than 10% of its respective population size.
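The same calculation for two groups can be sketched in Python as follows. The interval is the difference in sample proportions plus or minus a critical value times the standard error, where the standard error combines p̂(1 − p̂)/n from each group. The function name `two_proportion_ci` and the 95% level are assumptions, and the counts in the usage line are hypothetical numbers chosen only for illustration; they are not the YRBSS values, which you obtain from your own frequency table.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Normal-model confidence interval for p1 - p2.

    `two_proportion_ci` is a hypothetical helper written for this sketch.
    x1 and x2 are success counts; n1 and n2 are the group sizes.
    """
    # Success/failure condition, checked separately in each group.
    if min(x1, n1 - x1, x2, n2 - x2) < 10:
        raise ValueError("success/failure condition not met")
    p1, p2 = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


# Hypothetical counts for illustration only (NOT taken from the YRBSS data).
diff_lower, diff_upper = two_proportion_ci(x1=120, n1=800, x2=90, n2=750)
print(f"{diff_lower:.3f} to {diff_upper:.3f}")
```

An interval that contains 0 is consistent with no difference between the two population proportions at the chosen confidence level.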
Create a frequency table for text_ind split by gender and use it to check the success/failure condition for the confidence interval.
Hypothesis Test for the Difference Between Two Proportions
Conduct a two-sided hypothesis test to find out if the population proportions of US high-schoolers who texted while driving for six or more days over the preceding 30 days are equal for males and females. The hypotheses are H0: p1 = p2 versus HA: p1 ≠ p2. We’ll use a normal model for the sampling distribution of $\hat{p}_1 - \hat{p}_2$ that has a mean of $0$ and a standard deviation of $\sqrt{\hat{p}_{\text{pooled}}(1-\hat{p}_{\text{pooled}})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$, where $\hat{p}_{\text{pooled}}$ is the proportion of successes in the two samples combined, assuming the following conditions are satisfied:
- Independence: The two groups are independent of each other. Within each group, the individual responses are independent of each other.
- Random: Each of the two samples is randomly drawn from its respective population.
- Success/Failure Condition: $n_1\hat{p}_{\text{pooled}} \ge 10$, $n_1(1-\hat{p}_{\text{pooled}}) \ge 10$, $n_2\hat{p}_{\text{pooled}} \ge 10$, and $n_2(1-\hat{p}_{\text{pooled}}) \ge 10$.
- 10% Condition: Each of the two sample sizes, $n_1$ and $n_2$, is no more than 10% of its respective population size.
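A minimal Python sketch of the two-sided test for equal proportions follows. It assumes the pooled-proportion standard error, sqrt(p̂_pooled(1 − p̂_pooled)(1/n1 + 1/n2)), which is the usual choice when the null hypothesis is p1 = p2. The function name `two_proportion_z_test` is hypothetical, the counts in the usage line are made up for illustration and are not the YRBSS values, and `NormalDist().cdf` stands in for R's `pnorm`.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test of H0: p1 = p2 using the pooled proportion.

    `two_proportion_z_test` is a hypothetical helper written for this sketch.
    """
    p_pooled = (x1 + x2) / (n1 + n2)
    # Success/failure condition uses the pooled proportion in both groups.
    expected = (n1 * p_pooled, n1 * (1 - p_pooled),
                n2 * p_pooled, n2 * (1 - p_pooled))
    if min(expected) < 10:
        raise ValueError("success/failure condition not met")
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    # Two-sided p-value: twice the lower-tail area beyond -|z|.
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


# Hypothetical counts for illustration only (NOT taken from the YRBSS data).
z2, p_value2 = two_proportion_z_test(x1=120, n1=800, x2=90, n2=750)
print(f"z = {z2:.2f}, p-value = {p_value2:.4f}")
```

Pooling reflects the null hypothesis: if the two population proportions are equal, both samples estimate the same value, so combining them gives the best single estimate for the standard error.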
Calculate the test statistic, $Z = \dfrac{\text{point estimate} - \text{null value}}{SE}$.
Select Analyses > R > Rj Editor and use the R function pnorm to calculate the p-value. Then, evaluate the hypothesis test based on the significance level $\alpha$ and draw a conclusion in the context of the problem.
Select Analyses > Frequencies > Independent Samples > χ² test of association, and select gender for the rows and text_ind for the columns. Click “Statistics” and select z test for difference in 2 proportions under “Tests,” and difference in proportions and confidence intervals under “Comparative Measures.” Confirm the lower and upper bounds of the confidence interval for question 7, the test statistic value from question 9, and the p-value from question 10.
References
Diez, D. M., Çetinkaya-Rundel, M., & Barr, C. D. (2019). Small sample hypothesis testing for a proportion [Online supplement]. In OpenIntro statistics (4th ed.). http://www.openintro.org/redirect.php?go=stat_sim_prop_ht&referrer=os4_pdf
OpenIntro. (n.d.-a). Data sets [Data sets]. https://openintro.org/data/
OpenIntro. (n.d.-b). Inference for categorical data. OpenIntro Labs for jamovi. CC BY-SA 4.0. https://openintro.shinyapps.io/inf_for_categorical_data_jamovi/
