Lesson 4.1: Inference for Proportions
Software Lab 4.1: Inference for Proportions
This software lab is adapted from Inference for Categorical Data (OpenIntro, n.d.-b), CC BY-SA 4.0, at OpenIntro Labs for jamovi.
As you work through the lab, answer the ungraded exercises in the shaded boxes. Check your answers by consulting the Software Lab 4.1 Solutions.
Remember to complete the graded Software Lab Questions for this section in Moodle.
Getting Started: The Data
Every two years, the United States’ Centers for Disease Control and Prevention (CDC) conducts the Youth Risk Behavior Surveillance System (YRBSS) survey, collecting data from high-schoolers in grades 9 through 12 to analyze health patterns. In this lab, you will work with a selected group of variables from a random sample of observations from one of the years in which the YRBSS was conducted.
Download the data from yrbss_text [CSV file] (OpenIntro, n.d.-a), and open it in jamovi. The variables are:
- gender: Gender of participant; can be male or female.
- text_while_driving_30d: During the 30 days preceding the survey, how often participants texted or emailed while driving.
- text_ind: Whether the individual texted while driving for six or more days over the preceding 30 days.
We can quickly visualize the distribution of the variable text_while_driving_30d using a frequency table and a bar plot. To do this, select Analyses > Exploration > Descriptives, move text_while_driving_30d into the Variables box, and then select Frequency tables and Plots > Bar plot. Confirm that the frequency counts are 4,788 for “0 days,” 925 for “1–2 days,” and so on.
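Outside jamovi, the same counts can be tallied in a few lines. The sketch below is a minimal Python version using only the standard library. It assumes the CSV has a header row with the column names listed above and that missing answers appear as blank cells or NA; both are assumptions about the file layout, not something the lab states. The five sample rows are made up for illustration and are not the YRBSS data.

```python
import csv
import io
from collections import Counter

MISSING_MARKERS = {"", "NA"}  # assumption: how unanswered items appear in the CSV

def frequency_table(csv_file, column):
    """Count each value of `column` in an open CSV file that has a header row."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        value = (row[column] or "").strip()
        counts["(missing)" if value in MISSING_MARKERS else value] += 1
    return counts

# Made-up rows standing in for yrbss_text.csv; these are NOT the survey data.
sample = io.StringIO(
    "gender,text_while_driving_30d,text_ind\n"
    "female,0 days,no\n"
    "male,1-2 days,no\n"
    "female,0 days,no\n"
    "male,30 days,yes\n"
    "female,NA,NA\n"
)
for level, count in sorted(frequency_table(sample, "text_while_driving_30d").items()):
    print(level, count)
```

To run it on the real file, open it with `open("yrbss_text.csv", newline="")` (the file name is assumed from the download link) and pass the file object instead of `sample`.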
Confidence Interval for a Single Proportion
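First, we’ll focus on the variable text_ind to make inferences about the proportion of US high-schoolers who texted while driving for six or more days over the preceding 30 days. A confidence interval for a single proportion based on a normal model is $\hat{p} \pm z^{*}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$, where $z^{*}$ is the critical value for the chosen confidence level (for example, $z^{*} \approx 1.96$ for 95% confidence), assuming the following conditions are satisfied: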
- Independence: The individual responses in the sample are independent of each other.
- Random: The sample is random.
- Success/Failure Condition: $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$.
- 10% Condition: The sample size $n$ is no more than 10% of the population size.
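The interval can also be computed by hand. The sketch below is a minimal Python version (standard library only) that checks the success/failure condition and then applies the formula above. It swaps in `statistics.NormalDist().inv_cdf` for R’s qnorm to obtain $z^{*}$ and plugs in the “yes” count (1807) and total (12654) reported in the jamovi binomial test table in this lab. The function name `one_prop_ci` is hypothetical; treat this as an illustration, not the lab’s official solution.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_ci(successes, n, confidence=0.95):
    """Normal-model confidence interval for a single proportion."""
    p_hat = successes / n
    # Success/failure condition: n*p_hat >= 10 and n*(1 - p_hat) >= 10.
    if n * p_hat < 10 or n * (1 - p_hat) < 10:
        raise ValueError("success/failure condition not met")
    # Critical value z*, the analogue of qnorm(1 - alpha/2) in R.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se

# "yes" count and total from the jamovi binomial test table.
lower, upper = one_prop_ci(1807, 12654)
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```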
Create a frequency table for text_ind and use it to check the success/failure condition for the confidence interval. Check your answer by consulting Software Lab 4.1 Solutions.
Hypothesis Test for a Single Proportion
A journalist claims that 15% of US high-schoolers texted while driving for six or more days over the preceding 30 days. We’ll conduct a two-sided hypothesis test for a single proportion to test the journalist’s claim. The hypotheses are H0: p = 0.15 (15%) versus HA: p ≠ 0.15. We’ll use a normal model for the sampling distribution of $\hat{p}$ that has a mean of $p_0$ and a standard deviation of $\sqrt{\dfrac{p_0(1-p_0)}{n}}$, where $p_0 = 0.15$ is the null value, assuming the following conditions are satisfied:
- Independence: The individual responses in the sample are independent of each other.
- Random: The sample is random.
- Success/Failure Condition: $np_0 \ge 10$ and $n(1-p_0) \ge 10$.
- 10% Condition: The sample size $n$ is no more than 10% of the population size.
The test statistic is

$$Z = \dfrac{\hat{p}-p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$$
Select Analyses > R > Rj Editor and use the R function pnorm to calculate the p-value. Then, evaluate the hypothesis test based on a significance level $\alpha = 0.05$.
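The lab computes the p-value with R’s pnorm in the Rj Editor. As a cross-check, here is a minimal Python sketch of the same two-sided calculation, in which `statistics.NormalDist().cdf` plays the role of pnorm. It assumes the counts from the jamovi binomial test table in this lab (1807 “yes” out of 12654); the function name `one_prop_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(successes, n, p0):
    """Two-sided z-test of H0: p = p0 using the normal model."""
    p_hat = successes / n
    # For the test, the success/failure condition uses the null value p0.
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("success/failure condition not met")
    se0 = sqrt(p0 * (1 - p0) / n)  # standard deviation of p_hat under H0
    z = (p_hat - p0) / se0
    # Same as 2 * pnorm(-abs(z)) in R: double the lower-tail area.
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value

z, p_value = one_prop_z_test(1807, 12654, 0.15)
alpha = 0.05
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```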
Inference for a Single Proportion Using the Binomial Model
Section 6.1.4 in the textbook mentions alternative methods for making inferences about a single proportion when the conditions aren’t met and the normal model can’t be used. One such method uses the binomial model, as described in Small Sample Hypothesis Testing for a Proportion (Diez et al., 2019) online notes, CC BY-SA 3.0.
We can use one of jamovi’s built-in tests to conduct a hypothesis test for a proportion using the binomial model. Select Analyses > Frequencies > One Sample Proportion Tests > 2 Outcomes Binomial test and select the text_ind variable. Change Test value to 0.15, and check the Confidence intervals box. The p-value based on the binomial model is in the second row of the resulting binomial test table, in the column labeled “p.” The bounds of the corresponding confidence interval are in the last two columns of the table. In this case, the values are the same (to three decimal places) as those based on the normal model (questions 2 and 5 above). They need not match exactly, however, particularly in applications with relatively small sample sizes.
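jamovi computes the binomial p-value for you; the Python sketch below shows the idea behind an exact two-sided binomial test. It adds up the probabilities of every possible count that is no more likely than the observed one, which is the rule R’s binom.test uses; we assume, without the lab confirming it, that jamovi’s test follows the same rule. It computes only the p-value, not the confidence interval, and the function names are hypothetical.

```python
import math

def binom_log_pmf(k, n, p):
    """log P(X = k) for X ~ Binomial(n, p), using lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def binom_test_two_sided(x, n, p):
    """Exact two-sided binomial p-value for H0: proportion = p (0 < p < 1)."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    # Outcomes at most as likely as x (small relative tolerance for rounding).
    cutoff = binom_log_pmf(x, n, p) + math.log1p(1e-7)
    total = 0.0
    for k in range(n + 1):
        log_pk = binom_log_pmf(k, n, p)
        if log_pk <= cutoff:
            total += math.exp(log_pk)
    return min(1.0, total)

# "yes" count and total from the jamovi binomial test table, tested against 0.15.
print(f"exact binomial p-value: {binom_test_two_sided(1807, 12654, 0.15):.4f}")
```

For a small case you can check by hand: with 2 successes in 10 trials and p = 0.5, the outcomes 0, 1, 2, 8, 9, and 10 are no more likely than 2, giving a p-value of 112/1024 = 0.109375.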
![jamovi binomial test](http://introprobabilityandstatistics.pressbooks.tru.ca/wp-content/uploads/sites/113/2022/08/Screen-Shot-2022-08-05-at-6.02.22-PM.png)
Binomial Test

| | Level | Count | Total | Proportion | p |
|---|---|---|---|---|---|
| text_ind | no | 10847 | 12654 | 0.857 | < .001 |
| | yes | 1807 | 12654 | 0.143 | < .001 |

Note. Hₐ is proportion ≠ 0.5
Confidence Interval for the Difference Between Two Proportions
Next, we’ll examine whether there is a difference between males and females with respect to the proportion of US high-schoolers who texted while driving for six or more days over the preceding 30 days. A confidence interval for the difference between two proportions based on the central limit theorem is $(\hat{p}_1 - \hat{p}_2) \pm z^{*}\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$, assuming the following conditions are satisfied:
- Independence: The two groups are independent of each other. Within each group, the individual responses are independent of each other.
- Random: Each of the two samples is randomly drawn from its respective population.
- Success/Failure Condition: $n_1\hat{p}_1 \ge 10$, $n_1(1-\hat{p}_1) \ge 10$, $n_2\hat{p}_2 \ge 10$, and $n_2(1-\hat{p}_2) \ge 10$.
- 10% Condition: Each of the two sample sizes, $n_1$ and $n_2$, is no more than 10% of its respective population size.
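A minimal Python sketch of this interval follows. The counts are hypothetical (120 “yes” out of 600 in group 1 and 90 out of 600 in group 2), chosen only to show the arithmetic; they are not the YRBSS results, which you should read from your own frequency table. The function name `two_prop_ci` is ours, and `NormalDist().inv_cdf` stands in for R’s qnorm.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_ci(x1, n1, x2, n2, confidence=0.95):
    """Normal-model CI for p1 - p2, using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    # Success/failure condition: at least 10 successes and 10 failures per group.
    for x, n in ((x1, n1), (x2, n2)):
        if x < 10 or n - x < 10:
            raise ValueError("success/failure condition not met")
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# Hypothetical counts for illustration only (not the YRBSS data).
lower, upper = two_prop_ci(120, 600, 90, 600)
print(f"95% CI for p1 - p2: ({lower:.4f}, {upper:.4f})")
```

Because the interval is not built under a null hypothesis, each group uses its own $\hat{p}$ in the standard error; the hypothesis test below pools them instead.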
Create a frequency table for text_ind split by gender and use it to check the success/failure condition for the confidence interval.
Hypothesis Test for the Difference Between Two Proportions
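Conduct a two-sided hypothesis test to find out whether the population proportions of US high-schoolers who texted while driving for six or more days over the preceding 30 days are equal for males and females. The hypotheses are H0: p1 = p2 versus HA: p1 ≠ p2. We’ll use a normal model for the sampling distribution of $\hat{p}_1 - \hat{p}_2$ that has a mean of $p_1 - p_2 = 0$ and a standard deviation of $\sqrt{p(1-p)\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}$, where $p$ is the common proportion under H0, estimated by the pooled proportion $\hat{p}$ (the proportion of “yes” responses in both samples combined), assuming the following conditions are satisfied: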
- Independence: The two groups are independent of each other. Within each group, the individual responses are independent of each other.
- Random: Each of the two samples is randomly drawn from its respective population.
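- Success/Failure Condition (using the pooled proportion $\hat{p}$): $n_1\hat{p} \ge 10$, $n_1(1-\hat{p}) \ge 10$, $n_2\hat{p} \ge 10$, and $n_2(1-\hat{p}) \ge 10$.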
- 10% Condition: Each of the two sample sizes, $n_1$ and $n_2$, is no more than 10% of its respective population size.
The test statistic is

$$Z = \dfrac{\hat{p}_1-\hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}$$
Select Analyses > R > Rj Editor and use the R function pnorm to calculate the p-value. Then, evaluate the hypothesis test based on a significance level $\alpha = 0.05$.
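As with the single-proportion test, the p-value can be cross-checked outside the Rj Editor. The Python sketch below pools the two samples, checks the success/failure condition with the pooled proportion, and doubles the tail area from `statistics.NormalDist().cdf`, which plays the role of pnorm. The counts (120 of 600 versus 90 of 600) are hypothetical and are not the YRBSS results; the function name `two_prop_z_test` is ours.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # combined proportion of "yes" responses
    # Success/failure condition checked with the pooled proportion.
    for n in (n1, n2):
        if n * p_pool < 10 or n * (1 - p_pool) < 10:
            raise ValueError("success/failure condition not met")
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * NormalDist().cdf(-abs(z))  # like 2 * pnorm(-abs(z)) in R
    return z, p_value

# Hypothetical counts for illustration only (not the YRBSS data).
z, p_value = two_prop_z_test(120, 600, 90, 600)
alpha = 0.05
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```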
Select Analyses > Frequencies > Independent Samples χ² test of association, and select gender for the rows and text_ind for the columns. Click “Statistics” and select z test for difference in 2 proportions under “Tests,” and difference in proportions and confidence intervals under “Comparative Measures.” Confirm the lower and upper bounds of the confidence interval from question 7, the test statistic value from question 9, and the p-value from question 10.
References
Diez, D. M., Çetinkaya-Rundel, M., & Barr, C. D. (2019). Small sample hypothesis testing for a proportion [Online supplement]. In OpenIntro statistics (4th ed.). http://www.openintro.org/redirect.php?go=stat_sim_prop_ht&referrer=os4_pdf
OpenIntro. (n.d.-a). Data sets [Data sets]. https://openintro.org/data/
OpenIntro. (n.d.-b). Inference for categorical data. OpenIntro Labs for jamovi. CC BY-SA 4.0. https://openintro.shinyapps.io/inf_for_categorical_data_jamovi/