Lesson 3.2: Confidence Intervals

Software Lab 3.2

Confidence Intervals

This software lab is adapted from Foundations for Statistical Inference – Confidence Intervals (OpenIntro, n.d.-a), available at OpenIntro Labs for jamovi under a CC BY-SA 4.0 license.

If we have access to data on an entire population, say the opinion of every adult in the United States on whether or not they think climate change is affecting their local community, it’s straightforward to answer questions like, “What percent of US adults think climate change is affecting their local community?”

However, if we have access to only a sample of the population, as is often the case, the task becomes more complicated. What is your best guess for this proportion if you only have data from a relatively small sample of adults? In this type of situation, we must use the sample to make inferences about the population.

As you work through the lab, answer the ungraded exercises in the shaded boxes. Check your answers by consulting the Software Lab 3.2 Solutions.

Remember to complete the graded Software Lab Questions for this section in Moodle.

Getting Started: The Data

A 2019 Pew Research Center report stated:

Roughly six-in-ten U.S. adults (62%) say climate change is currently affecting their local community either a great deal or some, according to a new Pew Research Center survey. (Hefferon, 2019)

In this lab, you will assume this 62% is a true population proportion, and you will learn about how sample proportions can vary from sample to sample by taking relatively small samples from the population. To keep our computation simple, we will assume a total population size of 100,000 even though that’s smaller than the population size of all US adults. This means 62,000 people (62% of the adult population) think climate change impacts their community, and the remaining 38,000 people do not think climate change impacts their community.

Download the us_adults [CSV file] (OpenIntro, n.d.-b) data frame, which represents the entire population, and load it into jamovi. The climate_change_affects variable contains responses to the question: Do you think climate change is affecting your local community?

We can quickly visualize the distribution of these responses using a bar plot. Do this by selecting Exploration > Descriptives > Plots > Bar plot. To confirm that the data frame loaded correctly, we can also obtain summary counts by selecting Frequency tables.

In this lab, you’ll work with a simple random sample of size 60 from this population. As in the last lab, we will use the SAMPLE function. Create a new computed variable, using the formula SAMPLE(climate_change_affects,60), and name this variable sample1.
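For readers who want to see the same idea outside jamovi, here is a minimal Python sketch of drawing one simple random sample of size 60 from the assumed population. The response labels "Yes" and "No", the seed, and the helper name sample_proportion are assumptions made for illustration; they are not taken from the us_adults file or from jamovi.

```python
import random

# Assumed stand-in for the lab's population of 100,000 adults:
# 62,000 answer "Yes" and 38,000 answer "No" (labels are hypothetical).
population = ["Yes"] * 62_000 + ["No"] * 38_000


def sample_proportion(pop, n, rng):
    """Draw a simple random sample of size n without replacement
    and return the share of "Yes" responses in it."""
    sample = rng.sample(pop, n)
    return sample.count("Yes") / n


rng = random.Random(1)  # arbitrary seed so the run is repeatable
p_hat = sample_proportion(population, 60, rng)
print(f"Sample proportion: {p_hat:.3f}")
```

Changing the seed gives a different sample and, usually, a slightly different sample proportion.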

1. What percentage of the adults in your sample think climate change affects their local community? Check your answer by consulting the Software Lab 3.2 Solutions.
2. Would you expect the proportion from another sample to be identical to yours? Would you expect it to be similar? Why or why not?

Confidence Intervals

Return for a moment to the question that first motivated this lab: Based on this sample, what can we infer about the population?

With just one sample, the best estimate of the proportion of US adults who think climate change affects their local community is the sample proportion, \hat{p}, from question 1. That serves as a good point estimate, but it is also useful to communicate how uncertain you are about that estimate. This uncertainty can be quantified using a confidence interval based on the Central Limit Theorem: \hat{p} \pm z^* \times SE(\hat{p}) \approx \hat{p} \pm z^* \times \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}, where n is the sample size and the standard error is approximated by substituting \hat{p} for the unknown p. Remember, for a 95% confidence interval, z^*=1.960.
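The arithmetic in this formula can be sketched in a few lines of Python. The function name proportion_ci and the example value \hat{p} = 0.60 (36 "yes" responses out of 60) are hypothetical; substitute the proportion from your own sample.

```python
import math


def proportion_ci(p_hat, n, z_star=1.960):
    """Return (lower, upper) for p_hat +/- z_star * sqrt(p_hat * (1 - p_hat) / n)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error
    margin = z_star * se                     # margin of error
    return p_hat - margin, p_hat + margin


# Hypothetical sample: 36 of 60 respondents answered "yes".
lower, upper = proportion_ci(36 / 60, 60)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

For this hypothetical sample, the standard error is about 0.063, so the interval runs from roughly 0.476 to 0.724.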

3. Use your sample of size 60 to calculate a 95% confidence interval for the proportion of US adults who think climate change affects their local community.

In this case, we have the rare luxury of knowing the true population proportion (p=62\%) since we have data on the entire population.

4. Does your confidence interval from question 3 contain the true population proportion of US adults who think climate change affects their local community?
5. If repeated simple random samples of size 60 were taken from the population, each sample would result in a slightly different 95% confidence interval. What proportion of those intervals would you expect to contain the true population proportion? Why?
6. Use Proportion Confidence Interval Simulator [Application] (CPM Educational Program, 2023) to simulate calculating 95% confidence intervals for the proportion based on 100 simple random samples of size 60 taken from a population with a proportion of “successes” equal to 0.62. Use “normal model with standard error (from p-hat)” as the “method for calculating.” Set “width of the x-axis” to 0.6 (this setting just affects the visual appearance of the figure). The simulation application will produce a figure similar to Figure 5.6 in Section 5.2 of OpenIntro Statistics (Diez et al., 2019) CC BY-SA 3.0. What proportion of the 100 confidence intervals include the true population proportion? Is this proportion exactly equal to the confidence level? If not, explain why.
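The repeated-sampling idea in questions 5 and 6 can also be sketched in Python. This is not the code behind the simulation application; it is a minimal illustration that assumes independent draws with success probability 0.62 (a close approximation to sampling 60 people from 100,000) and uses an arbitrary seed.

```python
import math
import random

P_TRUE = 0.62      # population proportion assumed in this lab
N = 60             # size of each sample
NUM_SAMPLES = 100  # number of simulated samples
Z_STAR = 1.960     # critical value for 95% confidence

rng = random.Random(2023)  # arbitrary seed so the run is repeatable

covered = 0
for _ in range(NUM_SAMPLES):
    # Each simulated respondent says "yes" with probability P_TRUE.
    yes_count = sum(rng.random() < P_TRUE for _ in range(N))
    p_hat = yes_count / N
    margin = Z_STAR * math.sqrt(p_hat * (1 - p_hat) / N)
    # Count the interval if it contains the true proportion.
    if p_hat - margin <= P_TRUE <= p_hat + margin:
        covered += 1

coverage = covered / NUM_SAMPLES
print(f"{covered} of {NUM_SAMPLES} intervals contain p = {P_TRUE}")
```

Rerunning with a different seed changes the count, which is worth keeping in mind when comparing one run of 100 intervals to the stated confidence level.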

More Practice

7. Would you expect a 90% confidence interval to be wider or narrower than a 95% confidence interval (all else equal)? Explain your reasoning.
8. Using data from the one sample you have (sample1), calculate a 90% confidence interval for the proportion of US adults who think climate change is affecting their local community and interpret the result. Is this 90% interval wider or narrower than the 95% interval from question 3?
9. Using the simulation app from question 6, simulate calculating 90% confidence intervals for the proportion based on 100 simple random samples of size 60 taken from a population with a proportion of “successes” equal to 0.62. What proportion of the 100 confidence intervals include the true population proportion? How does this proportion compare to the confidence level selected for the intervals?
10. Repeat questions 7, 8, and 9 for a 99% confidence level. Briefly describe your findings.
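The lab gives z^* only for 95% confidence. If you do not have a normal table at hand, the critical values for other confidence levels can be looked up with Python's standard library, as in the sketch below. The helper name z_star is hypothetical.

```python
from statistics import NormalDist


def z_star(confidence_level):
    """Critical value z* that leaves (1 - confidence_level) / 2 in each tail
    of the standard normal distribution."""
    return NormalDist().inv_cdf((1 + confidence_level) / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_star(level):.3f}")
```

These values come out to approximately 1.645, 1.960, and 2.576, and each can be used in place of 1.960 in the interval formula.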

References

CPM Educational Program. (2023). Proportion confidence interval simulator [Application]. https://stats.cpm.org/propCIs/

Diez, D. M., Çetinkaya-Rundel, M., & Barr, C. D. (2019). OpenIntro Statistics (4th ed.). OpenIntro. https://www.openintro.org/book/os/

Hefferon, M. (2019, December 2). Most Americans say climate change impacts their community, but effects vary by region. Pew Research Center. https://www.pewresearch.org/short-reads/2019/12/02/most-americans-say-climate-change-impacts-their-community-but-effects-vary-by-region/

OpenIntro. (n.d.-a). Foundations for statistical inference – confidence intervals. OpenIntro Labs for jamovi. CC BY-SA 4.0. https://openintro.shinyapps.io/confidence_intervals_jamovi/

OpenIntro. (n.d.-b). us_adults [Data set]. https://github.com/OpenIntroStat/oilabs-jamovi/raw/main/05b_confidence_intervals/more/us_adults.csv

 

License

Software Lab 3.2 Copyright © 2023 by Thompson Rivers University is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.
