Lesson 1.4: Summarizing Categorical Data
![](http://introprobabilityandstatistics.pressbooks.tru.ca/wp-content/uploads/sites/113/2022/05/3261815454_9e4fd888ba_c.jpg)
Lesson Learning Objectives
- Build a frequency table for categorical data.
- Construct bar, stacked bar, and mosaic plots to display the frequency distributions of categorical variables.
- Build two-way contingency tables to investigate possible relationships between two categorical variables.
- Calculate marginal and conditional distributions for variables in a contingency table.
- Use conditional distributions to discuss informally the issue of independence for two categorical variables.
- Compare the distributions of two or three groups with a histogram.
- Compare several groups with boxplots, which make it easy to compare centres and spreads, and spot outliers.
Lesson 1.4 Checklist
Learning activity | Graded? | Estimated time |
---|---|---|
Read OpenIntro Statistics sections 2.2 and 2.3 and supplementary notes | No | 30 mins |
Watch instructional videos | No | 15 mins |
Answer two lesson check-in questions | Yes | 15 mins |
Work through virtual statistical software lab | No | 45 mins |
Answer two virtual statistical software lab questions | Yes | 15 mins |
Work on practice exercises | No | 1.5 hours |
Explore suggested websites | No | 15 mins |
Complete and submit Unit 1 Assignment | Yes | 2 hours |
Learning Activities
Readings 📖 and Instructional Videos 🎦
Considering Categorical Data
Read Section 2.2: Considering Categorical Data in OpenIntro Statistics (Diez et al., 2019) CC BY-SA 3.0. This section introduces a variety of techniques for summarizing categorical data in tables and graphs. As you read, look up new terminology in the Glossary and self-assess your understanding by attempting the guided practice exercises.
Watch the video Exploring Categorical Data (Barr & Diez, 2014) on this topic (duration 00:04:58).
Summarizing Categorical Data
Read Supplementary Notes 1.4, which provides more details on frequency tables, contingency tables, and independence in contingency tables.
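To make the ideas of frequency tables, contingency tables, and conditional distributions concrete, here is a minimal Python sketch. It is not part of the course materials (the labs use jamovi), and the records are made-up illustrative data; the variable names `program` and `mode` are hypothetical.

```python
from collections import Counter

# Made-up illustrative data: (program, commute mode) for 100 students.
records = (
    [("arts", "bus")] * 30 + [("arts", "car")] * 20
    + [("science", "bus")] * 15 + [("science", "car")] * 35
)

# Frequency table for a single categorical variable (commute mode).
mode_counts = Counter(mode for _, mode in records)

# Two-way contingency table: one count per (program, mode) cell.
cell_counts = Counter(records)
rows = sorted({program for program, _ in records})
cols = sorted({mode for _, mode in records})
n = len(records)

# Marginal distribution of the row variable: row totals over the grand total.
row_totals = {r: sum(cell_counts[(r, c)] for c in cols) for r in rows}
marginal_rows = {r: row_totals[r] / n for r in rows}

# Conditional distribution of mode within each program:
# each cell count divided by its row total.
conditional = {
    r: {c: cell_counts[(r, c)] / row_totals[r] for c in cols} for r in rows
}

print("Mode counts:", dict(mode_counts))
print("Marginal distribution of program:", marginal_rows)
for r in rows:
    print(f"Mode given program = {r}:", conditional[r])
```

If the two variables were independent, the conditional distributions of mode would be roughly the same in every program. In this made-up table they differ noticeably between the two rows, which informally suggests an association.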
Case Study: Malaria Vaccine
Read Section 2.3: Case Study: Malaria Vaccine (Diez et al., 2019) CC BY-SA 3.0. This application describes an experimental study of the effectiveness of a malaria vaccine. The resulting categorical data is summarized in a contingency table. The section also introduces the notion of statistical inference. We’ll return to this more formally in Unit 3, but here the book simply uses simulations to draw an intuitive conclusion from the data about whether the variables in the contingency table are independent. Understanding the critical reasoning that underlies this case study will give you a solid foundation for the rest of this course.
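The simulation idea in this case study can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the book's own procedure or code. The counts are assumed to match the case study's table (5 of 14 vaccinated patients and 6 of 6 placebo patients infected), so check them against the text; the function name `simulate_diff`, the seed, and the number of simulations are arbitrary choices.

```python
import random

# Assumed counts from the case study table (verify against Section 2.3).
n_vaccine, n_placebo = 14, 6
infected_vaccine, infected_placebo = 5, 6

# Observed difference in infection rates (placebo minus vaccine).
observed_diff = infected_placebo / n_placebo - infected_vaccine / n_vaccine

# If infection were independent of treatment, the same patients would have
# been infected whichever group they were in. Represent the 20 outcomes as
# 1 = infected, 0 = not infected.
n_infected = infected_vaccine + infected_placebo
n_total = n_vaccine + n_placebo
outcomes = [1] * n_infected + [0] * (n_total - n_infected)

def simulate_diff(rng):
    """Randomly reassign outcomes to groups and return the rate difference."""
    shuffled = outcomes[:]
    rng.shuffle(shuffled)
    vaccine = shuffled[:n_vaccine]
    placebo = shuffled[n_vaccine:]
    return sum(placebo) / n_placebo - sum(vaccine) / n_vaccine

rng = random.Random(1)  # fixed seed so the run is repeatable
n_sims = 10_000
sim_diffs = [simulate_diff(rng) for _ in range(n_sims)]

# Share of simulated differences at least as large as the observed one.
share_as_extreme = sum(d >= observed_diff - 1e-12 for d in sim_diffs) / n_sims

print(f"Observed difference: {observed_diff:.3f}")
print(f"Share of simulations at least as extreme: {share_as_extreme:.4f}")
```

A small share means that a difference as large as the observed one rarely arises from random assignment alone, which is evidence against independence of treatment and infection.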
Case Study: Gender Discrimination
Watch the following video, Using Randomization to Analyze a Gender Discrimination Study (Diez & McClintock, 2014), which is similar to the malaria vaccine case study, but concerns gender discrimination (duration 00:04:41).
Lesson Check-in Questions ✍
Virtual Statistical Software Lab 💻
Work through the virtual statistical software lab: Software Lab 1.4: Summarizing Categorical Data. This lab will show you how to generate simple graphical and numerical summaries of categorical data in jamovi. As you work through the lab, answer the exercises in the shaded boxes. These exercises are not graded but the solutions are available: Software Lab 1.4 Solutions. The lab should take you no more than 45 minutes to complete.
Virtual Statistical Software Lab Questions ✍
Practice Exercises 🖊
Work on the following exercises in OpenIntro Statistics: Exercises 2.21, 2.23, and 2.25 (Diez et al., 2019) CC BY-SA 3.0. Check your answers using these solutions (Diez et al., 2019) CC BY-SA 3.0. You’ll deepen your understanding much more effectively if you genuinely attempt the questions by yourself before checking the solutions.
Work on the questions in the WeBWorK platform, which are linked from your Moodle course. Check your answers using the solutions provided.
Suggested Websites 🌎
- Nicole Hamilton has a nice series of videos on statistics, including on Displaying and Describing Categorical Data [Video Playlist] (Hamilton, 2023).
- Pie charts aren’t particularly useful for displaying data, but they can be pretty funny (Korolkovaite, 2023).
Unit Assignment ✍
Media Attributions
References
Barr, C. D., & Diez, D. [OpenIntroOrg]. (2014, Jan. 26). Exploring categorical data [Video]. YouTube. https://youtu.be/7NhNeADL8fA
Diez, D. M., Çetinkaya-Rundel, M., & Barr, C. D. (2019). OpenIntro Statistics (4th ed.). OpenIntro. https://www.openintro.org/book/os/
Diez, D., & McClintock, S. [OpenIntroOrg]. (2014, Mar. 14). Using randomization to analyze a gender discrimination study [Video]. YouTube. https://www.youtube.com/watch?v=2pHhjx9hyM4
Hamilton, N. (2023, Mar. 14). Displaying and describing categorical data [Video Playlist]. YouTube. Retrieved July 10, 2023, from https://www.youtube.com/playlist?list=PLVYFumktsk6RgbchC5UoO74u9SwQQ5wD9
Korolkovaite, I. (2023, May 27). 38 hilarious pie charts that are absolutely true. Bored Panda. https://www.boredpanda.com/funny-pie-charts/
Thorp, J. [blprnt_van]. (2009). NYTimes: Superheroes [Photograph]. Flickr. https://flic.kr/p/faYgZH