Lesson 1.4: Summarizing Categorical Data

“NYTimes: Superheroes” by Jer Thorp is licensed under CC BY 2.0

Lesson Learning Objectives

  • Build a frequency table for categorical data.
  • Construct bar, stacked bar, and mosaic plots to display the frequency distributions of categorical variables.
  • Build two-way contingency tables to investigate possible relationships between two categorical variables.
  • Calculate marginal and conditional distributions for variables in a contingency table.
  • Use conditional distributions to discuss informally the issue of independence for two categorical variables.
  • Compare the distributions of two or three groups with a histogram.
  • Compare several groups with boxplots, which make it easy to compare centres and spreads, and spot outliers.

Lesson 1.4 Checklist

Learning activity | Graded? | Estimated time
Read OpenIntro Statistics sections 2.2 and 2.3 and supplementary notes | No | 30 mins
Watch instructional videos | No | 15 mins
Answer two lesson check-in questions | Yes | 15 mins
Work through virtual statistical software lab | No | 45 mins
Answer two virtual statistical software lab questions | Yes | 15 mins
Work on practice exercises | No | 1.5 hours
Explore suggested websites | No | 15 mins
Complete and submit Unit 1 Assignment | Yes | 2 hours

Learning Activities

Readings 📖 and Instructional Videos 🎦

Considering Categorical Data

Read Section 2.2: Considering Categorical Data in OpenIntro Statistics (Diez et al., 2019) CC BY-SA 3.0. This section introduces a variety of techniques for summarizing categorical data in tables and graphs. As you read, look up new terminology in the Glossary and self-assess your understanding by attempting the guided practice exercises.

Watch the video Exploring Categorical Data (Barr & Diez, 2014) on this topic (duration 00:04:58).

Summarizing Categorical Data

Read Supplementary Notes 1.4, which provides more details on frequency tables, contingency tables, and independence in contingency tables.
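
To make the ideas of frequency tables, contingency tables, and independence concrete, here is a minimal Python sketch. It is not taken from the supplementary notes or from jamovi; the records, the group labels ("A", "B"), and the variable names are all made up for illustration. It builds a frequency table, a two-way contingency table, the marginal distribution of one variable, and its conditional distribution within each group.

```python
from collections import Counter

# Hypothetical survey records (made up for illustration): (group, response)
records = (
    [("A", "yes")] * 30 + [("A", "no")] * 20
    + [("B", "yes")] * 15 + [("B", "no")] * 35
)

# Frequency table for a single categorical variable (response)
response_counts = Counter(response for _, response in records)

# Two-way contingency table: one count per (group, response) cell
cell_counts = Counter(records)
groups = sorted({g for g, _ in records})
responses = sorted({r for _, r in records})

n = len(records)

# Marginal distribution of response: proportions over the whole sample
marginal = {r: response_counts[r] / n for r in responses}

# Conditional distribution of response within each group (row proportions)
conditional = {}
for g in groups:
    row_total = sum(cell_counts[(g, r)] for r in responses)
    conditional[g] = {r: cell_counts[(g, r)] / row_total for r in responses}

print("Frequency table:", dict(response_counts))
print("Marginal distribution:", marginal)
for g in groups:
    print(f"Conditional distribution given group {g}:", conditional[g])
```

If the two variables were independent, the conditional distributions would be roughly the same in every group and close to the marginal distribution. In this made-up sample the proportion of "yes" responses is 60% in group A and 30% in group B, which informally suggests an association.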

Case Study: Malaria Vaccine

Read Section 2.3: Case Study: Malaria Vaccine (Diez et al., 2019) CC BY-SA 3.0. This application describes an experimental study into the effectiveness of a malaria vaccine. The resulting categorical data is summarized in a contingency table. This section also goes on to introduce the notion of statistical inference. We’ll return to this more formally in Unit 3, but here the book simply uses simulations to intuitively draw a conclusion from the data about whether the variables in the contingency table are independent. Understanding the critical reasoning that underlies this case study will give you a solid foundation for the rest of this course.
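
The simulation reasoning described above can be sketched in Python. This is an illustrative sketch in the spirit of the case study, not the book's own procedure or data: the counts below are hypothetical, and the function name `randomization_p_value` and its parameters are assumptions introduced here. The idea is to shuffle the outcomes across the two groups many times, as if group membership had no effect, and see how often chance alone produces a difference in proportions at least as large as the observed one.

```python
import random

def randomization_p_value(treat_outcomes, control_outcomes, n_sims=10_000, seed=1):
    """Estimate the proportion of random reshuffles whose difference in
    infection proportions (control minus treatment) is at least as large
    as the observed difference. Outcomes are coded 1 = infected, 0 = not."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_treat = len(treat_outcomes)
    n_control = len(control_outcomes)
    observed = sum(control_outcomes) / n_control - sum(treat_outcomes) / n_treat

    pooled = list(treat_outcomes) + list(control_outcomes)
    at_least_as_large = 0
    for _ in range(n_sims):
        # Reassign outcomes to groups at random (the independence model)
        rng.shuffle(pooled)
        sim_treat = pooled[:n_treat]
        sim_control = pooled[n_treat:]
        diff = sum(sim_control) / n_control - sum(sim_treat) / n_treat
        # Small tolerance guards against floating-point rounding in the comparison
        if diff >= observed - 1e-12:
            at_least_as_large += 1
    return observed, at_least_as_large / n_sims

# Hypothetical counts (NOT the study's data): 20 patients per group
treatment = [1] * 6 + [0] * 14   # 6 of 20 infected
control = [1] * 14 + [0] * 6     # 14 of 20 infected

observed, p_value = randomization_p_value(treatment, control)
print(f"Observed difference in proportions: {observed:.2f}")
print(f"Proportion of shuffles at least as extreme: {p_value:.4f}")
```

When only a small proportion of the shuffles match or exceed the observed difference, chance alone is an unconvincing explanation, which is informal evidence that the two variables are not independent.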

Case Study: Gender Discrimination

Watch the video Using Randomization to Analyze a Gender Discrimination Study (Diez & McClintock, 2014), which takes an approach similar to the malaria vaccine case study but applies it to a study of gender discrimination (duration 00:04:41).

Lesson Check-in Questions ✍

Answer the two check-in questions for Lesson 1.4 in your Moodle course. The questions are based on the material covered in the readings and instructional videos. They are multiple-choice, fill-in-the-blank, matching, or calculation questions, and they are auto-graded in Moodle. Once you access the questions, you have 15 minutes to submit your answers. Overall, the Lesson Check-in Questions count for 6% of your total grade.

Virtual Statistical Software Lab 💻

Work through the virtual statistical software lab: Software Lab 1.4: Summarizing Categorical Data. This lab will show you how to generate simple graphical and numerical summaries of categorical data in jamovi. As you work through the lab, answer the exercises in the shaded boxes. These exercises are not graded but the solutions are available: Software Lab 1.4 Solutions. The lab should take you no more than 45 minutes to complete.

Virtual Statistical Software Lab Questions ✍

Answer the two virtual statistical software lab questions for Software Lab 1.4 in your Moodle course. The questions are based on the lab you just completed. They are multiple-choice, fill-in-the-blank, matching, or calculation questions, and they are auto-graded in Moodle. Once you access the questions, you have 15 minutes to submit your answers. Overall, the Software Lab Questions count for 6% of your total grade.

Practice Exercises 🖊

Work on the following exercises in OpenIntro Statistics: Exercises 2.21, 2.23, and 2.25 (Diez et al., 2019) CC BY-SA 3.0. Check your answers using these solutions (Diez et al., 2019) CC BY-SA 3.0. You’ll deepen your understanding much more effectively if you genuinely attempt the questions by yourself before checking the solutions.

Work on the WeBWorK questions linked from your Moodle course. Check your answers using the solutions provided.

Suggested Websites 🌎

Explore the video playlist Displaying and Describing Categorical Data (Hamilton, 2023) and the article 38 Hilarious Pie Charts That Are Absolutely True (Korolkovaite, 2023).

Unit Assignment ✍

Having completed the four lessons in this unit, you should now do the Unit 1 Assignment in your Moodle course, which counts for 6% of your overall grade. There are six questions (a mix of short-answer, multiple-choice, and calculation questions), and you submit your answers directly in the assignment in Moodle. There is no time limit for completing the assignment, and you do not have to complete it in one sitting. Three of the questions will be auto-graded in Moodle, and three will be manually graded by your Open Learning Faculty Member. We recommend that you submit this assignment before you start the next unit. That way, you can benefit from your Open Learning Faculty Member’s feedback while working on subsequent assignments.

Media Attributions

NYTimes: Superheroes, by Jer Thorp (2009), on Flickr, CC BY 2.0

References

Barr, C. D., & Diez, D. [OpenIntroOrg]. (2014, Jan. 26). Exploring categorical data [Video]. YouTube. https://youtu.be/7NhNeADL8fA

Diez, D. M., Çetinkaya-Rundel, M., & Barr, C. D. (2019). OpenIntro Statistics (4th ed.). OpenIntro. https://www.openintro.org/book/os/

Diez, D., & McClintock, S. [OpenIntroOrg]. (2014, Mar. 14). Using randomization to analyze a gender discrimination study [Video]. YouTube. https://www.youtube.com/watch?v=2pHhjx9hyM4

Hamilton, N. (2023, Mar. 14). Displaying and describing categorical data [Video Playlist]. YouTube. Retrieved July 10, 2023, from https://www.youtube.com/playlist?list=PLVYFumktsk6RgbchC5UoO74u9SwQQ5wD9

Korolkovaite, I. (2023, May 27). 38 hilarious pie charts that are absolutely true. Bored Panda. https://www.boredpanda.com/funny-pie-charts/

Thorp, J. [blprnt_van]. (2009). NYTimes: Superheroes [Photograph]. Flickr. https://flic.kr/p/faYgZH

License


Introduction to Probability and Statistics Copyright © 2023 by Thompson Rivers University is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
