Lesson 1.1 Organizing Data

Supplementary Notes 1.1

The 5 W’s Plus H: Who, What, When, Where, Why, and How

What is the most basic ingredient of any statistical study?

Data.

What are data?

Information—numerical or categorical—recorded with a context to give it meaning.

How do we identify the context?

We answer the 5 W’s plus H, if possible: the who, what, when, where, why, and how.

Example: In the Figure 1 data table, we can identify the most important “who” and “what,” but without more information we can’t answer the “when,” “where,” “why,” or “how.”

Name        Age (yrs)  High School Grade  Daily Internet Use (hours)  Main Internet Activity
Krisa B.    17         12                 1.4                         School Related
Andy K.     18         12                 3                           Online Gaming
Mohsen A.   15         10                 1                           Social Media
Alison H.   16         10                 2                           School Related
Jon B.      14         9                  2.2                         Online Gaming
Kun Ha W.   17         11                 2                           Social Media
Alexa K.    16         10                 2.4                         Online Gaming
Sarah M.    17         12                 2.3                         Online Gaming
Mandip R.   16         11                 2.8                         Online Gaming
Wei Lyn N.  17         12                 1.2                         Social Media

Figure 1: Basic data table showing student internet activity. The variables are the columns and represent the “what.” The rows are the cases and represent the “who.”
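One way to make the who/what distinction concrete is to store the table as one record per case. The following is a minimal Python sketch, assuming the Figure 1 values as given; the names `VARIABLES`, `ROWS`, and `cases` are illustrative choices and not part of the course materials.

```python
# Column headings of Figure 1: these are the variables (the "what").
VARIABLES = [
    "Name",
    "Age (yrs)",
    "High School Grade",
    "Daily Internet Use (hours)",
    "Main Internet Activity",
]

# One tuple per row of Figure 1: these are the cases (the "who").
ROWS = [
    ("Krisa B.", 17, 12, 1.4, "School Related"),
    ("Andy K.", 18, 12, 3, "Online Gaming"),
    ("Mohsen A.", 15, 10, 1, "Social Media"),
    ("Alison H.", 16, 10, 2, "School Related"),
    ("Jon B.", 14, 9, 2.2, "Online Gaming"),
    ("Kun Ha W.", 17, 11, 2, "Social Media"),
    ("Alexa K.", 16, 10, 2.4, "Online Gaming"),
    ("Sarah M.", 17, 12, 2.3, "Online Gaming"),
    ("Mandip R.", 16, 11, 2.8, "Online Gaming"),
    ("Wei Lyn N.", 17, 12, 1.2, "Social Media"),
]

# Pair each value with its variable name, giving one record per case.
cases = [dict(zip(VARIABLES, row)) for row in ROWS]

who = [case["Name"] for case in cases]   # the students
what = list(cases[0].keys())             # the variables recorded for each student

print(len(who), "cases;", len(what), "variables")
print(cases[4])  # the full record for Jon B.
```

Nothing in the records answers the “when,” “where,” “why,” or “how”; that context would have to come from outside the table.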

Variables

A variable is a characteristic of a subject (or, more generally, of an experimental unit) whose value varies from case to case.

What are the two types of variables?

  • Categorical (qualitative) variables: The values are categories.
  • Numerical (quantitative) variables: The values are numbers.

For the example in the previous data table (Fig. 1), let’s identify the variables and their types:

Categorical:

  1. Name
  2. Main Internet Activity

Numerical:

  1. Age (years)
  2. High School Grade
  3. Daily Internet Use (hours)
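The classification above can be approximated in code by checking whether every recorded value in a column is a number. The sketch below uses three rows of Figure 1; the helper `variable_type` is a hypothetical name introduced here for illustration. It is only a rough heuristic: a variable recorded as numbers (a postal code, for instance) can still be categorical, so the final call depends on what the values mean.

```python
from numbers import Number

VARIABLES = [
    "Name",
    "Age (yrs)",
    "High School Grade",
    "Daily Internet Use (hours)",
    "Main Internet Activity",
]

# Three of the ten cases from Figure 1 are enough to show the idea.
ROWS = [
    ("Krisa B.", 17, 12, 1.4, "School Related"),
    ("Andy K.", 18, 12, 3, "Online Gaming"),
    ("Mohsen A.", 15, 10, 1, "Social Media"),
]


def variable_type(values):
    """Return 'numerical' if every value is a number, otherwise 'categorical'."""
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "numerical"
    return "categorical"


# Collect the values column by column, then classify each column.
columns = {name: [row[i] for row in ROWS] for i, name in enumerate(VARIABLES)}
types = {name: variable_type(values) for name, values in columns.items()}

for name, kind in types.items():
    print(f"{name}: {kind}")
```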

What’s meant by the values of a variable?

The values of a variable are the specific categories (for categorical variables) or numbers (for numerical variables) that the variable can take on.

Examples:

  • Categorical variable: Eye colour of humans. Possible values: brown, blue, green, black, ….
  • Numerical variable: Number of children in the family. Possible values: 0, 1, 2, 3, … children.
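To see the idea on real data, the short Python sketch below lists the distinct values actually observed for two of the Figure 1 variables. The variable names are illustrative. Note that the observed values are only a subset of the possible values: other internet activities or grade levels could turn up in a larger sample.

```python
# "Main Internet Activity" column of Figure 1, in row order.
activities = [
    "School Related", "Online Gaming", "Social Media", "School Related",
    "Online Gaming", "Social Media", "Online Gaming", "Online Gaming",
    "Online Gaming", "Social Media",
]

# "High School Grade" column of Figure 1, in row order.
grades = [12, 12, 10, 10, 9, 11, 10, 12, 11, 12]

# A set keeps each distinct value once; sorting gives a stable display order.
observed_activities = sorted(set(activities))
observed_grades = sorted(set(grades))

print("Observed categories:", observed_activities)
print("Observed grades:", observed_grades)
```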

Do numerical variables always have units?

Absolutely, and it’s very important to include the units whenever we quote values of the variable.

Example:

  • Numerical variable: Cost of a home in Whistler, BC.
  • Possible values: $545,000, $780,000, $1.2 million

Example: Identifying Five W’s plus H

Later in the course, we’ll often look at abstracts (summaries) of studies that have been published in various health science journals. The following excerpt is an example of an abstract from an article published in the Yonsei Medical Journal. The full article (Kim et al., 2006) is available from TRU Library for students registered in the TRU course.

How many of the “Five W’s plus H” can we identify from the abstract to Effects on Weight Reduction and Safety of Short-Term Phentermine Administration in Korean Obese People (Kim et al., 2006)?

Effects on Weight Reduction and Safety of Short-Term Phentermine Administration in Korean Obese People (Kim et al., 2006)

Adapted from Effects on Weight Reduction and Safety of Short-Term Phentermine Administration in Korean Obese People (Kim et al., 2006) CC BY-NC 4.0
Abstract / Assessment (Five W’s plus H)

Aim: The phentermine, an appetite suppressant, has been widely applied in Korea since 2004. However, there have been relatively few reports about the efficacy and the safety of phentermine in Korea. The aim of this study is to verify the effect of phentermine on weight reduction and the safety in Korean patients.
Assessment: Why?

Design: This randomized, double-blind, placebo-controlled study had been performed between February and July, 2005, in Seoul on 68 relatively healthy obese adults whose body mass index was 25 kg/m² or greater. They received phentermine-HCl 37.5 mg or placebo once daily with behavioral therapy for obesity. The primary endpoints were the changes of body weight and waist circumference from the baseline in the intention-to-treat population.
Assessment: How? (More on this in Unit 3.) When and where? Who? What?

Results: Mean decrease of both body weight and waist circumference in phentermine-treated subjects were significantly greater than that of placebo group (weight: -6.7 ± 2.5 kg, p < 0.001; waist circumference: -6.2 ± 3.5 cm, p < 0.001). Significant number of subjects in phentermine group accomplished weight reduction of 5% or greater from the baseline and 10% or more (p < 0.001). There were no significant differences in systolic and diastolic blood pressure between the groups (p = 0.122 for systolic BP; p = 0.219 for diastolic BP). Dry mouth and insomnia were the only statistically significant adverse events that occurred more frequently in phentermine group. Most side effects of phentermine were mild to moderate in intensity.
Assessment: What do p < 0.001, p = 0.122, and p = 0.219 mean? These are probabilities called P-values, and they are calculated from statistical hypothesis tests. (Performing hypothesis tests, interpreting P-values, and statistical significance will be major topic areas later in the course.)

Conclusion: Short-term phentermine administration induced significant weight reduction and reduction of waist circumference without clinically problematic adverse events on relatively healthy Korean obese people.

There are six variables (the “whats”) reported in the abstract:

  • “Treatment received” is a categorical variable with values “phentermine” and “placebo.”
  • “Decrease in body weight” is a numerical variable with units of “kg.”
  • “Decrease in waist circumference” is a numerical variable with units of “cm.”
  • “Difference in systolic blood pressure” is a numerical variable with units of “mmHg.”
  • “Difference in diastolic blood pressure” is a numerical variable with units of “mmHg.”
  • “Adverse events” is a categorical variable with values “dry mouth” and “insomnia” given in the abstract.
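The list above can also be written down as a small catalogue of variables, which makes it easy to check that every numerical variable has its units recorded. This is a minimal Python sketch; the structure and the key names (`name`, `type`, `unit`, `values`) are assumptions made for illustration, while the content comes from the list above.

```python
# The six variables (the "whats") identified in the abstract.
abstract_variables = [
    {"name": "Treatment received", "type": "categorical",
     "values": ["phentermine", "placebo"]},
    {"name": "Decrease in body weight", "type": "numerical", "unit": "kg"},
    {"name": "Decrease in waist circumference", "type": "numerical", "unit": "cm"},
    {"name": "Difference in systolic blood pressure", "type": "numerical", "unit": "mmHg"},
    {"name": "Difference in diastolic blood pressure", "type": "numerical", "unit": "mmHg"},
    {"name": "Adverse events", "type": "categorical",
     "values": ["dry mouth", "insomnia"]},  # only the values named in the abstract
]

numerical = [v["name"] for v in abstract_variables if v["type"] == "numerical"]
categorical = [v["name"] for v in abstract_variables if v["type"] == "categorical"]

# Any numerical variable without a unit would be flagged here.
missing_units = [
    v["name"] for v in abstract_variables
    if v["type"] == "numerical" and not v.get("unit")
]

print(len(numerical), "numerical;", len(categorical), "categorical")
print("Numerical variables missing units:", missing_units)
```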

References

Kim, K. K., Cho, H. J., Kang, H. C., Youn, B. B., & Lee, K. R. (2006). Effects on weight reduction and safety of short-term phentermine administration in Korean obese people. Yonsei Medical Journal, 47(5), 614-625. Adapted with permission.

License

Introduction to Probability and Statistics Copyright © 2023 by Thompson Rivers University is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
