Lesson 5.2: Inference for Difference in Means from Two Independent Groups

Software Lab 5.2

Two-Sample t-Tests and t-Intervals

Part of this software lab is adapted from Inference for Numerical Data (OpenIntro, n.d.-b), licensed under CC BY-SA 4.0.

As you work through the lab, answer the ungraded exercises in the shaded boxes. Check your answers by consulting the Software Lab 5.2 Solutions.

Remember to complete the graded Software Lab Questions for this section in Moodle.

North Carolina Births: The Data

Download ncbirths150 [CSV file] (OpenIntro, n.d.-a) and load it into jamovi. This dataset contains a random sample of 100 births in North Carolina where the mother was not a smoker and another 50 where the mother was a smoker. It is analyzed in Section 7.3.2 of the textbook. The variables we’ll be using in this lab are:

  • weight: birth weight of the baby
  • smoke: whether or not the mother was a smoker

Hypothesis Test for the Difference of Two Independent Means

Is there convincing evidence that newborns from mothers who smoke have a different average birth weight than newborns from mothers who don’t smoke? We’ll conduct a two-sided hypothesis test for the difference of two independent means to answer this question.

The hypotheses are H0: µ1 − µ2 = 0 versus HA: µ1 − µ2 ≠ 0, where µ1 is the mean for the nonsmoker group and µ2 is the mean for the smoker group. The test statistic is

t = \dfrac{(\overline{y}_1-\overline{y}_2)-(\mu_1-\mu_2)}{\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}}

where µ1 − µ2 takes its hypothesized value under H0 (here, 0). This statistic can be modeled by Student’s t-model with degrees of freedom given by

df = \dfrac{\left( \dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2} \right)^2}{\dfrac{1}{n_1-1}\left( \dfrac{s_1^2}{n_1} \right)^2 + \dfrac{1}{n_2-1}\left( \dfrac{s_2^2}{n_2} \right)^2}

assuming the following conditions are satisfied:

  • Independence Between Groups: The two groups that we are comparing are independent of each other. This means that there is no linkage or association between the two groups. This would be the case in a completely randomized experiment where the two groups are formed at random, but it would not be the case if we used twin pairs, for example, to form the two groups.
  • Independence Within Groups: Within each group, the individual measurements are independent of each other.
  • Random: Each of the two samples is randomly drawn from its respective population.
  • Nearly Normal Condition: For each of the two samples, the data come from a population that is nearly normal. This condition is important for small data sets, but if each sample is relatively large (say, more than 30 observations), we don’t have to worry about it too much.
  • 10% Condition: Each of the two sample sizes, n1 and n2, is no more than 10% of its respective population size.
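
The test statistic and degrees-of-freedom formulas above can be sketched in R (for example, in the Rj Editor) as follows. This is a minimal illustration, not the jamovi implementation: the function name welch_summary and all numeric inputs are hypothetical placeholders, not values from ncbirths150.

```r
# Welch two-sample t statistic and degrees of freedom from summary statistics.
# welch_summary is a hypothetical helper name; the inputs used below are
# made-up placeholders, NOT values from the ncbirths150 data.
welch_summary <- function(ybar1, s1, n1, ybar2, s2, n2, null_diff = 0) {
  v1 <- s1^2 / n1                        # s_1^2 / n_1
  v2 <- s2^2 / n2                        # s_2^2 / n_2
  se <- sqrt(v1 + v2)                    # standard error of the difference
  t_stat <- ((ybar1 - ybar2) - null_diff) / se
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  list(mean_diff = ybar1 - ybar2, se = se, t = t_stat, df = df)
}

# Hypothetical summary statistics for two groups
res <- welch_summary(ybar1 = 10, s1 = 2, n1 = 40,
                     ybar2 = 9,  s2 = 3, n2 = 30)
print(res)
```

Replacing the placeholder inputs with the group means, standard deviations, and sample sizes from the jamovi output gives values that can be compared with the “Independent Samples T-Test” table.
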
1. Select Analyses > T-Tests > Independent Samples T-Test, move weight to the Dependent Variables box, move smoke to the Grouping Variable box, and under Tests select Welch's. Unselect Student's if it is selected already. Also, under Additional Statistics select Mean difference. Calculate the test statistic using the “Mean difference” and “SE difference” and check that it matches the value in the “Independent Samples T-Test” output (within rounding error). Hint: Your calculation won’t exactly match the value in the textbook, since there are rounding errors in the textbook calculation. It also won’t exactly match the value you get if you input the sample statistics into an online t-test calculator [Application] (Statistics Kingdom, n.d.), since rounding errors are involved there too. Check your answer by consulting the Software Lab 5.2 Solutions.
2. Under Additional Statistics select Descriptives. Use the group sample sizes and standard deviations to calculate the degrees of freedom, and check it matches the value in the “Independent Samples T-Test” output (within rounding error).
3. Calculate the p-value, and check that it matches the value in the “Independent Samples T-Test” output (within rounding error). Hint: Select R > Rj Editor and run the following code: 2*(1-pt(1.50, df=89.3)). Your calculation won’t match the value in the textbook, which uses a different, more conservative degrees of freedom value (the smaller of n1 − 1 and n2 − 1) instead of the formula above.
4. Evaluate the hypothesis test based on a significance level \alpha = 0.05 and draw a conclusion in the context of the problem.
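
The two-sided p-value in the hint for step 3 doubles one tail area of the t-distribution. As a small sketch (the helper name two_sided_p is hypothetical; the inputs 1.50 and 89.3 are the values from the hint), the following R code shows that doubling the upper tail and doubling the lower tail give the same result, because the t-distribution is symmetric about 0:

```r
# Two-sided p-value for a t statistic (two_sided_p is a hypothetical helper name).
two_sided_p <- function(t_stat, df) {
  2 * pt(-abs(t_stat), df = df)          # double the lower-tail area
}

t_stat <- 1.50                           # test statistic from the hint in step 3
df     <- 89.3                           # Welch degrees of freedom from the hint

p_upper <- 2 * (1 - pt(t_stat, df = df)) # form used in the hint (upper tail)
p_lower <- two_sided_p(t_stat, df)       # equivalent lower-tail form

print(c(upper_tail_form = p_upper, lower_tail_form = p_lower))
print(all.equal(p_upper, p_lower))       # checks the two forms agree
```

The lower-tail form works for a test statistic of either sign, which is why abs() is used inside the helper.
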

High-Schoolers’ Physical Activity: The Data

Download yrbss_activity [CSV file] (OpenIntro, n.d.-a) and load it into jamovi. This dataset is based on the United States’ Centers for Disease Control and Prevention Youth Risk Behavior Surveillance System (YRBSS) survey. We used data from the survey previously in Software Lab 4.1 and Software Lab 4.3.

The variables we’ll be using in this lab are:

  • height: self-reported height in metres
  • weight: self-reported weight in kilograms
  • physically.active.7d: days per week that the participant is physically active

After opening the data in jamovi, create the following new variables (go to the Data tab and double-click the header of the first empty column). In each formula, spell the variable names exactly as they appear in the jamovi column headers:

  • bmi: use formula weight/height^2
  • physical_3plus: use formula IF(physically_active_7d>2,"yes","no")
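
For readers who want to see what these two computed variables do, here is a rough R equivalent. It is only a sketch: the data frame d holds three made-up rows rather than the yrbss_activity data, and the column names follow the formulas above, which may differ from the headers in your file.

```r
# Hypothetical rows standing in for yrbss_activity (NOT real survey data).
d <- data.frame(
  height = c(1.60, 1.75, 1.82),            # metres
  weight = c(55, 70, 90),                  # kilograms
  physically_active_7d = c(0, 3, 7)        # active days per week
)

# bmi: weight divided by height squared (kg / m^2)
d$bmi <- d$weight / d$height^2

# physical_3plus: "yes" if active more than 2 days (3 or more), otherwise "no"
d$physical_3plus <- ifelse(d$physically_active_7d > 2, "yes", "no")

print(d)
```

The condition > 2 is what makes “three or more days” fall in the "yes" group and “two or fewer days” fall in the "no" group.
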

Hypothesis Test for the Difference of Two Independent Means

Is there convincing evidence that high-schoolers who are physically active at least three days a week have a different average body mass index (BMI) than high-schoolers who are physically active two or fewer days a week? As with the North Carolina births example, we’ll conduct a two-sided hypothesis test for the difference of two independent means to answer this question.

5. Select Analyses > T-Tests > Independent Samples T-Test, move bmi to the Dependent Variables box, move physical_3plus to the Grouping Variable box, and under Tests select Welch's. Unselect Student's if it is selected already. Also, under Additional Statistics select Mean difference. Calculate the test statistic using the “Mean difference” and “SE difference” and check that it matches the value in the “Independent Samples T-Test” output (within rounding error).
6. Under Additional Statistics select Descriptives. Use the group sample sizes and standard deviations to calculate the degrees of freedom, and check that it matches the value in the “Independent Samples T-Test” output (within rounding error).
7. Confirm the calculation of the p-value. Hint: Select R > Rj Editor and run the following code: 2*pt(-4.52, df=6959).
8. Evaluate the hypothesis test based on a significance level \alpha = 0.05 and draw a conclusion in the context of the problem.

Confidence Interval for the Difference of Two Independent Means

A confidence interval for the difference of two independent means is

(\overline{y}_1-\overline{y}_2) \pm t^* \times \sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}

where t^* is the critical value from a t-distribution with degrees of freedom given by the same formula used for the hypothesis test above.
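
The interval formula can be sketched in R as follows. This is an illustration only: the function name welch_ci is hypothetical, and the inputs in the example call are made-up placeholders rather than values from the yrbss_activity output.

```r
# Confidence interval for a difference in means from a mean difference,
# its standard error, and the Welch degrees of freedom.
# welch_ci is a hypothetical helper name.
welch_ci <- function(mean_diff, se, df, level = 0.95) {
  t_star <- qt(1 - (1 - level) / 2, df = df)   # e.g. qt(0.975, df) for 95%
  c(lower = mean_diff - t_star * se,
    upper = mean_diff + t_star * se)
}

# Hypothetical inputs (NOT taken from the lab data)
ci <- welch_ci(mean_diff = 1.2, se = 0.5, df = 60)
print(ci)
```

With level = 0.95 the helper asks qt() for the 0.975 quantile, because the remaining 5% is split equally between the two tails.
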

9. Still in the “Independent Samples T-Test” dialog, under Additional Statistics select Confidence interval. Confirm the calculation of the 95% confidence interval for the mean difference using the “Mean difference,” “SE difference,” and the appropriate value of t^* (within rounding error). Hint: Find the value of t^* by selecting R > Rj Editor and running the following code: qt(0.975, df=6959).
10. Interpret the interval in the context of the problem. Would you say this result is practically significant?

References

OpenIntro. (n.d.-a). Data sets [Data sets]. https://openintro.org/data/

OpenIntro. (n.d.-b). Inference for numerical data. OpenIntro Labs for jamovi. https://openintrostat.github.io/oilabs-jamovi/07_inf_for_numerical_data/inf_for_numerical_data.html. CC BY-SA 4.0.

Statistics Kingdom. (n.d.). Two sample t-test calculator (Welch’s t-test) [Application]. https://www.statskingdom.com/150MeanT2uneq.html

License


Software Lab 5.2 Copyright © 2023 by Thompson Rivers University is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.
