Supplementary Notes 5.1

Iain Pardoe

Lesson 5.1: Inference for One Mean or a Mean Difference from Two Paired Groups

Simulating the Sampling Distribution of a Mean

On average, how much do BC post-secondary students owe?

Too much is the quick answer; however, statistically, the question asks us to estimate the mean student indebtedness, $\mu$, for the population of all BC students. When we take our sample, we will calculate the mean indebtedness for students in the sample, $\overline{y}$, and use it to estimate $\mu$.

But we know that the sample mean, $\overline{y}$, would fluctuate from sample to sample depending on which students are actually selected for the sample. To be able to answer questions relating to how accurately $\overline{y}$ estimates $\mu$, we need an answer to the same question that we posed and answered for sample proportions in Lesson 4.1:

As the sample mean varies from sample to sample, what pattern does it follow, where is it centred, and how much variability does it have?

Simulation Model

For simplicity, let’s suppose student debt is uniformly distributed between $0 and $3,000. For this simplified population model, the true (population) mean is $\mu=\$1{,}500$ and the standard deviation is $\sigma\approx\$870$. Let’s use a sample size of n = 200 students. The first simulated sample based on this model produces $\overline{y}=\$1{,}580$, which is a little high in its estimate of $\mu=\$1{,}500$.

Three more simulated samples of 200 students produce sample means of $1,550, $1,520, and $1,470. As expected, the sample means are fluctuating around the population mean of $1,500.

Continuing in this way, 100 simulated sample means fall between $1,280 and $1,630 and produce the following histogram, which approximates the sampling distribution for a mean.

**Figure 2:** Sampling distribution histogram of means from 100 random samples

To answer the questions posed earlier: The pattern (shape) of the histogram looks somewhat like a normal model. Its centre is approximately $1,500 (the same as the population mean), and its variability (measured by its standard deviation) is only about $60 (quite a bit smaller than the population standard deviation).
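The simulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `simulate_sample_means`, the seed, and the use of Python's standard library are choices made here, not part of the original notes, and a different seed will give slightly different numbers than those quoted above.

```python
import random
import statistics

def simulate_sample_means(num_samples=100, n=200, low=0.0, high=3000.0, seed=1):
    """Draw num_samples random samples of size n from a Uniform(low, high)
    population and return the list of sample means."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for reproducibility
    means = []
    for _ in range(num_samples):
        sample = [rng.uniform(low, high) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

sample_means = simulate_sample_means()
centre = statistics.fmean(sample_means)   # where the sample means are centred
spread = statistics.stdev(sample_means)   # how much they vary from sample to sample

print(f"smallest and largest sample mean: {min(sample_means):.0f}, {max(sample_means):.0f}")
print(f"centre of the sample means: {centre:.0f}")
print(f"SD of the sample means:     {spread:.0f}")
```

The centre should come out near 1,500 and the spread near 60, far below the population standard deviation of about 870.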

Normal Model for the Sampling Distribution of a Mean

The development of the sampling distribution for a mean, $\overline{y}$, parallels very closely our earlier development of the sampling distribution for a proportion in Supplementary Notes 3.1. If we visualize taking all possible samples, $\overline{y}$ fluctuates according to a probability distribution called a sampling distribution for $\overline{y}$. As with proportions, we’ll use the sampling distribution for $\overline{y}$ to judge whether or not a particular value for $\overline{y}$ is unusual.

In general, the exact sampling distribution for $\overline{y}$ is quite complex, but under some reasonable conditions, we can approximate it quite accurately with a normal model.

**Figure 3:** Sampling a population mean

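When we developed the sampling distribution for a proportion in Supplementary Notes 3.1, we introduced the central limit theorem (CLT), which stated that for large independent samples, the sampling distribution of the sample proportion, $\hat{p}$, can be approximated by a normal model with mean, $\mu(\hat{p})=p$, and standard deviation, $\sigma(\hat{p})=\sqrt{\dfrac{p(1-p)}{n}}$. The CLT can also be applied to the sampling distribution for a mean:

For large independent samples, the sampling distribution of the sample mean, $\overline{y}$, can be approximated by a normal model with mean, $\mu(\overline{y})=\mu$, and standard deviation, $\sigma(\overline{y})=\dfrac{\sigma}{\sqrt{n}}$.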

Recall from Supplementary Notes 3.1 that the standard deviation of a sample estimate based on its sampling distribution is called the standard error of the estimate. So, in this case, we say the standard error of the sample mean is $SE(\overline{y})=\dfrac{\sigma}{\sqrt{n}}$.
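As a quick check against the simulation model above (where the simplified uniform population has a standard deviation of about 870 and each sample has n = 200 students), the formula gives:

$$SE(\overline{y})=\dfrac{\sigma}{\sqrt{n}}=\dfrac{870}{\sqrt{200}}\approx 61.5$$

This is in line with the standard deviation of about $60 observed for the 100 simulated sample means.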

What’s so Amazing About the CLT?

The amazing thing is that no matter what the shape of the parent population's distribution, the sampling distribution of $\overline{y}$ is approximately normal, provided the sample is large enough. In practical terms, we could be dealing with any numeric population (e.g., blood pressures, reaction times, temperatures, puppy weights) and the CLT says that the sample mean will fluctuate according to a normal model (approximately at least). The normal model tells us that on average the estimate $\overline{y}$ will “hit” the target $\mu$ and that it will typically deviate from $\mu$ by $\sigma(\overline{y})=\sigma/\sqrt{n}$ (which decreases as the sample size, n, increases). Equivalently, $z=\dfrac{\overline{y}-\mu}{\sigma/\sqrt{n}}$ approximately follows a standard normal model with mean 0 and standard deviation 1, written N(0, 1).
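To see this claim in action, the sketch below draws samples from a strongly right-skewed population and standardizes each sample mean. The exponential population, the function name `standardized_means`, the sample size, and the seed are all assumptions chosen here for illustration. If the normal model is a good approximation, about 95% of the z-values should fall within ±1.96.

```python
import math
import random
import statistics

def standardized_means(num_samples=2000, n=100, rate=1.0, seed=2):
    """Return z = (ybar - mu) / (sigma / sqrt(n)) for samples drawn from a
    right-skewed exponential population with the given rate."""
    rng = random.Random(seed)
    mu = 1.0 / rate      # mean of an exponential population
    sigma = 1.0 / rate   # its standard deviation equals its mean
    z_values = []
    for _ in range(num_samples):
        sample = [rng.expovariate(rate) for _ in range(n)]
        ybar = statistics.fmean(sample)
        z_values.append((ybar - mu) / (sigma / math.sqrt(n)))
    return z_values

z_values = standardized_means()
within = sum(abs(z) <= 1.96 for z in z_values) / len(z_values)
print(f"share of z-values within +/-1.96: {within:.3f}")  # N(0, 1) predicts about 0.95
```

The parent population here looks nothing like a bell curve, yet the standardized sample means behave much as N(0, 1) predicts.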

Problem! We typically don’t know $\sigma$, the population standard deviation. Can we just use the sample standard deviation, $s$, as an approximation for $\sigma$? In other words, can we estimate the standard error of the sample mean by $SE(\overline{y})\approx\dfrac{s}{\sqrt{n}}$?

Great idea, but it turns out that $\dfrac{\overline{y}-\mu}{s/\sqrt{n}}$ does not follow a normal model.

What model can we use?

Amazingly, a Guinness brewery worker proved that under certain conditions, the correct model to use is a t-model.

Student’s t-Models

Working on quality-control experiments in the early 1900s, a Guinness brewery worker named William Gosset discovered that the standardized sample mean, $\dfrac{\overline{y}-\mu}{s/\sqrt{n}}$, didn’t quite follow the standard normal model, N(0, 1). Writing under the pseudonym Student, he derived a new probability model for $\dfrac{\overline{y}-\mu}{s/\sqrt{n}}$ called Student’s t-distribution that is very similar to the N(0, 1) curve but is somewhat flatter in the middle and wider (more probability in the tails). This extra probability in the tails compensates for the additional uncertainty that results from replacing the population SD, $\sigma$, with the sample SD, $s$.

**Figure 5:** Student’s t-curve (broken line) compared to a normal curve (solid line)
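The extra probability in the tails can be seen in a small simulation. The sketch below draws many small samples (n = 5) from a normal population and standardizes each sample mean twice: once with the known $\sigma$ and once with the sample SD, $s$. The function name `tail_shares`, the sample size, the cutoff of 1.96, and the seed are assumptions made here for illustration.

```python
import math
import random
import statistics

def tail_shares(num_samples=5000, n=5, mu=0.0, sigma=1.0, cutoff=1.96, seed=3):
    """Compare how often the standardized mean exceeds the cutoff in absolute
    value when sigma is known (z) versus when it is estimated by s (t)."""
    rng = random.Random(seed)
    z_hits = 0
    t_hits = 0
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        ybar = statistics.fmean(sample)
        s = statistics.stdev(sample)               # sample SD (n - 1 in the denominator)
        z = (ybar - mu) / (sigma / math.sqrt(n))   # uses the population SD
        t = (ybar - mu) / (s / math.sqrt(n))       # uses the sample SD instead
        z_hits += abs(z) > cutoff
        t_hits += abs(t) > cutoff
    return z_hits / num_samples, t_hits / num_samples

z_share, t_share = tail_shares()
print(f"share beyond +/-1.96 using sigma: {z_share:.3f}")
print(f"share beyond +/-1.96 using s:     {t_share:.3f}")
```

With $\sigma$ known, roughly 5% of values fall beyond ±1.96, as N(0, 1) predicts. With $s$ in its place, the share is noticeably larger, which is the heavier-tailed behaviour the t-model accounts for.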

Degrees of Freedom

In fact, Student’s t-model isn’t just one curve, it’s a whole family of curves where each curve depends on a parameter called its degrees of freedom (df). For small degrees of freedom (like 2), the t-curve is much flatter than the N(0, 1) bell curve, but as the degrees of freedom increase, the t-curves get closer and closer to the N(0, 1) bell curve.

**Figure 6:** Various t-curve shapes: Standard normal N(0, 1) curve (solid line); Student’s t-curve with 10 df (dotted line); Student’s t-curve with 2 df (dash and dot line)
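The way the t-curves approach the N(0, 1) curve can also be checked numerically. The sketch below evaluates the standard density formulas at the centre (0) and in the tail (3). The helper names `t_pdf` and `normal_pdf`, and the extra curve with 100 df, are assumptions added here for illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def normal_pdf(x):
    """Density of the standard normal model N(0, 1)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

for x in (0.0, 3.0):
    print(
        f"x = {x}: t(2) = {t_pdf(x, 2):.4f}, t(10) = {t_pdf(x, 10):.4f}, "
        f"t(100) = {t_pdf(x, 100):.4f}, N(0, 1) = {normal_pdf(x):.4f}"
    )
```

At the centre the t-curves sit below the normal curve (flatter in the middle), while in the tail they sit above it, and both gaps shrink as the degrees of freedom increase.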

Student’s t-Model Conditions

When the conditions below are met, $t=\dfrac{\overline{y}-\mu}{s/\sqrt{n}}$ follows a Student’s t-model with n − 1 df.

Conditions:

Independence: The individual responses in the sample are independent of each other.
Random: The sample is random.
10% Condition: The sample size n is no more than 10% of the population size.
Nearly Normal Condition: The data come from a population that is nearly normal. This condition is important for small data sets (say n < 30), but its importance wears off as the sample size n increases (for n ≥ 30 we can essentially ignore the nearly normal condition as long as there are no particularly extreme outliers).

How to Check the Nearly Normal Condition?

The best way to check the nearly normal condition is to draw a graph of the data. Which graph? For very small data sets, the normal probability plot (Q–Q plot) from Supplementary Notes 2.3 is the best choice. For larger data sets, a histogram can be used, although the normal probability plot is still the best choice.

One-Sample t-Test for the Mean

Let’s explore this topic using examples.

Example: Oatmeal Raisin Cookies

A manufacturer of oatmeal raisin cookies advertises a mean fat content of 3 grams per cookie. The fat contents (in grams) for a random sample of 12 cookies are: 3.2, 3.4, 3.7, 2.7, 2.8, 3.4, 3.0, 3.5, 3.2, 3.1, 3.6, and 2.9.

Download the data from cookie [CSV file] and open it in jamovi. Do these data provide sufficient evidence to conclude that the mean fat content, µ, for the population of all oatmeal raisin cookies is greater than the advertised 3.0 grams?

H₀: µ = 3.0 grams
H_A: µ > 3.0 grams

Model Conditions

Independence: Reasonable to assume cookie-to-cookie fat content is independent as long as the random sample was drawn from different batches of cookies.
Random: We are told in the question that the sample is random.
10% Condition: Since the cookie population size is essentially unlimited, the sample of 12 cookies is far less than 10% of the population size.
Nearly Normal Condition: Is it plausible that the fat content measurements come from a normal model? Let’s investigate by drawing a normal probability plot in jamovi using Analyses > Exploration > Descriptives.

**Figure 7:** A normal probability plot for cookie fat content

The points in the normal probability plot (Fig. 7) are quite close to the straight line and there are no outliers, so it is plausible that the fat contents come from a normal model.
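For readers who want to see what lies behind such a plot, the sketch below builds the plotted points for the cookie data by pairing each sorted observation with a theoretical N(0, 1) quantile. The plotting positions (i − 0.5)/n are an assumed convention and may differ from the one jamovi uses. The helper names `qq_points` and `correlation` are introduced here for illustration only.

```python
import math
from statistics import NormalDist, fmean

fat = [3.2, 3.4, 3.7, 2.7, 2.8, 3.4, 3.0, 3.5, 3.2, 3.1, 3.6, 2.9]

def qq_points(data):
    """Pair each sorted observation with a theoretical N(0, 1) quantile."""
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n are an assumed convention.
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, sorted(data)))

def correlation(pairs):
    """Pearson correlation of the (x, y) pairs; values near 1 mean the
    points lie close to a straight line."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

points = qq_points(fat)
r = correlation(points)
for q, y in points:
    print(f"{q:6.2f}  {y:.1f}")
print(f"correlation of the plotted points: {r:.3f}")
```

A correlation close to 1 is a rough numerical counterpart to the visual impression that the points hug the straight line. It supplements the plot and does not replace it.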

It’s now reasonable to proceed with the t-model for $t=\dfrac{\overline{y}-3}{s/\sqrt{n}}$ with 12 − 1 = 11 df.

Mechanics

Test Statistic Calculation: $t=\dfrac{\overline{y}-3}{s/\sqrt{n}}=\dfrac{3.21-3}{0.320/\sqrt{12}}=2.27$.
P-Value Calculation: 1 - pt(2.27, df=11) ≈ 0.022.

**Figure 8:** The t-test calculation for cookie fat content p-value

Alternatively, use jamovi’s built-in t-test by selecting Analyses > T-Tests > One Sample T-Test, move fat to the Dependent Variables box, type “3” for the Test value, and select “> Test value” for the (alternative) hypothesis. You can also view the normal probability plot by selecting “Q–Q Plot.” The test statistic value is 2.25 and the p-value is 0.023 (these values are more accurate than the previous hand calculations, which involved some rounding error):

**Figure 9:** Cookie fat content t-test in jamovi
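The calculations above can be cross-checked outside jamovi. The sketch below recomputes the test statistic from the raw data without intermediate rounding. It obtains the upper-tail p-value by numerically integrating the t density with Simpson's rule, a stand-in chosen here so that only Python's standard library is needed. This is not how jamovi or R's `pt` computes it, and the helper names `t_pdf` and `t_upper_tail` are assumptions.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: one half minus the area under the density
    on [0, t], with the area found by Simpson's rule (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

fat = [3.2, 3.4, 3.7, 2.7, 2.8, 3.4, 3.0, 3.5, 3.2, 3.1, 3.6, 2.9]
mu_0 = 3.0                      # hypothesized mean under the null hypothesis

n = len(fat)
ybar = statistics.fmean(fat)    # sample mean
s = statistics.stdev(fat)       # sample standard deviation
t_stat = (ybar - mu_0) / (s / math.sqrt(n))
p_value = t_upper_tail(t_stat, df=n - 1)

print(f"ybar = {ybar:.3f}, s = {s:.3f}")
print(f"t = {t_stat:.2f}, one-sided p-value = {p_value:.3f}")
```

If the sketch behaves as intended, the printed values should land close to jamovi's test statistic of 2.25 and p-value of 0.023.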

Conclusion

With a p-value as small as 0.023 (below the conventional 5% significance level), reject the null hypothesis in favour of the alternative hypothesis; i.e., there is evidence that the true mean fat content of the cookies is greater than the advertised 3.0 grams.

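Alternatively, reject H₀ in favour of H_A if the test statistic is in the rejection region. For a two-sided alternative, the rejection region lies in both tails (less than the negative critical value or greater than the positive critical value); for an upper-tailed alternative such as the one in the cookies example, it lies in the upper tail only (greater than the positive critical value). Do not reject H₀ if the test statistic is not in the rejection region. The critical value used in the cookies example is 2.2010, the 97.5th percentile of the t distribution with 11 degrees of freedom, which corresponds to a 2.5% significance level for this one-sided test (the 95th percentile, 1.796, would correspond to a 5% level). Since the test statistic, $t=2.25$, is greater than 2.2010, it is in the rejection region, so we reject H₀ in favour of H_A.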
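The critical value quoted above can be reproduced with a short numerical search. The sketch below finds percentiles of the t distribution by bisection on a numerically integrated CDF. This is a standard-library stand-in chosen here, not the method used by jamovi or by R's `qt`, and the helper names `t_pdf`, `t_cdf`, and `t_percentile` are assumptions. The 95th percentile is included only for comparison, as the cutoff for an upper-tailed test at the 5% level.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: one half plus the area under the density
    on [0, x], with the area found by Simpson's rule (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_percentile(prob, df, lo=0.0, hi=50.0, tol=1e-6):
    """Bisection search for the value with P(T <= value) = prob,
    assuming prob > 0.5 so that the answer is positive."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

df = 11
crit_975 = t_percentile(0.975, df)   # the percentile quoted in the notes
crit_95 = t_percentile(0.95, df)     # for comparison only
t_stat = 2.25                        # test statistic reported by jamovi

print(f"97.5th percentile of t({df}): {crit_975:.4f}")
print(f"95th percentile of t({df}):   {crit_95:.4f}")
print("in the upper-tail rejection region:", t_stat > crit_975)
```

Because 2.25 exceeds the 97.5th percentile, it also exceeds the smaller 95th percentile, so the conclusion is the same under either cutoff.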

Example: Laser Eye Surgery

A laser eye surgery knife (microkeratome) is designed to cut a corneal flap with a mean flap thickness of 160 micrometres. The flap thicknesses (in micrometres) for a random sample of ten surgeries are: 156.4, 160.3, 162.7, 160.1, 156.8, 163.9, 157.6, 155.1, 162.2, and 151.3. Download the data from cornea [CSV file] and open it in jamovi.

Do these data provide sufficient evidence to conclude that the knife cuts a corneal flap with a mean population flap thickness, µ, different from 160 micrometres?

H₀: µ = 160 micrometres
H_A: µ ≠ 160 micrometres