Lesson 5.2: Sampling Distributions and the Central Limit Theorem

Introduction

In this lesson, we will explore sampling distributions and the Central Limit Theorem (CLT). These topics form the foundation for making inferences about populations from sample data, a central task in statistical analysis.

Learning Objectives

By the end of this lesson, you should be able to:

Describe the distribution of the sample mean when the underlying variable is normally distributed.
Understand the Central Limit Theorem for large samples (with sample size $n \geq 30$) when the underlying population is not normal.
Identify situations where the Central Limit Theorem applies.
State the distribution of the sample mean for a normal population.
Apply the Central Limit Theorem to calculate probabilities for a sample mean derived from a non-normal population with a sufficiently large sample size.

Sampling Distributions

What is a Sampling Distribution?

A sampling distribution is the probability distribution of a statistic (for example, the sample mean) obtained by selecting random samples from a population. This distribution captures the variability of the sample statistic across all possible samples from the population.

Example: Sampling Distribution of the Mean

Imagine we have a population of test scores from a class of students:

Student	Score
1	70
2	80
3	90
4	95
5	100

If we take every possible sample of size 2 from this population (drawing without replacement and ignoring order), we get $\binom{5}{2} = 10$ samples, and we can calculate the sample mean of each one.

Calculation of Sample Means

Let's consider the sample combinations:

Sample (70, 80): Sample Mean = $\frac{70 + 80}{2} = 75$
Sample (70, 90): Sample Mean = $\frac{70 + 90}{2} = 80$
Sample (70, 95): Sample Mean = $\frac{70 + 95}{2} = 82.5$
Sample (70, 100): Sample Mean = $\frac{70 + 100}{2} = 85$
Sample (80, 90): Sample Mean = $\frac{80 + 90}{2} = 85$
Sample (80, 95): Sample Mean = $\frac{80 + 95}{2} = 87.5$
Sample (80, 100): Sample Mean = $\frac{80 + 100}{2} = 90$
Sample (90, 95): Sample Mean = $\frac{90 + 95}{2} = 92.5$
Sample (90, 100): Sample Mean = $\frac{90 + 100}{2} = 95$
Sample (95, 100): Sample Mean = $\frac{95 + 100}{2} = 97.5$

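These 10 means make up the sampling distribution of the sample mean for $n = 2$: each value occurs once, except 85, which occurs twice. The mean of the sample means is $\frac{870}{10} = 87$, which equals the population mean $\frac{70 + 80 + 90 + 95 + 100}{5} = 87$.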
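
The enumeration above can be reproduced with a short script. This is a minimal sketch using only Python's standard library; the variable names are illustrative and not part of the lesson.

```python
from collections import Counter
from itertools import combinations
from statistics import fmean

# Population of test scores from the table above.
scores = [70, 80, 90, 95, 100]

# Every unordered sample of size 2, drawn without replacement.
samples = list(combinations(scores, 2))
sample_means = [fmean(pair) for pair in samples]

# Frequency of each sample mean: this is the sampling distribution.
sampling_distribution = Counter(sample_means)

population_mean = fmean(scores)
mean_of_sample_means = fmean(sample_means)

for value in sorted(sampling_distribution):
    print(value, sampling_distribution[value])
print("population mean:", population_mean)
print("mean of sample means:", mean_of_sample_means)
```

The last two printed values agree, which illustrates that the sample mean is centered on the population mean.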

The Central Limit Theorem (CLT)

What is the Central Limit Theorem?

The Central Limit Theorem states that, for a population with a finite mean $\mu$ and a finite standard deviation $\sigma$, the distribution of the sample mean approaches a normal distribution as the sample size $n$ becomes large, whatever the shape of the population. A common rule of thumb treats $n \geq 30$ as large enough, although strongly skewed populations may need larger samples. This is crucial because it allows us to make inferences using normal distribution models even if the population itself does not follow a normal distribution.
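
One way to see this behavior is to simulate it. The sketch below assumes a hypothetical skewed population (an exponential distribution with mean 1, chosen only for illustration) and compares the skewness of the sample means for $n = 2$ and $n = 30$. The helper names and the number of repetitions are assumptions, not part of the lesson.

```python
import random
from statistics import fmean, pstdev

def skewness(values):
    """Skewness: average cubed deviation from the mean, divided by the cubed standard deviation."""
    m = fmean(values)
    s = pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)

def simulate_sample_means(n, reps, rng):
    """Draw `reps` samples of size n from an exponential population (mean 1) and return their means."""
    return [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

rng = random.Random(0)  # fixed seed so the run is repeatable
reps = 5000

# Skewness of the simulated sample means for a small and a larger sample size.
skew_by_n = {n: skewness(simulate_sample_means(n, reps, rng)) for n in (2, 30)}

for n, skew in skew_by_n.items():
    print(f"n = {n}: skewness of sample means = {skew:.2f}")
```

A normal distribution has skewness 0, so the smaller value at $n = 30$ indicates that the sample means are closer to normal than at $n = 2$.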

Importance of the Central Limit Theorem

The CLT is particularly useful because:

It allows for the use of the normal model in hypothesis testing and confidence intervals, making it easier to calculate probabilities.
For populations that are not normally distributed, as long as the sample size is large enough, the distribution of the sample mean will be approximately normal.

Applying the Central Limit Theorem

Example: Application of the CLT

Suppose we have a population of plant heights with a mean of $\mu = 50$ cm and a standard deviation of $\sigma = 10$ cm, and the distribution of heights is skewed. We want to find the probability that the mean height of a random sample of $n = 40$ plants is greater than 52 cm. Because $n \geq 30$, the CLT lets us treat the sample mean as approximately normal.

Calculate the Standard Error (SE):

The standard error of the mean is given by the formula:

$$SE = \frac{\sigma}{\sqrt{n}}$$

Substituting the values, we get:

$$SE = \frac{10}{\sqrt{40}} \approx 1.58$$

Find the Z-Score:

Next, we calculate the Z-score for a sample mean of 52 cm:

$$Z = \frac{\bar{x} - \mu}{SE} = \frac{52 - 50}{1.58} \approx 1.27$$

Use Z-Table to find Probability:

Using a Z-table, we find that a Z-score of 1.27 corresponds to a cumulative probability of approximately $P(Z \leq 1.27) = 0.8980$. Therefore, the probability of obtaining a sample mean greater than 52 cm is:

$$P(\bar{X} > 52) \approx 1 - 0.8980 = 0.1020$$

Thus, there is approximately a 10.2% chance that the mean height of a random sample of 40 plants will exceed 52 cm.
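
The same calculation can be checked without a Z-table. The sketch below uses `NormalDist` from Python's standard `statistics` module; the variable names are illustrative. Because it does not round the standard error or the Z-score, it gives roughly 0.103 rather than the table-based 0.1020.

```python
from math import sqrt
from statistics import NormalDist

mu = 50.0         # population mean (cm)
sigma = 10.0      # population standard deviation (cm)
n = 40            # sample size
threshold = 52.0  # sample mean of interest (cm)

# Standard error of the mean.
se = sigma / sqrt(n)

# Z-score of the threshold under the CLT's normal approximation.
z = (threshold - mu) / se

# Upper-tail probability P(sample mean > threshold).
p_exceeds = 1 - NormalDist().cdf(z)

print(f"SE = {se:.4f}, Z = {z:.4f}, P = {p_exceeds:.4f}")
```

An equivalent route is `1 - NormalDist(mu, se).cdf(threshold)`, which works directly on the scale of the sample mean.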

Handling Non-Normal Populations

In many cases, we will encounter populations that are not normally distributed. The Central Limit Theorem assures us that for large enough sample sizes, we can still use the normal distribution to model the means of these samples.
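
As a rough check of this claim, the plant-height example can be simulated. The lesson does not specify the shape of the skewed population, so the sketch below assumes a hypothetical one: 40 cm plus an exponential amount with mean 10 cm, which has mean 50 cm and standard deviation 10 cm. The function name, seed, and repetition count are also assumptions.

```python
import random
from statistics import fmean

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 40                   # sample size from the example
reps = 20000             # number of simulated samples

def draw_height(rng):
    """One height from the assumed skewed population (mean 50 cm, standard deviation 10 cm)."""
    # expovariate takes a rate, so 1 / 10.0 gives an exponential with mean 10.
    return 40.0 + rng.expovariate(1 / 10.0)

# Mean height of each simulated sample of n plants.
sample_means = [fmean([draw_height(rng) for _ in range(n)]) for _ in range(reps)]

# Fraction of simulated sample means above 52 cm.
share_above_52 = sum(m > 52 for m in sample_means) / reps

print(f"average of sample means: {fmean(sample_means):.2f}")
print(f"share of sample means above 52 cm: {share_above_52:.3f}")
```

The simulated share should land close to the normal-approximation answer of about 10%, even though individual heights in this assumed population are far from normal.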

Practical Implications

This is particularly useful in fields like quality control, polling, and research, where data often come from skewed or otherwise non-normal distributions. When the sample size is at least 30 (the usual rule of thumb), we can apply normal-based inference procedures, knowing that the distribution of the sample mean will be approximately normal.

Conclusion

In this lesson, we explored the vital concepts of sampling distributions and the Central Limit Theorem. We learned how sample means are distributed and how the CLT allows us to make statistical inferences even when the underlying population is not normally distributed, as long as we have a sufficiently large sample size.

Understanding these concepts will greatly enhance your capability to perform hypothesis testing and develop confidence intervals.

Study Notes

A sampling distribution is the distribution of a statistic across all possible random samples of a given size.
The sampling distribution of the sample mean describes how sample means vary from sample to sample around the population mean.
The Central Limit Theorem states that the distribution of sample means approaches normality as sample size increases, regardless of the population's distribution; $n \geq 30$ is the usual rule of thumb.
The standard error, $SE = \frac{\sigma}{\sqrt{n}}$, quantifies the variability of the sample mean.
For non-normal populations, apply the CLT to make inferences about the population using sample data, provided the sample size is sufficiently large.