Lesson 5.1: Sampling Distributions and the Distribution of the Sample Mean

Introduction

Statistical inference is a crucial area of statistics that lets us draw conclusions about a population from a sample. In this lesson, we explore sampling distributions, focusing on the distribution of the sample mean. We cover key terminology such as statistics and parameters, the meaning of an unbiased estimator, and the standard error. By the end of this lesson, you will be able to calculate and interpret the sampling distribution of the sample mean for samples drawn from a normal population.

Learning Objectives

Understand the difference between statistics and parameters, including the notation $\mu$ and $\sigma^2$ for a population mean and variance, respectively.
Define an unbiased estimator and explain the concept of standard error.
Learn the characteristics of the sampling distribution of the sample mean, especially for samples taken from a normal population.
Recognize the importance of random sampling for valid inferences.
Use the notation $\mu$ and $\sigma^2$ for population characteristics and $\bar{x}$ and $s^2$ for the corresponding sample statistics.

Section 1: Understanding Statistics and Parameters

Statistics and parameters are fundamental concepts in statistics. Let's clarify these terms:

1.1 Definitions

Parameter: A parameter is a numerical characteristic or measure obtained by using all the data from a specific population. For example, the average height of all students in a school ($\mu$) is a population parameter.
Statistic: A statistic is a numerical characteristic or measure obtained by using data from a sample. For instance, the average height of a sample of 30 students from that school is a sample statistic.

1.2 Notation

Population Mean: $\mu$ (mu)
Population Variance: $\sigma^2$ (sigma squared)
Sample Mean: $\bar{x}$
Sample Variance: $s^2$
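In Python's standard `statistics` module, this notation maps onto a few functions: `mean` gives $\mu$ or $\bar{x}$ depending on whether you pass it the whole population or a sample, `pvariance` divides by the population size $N$ (the formula for $\sigma^2$), and `variance` divides by $n - 1$ (the usual formula for $s^2$). The sketch below uses a made-up seven-student population and a three-student sample purely for illustration.

```python
import statistics

# Hypothetical population: heights (cm) of every student in a tiny, made-up school.
population = [150, 155, 158, 160, 162, 165, 170]

# Parameters are computed from all of the population data.
mu = statistics.mean(population)           # population mean, mu
sigma2 = statistics.pvariance(population)  # population variance, divides by N

# Statistics are computed from a sample only.
sample = [155, 158, 165]
x_bar = statistics.mean(sample)            # sample mean, x-bar
s2 = statistics.variance(sample)           # sample variance, divides by n - 1

print("mu =", mu, "sigma^2 =", sigma2)
print("x-bar =", x_bar, "s^2 =", s2)
```

Here $\bar{x}$ (about 159.3 cm) is close to $\mu$ (160 cm) but not equal to it.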

Common Misconceptions

One common misconception is that a statistic (like $\bar{x}$) gives the exact value of a population parameter (like $\mu$). In reality, a statistic only estimates a parameter, and its value changes from sample to sample; this is called sampling variability.

Example 1:

Consider the heights of all students in a school. Suppose the average height (parameter) is $\mu = 160$ cm. If we take a sample of 50 students and find their average height to be $\bar{x} = 158$ cm, $\bar{x}$ serves as an estimate of $\mu$, but it may not equal $\mu$.
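Sampling variability is easy to see in a quick simulation. This sketch assumes a hypothetical population of 1000 heights generated from a normal distribution with mean 160 cm and standard deviation 15 cm (the simulated population's own mean will be close to, but not exactly, 160 cm). It draws five samples of 50 students and shows that each $\bar{x}$ lands near $\mu$ without matching it.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so every run gives the same numbers

# Hypothetical population: 1000 simulated heights (cm) from a normal distribution
# with mean 160 and standard deviation 15. These are assumed values, not real data.
population = [rng.gauss(160, 15) for _ in range(1000)]
mu = statistics.fmean(population)  # the parameter: uses every member of the population

# Five independent samples of 50 students, each summarized by its sample mean.
sample_means = [statistics.fmean(rng.sample(population, 50)) for _ in range(5)]

for x_bar in sample_means:
    print(f"x-bar = {x_bar:.2f} cm, x-bar - mu = {x_bar - mu:+.2f} cm")
```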

Section 2: Unbiased Estimators and Standard Error

To perform statistical inference, we need to ensure our sample statistics are reliable estimates of population parameters.

2.1 Unbiased Estimator

An estimator is unbiased if its expected value equals the parameter it estimates. For the sample mean:

$$E(\bar{x}) = \mu$$

In other words, if we sampled repeatedly and averaged all the resulting sample means, that average would equal the population mean. Any single sample mean can still be above or below $\mu$.
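Unbiasedness is a statement about the long run, so a simulation can approximate $E(\bar{x})$ by averaging many sample means. The values $\mu = 160$, $\sigma = 15$, $n = 25$, and 10,000 repetitions are assumptions chosen for illustration, and the population is assumed normal.

```python
import random
import statistics

rng = random.Random(42)
MU, SIGMA, N = 160, 15, 25  # assumed population mean, standard deviation, and sample size
REPS = 10_000               # number of repeated samples

# Each repetition draws N observations from Normal(MU, SIGMA) and records the sample mean.
means = [statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N)) for _ in range(REPS)]

# Averaging over all repetitions approximates E(x-bar); it should sit very close to MU.
grand_mean = statistics.fmean(means)
print(f"average of {REPS} sample means: {grand_mean:.3f} (mu = {MU})")
```

Individual entries of `means` still scatter by a few centimetres around 160; only their average settles on $\mu$.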

2.2 Standard Error

The standard error (SE) of a statistic is the standard deviation of its sampling distribution: it measures how much the statistic typically varies from sample to sample around the population parameter. For the sample mean, it is calculated as follows:

$$SE = \frac{\sigma}{\sqrt{n}}$$

where $\sigma$ is the population standard deviation and $n$ is the sample size. The standard error decreases as the sample size increases, so larger samples give more precise estimates of the population mean. Because $SE$ is proportional to $1/\sqrt{n}$, quadrupling the sample size halves the standard error.
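The formula is a one-line computation. `standard_error` below is a hypothetical helper name, not a library function; the loop uses $\sigma = 15$ and shows that each fourfold increase in $n$ halves the standard error.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    if n < 1:
        raise ValueError("sample size n must be at least 1")
    return sigma / math.sqrt(n)

# Each fourfold increase in n halves the standard error.
for n in (9, 36, 144):
    print(n, standard_error(15, n))
# prints:
# 9 5.0
# 36 2.5
# 144 1.25
```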

Example 2:

If the population standard deviation is $\sigma = 15$ cm and we draw a sample of 36 students, the standard error is:

$$SE = \frac{15}{\sqrt{36}} = \frac{15}{6} = 2.5$$

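This means that a single sample mean $\bar{x}$ typically falls within about 2.5 cm of the true population mean. If the sampling distribution is normal, roughly 68% of sample means lie within one standard error of $\mu$.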
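Example 2 can be checked by simulation: draw many samples of 36 from a population with $\sigma = 15$ cm and measure how spread out the resulting sample means are. The population is assumed normal, and the mean $\mu = 160$ cm is borrowed from Example 1 as an assumption. If the theory holds, the standard deviation of the simulated means should be close to 2.5 cm, and about 68% of them should fall within 2.5 cm of $\mu$.

```python
import math
import random
import statistics

rng = random.Random(7)
MU, SIGMA, N = 160, 15, 36   # MU is assumed (from Example 1); SIGMA and N are from Example 2
REPS = 10_000                # number of simulated samples
SE = SIGMA / math.sqrt(N)    # theoretical standard error: 15 / 6 = 2.5 cm

# Draw REPS samples of size N from Normal(MU, SIGMA) and keep each sample mean.
means = [statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N)) for _ in range(REPS)]

empirical_se = statistics.stdev(means)                       # observed spread of the sample means
share_within = sum(abs(m - MU) <= SE for m in means) / REPS  # fraction within one SE of MU

print(f"theoretical SE = {SE}, empirical SE = {empirical_se:.3f}")
print(f"share of sample means within 1 SE of mu: {share_within:.1%}")
```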

Section 3: Sampling Distribution of the Sample Mean

The sampling distribution of the sample mean is the foundation of inference about $\mu$, because it describes how far $\bar{x}$ is likely to fall from $\mu$.

3.1 Definition

The sampling distribution of the sample mean is the probability distribution of $\bar{x}$ over all possible samples of a fixed size $n$ drawn from a population.

3.2 Properties

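Normal Population: If the population is normally distributed, the sampling distribution of the sample mean is exactly normal for every sample size $n$: $\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
Central Limit Theorem: For any population with a finite variance, the sampling distribution of the sample mean approaches a normal distribution as the sample size $n$ increases; $n \geq 30$ is a common rule of thumb for when the approximation is adequate.
Mean of the Sample Mean: The mean of this sampling distribution equals the population mean: $$\mu_{\bar{x}} = \mu$$
Variance of the Sample Mean: For independent observations, the variance of the sampling distribution is: $$Var(\bar{x}) = \frac{\sigma^2}{n}$$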
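The properties above can be seen in a simulation with a population that is far from normal. This sketch assumes an exponential population with rate 1 (so $\mu = 1$ and $\sigma^2 = 1$), which is strongly right-skewed, and draws 10,000 samples of size $n = 40$. `skewness` is a small helper defined here, not a library function. The mean of the sample means should be near 1, their variance near $\sigma^2 / n = 0.025$, and their skewness much closer to 0 (the value for a normal distribution) than the skewness of single observations, which is about 2 for this population.

```python
import random
import statistics

def skewness(xs):
    """Population-style skewness: the mean of the cubed z-scores (0 for a symmetric distribution)."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / sd) ** 3 for x in xs)

rng = random.Random(3)
# Assumed right-skewed population: exponential with rate 1, so mu = 1 and sigma^2 = 1.
MU, SIGMA2 = 1.0, 1.0
N, REPS = 40, 10_000

raw = [rng.expovariate(1.0) for _ in range(REPS)]  # single observations (n = 1)
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(N)) for _ in range(REPS)]

print("mean of x-bar:", statistics.fmean(means))         # should be near MU = 1
print("variance of x-bar:", statistics.variance(means))  # should be near SIGMA2 / N = 0.025
print("skewness, n = 1:", skewness(raw))                 # should be near 2 (strongly skewed)
print("skewness, n = 40:", skewness(means))              # much smaller (nearly symmetric)
```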

3.3 Importance of Random Sampling

Random sampling is essential. In a simple random sample, every member of the population has an equal chance of being selected, which protects against selection bias and allows valid inferences about the entire population.
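A simulation also shows what goes wrong without random sampling. The population here is hypothetical (1000 simulated heights), and the non-random scheme, in which only the 200 tallest students can be selected, is a made-up example of selection bias. Random samples give sample means centered on $\mu$; the restricted scheme overestimates $\mu$ no matter how many samples are taken.

```python
import random
import statistics

rng = random.Random(11)
# Hypothetical population of 1000 simulated heights (cm); assumed values, not real data.
population = [rng.gauss(160, 15) for _ in range(1000)]
mu = statistics.fmean(population)

# Simple random sampling: every student has the same chance of being chosen.
random_means = [statistics.fmean(rng.sample(population, 36)) for _ in range(2000)]

# Non-random sampling (made-up scenario): only the 200 tallest students can ever be chosen.
tallest = sorted(population)[-200:]
biased_means = [statistics.fmean(rng.sample(tallest, 36)) for _ in range(2000)]

random_bias = statistics.fmean(random_means) - mu  # near 0
biased_bias = statistics.fmean(biased_means) - mu  # large and positive
print(f"bias with random sampling: {random_bias:+.2f} cm")
print(f"bias with tallest-only sampling: {biased_bias:+.2f} cm")
```

Taking more samples shrinks the random scatter of each scheme but does nothing to remove the second scheme's systematic error.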

Example 3:

Suppose we have a population of 1000 students with a population mean height ($\mu$) of 160 cm and a population variance ($\sigma^2$) of 225 cm². If we take random samples of size 36:

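The mean of the sampling distribution of $\bar{x}$ is $E(\bar{x}) = \mu = 160$ cm, so the sample means cluster around 160 cm, although any single sample mean will usually differ from it slightly.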
The variance of the sampling distribution of $\bar{x}$ will be $Var(\bar{x}) = \frac{225}{36} = 6.25$ cm².
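The standard deviation of the sampling distribution of $\bar{x}$ (the standard error) will be $SE = \sqrt{6.25} = 2.5 \text{ cm}$, matching $\frac{\sigma}{\sqrt{n}} = \frac{15}{6}$ since $\sigma = \sqrt{225} = 15$ cm.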
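Example 3 can be reproduced with a simulated stand-in for the 1000 students. The heights are assumed, not real: normal draws rescaled so the finite population has exactly $\mu = 160$ cm and $\sigma^2 = 225$ cm². Samples are drawn with replacement via `Random.choices`, which matches the independence assumption behind $Var(\bar{x}) = \sigma^2/n$; drawing without replacement would make the variance slightly smaller, a negligible difference here because 36 is small relative to 1000.

```python
import random
import statistics

rng = random.Random(2024)
N_POP, n, REPS = 1000, 36, 10_000

# Simulated stand-in for the 1000 students: normal draws rescaled so this finite
# population has exactly mu = 160 cm and sigma^2 = 225 cm^2 (sigma = 15 cm).
raw = [rng.gauss(0, 1) for _ in range(N_POP)]
m, s = statistics.fmean(raw), statistics.pstdev(raw)
population = [160 + 15 * (x - m) / s for x in raw]

# Each sample draws n students independently, with replacement.
means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(REPS)]

print("population mean:", statistics.fmean(population))         # 160 up to rounding
print("population variance:", statistics.pvariance(population))  # 225 up to rounding
print("mean of x-bar:", statistics.fmean(means))                 # should be near 160
print("variance of x-bar:", statistics.variance(means))          # should be near 225 / 36 = 6.25
```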

Conclusion

In this lesson, we explored statistics and parameters, unbiased estimators, the standard error, and the sampling distribution of the sample mean. These concepts are fundamental to making valid inferences from a sample to a population. While we can estimate population parameters from samples, every estimate carries some uncertainty, and the standard error quantifies that uncertainty for the sample mean.

Study Notes

Statistics vs. Parameters: Parameters characterize populations, while statistics characterize samples.
Notation: $\mu$ is the population mean; $\sigma^2$ is the population variance.
Unbiased Estimator: An estimator whose expected value equals the parameter it estimates.
Standard Error: Measures sampling variability; calculated as $SE = \frac{\sigma}{\sqrt{n}}$.
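Sampling Distribution: The sampling distribution of the sample mean is exactly normal for a normal population and approaches a normal distribution for any population with finite variance as the sample size increases (Central Limit Theorem).
Random Sampling: Crucial for valid statistical inference because it reduces selection bias.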