Lesson 6.4: Tests for Proportions and Paired Tests

Introduction

In this lesson, students will learn about hypothesis testing, specifically focusing on tests for proportions and paired tests. By the end of this lesson, students should be able to:

Test the difference between two binomial proportions using a pooled estimate.
Conduct paired tests, including the sign test, Wilcoxon signed-rank test, and paired t-test, and select the appropriate test based on context.
Understand the validity conditions for each paired test and interpret results in context.

Tests for Proportions

Understanding Proportions

A proportion is a ratio that compares a part to a whole, such as the fraction of customers who make a purchase. In hypothesis testing, we often want to know whether two populations have different proportions. To address this, we use a two-proportion z-test based on a pooled estimate of the common proportion.

Hypothesis Testing for Binomial Proportions

When testing the difference between two proportions, we typically formulate the following hypotheses:

Null Hypothesis ($H_0$): There is no difference in proportions between the two groups (i.e., $p_1 = p_2$).
Alternative Hypothesis ($H_a$): There is a difference (i.e., $p_1 \neq p_2$), or specifically $p_1 < p_2$ or $p_1 > p_2$ depending on the context.

Pooled Estimate of Proportion

When the null hypothesis is that the two proportions are equal, we can use a pooled estimate of the proportion ($\hat{p}$) as follows:

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$

where:

$x_1$ = number of successes in sample 1
$x_2$ = number of successes in sample 2
$n_1$ = total number of trials in sample 1
$n_2$ = total number of trials in sample 2

Standard Error of the Difference

The standard error (SE) for the difference between two proportions can be calculated using the formula:

$$SE = \sqrt{\hat{p}(1 - \hat{p}) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$

Z-test for Proportions

To test the difference in proportions, we calculate the Z-test statistic:

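$$Z = \frac{\hat{p}_1 - \hat{p}_2}{SE}$$

where $\hat{p}_1 = x_1 / n_1$ and $\hat{p}_2 = x_2 / n_2$ are the sample proportions. Under the null hypothesis, this statistic follows an approximately standard normal distribution, provided both samples are large enough for the normal approximation to the binomial to hold.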
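The three formulas above (pooled proportion, standard error, and Z statistic) can be combined into one short calculation. The following is a minimal sketch using only the Python standard library; the function name `pooled_two_proportion_z` and the counts in the usage lines are hypothetical and chosen purely for illustration.

```python
import math


def pooled_two_proportion_z(x1, n1, x2, n2):
    """Return (pooled proportion, standard error, z) for H0: p1 = p2."""
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    p1_hat = x1 / n1                      # sample proportion, group 1
    p2_hat = x2 / n2                      # sample proportion, group 2
    p_pooled = (x1 + x2) / (n1 + n2)      # pooled estimate under H0
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("standard error is zero; z is undefined")
    z = (p1_hat - p2_hat) / se
    return p_pooled, se, z


# Hypothetical counts, for illustration only: 45 of 150 versus 60 of 150.
p_pooled, se, z = pooled_two_proportion_z(45, 150, 60, 150)
print(f"pooled = {p_pooled:.4f}, SE = {se:.4f}, z = {z:.3f}")
```

The sign of `z` follows the order of the arguments, so a negative value here means the first group's sample proportion is the smaller one.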

Worked Example: Testing Differences in Proportions

Problem Statement:

Suppose you want to determine whether a new advertising campaign changes the purchase rate relative to the old campaign. In the old campaign, 30 out of 100 customers made a purchase, and in the new campaign, 50 out of 120 customers made a purchase.

Define the hypotheses:

$H_0$: $p_{old} = p_{new}$
$H_a$: $p_{old} \neq p_{new}$

Calculate the pooled proportion:

$$\hat{p} = \frac{30 + 50}{100 + 120} = \frac{80}{220} \approx 0.3636$$

Compute the standard error:

$$SE = \sqrt{0.3636(1 - 0.3636) \left( \frac{1}{100} + \frac{1}{120} \right)} \approx \sqrt{0.3636 \times 0.6364 \left(0.01 + 0.00833 \right)}$$

$$\approx \sqrt{0.3636 \times 0.6364 \times 0.01833} \approx \sqrt{0.004242} \approx 0.0651$$

Calculate the Z-score:

$$Z = \frac{0.30 - 0.4167}{0.0651} = \frac{-0.1167}{0.0651} \approx -1.79$$

Decide on the significance level (e.g., $\alpha = 0.05$) and determine the critical z-value. For a two-tailed test:

Critical z-values are approximately $\pm 1.96$.

Conclusion:

Since $-1.79$ lies between $-1.96$ and $1.96$, it does not fall in the critical region, so we fail to reject the null hypothesis at the 5% significance level. There is not enough evidence to conclude that the purchase rates under the two campaigns differ.
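The same decision can also be reached through a p-value rather than a critical value. The sketch below recomputes the statistic directly from the counts in the problem statement and converts it to a two-sided p-value with `statistics.NormalDist` from the Python standard library. The variable names are illustrative, and the p-value approach is offered as an alternative to the critical-value method used in this lesson.

```python
import math
from statistics import NormalDist

# Counts taken from the problem statement.
x_old, n_old = 30, 100
x_new, n_new = 50, 120

# Pooled proportion and standard error under H0: p_old = p_new.
p_pooled = (x_old + x_new) / (n_old + n_new)
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_old + 1 / n_new))
z = (x_old / n_old - x_new / n_new) / se

# Two-sided p-value: probability of a |Z| at least this large under H0.
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
reject_h0 = p_two_sided < alpha
print(f"z = {z:.3f}, two-sided p = {p_two_sided:.4f}, reject H0: {reject_h0}")
```

Rejecting when the p-value is below $\alpha$ is equivalent to rejecting when $|Z|$ exceeds the two-tailed critical value.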

Paired Tests

Understanding Paired Tests

Paired tests are used when you have two related samples, such as measurements taken on the same subjects before and after an intervention. The key idea is to analyze the differences between the pairs.

Types of Paired Tests

Sign Test: This is the simplest paired test that analyzes the signs of the differences.
Wilcoxon Signed-Rank Test: This test considers both the signs and the ranks of the differences.
Paired t-test: This parametric test is used when the differences between paired observations are normally distributed.
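To make the distinction between the two non-parametric tests concrete, the sketch below computes the quantities each one is built on: the counts of positive and negative differences (sign test) and the sums of ranks of the absolute differences, split by sign (Wilcoxon signed-rank test). It assumes the common conventions of dropping zero differences and assigning tied absolute values their average rank. The function name and the input differences are hypothetical.

```python
def signed_rank_summary(differences):
    """Return sign counts and signed-rank sums for paired differences."""
    # Assumed convention: zero differences carry no sign and are dropped.
    nonzero = [d for d in differences if d != 0]
    n_pos = sum(1 for d in nonzero if d > 0)
    n_neg = len(nonzero) - n_pos

    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    i = 0
    while i < len(order):
        j = i
        while (j + 1 < len(order)
               and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]])):
            j += 1
        average_rank = (i + j) / 2 + 1    # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, nonzero) if d < 0)
    return {"n_pos": n_pos, "n_neg": n_neg,
            "w_plus": w_plus, "w_minus": w_minus}


# Hypothetical differences, for illustration only.
summary = signed_rank_summary([-3, 1, -4, 2, -2, 0])
print(summary)
```

The sign test uses only `n_pos` and `n_neg`, while the Wilcoxon signed-rank test uses `w_plus` and `w_minus`, which also reflect how large each difference is relative to the others.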

Conducting a Paired t-test

Hypotheses: Formulate the hypotheses:

$H_0$: The mean difference between pairs is zero.
$H_a$: The mean difference is not zero.

Calculate differences:

If you have paired observations $(X_i, Y_i)$, compute the differences $D_i = Y_i - X_i$.

Mean and Standard Deviation of Differences:

Mean difference: $$\bar{D} = \frac{\sum D_i}{n}$$
Standard deviation: $$s_D = \sqrt{\frac{\sum (D_i - \bar{D})^2}{n - 1}}$$

t-statistic:

The t-statistic can be calculated as:

$$t = \frac{\bar{D}}{s_D / \sqrt{n}}$$

Degrees of Freedom (df): For a paired t-test, df = $n - 1$ where $n$ is the number of pairs.

Decision: Compare the computed t-statistic against the critical t-value from the t-distribution for the given alpha level and degrees of freedom.
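The steps above can be sketched in a few lines of Python using only the standard library. The function name `paired_t_statistic` and the measurements are hypothetical; the sketch stops at the t-statistic and degrees of freedom, leaving the critical value to be looked up in a t-table.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Return (t, df) for H0: the mean paired difference is zero."""
    if len(before) != len(after):
        raise ValueError("samples must contain the same number of pairs")
    if len(before) < 2:
        raise ValueError("at least two pairs are required")
    diffs = [y - x for x, y in zip(before, after)]   # D_i = Y_i - X_i
    n = len(diffs)
    d_bar = mean(diffs)
    s_d = stdev(diffs)                               # uses the n - 1 denominator
    if s_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = d_bar / (s_d / math.sqrt(n))
    return t, n - 1


# Hypothetical measurements, for illustration only.
before = [12.0, 15.0, 11.0, 14.0, 13.0]
after = [14.0, 15.0, 14.0, 15.0, 17.0]
t, df = paired_t_statistic(before, after)
print(f"t = {t:.3f}, df = {df}")
```

For these illustrative numbers the result would be compared with the two-tailed critical value for 4 degrees of freedom, which is 2.776 at $\alpha = 0.05$.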

Worked Example: Paired t-test

Problem Statement:

A researcher tests a new diet on four participants for one month and records each participant's weight in pounds before and after the diet. The data collected are as follows:

Original weights: 200, 180, 150, 160
Weights after diet: 195, 175, 145, 155

Calculate differences:

$D_1 = 195 - 200 = -5$
$D_2 = 175 - 180 = -5$
$D_3 = 145 - 150 = -5$
$D_4 = 155 - 160 = -5$

Thus, the differences are: -5, -5, -5, -5.

Mean of differences:

$$\bar{D} = \frac{-5 -5 -5 -5}{4} = -5$$

Standard deviation of differences:

$s_D = 0$ (since all differences are the same)

Compute the t-statistic:

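Since $s_D = 0$, the t-statistic is undefined: the formula would require division by zero. Every subject lost exactly 5 pounds, so the sample shows no variability in the differences and the paired t-test cannot be carried out on these data. Identical differences are rare in real measurements; when they occur, a non-parametric test such as the sign test can be used instead, since it relies only on the signs of the differences.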
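As a check on this example, the sketch below confirms that the standard deviation of the differences is zero and then computes an exact two-sided sign test p-value as one possible non-parametric alternative. It uses the weights from the problem statement; the variable names are illustrative, and the sign test here assumes that under $H_0$ each nonzero difference is equally likely to be positive or negative.

```python
from math import comb
from statistics import stdev

# Weights from the problem statement.
before = [200, 180, 150, 160]
after = [195, 175, 145, 155]
diffs = [a - b for b, a in zip(before, after)]   # after minus before

s_d = stdev(diffs)
print(f"s_D = {s_d}")   # zero, so the t-statistic cannot be computed

# Exact sign test: under H0 the number of negative differences is
# Binomial(n, 0.5), where n counts the nonzero differences.
nonzero = [d for d in diffs if d != 0]
n = len(nonzero)
n_neg = sum(1 for d in nonzero if d < 0)
k = min(n_neg, n - n_neg)
p_two_sided = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)
print(f"two-sided sign test p = {p_two_sided}")   # 2 * (1/2)^4 = 0.125
```

With only four pairs, even a perfectly consistent result cannot reach significance at $\alpha = 0.05$ under the two-sided sign test, which illustrates how little power such a small sample provides.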

Conclusion

This lesson introduced students to further hypothesis testing methods focused on proportions and paired samples. Students learned how to test the difference between two proportions using a pooled estimate and how to perform paired tests that assess the differences between related samples. Grasping these concepts is essential for analyzing data that involves comparing groups under different conditions.

Study Notes

Pooled Proportion: Estimate derived from both samples.
Standard Error (SE): Measures the sampling variability of the difference between the two sample proportions.
Z-Test for Proportions: Involves computing a Z statistic to analyze the difference.
Paired Tests: Important for analyzing dependent samples.
Statistical Significance: A significant or non-significant result must be interpreted in the context of the problem.