Lesson 6.5: Non-parametric tests, ANOVA and effect size

Introduction

In this lesson, we will explore non-parametric tests, analysis of variance (ANOVA), and the concept of effect size. By the end of this lesson, you should be able to select the appropriate test for your data, understand the assumptions involved, and interpret results in context. We will begin with non-parametric tests. These are valuable when we cannot assume our data follow a normal distribution, or when data are measured on an ordinal scale. We will then move to ANOVA, which compares means across three or more groups. We finish with effect size, which helps us judge the practical significance of our findings.

Learning Objectives

Understand and apply the Sign and Wilcoxon signed-rank tests for a single population median.
Conduct the Wilcoxon rank-sum (Mann-Whitney) test for differences between two independent samples, including validity conditions.
Perform one-way and two-way (without replication) ANOVA using completely randomized and randomized block designs, with considerations for normality and equal variances.
Define and use effect size, particularly Cohen's d, with standard guideline boundaries for small, medium, and large effects.
Select and carry out the appropriate non-parametric test for single-sample, paired, or two-independent-sample data.
Conduct a one-way or two-way ANOVA, stating assumptions and interpreting the results in context.

Non-parametric Tests

Non-parametric tests do not assume that the data come from a specific distribution such as the normal. They are especially useful for ordinal data, or for small samples where normality is hard to check. Let's look at two common non-parametric tests for a median: the Sign test and the Wilcoxon signed-rank test.

Sign Test

The Sign test is one of the simplest non-parametric tests, used primarily to determine whether the median of a single population differs from a specified value.

Example of the Sign Test

Suppose we want to know whether the median height of students in a class differs from 160 cm. We gather the following height data (in cm): 155, 162, 158, 165, 151.

State the null and alternative hypotheses:

$H_0$: The median height is 160 cm.
$H_1$: The median height is not 160 cm.

Calculate the signs:

The differences from 160 cm are:
155 - 160 = -5 → Sign: -
162 - 160 = 2 → Sign: +
158 - 160 = -2 → Sign: -
165 - 160 = 5 → Sign: +
151 - 160 = -9 → Sign: -
Count the number of positive and negative signs:
Positives: 2, Negatives: 3

Perform the Sign test:

Take the smaller of the two counts as the test statistic: $S = \min(2, 3) = 2$.
Under $H_0$, each sign is + or - with probability 0.5, so the number of + signs follows a binomial distribution with $n=5$ and $p=0.5$.
The two-sided p-value is $2 \times P(X \le 2) = 2 \times \frac{1 + 5 + 10}{32} = 1$.
At a significance level of $\alpha = 0.05$, no critical value exists for $n = 5$. Even the most extreme outcome ($S = 0$) gives a two-sided p-value of $2/32 = 0.0625$.
Since the p-value (1) is far above 0.05, we fail to reject $H_0$. There is not enough evidence to say that the median height differs from 160 cm.
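The binomial calculation above can be checked with a short Python sketch. It uses only the standard library (`math.comb`). The function name `sign_test` is a hypothetical helper of our own. The sketch drops observations exactly equal to the hypothesized median and doubles the smaller tail to get a two-sided p-value; both are common conventions and are assumptions of this sketch.

```python
from math import comb

def sign_test(data, hypothesized_median):
    """Exact two-sided sign test for a single population median (hypothetical helper).

    Returns (number of + signs, number of - signs, two-sided p-value).
    Observations equal to the hypothesized median carry no sign and are dropped.
    """
    diffs = [x - hypothesized_median for x in data if x != hypothesized_median]
    n = len(diffs)
    n_pos = sum(1 for d in diffs if d > 0)
    n_neg = n - n_pos
    s = min(n_pos, n_neg)  # test statistic: the smaller count
    # P(X <= s) for X ~ Binomial(n, 0.5)
    lower_tail = sum(comb(n, k) for k in range(s + 1)) / 2 ** n
    p_value = min(1.0, 2 * lower_tail)  # double the tail for a two-sided test
    return n_pos, n_neg, p_value

heights = [155, 162, 158, 165, 151]
print(sign_test(heights, 160))  # (2, 3, 1.0)
```

Because $P(X \le 2) = 16/32$ exactly, the two-sided p-value comes out as 1.0, matching the hand calculation.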

Wilcoxon Signed-Rank Test

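The Wilcoxon signed-rank test builds on the Sign test by using the magnitudes of the differences as well as their signs. It can test the median of a single population or, as in the example below, the median difference in paired samples. It assumes the differences are symmetric about their median.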

Example of the Wilcoxon Signed-Rank Test

Imagine we want to test if a new teaching method improves student performance. We have scores from 10 students before and after applying the new method:

Before: 50, 55, 45, 60, 70, 80, 55, 65, 75, 85
After: 60, 65, 55, 70, 75, 90, 70, 78, 80, 90

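State the hypotheses, using the differences After - Before:

$H_0$: The median difference is 0.
$H_1$: The median difference is not 0.

Calculate the differences (After - Before):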

10, 10, 10, 10, 5, 10, 15, 13, 5, 5

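Rank the absolute differences (disregarding sign). Tied values receive the average of the ranks they would occupy:

The three differences of 5 occupy ranks 1 to 3, so each gets rank 2. The five differences of 10 occupy ranks 4 to 8, so each gets rank 6. The difference of 13 gets rank 9, and 15 gets rank 10.
Ranks (in the original order): 6, 6, 6, 6, 2, 6, 10, 9, 2, 2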

Assign the signs back to the ranks:

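All differences are positive, so every rank keeps a + sign. The ranks total $3 \times 2 + 5 \times 6 + 9 + 10 = 55$, which equals $n(n+1)/2$ for $n = 10$.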

Calculate the test statistic $W$:

$W = W^+ = $ Sum of ranks of positive signs = 55. The sum of ranks of negative signs is $W^- = 0$.
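The ranking with ties and the two rank sums can be sketched in Python as follows. The helpers `average_ranks` and `signed_rank_sums` are hypothetical names. Dropping zero differences is the usual convention and an assumption here; none occur in this example.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with the one at sorted position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def signed_rank_sums(before, after):
    """Return the tie-averaged ranks of |After - Before|, W+ and W-."""
    diffs = [a - b for a, b in zip(after, before) if a != b]  # zero differences dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return ranks, w_plus, w_minus

before = [50, 55, 45, 60, 70, 80, 55, 65, 75, 85]
after = [60, 65, 55, 70, 75, 90, 70, 78, 80, 90]
ranks, w_plus, w_minus = signed_rank_sums(before, after)
print(ranks)                              # [6.0, 6.0, 6.0, 6.0, 2.0, 6.0, 10.0, 9.0, 2.0, 2.0]
print(f"W+ = {w_plus}, W- = {w_minus}")   # W+ = 55.0, W- = 0
```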

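Determine statistical significance:

Compare $W$ against critical values in Wilcoxon tables or calculate a p-value.
Most tables are built for the smaller rank sum, $T = \min(W^+, W^-) = 0$, and you reject $H_0$ when $T$ is less than or equal to the critical value. For $n = 10$ and a two-sided $\alpha = 0.05$, the critical value is 8.
Since $T = 0 \le 8$, we reject $H_0$. There is enough evidence to conclude that scores were higher after the new teaching method was applied.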
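Instead of a table, the exact p-value can be found by brute force for small $n$. Under $H_0$, each rank is equally likely to carry a + or - sign, so all $2^n$ sign patterns are equally likely. The sketch below enumerates them, conditional on the tie-averaged ranks, and counts the patterns whose $W^+$ is at least as far from its null mean (half the total of the ranks) as the observed value. The function name is hypothetical, and this is an illustration rather than how any particular package computes the test. Statistical software such as `scipy.stats.wilcoxon` may switch to a normal approximation when ranks are tied, so its p-value can differ slightly.

```python
from itertools import product

def signed_rank_exact_p(ranks, w_plus):
    """Exact two-sided p-value by enumerating all 2**n sign patterns (small n only)."""
    n = len(ranks)
    center = sum(ranks) / 2          # mean of W+ under H0
    observed = abs(w_plus - center)
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)  # ranks carrying a + sign
        if abs(w - center) >= observed - 1e-9:         # tolerance for float ranks
            extreme += 1
    return extreme / 2 ** n

ranks = [6, 6, 6, 6, 2, 6, 10, 9, 2, 2]  # tie-averaged ranks from the example
print(signed_rank_exact_p(ranks, 55))    # 0.001953125
```

Only the all-positive pattern ($W^+ = 55$) and the all-negative pattern ($W^+ = 0$) are as extreme as the data. The p-value is therefore $2/1024 \approx 0.002$, consistent with rejecting $H_0$.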

Analysis of Variance (ANOVA)

ANOVA is used to test for differences between the means of three or more independent groups. It helps us determine whether at least one group mean differs from the others.

One-Way ANOVA

One-way ANOVA assesses a single factor (independent variable) whose levels define the groups. The major assumptions for ANOVA are:

Independence of observations
Approximate normality of the response within each group
Equality of variance across groups (homogeneity of variance)

Example of One-Way ANOVA

Suppose we want to test whether three different diets lead to different mean weight loss:

$H_0$: $\mu_A = \mu_B = \mu_C$
$H_1$: At least one diet's mean weight loss differs from the others.

Group A (Diet 1): 2, 3, 4
Group B (Diet 2): 6, 7, 5
Group C (Diet 3): 8, 9, 7

Calculate the group means:

$\bar{x}_A = \frac{2+3+4}{3} = 3$
$\bar{x}_B = \frac{6+7+5}{3} = 6$
$\bar{x}_C = \frac{8+9+7}{3} = 8$

Calculate the overall mean:

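Overall mean: $\bar{x} = \frac{3 + 6 + 8}{3} = \frac{17}{3} \approx 5.67$. Averaging the group means gives the overall mean here only because every group has the same size. In general, average all $N$ observations: $\frac{51}{9} \approx 5.67$.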

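Calculate the Sum of Squares Between (SSB) and Sum of Squares Within (SSW):

$SSB = n(\bar{x}_A - \bar{x})^2 + n(\bar{x}_B - \bar{x})^2 + n(\bar{x}_C - \bar{x})^2$, where $n = 3$ is the number of observations per group:
$SSB = 3\left[\left(-\frac{8}{3}\right)^2 + \left(\frac{1}{3}\right)^2 + \left(\frac{7}{3}\right)^2\right] = 3 \cdot \frac{114}{9} = 38$
$SSW = \sum_i \sum_j (x_{ij}-\bar{x}_i)^2$, the squared deviations of each observation $x_{ij}$ from its own group mean $\bar{x}_i$:
$SSW = (1 + 0 + 1) + (0 + 1 + 1) + (0 + 1 + 1) = 6$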
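The sums of squares above can be reproduced with the sketch below. It computes the grand mean from all observations rather than from the group means, so it also works when group sizes differ. `one_way_sums_of_squares` is a hypothetical helper name.

```python
def one_way_sums_of_squares(groups):
    """Return (SSB, SSW) for a list of groups; group sizes may differ."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)  # mean of all N observations
    ssb = 0.0
    ssw = 0.0
    for g in groups:
        mean_g = sum(g) / len(g)
        ssb += len(g) * (mean_g - grand_mean) ** 2  # between-group part
        ssw += sum((x - mean_g) ** 2 for x in g)    # within-group part
    return ssb, ssw

diets = [[2, 3, 4], [6, 7, 5], [8, 9, 7]]
ssb, ssw = one_way_sums_of_squares(diets)
print(round(ssb, 6), round(ssw, 6))  # 38.0 6.0
```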

Calculate the F-statistic:

$F = \frac{SSB/(k-1)}{SSW/(N-k)} = \frac{38/2}{6/6} = 19$, where $k = 3$ is the number of groups and $N = 9$ is the total number of observations.

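Compare with the critical value from an F-distribution table:

For $\alpha = 0.05$ with $(k-1, N-k) = (2, 6)$ degrees of freedom, the critical value is about 5.14. The diet data give $F = 19$, which is greater than the critical value, so we reject $H_0$. At least one diet's mean weight loss differs from the others. ANOVA alone does not tell us which diets differ.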
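For three groups the numerator has 2 degrees of freedom. In that case the F-distribution has a closed-form tail probability, $P(F > f) = (1 + 2f/d_2)^{-d_2/2}$, where $d_2 = N - k$. This lets us reproduce the critical value and p-value without a table. The sketch below relies on that special case only, and the helper name is hypothetical. For other numbers of groups, use an F-table or a library routine such as `scipy.stats.f.ppf` and `scipy.stats.f.sf`.

```python
def f_test_two_numerator_df(ssb, ssw, k, n_total, alpha=0.05):
    """F statistic, critical value and p-value for a one-way ANOVA with three groups.

    The closed forms hold only when k - 1 == 2, where the F(2, d2) tail
    probability is (1 + 2f/d2) ** (-d2/2).
    """
    df_between = k - 1
    df_within = n_total - k
    if df_between != 2:
        raise ValueError("closed form requires exactly three groups")
    f_stat = (ssb / df_between) / (ssw / df_within)
    p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)
    # solve alpha = (1 + 2c/d2) ** (-d2/2) for the critical value c
    f_crit = (df_within / 2) * (alpha ** (-2 / df_within) - 1)
    return f_stat, f_crit, p_value

f_stat, f_crit, p = f_test_two_numerator_df(38.0, 6.0, k=3, n_total=9)
print(f"F = {f_stat:.2f}, critical F = {f_crit:.2f}, p = {p:.4f}")
# F = 19.00, critical F = 5.14, p = 0.0025
```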

Two-Way ANOVA

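Two-way ANOVA involves two factors (independent variables). It assesses their individual (main) effects on a dependent variable and, when the design allows it, their interaction effect. The analysis is more complex than one-way ANOVA. In return, it studies both factors in one experiment and shows whether the effect of one factor depends on the level of the other.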

Example of Two-Way ANOVA

Imagine we want to understand the effects of both diet and exercise on weight loss.

Factor A (Diet): Low-Fat, Low-Carb
Factor B (Exercise): None, Moderate, Intense
We record weight loss for each combination of diet and exercise level (2 × 3 = 6 cells).

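Outline hypotheses (one pair for each effect):

$H_0^{A}$: Diet has no effect on mean weight loss. $H_1^{A}$: Mean weight loss differs between the diets.
$H_0^{B}$: Exercise level has no effect on mean weight loss. $H_1^{B}$: Mean weight loss differs between exercise levels.
$H_0^{AB}$: There is no interaction between diet and exercise. $H_1^{AB}$: The effect of diet depends on the exercise level.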

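Calculate the sums of squares for factors A and B, their interaction, and the error:

$SS_A, SS_B, SS_{AB}, SS_E$

Testing the interaction requires more than one observation per cell (replication). In a two-way ANOVA without replication, such as a randomized block design, the interaction cannot be separated from the error, and only the two main effects are tested.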

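Calculate the F-statistics and compare them with critical values:

Divide each sum of squares by its degrees of freedom to obtain a mean square, e.g. $MS_A = SS_A/(a-1)$, where $a$ is the number of levels of factor A.
Divide each effect's mean square by the error mean square to obtain its F-statistic, e.g. $F_A = MS_A / MS_E$.
Compare each F with the critical value of the F-distribution at (effect df, error df). Evaluate significance for both main effects and, when it can be tested, the interaction.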
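For the case named in the learning objectives, two-way ANOVA without replication (one observation per cell, as in a randomized block design), the decomposition can be sketched as follows. The function name and the table layout (rows are levels of factor A, columns are levels of factor B) are assumptions of this sketch. With one observation per cell, the residual sum of squares serves as the error term, and the interaction is absorbed into it.

```python
def two_way_no_replication(table):
    """Two-way ANOVA with one observation per cell (hypothetical helper).

    table[i][j] is the response for level i of factor A (rows) and level j
    of factor B (columns). Returns (F_A, (df_A, df_E), F_B, (df_B, df_E)).
    """
    a, b = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (a * b)
    row_means = [sum(row) / b for row in table]
    col_means = [sum(table[i][j] for i in range(a)) / a for j in range(b)]
    ss_a = b * sum((m - grand) ** 2 for m in row_means)  # factor A (rows)
    ss_b = a * sum((m - grand) ** 2 for m in col_means)  # factor B (columns)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_a - ss_b  # residual; includes any interaction
    df_a, df_b, df_e = a - 1, b - 1, (a - 1) * (b - 1)
    ms_error = ss_error / df_e
    f_a = (ss_a / df_a) / ms_error
    f_b = (ss_b / df_b) / ms_error
    return f_a, (df_a, df_e), f_b, (df_b, df_e)
```

Each returned F ratio is then compared with the critical value of the F-distribution at the paired degrees of freedom.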

Effect Size

A significance test tells us whether an observed effect is unlikely to have arisen by chance alone. An effect size quantifies how large that effect is. Cohen's d is a widely used measure of effect size.

Cohen's d

Cohen's d expresses the difference between two means in units of their pooled standard deviation. This gives context for interpreting statistical significance: with a large enough sample, even a tiny difference can be statistically significant.

Definition

Cohen's d is defined as:

$$d = \frac{M_1 - M_2}{SD}$$

where:

$M_1$ = mean of group 1
$M_2$ = mean of group 2
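$SD$ = pooled standard deviation, $SD = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$, where $n_1, n_2$ are the group sizes and $s_1, s_2$ are the group standard deviations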
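A minimal sketch of this definition uses Python's `statistics` module, whose `variance` function returns the sample variance (denominator $n-1$). The helper name `cohens_d` is hypothetical.

```python
from statistics import mean, variance

def cohens_d(group1, group2):
    """Cohen's d: (M1 - M2) divided by the pooled sample standard deviation."""
    n1, n2 = len(group1), len(group2)
    # pooled variance: each sample variance weighted by its degrees of freedom
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / pooled_var ** 0.5

print(cohens_d([2, 3, 4], [6, 7, 5]))  # -3.0 (Diet 1 vs Diet 2 from the ANOVA example)
```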

Guideline Boundaries

Small effect: $0.2$
Medium effect: $0.5$
Large effect: $0.8$
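These boundaries apply to the magnitude $|d|$. The sketch below labels a value of d. Treating each boundary as the lower edge of its category is an assumption of this sketch: Cohen offered the values as rough benchmarks, not strict cut-offs. The helper name is hypothetical.

```python
def describe_effect(d):
    """Label |d| against the guideline boundaries 0.2, 0.5 and 0.8."""
    size = abs(d)  # the sign only shows which group has the larger mean
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below the small-effect boundary"

print(describe_effect(-3.0))  # large
```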

Example of Cohen's d

For our previous diet example:

Diet 1 (mean 3, SD 1) vs Diet 2 (mean 6, SD 1). Both samples have a standard deviation of 1, so the pooled SD is also 1.

$$d = \frac{3 - 6}{1} = -3$$

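The magnitude $|d| = 3$ far exceeds the large-effect boundary of 0.8, so the difference between the two diets is large in practical terms. The negative sign only reflects that Diet 2, listed second, has the higher mean. Effect size describes magnitude, not statistical significance. With only three observations per group, this estimate is also imprecise.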

Conclusion

In this lesson, we have tackled three advanced topics in hypothesis testing: non-parametric tests, ANOVA, and effect size. Non-parametric tests let you analyze data without assuming a normal distribution, which gives you flexibility with ordinal data and small samples. ANOVA extends the comparison of means to three or more groups and to two factors at once, provided its assumptions of independence, normality, and equal variances hold. Reporting an effect size alongside a test result shows both the statistical significance and the practical importance of your findings. Always check the assumptions of each test before proceeding, and interpret your results in context.

Study Notes

Non-parametric tests do not assume normality and can be used with ordinal data.
The Sign test checks a single population median using only the signs of the differences. The Wilcoxon signed-rank test also uses their magnitudes and applies to a single sample or to paired samples.
ANOVA is useful for comparing means across three or more groups.
Always check assumptions before performing ANOVA: independence, normality within groups, and equal variances.
Cohen's d provides a measure of effect size that helps you judge the practical significance of findings.
Use appropriate statistical software or tables to obtain p-values or critical values during hypothesis testing.