Lesson 6.2: Chi-squared Goodness-of-Fit Tests

Introduction

A goodness-of-fit test assesses how well sample data agree with a theoretical distribution. In this lesson, we focus on the chi-squared goodness-of-fit test, a statistical method used to evaluate whether observed frequencies in categorical data match the expected frequencies derived from a specific distribution, such as the binomial, Poisson, normal, or exponential distribution. For continuous distributions such as the normal or exponential, the data must first be grouped into intervals so that each interval can be treated as a category. This lesson will help students understand how to conduct a chi-squared goodness-of-fit test, calculate expected frequencies, determine degrees of freedom, and interpret results in context.

Learning Objectives

By the end of this lesson, students will be able to:

Conduct a chi-squared goodness-of-fit test for various distributions.
Calculate expected frequencies for proposed distributions.
Determine degrees of freedom and carry out the goodness-of-fit test.
State hypotheses and conclusions based on test results.

What is the Chi-squared Goodness-of-Fit Test?

The chi-squared goodness-of-fit test is used when you have categorical data and want to see if the frequency distribution of a variable conforms to a specific distribution. This test is based on the chi-squared statistic, which measures the discrepancy between observed and expected frequencies. The formula for the chi-squared statistic is given by:

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$

Where:

$O_i$ is the observed frequency for each category.
$E_i$ is the expected frequency for each category.
The summation is across all categories.
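
As a minimal sketch of the formula above, the statistic can be computed in a few lines of Python. The function name `chi_squared_statistic` and the three-category counts are illustrative assumptions, not part of the lesson.

```python
def chi_squared_statistic(observed, expected):
    """Sum (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories, each expected 20 times.
print(chi_squared_statistic([18, 22, 20], [20, 20, 20]))
```

Each category contributes its squared deviation scaled by its expected count, so a deviation of 2 matters more when only 5 observations were expected than when 50 were.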

Requirements for the Test

Before conducting a chi-squared goodness-of-fit test, a few conditions must be met:

The data should be in frequency counts (not percentages or proportions).
The categories or groups should be mutually exclusive.
Each expected frequency ($E_i$) should ideally be at least 5. If any expected frequency is less than 5, you should consider combining categories.
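
One possible way to combine categories is sketched below. It assumes the categories are ordered (for example, counts of 0, 1, 2, ... events), so that merging neighbours is meaningful. The function name `combine_small_categories`, the greedy left-to-right strategy, and the sample numbers are all assumptions made for illustration; other groupings may be equally valid.

```python
def combine_small_categories(observed, expected, minimum=5.0):
    """Greedily merge adjacent categories until every expected count >= minimum."""
    merged_obs, merged_exp = [], []
    run_obs, run_exp = 0, 0.0
    for o, e in zip(observed, expected):
        run_obs += o
        run_exp += e
        if run_exp >= minimum:
            # The running group is large enough: close it and start a new one.
            merged_obs.append(run_obs)
            merged_exp.append(run_exp)
            run_obs, run_exp = 0, 0.0
    if run_obs or run_exp:
        # A leftover tail that never reached the minimum joins the last group.
        if merged_exp:
            merged_obs[-1] += run_obs
            merged_exp[-1] += run_exp
        else:
            merged_obs.append(run_obs)
            merged_exp.append(run_exp)
    return merged_obs, merged_exp


# Hypothetical data: the first and the last two expected counts are below 5.
obs, exp = combine_small_categories([3, 9, 14, 10, 3, 1],
                                    [2.5, 8.0, 14.0, 10.5, 4.0, 1.0])
print(obs, exp)
```

Merging leaves the totals unchanged but reduces the number of categories, so the degrees of freedom must be based on the number of categories after combining.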

Step-by-step Guide to Performing a Chi-squared Goodness-of-Fit Test

To effectively conduct a chi-squared goodness-of-fit test, follow these steps:

Step 1: State the Hypotheses

You need to define two hypotheses:

Null Hypothesis ($H_0$): The data follow the specified distribution (e.g., the variable follows a binomial distribution).
Alternative Hypothesis ($H_a$): The data do not follow the specified distribution.

Example

Suppose we want to test if a die is fair. We roll the die 60 times and get the following results:

1: 10 times
2: 8 times
3: 12 times
4: 15 times
5: 7 times
6: 8 times

We expect each number to appear 10 times if the die is fair, making the expected frequency for each side $E_i = 10$. So, our hypotheses are:

$H_0$: The die is fair.
$H_a$: The die is not fair.

Step 2: Calculate the Expected Frequencies

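Expected frequencies are calculated from the hypothesized distribution as $E_i = n p_i$, where $n$ is the total number of observations and $p_i$ is the probability of category $i$ under $H_0$. In our die example, a fair die gives each face probability $\frac{1}{6}$ and $n = 60$, so the expected frequency for each side is $60 \times \frac{1}{6} = 10$: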

$E_1 = E_2 = E_3 = E_4 = E_5 = E_6 = 10$.
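
The same calculation can be sketched in Python by multiplying the total number of rolls by each hypothesized probability. The helper name `expected_frequencies` is an assumption made for this example.

```python
import math


def expected_frequencies(total, probabilities):
    """Return total * p for each hypothesized category probability p."""
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    return [total * p for p in probabilities]


observed = [10, 8, 12, 15, 7, 8]   # die rolls from the example
fair_die = [1 / 6] * 6             # each face equally likely under H0
expected = expected_frequencies(sum(observed), fair_die)
print(expected)
```

For a distribution other than the uniform one, only the list of probabilities changes; the rest of the calculation stays the same.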

Step 3: Compute the Chi-squared Statistic

Using the formula mentioned earlier, we can calculate the chi-squared statistic for our die:

$$\chi^2 = \frac{(10 - 10)^2}{10} + \frac{(8 - 10)^2}{10} + \frac{(12 - 10)^2}{10} + \frac{(15 - 10)^2}{10} + \frac{(7 - 10)^2}{10} + \frac{(8 - 10)^2}{10}$$

Calculating each term gives:

First term: $0$
Second term: $\frac{(8 - 10)^2}{10} = \frac{4}{10} = 0.4$
Third term: $\frac{(12 - 10)^2}{10} = \frac{4}{10} = 0.4$
Fourth term: $\frac{(15 - 10)^2}{10} = \frac{25}{10} = 2.5$
Fifth term: $\frac{(7 - 10)^2}{10} = \frac{9}{10} = 0.9$
Sixth term: $\frac{(8 - 10)^2}{10} = \frac{4}{10} = 0.4$

Thus, our chi-squared statistic becomes:

$$\chi^2 = 0 + 0.4 + 0.4 + 2.5 + 0.9 + 0.4 = 4.6$$
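
The hand calculation above can be checked with a short Python sketch that keeps the per-face terms separate. The variable names are illustrative.

```python
observed = [10, 8, 12, 15, 7, 8]   # counts for faces 1 to 6
expected = [10] * 6                # fair die, 60 rolls

# One term (O_i - E_i)^2 / E_i per face.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
statistic = sum(contributions)

for face, term in enumerate(contributions, start=1):
    print(f"face {face}: {term:.1f}")
print(f"chi-squared = {statistic:.1f}")
```

Keeping the terms separate shows which categories drive the statistic: here face 4 contributes 2.5 of the total 4.6.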

Step 4: Determine Degrees of Freedom

Degrees of freedom for a chi-squared goodness-of-fit test are calculated as:

$$\text{Degrees of Freedom} = k - 1$$

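Where $k$ is the number of categories. This formula applies when the hypothesized distribution is fully specified in advance, as with the fair die; if parameters of the distribution (such as a Poisson mean) are estimated from the sample, one further degree of freedom is subtracted for each estimated parameter. In our example, the die has 6 faces, so: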

$$\text{Degrees of Freedom} = 6 - 1 = 5$$

Step 5: Compare with Chi-squared Distribution

Using a chi-squared distribution table, or a calculator, find the critical value for $\chi^2$ at the desired significance level (e.g., 0.05) and with the calculated degrees of freedom:

For 5 degrees of freedom and a significance level of $\alpha = 0.05$, the critical value of $\chi^2$ is approximately 11.070.
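
When no table is at hand, the critical value can be approximated numerically. The sketch below uses only the Python standard library: it evaluates the chi-squared cumulative distribution function through the standard series for the regularized lower incomplete gamma function and then finds the critical value by bisection. The function names `chi2_cdf` and `chi2_critical_value` are assumptions for this example, and the series is intended for moderate values such as those in this lesson; a statistics library would normally be the more robust choice.

```python
import math


def chi2_cdf(x, df):
    """P(X <= x) for a chi-squared variable with df degrees of freedom."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    # Series: gamma(a, z) = z^a e^(-z) * sum_n z^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_critical_value(alpha, df):
    """Value c such that P(X > c) = alpha, found by bisection."""
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < target:   # expand until the target is bracketed
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(chi2_critical_value(0.05, 5), 3))   # compare with the table value
print(round(1.0 - chi2_cdf(4.6, 5), 3))         # upper-tail area beyond 4.6
```

The second printed value is the upper-tail probability beyond the die statistic of 4.6, often called the p-value; comparing it with the significance level leads to the same decision as comparing the statistic with the critical value.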

Step 6: Make a Decision

Since the calculated chi-squared statistic (4.6) is less than the critical value (11.070), we do not reject the null hypothesis at the 5% significance level. Thus, our conclusion is:
There is not enough evidence to suggest that the die is unfair.
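
The decision rule can be sketched as a small helper that takes the critical value as read from a table. The name `goodness_of_fit_decision` and its return format are hypothetical choices for this example.

```python
def goodness_of_fit_decision(observed, expected, critical_value):
    """Return the chi-squared statistic and whether H0 is rejected."""
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    reject = statistic > critical_value
    return statistic, reject


statistic, reject = goodness_of_fit_decision(
    observed=[10, 8, 12, 15, 7, 8],
    expected=[10] * 6,
    critical_value=11.070,   # table value for 5 degrees of freedom at the 0.05 level
)
if reject:
    print(f"chi-squared = {statistic:.1f}: reject H0")
else:
    print(f"chi-squared = {statistic:.1f}: do not reject H0")
```

The helper reports only whether the null hypothesis is rejected; failing to reject is not the same as showing that the die is fair.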

Step 7: Interpret the Results in Context

In the context of our problem, the data do not provide sufficient evidence to conclude that the die is unfair. The observed frequencies could plausibly arise by chance from a fair die. Note that this does not prove the die is fair; it only means that these 60 rolls are consistent with fairness.

Common Misconceptions

Interpreting the chi-squared statistic: A higher chi-squared value indicates a poorer fit, but it is the comparison with the critical value that determines whether the null hypothesis is rejected. The null hypothesis is never "accepted"; we either reject it or fail to reject it.
Sample size: The chi-squared test requires a sample large enough for the expected frequencies to reach at least 5, so that the chi-squared approximation is valid; small samples can lead to misleading results.
Mutually exclusive categories: Categories must be mutually exclusive; overlapping categories cannot be analyzed using this test.

Conclusion

The chi-squared goodness-of-fit test is a widely used tool for determining whether observed categorical data fit a specified distribution. By following the outlined steps and understanding the underlying concepts, students can confidently conduct such tests and interpret their results. Remember that stating the hypotheses correctly, calculating expected frequencies, determining the degrees of freedom, and referencing the chi-squared distribution are all critical for an accurate analysis.

Study Notes

The chi-squared goodness-of-fit test assesses the fit of observed frequencies to expected frequencies.
The chi-squared statistic is calculated using the formula: $$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$
Degrees of freedom are calculated as: $k - 1$ where $k$ is the number of categories.
A critical value from the chi-squared distribution is necessary to make decisions about the null hypothesis.
Ensure expected frequencies are at least 5; combine categories if necessary for valid results.