Lesson 5.2: Quartiles and the Interquartile Range
Introduction
In statistics, understanding how data is distributed is essential for interpreting averages accurately. Two datasets can share the same mean, yet behave very differently if their data values are spread out over different ranges. In this unit, we will delve into the concepts of quartiles and the interquartile range (IQR), which are key measures of spread within a dataset.
Learning Objectives
- Splitting ordered data into quarters to find the lower and upper quartiles.
- Understanding the interquartile range (IQR) as the spread of the middle half of the data.
- Learning to read quartiles from cumulative frequency distributions.
- Discovering why the interquartile range is more robust to extreme values than the range.
What are Quartiles?
Quartiles are values that divide a dataset into four equal parts. When a dataset is ordered from smallest to largest, you can calculate three quartiles:
- First Quartile (Q1): the value at the 25th percentile, so roughly 25% of the data falls below this point.
- Second Quartile (Q2): also known as the median, the value at the 50th percentile, which splits the dataset into two equal halves.
- Third Quartile (Q3): the value at the 75th percentile, so roughly 75% of the data falls below this point.
To find these quartiles, follow these steps:
- Order the data from smallest to largest.
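- Locate the median (Q2). With an odd number of observations it is the middle value; with an even number it is the average of the two middle values.
- Determine Q1 by finding the median of the lower half of the data.
- Determine Q3 by finding the median of the upper half of the data.
When the number of observations is odd, conventions differ on where the median goes. This lesson includes it in both halves (as in the 1, 2, 3, 4, 100 example below). Many textbooks exclude it instead, which can give slightly different quartiles.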
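The four steps above can be sketched in Python. This is a minimal sketch, not a standard library routine: the function name `quartiles` is our own, and it assumes that for an odd number of values the median is placed in both halves (the convention that gives Q1 = 2 and Q3 = 4 for the dataset 1, 2, 3, 4, 100 used later).

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumption: when the number of values is odd, the median is kept in
    BOTH halves. Other textbooks drop it, which can change Q1 and Q3.
    """
    data = sorted(values)  # step 1: order the data
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    q2 = median(data)      # step 2: the median
    half = n // 2
    if n % 2 == 0:
        lower, upper = data[:half], data[half:]
    else:
        # odd n: data[half] is the median, so include it in each half
        lower, upper = data[:half + 1], data[half:]
    return median(lower), q2, median(upper)  # steps 3 and 4


print(quartiles([3, 7, 8, 12, 13, 14, 18, 21, 23, 26]))  # (8, 13.5, 21)
```

Python's `statistics.quantiles` also exists, but it interpolates between values, so it need not match the hand method taught here.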
Example Calculation of Quartiles
Let's start with an example dataset: 3, 7, 8, 12, 13, 14, 18, 21, 23, 26.
- Order the data: it is already in ascending order.
- Calculate Q2: there are 10 values, so the median is the average of the 5th and 6th values:
$$ Q2 = \frac{13 + 14}{2} = 13.5 $$
- Calculate Q1: The lower half of the data is 3, 7, 8, 12, 13. The median of this is the 3rd number, which is 8.
$$ Q1 = 8 $$
- Calculate Q3: The upper half of the data is 14, 18, 21, 23, 26. The median of this is the 3rd number, which is 21.
$$ Q3 = 21 $$
Thus, for our dataset, we find:
- First Quartile (Q1) = 8
- Second Quartile (Q2) = 13.5
- Third Quartile (Q3) = 21
The Interquartile Range (IQR)
The interquartile range is a measure of variability that describes the range within which the central 50% of data points fall. It is calculated as the difference between the third quartile (Q3) and the first quartile (Q1):
$$ IQR = Q3 - Q1 $$
Example Calculation of the IQR
Using the quartiles we calculated in the previous example:
- Q1 = 8
- Q3 = 21
So, the IQR would be:
$$ IQR = 21 - 8 = 13 $$
This means that the middle half of the data values spans 13 units. The IQR is a useful tool because it is less affected by extreme values (outliers) than other measures of spread, such as the range.
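To see the comparison with the range in numbers, here is a small Python sketch that recomputes the IQR for the example dataset and puts it beside the range (26 - 3 = 23). The variable names are illustrative only.

```python
from statistics import median

data = [3, 7, 8, 12, 13, 14, 18, 21, 23, 26]  # already ordered, 10 values
half = len(data) // 2                          # 5 values in each half

q1 = median(data[:half])            # median of the lower half
q3 = median(data[half:])            # median of the upper half
iqr = q3 - q1                       # spread of the middle half
data_range = max(data) - min(data)  # spread of all the values

print(iqr, data_range)  # 13 23
```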
Why Use the IQR?
The interquartile range is particularly useful when a dataset contains outliers, because it measures the spread of the typical values rather than the distance between the two most extreme ones. To illustrate:
- Consider the dataset 1, 2, 3, 4, 100. The mean, 22, is pulled far above four of the five values by the outlier (100), and the range is 99 (100 - 1). The IQR is much less influenced by this extreme value. Including the median (3) in both halves:
- Q1 = 2 (median of the lower half 1, 2, 3)
- Q3 = 4 (median of the upper half 3, 4, 100)
- IQR = 4 - 2 = 2
In this case, the IQR tells us that the spread of the middle half of the data is quite small, suggesting that most values are clustered together, despite the large outlier.
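One way to check this robustness is to recompute the mean, range and IQR after deleting the outlier. The Python sketch below does that for 1, 2, 3, 4, 100 and for 1, 2, 3, 4. The helper `iqr` is our own and assumes the same convention as above (for an odd count, the median is included in both halves).

```python
from statistics import mean, median


def iqr(values):
    """IQR by the median-of-halves method.

    Assumption: for an odd count the median is included in both halves,
    which gives Q1 = 2 and Q3 = 4 for 1, 2, 3, 4, 100.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half + len(data) % 2]  # also takes the median when n is odd
    upper = data[half:]
    return median(upper) - median(lower)


with_outlier = [1, 2, 3, 4, 100]
without_outlier = [1, 2, 3, 4]

for data in (with_outlier, without_outlier):
    # mean, range, IQR
    print(mean(data), max(data) - min(data), iqr(data))
# 22 99 2
# 2.5 3 2.0
```

Removing the single outlier moves the mean from 22 to 2.5 and the range from 99 to 3, while the IQR stays at 2.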
Reading Quartiles from Cumulative Frequency
Cumulative frequency can also be a helpful tool in finding quartiles, especially when dealing with large datasets.
- Create a cumulative frequency table from the ordered dataset.
- Use the following positions to find Q1, Q2, and Q3. In each case, the quartile is the first value whose cumulative frequency reaches or exceeds the position:
- For Q1, use position $ \frac{N}{4} $.
- For Q2, use position $ \frac{N}{2} $.
- For Q3, use position $ \frac{3N}{4} $.
Where $ N $ is the total number of observations.
Example of Cumulative Frequency
Consider the following dataset: 2, 3, 5, 7, 8, 8, 9, 10, 12.
First, calculate the cumulative frequency:
- At 2: 1
- At 3: 2
- At 5: 3
- At 7: 4
- At 8: 6
- At 9: 7
- At 10: 8
- At 12: 9
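The table above can be built mechanically: count each distinct value, sort the distinct values, then keep a running total of the counts. This Python sketch uses `collections.Counter` and `itertools.accumulate`; the function name `cumulative_frequency` is hypothetical.

```python
from collections import Counter
from itertools import accumulate


def cumulative_frequency(values):
    """Return (value, cumulative frequency) pairs in ascending order of value."""
    counts = Counter(values)                            # frequency of each distinct value
    distinct = sorted(counts)                           # distinct values, smallest first
    running = accumulate(counts[v] for v in distinct)   # running totals of the counts
    return list(zip(distinct, running))


for value, cf in cumulative_frequency([2, 3, 5, 7, 8, 8, 9, 10, 12]):
    print(f"At {value}: {cf}")
```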
Now, we have:
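- Total observations $ N = 9 $
- For Q1, the position is $ \frac{9}{4} = 2.25 $. The cumulative frequency at 3 is only 2, so the first value whose cumulative frequency reaches 2.25 is 5 (cumulative frequency 3).
- For Q2, the position is $ \frac{9}{2} = 4.5 $. The cumulative frequency at 7 is only 4, so the first value whose cumulative frequency reaches 4.5 is 8 (cumulative frequency 6).
- For Q3, the position is $ \frac{3 \times 9}{4} = 6.75 $. The cumulative frequency at 8 is only 6, so the first value whose cumulative frequency reaches 6.75 is 9 (cumulative frequency 7).
Thus, we find:
- Q1 = 5
- Q2 = 8
- Q3 = 9
As a check, 8 is the 5th of the 9 ordered values, which is exactly the median.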
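The reading rule for this example (take the first value whose cumulative frequency is at least $ \frac{N}{4} $, $ \frac{N}{2} $ or $ \frac{3N}{4} $) can be sketched in Python with the standard `bisect` module. The function name `read_quartile` and the list-of-pairs layout for the table are our own assumptions. The table is typed in from the cumulative frequencies listed for this dataset.

```python
from bisect import bisect_left


def read_quartile(cum_table, position):
    """Return the first value whose cumulative frequency is >= position.

    cum_table is a list of (value, cumulative_frequency) pairs, sorted by
    value, so the cumulative frequencies are in ascending order too.
    """
    cum_freqs = [cf for _, cf in cum_table]
    i = bisect_left(cum_freqs, position)  # first index with cf >= position
    return cum_table[i][0]


table = [(2, 1), (3, 2), (5, 3), (7, 4), (8, 6), (9, 7), (10, 8), (12, 9)]
n = table[-1][1]  # total observations, N = 9

print([read_quartile(table, k * n / 4) for k in (1, 2, 3)])  # [5, 8, 9]
```

`bisect_left` performs the "first cumulative frequency that reaches the position" search with a binary search, which only matters for long tables; a simple loop would give the same answers.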
Conclusion
In summary, quartiles and the interquartile range are crucial measures in understanding the spread and variability of datasets. The quartiles give us insights into how the data is distributed by identifying key thresholds, whereas the IQR helps to focus on the central portion of the data while minimizing the influence of extreme values. This unit has equipped you with the ability to calculate quartiles and understand their utility in descriptive statistics.
Study Notes
- Quartiles split data into four equal parts (Q1 = 25th percentile, Q2 = median, Q3 = 75th percentile).
- IQR is calculated as $ IQR = Q3 - Q1 $, representing the spread of the middle 50% of data.
- Cumulative frequency tables allow for easier calculation of quartiles in larger datasets.
- The IQR is less affected by outliers compared to the range.
