1. Exploring One-Variable Data

Summary Statistics For A Quantitative Variable

Every day, data help people make decisions 🎯. A coach looks at players’ times, a school checks test scores, and a doctor studies blood pressure. When the data are numerical, AP Statistics uses summary statistics to describe the center and spread of the data and to spot unusual values. In this lesson, you will learn what summary statistics mean, why they matter, and how to use them correctly for a quantitative variable.

What summary statistics do

A quantitative variable is a variable that measures or counts something with numbers, such as height, time, temperature, or number of siblings. Summary statistics turn a long list of numbers into a smaller set of useful facts.

The main goals are to describe:

  • Center: a typical or middle value
  • Spread: how much the data vary
  • Shape: how the data are arranged
  • Unusual values: values that stand out

For AP Statistics, the most common summary statistics for one quantitative variable are the mean, median, quartiles, range, interquartile range, variance, and standard deviation.

For example, suppose a class records the number of minutes each student spends reading in one night: $12, 15, 15, 18, 20, 22, 30$. These numbers can be summarized with statistics instead of listing every value each time. That makes it easier to compare groups and draw conclusions.

A good summary must match the shape of the data. If the data are roughly symmetric, the mean is often a good measure of center. If the data are skewed or have outliers, the median is usually better because it is resistant to extreme values.

Measures of center: mean and median

The mean is the arithmetic average:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

Here, $\bar{x}$ means the sample mean, and $n$ is the number of data values. The mean uses every value in the data set, so it is sensitive to unusual values.

Example: If quiz scores are $70, 80, 80, 90$, then

$$\bar{x} = \frac{70 + 80 + 80 + 90}{4} = 80$$
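The same calculation can be checked in code. The following is a minimal Python sketch, assuming the quiz scores from the example above; the variable names are illustrative only.

```python
import statistics

# Quiz scores from the example above.
quiz_scores = [70, 80, 80, 90]

# Mean by the formula: add every value, then divide by n.
mean_by_formula = sum(quiz_scores) / len(quiz_scores)

# The standard library's statistics.mean should agree with the formula.
mean_by_library = statistics.mean(quiz_scores)

print(mean_by_formula)  # 80.0
print(mean_by_library)
```

Both approaches add all four scores and divide by $n = 4$, which matches the hand calculation.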

The median is the middle value when the data are ordered. If there are two middle values, the median is the average of those two values.

Example: For the ordered data $3, 5, 8, 10, 12$, the median is $8$. For $3, 5, 8, 10$, the median is

$$\frac{5 + 8}{2} = 6.5$$
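The two cases (odd and even number of values) can be sketched as a short Python function. The function name `median` is a choice made for this illustration; it sorts the data first, so the input does not need to be ordered.

```python
def median(values):
    """Return the middle value, averaging the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: exactly one middle value.
        return ordered[mid]
    # Even n: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_median = median([3, 5, 8, 10, 12])
even_median = median([3, 5, 8, 10])

print(odd_median)   # 8
print(even_median)  # 6.5
```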

The median is resistant, which means a very large or very small value does not pull it strongly. That makes it helpful when data are skewed.
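To see resistance in action, the sketch below reuses the reading times listed earlier in this lesson and then replaces the largest value with a hypothetical $300$ minutes. The changed value is an assumption made only for illustration.

```python
import statistics

# Reading times (minutes) from earlier in the lesson.
reading_minutes = [12, 15, 15, 18, 20, 22, 30]

# Hypothetical change: the largest value becomes 300 minutes.
with_extreme_value = [12, 15, 15, 18, 20, 22, 300]

mean_before = statistics.mean(reading_minutes)
mean_after = statistics.mean(with_extreme_value)
median_before = statistics.median(reading_minutes)
median_after = statistics.median(with_extreme_value)

print(round(mean_before, 2), round(mean_after, 2))  # 18.86 57.43
print(median_before, median_after)                  # 18 18
```

The mean roughly triples, while the median stays at $18$ minutes.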

A useful AP Statistics comparison is this:

  • Symmetric data: use the mean and standard deviation
  • Skewed data or data with outliers: use the median and interquartile range

This is a major idea in exploring one-variable data because it helps you choose the right summary for the shape of the distribution.

Measures of spread: range, quartiles, and interquartile range

Center alone is not enough. Two groups can have the same mean but very different variability. Spread tells how close or far the values are from each other.

The range is the difference between the largest and smallest values:

$$\text{Range} = \text{max} - \text{min}$$

Example: For $4, 7, 8, 10, 13$, the range is $13 - 4 = 9$.
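As a quick check, the range can be computed in one line of Python. The helper name `data_range` is hypothetical and used only for this sketch.

```python
def data_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)


example_values = [4, 7, 8, 10, 13]
example_range = data_range(example_values)

print(example_range)  # 9
```

Because the range depends only on the two most extreme values, a single unusual value can change it a great deal.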

The quartiles divide ordered data into four parts:

  • $Q_1$: the first quartile, or 25th percentile
  • $Q_2$: the median, or 50th percentile
  • $Q_3$: the third quartile, or 75th percentile

The interquartile range is

$$\text{IQR} = Q_3 - Q_1$$

The IQR measures the spread of the middle 50% of the data and is resistant to outliers.

Example: Suppose the ordered data are $2, 4, 6, 8, 10, 12, 14, 16$. Then the median is

$$Q_2 = \frac{8 + 10}{2} = 9$$

The lower half is $2, 4, 6, 8$, so

$$Q_1 = \frac{4 + 6}{2} = 5$$

The upper half is $10, 12, 14, 16$, so

$$Q_3 = \frac{12 + 14}{2} = 13$$

Thus,

$$\text{IQR} = 13 - 5 = 8$$
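The steps above (find the median, split the data into halves, take the median of each half) can be sketched in Python. This is one possible implementation, not an official one. It assumes at least two data values and, when $n$ is odd, leaves the median out of both halves; the function names are illustrative.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumptions: at least two values; when n is odd, the median
    itself is excluded from both the lower and upper halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower_half), median(ordered), median(upper_half)


data = [2, 4, 6, 8, 10, 12, 14, 16]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 5.0 9.0 13.0 8.0
```

Software does not always use this convention. For example, Python's `statistics.quantiles` interpolates between data values, so it can report slightly different quartiles for the same data.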

The IQR is especially useful in boxplots, which display the five-number summary:

$$\text{minimum},\ Q_1,\ \text{median},\ Q_3,\ \text{maximum}$$

Boxplots are a strong AP Statistics tool because they make it easier to compare centers and spreads across groups.
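As an illustration, the five-number summary can be computed for the reading times listed earlier in this lesson. The sketch below uses the median-of-halves method for quartiles, assuming the median is excluded from both halves when $n$ is odd; `five_number_summary` is a hypothetical helper name.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for at least two values."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # Skip the middle value when n is odd.
    upper_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (
        ordered[0],
        median(lower_half),
        median(ordered),
        median(upper_half),
        ordered[-1],
    )


reading_minutes = [12, 15, 15, 18, 20, 22, 30]
summary = five_number_summary(reading_minutes)

print(summary)  # (12, 15, 18, 22, 30)
```

These five values are exactly what a boxplot of the reading times would display, with the box running from $15$ to $22$ minutes.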

Variance and standard deviation

While the range and IQR describe spread in simple ways, the standard deviation gives a more detailed measure of how far data values typically are from the mean. The variance is roughly the average squared distance from the mean, and the standard deviation is its square root.

For a sample, the variance is

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$$

and the sample standard deviation is

$$s = \sqrt{s^2}$$

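Why is there an $n-1$ instead of $n$? The deviations are measured from the sample mean $\bar{x}$, which is itself computed from the same data, so dividing by $n$ tends to underestimate the variability of the population. Dividing by $n-1$ corrects for this and gives a better estimate of the population variance.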

Example: For the data $2, 4, 4, 6$,

$$\bar{x} = \frac{2 + 4 + 4 + 6}{4} = 4$$

Now find the squared distances from the mean:

$$(2-4)^2 = 4,\quad (4-4)^2 = 0,\quad (4-4)^2 = 0,\quad (6-4)^2 = 4$$

So,

$$s^2 = \frac{4+0+0+4}{4-1} = \frac{8}{3}$$

and

$$s = \sqrt{\frac{8}{3}} \approx 1.63$$
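The calculation above can be reproduced step by step in Python. This is a minimal sketch with illustrative variable names; it also compares the results with the standard library's `statistics.variance` and `statistics.stdev`, which compute the sample versions (dividing by $n-1$).

```python
import math
import statistics

data = [2, 4, 4, 6]
n = len(data)

# Step 1: the sample mean.
mean = sum(data) / n

# Step 2: squared distances from the mean.
squared_deviations = [(x - mean) ** 2 for x in data]

# Step 3: divide the total by n - 1 to get the sample variance.
sample_variance = sum(squared_deviations) / (n - 1)

# Step 4: the standard deviation is the square root of the variance.
sample_sd = math.sqrt(sample_variance)

print(squared_deviations)         # [4.0, 0.0, 0.0, 4.0]
print(round(sample_variance, 2))  # 2.67
print(round(sample_sd, 2))        # 1.63

# The library functions should agree with the hand calculation.
print(math.isclose(sample_variance, statistics.variance(data)))  # True
print(math.isclose(sample_sd, statistics.stdev(data)))           # True
```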

Standard deviation is in the same units as the original data, while variance is in squared units. That is why standard deviation is usually easier to interpret.

A common interpretation is that a smaller standard deviation means the data are clustered more tightly around the mean, while a larger standard deviation means the data are more spread out 📊.

Shape, outliers, and choosing the right summary

A distribution’s shape matters when describing it. The three main shapes you will see are:

  • Symmetric: left and right sides are roughly balanced
  • Skewed right: a long tail goes to the right
  • Skewed left: a long tail goes to the left

Outliers are values that are unusually far from the rest of the data. They can affect the mean and standard deviation a lot, but they usually have less effect on the median and IQR.

A standard AP Statistics rule for outliers uses the IQR:

$$\text{Lower fence} = Q_1 - 1.5(\text{IQR})$$

$$\text{Upper fence} = Q_3 + 1.5(\text{IQR})$$

Any value below the lower fence or above the upper fence may be considered an outlier.

Example: If $Q_1 = 10$ and $Q_3 = 18$, then

$$\text{IQR} = 18 - 10 = 8$$

The fences are

$$10 - 1.5(8) = -2$$

$$18 + 1.5(8) = 30$$

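So values less than $-2$ or greater than $30$ would be possible outliers. If the variable cannot be negative, such as a time or a count, only values greater than $30$ can be flagged.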
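The fence rule can be sketched as two small Python functions. The function names and the test values $35$ and $12$ are hypothetical, chosen only to show one value outside the fences and one inside.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) from the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_possible_outlier(value, q1, q3):
    """True when the value falls below the lower fence or above the upper fence."""
    lower, upper = outlier_fences(q1, q3)
    return value < lower or value > upper


lower_fence, upper_fence = outlier_fences(10, 18)
print(lower_fence, upper_fence)  # -2.0 30.0

# Hypothetical data values checked against the fences.
print(is_possible_outlier(35, 10, 18))  # True
print(is_possible_outlier(12, 10, 18))  # False
```

A value exactly on a fence is not flagged here, since the rule describes values below the lower fence or above the upper fence.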

When you write a summary in AP Statistics, use context. Instead of saying “the median is $20$,” say “the median number of minutes spent reading is $20$ minutes.” Context makes your interpretation complete and meaningful.

Connecting summary statistics to AP Statistics reasoning

Summary statistics are not just calculations; they are tools for making decisions. In AP Statistics, you should be able to connect numbers to what they mean in the real world.

For instance, imagine two schools report exam scores:

  • School A: mean $= 82$, standard deviation $= 4$
  • School B: mean $= 82$, standard deviation $= 12$

The means are the same, so the typical score is similar. But School A’s scores are much more consistent, while School B’s scores vary much more. This matters if a teacher wants to know whether most students are near the same level.

Another example: Suppose the heights of two groups of plants are summarized.

  • Group 1: median $= 15$ cm, IQR $= 2$ cm
  • Group 2: median $= 15$ cm, IQR $= 8$ cm

Both groups have the same center, but Group 1 is more uniform. This kind of comparison is common when comparing distributions in one-variable data.

AP Statistics often asks you to describe a distribution using SOCS:

  • Shape
  • Outliers
  • Center
  • Spread

Summary statistics provide the center and spread parts of that description. They also help you choose graphs and make comparisons between groups.

Conclusion

Summary statistics for a quantitative variable give a concise picture of a data set. The mean and median describe center; the range, IQR, variance, and standard deviation describe spread; and quartiles help identify the middle part of the data and possible outliers. The best summary depends on the shape of the distribution and whether unusual values are present. In AP Statistics, these ideas are important because they help you describe, compare, and interpret real data clearly and accurately ✅.

Study Notes

  • A quantitative variable is numerical and measures or counts something.
  • Use the mean for roughly symmetric data without strong outliers.
  • Use the median for skewed data or data with outliers because it is resistant.
  • The range is $\text{max} - \text{min}$.
  • The interquartile range is $\text{IQR} = Q_3 - Q_1$ and describes the middle $50\%$ of the data.
  • The sample variance is $s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$.
  • The sample standard deviation is $s = \sqrt{s^2}$.
  • Standard deviation measures typical distance from the mean in the original units.
  • The quartiles $Q_1$, $Q_2$, and $Q_3$ split ordered data into four parts.
  • Outliers may be identified using fences: $Q_1 - 1.5(\text{IQR})$ and $Q_3 + 1.5(\text{IQR})$.
  • Always describe statistics in context, such as minutes, dollars, or scores.
  • Summary statistics help you compare distributions and make AP Statistics reasoning more precise.
