4. Statistics and Probability

Cumulative Frequency

Welcome, students 👋 In this lesson, you will learn how cumulative frequency helps us organize data, spot patterns, and make sense of large sets of numbers. By the end, you should be able to explain what cumulative frequency means, read and build cumulative frequency tables and graphs, and use them to estimate values such as medians and quartiles. These ideas are important in statistics because they turn raw data into information that is easier to compare and interpret.

In real life, cumulative frequency appears in exam score reports, survey results, sports statistics, and even delivery times for online orders. For example, if a school wants to know how many students scored at or below a certain mark, cumulative frequency gives that answer quickly. It is a simple but powerful tool in data analysis 📊

What cumulative frequency means

Cumulative frequency is the running total of frequencies. A frequency tells us how many times a value or class interval occurs. A cumulative frequency adds frequencies together as we move through the data in order.

Suppose the scores in a test are grouped into intervals:

  • $0$ to $9$
  • $10$ to $19$
  • $20$ to $29$
  • $30$ to $39$

If the frequencies are $3$, $5$, $8$, and $4$, then the cumulative frequencies are:

  • $3$
  • $3+5=8$
  • $3+5+8=16$
  • $3+5+8+4=20$

This means $8$ students scored below $20$, and $16$ students scored below $30$. The final cumulative frequency is the total number of data values, which is $20$.
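A running total like this is exactly what Python's `itertools.accumulate` computes. The short sketch below is one way to check the arithmetic for the example; the variable names are illustrative only.

```python
from itertools import accumulate

# Frequencies for the classes 0-9, 10-19, 20-29 and 30-39
frequencies = [3, 5, 8, 4]

# accumulate() adds each frequency to the running total before it
cumulative = list(accumulate(frequencies))

print(cumulative)      # [3, 8, 16, 20]
print(cumulative[-1])  # 20, the total number of data values
```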

The key idea is that cumulative frequency answers questions such as “How many are at most this value?” or “How many are less than this class boundary?” These are very common in statistics and probability because they help describe the distribution of data.

Building a cumulative frequency table

A cumulative frequency table is made by listing the data in order and adding the frequencies step by step. This is especially useful for data grouped into classes, which appear often in IB Mathematics: Applications and Interpretation SL.

Here is a simple example of test marks for a class:

| Mark interval | Frequency | Cumulative frequency |
|---|---:|---:|
| $0$-$9$ | $4$ | $4$ |
| $10$-$19$ | $6$ | $10$ |
| $20$-$29$ | $9$ | $19$ |
| $30$-$39$ | $5$ | $24$ |
| $40$-$49$ | $2$ | $26$ |

The cumulative frequency column is built by adding each new frequency to the total before it. For example, the cumulative frequency for $20$-$29$ is $4+6+9=19$.
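The same step-by-step addition can be sketched in a few lines of Python. This is an illustrative sketch only: it stores each mark interval as a pair of class limits and prints the table row by row.

```python
from itertools import accumulate

# Mark intervals (lower and upper class limits) and their frequencies
intervals = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49)]
frequencies = [4, 6, 9, 5, 2]

# Running totals: each entry adds the new frequency to the total before it
cumulative = list(accumulate(frequencies))

print("Mark interval | Frequency | Cumulative frequency")
for (low, high), f, cf in zip(intervals, frequencies, cumulative):
    print(f"{low}-{high} | {f} | {cf}")
```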

When reading the table, we can interpret it directly. For instance, $19$ students scored below $30$, and $26$ students took the test in total. If a question asks how many students scored at least $30$, then we use the total and subtract the number below $30$:

$$26-19=7$$

This is a good example of how cumulative frequency supports data interpretation in a practical way. ✅
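The subtraction above can be wrapped in a small helper. The name `count_at_least` is hypothetical, chosen here for illustration: the number of values in a class or any later class is the total minus the running total of the class just before it.

```python
def count_at_least(cumulative, class_index):
    """Number of values in class `class_index` or any later class.

    `cumulative` holds running totals in class order, so the count before
    the class is the previous running total (0 for the first class).
    """
    total = cumulative[-1]
    below = cumulative[class_index - 1] if class_index > 0 else 0
    return total - below


cumulative = [4, 10, 19, 24, 26]  # from the mark table
# Class index 3 is 30-39, so this counts marks of at least 30
print(count_at_least(cumulative, 3))  # 7
```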

Cumulative frequency graphs and ogives

A cumulative frequency graph is often called an ogive. It shows cumulative frequency against the upper class boundaries of grouped data. The graph never decreases, because each running total is at least as large as the one before it.

To draw an ogive:

  1. Find the cumulative frequencies.
  2. Plot each cumulative frequency against the upper class boundary.
  3. Start at the lower boundary of the first class, with cumulative frequency $0$.
  4. Join the points with a smooth curve or line segments, depending on the context.

For the table above, the plotted points would be based on the upper boundaries $9.5$, $19.5$, $29.5$, $39.5$, and $49.5$, with cumulative frequencies $4$, $10$, $19$, $24$, and $26$. The starting point would be at the lower boundary $-0.5$ with cumulative frequency $0$.
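Because the marks are whole numbers, each upper class boundary lies halfway between one class's upper limit and the next class's lower limit, that is, the upper limit plus $0.5$. The sketch below assumes whole-number data with that $0.5$ gap and builds the list of ogive points, including the starting point.

```python
from itertools import accumulate

intervals = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49)]
frequencies = [4, 6, 9, 5, 2]

# Assumption: whole-number data, so boundaries lie 0.5 beyond each class limit
start = (intervals[0][0] - 0.5, 0)  # lower boundary of the first class, cf = 0
points = [start] + [
    (high + 0.5, cf)
    for (_, high), cf in zip(intervals, accumulate(frequencies))
]

print(points)
# [(-0.5, 0), (9.5, 4), (19.5, 10), (29.5, 19), (39.5, 24), (49.5, 26)]
```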

Why do we use class boundaries instead of class limits? Grouped data represent intervals, and whole-number classes such as $0$-$9$ and $10$-$19$ leave a gap between $9$ and $10$. Boundaries such as $9.5$ close that gap, so the graph is continuous. This is important in IB statistics because it allows sensible estimation between values.

An ogive is useful because you can read medians, quartiles, and percentiles from it. For example, the median is the value at the $\frac{n}{2}$th position, where $n$ is the total frequency. If $n=26$, then the median is around the $13$th data value. On the graph, you locate cumulative frequency $13$, move across to the curve, then down to the horizontal axis to estimate the median mark.
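If the ogive is drawn with straight line segments, reading across at cumulative frequency $13$ and down to the axis is the same as linear interpolation between the two plotted points on either side. The sketch below assumes that line-segment version; a smooth hand-drawn curve may give a slightly different reading. The function name `read_ogive` is hypothetical.

```python
def read_ogive(points, target_cf):
    """Estimate the x-value where a line-segment ogive reaches target_cf.

    `points` are (upper boundary, cumulative frequency) pairs in order,
    starting with the lower boundary at cumulative frequency 0.
    """
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 < target_cf <= c1:
            # Move the same fraction along the class width as along the count
            return x0 + (target_cf - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target_cf is outside the range of the ogive")


points = [(-0.5, 0), (9.5, 4), (19.5, 10), (29.5, 19), (39.5, 24), (49.5, 26)]
n = points[-1][1]  # total frequency, 26
median = read_ogive(points, n / 2)
print(round(median, 1))  # 22.8
```

Here the $13$th value falls between $(19.5, 10)$ and $(29.5, 19)$, so the estimate is $19.5 + \frac{3}{9}\times 10 \approx 22.8$.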

Using cumulative frequency to estimate median and quartiles

Cumulative frequency is especially helpful when exact data values are hidden inside grouped intervals. In that case, you estimate the median and quartiles by using the cumulative frequency graph or table.

Let $n$ be the total frequency.

  • The median is at position $\frac{n}{2}$.
  • The lower quartile $Q_1$ is at position $\frac{n}{4}$.
  • The upper quartile $Q_3$ is at position $\frac{3n}{4}$.

For the class example with $n=26$:

  • $Q_1$ is around the $6.5$th value.
  • The median is around the $13$th value.
  • $Q_3$ is around the $19.5$th value.

Since the data are grouped, these positions are estimated using the cumulative frequency curve. If the cumulative frequencies are $4$, $10$, $19$, $24$, and $26$, then:

  • $Q_1$ lies in the $10$-$19$ interval because $4<6.5\leq 10$
  • the median lies in the $20$-$29$ interval because $10<13\leq 19$
  • $Q_3$ lies in the $30$-$39$ interval because $19<19.5\leq 24$
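The rule used in these bullets, "the first class whose cumulative frequency reaches the position", can be sketched with Python's standard `bisect` module. `bisect_left` returns the first index whose value is greater than or equal to the position, which is the class containing that position.

```python
from bisect import bisect_left

labels = ["0-9", "10-19", "20-29", "30-39", "40-49"]
cumulative = [4, 10, 19, 24, 26]
n = cumulative[-1]

positions = {"Q1": n / 4, "median": n / 2, "Q3": 3 * n / 4}

for name, pos in positions.items():
    # First class whose cumulative frequency is at least the position
    index = bisect_left(cumulative, pos)
    print(f"{name}: position {pos}, class {labels[index]}")
```

This places $Q_1$ in $10$-$19$, the median in $20$-$29$, and $Q_3$ in $30$-$39$, matching the bullets.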

This shows how cumulative frequency helps locate important summary values. These values matter in statistical analysis because they describe the center and spread of the data. For example, the interquartile range is $Q_3-Q_1$ and measures the spread of the middle $50\%$ of the data.
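Going one step further gives an estimate of the interquartile range. As a hedged sketch, this assumes the ogive is drawn with straight segments between the plotted points, so each quartile comes from linear interpolation; `read_ogive` is a hypothetical helper name.

```python
def read_ogive(points, target_cf):
    """x-value where a line-segment ogive reaches target_cf."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 < target_cf <= c1:
            return x0 + (target_cf - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target_cf is outside the range of the ogive")


points = [(-0.5, 0), (9.5, 4), (19.5, 10), (29.5, 19), (39.5, 24), (49.5, 26)]
n = points[-1][1]

q1 = read_ogive(points, n / 4)      # about 13.7
q3 = read_ogive(points, 3 * n / 4)  # 30.5
print(round(q1, 1), round(q3, 1), round(q3 - q1, 1))
```

With these assumptions the interquartile range comes out at roughly $16.8$ marks.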

Real-world interpretation and decision-making

Cumulative frequency is not just about drawing graphs. It also helps people make decisions using data. Imagine a coach analyzing sprint times for a team. If the coach wants to know how many athletes ran $100$ m in under $13$ seconds, cumulative frequency gives that count quickly. A company might use it to study delivery times and answer questions like, “What percentage of deliveries arrive within $2$ days?”

Suppose a store tracks customer waiting times in minutes:

| Waiting time (minutes) | Frequency | Cumulative frequency |
|---|---:|---:|
| $0$-$2$ | $7$ | $7$ |
| $3$-$5$ | $12$ | $19$ |
| $6$-$8$ | $9$ | $28$ |
| $9$-$11$ | $4$ | $32$ |

The total number of customers is $32$, and the number who waited at most $5$ minutes is $19$. The proportion is

$$\frac{19}{32}$$

and the percentage is

$$\frac{19}{32}\times 100\%\approx 59.4\%$$

This is a clear example of using statistical evidence to understand performance. In the IB course, this kind of reasoning is valuable because it connects data handling with interpretation and real-world context.
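The same calculation can be sketched in Python. The class labels are stored as strings for illustration, and "at most $5$ minutes" is taken to mean every class up to and including $3$-$5$.

```python
from itertools import accumulate

# Waiting-time classes in minutes and their frequencies
labels = ["0-2", "3-5", "6-8", "9-11"]
frequencies = [7, 12, 9, 4]
cumulative = list(accumulate(frequencies))  # [7, 19, 28, 32]
total = cumulative[-1]

# "At most 5 minutes" covers every class up to and including 3-5
at_most_5 = cumulative[labels.index("3-5")]
percentage = at_most_5 / total * 100
print(f"{at_most_5}/{total} = {percentage:.1f}%")  # 19/32 = 59.4%
```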

Cumulative frequency also helps compare distributions. If one class's curve rises more steeply at the low end, more of its students have low values. If another curve rises more slowly at first but catches up later, its data are spread differently. Comparing cumulative frequency graphs can reveal differences that are not obvious from raw lists of numbers.

Common mistakes to avoid

Students, watch out for these mistakes when working with cumulative frequency:

  • mixing up frequency and cumulative frequency
  • forgetting that cumulative frequency must increase or stay the same
  • plotting the graph against class midpoints instead of upper class boundaries
  • using the wrong total when finding median or quartiles
  • reading values from the graph without checking whether the question asks for “less than” or “at most”

Another important point is that cumulative frequency is based on ordered data. If the data are not in order, the running total would not describe the distribution correctly. Always sort the data into ascending order or use class intervals that are already ordered.
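With raw (ungrouped) data, sorting comes first. The sketch below uses a short list of made-up marks purely for illustration; once the list is sorted, the standard-library function `bisect_right` counts how many values are at most a given number, which is the cumulative frequency at that point. The helper name `at_most` is hypothetical.

```python
from bisect import bisect_right

# Hypothetical raw marks, not taken from the tables above
marks = [27, 12, 35, 8, 19, 27, 41, 22]

# Sort into ascending order before building any running total
ordered = sorted(marks)


def at_most(ordered_values, x):
    """Cumulative frequency: how many values are less than or equal to x."""
    return bisect_right(ordered_values, x)


print(ordered)               # [8, 12, 19, 22, 27, 27, 35, 41]
print(at_most(ordered, 27))  # 6
print(at_most(ordered, 26))  # 4
```

`bisect_right` only gives correct counts on a sorted list, which is why the sorting step cannot be skipped.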

Conclusion

Cumulative frequency is a core idea in statistics because it organizes data into running totals and makes patterns easier to understand. It helps you build tables, draw ogives, estimate median and quartiles, and answer real-world questions about how data are spread across intervals. In IB Mathematics: Applications and Interpretation SL, it is especially useful because it connects raw data to interpretation and decision-making.

If you remember one thing, remember this: cumulative frequency tells you how many data values are at or below a certain point. That simple idea is powerful in data analysis, probability reasoning, and practical problem-solving. Keep practicing with tables and graphs, students, and you will become much more confident with statistical interpretation 😊

Study Notes

  • Cumulative frequency is the running total of frequencies.
  • It is used to answer questions like “How many values are at most this amount?”
  • The final cumulative frequency equals the total number of data values.
  • A cumulative frequency graph is called an ogive.
  • Ogives are drawn using upper class boundaries and cumulative frequencies.
  • Cumulative frequency helps estimate the median, quartiles, and percentiles.
  • For grouped data, the median is at position $\frac{n}{2}$, $Q_1$ is at position $\frac{n}{4}$, and $Q_3$ is at position $\frac{3n}{4}$.
  • Cumulative frequency is useful in real life for test scores, waiting times, delivery times, and sports data.
  • It supports statistical interpretation by showing how data accumulate across intervals.
  • It is a key part of data analysis in IB Mathematics: Applications and Interpretation SL.
