Scatter Diagrams 📈

Students, imagine looking at two things at once and asking: Do they seem connected? For example, do students who study more hours tend to score higher marks? Do warmer days lead to more ice cream sales? A scatter diagram helps us explore questions like these by showing how two numerical variables may be related.

In this lesson, you will learn the main ideas and vocabulary of scatter diagrams, how to interpret patterns, and how they fit into the wider study of statistics and probability. By the end, you should be able to describe relationships clearly, use mathematical language accurately, and draw sensible conclusions from data. ✅

What is a Scatter Diagram?

A scatter diagram is a graph that plots paired numerical data on a Cartesian plane. Each pair of values is shown as a point. One variable is placed on the horizontal axis and the other on the vertical axis.

For example, suppose we record the number of hours studied, $x$, and the test score, $y$, for several students. Each student gives one point $(x, y)$ on the graph. The picture formed by all the points can reveal whether the variables are linked.
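The pairing described above can be sketched in a few lines of Python. The data values and the helper name `make_points` are made up for illustration; the only idea being shown is that each student contributes exactly one $(x, y)$ pair.

```python
# Hypothetical data: hours studied (x) and test score (y) for five students.
hours_studied = [1, 2, 3, 4, 5]
test_scores = [52, 58, 61, 67, 72]


def make_points(xs, ys):
    """Pair each x-value with its matching y-value to form (x, y) points."""
    if len(xs) != len(ys):
        # Bivariate data needs one y-value for every x-value.
        raise ValueError("Each x-value needs exactly one matching y-value.")
    return list(zip(xs, ys))


points = make_points(hours_studied, test_scores)
print(points)  # one (hours, score) pair per student
```

Each tuple in `points` is one dot on the scatter diagram, with the first number read along the horizontal axis and the second along the vertical axis.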

Scatter diagrams are useful because they help us see patterns that are not obvious in a table. A table may show the data clearly, but a graph shows the relationship quickly. This is important in IB Mathematics: Applications and Interpretation HL because statistics is not only about calculation; it is also about interpreting real data and making justified decisions.

Key terms to know:

Variable: a quantity that can change, such as height, time, or temperature.
Bivariate data: data involving two variables measured for each individual or object.
Coordinate pair: a pair of values written as $(x, y)$.
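Independent variable: the variable thought to explain or influence the other; it is usually placed on the $x$-axis.
Dependent variable: the variable that may respond to changes in the independent variable; it is usually placed on the $y$-axis.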

The words “independent” and “dependent” do not always imply cause and effect, but they help describe how one variable may respond to another.

Reading Patterns from a Scatter Diagram

The first thing to look for is the direction of the relationship.

If the points generally rise from left to right, the data show a positive correlation. This means that as one variable increases, the other tends to increase too. For example, taller people often weigh more, so height and weight may show positive correlation.

If the points generally fall from left to right, the data show a negative correlation. This means that as one variable increases, the other tends to decrease. For example, the number of hours of free time might decrease as homework time increases.

If the points do not show a clear upward or downward pattern, there may be no correlation or a very weak correlation.
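One way to check a direction you have judged by eye is to look at the sign of the sum of products of deviations from the means. The sketch below assumes made-up data, and the function name `direction` is hypothetical. It reports only the sign, so a result of "positive" or "negative" says nothing about how strong or clear the pattern is.

```python
def direction(xs, ys):
    """Describe the direction of a relationship from the sign of S_xy."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # S_xy is positive when x and y tend to be above (or below) their
    # means together, and negative when one is high while the other is low.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if s_xy > 0:
        return "positive"
    if s_xy < 0:
        return "negative"
    return "no linear trend"


# Hypothetical data sets matching the examples in the text.
study_hours = [1, 2, 3, 4, 5]
test_scores = [52, 58, 61, 67, 72]
homework_hours = [1, 2, 3, 4, 5]
free_time_hours = [6, 5, 5, 3, 2]

print(direction(study_hours, test_scores))         # positive
print(direction(homework_hours, free_time_hours))  # negative
```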

Another important idea is the strength of the relationship. A strong relationship means the points lie close to a line or curve. A weak relationship means the points are more spread out. The closer the points are to a clear pattern, the easier it is to make predictions.

Scatter diagrams can also show the form of the relationship:

Linear: the points follow roughly a straight-line pattern.
Non-linear: the points follow a curve or another shape.

Curved patterns are common in real life. For example, the relationship between speed and braking distance is not linear: as speed increases, braking distance increases more and more rapidly, roughly in proportion to the square of the speed.

Finally, we should look for outliers. An outlier is a point that lies far away from the rest of the data. Outliers can be caused by measurement error, unusual conditions, or simply a rare case. They matter because they can affect conclusions, especially when finding correlation or drawing a trend line.
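Outliers are usually spotted by eye, but a rough numerical screen can support that judgement. The sketch below is one possible rule of thumb, not a method prescribed by this lesson. It fits a least-squares line (an assumption), measures each point's vertical distance from that line, and flags points more than two standard deviations of those distances away. The data, the function names, and the threshold of 2 are all illustrative choices.

```python
from statistics import pstdev


def fit_line(xs, ys):
    """Return gradient m and intercept c of the least-squares line y = mx + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = s_xy / s_xx
    c = mean_y - m * mean_x
    return m, c


def flag_outliers(xs, ys, threshold=2.0):
    """Flag points whose vertical distance from the fitted line is unusually large."""
    m, c = fit_line(xs, ys)
    # Residual = observed y minus the y-value predicted by the line.
    residuals = [y - (m * x + c) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    return [
        (x, y)
        for x, y, res in zip(xs, ys, residuals)
        if abs(res) > threshold * spread
    ]


# Hypothetical revision data: one student revised for 7 hours but scored 30.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 64, 70, 73, 30, 82]

print(flag_outliers(hours, scores))  # [(7, 30)]
```

A flagged point is a prompt to investigate, not an instruction to delete it. Notice also that the outlier itself pulls the fitted line towards it, which is one reason the graph should always be checked as well.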

Correlation Does Not Mean Causation

This is one of the most important ideas in statistics, students. A scatter diagram may show that two variables are related, but that does not prove that one causes the other.

For example, there may be a positive correlation between ice cream sales and the number of sunburn cases. That does not mean ice cream causes sunburn. A more likely explanation is that a third factor, temperature, influences both: on hot, sunny days people buy more ice cream and also spend more time in the sun. This third factor is called a lurking variable or confounding variable.

Another example is the relationship between shoe size and reading ability in young children. Bigger shoe size may seem linked to better reading, but age is the hidden variable. Older children tend to have larger feet and better reading skills.

In IB statistics, you should always be careful when interpreting scatter diagrams. A correlation can support a possible relationship, but it does not prove cause and effect.

Using a Line of Best Fit

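When the data show a roughly linear trend, we may draw a line of best fit. This line should follow the overall direction of the points and pass through the middle of them, so that there are about as many points above the line as below it. When the line is drawn by eye, it should also pass through the mean point $(\bar{x}, \bar{y})$.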

A line of best fit helps us estimate values and make predictions. For example, if study hours and test score show a positive linear trend, the line can help estimate the expected score for a student who studies a certain number of hours.

A common model for a straight-line relationship is

$$y=mx+c$$

where $m$ is the gradient and $c$ is the intercept.

The gradient tells us how much $y$ changes when $x$ increases by $1$. In a real context, this gives the average rate of change. For example, if the model predicts a test score increase of $4$ marks for each extra hour studied, then $m=4$.
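As a quick check of what the gradient means, take $m=4$ from the example above together with an assumed intercept of $c=50$ (a made-up value, chosen only for illustration):

$$y=4x+50$$

$$x=3:\ y=4(3)+50=62 \qquad x=4:\ y=4(4)+50=66$$

Moving from $3$ to $4$ hours raises the predicted score by $66-62=4$ marks, which is exactly the gradient. In this made-up model, the intercept $c=50$ would be the predicted score for a student who studies for $0$ hours.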

When using a line of best fit, remember these points:

The line is a model, not a perfect truth.
Predictions are more reliable within the range of the data than outside it.
Extrapolation, which means predicting beyond the data, can be risky.

For example, if your data covers ages $10$ to $16$, using the same line to predict values at age $30$ may not be sensible because the relationship could change.
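The difference between predicting inside and outside the data range can be sketched as follows. The model here is entirely made up: it assumes height in centimetres is predicted from age by a line with $m=6$ and $c=80$, fitted to data for ages $10$ to $16$. The function name and its parameters are hypothetical.

```python
def predict_with_range_check(x, m, c, x_min, x_max):
    """Return the predicted y-value and whether x lies inside the data range."""
    y = m * x + c
    # Interpolation: x is within the observed data. Extrapolation: it is not.
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind


# Hypothetical model: height (cm) = 6 * age + 80, based on ages 10 to 16.
print(predict_with_range_check(13, m=6, c=80, x_min=10, x_max=16))
# (158, 'interpolation')
print(predict_with_range_check(30, m=6, c=80, x_min=10, x_max=16))
# (260, 'extrapolation')
```

A predicted height of $260$ cm at age $30$ is clearly unrealistic. The line describes growth between ages $10$ and $16$ reasonably well in this made-up model, but the relationship changes once growth stops.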

Measuring Association in IB Statistics

Scatter diagrams are often used together with measures of association. One important measure is the Pearson correlation coefficient, written as $r$.

The value of $r$ lies between $-1$ and $1$:

$r=1$ means a perfect positive linear relationship.
$r=-1$ means a perfect negative linear relationship.
$r=0$ means no linear relationship.

Values close to $1$ or $-1$ indicate a strong linear relationship, while values close to $0$ indicate a weak linear relationship.
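For reference, one standard way to write the formula behind $r$ is shown below. Here $\bar{x}$ and $\bar{y}$ denote the means of the $x$-values and $y$-values. In practice you would normally obtain $r$ from a calculator or software rather than by hand.

$$r=\frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum (x_i-\bar{x})^2\,\sum (y_i-\bar{y})^2}}$$

The numerator is positive when the two variables tend to be above or below their means together, and negative when one tends to be high while the other is low. This matches the direction you would read from the scatter diagram.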

However, a correlation coefficient should always be interpreted alongside the scatter diagram. Why? Because two data sets can have the same $r$ value but very different shapes. One set may be linear, while another may have a curve or an outlier that changes the meaning.
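A related point is that $r$ measures only linear association, so a value of $0$ does not rule out a strong curved relationship. The sketch below uses two small made-up data sets, one lying exactly on a straight line and one lying exactly on a parabola. The function `pearson_r` is written out here for illustration and is not taken from any particular library.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Compute the Pearson correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


x_values = [-2, -1, 0, 1, 2]
linear_y = [2 * x + 1 for x in x_values]  # points exactly on a straight line
curved_y = [x ** 2 for x in x_values]     # points exactly on a parabola

r_linear = pearson_r(x_values, linear_y)
r_curved = pearson_r(x_values, curved_y)

print(round(r_linear, 3))  # 1.0
print(round(r_curved, 3))  # 0.0
```

Both data sets follow a perfect pattern, yet only the straight-line set has $r=1$. For the parabola, the positive and negative contributions cancel and $r=0$, even though $y$ is completely determined by $x$. Only the scatter diagram reveals this.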

That is why IB expects you to combine numerical evidence with graphical interpretation. A good statistics answer uses both.

Example: suppose a scatter diagram of sleep time and reaction time shows a negative trend. If $r=-0.86$, then the relationship is strong and negative. This supports the idea that more sleep is associated with faster reaction times, although it still does not prove causation.

Interpreting Real-World Data Carefully

Scatter diagrams are everywhere in real life. Scientists use them to study climate and pollution. Sports analysts use them to compare practice time and performance. Economists use them to examine income and spending. Doctors use them to investigate risk factors and health outcomes.

To interpret a scatter diagram well, ask yourself:

Is the direction positive, negative, or is there no clear direction?
Is it strong or weak?
Is it linear or non-linear?
Are there any outliers?
Is the graph being used to make a prediction, and is that prediction reasonable?

Let’s look at a simple example. Suppose a school surveys students on hours of revision and exam scores. The points rise from left to right and cluster near a straight line. This suggests that more revision is associated with higher scores. If one student studies many hours but gets a very low score, that point may be an outlier. Perhaps the student was ill on the exam day. A scatter diagram helps us notice such unusual cases.

Another example is country-level data. Countries with a higher gross domestic product per person tend to have higher life expectancy. A scatter diagram can show this broad trend, but exceptions often exist. Richer countries may still vary because health care systems, inequality, and lifestyle also matter. This shows how statistics connects with real-world complexity.

Conclusion

Scatter diagrams are a simple but powerful tool in statistics. They let us explore the relationship between two numerical variables, identify trends, notice outliers, and make careful predictions. In IB Mathematics: Applications and Interpretation HL, students, you should be able to describe the direction, strength, and form of a pattern, use a line of best fit when appropriate, and remember that correlation does not prove causation. 📊

When you combine a scatter diagram with sound reasoning, you can turn raw data into meaningful conclusions. That skill is central to statistical thinking and to making informed decisions in real life.

Study Notes

A scatter diagram plots paired numerical data as points $(x, y)$.
The variable on the horizontal axis is usually the independent variable, and the variable on the vertical axis is usually the dependent variable.
Positive correlation means both variables tend to increase together.
Negative correlation means one variable tends to increase while the other tends to decrease.
No clear pattern suggests no correlation or very weak correlation.
Strength describes how tightly the points cluster around a line or curve.
Form describes whether the relationship is linear or non-linear.
Outliers are points far from the main pattern.
Correlation does not mean causation.
A lurking variable can affect both variables and create a misleading relationship.
A line of best fit helps estimate and predict values, usually with the model $y=mx+c$.
Predictions are safer within the range of the data than outside it.
The correlation coefficient $r$ satisfies $-1 \leq r \leq 1$ and measures linear association.
Always interpret the graph and the numerical measure together for a full statistical conclusion.