Normal Distribution
Introduction
Hi students, in this lesson you will learn one of the most important ideas in statistics: the normal distribution 📊. Many real-world measurements, such as heights, test scores, reaction times, and manufacturing sizes, often cluster around a middle value and become less common as you move farther away from that center. This pattern is so common that statisticians give it a special name: the normal distribution.
Learning objectives
By the end of this lesson, you should be able to:
- explain the main ideas and vocabulary of the normal distribution,
- apply IB Mathematics: Applications and Interpretation SL methods involving the normal distribution,
- connect normal distribution to statistical analysis and probability,
- summarize why the normal distribution is useful in real-world decision-making,
- use evidence and examples to interpret normal distribution problems.
The normal distribution helps us describe data, compare groups, and make predictions when we understand how values are spread around the mean. It is a key tool in the broader study of statistics and probability because it turns data patterns into usable models.
What the normal distribution looks like
The normal distribution is a continuous probability distribution shaped like a symmetric bell curve. Its center is the mean $\mu$, and its spread is controlled by the standard deviation $\sigma$. A smaller $\sigma$ makes the curve narrow and tall, while a larger $\sigma$ makes it wider and flatter.
A normal distribution is completely described by two parameters: $\mu$ and $\sigma$. We often write this as $X \sim N(\mu, \sigma^2)$, meaning the random variable $X$ follows a normal distribution with mean $\mu$ and variance $\sigma^2$.
Important features of the normal curve include:
- symmetry about $\mu$,
- mean, median, and mode being equal,
- total area under the curve being $1$,
- probabilities represented by areas under the curve.
Because the curve is continuous, the probability that $X$ equals exactly one value is $0$. So we talk about intervals such as $P(X < 70)$ or $P(60 \le X \le 80)$ rather than $P(X = 70)$.
Example idea
Suppose the heights of a large group of students follow $X \sim N(170, 6^2)$. The mean height is $170$ cm, and the standard deviation is $6$ cm. Most students will be near $170$ cm, and fewer will be very short or very tall.
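The height example above can be explored with a short script. This is a sketch using Python's standard library (`statistics.NormalDist`, available from Python 3.8); the parameters come directly from the example, $X \sim N(170, 6^2)$.

```python
from statistics import NormalDist

# Heights X ~ N(170, 6^2): mean 170 cm, standard deviation 6 cm.
heights = NormalDist(mu=170, sigma=6)

# P(X < 170): the mean splits the symmetric curve in half, so this is 0.5.
p_below_mean = heights.cdf(170)

# P(X > 182): 182 cm is two standard deviations above the mean,
# so very tall students are rare.
p_very_tall = 1 - heights.cdf(182)

print(round(p_below_mean, 4))  # 0.5
print(round(p_very_tall, 4))   # 0.0228
```

The `cdf` method returns the area under the curve to the left of a value, which is exactly the probability idea developed later in this lesson.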
Main terminology and meaning
To use the normal distribution correctly, you need the key vocabulary.
Mean and standard deviation
The mean $\mu$ is the center of the distribution. In a normal model, it is the balance point of the curve. The standard deviation $\sigma$ measures the typical distance of values from the mean.
For example, if two classes both have mean test score $75$, but one class has $\sigma = 4$ and the other has $\sigma = 12$, the second class has much more spread. That means scores are more varied.
Symmetry
A normal distribution is perfectly symmetric. If a value is the same distance above the mean as another value is below the mean, the probabilities are balanced. For instance, in $N(100, 15^2)$, values near $85$ and values near $115$ are equally likely because both are $15$ units from the mean.
Standardization
To compare values from different normal distributions, we convert to a standard normal variable using the $z$-score:
$$z = \frac{x - \mu}{\sigma}$$
This tells us how many standard deviations a value $x$ is from the mean. A positive $z$ means the value is above the mean, and a negative $z$ means it is below the mean.
For example, if a score of $88$ comes from a distribution with $\mu = 80$ and $\sigma = 4$, then
$$z = \frac{88 - 80}{4} = 2$$
So the score is $2$ standard deviations above the mean.
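The $z$-score calculation above translates directly into code. This sketch reproduces the worked example and, to show why standardization is useful, compares it with a second, hypothetical class whose parameters ($\mu = 85$, $\sigma = 7$) are invented for illustration.

```python
def z_score(x, mu, sigma):
    """How many standard deviations x lies from the mean mu."""
    return (x - mu) / sigma

# The worked example: a score of 88 from a distribution with mu=80, sigma=4.
z = z_score(88, mu=80, sigma=4)
print(z)  # 2.0 -> two standard deviations above the mean

# Standardization lets us compare across distributions: a 92 in a
# hypothetical class with mu=85, sigma=7 is only 1 standard deviation up,
# so the 88 is actually the more unusual score.
z_other = z_score(92, mu=85, sigma=7)
print(z_other)  # 1.0
```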
Probability with the normal distribution
One of the most useful ideas in statistics is that probabilities can be found from areas under the curve. For the normal distribution, these areas are usually found using a graphing calculator, statistical software, or standard normal tables.
Basic probability statements
If $X \sim N(\mu, \sigma^2)$, then:
- $P(X < a)$ is the area to the left of $a$,
- $P(X > a)$ is the area to the right of $a$,
- $P(a < X < b)$ is the area between $a$ and $b$.
Because the curve is symmetric, the area to the left of the mean is $0.5$, and the area to the right is also $0.5$.
Using technology or tables
IB Mathematics: Applications and Interpretation SL often expects you to interpret values rather than only calculate by hand. In practice, you may use a calculator function for normal cumulative probabilities. What matters is understanding what the calculator output means.
For example, if $X \sim N(50, 10^2)$ and you want $P(X < 60)$, you are asking for the area to the left of $60$. Since $60$ is one standard deviation above the mean, this probability is greater than $0.5$.
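The claim above can be checked with technology. A minimal sketch, using the stdlib `statistics.NormalDist` in place of a graphing calculator's normal-cdf function:

```python
from statistics import NormalDist

X = NormalDist(mu=50, sigma=10)

# P(X < 60): the area to the left of 60, which is one standard
# deviation above the mean, so it should exceed 0.5.
p = X.cdf(60)
print(round(p, 4))  # 0.8413
```

A calculator's `normalcdf` output for the same interval should agree with this value.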
Worked example
Let $X \sim N(100, 15^2)$. Find $P(85 < X < 115)$.
First, notice that $85 = 100 - 15$ and $115 = 100 + 15$, so the interval extends exactly one standard deviation on each side of the mean. Converting to $z$-scores:
$$z_1 = \frac{85 - 100}{15} = -1$$
$$z_2 = \frac{115 - 100}{15} = 1$$
So the problem becomes finding the area between $z = -1$ and $z = 1$ in the standard normal distribution. This area is about $0.6827$, or $68.27\%$.
This means about $68\%$ of values lie within one standard deviation of the mean in a normal distribution. This is part of a famous result known as the empirical rule.
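The worked example can be verified in one line of arithmetic on cumulative areas: the probability of an interval is the difference of two left-hand areas. A sketch with `statistics.NormalDist`:

```python
from statistics import NormalDist

X = NormalDist(mu=100, sigma=15)

# P(85 < X < 115) = area left of 115 minus area left of 85.
p = X.cdf(115) - X.cdf(85)
print(round(p, 4))  # 0.6827
```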
The empirical rule and interpreting spread
The empirical rule gives quick estimates for a normal distribution:
- about $68\%$ of values lie within $\mu \pm 1\sigma$,
- about $95\%$ of values lie within $\mu \pm 2\sigma$,
- about $99.7\%$ of values lie within $\mu \pm 3\sigma$.
These values are approximations, not exact rules, but they are very useful for estimating how unusual a value is.
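The three empirical-rule percentages can be recovered from the standard normal distribution itself, which shows they are rounded values rather than exact ones:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area within k standard deviations of the mean, for k = 1, 2, 3.
for k in (1, 2, 3):
    coverage = Z.cdf(k) - Z.cdf(-k)
    print(k, round(coverage, 4))  # 1 0.6827, 2 0.9545, 3 0.9973
```

Note that the familiar $95\%$ figure is really about $95.45\%$ for exactly $2\sigma$; the interval covering exactly $95\%$ is slightly narrower, at about $1.96\sigma$.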
Real-world example
Imagine SAT-style scores that are approximately normal with mean $500$ and standard deviation $100$. A score of $700$ is $2$ standard deviations above the mean because
$$z = \frac{700 - 500}{100} = 2$$
Using the empirical rule, a score this high is fairly uncommon because only about $2.5\%$ of values are above $\mu + 2\sigma$ in a normal distribution.
This helps schools, employers, and researchers judge whether a result is unusual or typical.
Why the normal distribution matters in statistics and probability
The normal distribution is not just a formula. It is a model for understanding data and making decisions. In statistics, we often collect samples from a population and look for patterns. If the data are approximately normal, we can make stronger conclusions about likely values, unusual observations, and comparisons between groups.
Connection to data analysis
When analyzing data, you may use histograms, boxplots, and summary statistics to judge whether a set of values looks approximately normal. A normal-shaped histogram is symmetric and has one clear peak near the center.
If the data are skewed, the normal model may not fit well. For example, income data are often right-skewed, so a normal distribution is usually not a good model for income. However, heights of adults in a population are often close to normal.
Connection to inferential reasoning
In inferential statistics, we use sample data to draw conclusions about populations. The normal distribution is useful because many statistical methods rely on normal models or on approximations that become normal when sample sizes are large. This makes the normal distribution an important bridge between raw data and decision-making.
Example of interpretation
A company makes bolts that should be $10$ mm long. If bolt lengths are approximately normal with mean $10$ mm and small standard deviation, most bolts will be close to the target. A few will be too short or too long. The company can use the normal distribution to estimate the proportion of faulty bolts and decide whether the production process needs improvement.
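A sketch of the bolt scenario above. The standard deviation ($0.05$ mm) and the acceptable tolerance ($\pm 0.1$ mm) are illustrative assumptions, not values from the text:

```python
from statistics import NormalDist

# Assumed: bolt lengths ~ N(10, 0.05^2), tolerance 9.9 mm to 10.1 mm.
bolts = NormalDist(mu=10, sigma=0.05)
lower, upper = 9.9, 10.1

# Proportion expected outside tolerance: area below the lower limit
# plus area above the upper limit (here, the tails beyond 2 sigma).
p_faulty = bolts.cdf(lower) + (1 - bolts.cdf(upper))
print(round(p_faulty, 4))  # 0.0455
```

With these assumed numbers, roughly $4.5\%$ of bolts would be rejected, which the company could compare against an acceptable defect rate.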
Limitations and careful use
It is important to use the normal distribution only when it is a reasonable model. Real data are not always exactly normal. Some distributions are skewed, have multiple peaks, or contain outliers that break the normal shape.
A strong understanding of context matters. Ask:
- Is the variable continuous?
- Is the distribution roughly symmetric?
- Is there evidence of a bell-shaped pattern?
- Does the situation justify a normal model?
For example, exam scores may be approximately normal, but the number of cars in a household is discrete and cannot be modeled well by a normal distribution.
Conclusion
The normal distribution is one of the central ideas in statistics and probability because it gives a simple way to model many real-world measurements. Its bell-shaped curve is defined by the mean $\mu$ and standard deviation $\sigma$, and probabilities are found as areas under the curve. By using $z$-scores, the empirical rule, and technology, you can interpret data, compare outcomes, and make informed decisions.
In IB Mathematics: Applications and Interpretation SL, the normal distribution connects data analysis, probability models, and inferential reasoning. It helps you understand not only what values are typical, but also what values are unusual and why that matters in real situations.
Study Notes
- The normal distribution is a continuous, symmetric, bell-shaped distribution.
- It is written as $X \sim N(\mu, \sigma^2)$.
- The mean $\mu$ is the center, and the standard deviation $\sigma$ controls spread.
- The total area under the curve is $1$.
- Probabilities are areas under the curve, such as $P(X < a)$ or $P(a < X < b)$.
- A single exact value has probability $0$ in a continuous distribution.
- The $z$-score formula is $z = \frac{x - \mu}{\sigma}$.
- The standard normal distribution has mean $0$ and standard deviation $1$.
- The empirical rule says about $68\%$, $95\%$, and $99.7\%$ of values lie within $1\sigma$, $2\sigma$, and $3\sigma$ of the mean.
- The normal distribution is useful for describing typical values, identifying unusual values, and modeling real-world variation.
- Always check whether a normal model is reasonable before using it.
