4. Statistics and Probability

The Normal Distribution

Introduction

Students, imagine measuring the heights of everyone in a school. Most students are close to the average height, while very short or very tall students are less common 📊. This kind of pattern appears in many real situations, from test scores to reaction times to measurement errors. The normal distribution is one of the most important models in statistics because it helps us describe and predict data that cluster around a center.

In this lesson, you will learn:

  • what the normal distribution looks like and why it matters,
  • key terms such as mean, standard deviation, and standard normal distribution,
  • how to use the $z$-score to compare values,
  • how to find probabilities using the normal distribution,
  • how this topic connects to the wider study of statistics and probability.

The normal distribution is a major tool in IB Mathematics: Analysis and Approaches HL because it links data description with probability. It helps turn real data into mathematical predictions.

What the Normal Distribution Is

A normal distribution is a continuous probability distribution with a bell-shaped curve. Its shape is symmetric about the mean, so the left and right sides are mirror images. The mean, median, and mode are all equal in a perfect normal distribution.

Two parameters define it:

  • the mean $\mu$, which gives the center of the distribution,
  • the standard deviation $\sigma$, which measures the spread.

We write a normal distribution as $X \sim N(\mu, \sigma^2)$, where $X$ is a random variable. Notice that the variance is $\sigma^2$, not $\sigma$. This notation tells us the full distribution of the variable.

If the standard deviation is small, the curve is narrow and tall because the values are tightly clustered. If the standard deviation is large, the curve is wider and flatter because the data are more spread out.
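The effect of the spread on the height of the curve can be checked numerically. The sketch below is only an illustration: it uses `statistics.NormalDist` from Python's standard library, and the two standard deviations ($1$ and $2$) are made-up values, not data from this lesson.

```python
from statistics import NormalDist

# Two illustrative normal distributions: same mean, different spreads.
narrow = NormalDist(mu=0, sigma=1)
wide = NormalDist(mu=0, sigma=2)

# The height of the curve at the mean is 1 / (sigma * sqrt(2 * pi)),
# so doubling sigma halves the peak height.
peak_narrow = narrow.pdf(0)
peak_wide = wide.pdf(0)

print(f"peak height with sigma = 1: {peak_narrow:.4f}")
print(f"peak height with sigma = 2: {peak_wide:.4f}")
```

Because the total area under each curve must stay equal to $1$, a wider curve has to be lower.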

A normal curve has total area $1$, which represents the entire probability of all possible outcomes. Since it is continuous, the probability that $X$ equals exactly one value is $0$. Instead, probabilities are found over intervals such as $P(60 < X < 70)$.

The normal distribution is important because many natural and human-made measurements are approximately normal. Examples include exam marks in a large group, adult heights, and repeated measurement errors in science experiments 🔬.

Key Features and Terminology

When studying the normal distribution, students, it helps to know the main vocabulary.

Mean and standard deviation

The mean $\mu$ is the balance point of the distribution. It tells us where the center lies. The standard deviation $\sigma$ tells us how far values typically are from the mean.

Symmetry

Because the normal distribution is symmetric, probabilities on one side of the mean match probabilities on the other side. For example, if $X \sim N(50, 9)$, then the chance of scoring more than $50$ is the same as the chance of scoring less than $50$.

Inflection points

The curve changes concavity at $\mu - \sigma$ and $\mu + \sigma$. These inflection points are where the curve is steepest: between them the curve is concave down, and outside them it is concave up.

Empirical rule

For a normal distribution, about:

  • $68\%$ of values lie within $\mu \pm \sigma$,
  • $95\%$ of values lie within $\mu \pm 2\sigma$,
  • $99.7\%$ of values lie within $\mu \pm 3\sigma$.

This is a useful estimate, but it is not exact. It gives a quick way to judge whether a value is unusual.

For example, if test scores are approximately normal with mean $70$ and standard deviation $5$, then about $95\%$ of scores lie between $60$ and $80$. A score of $85$ is more unusual because it is $3$ standard deviations above the mean.
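To see how close the empirical rule is to the exact values, the percentages can be computed directly. This is a minimal sketch, assuming the standard identity $P(Z < z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$; the helper names `standard_normal_cdf` and `prob_within` are chosen here for illustration.

```python
import math


def standard_normal_cdf(z):
    """Return P(Z < z) for Z ~ N(0, 1), using the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_within(k):
    """Return P(-k < Z < k): the chance of lying within k standard deviations."""
    return standard_normal_cdf(k) - standard_normal_cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```

The results round to $0.6827$, $0.9545$, and $0.9973$, so the $68\%$, $95\%$, $99.7\%$ figures are convenient roundings rather than exact values.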

Using the Standard Normal Distribution

To compare values from different normal distributions, we convert them to the standard normal distribution. This distribution has mean $0$ and standard deviation $1$ and is written as $Z \sim N(0,1)$.

The key formula is the $z$-score:

$$z = \frac{x - \mu}{\sigma}$$

This tells us how many standard deviations a value $x$ is above or below the mean.

  • If $z > 0$, the value is above the mean.
  • If $z < 0$, the value is below the mean.
  • If $z = 0$, the value equals the mean.

Example 1

Suppose exam scores are distributed as $X \sim N(72, 8^2)$. What is the $z$-score of a student who scored $84$?

Using the formula:

$$z = \frac{84 - 72}{8} = \frac{12}{8} = 1.5$$

So the score of $84$ is $1.5$ standard deviations above the mean. This helps us compare it with values from other tests, even if those tests have different means and spreads.
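The comparison idea can be sketched in a few lines of Python. The first call reproduces Example 1; the second test (mean $50$, standard deviation $12$, score $65$) is a hypothetical one invented here purely to show a comparison.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies above (+) or below (-) mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Example 1: X ~ N(72, 8^2), score of 84.
z_exam = z_score(84, mu=72, sigma=8)

# Hypothetical second test (not from the lesson): mean 50, sd 12, score 65.
z_other = z_score(65, mu=50, sigma=12)

print(f"z for the score of 84: {z_exam}")
print(f"z for the hypothetical score of 65: {z_other}")
print("First score is relatively higher:", z_exam > z_other)
```

Although the raw scores are on different scales, the $z$-scores ($1.5$ and $1.25$) can be compared directly.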

The standard normal distribution is often used with tables or technology to find probabilities. In IB, you may use a calculator to evaluate normal probabilities directly, but the $z$-score remains the key idea.

Finding Probabilities and Percentiles

Probabilities under a normal curve are areas under the graph. For example, $P(X < a)$ means the area to the left of $a$, and $P(X > b)$ means the area to the right of $b$.

Example 2

Let $X \sim N(100, 15^2)$. Find the probability that a value is less than $85$.

First, standardize the value:

$$z = \frac{85 - 100}{15} = -1$$

So we want $P(Z < -1)$. From a standard normal table or a calculator, this is about $0.1587$. Therefore,

$$P(X < 85) \approx 0.1587$$

This means about $15.87\%$ of values are below $85$.
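Example 2 can be checked with technology. The sketch below uses Python's `statistics.NormalDist` as a stand-in for a graphing calculator's normal cdf function; it computes the left-tail area both directly and through the $z$-score to show that the two routes agree.

```python
from statistics import NormalDist

# X ~ N(100, 15^2). NormalDist takes the standard deviation, not the variance.
X = NormalDist(mu=100, sigma=15)

# Route 1: left-tail area computed directly from X.
p_direct = X.cdf(85)

# Route 2: standardize first, then use Z ~ N(0, 1).
z = (85 - 100) / 15
p_via_z = NormalDist().cdf(z)

print(f"P(X < 85) directly:  {p_direct:.4f}")
print(f"P(Z < {z:.0f}) via z-score: {p_via_z:.4f}")
```

Note that the notation $N(100, 15^2)$ gives the variance, while most software expects the standard deviation $15$.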

Example 3

Now find the probability that $X$ lies between $85$ and $115$.

Standardize both values:

$$z_1 = \frac{85 - 100}{15} = -1$$

$$z_2 = \frac{115 - 100}{15} = 1$$

So we want $P(-1 < Z < 1) = P(Z < 1) - P(Z < -1)$. The area is about $0.6827$, which matches the empirical rule’s $68\%$ estimate.
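A middle-interval probability is the difference of two left-tail areas. The following sketch checks Example 3 this way, again using `statistics.NormalDist` as an assumed stand-in for calculator technology, and also shows a right-tail area for contrast.

```python
from statistics import NormalDist

X = NormalDist(mu=100, sigma=15)

# Middle interval: area to the left of 115 minus area to the left of 85.
p_between = X.cdf(115) - X.cdf(85)

# Right tail: total area 1 minus the area to the left of 115.
p_right = 1 - X.cdf(115)

print(f"P(85 < X < 115) = {p_between:.4f}")
print(f"P(X > 115)      = {p_right:.4f}")
```

By symmetry, the right tail beyond $115$ has the same area as the left tail below $85$ found in Example 2.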

Percentiles are another important idea. A percentile is a value below which a given percentage of the data fall. If a student is at the $90$th percentile, then about $90\%$ of values are below that student’s score.

To find a percentile, you work backward from a probability to a value. This often uses technology or inverse normal methods. For instance, if the top $5\%$ of scores are rewarded, you look for the value with $P(X < x) = 0.95$.
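As a sketch of the inverse normal method, suppose (as an assumption for illustration) that the scores follow the distribution from Example 1, $X \sim N(72, 8^2)$. The cutoff for the top $5\%$ is then the value $x$ with $P(X < x) = 0.95$, which `NormalDist.inv_cdf` returns.

```python
from statistics import NormalDist

# Assumed distribution for illustration: the exam scores from Example 1.
scores = NormalDist(mu=72, sigma=8)

# Inverse normal: find x such that P(X < x) = 0.95.
cutoff = scores.inv_cdf(0.95)

# Sanity check: going forward from the cutoff should give back 0.95.
check = scores.cdf(cutoff)

print(f"95th percentile (cutoff for the top 5%): {cutoff:.2f}")
print(f"P(X < cutoff) = {check:.4f}")
```

Under this assumption the cutoff is about $85.16$, which corresponds to $z \approx 1.645$.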

Real-World Interpretation and Connection to Statistics

Students, the normal distribution is not just a graph; it is a model for understanding data. In statistics, models help summarize and predict what might happen in a larger population based on samples.

If a data set looks roughly normal, you can use the mean and standard deviation to describe it efficiently. This is much better than listing every value. For example, a teacher might summarize class marks by saying the scores are approximately normal with $\mu = 68$ and $\sigma = 12$.

This connects to the broader topic of statistics and probability in several ways:

  • Data collection and statistical description: the normal distribution helps describe patterns in collected data.
  • Regression and correlation: residuals in regression are often assumed to be approximately normally distributed.
  • Conditional probability and Bayes’ theorem: normal models can be used in decision-making when probabilities depend on thresholds.
  • Discrete and continuous probability distributions: the normal distribution is a continuous distribution, unlike the binomial distribution, which is discrete.

A common real-world use is quality control in manufacturing 🏭. Suppose a machine makes metal rods with lengths approximately normally distributed around a target length. If some rods are far from the target, they may be rejected. The normal model helps estimate the chance of defects.
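The defect estimate can be sketched as follows. All the numbers here are hypothetical, chosen only to make the calculation concrete: a target length of $50$ mm, a standard deviation of $0.2$ mm, and a rule that rejects rods more than $0.5$ mm from the target.

```python
from statistics import NormalDist

# Hypothetical values, not taken from the lesson.
target = 50.0      # target rod length in mm
sigma = 0.2        # standard deviation of the lengths in mm
tolerance = 0.5    # rods further than this from the target are rejected

rods = NormalDist(mu=target, sigma=sigma)

# A rod is rejected if it falls in either tail.
p_too_short = rods.cdf(target - tolerance)
p_too_long = 1 - rods.cdf(target + tolerance)
p_reject = p_too_short + p_too_long

print(f"P(rod rejected) = {p_reject:.4f}")
```

With these made-up figures the tolerance limits sit at $z = \pm 2.5$, so roughly $1.2\%$ of rods would be rejected.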

Another use is in science experiments. Measurement errors often follow an approximately normal pattern because many small random effects combine together. This idea links to the Central Limit Theorem, which explains why averages of large samples often become approximately normal even when the original data are not.
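A small simulation can illustrate this idea. It is a sketch under stated assumptions: the original data are uniform on $[0, 1)$ (clearly not normal), the sample size of $30$ and the $5000$ repetitions are arbitrary choices, and the seed is fixed only to make the run repeatable.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable


def sample_mean(n):
    """Return the mean of n independent uniform values on [0, 1)."""
    return sum(random.random() for _ in range(n)) / n


sample_size = 30     # arbitrary illustrative choice
repetitions = 5000   # arbitrary illustrative choice
means = [sample_mean(sample_size) for _ in range(repetitions)]

centre = statistics.mean(means)
spread = statistics.stdev(means)

# For a normal distribution, about 68% of values lie within one
# standard deviation of the mean.
within_one_sd = sum(abs(m - centre) < spread for m in means) / repetitions

print(f"mean of the sample means: {centre:.4f}")
print(f"standard deviation of the sample means: {spread:.4f}")
print(f"fraction within one standard deviation: {within_one_sd:.3f}")
```

The sample means cluster around $0.5$, and the fraction within one standard deviation should come out close to the $68\%$ expected for a normal distribution, even though each individual value is uniform.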

Important IB Reasoning Skills

In IB Mathematics: Analysis and Approaches HL, students, you are expected not only to calculate but also to interpret results carefully.

When solving normal distribution problems:

  1. Identify the distribution as $N(\mu, \sigma^2)$.
  2. Convert values to $z$-scores if needed.
  3. Decide whether the question asks for a left tail, right tail, or middle interval.
  4. Use technology or tables correctly.
  5. Interpret the probability in context.

For example, if a hospital measures waiting times that are normally distributed with $\mu = 20$ minutes and $\sigma = 4$ minutes, then a waiting time of $28$ minutes has

$$z = \frac{28 - 20}{4} = 2$$

A wait of $28$ minutes is therefore $2$ standard deviations above the mean: unusual but not impossible. Since $P(X > 28) = P(Z > 2) \approx 0.023$ is small, the hospital may investigate causes of delay if such waits occur often.
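The right-tail probability for the waiting-time example can be checked by following the steps listed above. This is a minimal sketch using `statistics.NormalDist` in place of a calculator.

```python
from statistics import NormalDist

# Step 1: identify the distribution, X ~ N(20, 4^2), in minutes.
waiting = NormalDist(mu=20, sigma=4)

# Step 2: convert the value to a z-score.
z = (28 - 20) / 4

# Steps 3 and 4: the question asks for a right tail, so subtract
# the left-tail area from the total area of 1.
p_longer = 1 - waiting.cdf(28)

# Step 5: interpret in context.
print(f"z = {z}")
print(f"P(wait > 28 minutes) = {p_longer:.4f}")
```

In context, only about $2.3\%$ of patients would be expected to wait longer than $28$ minutes if the model is appropriate.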

It is also important to recognize when the normal model is not appropriate. Data that are very skewed, have strong outliers, or are bounded in a way that breaks symmetry may not fit a normal distribution well. Good statistical reasoning means checking whether the model matches the situation.

Conclusion

The normal distribution is one of the central ideas in probability and statistics because it gives a simple and powerful way to describe many real-world data sets. Its symmetric bell shape, mean $\mu$, standard deviation $\sigma$, and standard normal version $Z \sim N(0,1)$ make it useful for calculation and interpretation.

By using $z$-scores, probabilities under the curve, and percentile ideas, you can solve problems involving test scores, heights, measurement error, and many other situations. In IB Mathematics: Analysis and Approaches HL, the normal distribution supports both mathematical technique and statistical thinking. It connects data description, probability, and real-world decision-making in a single model 📈.

Study Notes

  • A normal distribution is a continuous, symmetric, bell-shaped distribution.
  • It is written as $X \sim N(\mu, \sigma^2)$.
  • The mean $\mu$ is the center, and the standard deviation $\sigma$ measures spread.
  • The total area under the curve is $1$.
  • Probabilities are found from areas under the curve, not from single points.
  • The standard normal distribution is $Z \sim N(0,1)$.
  • The $z$-score formula is $z = \frac{x - \mu}{\sigma}$.
  • A positive $z$-score means the value is above the mean; a negative $z$-score means it is below.
  • The empirical rule says about $68\%$, $95\%$, and $99.7\%$ of values lie within $1$, $2$, and $3$ standard deviations of the mean.
  • Normal distributions are useful in real-life contexts such as test scores, heights, science measurements, and quality control.
  • In IB, always interpret answers in context and check whether the normal model is suitable.
