4. Statistics and Probability

Probability Density Functions

Imagine you are looking at the heights of students in a school. Some are shorter, some are taller, and many are somewhere in between. To describe how likely it is to find a student around $170\text{ cm}$ tall, you need a tool that works for continuous data. That tool is the probability density function, or pdf. In this lesson, you will learn what a pdf means, how it connects to probability, and how it fits into the bigger picture of Statistics and Probability in IB Mathematics: Analysis and Approaches HL.

What a Probability Density Function Means

A probability density function is used for a continuous random variable. A continuous random variable can take any value in an interval, such as time, mass, height, or temperature. For a discrete random variable, you count exact values. For a continuous random variable, you measure values, and there are infinitely many possible outcomes between any two numbers.

The pdf of a continuous random variable $X$ is a function $f(x)$ with these key ideas:

  • $f(x)\geq 0$ for all $x$
  • The total area under the curve is $1$
  • Probabilities come from areas under the curve, not from the height at a single point

This last point is very important. For a continuous random variable, $P(X=a)=0$ for any exact value $a$. That may feel strange at first, but a single point has zero width, so the area above it is zero. Instead, we calculate probabilities over intervals such as $P(2<X<5)$, and it makes no difference whether the endpoints are included.

A pdf is not itself a probability. It is a density, which means it tells us how probability is spread across values. A taller graph means more probability is concentrated near that value, but the probability of an exact point is still zero.
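
To see that a density value is not a probability, consider a hypothetical uniform pdf on $0\leq x\leq 0.5$, chosen here only for illustration. Its height is $2$, which could never be a probability, yet the total area is still $1$. The sketch below checks this numerically in Python. The helper `integrate` is an assumed name for a simple midpoint-rule approximation, not a library function.

```python
def integrate(f, a, b, n=10_000):
    """Approximate the integral of f from a to b with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def f(x):
    """Hypothetical pdf: uniform on [0, 0.5], so the height is 1 / 0.5 = 2."""
    return 2.0 if 0.0 <= x <= 0.5 else 0.0

height = f(0.25)                       # density value, greater than 1
total_area = integrate(f, 0.0, 0.5)    # total probability
point_prob = integrate(f, 0.25, 0.25)  # zero-width interval, so zero area

print(height, total_area, point_prob)
```

The height is $2$, the total area is approximately $1$, and the "interval" from $0.25$ to $0.25$ has zero area, which is why $P(X=0.25)=0$.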

Area Under the Curve and Probability

The core rule for pdfs is that the probability of an interval is equal to the area under the curve over that interval. If $X$ has pdf $f(x)$, then

$$P(a\leq X\leq b)=\int_a^b f(x)\,dx$$

This works because integration measures area. Since $X$ is certain to take some value, the total probability is $1$, so the total area must satisfy

$$\int_{-\infty}^{\infty} f(x)\,dx=1$$

If the pdf is only defined on a certain interval, then the integral over that interval must be $1$.

For example, suppose a random variable $X$ represents the waiting time in minutes for a bus, and its pdf is defined on $0\leq x\leq 10$. If the graph rises in the middle, that means waiting times near the middle are more likely than very short or very long waits. A probability such as $P(2\leq X\leq 6)$ is found by calculating the area under the curve from $x=2$ to $x=6$.
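
The text does not give a formula for this pdf, so the sketch below assumes one purely for illustration: $f(x)=\frac{3x(10-x)}{500}$ on $0\leq x\leq 10$, which rises to a peak at $x=5$. The helper `integrate` is again an assumed midpoint-rule approximation rather than a library function.

```python
def integrate(f, a, b, n=10_000):
    """Approximate the integral of f from a to b with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def f(x):
    """Hypothetical waiting-time pdf on [0, 10] that peaks at x = 5."""
    return 3 * x * (10 - x) / 500 if 0 <= x <= 10 else 0.0

total_area = integrate(f, 0, 10)   # should be very close to 1
prob_2_to_6 = integrate(f, 2, 6)   # area from x = 2 to x = 6

print(round(total_area, 4), round(prob_2_to_6, 4))
```

For this assumed pdf, integrating by hand gives $P(2\leq X\leq 6)=\frac{272}{500}=0.544$, and the numerical estimate should agree to several decimal places.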

This is one of the main links between probability and calculus in IB HL. You are not just memorizing formulas. You are using the area idea to interpret the shape of a distribution.

Working with Uniform and Other Simple PDFs

A very common starting point is the uniform distribution, where the density is constant, so subintervals of equal length are equally likely. If $X\sim U(a,b)$, then the pdf is

$$f(x)=\frac{1}{b-a}\quad \text{for } a\leq x\leq b$$

and $f(x)=0$ otherwise.

Why does this work? The graph is a rectangle with width $b-a$ and height $\frac{1}{b-a}$, so the area is

$$\left(b-a\right)\cdot\frac{1}{b-a}=1$$

Suppose $X\sim U(0,4)$. Then

$$f(x)=\frac{1}{4}\quad \text{for } 0\leq x\leq 4$$

To find $P(1\leq X\leq 3)$, use area:

$$P(1\leq X\leq 3)=\int_1^3 \frac{1}{4}\,dx=\frac{1}{4}(3-1)=\frac{1}{2}$$

This matches the idea of a rectangle: width $2$ and height $\frac{1}{4}$.
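
One way to build intuition for this result is a quick simulation: draw many values from $U(0,4)$ and count the fraction that land between $1$ and $3$. This is a rough check using Python's standard `random` module, not a replacement for the exact integral. The sample size and seed are arbitrary choices.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

n_samples = 100_000
samples = [random.uniform(0, 4) for _ in range(n_samples)]

# Fraction of simulated values that land in the interval [1, 3].
hits = sum(1 for x in samples if 1 <= x <= 3)
estimate = hits / n_samples

print(estimate)
```

The estimate should be close to $\frac{1}{2}$, with small random variation that shrinks as the number of samples grows.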

Not all pdfs are flat. Many are curved, and their shape can be based on real data. For example, exam scores might cluster around a middle value, while very low or very high scores are less common. A pdf helps model this kind of pattern.

Important Properties and Terminology

To understand pdfs well, you should know these terms:

  • Random variable: a numerical outcome from a random process
  • Continuous random variable: a variable that can take any value in an interval
  • Probability density function: a function describing how probability is distributed for a continuous variable
  • Support: the values where the pdf is greater than $0$
  • Cumulative probability: probability up to a certain value, such as $P(X\leq x)$
  • Median: the value that splits the total area into two equal parts
  • Expected value: the long-run average value, often written as $E(X)$

For a continuous random variable, the expected value is given by

$$E(X)=\int_{-\infty}^{\infty} x f(x)\,dx$$

This formula is a weighted average. Values with higher density count more strongly in the average.
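
As a worked example, take the uniform variable $X\sim U(0,4)$ from earlier, with $f(x)=\frac{1}{4}$ on $0\leq x\leq 4$. The density is zero outside that interval, so the integral only needs to run from $0$ to $4$:

$$E(X)=\int_0^4 x\cdot\frac{1}{4}\,dx=\left[\frac{x^2}{8}\right]_0^4=2$$

The result is the midpoint of the interval, which is what you would expect when every part of the interval is weighted equally.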

A useful related function is the cumulative distribution function or cdf, written as $F(x)$. It gives the probability that the variable is at most $x$:

$$F(x)=P(X\leq x)$$

For a pdf $f(x)$,

$$F(x)=\int_{-\infty}^{x} f(t)\,dt$$

The cdf never decreases, and it rises from $0$ to $1$ as $x$ increases, while the pdf shows how probability is spread at each point.
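
For instance, for $X\sim U(0,4)$ with $f(x)=\frac{1}{4}$ on $0\leq x\leq 4$, integrating gives

$$F(x)=\int_0^x \frac{1}{4}\,dt=\frac{x}{4}\quad \text{for } 0\leq x\leq 4$$

with $F(x)=0$ for $x<0$ and $F(x)=1$ for $x>4$. Interval probabilities are differences of cdf values, so

$$P(1\leq X\leq 3)=F(3)-F(1)=\frac{3}{4}-\frac{1}{4}=\frac{1}{2}$$

which agrees with the area calculation. The median $m$ satisfies $F(m)=\frac{1}{2}$, so $\frac{m}{4}=\frac{1}{2}$ and $m=2$.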

Example of Interpreting a PDF in Context

Consider a machine that fills bottles with juice. The amount filled is supposed to be close to $500\text{ mL}$, but small variation happens. Let $X$ be the fill amount in millilitres. A pdf could model how likely each fill amount is.

If the curve is highest near $500$, then fills close to $500\text{ mL}$ are most common. If the graph has a longer tail to the right, then occasional overfills happen but are less likely. If the graph has a shorter range on one side, that tells you the machine has a practical limit.

Suppose the probability that a bottle is filled between $495\text{ mL}$ and $505\text{ mL}$ is high. That means most of the area under the curve lies in that interval. A smaller area outside that range indicates fewer bottles outside the target.
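
To put a number on "high", the sketch below assumes a specific model that the text does not give: a normal distribution with mean $500\text{ mL}$ and standard deviation $3\text{ mL}$. Both values are made up for illustration, and a symmetric normal curve is a simplification that ignores any longer right tail. It uses `NormalDist` from Python's standard `statistics` module.

```python
from statistics import NormalDist

# Hypothetical model: mean 500 mL, standard deviation 3 mL (assumed values).
fill = NormalDist(mu=500, sigma=3)

# P(495 <= X <= 505) is the area under the pdf, computed as F(505) - F(495).
prob_on_target = fill.cdf(505) - fill.cdf(495)
prob_off_target = 1 - prob_on_target

print(round(prob_on_target, 3), round(prob_off_target, 3))
```

Under these assumed parameters, roughly nine bottles in ten fall inside the target range. A smaller standard deviation would push more of the area into that interval.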

This is how pdfs help in quality control, manufacturing, and science. They describe patterns in measurements and allow us to make decisions based on probability.

How PDFs Fit into the IB Statistics and Probability Topic

Probability density functions connect several parts of Statistics and Probability:

  • From data collection and statistical description, pdfs help model continuous data such as heights, reaction times, or test scores.
  • From regression and correlation, probability models help explain variability and uncertainty in measurements.
  • From conditional probability and Bayes' theorem, pdf ideas support reasoning about continuous outcomes, especially when combining probability information.
  • From discrete and continuous probability distributions, pdfs are the continuous counterpart of probability mass functions for discrete variables.

In IB Mathematics: Analysis and Approaches HL, you are expected to reason carefully about what the graph means, not just calculate. For example, if a pdf is given, you should be able to:

  • identify that exact-point probability is $0$
  • find interval probabilities using integrals
  • interpret area as probability
  • use the total area condition to determine constants in a pdf
  • compare shapes of distributions and what they mean in context

If a pdf has an unknown constant $k$, you often find it by making the total area equal to $1$. For example, if

$$f(x)=kx\quad \text{for } 0\leq x\leq 2$$

then

$$\int_0^2 kx\,dx=1$$

which gives

$$k\left[\frac{x^2}{2}\right]_0^2=1$$

so

$$2k=1$$

and therefore

$$k=\frac{1}{2}$$

This kind of question is common in HL because it combines algebra, calculus, and probability reasoning.
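
The same steps can be checked numerically. The sketch below finds $k$ by dividing $1$ by the area under the unscaled shape $x$ on $0\leq x\leq 2$, then uses the resulting pdf to compute an interval probability and the expected value. The helper `integrate` and the variable names are illustrative assumptions, and the midpoint rule gives approximations rather than exact values.

```python
def integrate(f, a, b, n=10_000):
    """Approximate the integral of f from a to b with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Area under the unscaled shape g(x) = x on [0, 2].
unscaled_area = integrate(lambda x: x, 0.0, 2.0)

# Choose k so that the total area under k * x equals 1.
k = 1.0 / unscaled_area

def f(x):
    """The pdf f(x) = k * x on [0, 2], and 0 elsewhere."""
    return k * x if 0.0 <= x <= 2.0 else 0.0

prob_1_to_2 = integrate(f, 1.0, 2.0)            # P(1 <= X <= 2)
mean = integrate(lambda x: x * f(x), 0.0, 2.0)  # E(X)

print(round(k, 4), round(prob_1_to_2, 4), round(mean, 4))
```

By hand, $P(1\leq X\leq 2)=\left[\frac{x^2}{4}\right]_1^2=\frac{3}{4}$ and $E(X)=\int_0^2 \frac{x^2}{2}\,dx=\frac{4}{3}$, so the numerical values should be close to $0.5$, $0.75$, and $1.3333$.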

Common Mistakes to Avoid

Here are some errors students often make:

  • Thinking $f(x)$ is the probability at $x$
  • Forgetting that probabilities come from area, not height
  • Ignoring the condition that total area must be $1$
  • Using exact values like $P(X=3)$ for a continuous random variable and expecting a nonzero answer
  • Confusing the pdf with the cdf

A helpful check is to ask: "Am I finding a height or an area?" If it is probability, you need area.

Conclusion

Probability density functions are a major idea in continuous probability. They describe how probability is spread across values, and they allow us to find probabilities using area under a curve. In IB Mathematics: Analysis and Approaches HL, pdfs connect calculus with probability and support modeling in real-world situations such as measurements, timing, and quality control. If you understand that $f(x)$ is a density, that total area is $1$, and that interval probability comes from integration, you have the core of the topic.

Study Notes

  • A probability density function, or pdf, is used for a continuous random variable.
  • For a pdf $f(x)$, $f(x)\geq 0$ and $\int_{-\infty}^{\infty} f(x)\,dx=1$.
  • Probabilities are found using area: $P(a\leq X\leq b)=\int_a^b f(x)\,dx$.
  • For a continuous random variable, $P(X=a)=0$ for any exact value $a$.
  • The cumulative distribution function is $F(x)=P(X\leq x)$.
  • The expected value is $E(X)=\int_{-\infty}^{\infty} x f(x)\,dx$.
  • Uniform distributions are simple examples of pdfs with constant height over an interval.
  • In IB HL, you may need to find constants in a pdf by using the fact that total area must equal $1$.
  • PDFs connect to real-world data such as height, time, mass, and measurement variation.
  • Always check whether a question asks for a density value (a height) or a probability (an area).
