Combining Random Variables

Introduction: Why combine random variables? 🎯

In statistics we often study one random variable at a time, such as the number of students who pass a quiz or the amount of time a bus takes to arrive. But real situations usually involve more than one random variable happening at once. For example, a school fundraiser might earn money from ticket sales and snack sales, or a sports team might score points from field goals and free throws. To analyze these situations, statisticians combine random variables.

The big idea is simple: when two or more random variables are added, subtracted, or otherwise combined, we can still describe the new random variable if we know the behavior of the original variables. This lesson will help you understand how to find the mean and variance of combined random variables, why independence matters, and how these ideas connect to AP Statistics probability and distributions.

Learning objectives

Explain the key ideas and vocabulary for combining random variables.
Apply AP Statistics reasoning to find the mean and variance of combinations of random variables.
Connect combinations of random variables to probability distributions and simulation.
Use examples and evidence to interpret combined random variables correctly.

What it means to combine random variables

A random variable is a numerical description of a chance process. If $X$ is the number of goals scored by one soccer player and $Y$ is the number scored by another player, then $X+Y$ is a new random variable representing the total goals scored by both players.

The most common combinations are:

a sum, such as $X+Y$
a difference, such as $X-Y$
a scaled variable, such as $2X$
a linear combination, such as $aX+bY+c$

All of these are linear combinations because they use only multiplication by constants and addition or subtraction.

It is important to remember that the random variables themselves do not have to be counts. They can represent money, time, temperature, or any measurable quantity. For example, if $X$ is the cost of lunch and $Y$ is the cost of transportation, then $X+Y$ is the total cost for the day.

Example: school club fundraiser

Suppose a club earns money from two sources:

$X$ = money from ticket sales
$Y$ = money from snack sales

Then the total revenue is $T=X+Y$. If ticket sales and snack sales vary from day to day, then $T$ also varies from day to day. We want to understand the center and spread of $T$.

Finding the mean of a combination 😊

The mean of a random variable is its expected value. For combined random variables, the mean behaves very nicely:

$$E(X+Y)=E(X)+E(Y)$$

More generally,

$$E(aX+bY+c)=aE(X)+bE(Y)+c$$

This works whether or not the variables are independent. That is a very useful fact in AP Statistics.
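To see that independence is not needed for the mean rule, here is a minimal sketch in Python. The setup is an assumption chosen for illustration and is not one of the lesson's examples: $X$ is a fair die roll and $Y=7-X$, so $Y$ is completely determined by $X$.

```python
from fractions import Fraction

# Assumed example: X is a fair six-sided die roll and Y = 7 - X,
# so Y depends completely on X.
outcomes = range(1, 7)
p = Fraction(1, 6)  # each face is equally likely

mean_x = sum(p * x for x in outcomes)
mean_y = sum(p * (7 - x) for x in outcomes)
mean_sum = sum(p * (x + (7 - x)) for x in outcomes)

print(mean_x, mean_y, mean_sum)  # 7/2 7/2 7
```

Even though the two variables are strongly dependent, the mean of the sum still equals the sum of the means.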

Why this matters

If you know the average of each part, you can find the average of the total without listing every possible outcome. This is especially helpful when the number of outcomes is large.

Example

Suppose $E(X)=12$ and $E(Y)=8$. Then the average total is

$$E(X+Y)=12+8=20$$

As a second example, suppose a store’s daily profit is modeled by $P=3X-2Y+5$, where now $E(X)=10$ and $E(Y)=4$. Then

$$E(P)=3(10)-2(4)+5=30-8+5=27$$

So the expected profit is $27$ dollars. The constant $5$ shifts the mean upward by $5$.
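The same arithmetic can be written as a small Python helper. This is only a sketch; the function name `mean_of_linear_combination` is made up for this illustration.

```python
def mean_of_linear_combination(a, mean_x, b, mean_y, c=0):
    """Return E(aX + bY + c) given E(X) and E(Y)."""
    return a * mean_x + b * mean_y + c

# Values from the profit example: P = 3X - 2Y + 5, E(X) = 10, E(Y) = 4.
expected_profit = mean_of_linear_combination(3, 10, -2, 4, 5)
print(expected_profit)  # 27
```

Calling the helper with $a=b=1$ and $c=0$ gives the mean of a plain sum.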

Key idea

To find the mean of a linear combination, multiply each mean by its coefficient, add the results, and then add any constant term. This works whether or not the variables are independent, which makes the mean of a combined variable easy to compute.

Finding the variance of a combination 📊

Variance measures spread. Standard deviation is the square root of variance, so both are about how much a random variable tends to vary.

For two random variables $X$ and $Y$,

$$\operatorname{Var}(X+Y)=\operatorname{Var}(X)+\operatorname{Var}(Y)+2\operatorname{Cov}(X,Y)$$

Here $\operatorname{Cov}(X,Y)$ is the covariance, which measures how $X$ and $Y$ vary together. If $X$ and $Y$ are independent, then $\operatorname{Cov}(X,Y)=0$, so

$$\operatorname{Var}(X+Y)=\operatorname{Var}(X)+\operatorname{Var}(Y)$$

This is one of the most important AP Statistics rules for combining random variables.

For a linear combination,

$$\operatorname{Var}(aX+bY+c)=a^2\operatorname{Var}(X)+b^2\operatorname{Var}(Y)$$

when $X$ and $Y$ are independent. The constant $c$ does not affect variance because it shifts every value by the same amount.
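The rule can be checked exactly on a small case. In the sketch below, the two distributions (a fair die for $X$, a fair 0/1 coin for $Y$) and the constants $a=2$, $b=-3$, $c=5$ are assumptions chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

def mean(dist):
    """Mean of a discrete distribution given as {value: probability}."""
    return sum(p * v for v, p in dist.items())

def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    m = mean(dist)
    return sum(p * (v - m) ** 2 for v, p in dist.items())

# Assumed distributions for illustration.
die = {face: Fraction(1, 6) for face in range(1, 7)}  # X
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}          # Y

a, b, c = 2, -3, 5

# Build the distribution of aX + bY + c. Under independence,
# each joint probability is the product of the two probabilities.
combo = {}
for (x, px), (y, py) in product(die.items(), coin.items()):
    value = a * x + b * y + c
    combo[value] = combo.get(value, 0) + px * py

direct = variance(combo)
by_rule = a ** 2 * variance(die) + b ** 2 * variance(coin)
print(direct, by_rule)  # 167/12 167/12
```

The variance computed directly from the combined distribution matches the rule, and the constant $c$ plays no role in either calculation.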

Why independence matters

Independence means that knowing the value of one random variable tells you nothing about the probabilities for the other. If two variables are independent, the variance of their sum can be found by adding their variances.

If they are not independent, the relationship between them matters. For example, if two test scores come from the same student, a high score on one test may be related to a high score on another. In that case, you cannot simply add variances unless the problem states independence or gives enough information to account for dependence.
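A short Python sketch shows how far off the shortcut can be when the variables are dependent. The two cases are assumed for illustration: $X$ is a fair die roll, and $Y$ is either the same roll again ($Y=X$) or its complement ($Y=7-X$).

```python
from fractions import Fraction

def mean(dist):
    """Mean of a discrete distribution given as {value: probability}."""
    return sum(p * v for v, p in dist.items())

def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    m = mean(dist)
    return sum(p * (v - m) ** 2 for v, p in dist.items())

die = {face: Fraction(1, 6) for face in range(1, 7)}  # X
var_x = variance(die)

# Case 1 (assumed): Y = X, so X + Y = 2X.
sum_same = {2 * x: p for x, p in die.items()}

# Case 2 (assumed): Y = 7 - X, so X + Y is always 7.
sum_opposite = {7: sum(die.values())}

print(var_x + var_x)           # 35/6, what adding variances would give
print(variance(sum_same))      # 35/3
print(variance(sum_opposite))  # 0
```

In the first case the covariance is positive and the true variance is larger than the shortcut suggests. In the second the covariance is negative and the sum does not vary at all.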

Example: two machines

Suppose the number of parts produced by machine A has variance $4$ and the number produced by machine B has variance $9$. If the production amounts are independent, then the variance of the total produced is

$$4+9=13$$

The standard deviation of the total is

$$\sqrt{13}$$

Since $\sqrt{13}\approx 3.61$ is larger than the individual standard deviations of $2$ and $3$, the total production is more spread out than either machine's output alone.
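For readers who like to check the arithmetic, the machine example can be reproduced in a few lines of Python. The variable names are made up for this sketch, and the addition step is valid only under the independence assumption stated in the example.

```python
import math

var_a = 4  # variance of machine A's output (from the example)
var_b = 9  # variance of machine B's output (from the example)

# Adding variances is valid only because the outputs are assumed independent.
var_total = var_a + var_b
sd_total = math.sqrt(var_total)

print(var_total, round(sd_total, 2))  # 13 3.61
```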

Important warning

Do not add standard deviations directly. In general,

$$\operatorname{SD}(X+Y)\ne \operatorname{SD}(X)+\operatorname{SD}(Y)$$

Instead, add variances when independence applies, then take the square root if needed.
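As a quick numerical check, with values assumed only for illustration, suppose $X$ and $Y$ are independent with $\operatorname{SD}(X)=3$ and $\operatorname{SD}(Y)=4$. Then

$$\operatorname{SD}(X+Y)=\sqrt{3^2+4^2}=\sqrt{25}=5$$

which is smaller than $3+4=7$. Adding the standard deviations would overstate the spread of the sum.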

Difference of random variables

Sometimes we compare two random variables by subtracting them. For example, if $X$ is the score of Team A and $Y$ is the score of Team B, then $X-Y$ shows the score difference.

The mean of a difference is

$$E(X-Y)=E(X)-E(Y)$$

If $X$ and $Y$ are independent, then

$$\operatorname{Var}(X-Y)=\operatorname{Var}(X)+\operatorname{Var}(Y)$$

Notice that the variance still adds. The minus sign changes the center, but not the spread, because squaring removes the sign in the variance formula.
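One way to see this is to apply the linear combination rule with $a=1$, $b=-1$, and $c=0$, assuming $X$ and $Y$ are independent:

$$\operatorname{Var}(X-Y)=1^2\operatorname{Var}(X)+(-1)^2\operatorname{Var}(Y)=\operatorname{Var}(X)+\operatorname{Var}(Y)$$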

Example: comparing quiz scores

Suppose $X$ is the score on Quiz 1 and $Y$ is the score on Quiz 2. If $E(X)=78$, $E(Y)=74$, and the quizzes are independent with variances $16$ and $25$, then the expected difference is

$$E(X-Y)=78-74=4$$

and the variance of the difference is

$$16+25=41$$

So, on average, a Quiz 1 score is $4$ points higher than a Quiz 2 score. The difference has variance $41$, which corresponds to a standard deviation of $\sqrt{41}\approx 6.4$ points.
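A simulation can serve as a sanity check on these results. The sketch below assumes the quiz scores are independent and normally distributed. The normal shape is an assumption added here, since the example gives only means and variances.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

n = 50_000
# Assumption: scores are modeled as independent normal variables.
# Means 78 and 74 and variances 16 and 25 come from the example,
# so the standard deviations are 4 and 5.
diffs = [random.gauss(78, 4) - random.gauss(74, 5) for _ in range(n)]

sim_mean = statistics.fmean(diffs)
sim_var = statistics.pvariance(diffs)

# The results should land near the theoretical values 4 and 41.
print(round(sim_mean, 2), round(sim_var, 1))
```

The simulated values will not match the theory exactly, but with this many trials they should be close.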

Using combined random variables in AP Statistics reasoning 🧠

On the AP Statistics exam, combining random variables often appears in context. You may be asked to find an expected total, interpret a difference, or compare two different plans.

A strong solution usually includes these steps:

Define the random variables clearly.
State whether they are independent if needed.
Use the correct formula for mean or variance.
Interpret the result in context.

Example: choosing a phone plan

Suppose Plan A charges a fixed fee plus a random usage fee, and Plan B has a different structure. For Plan A, let $X$ represent the usage cost, so that the total cost is $C=X+20$, where $20$ is the fixed fee. Then

$$E(C)=E(X)+20$$

The fixed fee increases the average total cost but does not change the spread.

If Plan B's total cost is modeled by $D=2X+20$ and $X$ has variance $9$, then

$$\operatorname{Var}(D)=2^2(9)=36$$

The standard deviation becomes

$$\sqrt{36}=6$$

Since $X$ has standard deviation $\sqrt{9}=3$, doubling the usage cost doubles the standard deviation from $3$ to $6$. In general, standard deviation scales by the absolute value of the multiplier.
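The effect of a multiplier on spread can be sketched in Python. The helper name `scaled_spread` is made up for this illustration, and the third call uses a negative multiplier that is not part of the phone plan example.

```python
import math

def scaled_spread(a, var_x):
    """Return (variance, standard deviation) of aX + c for any constant c."""
    new_var = a ** 2 * var_x
    return new_var, math.sqrt(new_var)

print(scaled_spread(1, 9))   # C = X + 20              -> (9, 3.0)
print(scaled_spread(2, 9))   # D = 2X + 20             -> (36, 6.0)
print(scaled_spread(-2, 9))  # hypothetical -2X + 20   -> (36, 6.0)
```

The added constant never appears in the function because it does not affect the spread, and the negative multiplier gives the same spread as the positive one.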

Connection to distributions and simulation

Combining random variables fits into the larger AP Statistics topic of probability distributions. A distribution tells us what values a random variable can take and how likely they are. When we combine variables, we create a new distribution.

Sometimes the combined distribution is easy to describe with formulas for mean and variance. Other times, especially when the distributions are complicated, simulation helps. For example, if a game involves two random draws and a score based on both, a simulation can estimate the distribution of the total score.

This is why combining random variables connects to simulation and probability. It helps statisticians model real-world uncertainty in a practical way.
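Here is a minimal simulation sketch in Python. The game is hypothetical: the score is one die roll plus twice a second, independent die roll. Theory gives a mean of $3.5+2(3.5)=10.5$, and the simulation estimates the whole distribution of the score.

```python
import random
from collections import Counter

random.seed(2)  # fixed seed so the run is repeatable

def play_once():
    """Hypothetical game: first die roll plus twice a second die roll."""
    first = random.randint(1, 6)
    second = random.randint(1, 6)
    return first + 2 * second

n = 20_000
scores = [play_once() for _ in range(n)]
counts = Counter(scores)

# Estimated probability of each possible score.
for score in sorted(counts):
    print(score, round(counts[score] / n, 3))

sim_mean = sum(scores) / n
print("simulated mean:", round(sim_mean, 2))  # should be near 10.5
```

The printed table is an estimate of the distribution of the combined variable, which would take more work to list by hand.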

Real-world example

A delivery company may model the total time for a route as

$$T=X+Y$$

where $X$ is travel time and $Y$ is unloading time. If each part varies, then the total time also varies. The company can use the mean to estimate average delivery time and the variance to understand how much delivery times change.

Conclusion

Combining random variables is a major tool in AP Statistics because it lets you model totals, differences, and other real-world quantities built from more than one random process. The most important facts to remember are:

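means add, whether or not the variables are independent
variances add for independent variables
added constants shift the mean but not the variance
a minus sign changes the mean of a difference, but for independent variables the variances still add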

When you see a problem about total cost, total score, or difference in measurements, think about how the random variables are combined. If you choose the right formula and interpret the result in context, you will be using powerful AP Statistics reasoning correctly. ✅

Study Notes

A random variable assigns numerical values to outcomes of a chance process.
A combined random variable can be a sum, difference, or linear combination such as $aX+bY+c$.
The expected value of a combination follows linearity:

$$E(aX+bY+c)=aE(X)+bE(Y)+c$$

For independent random variables, variances add:

$$\operatorname{Var}(X+Y)=\operatorname{Var}(X)+\operatorname{Var}(Y)$$

For independent variables,

$$\operatorname{Var}(aX+bY+c)=a^2\operatorname{Var}(X)+b^2\operatorname{Var}(Y)$$

An added constant changes the mean but does not change the variance; a multiplier $a$ scales the variance by $a^2$.
Standard deviations do not add directly.
The variance of a difference still adds when the variables are independent.
Check for independence before adding variances; without it, the covariance between the variables must be taken into account.
Combined random variables are common in totals, comparisons, costs, scores, and time models.
Simulation can help estimate distributions when direct calculation is difficult.
In AP Statistics, always define variables clearly and interpret results in context.