3. Collecting Data

Introduction To Planning A Study

Students, imagine you want to find out whether students at your school sleep more on weekends than on weekdays 😴📊. You could ask a few friends, but that might not give a fair answer. In AP Statistics, the first big step is planning a study so that the data you collect can actually answer your question. Good planning helps you avoid misleading results and makes your conclusions more trustworthy.

In this lesson, you will learn how to design a study, choose a good sample, and avoid common sources of bias. You will also see how planning a study connects to the rest of Collecting Data, which includes random sampling, bias in sampling, experimental design, and choosing the right method for inference.

Why Planning Matters

Before any data are collected, a statistician asks: What question am I trying to answer? That question guides everything else. If the question is unclear, the study will be unclear too.

For example, suppose a school wants to know whether a new lunch menu is better than the old one. If the school only asks students who already like the cafeteria, the results may look too positive. If the school only asks students who never eat lunch at school, the results may look too negative. In both cases, the way the study is planned affects the result.

Planning matters because data can be misleading when the study design is weak. AP Statistics focuses not just on getting numbers, but on getting good data. Good data come from good design.

A good plan usually starts with these questions:

  • What is the population of interest?
  • What is the goal of the study?
  • Will we estimate a value, compare groups, or examine a relationship?
  • How will we choose people or units?
  • Will we observe or impose a treatment?
  • What sources of bias could occur?

Thinking through these questions first can save time and improve the quality of the study ✅.

Key Terms in Study Planning

To plan a study correctly, you need to know some important vocabulary.

A population is the entire group you want to learn about. For example, if you want to know the average number of hours of sleep for all students at your school, the population is all students at your school.

A sample is the smaller group you actually collect data from. Because it is usually impractical or impossible to measure everyone, we use a sample to represent the population.

A parameter is a numerical value that describes a population, such as the true mean sleep time for all students. A statistic is a numerical value calculated from a sample, such as the mean sleep time for the students surveyed.
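To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The population of 1,200 sleep times is simulated, and the center of 7.0 hours, the spread of 1.2 hours, and the sample size of 40 are all made-up assumptions, not real school data.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: nightly sleep hours for 1,200 students (simulated).
population = [round(rng.gauss(7.0, 1.2), 1) for _ in range(1200)]

# Parameter: describes the whole population (usually unknown in practice).
parameter_mean = statistics.mean(population)

# Statistic: computed from a sample of 40 students drawn from that population.
sample = rng.sample(population, 40)
statistic_mean = statistics.mean(sample)

print(f"Parameter (population mean): {parameter_mean:.2f}")
print(f"Statistic (sample mean):     {statistic_mean:.2f}")
```

The two printed values are usually close but not identical. In a real study only the statistic is available, and it is used to estimate the parameter.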

A sampling method is the way the sample is chosen. Common methods include:

  • Simple random sample: every possible sample of a given size has an equal chance of being chosen.
  • Stratified sample: the population is divided into groups called strata, and a random sample is taken from each group.
  • Cluster sample: the population is divided into clusters, some clusters are chosen at random, and then every individual in those clusters is sampled.
  • Systematic sample: every $k$th individual is selected after a random start.
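The four methods above can be sketched in Python using only the standard library. The roster of 400 students, the field names `grade` and `homeroom`, and the helper function names are hypothetical choices made for this illustration.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical roster: 400 students, 100 per grade, in 20 homerooms of 20.
roster = [
    {"id": i, "grade": 9 + i // 100, "homeroom": i // 20}
    for i in range(400)
]

def simple_random_sample(units, n):
    """Every possible group of n units is equally likely."""
    return rng.sample(units, n)

def stratified_sample(units, key, n_per_stratum):
    """Split units into strata by `key`, then take an SRS from each stratum."""
    strata = {}
    for u in units:
        strata.setdefault(u[key], []).append(u)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, n_per_stratum))
    return chosen

def cluster_sample(units, key, n_clusters):
    """Pick whole clusters at random and keep everyone in them."""
    cluster_ids = sorted({u[key] for u in units})
    picked = set(rng.sample(cluster_ids, n_clusters))
    return [u for u in units if u[key] in picked]

def systematic_sample(units, k):
    """Random start among the first k units, then every k-th unit after it."""
    start = rng.randrange(k)
    return units[start::k]

srs = simple_random_sample(roster, 40)
strat = stratified_sample(roster, "grade", 10)   # 10 students from each grade
clus = cluster_sample(roster, "homeroom", 2)     # 2 entire homerooms
syst = systematic_sample(roster, 10)             # every 10th student

for name, s in [("SRS", srs), ("Stratified", strat),
                ("Cluster", clus), ("Systematic", syst)]:
    print(f"{name:<11} sample size: {len(s)}")
```

All four samples here contain 40 students, but they are built differently: the stratified sample is guaranteed to include every grade, while the cluster sample contains only two homerooms.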

These terms are not just memorization. They help you decide whether a study gives a fair picture of the population.

Random Sampling and Representativeness

The goal of sampling is to collect data from a sample that represents the population well. A sample is representative when it reflects important characteristics of the population.

Random sampling is important because it helps reduce bias. When chance, rather than the researcher's judgment or people's own choice, decides who is selected, every individual has a chance to be chosen, and the sample is less likely to be skewed toward one type of person.

For instance, suppose a teacher wants to know how much time students spend on homework each night. If the teacher only asks students in an advanced math class, the sample may not represent the whole school. A better method might be to randomly select students from the entire student list.

Here is a simple example of how random sampling works:

  • Put every student’s name in a list.
  • Use a random number generator to choose $30$ names.
  • Survey those $30$ students.

This method gives each student the same chance of being selected, which makes the sample more likely to represent the whole school.
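The three steps above might look like this in Python. The list of 300 placeholder names is an assumption for illustration. The last part repeats the draw many times to check the "same chance" idea: with 30 names chosen out of 300, any one student should be picked in about 10% of draws.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Step 1: a hypothetical student list (names are placeholders).
students = [f"Student {i:03d}" for i in range(1, 301)]

# Step 2: use a random number generator to choose 30 names without repeats.
chosen = rng.sample(students, 30)

# Step 3: these are the students to survey.
print("First few selected:", sorted(chosen)[:5])

# Check the "same chance" claim by repeating the draw many times and
# counting how often one particular student ends up in the sample.
trials = 5000
hits = sum("Student 001" in rng.sample(students, 30) for _ in range(trials))
inclusion_rate = hits / trials
print(f"Student 001 was selected in {inclusion_rate:.1%} of draws")
```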

However, random sampling does not guarantee a perfect sample. A random sample can still be unusual by chance. But it gives us a strong starting point and is much better than choosing people based on convenience.
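A small simulation can illustrate both points, using the homework example from earlier in this section. Everything here is invented for illustration: a school of 600 students in which 60 advanced math students average about 2.5 hours of homework and the other 540 average about 1.2 hours.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Hypothetical school (all numbers invented): homework hours per night.
advanced = [max(0.0, rng.gauss(2.5, 0.6)) for _ in range(60)]
others = [max(0.0, rng.gauss(1.2, 0.6)) for _ in range(540)]
school = advanced + others

true_mean = statistics.mean(school)

# Convenience sample: ask only students in the advanced math class.
convenience_mean = statistics.mean(rng.sample(advanced, 30))

# Simple random samples of the same size from the whole school, drawn many times.
srs_means = [statistics.mean(rng.sample(school, 30)) for _ in range(1000)]

print(f"True mean hours:         {true_mean:.2f}")
print(f"Convenience sample mean: {convenience_mean:.2f}")
print(f"SRS means range from {min(srs_means):.2f} to {max(srs_means):.2f}")
print(f"Average of SRS means:    {statistics.mean(srs_means):.2f}")
```

Individual random samples land above or below the true mean by chance, but they center on it. The convenience sample misses in the same direction every time, which is what bias means.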

Bias in Sampling

Bias happens when a study design systematically favors one outcome over another. In sampling, bias makes the sample unrepresentative of the population.

Common types of sampling bias include:

  • Convenience bias: choosing individuals who are easiest to reach.
  • Voluntary response bias: people choose themselves to respond, often leading to extreme opinions.
  • Undercoverage: some groups in the population are left out or are less likely to be selected.
  • Nonresponse bias: selected individuals do not respond, and the nonresponders may differ from responders.
  • Response bias: people give inaccurate answers, often because of wording, privacy concerns, or trying to please the researcher.

Example: A website poll asks, “Do you support a longer school day?” Only people who care strongly about the issue are likely to respond. That makes the results unreliable because the sample is self-selected.
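Here is a rough simulation of that website poll. The numbers are assumptions made up for this sketch: 30% of 2,000 people support a longer school day, and opponents feel more strongly, so they respond at a much higher rate (25% versus 5%).

```python
import random

rng = random.Random(11)  # fixed seed so the sketch is repeatable

# Hypothetical population: True means "supports a longer school day".
population = [True] * 600 + [False] * 1400
rng.shuffle(population)

def responds(supports):
    """Assumed behavior: opponents are far more likely to answer the poll."""
    return rng.random() < (0.05 if supports else 0.25)

# Voluntary response: people decide for themselves whether to answer.
poll = [p for p in population if responds(p)]
poll_support = sum(poll) / len(poll)

# A simple random sample of the same size, for comparison.
srs = rng.sample(population, len(poll))
srs_support = sum(srs) / len(srs)

print("True support:              30.0%")
print(f"Website poll ({len(poll)} replies): {poll_support:.1%}")
print(f"Random sample of same size: {srs_support:.1%}")
```

Under these assumptions the poll reports far less support than really exists, even though it has just as many responses as the random sample. A larger self-selected sample would not fix this.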

Another example: A survey about phone use during class is given only during first period. Students absent that day have no chance of being included, which is undercoverage, and their phone habits may differ from those of the students who were present.

To reduce bias, planners should ask:

  • Are all groups in the population included?
  • Is the sample selected randomly?
  • Could people refuse to answer?
  • Are the questions worded in a neutral way?

Good planning helps protect the study from these problems.

Observational Studies and Experiments

There are two major ways to collect data: observational studies and experiments.

In an observational study, researchers observe individuals and measure variables without trying to change anything. For example, a school might record whether students who sleep more also have higher grades. The researchers do not assign sleep habits; they only observe them.

In an experiment, researchers impose a treatment on some individuals and compare the results. For example, a teacher might test two review methods by assigning one group of students to use flashcards and another group to use practice quizzes.

The biggest difference is who decides the treatment. In an experiment, the researcher assigns the treatment. In an observational study, the researcher only records what individuals already do.

Experiments are usually better for showing cause and effect, but only when they are well designed. If you want to know whether one thing causes another, the best design usually includes random assignment.

Experimental Design: Control, Randomization, and Replication

A well-planned experiment often uses three important ideas: control, randomization, and replication.

Control means keeping other variables the same as much as possible. If one group gets a treatment in a quiet room and another in a noisy room, the room itself could affect the outcome. Control helps isolate the effect of the treatment.

Randomization means assigning individuals to treatment groups at random. This helps balance out lurking variables, which are hidden factors that might affect results. Random assignment does not erase all differences, but it makes groups more comparable.

Replication means using enough subjects so the results are more reliable. A study with only two students in each group is too small to be convincing. Larger samples usually give more stable results.
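The effect of group size can be seen in a simulation. This sketch assumes a made-up treatment that truly raises scores by 5 points, with individual scores varying by about 10 points. It runs the same experiment 2,000 times with 2 students per group and again with 50 per group.

```python
import random
import statistics

rng = random.Random(5)  # fixed seed so the sketch is repeatable

def estimated_effect(n_per_group, true_effect=5.0, sd=10.0):
    """Simulate one experiment; return treatment mean minus control mean."""
    control = [rng.gauss(70.0, sd) for _ in range(n_per_group)]
    treatment = [rng.gauss(70.0 + true_effect, sd) for _ in range(n_per_group)]
    return statistics.mean(treatment) - statistics.mean(control)

results = {}
for n in (2, 50):
    estimates = [estimated_effect(n) for _ in range(2000)]
    spread = statistics.stdev(estimates)
    wrong_sign = sum(e < 0 for e in estimates) / len(estimates)
    results[n] = {"spread": spread, "wrong_sign": wrong_sign}
    print(f"n = {n:>2} per group: spread of estimates = {spread:.2f}, "
          f"treatment looked harmful in {wrong_sign:.1%} of runs")
```

With only 2 students per group, the estimated effect jumps around so much that a helpful treatment often looks harmful. With 50 per group, the estimates cluster near the true effect.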

A common AP Statistics design also includes a control group, which does not receive the treatment or receives a standard treatment for comparison.

Example: A school wants to know whether a new tutoring app improves quiz scores.

  • Group 1 uses the app.
  • Group 2 uses standard study materials.
  • Students are randomly assigned to groups.
  • The same quiz is given to both groups.

If the app group scores higher, the school has stronger evidence that the app may help. Because the students were randomly assigned, differences between groups are less likely to be caused by other factors.
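Random assignment for an experiment like this one could be carried out as follows. The class list of 40 placeholder names and the even split into two groups of 20 are assumptions for this sketch.

```python
import random

rng = random.Random(2024)  # fixed seed so the assignment can be reproduced

# Hypothetical class list of 40 students (names are placeholders).
students = [f"Student {i:02d}" for i in range(1, 41)]

# Shuffle a copy of the list, then split it in half.
# Chance alone decides who ends up in which group.
shuffled = students[:]
rng.shuffle(shuffled)
half = len(shuffled) // 2
app_group = shuffled[:half]        # Group 1: uses the tutoring app
standard_group = shuffled[half:]   # Group 2: uses standard study materials

print("App group:     ", len(app_group), "students")
print("Standard group:", len(standard_group), "students")
```

Because neither the teacher nor the students choose the groups, traits such as motivation or prior grades tend to be spread across both groups rather than piling up in one.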

Choosing the Right Plan for the Question

Not every question needs an experiment. The right design depends on the goal.

Use a sample survey when you want to estimate something about a population, such as the percent of students who prefer online homework.

Use an observational study when you want to look for relationships but cannot or should not impose a treatment, such as studying the link between sleep and mood.

Use an experiment when you want to test whether a treatment causes a change, such as comparing two different study methods.

Students, a helpful AP Statistics habit is to match the design to the question:

  • If the question asks “How many?” or “What percent?”, think about sampling.
  • If it asks “Is there a relationship?”, think about observational data.
  • If it asks “Does this cause that?”, think about an experiment.

This choice affects what kind of conclusion you can make. A sample survey can estimate population values, but it cannot prove cause and effect. A well-designed experiment can support causal conclusions.

Connection to Inference

Planning a study is closely connected to later inference procedures. Inference means using sample data to draw conclusions about a population. But inference is only as good as the data collection plan.

If a sample is biased, then even a perfect formula will not fix the problem. If an experiment is poorly controlled, then a difference in results may not come from the treatment. That is why AP Statistics emphasizes study design before calculations.

For example, if a sample is selected randomly and is large enough, a confidence interval for a mean or proportion is more likely to be useful. If individuals are randomly assigned to a treatment group and a control group, a hypothesis test comparing the groups is more meaningful.
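As a preview of where good design leads, here is a sketch of a 95% confidence interval for a proportion, using the standard one-sample $z$ interval $\hat{p} \pm z^{*}\sqrt{\hat{p}(1-\hat{p})/n}$. The survey result is hypothetical: 132 of 200 randomly selected students prefer online homework. The function name `proportion_ci` is also just an illustrative choice.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """One-sample z interval for a proportion (z = 1.96 gives about 95%)."""
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin

# Hypothetical survey: 132 of 200 randomly selected students say yes.
low, high = proportion_ci(successes=132, n=200)
print(f"95% confidence interval: {low:.3f} to {high:.3f}")
```

For these numbers the interval works out to roughly 0.59 to 0.73. The formula trusts that the 200 students were chosen at random. If they were volunteers or one teacher's classes, the arithmetic would be the same but the interval would not describe the whole school.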

In short, careful study design is the foundation for valid statistical conclusions. Strong design leads to stronger evidence 📈.

Conclusion

Planning a study is one of the most important skills in AP Statistics because it determines whether collected data can be trusted. By identifying the population, choosing an appropriate sampling method, reducing bias, and deciding whether the study should be observational or experimental, you build a stronger foundation for analysis and inference.

Remember, students: good statistics starts long before the calculator. It starts with a careful plan, clear purpose, and thoughtful design.

Study Notes

  • A population is the entire group of interest, and a sample is the part of the population actually studied.
  • A parameter describes a population, while a statistic describes a sample.
  • Random sampling helps create a representative sample and reduces bias.
  • Common sampling methods include simple random, stratified, cluster, and systematic sampling.
  • Sampling bias can come from convenience bias, voluntary response bias, undercoverage, nonresponse bias, and response bias.
  • An observational study measures variables without changing them.
  • An experiment imposes a treatment and can support cause-and-effect conclusions when well designed.
  • Good experiments use control, randomization, and replication.
  • A control group provides a comparison for the treatment group.
  • The type of study should match the question: estimate a value, study a relationship, or test a cause.
  • Study design affects how much confidence you can have in statistical inference.
  • In AP Statistics, careful planning is the first step toward trustworthy data and valid conclusions.
