Lesson 1.2: Populations, Sampling Units and Random Samples
Introduction
In this lesson, we will delve into the fundamental concepts of populations, sampling units, and random sampling, which are essential for understanding data collection and analysis. By grasping these concepts, you will establish a strong foundation that will support your future studies in statistics. The objectives for this lesson include:
- Understanding the definitions of populations, sampling frames, and sampling units.
- Differentiating between parameters and statistics.
- Learning about simple random samples both with and without replacement.
- Exploring methods for obtaining random samples using random number tables and calculators.
- Understanding the conditions required for a randomly selected sample of size $ n $.
Populations and Sampling Units
A population is a complete set of items or individuals that share at least one characteristic relevant to a statistical analysis. For example, if we want to study the average height of students in a school, the population would include all students in that school. A population can be large or small: it may consist of a few dozen individuals or of hundreds, thousands, or even millions.
Sampling Frame
A sampling frame is a list of individuals or items from which the sample is drawn. It is essential because it allows researchers to define who or what is eligible to be part of the sample. Continuing with our example, if the population includes all students at a school, the sampling frame might be the official school roster.
Sampling Units
A sampling unit is the individual member of the population from which data is collected. In our school example, each student represents a sampling unit. Understanding the distinction between a population, sampling frame, and sampling unit is fundamental in statistics, as the effectiveness of a study heavily relies on these definitions.
Worked Example 1
Suppose we want to find the average number of books read by students at a school during the last year.
- Population: All students at the school
- Sampling Frame: The official school enrollment list
- Sampling Units: Each student listed on the enrollment list
In this case, if the school has 500 students, the entire set of 500 students represents the population. The list that includes these 500 names is the sampling frame, and each student is a sampling unit.
Parameters and Statistics
A parameter is a numerical value that describes a characteristic of a population, while a statistic is a numerical value that describes a characteristic of a sample drawn from that population. Parameters are generally unknown, because finding them exactly would require data on every member of the population. Statistics are calculated from the data collected in a sample and are used to estimate the corresponding parameters.
Examples
- If we want to know the average height of all students (population), the actual average height, say $ 162 \, \text{cm} $, is a parameter.
- If we survey a sample of 30 students and find that their average height is $ 160 \, \text{cm} $, this average is a statistic.
Common Misconception
A common misconception is that parameters and statistics are interchangeable. In reality, a statistic is merely our best estimate of the corresponding parameter when we collect data from a sample. The accuracy of that statistic as an estimate of the parameter will depend on how representative our sample is of the population.
Worked Example 2
Imagine you want to know the proportion of students at a school who prefer chocolate ice cream.
- Population: All students at the school
- Parameter: The true proportion of students who prefer chocolate ice cream, say $ p = 0.45 $.
- Sample Size: You survey 50 students and find that 20 prefer chocolate ice cream.
- Statistic: The proportion from your sample would be calculated as $ \frac{20}{50} = 0.4 $.
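Here, the value $ p = 0.45 $ is assumed only for illustration. In a real study the parameter $ p $ would be unknown, and the statistic $ 0.4 $ from your sample would serve as your estimate of it.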
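The arithmetic in Worked Example 2 can be sketched in a few lines of Python. The variable names are illustrative assumptions, and the value 0.45 is the assumed parameter from the example, which would not be available in a real study.

```python
# Worked Example 2 as arithmetic (variable names are illustrative assumptions).
sample_size = 50         # students surveyed
prefer_chocolate = 20    # surveyed students who prefer chocolate ice cream

# Statistic: the sample proportion, computed from the sample alone.
p_hat = prefer_chocolate / sample_size

# Parameter: assumed here for illustration only; unknown in practice.
p_assumed = 0.45

# How far this particular sample landed from the assumed parameter.
# This can only be computed because we pretended to know p.
error = p_hat - p_assumed

print(f"sample proportion = {p_hat:.2f}")
print(f"difference from assumed parameter = {error:+.2f}")
```

A different sample of 50 students would usually give a slightly different statistic, while the parameter stays fixed.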
Random Sampling
Random sampling is a fundamental technique in statistics. Because chance alone decides who is selected, it guards against selection bias and helps the sample reflect the population from which it is drawn. There are two main types of random sampling: simple random sampling without replacement and unrestricted random sampling with replacement.
Simple Random Sampling Without Replacement
In simple random sampling without replacement, once an item is selected, it is not returned to the population for further selection. Therefore, each item can only be selected once for the sample.
Conditions for Simple Random Sampling Without Replacement
- Each member of the population must have an equal chance of being selected in the sample, and every possible group of $ n $ members must be equally likely to form the sample.
- No member can be selected more than once.
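To get a feel for these conditions, we can count how many different samples are possible. The sketch below assumes a population of 100 and a sample of size 5 as illustrative values. It uses the standard counting result that the number of distinct samples without replacement is the binomial coefficient "N choose n".

```python
import math

N = 100  # population size (illustrative assumption)
n = 5    # sample size (illustrative assumption)

# Number of distinct samples of size n, ignoring order and allowing no repeats.
num_samples = math.comb(N, n)

# Under simple random sampling every one of these samples is equally likely.
prob_each_sample = 1 / num_samples

# Chance that one particular member ends up in the sample.
inclusion_prob = n / N

print(num_samples)      # 75287520
print(inclusion_prob)   # 0.05
```

Every member has the same 5% chance of inclusion, and no particular group of five is favoured over any other.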
Worked Example 3
Suppose we want to select 5 students from a school of 100 without replacement.
- List: Each student is assigned a number from 1 to 100, and numbers are drawn using a random number generator or a random number table.
- Outcome: If students 12, 45, 68, 23, and 37 are selected, then these five students comprise the sample.
- Since this is without replacement, once a student is chosen, they cannot be chosen again.
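A minimal sketch of Worked Example 3, assuming the students are numbered 1 to 100. Python's standard `random.sample` draws without replacement, so no student can appear twice. The seed value is an arbitrary assumption used only to make the sketch repeatable.

```python
import random

rng = random.Random(2024)      # arbitrary fixed seed (assumption) for repeatability
student_ids = range(1, 101)    # students numbered 1 to 100

# random.sample draws k distinct items, i.e. sampling without replacement.
sample = rng.sample(student_ids, k=5)

print(sample)
print(len(set(sample)) == len(sample))   # True: no student is repeated
```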
Unrestricted Random Sampling With Replacement
In this method, each selected item is returned to the population before the next draw, so the same item can be selected more than once.
Conditions for Unrestricted Random Sampling With Replacement
- Each member of the population must have an equal chance of being selected for each draw.
- Since items are replaced after selection, the same item can appear in the sample multiple times.
Worked Example 4
Consider the same school of 100 students. If we want to select 5 students with replacement, the selection process is as follows:
- After selecting student 12, that student remains in the pool for subsequent selections.
- One possible outcome is students 12, 12, 45, 12, and 23, showing that student 12 could be chosen multiple times.
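Worked Example 4 can be sketched the same way, again assuming students numbered 1 to 100. Python's standard `random.choices` makes each draw from the full list, so repeats are allowed. The seed is an arbitrary assumption, and whether a repeat actually occurs depends on the draw.

```python
import random
from collections import Counter

rng = random.Random(7)                 # arbitrary fixed seed (assumption)
student_ids = list(range(1, 101))      # students numbered 1 to 100

# random.choices draws each item independently from the full list,
# i.e. sampling with replacement.
draws = rng.choices(student_ids, k=5)

# Students who were drawn more than once (may be empty for a given seed).
repeats = {student: count for student, count in Counter(draws).items() if count > 1}

print(draws)
print(repeats)
```

With only 5 draws from 100 students, repeats are possible but uncommon. They become more frequent as the sample size grows relative to the population.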
Using Random Number Tables and Calculators
Random number tables or random number generators on calculators can be very helpful tools in obtaining random samples from a population. The following are steps to use random numbers to select a sample:
Steps
- Define the Population: Clearly identify the entire population from which you need to sample.
- Number the Sampling Frame: Assign a unique identification number to each individual in the sampling frame.
- Select Sample Size: Decide how many members you wish to select from the population.
- Use Random Numbers: Utilize a random number table or a random number generator on a calculator to generate random numbers corresponding to the individuals in your sampling frame.
- Collect the Sample: Keep track of the selections made, considering whether you are sampling with or without replacement.
Common Misconceptions
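Students often believe that any informal way of picking names or numbers, such as choosing whoever comes to mind or pulling slips from a poorly mixed hat, constitutes random sampling. A draw from a hat is random only if every member of the sampling frame is included and the slips are thoroughly mixed. True random sampling requires a systematic and unbiased selection process in which chance alone decides who is chosen.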
Worked Example 5
Using a random number generator, you need to select a sample of 5 students from 100. If the numbers 37, 2, 45, 88, and 19 are generated, the students with those numbers on your list form the sample. If you are sampling without replacement, any number that comes up a second time is skipped and another is generated. If you are sampling with replacement, repeat selections are permitted.
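The procedure for reading a random number table can be sketched as follows. The digit string is a made-up row chosen to reproduce the numbers in Worked Example 5, not an excerpt from a published table. The function name and the convention that a group of all zeros stands for the highest label (so `00` means student 100) are assumptions of this sketch.

```python
def sample_from_digits(digits, population_size, sample_size, with_replacement=False):
    """Read fixed-width digit groups and map them to labels 1..population_size."""
    width = len(str(population_size - 1))   # 100 students -> 2-digit groups 00..99
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        value = int(digits[start:start + width])
        # Assumed convention: an all-zero group stands for the top label.
        label = value if value != 0 else 10 ** width
        if label > population_size:
            continue    # no such individual in the sampling frame
        if not with_replacement and label in chosen:
            continue    # already selected, so skip it
        chosen.append(label)
        if len(chosen) == sample_size:
            return chosen
    raise ValueError("ran out of digits before the sample was complete")


row = "370237458819"   # hypothetical table row: 37 02 37 45 88 19

print(sample_from_digits(row, 100, 5))                         # [37, 2, 45, 88, 19]
print(sample_from_digits(row, 100, 5, with_replacement=True))  # [37, 2, 37, 45, 88]
```

Without replacement, the second 37 is skipped and reading continues along the row. With replacement, it is kept and student 37 appears twice.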
Conclusion
In this lesson, we have explored the concepts of populations, sampling frames, sampling units, and random sampling. You learned the difference between parameters and statistics, how to obtain random samples, and the implications of sampling methods. Understanding these concepts is crucial as they form the backbone of statistical inquiry.
Study Notes
- A population is the complete set being studied.
- A sampling frame is a list of all possible subjects for sampling.
- Sampling units are the individual members of the population from which data is collected.
- Parameters describe populations while statistics describe samples.
- Random sampling ensures that every member has an equal chance of selection.
- Simple random sampling without replacement means no repeating members in a sample, while unrestricted random sampling with replacement allows for duplicates.
- Random number tables and calculators are useful tools for generating random samples.
