4. Statistics and Probability

Types of Data 📊

Welcome, students! In Statistics and Probability, the first big question is not “What answer do we want?” but “What kind of data do we have?” The type of data determines which graphs, summaries, and calculations make sense. If you choose the wrong method, your conclusions can be misleading. In this lesson, you will learn the main types of data, how to recognize them, and why they matter in real-world decision-making. By the end, you should be able to classify data, connect data type to appropriate statistical tools, and explain why this is a foundation for inference and probability models.

Objectives for this lesson:

  • Identify and describe the main types of data.
  • Distinguish between qualitative and quantitative data.
  • Distinguish between discrete and continuous data.
  • Recognize levels of measurement such as nominal, ordinal, interval, and ratio.
  • Choose suitable summaries and displays based on the type of data.
  • Connect data type to statistical reasoning in IB Mathematics: Applications and Interpretation HL.

1. What is data? 🔍

Data are values collected from observations, measurements, or responses. In mathematics and statistics, data help us describe the world and make decisions based on evidence. For example, a school might collect students’ favorite subjects, heights, exam scores, or the number of minutes spent on homework. These are all data, but they are not all the same kind.

The first step in any statistical investigation is to understand the variable being studied. A variable is a characteristic that can take different values. For instance, if you study students’ heights, the variable is height and the values might be $160\,\text{cm}$, $172\,\text{cm}$, or $181.5\,\text{cm}$. If you study preferred transport to school, the variable is transport method, which is categorical, and the values might be bus, car, bike, or walk.

Why does this matter? Because different types of data answer different questions. A pie chart may work well for categories, but not for test scores. A mean makes sense for numerical scores, but not for eye color. Choosing the right method is a major part of good statistical practice.

2. Qualitative and quantitative data 🧠

The broadest classification is between qualitative data and quantitative data.

Qualitative data describe qualities or categories. They are non-numerical labels. Examples include eye color, nationality, music preference, and type of device used to access a learning app. In IB Statistics and Probability, qualitative data are often called categorical data because they place observations into groups.

Quantitative data are numerical values that represent amounts or measurements. Examples include age, mass, temperature, exam score, and number of siblings.

A useful way to remember the difference is this:

  • Qualitative data tell you “what kind.”
  • Quantitative data tell you “how much” or “how many.”

Example: Suppose a student surveys a class about commuting to school. If the responses are walk, cycle, bus, or car, the data are qualitative. If the student records the number of minutes each classmate takes to arrive, the data are quantitative.

Both kinds of data are important. Qualitative data are common in opinion surveys and group classification. Quantitative data are essential for measurement, comparison, and modelling.

3. Discrete and continuous quantitative data 📏

Quantitative data can be split into two more types: discrete and continuous.

Discrete data can take only separate, countable values. They often come from counting. Examples include number of siblings, number of goals scored, or number of students absent. You can list all possible values in a countable way, such as $0,1,2,3,\dots$.

Continuous data can take any value within a range. They often come from measuring. Examples include height, mass, time, temperature, and distance. A height might be $170\,\text{cm}$, but it could also be $170.2\,\text{cm}$ or $170.23\,\text{cm}$ depending on the precision of the measuring instrument.

This difference matters because continuous data can, in principle, have infinitely many possible values between two numbers. If a runner’s time is recorded as $12.4\,\text{s}$ to the nearest tenth of a second, the true time could be any value from $12.35\,\text{s}$ up to, but not including, $12.45\,\text{s}$.

A quick check:

  • If you count it, it is usually discrete.
  • If you measure it, it is usually continuous.

However, real data can be recorded in rounded form. For instance, a person’s age might be written as $16$ years, but the underlying variable is actually continuous because age changes continuously over time.

4. Levels of measurement: nominal, ordinal, interval, ratio 📚

IB statistics also uses levels of measurement. These help explain what operations are meaningful.

Nominal data

Nominal data are categories with no natural order. Examples include blood type, nationality, and color of car. You can count how many observations are in each category, but you cannot say one category is “greater” than another.

Example: If a class chooses a favorite sport, the data are nominal because football, basketball, and tennis are just names of categories.

Ordinal data

Ordinal data are categories with a meaningful order, but the gaps between values are not necessarily equal. Examples include survey ratings such as poor, fair, good, excellent; or competition places such as $1$st, $2$nd, and $3$rd.

Example: If students rate a lesson from 1 to 5, the numbers show order, but the difference between $1$ and $2$ may not mean the same as the difference between $4$ and $5$.

Interval data

Interval data are numerical scales with equal intervals, but no true zero. Temperature in degrees Celsius is a classic example. The difference between $10^\circ\text{C}$ and $20^\circ\text{C}$ is the same as the difference between $20^\circ\text{C}$ and $30^\circ\text{C}$, but $0^\circ\text{C}$ does not mean “no temperature.” Because of that, ratios such as “$20^\circ\text{C}$ is twice as hot as $10^\circ\text{C}$” are not valid.
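The ratio claim above can be checked with a short calculation. The sketch below assumes the standard conversion from degrees Celsius to kelvin (add $273.15$); the kelvin scale is brought in only as a comparison because its zero is a true zero, and the function name is illustrative.

```python
def celsius_to_kelvin(temp_c: float) -> float:
    """Convert degrees Celsius to kelvin (a scale with a true zero)."""
    return temp_c + 273.15

# The Celsius ratio looks like "twice as hot", but it depends on where
# the Celsius zero happens to sit.
ratio_celsius = 20 / 10

# On the kelvin scale the same two temperatures differ by only about 3.5%.
ratio_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)

# Differences, by contrast, agree on both scales: intervals are meaningful.
diff_celsius = 20 - 10
diff_kelvin = celsius_to_kelvin(20) - celsius_to_kelvin(10)

print(ratio_celsius, round(ratio_kelvin, 3), diff_celsius, round(diff_kelvin, 2))
```

The differences match while the ratios do not, which is the practical meaning of "equal intervals but no true zero."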

Ratio data

Ratio data have equal intervals and a true zero. Examples include mass, length, time, age, and income. Since $0$ means none of the quantity, ratio comparisons make sense. For example, a distance of $10\,\text{km}$ is twice $5\,\text{km}$.

Understanding levels of measurement helps students decide what analysis is valid. For example, a mean is meaningful for interval and ratio data, but not for nominal categories, where the mode is the only suitable average.

5. Choosing the right summaries and displays 📈

Different data types need different statistical tools.

For qualitative data, common displays include:

  • Bar charts
  • Pie charts
  • Frequency tables
  • Pareto charts

For quantitative data, common displays include:

  • Histograms
  • Box plots
  • Stem-and-leaf plots
  • Dot plots
  • Scatter plots when comparing two variables

For quantitative data, common numerical summaries include:

  • Mean, median, and mode
  • Range, interquartile range, and standard deviation
  • Percentiles and quartiles

But not every summary is suitable for every data type. The mean is useful for quantitative data, especially when the distribution is roughly symmetric. The mode can be used for categorical data because it identifies the most common category. The median is useful for ordered or skewed data because it is resistant to extreme values.

Example: Suppose a group of students reports the following commute times in minutes: $8, 10, 10, 12, 30$. The value $30$ is much larger than the others. The mean is pulled upward by the outlier, while the median gives a better sense of a typical commute time. This shows how the type of data and the shape of the distribution affect interpretation.
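The effect of the outlier on the commute times above can be checked directly. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

commute_minutes = [8, 10, 10, 12, 30]

# With the outlier included.
mean_all = mean(commute_minutes)      # 70 / 5 = 14
median_all = median(commute_minutes)  # middle of the sorted list = 10

# With the value 30 removed, for comparison.
without_outlier = [t for t in commute_minutes if t != 30]
mean_without = mean(without_outlier)      # 40 / 4 = 10
median_without = median(without_outlier)  # (10 + 10) / 2 = 10

print(mean_all, median_all, mean_without, median_without)
```

Removing the single value $30$ moves the mean from $14$ to $10$, while the median stays at $10$. That is what "resistant to extreme values" means in practice.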

6. Data types in real investigations and probability models 🎯

Types of data are not just labels. They shape the entire statistical process. In a real investigation, students might ask: “Does exercise time relate to sleep time?” Both variables are quantitative, so a scatter plot and correlation analysis may be appropriate. If a student instead asks: “Is there a relationship between transport method and whether a student arrives on time?” both variables are categorical, so a contingency table and conditional probability are more suitable.

This connects directly to probability models. Categorical data often lead to probability questions like $P(\text{late} \mid \text{bus})$ or $P(\text{success})$. Quantitative data may be modelled using distributions such as the normal distribution, especially when measurements show natural variation.
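As a sketch of how a contingency table leads to $P(\text{late} \mid \text{bus})$, the example below estimates the probability as a relative frequency. The counts are invented purely for illustration, and the function name is hypothetical.

```python
# Hypothetical counts, for illustration only:
# (transport, arrival) -> number of students.
counts = {
    ("bus", "late"): 6,
    ("bus", "on time"): 24,
    ("car", "late"): 3,
    ("car", "on time"): 17,
    ("walk", "late"): 1,
    ("walk", "on time"): 9,
}

def conditional_probability(table, arrival, transport):
    """Estimate P(arrival | transport) as a relative frequency."""
    in_group = sum(n for (t, _), n in table.items() if t == transport)
    matching = table.get((transport, arrival), 0)
    return matching / in_group

# P(late | bus) = 6 / (6 + 24)
p_late_given_bus = conditional_probability(counts, "late", "bus")

# Unconditional P(late) = 10 / 60, for comparison.
total = sum(counts.values())
p_late = sum(n for (_, a), n in counts.items() if a == "late") / total

print(round(p_late_given_bus, 3), round(p_late, 3))
```

With these made-up counts, the conditional probability differs from the overall proportion of late students, which is the kind of pattern a contingency table is meant to reveal.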

Understanding data type also helps with inference. If data are qualitative, you might estimate proportions or compare categories. If data are quantitative, you might estimate means, variability, or relationships between variables. In IB Mathematics: Applications and Interpretation HL, this connection is essential because statistics is not just about calculating numbers; it is about making justified decisions from data.

7. Common mistakes to avoid ⚠️

A frequent mistake is treating every numerical code as quantitative data. For example, if a survey uses $1$ for “agree” and $2$ for “disagree,” those numbers are labels, not measured quantities. They are categorical data coded numerically.
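The point about coded labels can be illustrated with a few lines of Python. The responses below are hypothetical; the sketch shows that counting decoded categories is meaningful, while the mean of the codes depends on an arbitrary coding choice.

```python
from collections import Counter

# Hypothetical survey responses coded as 1 = "agree", 2 = "disagree".
coded_responses = [1, 2, 1, 1, 2, 1]
labels = {1: "agree", 2: "disagree"}

# Decode first, then count: frequencies and the mode suit categorical data.
decoded = [labels[code] for code in coded_responses]
frequencies = Counter(decoded)
most_common_label, most_common_count = frequencies.most_common(1)[0]

# The mean of the codes can be computed, but it is not a measured quantity.
mean_coded_1_2 = sum(coded_responses) / len(coded_responses)

# Recoding the same answers as 0/1 changes the "mean" without changing the data.
recoded = [code - 1 for code in coded_responses]
mean_coded_0_1 = sum(recoded) / len(recoded)

print(frequencies, most_common_label, mean_coded_1_2, mean_coded_0_1)
```

The frequency table is the same whichever codes are used, whereas the two "means" differ, so the counts are the summary to report.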

Another mistake is using the mean for ordinal data as if the gaps are always equal. While some ordinal scales are sometimes treated as numerical in practice, the safer approach is to report the median or mode and, if a mean is used, to state the equal-gap assumption and its limitations.

A third mistake is ignoring the difference between discrete and continuous data when graphing. A bar chart is usually more suitable for discrete counts or categories, while a histogram is more appropriate for continuous data grouped into intervals.
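Grouping continuous data into intervals, the step that precedes drawing a histogram, can be sketched as follows. The heights and the class width of $5\,\text{cm}$ are assumptions chosen for illustration, and the helper name is hypothetical.

```python
from collections import Counter

# Hypothetical heights in cm, for illustration only.
heights_cm = [158.2, 161.5, 164.9, 165.0, 167.3, 170.2, 171.8, 174.4, 179.6]
class_width = 5  # assumed class width in cm

def class_lower_bound(value, width):
    """Return the lower boundary of the class interval containing value."""
    return int(value // width) * width

# Frequency of each class, keyed by its lower boundary.
grouped = Counter(class_lower_bound(h, class_width) for h in heights_cm)

for lower in sorted(grouped):
    print(f"{lower} <= h < {lower + class_width}: {grouped[lower]}")
```

Each class is an interval with no gap before the next one, which is why histogram bars touch while the bars of a bar chart for separate categories do not.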

Always ask three questions:

  1. Is the data categorical or numerical?
  2. If numerical, is it discrete or continuous?
  3. What level of measurement best describes it?
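One way to practice the three questions is to record the answers for each variable in a small table. The sketch below does this for variables mentioned in this lesson; the class and field names are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VariableType:
    kind: str          # question 1: "categorical" or "numerical"
    numeric_form: str  # question 2: "discrete", "continuous", or "n/a"
    level: str         # question 3: "nominal", "ordinal", "interval", "ratio"

classified = {
    "eye color": VariableType("categorical", "n/a", "nominal"),
    "lesson rating (poor to excellent)": VariableType("categorical", "n/a", "ordinal"),
    "temperature in degrees Celsius": VariableType("numerical", "continuous", "interval"),
    "number of siblings": VariableType("numerical", "discrete", "ratio"),
    "height in cm": VariableType("numerical", "continuous", "ratio"),
}

for name, vt in classified.items():
    print(f"{name}: {vt.kind}, {vt.numeric_form}, {vt.level}")
```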

Conclusion ✅

Types of data are the starting point for sound statistical reasoning, students. Qualitative and quantitative data describe two broad ways information can be collected, while discrete and continuous data separate kinds of numerical values. The levels of measurement—nominal, ordinal, interval, and ratio—show what comparisons and calculations are valid. These ideas guide the choice of graphs, summaries, probability models, and inferential methods.

In Statistics and Probability, the quality of your analysis depends on recognizing the data type before doing any calculation. If you classify the data correctly, you can choose appropriate methods and draw stronger conclusions. That is why types of data are a core skill in IB Mathematics: Applications and Interpretation HL.

Study Notes

  • Data are values collected from observations, measurements, or responses.
  • Qualitative data are categories; quantitative data are numbers.
  • Discrete data are countable; continuous data are measurable on a scale.
  • Nominal data have no order.
  • Ordinal data have order, but unequal or unknown gaps.
  • Interval data have equal intervals but no true zero.
  • Ratio data have equal intervals and a true zero.
  • Bar charts and pie charts suit categorical data.
  • Histograms and box plots suit quantitative data, especially continuous data.
  • The mean is not suitable for every data type.
  • Correct data classification supports valid probability and inference methods.
  • Always match the statistical tool to the type of data.
