Data Representation in System Fundamentals

Introduction

Every time you send a message, save a photo, or stream a video, a computer must turn real-world information into a form it can process. That process is called data representation 💡. Computers do not understand letters, colors, sounds, or numbers the same way humans do. Instead, they store and process everything as patterns of bits, each of which is either $0$ or $1$.

In this lesson, you will learn how computers represent text, numbers, images, sound, and more. You will also see why choices like bit depth, file size, and coding systems affect performance and storage in real systems. By the end, you should be able to explain the main ideas, use the correct terminology, and connect data representation to the larger topic of System Fundamentals.

Learning objectives

Explain the main ideas and terminology behind data representation.
Apply IB Computer Science HL reasoning to data representation problems.
Connect data representation to system architecture, storage, and performance.
Summarize how data representation fits into System Fundamentals.
Use evidence and examples to support explanations.

Bits, Bytes, and Why Computers Use Binary

At the lowest level, computers use binary, which means base $2$. Binary uses only two symbols: $0$ and $1$. These are convenient for electronic circuits because a switch can be in one of two states, such as off/on or low/high voltage ⚡.

A single binary digit is called a bit. Eight bits make one byte. Bytes are important because many computer systems measure storage and file sizes in bytes, kilobytes, megabytes, and beyond.

Binary is not just a storage method; it is the foundation of how the CPU and memory communicate. For example, when a program stores the number $5$, the computer holds it as the binary pattern $101_2$, padded with leading zeros to fill the available bits (so $00000101$ in one byte). The same idea applies to text, images, and sound, but each type of data needs a different representation method.
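As a small illustration of the conversion described above, the following Python sketch turns a whole number into its bit pattern and back. The helper name `to_binary` and the default width of $8$ bits are assumptions made for this example.

```python
def to_binary(value, width=8):
    """Return value as an unsigned binary string, zero-padded to width bits."""
    if not 0 <= value < 2 ** width:
        raise ValueError("value does not fit in the requested number of bits")
    return format(value, "0{}b".format(width))


print(to_binary(5))      # 00000101
print(int("101", 2))     # 5 (reading the pattern back as a number)
```

The leading zeros do not change the value; they only show how the pattern fills a whole byte.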

Why binary matters in system fundamentals

Data representation affects how much memory a program uses, how fast data can be moved through a bus, and how much storage a file takes. For example, a high-resolution image usually takes more bytes than a simple text file. This matters for performance because systems must manage RAM, cache, disk storage, and network transfer efficiently.

Representing Numbers

Unsigned integers

The simplest number system used in computers is the unsigned integer. It represents only whole numbers $0$ and above. With $n$ bits, the largest unsigned value is $2^n - 1$.

For example, with $8$ bits:

$$2^8 - 1 = 255$$

So an $8$-bit unsigned integer can store values from $0$ to $255$.

This is useful in places like pixel brightness or simple counters, but it cannot represent negative numbers.
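The range formula can be checked with a short Python sketch. The function names `unsigned_range` and `fits_unsigned` are hypothetical helpers introduced only for this example.

```python
def unsigned_range(bits):
    """Return (smallest, largest) value of an unsigned integer with this many bits."""
    return 0, 2 ** bits - 1


def fits_unsigned(value, bits):
    """Report whether value can be stored as an unsigned integer of this size."""
    low, high = unsigned_range(bits)
    return low <= value <= high


print(unsigned_range(8))       # (0, 255)
print(fits_unsigned(255, 8))   # True
print(fits_unsigned(256, 8))   # False: one more bit would be needed
print(fits_unsigned(-1, 8))    # False: negative values are not representable
```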

Signed integers

To represent negative numbers, computers use signed integers. One common method is two’s complement. In an $n$-bit two’s complement system, the range is from $-2^{n-1}$ to $2^{n-1} - 1$.

For $8$ bits, the range is:

$$-2^7 \text{ to } 2^7 - 1$$

$$-128 \text{ to } 127$$

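Two’s complement is widely used because arithmetic is efficient in hardware. For example, the subtraction $a - b$ can be carried out as the addition $a + (-b)$, where $-b$ is formed by inverting every bit of $b$ and adding $1$. The same adder circuit therefore handles both operations, which helps the CPU perform calculations quickly.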
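The sketch below imitates $8$-bit two’s complement in Python to show how subtraction can be reduced to addition. Python integers are not fixed-width, so the masking with `2 ** bits - 1` is an assumption used here to simulate a fixed-size register; the function names are illustrative only.

```python
def to_twos_complement(value, bits=8):
    """Return the two's complement bit pattern of value as a string."""
    low, high = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    if not low <= value <= high:
        raise ValueError("value is outside the representable range")
    return format(value & (2 ** bits - 1), "0{}b".format(bits))


def from_twos_complement(pattern):
    """Interpret a bit string as a two's complement signed integer."""
    raw = int(pattern, 2)
    # A leading 1 marks a negative value.
    return raw - 2 ** len(pattern) if pattern[0] == "1" else raw


def subtract_by_adding(a, b, bits=8):
    """Compute a - b as a + (invert(b) + 1), keeping only the low bits."""
    mask = 2 ** bits - 1
    negated_b = (~b + 1) & mask          # invert the bits of b, then add 1
    total = (a + negated_b) & mask       # discard any carry out of the top bit
    return from_twos_complement(format(total, "0{}b".format(bits)))


print(to_twos_complement(-5))      # 11111011
print(subtract_by_adding(7, 5))    # 2
print(subtract_by_adding(5, 7))    # -2
```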

Floating-point numbers

Real-world values like $3.14$ or $0.001$ are represented using floating-point format. A floating-point number stores a sign, a mantissa, and an exponent. This allows computers to represent very large and very small values.

A simplified idea is:

$$\text{value} = (-1)^{\text{sign}} \times \text{mantissa} \times 2^{\text{exponent}}$$

Floating-point is useful in science, graphics, and simulations. However, it can cause rounding errors because some decimal values cannot be represented exactly in binary. For example, $0.1$ may be stored as a nearby binary approximation rather than the exact decimal fraction. This is why small errors can appear in calculations ✨.
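The rounding behaviour can be observed directly in Python. This sketch assumes the common case in which a Python `float` is a $64$-bit IEEE 754 value with $1$ sign bit, $11$ exponent bits, and $52$ mantissa bits; that layout is not stated in this lesson, and `float_fields` is a hypothetical helper.

```python
import math
import struct

# 0.1, 0.2 and 0.3 are each stored as nearby binary approximations,
# so an exact comparison fails even though the values are very close.
print(0.1 + 0.2 == 0.3)               # False
print(math.isclose(0.1 + 0.2, 0.3))   # True

# frexp splits a value into mantissa * 2**exponent.
mantissa, exponent = math.frexp(6.0)
print(mantissa, exponent)             # 0.75 3


def float_fields(x):
    """Split a 64-bit float into (sign, exponent, mantissa) bit strings."""
    raw = struct.unpack(">Q", struct.pack(">d", x))[0]
    bits = format(raw, "064b")
    return bits[0], bits[1:12], bits[12:]


sign, exponent_bits, mantissa_bits = float_fields(-2.0)
print(sign, exponent_bits)            # 1 10000000000
```

When comparing floating-point results, a tolerance-based check such as `math.isclose` is usually safer than `==`.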

Representing Text, Images, and Sound

Text: character sets and encoding

Text is stored using character sets. A character set matches each symbol to a binary code. A well-known example is ASCII, which originally used $7$ bits to represent $128$ characters, including letters, digits, punctuation, and control characters.

Modern systems use Unicode, which supports many languages and symbols from around the world 🌍. Unicode is usually stored with encodings such as UTF-8, UTF-16, or UTF-32. These encodings differ in how many bytes they use for each character: UTF-8 uses $1$ to $4$ bytes, UTF-16 uses $2$ or $4$ bytes, and UTF-32 always uses $4$ bytes.

For example, ASCII assigns the letter $A$ the code $65$, which is stored in one byte as $01000001$. Unicode can represent much more than Latin letters, including emoji and characters from non-Latin scripts.
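The following Python sketch shows the code for the letter A and how many bytes a few characters need in different Unicode encodings. The sample characters and the helper `encoded_sizes` are chosen for illustration; the `-le` codec variants are used so that no byte-order mark is added to the byte counts.

```python
def encoded_sizes(ch):
    """Return the code point of ch and its size in bytes under three encodings."""
    return {
        "code_point": ord(ch),
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


print(ord("A"))                   # 65
print(format(ord("A"), "08b"))    # 01000001

# A Latin letter, an accented letter, the euro sign, and an emoji.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F30D"]:
    print(ch, encoded_sizes(ch))
```

Plain English text is compact in UTF-8 because every ASCII character fits in a single byte, while the emoji needs four bytes in all three encodings.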

Images: pixels, resolution, and color depth

A digital image is made of tiny squares called pixels. The number of pixels in an image is called its resolution. A higher resolution usually means more detail but also a larger file size.

Each pixel stores color information. One common method is RGB, which uses red, green, and blue channels. If each channel uses $8$ bits, then each pixel uses $24$ bits in total. This gives over $16$ million possible colors because:

$$2^{24} = 16{,}777{,}216$$

The number of bits per pixel is called color depth. Higher color depth gives smoother color changes and better image quality. Lower color depth reduces file size but may cause visible banding or limited colors.

Image file size depends on:

resolution
color depth
whether compression is used

For example, a $1920 \times 1080$ image with $24$-bit color contains $1920 \times 1080 \times 24 = 49{,}766{,}400$ bits, or about $6.2$ MB uncompressed, far more than a small icon. This is why photos take much more storage than plain text messages.
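The size of an uncompressed image can be estimated as width × height × color depth. The Python sketch below applies that calculation; the function name `image_size_bytes` and the $32 \times 32$ icon are assumptions for this example, and real file formats add header data that is ignored here.

```python
def image_size_bytes(width, height, bits_per_pixel):
    """Estimate the size of an uncompressed image, rounded up to whole bytes."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8


photo = image_size_bytes(1920, 1080, 24)
icon = image_size_bytes(32, 32, 24)

print(photo)           # 6220800
print(icon)            # 3072
print(photo // icon)   # 2025 (the photo is about two thousand times larger)
```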

Sound: sampling and quantization

Digital sound is created by measuring sound waves at regular intervals. This is called sampling. The number of samples taken each second is the sampling rate, measured in hertz ($\text{Hz}$).

Each sample stores the amplitude of the sound wave. The number of possible amplitude values is determined by the sample size or bit depth. More bits allow more precise sound reproduction 🎧.

For example, audio recorded at $44.1\,\text{kHz}$ means $44{,}100$ samples per second. If the sample size is $16$ bits and the sound is stereo (two channels), each second needs $44{,}100 \times 16 \times 2 = 1{,}411{,}200$ bits, or $176{,}400$ bytes, before compression. That is twice the size of the same recording in mono.

Higher sampling rates and larger sample sizes improve quality but also increase storage use. This is an important trade-off in system design.
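This trade-off can be estimated with sampling rate × sample size × channels × duration. The sketch below is a rough calculation for uncompressed audio; `audio_size_bytes` is a hypothetical helper and the one-minute duration is chosen only for illustration.

```python
def audio_size_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Estimate the size of uncompressed audio in bytes."""
    total_bits = sample_rate_hz * bits_per_sample * channels * seconds
    return total_bits // 8


stereo = audio_size_bytes(44100, 16, 2, 60)
mono = audio_size_bytes(44100, 16, 1, 60)
low_quality = audio_size_bytes(8000, 8, 1, 60)

print(stereo)        # 10584000 (about 10.6 MB per minute)
print(mono)          # 5292000
print(low_quality)   # 480000
```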

Compression and File Size

Data representation is closely linked to compression, which reduces file size.

Lossless compression

Lossless compression reduces size without losing any original data. When decompressed, the file is exactly the same as before. This is important for text files, software, and some image formats like PNG.
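A quick way to see the round trip is Python's standard `zlib` module, used here as one example of a lossless compressor. The repetitive sample data is an assumption chosen because it compresses well; other data may shrink much less.

```python
import zlib

# Highly repetitive data: 1000 bytes made of a repeating pattern.
original = b"AAAAABBBBB" * 100

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original))                    # 1000
print(len(compressed) < len(original))  # True
print(restored == original)             # True: nothing was lost
```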

Lossy compression

Lossy compression removes some data permanently to reduce size more aggressively. This is common in image, audio, and video formats such as JPEG and MP3. The resulting file is usually much smaller, which allows faster transfer and lower storage use, but some quality is lost and the original cannot be recovered exactly.
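The idea of permanent loss can be shown with a toy example that keeps only the top bits of each $8$-bit sample. This is a deliberately simplified sketch, not how JPEG or MP3 actually work; the function `reduce_bit_depth` and the sample values are assumptions for illustration.

```python
def reduce_bit_depth(samples, keep_bits, original_bits=8):
    """Keep only the top keep_bits of each sample; the low bits are discarded."""
    shift = original_bits - keep_bits
    return [(s >> shift) << shift for s in samples]


samples = [0, 37, 128, 200, 255]
reduced = reduce_bit_depth(samples, keep_bits=4)

print(reduced)              # [0, 32, 128, 192, 240]
print(reduced == samples)   # False: the discarded detail cannot be recovered
```

Each value now needs only $4$ bits instead of $8$, but nearby original values such as $32$ to $47$ all become $32$, so the exact input cannot be rebuilt.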

Compression matters because it affects:

storage space
network bandwidth
loading time
playback quality

For example, a compressed video is easier to stream on a phone with limited data than an uncompressed version 📱.

How Data Representation Affects System Performance

Data representation is not just about storage; it also affects performance. Smaller file sizes can move through memory and networks faster. However, compression and decompression require processing power, so the system must balance speed, quality, and resource use.

A computer system has limited resources such as CPU time, RAM, cache, disk space, and bandwidth. Good data representation choices help the system work efficiently. For example:

A compact image format can reduce download time.
A higher bit depth can improve audio quality but increase storage needs.
Choosing the correct integer size can save memory in a large database.
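The last point can be made concrete with a rough estimate. The sketch below uses Python's `struct` module with its standard sizes to look up how many bytes each integer width occupies; the one-million-row table of ages is a hypothetical scenario, and real databases add their own overhead.

```python
import struct

# Standard-size format codes: signed 8-, 16-, 32- and 64-bit integers.
INTEGER_FORMATS = {8: "<b", 16: "<h", 32: "<i", 64: "<q"}


def column_bytes(bits, rows):
    """Estimate the storage for one integer column with the given width."""
    return struct.calcsize(INTEGER_FORMATS[bits]) * rows


rows = 1_000_000
print(column_bytes(8, rows))    # 1000000 (enough for ages from 0 to 120)
print(column_bytes(64, rows))   # 8000000 (eight times the memory for the same data)
```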

This is why data representation is part of System Fundamentals: it connects directly to hardware, software, memory management, and communication between components.

Common IB-style Reasoning Example

Suppose you are asked to compare two image files:

File A: $800 \times 600$ pixels, $24$-bit color, uncompressed
File B: $1600 \times 1200$ pixels, $24$-bit color, uncompressed

File B has twice the width and twice the height, so it has $2 \times 2 = 4$ times as many pixels. Since each pixel uses the same number of bits and neither file is compressed, File B is $4$ times as large as File A.
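The comparison can be verified numerically. This short sketch assumes the file size is just the pixel data, with no headers or metadata; `uncompressed_bytes` is a hypothetical helper.

```python
def uncompressed_bytes(width, height, bits_per_pixel):
    """Pixel data size in bytes for an uncompressed image."""
    return width * height * bits_per_pixel // 8


file_a = uncompressed_bytes(800, 600, 24)
file_b = uncompressed_bytes(1600, 1200, 24)

print(file_a)            # 1440000
print(file_b)            # 5760000
print(file_b / file_a)   # 4.0
```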

This kind of reasoning is useful in IB exam questions. You often need to identify the relevant factors, such as resolution, bit depth, or sampling rate, and then explain how they affect file size or quality.

Conclusion

Data representation is the method computers use to store and process information in binary form. By understanding bits, bytes, number systems, text encoding, image pixels, sound sampling, and compression, you can explain how different kinds of data are handled in real computer systems. These ideas are central to System Fundamentals because they influence storage, speed, accuracy, and communication. A strong understanding of data representation helps you analyze system performance and make clear, evidence-based comparisons in IB Computer Science HL.

Study Notes

A bit is a binary digit: $0$ or $1$.
A byte equals $8$ bits.
Binary is used because hardware can represent two states reliably.
Unsigned integers store values from $0$ to $2^n - 1$.
Two’s complement signed integers store values from $-2^{n-1}$ to $2^{n-1} - 1$.
Floating-point numbers represent very large or very small values but can include rounding errors.
ASCII is a $7$-bit character set; Unicode supports many more characters and languages.
Images are made of pixels; resolution and color depth affect quality and file size.
In RGB, $24$-bit color can represent $2^{24}$ colors.
Digital sound uses sampling; sampling rate and sample size affect quality and file size.
Lossless compression keeps all original data; lossy compression removes some data.
Data representation affects storage, bandwidth, and performance.
It is a core part of System Fundamentals because it connects data, hardware, and efficiency.