Text, Images, and Sound Representation

Every digital message you see or hear on a computer starts as data. A text message, a photo, and a song all need to be stored in a way that a computer can understand and process. In this lesson, you will learn how text, images, and sound are represented using bits, why those representations matter, and how choices about representation affect quality, storage, and performance 💻🎵🖼️.

Learning objectives

Explain the main ideas and terminology behind text, images, and sound representation.
Apply IB Computer Science HL reasoning to calculate storage and understand representation methods.
Connect representation to the broader topic of system fundamentals, especially data representation and computer performance.
Summarize how these media types fit into computer systems.
Use examples and evidence to explain why representation choices matter in real life.

Text Representation

Text looks simple on screen, but a computer stores it as binary numbers. Each character, such as a letter, digit, punctuation mark, or emoji, is assigned a numeric code. The computer then stores that code in binary. This is why text representation is part of data representation: the device does not store the meaning of the word, only the code for each symbol.

A common system for text is Unicode, which is a large character set designed to represent characters from many languages and symbols used around the world. UTF-8 is a popular encoding that stores Unicode characters using one to four bytes. A byte is $8$ bits. In UTF-8, common English characters use $1$ byte, while many other characters need more bytes. This makes UTF-8 efficient for English text while still supporting global communication 🌍.
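The variable-length behavior of UTF-8 can be checked directly. The following is a minimal Python sketch; the helper name `utf8_length` and the four sample characters are chosen here for illustration only.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 needs to store a single character."""
    return len(ch.encode("utf-8"))


# Sample characters written as escapes: A, e-acute, euro sign, globe emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F30D"]

for ch in samples:
    encoded = ch.encode("utf-8")
    # Show every stored byte as an 8-bit binary pattern.
    bits = " ".join(f"{byte:08b}" for byte in encoded)
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s) -> {bits}")
```

The four samples need $1$, $2$, $3$, and $4$ bytes respectively, which matches the one-to-four-byte range described above.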

A character set is the list of characters a system can represent. An encoding is the rule that maps each character to binary. For example, in UTF-8 the letter $A$ has the code $65$, stored as the binary pattern $01000001$, while the digit $7$ has the code $55$, stored as $00110111$. Different encodings can represent the same character differently, which is why text files may display incorrectly if the wrong encoding is used.

Consider a simple example. If a file contains the word "CAT" and each character uses $1$ byte, then the storage needed is:

$$3 \times 8 = 24 \text{ bits}$$

That is $3$ bytes total. For a longer message, storage depends on the number of characters and the encoding used. This is important in messaging apps, websites, and databases, where millions of characters are stored and transmitted every day.
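The same calculation can be sketched in Python. The function name `text_storage_bits` is an assumption made for this example, and the result counts only the encoded characters, not any file overhead.

```python
def text_storage_bits(text: str, encoding: str = "utf-8") -> int:
    """Return the number of bits needed to store text in the given encoding."""
    return len(text.encode(encoding)) * 8  # 8 bits per byte


print(text_storage_bits("CAT"))               # 3 bytes in UTF-8 -> 24 bits
print(text_storage_bits("CAT", "utf-16-le"))  # 2 bytes per character -> 48 bits
```

The second call shows that the same three characters take twice as much space in an encoding that uses $2$ bytes per character.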

Text representation also affects speed and compatibility. A system must know which encoding was used when the file was created. If a document created in one encoding is opened using another, characters may become unreadable, appearing as symbols or question marks. This is a practical reason why standard encodings such as UTF-8 are widely used in modern systems.
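A short Python sketch of this problem follows, assuming a file that was saved as UTF-8 and then opened with two other encodings. The sample word and variable names are illustrative.

```python
original = "caf\u00e9"             # the word "cafe" with an accented e
stored = original.encode("utf-8")  # bytes as they would be written to a file

# Reading the same bytes with the wrong encoding garbles the accented letter.
wrong = stored.decode("latin-1")

# A decoder that cannot interpret a byte may substitute a replacement symbol.
replaced = stored.decode("ascii", errors="replace")

# Reading with the encoding that was used for writing restores the text.
right = stored.decode("utf-8")

print(len(original), len(wrong), len(replaced), right == original)
```

Here one accented character turns into two unrelated symbols under `latin-1`, and into two replacement characters under `ascii`, because UTF-8 stored it as two bytes.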

Image Representation

Images are represented as grids of tiny dots called pixels. Each pixel stores color information. A digital image can be thought of as a matrix with a certain width and height. The image resolution is the number of pixels, usually given as width $\times$ height (for example $1920 \times 1080$). Higher resolution usually means more detail and more storage needed.

Most images use bitmap/raster representation, where every pixel is stored individually. For each pixel, the computer stores color values. In an RGB system, color is made by combining red, green, and blue light. If each color channel uses $8$ bits, then each pixel uses $24$ bits in total. That allows:

$$2^{24}$$

possible color combinations, which is about $16.7$ million colors.
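As an illustration, three $8$-bit channels can be packed into one $24$-bit value. This is a sketch only: the function names are hypothetical, and placing red in the highest bits is an assumption, since real image formats order the channels in different ways.

```python
def pack_rgb(red: int, green: int, blue: int) -> int:
    """Combine three 8-bit channel values into a single 24-bit pixel value."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each channel must fit in 8 bits (0 to 255)")
    return (red << 16) | (green << 8) | blue


def unpack_rgb(pixel: int) -> tuple[int, int, int]:
    """Split a 24-bit pixel value back into its red, green, and blue channels."""
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF


orange = pack_rgb(255, 165, 0)
print(f"{orange:024b}")    # the 24 bits stored for this pixel
print(unpack_rgb(orange))  # (255, 165, 0)
print(2 ** 24)             # 16777216 possible colors
```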

To estimate image storage, use:

$$\text{storage} = \text{width} \times \text{height} \times \text{bits per pixel}$$

For example, a $1920 \times 1080$ image with $24$ bits per pixel requires:

$$1920 \times 1080 \times 24 = 49{,}766{,}400 \text{ bits}$$

which is $49{,}766{,}400 \div 8 = 6{,}220{,}800$ bytes, or about $6.22$ MB (taking $1$ MB as $10^6$ bytes), ignoring compression and file overhead. This is why high-resolution photos take more space than small icons.
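The estimate can be reproduced with a few lines of Python. The function name is chosen for this example, and the conversion assumes $1$ MB means $10^6$ bytes; with $1$ MiB as $2^{20}$ bytes the same image is about $5.93$ MiB.

```python
def image_storage_bits(width: int, height: int, bits_per_pixel: int) -> int:
    """Estimate uncompressed bitmap size in bits, ignoring file overhead."""
    return width * height * bits_per_pixel


bits = image_storage_bits(1920, 1080, 24)
size_bytes = bits // 8
print(bits)                    # 49766400
print(size_bytes)              # 6220800
print(size_bytes / 1_000_000)  # 6.2208 (decimal megabytes)
print(size_bytes / 2 ** 20)    # roughly 5.93 (mebibytes)
```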

Images can also be compressed. Lossless compression reduces file size without losing any image data, so the original image can be perfectly restored. Lossy compression removes some detail to save more space. JPEG is a common lossy format for photographs, while PNG is often used for images that need transparency or exact detail. A social media app may use lossy compression to make images load faster, while a medical image archive may prefer lossless methods to preserve detail.
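The difference between the two kinds of compression can be sketched in Python. The lossless part uses the standard `zlib` module, which implements DEFLATE (the method PNG also uses internally). The lossy part is a deliberately crude stand-in that simply drops the low bits of each value; it is not how JPEG works and only illustrates that discarded detail cannot be recovered.

```python
import zlib

# A hypothetical 100 x 100 image of pure red, stored as 3 bytes per pixel.
raw = bytes([255, 0, 0]) * (100 * 100)

# Lossless: the compressed data is smaller and decompresses to the exact original.
compressed = zlib.compress(raw)
restored = zlib.decompress(compressed)
print(len(raw), len(compressed), restored == raw)

# Lossy (illustrative only): keep the top 4 bits of each brightness value.
gradient = bytes(range(256))                        # 256 distinct levels
quantized = bytes((v >> 4) << 4 for v in gradient)  # only 16 levels remain
print(len(set(gradient)), len(set(quantized)), quantized == gradient)
```

After quantization, many different original values map to the same stored value, so there is no way to tell which one was originally present.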

Another important idea is vector graphics. Unlike bitmaps, vector images store shapes using mathematical descriptions such as lines, curves, and coordinates. This means they can be resized without becoming blurry. Logos, icons, and diagrams are often stored as vector graphics because they scale well. In contrast, bitmaps can become pixelated when enlarged.
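The contrast in scaling behavior can be sketched as follows. Both functions are hypothetical simplifications: the bitmap is a tiny grid of $0$ and $1$ values enlarged by repeating pixels, and the vector shape is a single line stored as two coordinate pairs.

```python
def upscale_bitmap(pixels: list[list[int]], factor: int) -> list[list[int]]:
    """Enlarge a bitmap by turning each pixel into a factor x factor block."""
    return [
        [value for value in row for _ in range(factor)]
        for row in pixels
        for _ in range(factor)
    ]


def scale_line(line, factor):
    """Scale a vector line by multiplying its end-point coordinates."""
    (x1, y1), (x2, y2) = line
    return ((x1 * factor, y1 * factor), (x2 * factor, y2 * factor))


diagonal_bitmap = [[1, 0],
                   [0, 1]]
for row in upscale_bitmap(diagonal_bitmap, 2):
    print(row)  # the diagonal becomes two visible square blocks

print(scale_line(((0, 0), (1, 1)), 2))  # still one exact line, just longer
```

The enlarged bitmap contains no new detail, only bigger blocks, while the scaled line is still described exactly and can be redrawn sharply at the new size.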

The choice between bitmap and vector depends on the job. A photograph of a sunset is usually a bitmap because it contains many small color changes. A school logo is often a vector graphic because it may need to appear on a website, a poster, and a sports jersey without losing sharpness ✨.

Sound Representation

Sound is a continuous wave in the real world, but computers store it digitally by converting it into discrete values. A microphone turns changes in air pressure into an electrical signal, and an analog-to-digital converter measures that signal at regular intervals. Taking these measurements is called sampling, and each measurement is stored as a number.

Three key terms are used for sound representation:

Sampling rate: the number of samples taken per second, measured in hertz.
Sample resolution: the number of bits used for each sample.
Channels: the number of separate audio tracks, such as mono or stereo.

A higher sampling rate means more measurements per second, which usually gives better sound quality because the waveform is captured more accurately. A higher sample resolution means each measurement can store more detail about loudness. Stereo sound uses $2$ channels, while mono uses $1$.
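The three terms above can be seen together in a small Python sketch that samples a pure tone. Everything here is an assumption for illustration: the function name, the use of a sine wave as the input sound, and signed integer samples.

```python
import math


def sample_sine(frequency_hz: float, sampling_rate_hz: int,
                duration_s: float, bits_per_sample: int) -> list[int]:
    """Sample a sine wave and round each sample to a signed integer level."""
    max_level = 2 ** (bits_per_sample - 1) - 1  # e.g. 32767 for 16 bits
    sample_count = round(sampling_rate_hz * duration_s)
    samples = []
    for i in range(sample_count):
        t = i / sampling_rate_hz  # time of this sample in seconds
        amplitude = math.sin(2 * math.pi * frequency_hz * t)  # between -1 and 1
        samples.append(round(amplitude * max_level))
    return samples


mono = sample_sine(440, 8000, 0.01, 16)
print(len(mono))   # 80 samples for 0.01 s at 8000 samples per second
print(mono[:5])    # the first few stored values
stereo = [mono, mono]  # two channels store two such lists
```

Doubling the sampling rate doubles the number of stored values, and raising the sample resolution widens the range of values each sample can take.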

To estimate audio storage, use:

$$\text{storage} = \text{sampling rate} \times \text{sample resolution} \times \text{number of channels} \times \text{duration}$$

For example, a $3$ minute stereo recording at $44{,}100$ samples per second with $16$ bits per sample needs:

$$44{,}100 \times 16 \times 2 \times 180 = 254{,}016{,}000 \text{ bits}$$

Here $180$ is the duration in seconds. Dividing by $8$ gives $31{,}752{,}000$ bytes, which is about $31.75$ MB before compression. That is why uncompressed audio files are large and compressed formats like MP3 or AAC are common for music streaming.
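The audio estimate can be checked the same way. The function name is an assumption for this sketch, the duration is given in seconds, and $1$ MB is taken as $10^6$ bytes.

```python
def audio_storage_bits(sampling_rate_hz: int, bits_per_sample: int,
                       channels: int, duration_s: int) -> int:
    """Estimate uncompressed audio size in bits, ignoring file overhead."""
    return sampling_rate_hz * bits_per_sample * channels * duration_s


audio_bits = audio_storage_bits(44_100, 16, 2, 3 * 60)
audio_bytes = audio_bits // 8
print(audio_bits)               # 254016000
print(audio_bytes)              # 31752000
print(audio_bytes / 1_000_000)  # 31.752 (decimal megabytes)
```

Switching the same recording to mono halves the result, since the number of channels is a direct multiplier.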

Sound compression may be lossless or lossy. Lossy compression removes audio data that is less likely to be noticed by human hearing, reducing file size significantly. This is useful for online music and podcasts where fast downloading matters. However, in professional audio editing, a lossless format may be preferred because preserving detail is important.

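The quality of sound depends on the choices made during sampling. If the sampling rate is too low, the higher frequencies in the sound cannot be captured correctly, so the audio may sound distorted or inaccurate. As a rule, the sampling rate needs to be at least twice the highest frequency being recorded. This is connected to the broader idea of system performance because higher-quality representation usually means larger files, more memory usage, and more processing time.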

Why Representation Matters in System Fundamentals

Text, images, and sound representation are not separate topics in isolation. They connect directly to the core ideas of System Fundamentals: how data is stored, processed, transmitted, and displayed. A computer system must balance quality, storage size, compatibility, and speed.

For example, a video call app handles all three types at once. It must encode text in chat messages, compress images from the camera, and sample sound from the microphone. If the internet connection is slow, the app may reduce image quality or audio quality to keep the call working smoothly. This shows the trade-off between fidelity and performance.

Representation also affects hardware and software design. More bits per pixel, more bits per sample, or more characters per message all require more storage and bandwidth. A computer with limited memory may struggle with very large images or uncompressed audio files. That is why file formats, compression, and efficient encoding are essential in real systems.
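To make the bandwidth point concrete, here is a rough Python sketch of transfer time. The sizes are the uncompressed image and audio figures worked out earlier in this lesson; the link speed of $10$ million bits per second is a made-up value, and the calculation ignores protocol overhead and latency.

```python
def transfer_time_seconds(size_bits: int, bandwidth_bits_per_second: int) -> float:
    """Estimate how long a file of size_bits takes to send over a link."""
    return size_bits / bandwidth_bits_per_second


LINK_SPEED = 10_000_000  # hypothetical connection: 10 million bits per second

image_bits = 1920 * 1080 * 24        # uncompressed 24-bit image
audio_bits = 44_100 * 16 * 2 * 180   # uncompressed 3-minute stereo recording

print(transfer_time_seconds(image_bits, LINK_SPEED))  # about 5 seconds
print(transfer_time_seconds(audio_bits, LINK_SPEED))  # about 25 seconds
```

Halving the number of bits, for example through compression or a lower resolution, halves the estimated transfer time on the same link.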

There are also social impacts. Incorrect text encoding can make international communication difficult. Poor image compression can reduce accessibility if text in images becomes unreadable. Low-quality audio can make educational podcasts harder to understand. In many contexts, accurate representation supports communication, access, and reliability.

A good IB-style response should explain not only what the terms mean, but also why they matter. For instance, if asked why UTF-8 is widely used, you should mention global character support and efficient storage for common characters. If asked why JPEG is useful, you should mention smaller file sizes and fast transmission, especially for photographs. If asked why the sampling rate matters, you should explain that it affects the accuracy of the sound wave representation.

Conclusion

Text, images, and sound are all represented digitally using bits, but each type uses a different method to capture real-world information. Text uses character sets and encodings, images use pixels or mathematical shapes, and sound uses sampling and digital values. The main goal is always the same: represent information accurately enough for the task while keeping storage and processing manageable.

These ideas are central to System Fundamentals because every digital system depends on efficient data representation. When you understand how media is stored, you can better explain file sizes, quality differences, compatibility problems, and performance trade-offs. That knowledge will help you analyze real-world systems and answer IB Computer Science HL questions with confidence ✅.

Study Notes

Text is stored using character sets and encodings; Unicode supports many languages, and UTF-8 is widely used.
A character is represented by a binary code, not by the letter itself.
Images are often represented as pixels in a bitmap, with each pixel storing color values.
RGB color uses red, green, and blue channels; $24$ bits per pixel gives about $16.7$ million colors.
Image storage can be estimated with $\text{width} \times \text{height} \times \text{bits per pixel}$.
Vector graphics store shapes mathematically and can be scaled without losing quality.
Sound is represented by sampling a continuous wave at regular intervals.
Audio quality depends on sampling rate, sample resolution, and number of channels.
Audio storage can be estimated with $\text{sampling rate} \times \text{sample resolution} \times \text{channels} \times \text{duration}$.
Lossless compression preserves all data; lossy compression reduces file size by removing less important detail.
Representation choices affect file size, quality, compatibility, bandwidth, and system performance.
Text, images, and sound representation are key parts of System Fundamentals because they influence storage, transmission, and processing in real computer systems.