1. Foundations of Computing

Data Representation

Explore representation of characters, integers, and floating point numbers and implications for storage and precision.

Hey students! šŸ‘‹ Welcome to one of the most fundamental topics in computer science - data representation! In this lesson, we'll explore how computers store and process different types of information like text, whole numbers, and decimal numbers. By the end of this lesson, you'll understand why everything in a computer is ultimately just 1s and 0s, and how this simple system can represent incredibly complex data. This knowledge is crucial for understanding how computers work at their core and will help you appreciate the elegance of digital systems! šŸ–„ļø

Understanding Binary: The Language of Computers

Let's start with the basics, students! Computers are essentially electrical devices that can only understand two states: on (1) and off (0). This is called binary, and it's the foundation of all computer operations.

Think of binary like a light switch - it's either on or off, there's no in-between! šŸ’” While we humans use the decimal system (base 10) with digits 0-9, computers use the binary system (base 2) with only digits 0 and 1.

Each binary digit is called a bit (short for binary digit), and 8 bits make up a byte. Here's how binary numbers work:

  • In decimal: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
  • In binary: 1, 10, 11, 100, 101, 110, 111, 1000, 1001, 1010

To convert from binary to decimal, each position represents a power of 2. For example, the binary number 1011 equals:

$$1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = 8 + 0 + 2 + 1 = 11$$
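The place-value calculation above is easy to check in Python; here is a minimal sketch that sums the powers of two by hand and then confirms the result with the built-in `int`, which accepts a base argument:

```python
# Convert binary 1011 to decimal by summing powers of 2.
binary = "1011"
value = 0
for position, digit in enumerate(reversed(binary)):
    value += int(digit) * 2 ** position

print(value)            # 8 + 0 + 2 + 1 = 11
print(int("1011", 2))   # Python's built-in conversion agrees: 11
```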

Fun fact: A single bit can store 2 different values, but 8 bits (1 byte) can store 256 different combinations! This is why bytes are so important in computer storage. šŸ“Š

Character Representation: From Letters to Numbers

Now, students, you might wonder: "If computers only understand numbers, how do they store letters and symbols?" Great question! The answer lies in character encoding.

ASCII (American Standard Code for Information Interchange) is the most fundamental character encoding system. It assigns a unique number to each character:

  • 'A' = 65 (in binary: 01000001)
  • 'a' = 97 (in binary: 01100001)
  • '0' = 48 (in binary: 00110000)
  • Space = 32 (in binary: 00100000)

ASCII uses 7 bits, allowing for 128 different characters. However, modern computers typically use 8 bits (1 byte) per ASCII character, giving us extended ASCII with 256 possible characters.

But ASCII has limitations - it only covers English characters! That's where Unicode comes in. Unicode is a much larger character encoding system that can represent over 1 million characters, including emojis, Chinese characters, Arabic script, and mathematical symbols. The most common Unicode encoding is UTF-8, which uses 1-4 bytes per character.

Real-world example: When you type "Hello! 😊" in a text message, your phone converts each character to its Unicode number, stores it in binary, and sends those 1s and 0s to your friend's phone, which converts them back to readable text!
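You can see this conversion yourself in Python: `ord` gives a character's code point, `chr` goes the other way, and `encode` shows how many bytes UTF-8 actually uses per character:

```python
# Look up code points and see how UTF-8 stores them.
print(ord("A"))                    # 65
print(bin(ord("A")))               # 0b1000001
print(chr(97))                     # 'a'

# UTF-8 is variable-width: 1 byte for ASCII, up to 4 for emoji.
print(len("A".encode("utf-8")))    # 1 byte
print(len("€".encode("utf-8")))    # 3 bytes
print(len("😊".encode("utf-8")))   # 4 bytes
```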

Integer Representation: Whole Numbers in Binary

Integers (whole numbers) are stored in binary using fixed-size containers. Students, let's explore how this works with different bit sizes:

8-bit integers can represent:

  • Unsigned (positive only): 0 to 255
  • Signed (positive and negative): -128 to 127

16-bit integers can represent:

  • Unsigned: 0 to 65,535
  • Signed: -32,768 to 32,767

32-bit integers can represent:

  • Unsigned: 0 to 4,294,967,295
  • Signed: -2,147,483,648 to 2,147,483,647

For negative numbers, computers use two's complement representation. This clever system allows the same binary addition circuits to work for both positive and negative numbers!
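A quick way to explore two's complement in Python is to mask a number to a fixed width; the helper name below is just illustrative. Notice how -5 and 5 add up to 0 using ordinary binary addition:

```python
# Two's complement: a negative number's bit pattern, within a fixed width.
def to_twos_complement(n, bits=8):
    """Return the unsigned bit pattern that represents n in `bits` bits."""
    return n & ((1 << bits) - 1)

print(format(to_twos_complement(5), "08b"))    # 00000101
print(format(to_twos_complement(-5), "08b"))   # 11111011
# The same addition circuit works for both signs: -5 + 5 wraps to 0.
print((to_twos_complement(-5) + 5) & 0xFF)     # 0
```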

Here's a crucial concept: integer overflow. When you try to store a number larger than the maximum capacity, it "wraps around." For example, in 8-bit unsigned arithmetic: 255 + 1 = 0! This is like an odometer rolling over from 999,999 back to 000,000. šŸš—
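Python integers don't overflow on their own, but we can simulate an 8-bit register by keeping only the low 8 bits with a mask:

```python
# Simulate 8-bit unsigned arithmetic by masking to the low 8 bits.
MASK_8BIT = 0xFF

print((255 + 1) & MASK_8BIT)    # 0  — wraps around, like the odometer
print((250 + 10) & MASK_8BIT)   # 4  — the "extra" 256 is discarded
```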

Floating Point Numbers: Decimals in the Digital World

Storing decimal numbers like 3.14159 or 0.001 is more complex than integers, students. Computers use floating point representation based on scientific notation.

A floating point number has three parts:

  1. Sign bit: Indicates positive (0) or negative (1)
  2. Exponent: Determines the magnitude
  3. Mantissa (or Significand): Contains the significant digits

The IEEE 754 standard defines two common formats:

  • Single precision (32-bit): 1 sign bit, 8 exponent bits, 23 mantissa bits
  • Double precision (64-bit): 1 sign bit, 11 exponent bits, 52 mantissa bits

For example, the number 12.375 in 32-bit floating point:

  • Binary: 1100.011
  • Scientific notation: 1.100011 Ɨ 2³
  • Sign: 0 (positive)
  • Exponent: 130 (3 + 127 bias)
  • Mantissa: 10001100000000000000000
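The three fields above can be verified with Python's `struct` module, which lets us pack 12.375 as a 32-bit float and read back the raw bit pattern:

```python
import struct

# Pack 12.375 as a 32-bit float and inspect its raw bits.
bits = struct.unpack(">I", struct.pack(">f", 12.375))[0]
pattern = format(bits, "032b")

sign, exponent, mantissa = pattern[0], pattern[1:9], pattern[9:]
print(sign)              # 0 (positive)
print(exponent)          # 10000010 = 130 = 3 + 127 bias
print(mantissa)          # 10001100000000000000000
print(int(exponent, 2))  # 130
```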

Precision limitations are crucial to understand! Floating point numbers can't represent all decimal numbers exactly. For instance, the simple decimal 0.1 cannot be represented exactly in binary floating point - it becomes a repeating binary fraction! This leads to small rounding errors that can accumulate in calculations. šŸ”¢
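You can see this rounding error in any Python session, which is why programmers compare floats with a tolerance rather than `==`:

```python
import math

# 0.1 has no exact binary representation, so tiny errors creep in.
print(0.1 + 0.2)                      # 0.30000000000000004, not 0.3
print(0.1 + 0.2 == 0.3)               # False!

# Compare floats with a tolerance instead of exact equality.
print(math.isclose(0.1 + 0.2, 0.3))   # True
```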

Storage and Memory Implications

Understanding data representation helps explain computer storage, students! Here are some key insights:

Storage efficiency: Text files using ASCII need 1 byte per character, while Unicode text might need 1-4 bytes per character. A 1000-word essay (about 5000 characters) takes roughly 5KB in ASCII but could be larger in Unicode.
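Counting encoded bytes makes this concrete; the strings below are just examples:

```python
# The same length of text can need different numbers of bytes.
ascii_text = "Hello"
mixed_text = "Hello 😊"

print(len(ascii_text.encode("utf-8")))  # 5 bytes — 1 per ASCII character
print(len(mixed_text.encode("utf-8")))  # 10 bytes — the emoji alone takes 4
```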

Memory alignment: Computers often store data in chunks aligned to specific boundaries (like 4-byte or 8-byte boundaries) for faster access, which can lead to some "wasted" space.

Data types matter: Choosing the right data type affects both storage space and processing speed. Using a 64-bit integer when you only need values 0-255 wastes memory!

Real-world impact: Video games often use 16-bit integers for coordinates to balance precision with memory usage. This is why some older games had invisible walls at specific distances - the coordinates reached their maximum values! šŸŽ®

Conclusion

Students, you've now explored the fascinating world of data representation! We've seen how computers use binary to store characters through ASCII and Unicode, represent integers with fixed-bit sizes and handle overflow, and store decimal numbers using floating point representation with its precision trade-offs. Understanding these concepts helps explain why computers behave the way they do and why programmers must carefully consider data types and storage requirements. This knowledge forms the foundation for understanding computer architecture, programming efficiency, and digital system design.

Study Notes

• Binary system: Base 2 numbering using only 0s and 1s; each position represents a power of 2

• Bit: Single binary digit (0 or 1); Byte: 8 bits = 256 possible combinations

• ASCII: 7-bit character encoding (128 characters); commonly stored in 8 bits (256 characters)

• Unicode: Universal character encoding supporting over 1 million characters; UTF-8 is most common

• Integer overflow: When a number exceeds storage capacity, it wraps around to minimum value

• Two's complement: Method for representing negative integers in binary

• Floating point: Decimal number representation using sign, exponent, and mantissa components

• IEEE 754: Standard for floating point representation (32-bit single, 64-bit double precision)

• Precision limitations: Not all decimal numbers can be exactly represented in binary floating point

• Storage considerations: Data type choice affects memory usage and processing efficiency

• 8-bit signed integers: Range -128 to 127; 8-bit unsigned: Range 0 to 255

• Character encoding formula: Number of possible characters = 2^(number of bits)

Data Representation — GCSE Computer Science | A-Warded