Congestion Control
Hey students! Ready to dive into one of the most fascinating aspects of computer networking? Today we're exploring congestion control - the traffic management system that keeps the internet running smoothly. By the end of this lesson, you'll understand how networks detect when they're getting overwhelmed, the clever algorithms that help manage traffic flow, and why fairness matters when millions of devices are trying to communicate at once. Think of it like learning how traffic lights and highway systems prevent gridlock, but for data packets!
Understanding Network Congestion
Imagine you're trying to get through a narrow hallway at school during passing period. When too many students try to squeeze through at once, everyone slows down and might even come to a complete stop. Network congestion works similarly - when too many data packets try to travel through network links that can't handle the volume, performance degrades dramatically.
Network congestion occurs when the demand for network resources exceeds the available capacity. This happens at routers, switches, and network links when incoming traffic exceeds the rate at which packets can be processed or forwarded. The consequences are serious: increased packet delay, packet loss, and reduced network throughput. In severe cases, congestion can lead to what's called "congestion collapse," where the network becomes so overwhelmed that useful work drops to nearly zero.
Real-world examples of congestion are everywhere! During major events like the Super Bowl or New Year's Eve, cellular networks often become congested as millions of people try to send messages and share photos simultaneously. Similarly, when a popular video goes viral on social media, the sudden spike in traffic can overwhelm servers and network infrastructure. Studies show that during peak usage hours (typically 7-11 PM), internet traffic can increase by up to 70% compared to off-peak times.
The detection of congestion relies on several indicators. Packet loss is the most obvious sign - when routers' buffers fill up, they start dropping incoming packets. Increased round-trip time (RTT) is another indicator, as packets spend more time waiting in queues. Modern networks also monitor queue lengths and utilization rates to predict congestion before it becomes severe.
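To make the buffer-overflow part of this concrete, here is a small, illustrative Python sketch of a tail-drop queue (a toy model, not real router code; the class and method names are made up for this lesson). When the buffer is full, arriving packets are simply discarded, and those drops become the loss signal that senders react to:

```python
from collections import deque

class TailDropQueue:
    """Toy model of a router output queue with a fixed packet capacity."""

    def __init__(self, capacity_packets):
        self.capacity = capacity_packets
        self.buffer = deque()
        self.dropped = 0

    def enqueue(self, packet):
        """Accept a packet if there is room; otherwise drop it (tail drop)."""
        if len(self.buffer) >= self.capacity:
            self.dropped += 1   # this drop is what TCP later sees as congestion
            return False
        self.buffer.append(packet)
        return True

    def dequeue(self):
        """Forward the packet at the head of the queue, if any."""
        return self.buffer.popleft() if self.buffer else None
```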
TCP's Additive Increase Multiplicative Decrease (AIMD)
The Additive Increase Multiplicative Decrease algorithm is like a careful driver who gradually speeds up on a clear highway but hits the brakes hard when traffic appears ahead. AIMD is the cornerstone of TCP's congestion control strategy, designed to efficiently utilize network capacity while avoiding congestion.
Here's how AIMD works: During the additive increase phase, TCP gradually increases its sending rate by adding a fixed amount to the congestion window size for each successful round-trip time. Specifically, the congestion window increases by approximately one maximum segment size (MSS) per RTT when no packet loss is detected. This linear growth allows TCP to probe for available bandwidth without being too aggressive.
The multiplicative decrease phase kicks in when packet loss is detected, indicating congestion. TCP immediately cuts its congestion window in half, dramatically reducing the sending rate. This aggressive reduction helps alleviate congestion quickly and prevents network collapse. The mathematical representation is elegant: if $W$ is the current window size in segments, then $W_{new} = W + \frac{1}{W}$ on each acknowledgment during additive increase (which adds up to roughly one segment per RTT), and $W_{new} = \frac{W}{2}$ during multiplicative decrease.
This approach creates the characteristic "sawtooth" pattern in TCP's sending rate over time. The window size gradually increases until packet loss occurs, then drops by half, and the cycle repeats. Analysis of AIMD shows that competing flows converge toward a fair allocation of bandwidth, which is a major reason the algorithm keeps shared networks stable.
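As a minimal sketch of these two rules (in Python, with the window measured in segments; the function names are illustrative and not taken from any real TCP stack):

```python
def aimd_on_ack(cwnd):
    """Additive increase, applied per acknowledgment: W_new = W + 1/W,
    which adds roughly one segment per round-trip time."""
    return cwnd + 1.0 / cwnd

def aimd_on_loss(cwnd):
    """Multiplicative decrease: halve the window when loss signals congestion,
    never dropping below one segment."""
    return max(cwnd / 2.0, 1.0)

# Applying aimd_on_ack repeatedly and aimd_on_loss at each loss event
# traces out the sawtooth pattern described above.
```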
A practical example: imagine your laptop is downloading a large file. Initially, TCP sends data slowly, then gradually increases the rate. If packets start getting lost (indicating congestion), TCP immediately halves its sending rate, then begins the gradual increase again. This prevents your download from overwhelming the network while still achieving good performance.
Slow Start Algorithm
While AIMD is great for steady-state operation, it's too conservative when starting a new connection. That's where slow start comes in - despite its name, it's actually quite aggressive!
Slow start begins with a congestion window of just one or two segments. However, instead of increasing linearly like AIMD, slow start doubles the window size every round-trip time. This exponential growth allows TCP to quickly ramp up to utilize available bandwidth. The window grows from 1 to 2 to 4 to 8 to 16 segments, and so on.
The algorithm continues this exponential growth until one of three things happens: packet loss is detected, the receiver's advertised window is reached, or a predetermined threshold called the "slow start threshold" (ssthresh) is reached. If a loss is detected by a timeout during slow start, TCP sets ssthresh to half the current window size, resets the congestion window to one segment, and begins slow start again.
Here's the clever part: once the window size reaches ssthresh, TCP switches from slow start to the more conservative AIMD algorithm. This hybrid approach allows for rapid initial growth followed by careful probing for additional capacity. Mechanically, slow start grows the window by one MSS for each acknowledged segment ($W_{new} = W + MSS$); since a whole window's worth of ACKs arrives within one RTT, the window doubles every round trip, which is exactly the exponential growth described above.
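A compact way to see the handover between the two phases is a per-ACK update rule like the sketch below, assuming the window and threshold are measured in segments and that this ACK acknowledges new data (the function name is illustrative):

```python
def window_update_on_ack(cwnd, ssthresh):
    """Congestion window update for one ACK of new data (units: segments)."""
    if cwnd < ssthresh:
        # Slow start: +1 segment per ACK, so the window doubles every RTT.
        return cwnd + 1.0
    # Congestion avoidance: AIMD's additive increase, about +1 segment per RTT.
    return cwnd + 1.0 / cwnd
```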
Consider streaming a video on Netflix. When you first click play, TCP uses slow start to quickly ramp up the data rate until it finds the right speed for your connection. Without slow start's exponential growth, it could take a very long time to reach full utilization of your broadband connection; with it, the sender typically approaches the available bandwidth within a handful of round trips.
Fast Retransmit and Fast Recovery
Traditional TCP would wait for a timeout before retransmitting lost packets, but timeouts can take several seconds - an eternity in network terms! Fast retransmit and fast recovery algorithms provide a much quicker response to packet loss.
Fast retransmit works by monitoring duplicate acknowledgments (ACKs). When a packet is lost, subsequent packets that arrive successfully trigger duplicate ACKs from the receiver. If TCP receives three duplicate ACKs for the same sequence number, it assumes the packet is lost and immediately retransmits it without waiting for a timeout. This can recover from loss in just one round-trip time instead of waiting for a timeout.
Fast recovery goes hand-in-hand with fast retransmit. Instead of dropping back to slow start after detecting loss (which would be overly conservative), fast recovery cuts the congestion window in half and continues with AIMD. This maintains higher throughput while still responding appropriately to congestion signals.
The algorithm maintains a variable called "duplicate ACK count." When this count reaches three, TCP enters fast recovery mode, sets ssthresh to half the current window, and continues transmission at the reduced rate. This approach recognizes that receiving duplicate ACKs means the network is still delivering packets - it's not completely congested.
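Putting fast retransmit and fast recovery together, a Reno-style sender's reaction to an incoming ACK might look like the sketch below. The state dictionary, its field names, and the retransmit callback are hypothetical simplifications; real TCP implementations track considerably more (for example, temporarily inflating the window by one segment per additional duplicate ACK).

```python
DUP_ACK_THRESHOLD = 3

def on_ack(state, ack_seq, retransmit):
    """React to one incoming ACK, Reno-style (simplified).

    `state` holds cwnd, ssthresh, last_ack, and dup_acks (counted in whole
    segments / sequence numbers); `retransmit` is a callback that resends
    the missing segment starting at the given sequence number."""
    if ack_seq == state["last_ack"]:
        # Duplicate ACK: the receiver is still getting data, but out of order.
        state["dup_acks"] += 1
        if state["dup_acks"] == DUP_ACK_THRESHOLD:
            # Fast retransmit: resend the missing segment right away.
            retransmit(ack_seq)
            # Fast recovery: halve the window instead of restarting slow start.
            state["ssthresh"] = max(state["cwnd"] // 2, 2)
            state["cwnd"] = state["ssthresh"]
    else:
        # New data acknowledged: reset the duplicate-ACK bookkeeping.
        state["last_ack"] = ack_seq
        state["dup_acks"] = 0
```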
The real-world impact is significant: without fast retransmit and recovery, a single lost packet forces TCP to wait for a retransmission timeout - often hundreds of milliseconds or more - and then rebuild its window from slow start, which can cut throughput dramatically. With these algorithms, the sender can recover from an isolated loss in roughly one round-trip time and keep its throughput close to what the path can sustain.
Fairness in TCP Variants
Fairness in networking means that competing flows should receive approximately equal shares of available bandwidth. This is crucial in a shared network where multiple users and applications compete for resources. TCP's AIMD algorithm naturally tends toward fairness, but achieving perfect fairness is challenging in practice.
The mathematical proof of AIMD fairness is elegant. Consider two competing flows sharing a bottleneck link. If both flows use AIMD, they will converge to equal bandwidth shares over time. This happens because the additive increase phase allows slower flows to catch up, while the multiplicative decrease affects all flows proportionally.
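You can watch this convergence in a toy simulation: two flows with very different starting windows share one bottleneck and apply the same AIMD rules in lockstep, and their windows end up nearly equal. Every number below is an arbitrary illustration value, not a measurement.

```python
def simulate_two_aimd_flows(capacity=100.0, w1=80.0, w2=10.0, rounds=200):
    """Synchronous toy model of two AIMD flows sharing one bottleneck link."""
    for _ in range(rounds):
        if w1 + w2 > capacity:
            # Overload: both flows detect loss and halve their windows.
            w1, w2 = w1 / 2, w2 / 2
        else:
            # Spare capacity: both flows add the same fixed increment.
            w1, w2 = w1 + 1, w2 + 1
    return w1, w2

print(simulate_two_aimd_flows())  # the two windows converge toward equal shares
```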
However, real networks present challenges to fairness. Flows with different round-trip times (RTTs) may not achieve equal bandwidth shares because TCP's window-based control is RTT-dependent: a flow can only grow its window once per round trip, so shorter-RTT flows grow faster and recover from losses sooner. A connection between nearby cities can therefore achieve substantially higher throughput than an intercontinental connection sharing the same bottleneck, purely because of the RTT difference.
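A widely cited back-of-the-envelope model for steady-state TCP Reno throughput (often called the square-root formula; it is quoted here as background, not derived in this lesson) makes the RTT dependence explicit:

$$\text{throughput} \approx \frac{MSS}{RTT} \cdot \frac{C}{\sqrt{p}}, \qquad C = \sqrt{3/2} \approx 1.22$$

Here $p$ is the packet loss probability. Because RTT sits in the denominator, the model predicts that, at the same loss rate, a flow with a much longer RTT receives a proportionally smaller share of the bottleneck.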
Different TCP variants also affect fairness. TCP Reno, TCP NewReno, TCP CUBIC, and TCP BBR all have slightly different behaviors that can impact how fairly they share bandwidth with other flows. For example, TCP CUBIC is designed to be more aggressive on high-bandwidth, high-delay networks, which might give it an advantage over traditional TCP Reno flows.
Network operators use various techniques to improve fairness, including active queue management (AQM) algorithms like Random Early Detection (RED) and fair queuing mechanisms. These help ensure that no single flow can monopolize network resources at the expense of others.
Conclusion
Congestion control represents one of the internet's greatest engineering achievements, enabling billions of devices to share network resources efficiently and fairly. We've explored how AIMD provides stable, fair bandwidth allocation through its careful balance of gradual increase and rapid decrease. Slow start allows connections to quickly find their optimal operating point, while fast retransmit and recovery minimize the impact of packet loss. Together, these algorithms have scaled from the early internet's few thousand hosts to today's billions of connected devices. Understanding these mechanisms gives you insight into why the internet works as reliably as it does, despite its incredible complexity and scale.
Study Notes
⢠Network Congestion: Occurs when traffic demand exceeds network capacity, causing increased delay, packet loss, and reduced throughput
⢠AIMD Algorithm: Additive Increase ($W_{new} = W + \frac{1}{W}$), Multiplicative Decrease ($W_{new} = \frac{W}{2}$)
⢠Slow Start: Exponential growth phase that doubles congestion window each RTT until reaching ssthresh
⢠Slow Start Threshold (ssthresh): Point where TCP switches from exponential to linear growth
⢠Fast Retransmit: Retransmits lost packets after receiving 3 duplicate ACKs instead of waiting for timeout
⢠Fast Recovery: Maintains higher throughput by using AIMD instead of returning to slow start after packet loss
⢠Congestion Window: Controls the amount of unacknowledged data that can be in transit
⢠RTT Impact: Round-trip time affects TCP performance and fairness between competing flows
⢠Fairness Principle: AIMD algorithm mathematically converges to equal bandwidth sharing among competing flows
⢠Sawtooth Pattern: Characteristic TCP throughput pattern showing gradual increase followed by sharp decrease
