Advanced SQL

Hey students! 👋 Ready to level up your SQL skills? In this lesson, we're diving deep into the advanced features of SQL that will transform you from someone who can write basic queries into a database wizard! 🧙‍♂️ You'll learn how to write complex queries using subqueries, master data aggregation and grouping, harness the power of window functions, and ensure data integrity with transactions. By the end of this lesson, you'll be able to handle sophisticated database operations that are essential for real-world applications and will definitely impress in your AS-level Computer Science exam!

Understanding Subqueries: Queries Within Queries

Think of subqueries as Russian dolls 🪆 - queries nested inside other queries! A subquery is essentially a query that runs inside another query, allowing you to break down complex problems into manageable pieces.

What makes subqueries so powerful? They let you use the result of one query as input for another. For example, imagine you're working with a school database and want to find all students who scored above the average grade. Instead of calculating the average manually, you can use a subquery to do it automatically!

SELECT student_name, grade 
FROM students 
WHERE grade > (SELECT AVG(grade) FROM students);

The inner query (SELECT AVG(grade) FROM students) calculates the average grade first, then the outer query uses that result to find students above average. Pretty neat, right? 😎

There are three main types of subqueries you need to know:

Scalar subqueries return a single value (like our average example above). Row subqueries return a single row with multiple columns. Table subqueries return multiple rows and columns, often used with operators like IN, EXISTS, or ANY.

Here's a real-world example: Netflix uses subqueries to recommend shows! They might query "find all users who watched the same shows as users who liked 'Stranger Things'" - that's a subquery in action!

Correlated vs Non-correlated subqueries is another important distinction. Non-correlated subqueries (like our average example) can run independently. Correlated subqueries depend on the outer query and run once for each row - they're more powerful but can be slower.

Mastering Aggregation and Grouping

Aggregation functions are your best friends when you need to summarize data! 📊 Think of them as mathematical tools that take multiple rows and squeeze them into meaningful insights.

The five essential aggregate functions are:

COUNT() - counts rows (like counting students in a class)
SUM() - adds up values (total sales revenue)
AVG() - calculates average (mean test score)
MIN() and MAX() - find smallest and largest values

But here's where it gets really interesting - GROUP BY lets you apply these functions to different categories of data simultaneously!

SELECT department, COUNT(*) as employee_count, AVG(salary) as avg_salary
FROM employees
GROUP BY department;

This query tells you how many employees work in each department and their average salary - all in one go! Amazon uses similar queries to analyze sales by product category, region, and time period.

The HAVING clause is GROUP BY's partner in crime. While WHERE filters individual rows before grouping, HAVING filters groups after aggregation. Think of it this way: WHERE asks "which individual records should I include?" while HAVING asks "which groups should I show in the final result?"

SELECT department, COUNT(*) as employee_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 10;

This shows only departments with more than 10 employees. Google uses HAVING clauses to analyze user behavior, showing only countries where they have significant user engagement.

Pro tip: The order matters! SQL processes clauses in this sequence: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY. Understanding this execution order helps you write more efficient queries and avoid common errors.

Window Functions: The Game Changer

Window functions are like having superpowers in SQL! 🦸‍♂️ They perform calculations across a set of rows related to the current row, without collapsing the result set like regular aggregate functions do.

What makes window functions special? Unlike GROUP BY which squashes your data into summary rows, window functions keep all your original rows while adding calculated columns. It's like having your cake and eating it too!

The basic syntax uses the OVER() clause:

SELECT employee_name, salary,
       AVG(salary) OVER() as company_avg_salary
FROM employees;

This shows each employee's salary alongside the company average - something impossible with regular aggregate functions alone!

PARTITION BY divides your data into groups (like GROUP BY), but keeps all rows:

SELECT employee_name, department, salary,
       RANK() OVER(PARTITION BY department ORDER BY salary DESC) as dept_rank
FROM employees;

This ranks employees within their departments by salary. LinkedIn uses similar queries to rank job candidates within skill categories!

Common window functions include:

ROW_NUMBER() - assigns unique sequential numbers
RANK() - assigns ranks with gaps for ties
DENSE_RANK() - assigns ranks without gaps
LAG() and LEAD() - access previous or next row values

Real-world application: Spotify uses window functions to create "Top 10" playlists by ranking songs within genres, while keeping track of each song's complete information. E-commerce sites use them to show "customers who bought this also bought..." recommendations.

The ROWS and RANGE clauses let you define custom window frames - think of them as sliding windows that move through your data, calculating running totals or moving averages.

Transactions: Ensuring Data Integrity

Imagine you're transferring money between bank accounts 💰 - you need to subtract from one account and add to another. What if the system crashes after subtracting but before adding? You'd lose money! This is where transactions save the day.

A transaction is a sequence of database operations that must all succeed or all fail together. It's an all-or-nothing deal that ensures your database stays consistent even when things go wrong.

The four key properties of transactions (remember ACID):

Atomicity: All operations succeed or all fail
Consistency: Database rules are never violated
Isolation: Concurrent transactions don't interfere
Durability: Committed changes survive system failures

Transaction control commands:

BEGIN TRANSACTION;
UPDATE account SET balance = balance - 100 WHERE account_id = 1;
UPDATE account SET balance = balance + 100 WHERE account_id = 2;
COMMIT;

If anything goes wrong, you can use ROLLBACK to undo all changes since the transaction began.

Isolation levels control how transactions interact:

READ UNCOMMITTED: Fastest but allows dirty reads
READ COMMITTED: Prevents dirty reads (most common)
REPEATABLE READ: Prevents non-repeatable reads
SERIALIZABLE: Strictest isolation, prevents all anomalies

Real-world impact: Every time you buy something online, make a bank transfer, or even "like" a post on social media, transactions ensure data consistency. Without them, Amazon couldn't guarantee your order processing, and banks couldn't maintain accurate account balances!

Deadlocks occur when transactions wait for each other indefinitely. Modern database systems detect and resolve these automatically, but understanding them helps you write better code.

Conclusion

Congratulations students! 🎉 You've just mastered the advanced SQL techniques that separate database beginners from professionals. You now understand how subqueries let you build complex logic step by step, how aggregation and grouping transform raw data into meaningful insights, how window functions provide powerful analytical capabilities while preserving data detail, and how transactions ensure your database operations are reliable and consistent. These skills are essential for AS-level Computer Science and will serve you well in any data-driven career. Keep practicing with real datasets, and you'll soon be writing SQL queries that would make even experienced developers impressed!

Study Notes

• Subquery: A query nested inside another query, used to break complex problems into simpler steps

• Scalar subquery: Returns single value, often used with comparison operators

• Correlated subquery: References outer query columns, executes once per outer row

• Aggregate functions: COUNT(), SUM(), AVG(), MIN(), MAX() - summarize multiple rows into single values

• GROUP BY: Groups rows with same values, enables aggregation by category

• HAVING: Filters groups after aggregation (vs WHERE which filters before grouping)

• SQL execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY

• Window function: Performs calculations across related rows without collapsing result set

• OVER() clause: Defines the window for window functions

• PARTITION BY: Divides data into groups for window functions (like GROUP BY but keeps all rows)

• Common window functions: ROW_NUMBER(), RANK(), DENSE_RANK(), LAG(), LEAD()

• Transaction: Sequence of operations that must all succeed or all fail together

• ACID properties: Atomicity, Consistency, Isolation, Durability

• Transaction commands: BEGIN TRANSACTION, COMMIT, ROLLBACK

• Isolation levels: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE

• Deadlock: When transactions wait for each other indefinitely, automatically resolved by DBMS