DEV_NET_CORE
GET_STARTED
SQLAdvanced querying with window functions and CTEs

Replacing brittle subqueries with clearer query structure

Overview

Subqueries are queries nested inside another SQL statement. They are valid and often useful, but deeply nested, poorly aliased, or repeatedly correlated subqueries can make SQL hard to read, hard to test, and easy to break during maintenance.

Replacing brittle subqueries means rewriting a query into a clearer structure without changing the result. Common replacements include joins, EXISTS, NOT EXISTS, CTEs, derived tables, APPLY, window functions, and temporary tables. The goal is not to eliminate every subquery. The goal is to make intent, row cardinality, filtering, and performance behavior easier to reason about.

This topic matters because interviewers often present SQL that works on small data but is confusing, slow, or subtly wrong. Strong candidates can identify when a scalar subquery might return more than one row, when NOT IN breaks with NULL, when a correlated subquery should become a join or window function, and when a CTE improves readability without pretending to materialize results.

For interviews, good answers focus on correctness first, then readability, then performance. A clear query structure makes code review safer and makes execution-plan problems easier to diagnose.

Core Concepts

What Makes A Subquery Brittle

A subquery becomes brittle when small changes can easily break correctness or readability.

Common signs:

  • Multiple nested levels.
  • Outer column references that are not obvious.
  • Missing or ambiguous aliases.
  • Scalar subqueries that assume one row but can return many.
  • NOT IN logic that does not account for NULL.
  • Repeated correlated subqueries in the SELECT list.
  • Business rules hidden in several different nested predicates.
  • The same subquery copied in multiple places.
  • Poor performance caused by repeated evaluation or bad estimates.

Example of brittle structure:

Code
SELECT
    c.CustomerId,
    c.Email,
    (SELECT MAX(o.OrderDate)
     FROM dbo.Orders AS o
     WHERE o.CustomerId = c.CustomerId) AS LastOrderDate,
    (SELECT COUNT(*)
     FROM dbo.Orders AS o
     WHERE o.CustomerId = c.CustomerId
       AND o.Status = N'Completed') AS CompletedOrderCount
FROM dbo.Customers AS c
WHERE c.CustomerId IN
(
    SELECT o.CustomerId
    FROM dbo.Orders AS o
    WHERE o.OrderDate >= DATEADD(day, -90, SYSUTCDATETIME())
);

This may be valid, but the query repeats access to Orders and scatters customer-order rules across several places.

Subqueries Are Not Always Bad

Subqueries are appropriate when they express intent cleanly.

Good examples:

Code
SELECT ProductId, Name
FROM dbo.Products
WHERE CategoryId IN
(
    SELECT CategoryId
    FROM dbo.Categories
    WHERE IsActive = 1
);
Code
SELECT CustomerId, Email
FROM dbo.Customers AS c
WHERE EXISTS
(
    SELECT 1
    FROM dbo.Orders AS o
    WHERE o.CustomerId = c.CustomerId
);

The problem is not nesting itself. The problem is unclear nesting, fragile assumptions, and subqueries used where another structure expresses the business rule better.

Use Joins When You Need Columns From Both Sides

If the final result needs columns from both tables, a join is usually clearer than a subquery.

Subquery version:

Code
SELECT p.ProductId, p.Name
FROM dbo.Products AS p
WHERE p.CategoryId IN
(
    SELECT c.CategoryId
    FROM dbo.Categories AS c
    WHERE c.Name = N'Bikes'
);

Join version:

Code
SELECT
    p.ProductId,
    p.Name,
    c.Name AS CategoryName
FROM dbo.Products AS p
JOIN dbo.Categories AS c
    ON c.CategoryId = p.CategoryId
WHERE c.Name = N'Bikes';

The join version makes the relationship explicit and allows the query to return CategoryName. It also makes join cardinality easier to review.

Important caution: replacing a subquery with a join can duplicate rows if the joined table has multiple matches. Use constraints, EXISTS, grouping, or distinct selection based on the real requirement.

Use EXISTS For Existence Checks

When the question is "does a matching row exist?", EXISTS is often clearer than IN or a join.

Example:

Code
SELECT c.CustomerId, c.Email
FROM dbo.Customers AS c
WHERE EXISTS
(
    SELECT 1
    FROM dbo.Orders AS o
    WHERE o.CustomerId = c.CustomerId
      AND o.Status = N'Completed'
);

This expresses a semi-join: return customers that have at least one completed order.

Using a join can accidentally duplicate customers:

Code
SELECT c.CustomerId, c.Email
FROM dbo.Customers AS c
JOIN dbo.Orders AS o
    ON o.CustomerId = c.CustomerId
WHERE o.Status = N'Completed';

If a customer has five completed orders, this returns five customer rows unless you add DISTINCT or grouping. EXISTS avoids that duplication because it only asks whether a match exists.

Use NOT EXISTS Instead Of NOT IN With Nullable Data

NOT IN can behave unexpectedly when the subquery returns NULL.

Risky:

Code
SELECT c.CustomerId, c.Email
FROM dbo.Customers AS c
WHERE c.CustomerId NOT IN
(
    SELECT o.CustomerId
    FROM dbo.Orders AS o
);

If Orders.CustomerId contains NULL, the NOT IN logic can return no rows because comparisons with NULL produce unknown results.

Safer:

Code
SELECT c.CustomerId, c.Email
FROM dbo.Customers AS c
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.Orders AS o
    WHERE o.CustomerId = c.CustomerId
);

This expresses the anti-semi join directly and avoids the NULL trap.

Use CTEs To Name Query Steps

CTEs are useful when a query has logical phases.

Brittle nested version:

Code
SELECT *
FROM
(
    SELECT
        o.OrderId,
        o.CustomerId,
        o.OrderDate,
        ROW_NUMBER() OVER
        (
            PARTITION BY o.CustomerId
            ORDER BY o.OrderDate DESC, o.OrderId DESC
        ) AS rn
    FROM dbo.Orders AS o
    WHERE o.Status = N'Completed'
) AS x
WHERE x.rn = 1;

Clearer CTE version:

Code
WITH RankedCompletedOrders AS
(
    SELECT
        o.OrderId,
        o.CustomerId,
        o.OrderDate,
        ROW_NUMBER() OVER
        (
            PARTITION BY o.CustomerId
            ORDER BY o.OrderDate DESC, o.OrderId DESC
        ) AS rn
    FROM dbo.Orders AS o
    WHERE o.Status = N'Completed'
)
SELECT
    OrderId,
    CustomerId,
    OrderDate
FROM RankedCompletedOrders
WHERE rn = 1;

The CTE names the intermediate result. It does not automatically materialize it. It mainly improves readability and creates a clean place to filter on window function output.

Use Derived Tables For Local Inline Structure

A derived table is a subquery in the FROM clause. It is useful when the transformation is short and local.

Example:

Code
SELECT
    totals.CustomerId,
    totals.CompletedOrderCount
FROM
(
    SELECT
        CustomerId,
        COUNT(*) AS CompletedOrderCount
    FROM dbo.Orders
    WHERE Status = N'Completed'
    GROUP BY CustomerId
) AS totals
WHERE totals.CompletedOrderCount >= 3;

This can be fine. If the query grows into several phases, a CTE may be easier to read.

Use APPLY For Per-Row Top-N Logic

CROSS APPLY and OUTER APPLY can make per-row correlated logic clearer, especially for "top one child row per parent" queries.

Example:

Code
SELECT
    c.CustomerId,
    c.Email,
    lastOrder.OrderId,
    lastOrder.OrderDate
FROM dbo.Customers AS c
OUTER APPLY
(
    SELECT TOP (1)
        o.OrderId,
        o.OrderDate
    FROM dbo.Orders AS o
    WHERE o.CustomerId = c.CustomerId
    ORDER BY o.OrderDate DESC, o.OrderId DESC
) AS lastOrder;

OUTER APPLY keeps customers with no orders and returns NULL for the order columns. CROSS APPLY would return only customers with a matching row from the apply input.

This can be clearer than a scalar subquery for each selected column. It also keeps all values from the chosen child row together.

Use Window Functions To Replace Repeated Aggregate Subqueries

Window functions can replace repeated subqueries when you need row detail plus group-level values.

Correlated aggregate subquery:

Code
SELECT
    o.OrderId,
    o.CustomerId,
    o.TotalAmount,
    (SELECT SUM(o2.TotalAmount)
     FROM dbo.Orders AS o2
     WHERE o2.CustomerId = o.CustomerId) AS CustomerTotal
FROM dbo.Orders AS o;

Window function:

Code
SELECT
    o.OrderId,
    o.CustomerId,
    o.TotalAmount,
    SUM(o.TotalAmount) OVER
    (
        PARTITION BY o.CustomerId
    ) AS CustomerTotal
FROM dbo.Orders AS o;

The window version states that the customer total is calculated over each customer's partition while preserving order rows.

Use Aggregation Once, Then Join

If you need several aggregate values from the same child table, compute them once and join.

Clear structure:

Code
WITH OrderSummary AS
(
    SELECT
        CustomerId,
        COUNT(*) AS OrderCount,
        SUM(CASE WHEN Status = N'Completed' THEN 1 ELSE 0 END) AS CompletedOrderCount,
        MAX(OrderDate) AS LastOrderDate
    FROM dbo.Orders
    GROUP BY CustomerId
)
SELECT
    c.CustomerId,
    c.Email,
    COALESCE(os.OrderCount, 0) AS OrderCount,
    COALESCE(os.CompletedOrderCount, 0) AS CompletedOrderCount,
    os.LastOrderDate
FROM dbo.Customers AS c
LEFT JOIN OrderSummary AS os
    ON os.CustomerId = c.CustomerId;

This is easier to maintain than three separate scalar subqueries against Orders.

Scalar Subquery Risk

A scalar subquery must return at most one value. If it returns more than one row, SQL Server raises an error.

Risky:

Code
SELECT
    c.CustomerId,
    (SELECT o.OrderDate
     FROM dbo.Orders AS o
     WHERE o.CustomerId = c.CustomerId) AS SomeOrderDate
FROM dbo.Customers AS c;

If a customer has more than one order, this fails.

Better options:

  • Use MAX(OrderDate) if the rule is latest date.
  • Use TOP (1) ... ORDER BY if the rule is one chosen row.
  • Use APPLY if several columns from the chosen row are needed.
  • Use a join if all matching child rows should be returned.

Correlated Subqueries

A correlated subquery references columns from the outer query.

Example:

Code
SELECT c.CustomerId, c.Email
FROM dbo.Customers AS c
WHERE EXISTS
(
    SELECT 1
    FROM dbo.Orders AS o
    WHERE o.CustomerId = c.CustomerId
);

This is a good use of correlation because it expresses existence clearly.

Correlation becomes brittle when:

  • It is deeply nested.
  • The same child table is queried repeatedly.
  • Outer references are hidden by weak aliases.
  • The query returns values rather than testing existence.
  • The query is hard to reason about under duplicates.

Aliasing And Column Qualification

Clear aliases prevent dangerous ambiguity.

Risky:

Code
SELECT CustomerId
FROM dbo.Customers
WHERE CustomerId IN
(
    SELECT CustomerId
    FROM dbo.Orders
);

Better:

Code
SELECT c.CustomerId
FROM dbo.Customers AS c
WHERE EXISTS
(
    SELECT 1
    FROM dbo.Orders AS o
    WHERE o.CustomerId = c.CustomerId
);

The second version makes each column's source obvious. This matters because SQL Server can resolve unqualified columns in subqueries in ways that surprise developers when a column name exists in an outer scope.

When To Use Temporary Tables

Use a temp table when a query needs a real phase break.

Example:

Code
CREATE TABLE #EligibleCustomers
(
    CustomerId BIGINT NOT NULL PRIMARY KEY
);

INSERT INTO #EligibleCustomers (CustomerId)
SELECT c.CustomerId
FROM dbo.Customers AS c
WHERE c.IsActive = 1;

SELECT
    ec.CustomerId,
    COUNT(*) AS OrderCount
FROM #EligibleCustomers AS ec
JOIN dbo.Orders AS o
    ON o.CustomerId = ec.CustomerId
GROUP BY ec.CustomerId;

Temp tables help when:

  • The intermediate result is reused by multiple statements.
  • You need indexes or statistics on staged rows.
  • You need to debug or inspect a phase.
  • Breaking the query improves estimates or plan quality.
  • The CTE or derived table is too large and repeated.

Refactoring Process

A safe refactoring process:

  • Write down the business question.
  • Identify the expected output grain, such as one row per customer.
  • Identify whether each child relationship is one-to-one or one-to-many.
  • Replace existence checks with EXISTS or NOT EXISTS.
  • Replace repeated scalar aggregates with a grouped CTE or window function.
  • Replace "choose one child row" logic with ROW_NUMBER or APPLY.
  • Add deterministic ORDER BY for top-one logic.
  • Test with duplicates, missing children, and NULL values.
  • Compare row counts before and after.
  • Review actual execution plans for important queries.

Common Mistakes

Common mistakes include:

  • Replacing EXISTS with a join and accidentally duplicating rows.
  • Using DISTINCT to hide a bad join.
  • Using NOT IN when the subquery can return NULL.
  • Writing scalar subqueries that can return multiple rows.
  • Assuming a CTE is materialized like a temp table.
  • Forgetting deterministic ordering in TOP (1) or ROW_NUMBER.
  • Repeating the same correlated subquery several times.
  • Not checking output grain after refactoring.
  • Optimizing for style while changing business semantics.

Best Practices

Best practices:

  • Keep subqueries when they express intent clearly.
  • Use EXISTS and NOT EXISTS for existence checks.
  • Use joins when the result needs columns from both sides.
  • Use CTEs to name logical query phases.
  • Use window functions for row detail plus group-level calculations.
  • Use APPLY for clear per-row top-N child selection.
  • Use temp tables when you need reuse, indexes, statistics, or debugging visibility.
  • Always test refactors with duplicate rows and NULL values.
  • Compare row counts and key sets before and after.
  • Prefer clarity first, then tune based on actual execution plans.

Interview Practice

PreviousOVER, PARTITION BY, ROW_NUMBER, RANK, and running aggregatesNext UpStandard and recursive CTEs