DEV_NET_CORE
GET_STARTED
SQLAdvanced querying with window functions and CTEs

OVER, PARTITION BY, ROW_NUMBER, RANK, and running aggregates

Overview

Window functions let SQL Server calculate values across a related set of rows while still returning one output row per input row. The OVER clause defines that related set. PARTITION BY splits the result into independent groups, ORDER BY defines the logical order inside each group, and ranking or aggregate functions calculate values over that window.

This is different from GROUP BY. GROUP BY collapses rows into one result row per group. Window functions keep row-level detail and add analytical values such as row numbers, ranks, totals, running totals, moving averages, percentages, and top-N-per-group markers.

This topic matters because many interview SQL problems are easier and clearer with window functions: finding the latest row per customer, ranking products by sales, calculating running balances, detecting duplicates, paginating result sets, and comparing each row with its group total.

For interviews, strong candidates can explain OVER, PARTITION BY, ROW_NUMBER, RANK, and aggregate windows; know when ORDER BY is required; understand tie behavior; and avoid common mistakes around nondeterministic ordering and default window frames.

Core Concepts

Window Functions

A window function calculates a value over a set of rows related to the current row.

Example:

Code
SELECT
    OrderId,
    CustomerId,
    OrderDate,
    TotalAmount,
    SUM(TotalAmount) OVER (PARTITION BY CustomerId) AS CustomerLifetimeTotal
FROM dbo.Orders;

This returns every order row while also showing the customer's lifetime total on each row. A GROUP BY CustomerId query would collapse the orders into one row per customer.

Common window function categories:

  • Ranking functions such as ROW_NUMBER, RANK, and DENSE_RANK.
  • Aggregate functions used as windows, such as SUM, AVG, COUNT, MIN, and MAX.
  • Offset functions such as LAG and LEAD.
  • Distribution functions such as NTILE, PERCENT_RANK, and CUME_DIST.

This subtopic focuses on OVER, PARTITION BY, ROW_NUMBER, RANK, and running aggregates.

The OVER Clause

The OVER clause defines the window that the function uses.

General shape:

Code
FunctionName(...) OVER
(
    PARTITION BY partition_column
    ORDER BY ordering_column
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)

The parts mean:

  • PARTITION BY: divide rows into independent groups.
  • ORDER BY: define logical order inside each partition.
  • ROWS or RANGE: define which rows inside the ordered partition are included in the frame.

Not every function supports every part. Ranking functions require an ORDER BY inside OVER, but they do not accept ROWS or RANGE window frames.

PARTITION BY

PARTITION BY resets the window calculation for each group.

Example:

Code
SELECT
    OrderId,
    CustomerId,
    OrderDate,
    TotalAmount,
    ROW_NUMBER() OVER
    (
        PARTITION BY CustomerId
        ORDER BY OrderDate DESC, OrderId DESC
    ) AS CustomerOrderNumber
FROM dbo.Orders;

This numbers orders separately for each customer. Without PARTITION BY, all rows would be numbered as one global result set.

Use PARTITION BY for:

  • Ranking rows inside each customer, product, tenant, department, or category.
  • Calculating totals per group while retaining row details.
  • Finding the latest row per entity.
  • Detecting duplicates within a business key.

Window ORDER BY Vs Final ORDER BY

There are two different ORDER BY concepts.

The ORDER BY inside OVER controls how the window function calculates its value:

Code
ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY OrderDate DESC)

The final query ORDER BY controls how the result rows are displayed:

Code
ORDER BY CustomerId, OrderDate DESC;

Example:

Code
SELECT
    OrderId,
    CustomerId,
    OrderDate,
    ROW_NUMBER() OVER
    (
        PARTITION BY CustomerId
        ORDER BY OrderDate DESC, OrderId DESC
    ) AS rn
FROM dbo.Orders
ORDER BY CustomerId, rn;

The window order calculates rn. The final order makes the output readable. Do not assume a window ORDER BY controls final result ordering.

ROW_NUMBER

ROW_NUMBER() assigns a unique sequential number to each row in the ordered window.

Example: latest order per customer.

Code
WITH RankedOrders AS
(
    SELECT
        OrderId,
        CustomerId,
        OrderDate,
        TotalAmount,
        ROW_NUMBER() OVER
        (
            PARTITION BY CustomerId
            ORDER BY OrderDate DESC, OrderId DESC
        ) AS rn
    FROM dbo.Orders
)
SELECT
    OrderId,
    CustomerId,
    OrderDate,
    TotalAmount
FROM RankedOrders
WHERE rn = 1;

This returns one latest order per customer. The OrderId DESC tie-breaker is important because multiple orders can share the same OrderDate.

Use ROW_NUMBER when:

  • You need exactly one row per group.
  • You need deterministic tie-breaking.
  • You are deduplicating rows.
  • You are implementing stable pagination.
  • You need a temporary sequence in the query result.

Important: ROW_NUMBER does not store permanent row numbers. It calculates them when the query runs.

Deterministic Ordering

ROW_NUMBER can be nondeterministic when the ordering columns are not unique.

Problem:

Code
ROW_NUMBER() OVER
(
    PARTITION BY CustomerId
    ORDER BY OrderDate DESC
) AS rn

If two orders for the same customer have the same OrderDate, SQL Server can choose either one first. The result may appear stable in testing and still be unsafe.

Better:

Code
ROW_NUMBER() OVER
(
    PARTITION BY CustomerId
    ORDER BY OrderDate DESC, OrderId DESC
) AS rn

Interview rule: if you use ROW_NUMBER to pick a single row, include a tie-breaker that makes the order unique and meaningful.

RANK

RANK() assigns the same rank to tied rows and leaves gaps after ties.

Example:

Code
SELECT
    ProductId,
    CategoryId,
    Revenue,
    RANK() OVER
    (
        PARTITION BY CategoryId
        ORDER BY Revenue DESC
    ) AS RevenueRank
FROM dbo.ProductRevenue;

If two products tie for rank 1, the next product gets rank 3.

Use RANK when:

  • Ties should receive the same rank.
  • Gaps after ties are acceptable or desired.
  • You want a competition-style ranking.

Related concept: DENSE_RANK also gives ties the same rank but does not leave gaps. If two rows tie for rank 1, the next rank is 2.

ROW_NUMBER Vs RANK

ROW_NUMBER and RANK answer different questions.

Example data:

Code
ProductId  Revenue
---------  -------
10         500
11         500
12         300

ROW_NUMBER result:

Code
ProductId  RowNumber
---------  ---------
10         1
11         2
12         3

RANK result:

Code
ProductId  Rank
---------  ----
10         1
11         1
12         3

Choose ROW_NUMBER when you need one row to win. Choose RANK when ties are meaningful and should remain visible.

Running Aggregates

A running aggregate calculates a cumulative value as rows progress through an ordered window.

Example: running account balance.

Code
SELECT
    AccountId,
    TransactionId,
    TransactionDate,
    Amount,
    SUM(Amount) OVER
    (
        PARTITION BY AccountId
        ORDER BY TransactionDate, TransactionId
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS RunningBalance
FROM dbo.AccountTransactions
ORDER BY AccountId, TransactionDate, TransactionId;

This computes a cumulative balance per account.

Key ideas:

  • PARTITION BY AccountId restarts the balance per account.
  • ORDER BY TransactionDate, TransactionId defines the transaction order.
  • ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW includes all prior rows in that account plus the current row.

ROWS Vs RANGE

For running totals in SQL Server, prefer an explicit ROWS frame in most practical queries.

Example:

Code
SUM(Amount) OVER
(
    PARTITION BY AccountId
    ORDER BY TransactionDate
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS RunningBalance

If you specify an ORDER BY for an aggregate window but omit ROWS or RANGE, SQL Server uses a default frame for functions that accept frames. That default can behave like RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which treats tied ordering values as peers.

This can surprise developers when multiple rows have the same TransactionDate. Explicit ROWS plus a deterministic tie-breaker usually produces clearer running-total behavior.

Moving Averages

Window frames can limit the calculation to a sliding range of rows.

Example: three-order moving average.

Code
SELECT
    CustomerId,
    OrderId,
    OrderDate,
    TotalAmount,
    AVG(TotalAmount) OVER
    (
        PARTITION BY CustomerId
        ORDER BY OrderDate, OrderId
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS MovingAverage3Orders
FROM dbo.Orders;

This averages the current order and the two previous orders for each customer.

Top N Per Group

Top-N-per-group is a classic interview use case.

Example: top three products by revenue in each category.

Code
WITH RankedProducts AS
(
    SELECT
        ProductId,
        CategoryId,
        Revenue,
        ROW_NUMBER() OVER
        (
            PARTITION BY CategoryId
            ORDER BY Revenue DESC, ProductId
        ) AS rn
    FROM dbo.ProductRevenue
)
SELECT
    ProductId,
    CategoryId,
    Revenue
FROM RankedProducts
WHERE rn <= 3
ORDER BY CategoryId, rn;

If ties should be included, use RANK instead:

Code
WITH RankedProducts AS
(
    SELECT
        ProductId,
        CategoryId,
        Revenue,
        RANK() OVER
        (
            PARTITION BY CategoryId
            ORDER BY Revenue DESC
        ) AS RevenueRank
    FROM dbo.ProductRevenue
)
SELECT
    ProductId,
    CategoryId,
    Revenue
FROM RankedProducts
WHERE RevenueRank <= 3;

The first version returns at most three rows per category. The second version may return more than three rows when there are ties.

Deduplication With ROW_NUMBER

ROW_NUMBER is often used to identify duplicate rows while keeping one preferred row.

Example:

Code
WITH Duplicates AS
(
    SELECT
        CustomerId,
        Email,
        UpdatedAt,
        ROW_NUMBER() OVER
        (
            PARTITION BY Email
            ORDER BY UpdatedAt DESC, CustomerId DESC
        ) AS rn
    FROM dbo.Customers
)
SELECT *
FROM Duplicates
WHERE rn > 1;

This identifies duplicate email rows except the most recently updated row for each email.

In production, deduplication should usually be followed by a unique constraint or unique index so duplicates do not return.

Window Functions And WHERE

Window functions are calculated after WHERE, so you cannot directly use a window function in the same query's WHERE clause.

Invalid pattern:

Code
SELECT
    OrderId,
    CustomerId,
    ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY OrderDate DESC) AS rn
FROM dbo.Orders
WHERE rn = 1;

Correct pattern:

Code
WITH RankedOrders AS
(
    SELECT
        OrderId,
        CustomerId,
        ROW_NUMBER() OVER
        (
            PARTITION BY CustomerId
            ORDER BY OrderDate DESC, OrderId DESC
        ) AS rn
    FROM dbo.Orders
)
SELECT *
FROM RankedOrders
WHERE rn = 1;

Use a CTE, derived table, or subquery to filter on a window result.

Performance Considerations

Window functions often require sorting by partition and order columns. Large sorts can use memory, spill to tempdb, and become expensive.

Helpful indexing pattern:

Code
CREATE INDEX IX_Orders_Customer_OrderDate
ON dbo.Orders (CustomerId, OrderDate DESC, OrderId DESC)
INCLUDE (TotalAmount);

This supports a query like:

Code
ROW_NUMBER() OVER
(
    PARTITION BY CustomerId
    ORDER BY OrderDate DESC, OrderId DESC
)

General performance guidance:

  • Filter early to reduce input rows.
  • Index partition columns followed by order columns when the query is important.
  • Include columns needed by the query to reduce lookups.
  • Use deterministic order keys.
  • Watch for sort spills in actual execution plans.
  • Avoid calculating multiple unrelated windows over huge row sets without measuring.

Common Mistakes

Common mistakes include:

  • Confusing window ORDER BY with final result ORDER BY.
  • Using ROW_NUMBER without a deterministic tie-breaker.
  • Using ROW_NUMBER when ties should be preserved with RANK.
  • Forgetting that RANK leaves gaps after ties.
  • Omitting an explicit ROWS frame for running totals.
  • Expecting window functions to reduce rows like GROUP BY.
  • Trying to use a window function directly in WHERE.
  • Filtering rows before the window calculation when the window should include them.
  • Ignoring indexes and sort cost on large tables.

Best Practices

Best practices:

  • Use ROW_NUMBER when one deterministic winner is needed.
  • Use RANK when ties should share a rank.
  • Include stable tie-breakers in window ORDER BY clauses.
  • Use explicit ROWS frames for running totals and moving calculations.
  • Use CTEs or derived tables to filter windowed results.
  • Keep the final ORDER BY separate and explicit.
  • Add indexes that match PARTITION BY then ORDER BY for high-value queries.
  • Test with duplicates, ties, and same-date rows.
  • Compare window functions with simpler GROUP BY queries when row detail is not needed.

Interview Practice

PreviousWHERE vs HAVING and filtering before vs after aggregationNext UpReplacing brittle subqueries with clearer query structure