WHERE vs HAVING and filtering before vs after aggregation Interview Questions

Overview

WHERE and HAVING both filter SQL query results, but they filter at different stages. WHERE filters rows before grouping and aggregation. HAVING filters groups after grouping and aggregate calculations.

This distinction is central to writing correct SQL. If you filter too late, the query may do unnecessary work or produce the wrong aggregate. If you filter too early, you may exclude rows that should contribute to a group. If you put an aggregate condition in WHERE, the query is invalid because aggregates are not available yet.

This topic matters in interviews because it tests whether a candidate understands logical query processing, not just syntax. Strong answers explain row-level filters, group-level filters, aggregate expressions, performance implications, and common mistakes with GROUP BY, COUNT, joins, and nulls.

The practical rule is simple: use WHERE for conditions about individual rows, and use HAVING for conditions about grouped results.

Core Concepts

WHERE Filters Rows

WHERE specifies which source rows qualify before aggregation.

Code

SELECT
    CustomerId,
    COUNT(*) AS CompletedOrderCount
FROM dbo.Orders
WHERE Status = N'Completed'
GROUP BY CustomerId;

This query first keeps only completed orders. Then it groups those completed rows by customer.

Use WHERE for:

Status filters.
Date ranges.
Tenant filters.
Active/inactive flags.
Exact matches.
Search predicates.
Join-related row restrictions.

Example:

Code

WHERE OrderDate >= '20260101'
  AND OrderDate <  '20260201'
  AND Status = N'Completed'

These are row-level conditions.

HAVING Filters Groups

HAVING specifies which groups qualify after aggregation.

Code

SELECT
    CustomerId,
    COUNT(*) AS CompletedOrderCount
FROM dbo.Orders
WHERE Status = N'Completed'
GROUP BY CustomerId
HAVING COUNT(*) >= 5;

This query returns customers with at least five completed orders.

Use HAVING for:

COUNT(*) >= 5
SUM(TotalAmount) > 1000
AVG(DurationMs) < 200
MAX(OrderDate) >= @Cutoff
Conditions involving aggregate expressions.

The group must be formed before these values exist.

Logical Query Processing Order

SQL is written in one order but conceptually processed in another order.

Common mental model:

FROM and joins identify source rows.
WHERE filters rows.
GROUP BY forms groups.
Aggregates are calculated for each group.
HAVING filters groups.
SELECT returns expressions.
ORDER BY sorts the final result.

This explains why aggregate expressions belong in HAVING, not WHERE.

Bad:

Code

SELECT CustomerId, COUNT(*) AS OrderCount
FROM dbo.Orders
WHERE COUNT(*) >= 5
GROUP BY CustomerId;

Correct:

Code

SELECT CustomerId, COUNT(*) AS OrderCount
FROM dbo.Orders
GROUP BY CustomerId
HAVING COUNT(*) >= 5;

Filtering Before Aggregation

Filtering before aggregation changes which rows contribute to each group.

Example: completed revenue per customer.

Code

SELECT
    CustomerId,
    SUM(TotalAmount) AS CompletedRevenue
FROM dbo.Orders
WHERE Status = N'Completed'
GROUP BY CustomerId;

Only completed orders contribute to the revenue.

If you accidentally move the status filter into HAVING incorrectly, you either get an invalid query or a different business question.

Correct row filtering usually improves both correctness and performance because fewer rows need to be grouped.

Filtering After Aggregation

Filtering after aggregation keeps or removes whole groups.

Example: customers whose completed revenue exceeds 1000.

Code

SELECT
    CustomerId,
    SUM(TotalAmount) AS CompletedRevenue
FROM dbo.Orders
WHERE Status = N'Completed'
GROUP BY CustomerId
HAVING SUM(TotalAmount) > 1000;

The query still uses WHERE to define which rows count as completed revenue. Then it uses HAVING to keep only groups whose aggregate result exceeds the threshold.

This pattern is common:

Code

WHERE row_condition
GROUP BY grouping_columns
HAVING aggregate_condition

Non-Aggregate Conditions in HAVING

Sometimes SQL Server allows a condition in HAVING if the column is part of the grouping key.

Code

SELECT
    Status,
    COUNT(*) AS OrderCount
FROM dbo.Orders
GROUP BY Status
HAVING Status = N'Completed';

This works, but it is usually worse than:

Code

SELECT
    Status,
    COUNT(*) AS OrderCount
FROM dbo.Orders
WHERE Status = N'Completed'
GROUP BY Status;

The second query filters rows before grouping. It is clearer and usually gives the optimizer a better chance to reduce work early.

Use HAVING for group-level conditions, not as a substitute for WHERE.

Aggregate Conditions in WHERE Are Invalid

Aggregate functions are not available in WHERE because WHERE happens before grouping.

Invalid:

Code

SELECT
    CustomerId,
    SUM(TotalAmount) AS Revenue
FROM dbo.Orders
WHERE SUM(TotalAmount) > 1000
GROUP BY CustomerId;

Valid:

Code

SELECT
    CustomerId,
    SUM(TotalAmount) AS Revenue
FROM dbo.Orders
GROUP BY CustomerId
HAVING SUM(TotalAmount) > 1000;

If you need to filter by an aggregate in an outer query, use a derived table or common table expression:

Code

WITH CustomerRevenue AS
(
    SELECT
        CustomerId,
        SUM(TotalAmount) AS Revenue
    FROM dbo.Orders
    GROUP BY CustomerId
)
SELECT *
FROM CustomerRevenue
WHERE Revenue > 1000;

Here Revenue is a real column of the CTE result, so the outer WHERE can filter it.

WHERE, HAVING, and Aliases

In SQL Server, a SELECT alias is not available to WHERE or HAVING at the same query level.

Invalid:

Code

SELECT
    CustomerId,
    SUM(TotalAmount) AS Revenue
FROM dbo.Orders
GROUP BY CustomerId
HAVING Revenue > 1000;

Use the aggregate expression:

Code

HAVING SUM(TotalAmount) > 1000

Or use a CTE:

Code

WITH CustomerRevenue AS
(
    SELECT
        CustomerId,
        SUM(TotalAmount) AS Revenue
    FROM dbo.Orders
    GROUP BY CustomerId
)
SELECT *
FROM CustomerRevenue
WHERE Revenue > 1000;

CTEs and derived tables are useful when you want to name intermediate aggregate results and filter them more readably.

HAVING Without GROUP BY

HAVING is usually used with GROUP BY, but SQL can apply it to an implicit single group when no GROUP BY exists.

Code

SELECT COUNT(*) AS OrderCount
FROM dbo.Orders
HAVING COUNT(*) > 0;

This returns one row only if the whole table has at least one row.

This is uncommon in application queries but useful to understand. HAVING filters aggregate groups; without GROUP BY, the entire result is one group.

Conditional Aggregation

Sometimes the right answer is not moving conditions between WHERE and HAVING, but using conditional aggregation.

Example: find customers with at least five completed orders and at least one cancelled order.

Code

SELECT
    CustomerId,
    SUM(CASE WHEN Status = N'Completed' THEN 1 ELSE 0 END) AS CompletedOrders,
    SUM(CASE WHEN Status = N'Cancelled' THEN 1 ELSE 0 END) AS CancelledOrders
FROM dbo.Orders
GROUP BY CustomerId
HAVING SUM(CASE WHEN Status = N'Completed' THEN 1 ELSE 0 END) >= 5
   AND SUM(CASE WHEN Status = N'Cancelled' THEN 1 ELSE 0 END) >= 1;

If you put WHERE Status = N'Completed', cancelled orders would be removed before grouping and the cancelled-order count would always be zero.

This is a subtle but common reporting issue.

WHERE vs HAVING With Joins

With joins, decide whether a filter should remove source rows before grouping or remove groups after aggregation.

Example: customers with at least three completed orders:

Code

SELECT
    c.CustomerId,
    COUNT(*) AS CompletedOrderCount
FROM dbo.Customers AS c
JOIN dbo.Orders AS o
    ON o.CustomerId = c.CustomerId
WHERE o.Status = N'Completed'
GROUP BY c.CustomerId
HAVING COUNT(*) >= 3;

The WHERE clause says only completed orders count. The HAVING clause says only customers with three or more such orders should be returned.

For LEFT JOIN, be careful. A right-side filter in WHERE can remove unmatched left rows:

Code

SELECT
    c.CustomerId,
    COUNT(o.OrderId) AS CompletedOrderCount
FROM dbo.Customers AS c
LEFT JOIN dbo.Orders AS o
    ON o.CustomerId = c.CustomerId
   AND o.Status = N'Completed'
GROUP BY c.CustomerId
HAVING COUNT(o.OrderId) = 0;

This finds customers with zero completed orders while preserving customers with no orders.

WHERE, HAVING, and NULL

WHERE keeps rows where the predicate is TRUE. Rows where the predicate is FALSE or UNKNOWN are removed. This matters for nullable columns.

Code

WHERE Status <> N'Cancelled'

This does not keep rows where Status IS NULL because the comparison is UNKNOWN.

If missing status should count as not cancelled:

Code

WHERE Status <> N'Cancelled'
   OR Status IS NULL

Aggregates also interact with nulls:

Code

HAVING COUNT(ShippedAtUtc) = 0

This keeps groups where no row has a non-null ShippedAtUtc. It is different from COUNT(*) = 0, which cannot happen for a group produced from existing rows unless using an outer join pattern.

Performance Implications

WHERE usually reduces the input rows before grouping. This can reduce CPU, memory, spills, and sorting or hashing work.

Better:

Code

SELECT
    CustomerId,
    SUM(TotalAmount) AS Revenue
FROM dbo.Orders
WHERE Status = N'Completed'
GROUP BY CustomerId;

Worse style:

Code

SELECT
    CustomerId,
    SUM(TotalAmount) AS Revenue
FROM dbo.Orders
GROUP BY CustomerId, Status
HAVING Status = N'Completed';

The second query groups by an unnecessary column and filters later. The optimizer may simplify some cases, but code should communicate intent clearly and avoid unnecessary work.

Indexes that support WHERE filters and grouping keys often help:

Code

CREATE INDEX IX_Orders_Status_CustomerId
ON dbo.Orders(Status, CustomerId)
INCLUDE (TotalAmount);

Readability Pattern

A clean aggregate query often reads like this:

Code

SELECT
    GroupingColumn,
    AggregateExpression AS MetricName
FROM SourceTables
WHERE RowLevelCondition
GROUP BY GroupingColumn
HAVING AggregateCondition
ORDER BY MetricName DESC;

Example:

Code

SELECT
    CustomerId,
    COUNT(*) AS CompletedOrderCount,
    SUM(TotalAmount) AS CompletedRevenue
FROM dbo.Orders
WHERE Status = N'Completed'
  AND OrderDate >= '20260101'
  AND OrderDate <  '20270101'
GROUP BY CustomerId
HAVING COUNT(*) >= 5
ORDER BY CompletedRevenue DESC;

Each clause has one job.

Common Mistakes

Common mistakes include:

Putting aggregate functions in WHERE.
Using HAVING for simple row filters.
Filtering completed rows after grouping when only completed rows should contribute.
Filtering too early when multiple statuses are needed for conditional aggregates.
Selecting columns that are not grouped or aggregated.
Assuming a SELECT alias is available in HAVING.
Forgetting that WHERE runs before GROUP BY.
Filtering a LEFT JOIN in WHERE and losing unmatched rows.
Using COUNT(*) when COUNT(column) was intended, or the reverse.
Ignoring null behavior in filters and aggregates.

Best Practices

Best practices include:

Use WHERE for row-level predicates.
Use HAVING for aggregate predicates.
Filter early when it does not change the business meaning.
Keep conditional aggregation when multiple row categories must contribute.
Use CTEs or derived tables for readability when filtering aggregate aliases.
Check join grain before aggregating.
Be explicit with null-inclusive filters.
Add indexes for common WHERE filters and grouping keys.
Test with empty groups, null values, and duplicate joins.
Use clear aggregate aliases in the final SELECT.

WHERE vs HAVING and filtering before vs after aggregation

Overview

Core Concepts

WHERE Filters Rows

HAVING Filters Groups

Logical Query Processing Order

Filtering Before Aggregation

Filtering After Aggregation

Non-Aggregate Conditions in HAVING

Aggregate Conditions in WHERE Are Invalid

WHERE, HAVING, and Aliases

HAVING Without GROUP BY

Conditional Aggregation

WHERE vs HAVING With Joins

WHERE, HAVING, and NULL

Performance Implications

Readability Pattern

Common Mistakes

Best Practices

Interview Practice

Beginner Interview Practice

Intermediate Interview Practice

Advanced Interview Practice