DEV_NET_CORE
GET_STARTED
SQLSQL practical interview comparisons and SQL Server-specific features

UNION vs UNION ALL, EXCEPT, INTERSECT, and duplicate handling

Overview

UNION, UNION ALL, EXCEPT, and INTERSECT are SQL set operators. They combine or compare the results of two queries that return compatible columns. They are used when the question is about rows from multiple result sets, not about columns from related tables.

The most important interview distinction is duplicate handling. UNION removes duplicate rows. UNION ALL keeps duplicate rows. EXCEPT returns distinct rows from the left input that are not found in the right input. INTERSECT returns distinct rows that appear in both inputs.

This topic matters because set operators are common in reporting, data quality checks, migrations, reconciliation jobs, permissions checks, and queries that combine rows from multiple sources. The wrong operator can silently remove rows, preserve rows that should be deduplicated, or produce a result that looks correct on small test data but fails in production.

For interviews, strong candidates can explain the semantic difference, choose the right operator for the business requirement, describe how duplicates and NULL values are treated, and compare set operators with joins and EXISTS patterns.

Core Concepts

Set Operators At A Glance

SQL Server supports these common set operations:

  • UNION: combine result sets and remove duplicates.
  • UNION ALL: combine result sets and keep duplicates.
  • EXCEPT: return distinct rows from the left query that do not appear in the right query.
  • INTERSECT: return distinct rows that appear in both queries.

Example:

Code
SELECT Email
FROM dbo.Customers

UNION

SELECT Email
FROM dbo.NewsletterSubscribers;

This returns one distinct list of emails from both sources.

Result Shape Requirements

Set operators compare rows by position, not by column name.

The queries must return:

  • The same number of columns.
  • Columns in the same logical order.
  • Compatible data types in corresponding positions.

Example:

Code
SELECT CustomerId, Email
FROM dbo.Customers

UNION ALL

SELECT SubscriberId, EmailAddress
FROM dbo.NewsletterSubscribers;

This is valid if CustomerId is compatible with SubscriberId, and Email is compatible with EmailAddress. The output column names come from the left query.

Common mistake:

Code
SELECT CustomerId, Email
FROM dbo.Customers

UNION ALL

SELECT EmailAddress, SubscriberId
FROM dbo.NewsletterSubscribers;

This query has two columns on both sides, but the column order is wrong. SQL may convert data types or fail, and the logical result is incorrect.

UNION

UNION combines rows from multiple queries and removes duplicate rows.

Example:

Code
SELECT ProductId
FROM dbo.OnlineSales

UNION

SELECT ProductId
FROM dbo.StoreSales;

If product 42 appears in both tables, it appears once in the result.

UNION is useful when the business question asks for a distinct set:

  • All customers who bought online or in store.
  • All permissions assigned directly or through a role.
  • All product IDs that appear in any import source.
  • All active email addresses from multiple systems.

Trade-off: duplicate removal has a cost. SQL Server must compare rows and remove duplicates, which may require sorting, hashing, memory grants, and extra CPU.

UNION ALL

UNION ALL combines rows and keeps all rows, including duplicates.

Example:

Code
SELECT ProductId, Quantity
FROM dbo.OnlineSales

UNION ALL

SELECT ProductId, Quantity
FROM dbo.StoreSales;

If the same product appears in both sources, both rows remain. This is usually what you want for fact data, audit records, logs, transactions, and reporting inputs where each row represents an event.

UNION ALL is often faster than UNION because it does not need to remove duplicates.

Use UNION ALL when:

  • Duplicates are meaningful.
  • You know the inputs are already disjoint.
  • You plan to aggregate after combining.
  • Performance matters and duplicate removal is unnecessary.

Example:

Code
SELECT ProductId, SUM(Quantity) AS TotalQuantity
FROM
(
    SELECT ProductId, Quantity
    FROM dbo.OnlineSales

    UNION ALL

    SELECT ProductId, Quantity
    FROM dbo.StoreSales
) AS sales
GROUP BY ProductId;

Using UNION here would incorrectly remove identical sales rows before summing.

Duplicate Handling

Duplicate handling is based on the full projected row, not one column unless only one column is selected.

Example:

Code
SELECT CustomerId, SourceSystem
FROM dbo.CustomerImportA

UNION

SELECT CustomerId, SourceSystem
FROM dbo.CustomerImportB;

These two rows are not duplicates:

Code
CustomerId  SourceSystem
----------  ------------
100         CRM
100         Billing

They have the same CustomerId, but the full projected row differs.

If you want distinct customer IDs only, project only the customer ID:

Code
SELECT CustomerId
FROM dbo.CustomerImportA

UNION

SELECT CustomerId
FROM dbo.CustomerImportB;

This distinction is a common source of interview traps. SQL removes duplicate rows based on the columns you selected, not based on what you mentally consider the business key.

EXCEPT

EXCEPT returns distinct rows from the left query that are not returned by the right query.

Example:

Code
SELECT CustomerId
FROM dbo.Customers

EXCEPT

SELECT CustomerId
FROM dbo.Orders;

This returns customers with no matching customer ID in Orders, as a distinct set.

EXCEPT is useful for:

  • Finding missing rows between systems.
  • Validating migration results.
  • Identifying orphaned records.
  • Comparing expected versus actual outputs.
  • Checking which permissions or assignments are absent.

Direction matters. These are not equivalent:

Code
SELECT CustomerId FROM dbo.Customers
EXCEPT
SELECT CustomerId FROM dbo.Orders;

SELECT CustomerId FROM dbo.Orders
EXCEPT
SELECT CustomerId FROM dbo.Customers;

The first asks for customers with no orders. The second asks for order customer IDs that do not exist in customers.

INTERSECT

INTERSECT returns distinct rows that appear in both queries.

Example:

Code
SELECT CustomerId
FROM dbo.Customers
WHERE IsActive = 1

INTERSECT

SELECT CustomerId
FROM dbo.Orders
WHERE OrderDate >= DATEADD(year, -1, SYSUTCDATETIME());

This returns active customers who also have recent orders.

INTERSECT is useful for:

  • Finding overlap between two sets.
  • Checking which migrated rows exist in both systems.
  • Finding users who meet multiple independent criteria.
  • Comparing test output against expected output.

Like EXCEPT, INTERSECT returns distinct rows. If the same value appears many times on both sides, it appears once in the result.

NULL Handling

For EXCEPT and INTERSECT, SQL Server treats two NULL values as equal when determining distinct rows. This can surprise developers because normal comparisons with NULL using = do not behave that way.

Example:

Code
SELECT CAST(NULL AS INT) AS Value

INTERSECT

SELECT CAST(NULL AS INT) AS Value;

This returns one row with NULL.

For duplicate removal with set operators, think in terms of set comparison over projected rows, not normal WHERE Column = NULL predicates.

Operator Precedence

When combining set operators, parentheses make intent clear.

SQL Server evaluates set operator expressions using this precedence:

  • Parentheses first.
  • INTERSECT before EXCEPT and UNION.
  • EXCEPT and UNION from left to right.

Example:

Code
SELECT CustomerId FROM dbo.A
UNION
SELECT CustomerId FROM dbo.B
INTERSECT
SELECT CustomerId FROM dbo.C;

This may not mean what a reader casually expects. Prefer parentheses:

Code
(
    SELECT CustomerId FROM dbo.A
    UNION
    SELECT CustomerId FROM dbo.B
)
INTERSECT
SELECT CustomerId FROM dbo.C;

Parentheses are not just style; they prevent business logic bugs.

ORDER BY Rules

Set operator results are unordered unless the final query uses ORDER BY.

Correct:

Code
SELECT CustomerId, Email
FROM dbo.Customers

UNION

SELECT SubscriberId, EmailAddress
FROM dbo.NewsletterSubscribers

ORDER BY Email;

Do not rely on the physical order of rows from individual inputs. Also remember that output column names come from the left query, so ORDER BY should reference the final output names or positions.

Set Operators Vs Joins

Set operators combine or compare rows from separate result sets. Joins combine columns from related rows.

Use a join when:

  • You need columns from both tables in the same output row.
  • You are matching related entities.
  • You need one-to-many detail rows.

Use a set operator when:

  • The two queries already return the same shape.
  • You want a union, difference, or intersection of rows.
  • You are comparing two sets of keys or records.

Join example:

Code
SELECT c.CustomerId, c.Email, o.OrderId
FROM dbo.Customers AS c
JOIN dbo.Orders AS o
    ON o.CustomerId = c.CustomerId;

Set operator example:

Code
SELECT CustomerId
FROM dbo.Customers

INTERSECT

SELECT CustomerId
FROM dbo.Orders;

EXCEPT And INTERSECT Vs EXISTS

EXCEPT and INTERSECT return distinct rows. EXISTS and NOT EXISTS can be better when you want semi-join logic tied to keys and do not want implicit distinct behavior across the full projection.

Example with NOT EXISTS:

Code
SELECT c.CustomerId, c.Email
FROM dbo.Customers AS c
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.Orders AS o
    WHERE o.CustomerId = c.CustomerId
);

This often expresses "customers with no orders" more explicitly than EXCEPT, especially when returning columns from one table and matching on a specific key.

Example with EXCEPT:

Code
SELECT CustomerId
FROM dbo.Customers

EXCEPT

SELECT CustomerId
FROM dbo.Orders;

This is concise for comparing sets of IDs.

Performance Considerations

Performance depends on data size, indexes, row width, duplicates, and plan choices.

General rules:

  • UNION ALL is usually cheaper than UNION.
  • UNION, EXCEPT, and INTERSECT perform distinct-style comparison.
  • Wider projected rows cost more to compare.
  • Indexes on compared columns can help.
  • Sorting or hashing may require memory.
  • Duplicate removal can hide data quality problems if used accidentally.

Example of avoiding unnecessary duplicate removal:

Code
SELECT OrderId, OrderDate, Total
FROM dbo.CurrentOrders

UNION ALL

SELECT OrderId, OrderDate, Total
FROM dbo.ArchivedOrders;

If CurrentOrders and ArchivedOrders are guaranteed disjoint, UNION ALL avoids unnecessary duplicate elimination.

Common Mistakes

Common mistakes include:

  • Using UNION by habit when UNION ALL is correct.
  • Accidentally removing duplicate transaction rows.
  • Forgetting that duplicates are based on the full selected row.
  • Reversing the sides of EXCEPT.
  • Expecting INTERSECT to preserve duplicate counts.
  • Mixing columns in the wrong order.
  • Relying on order without a final ORDER BY.
  • Ignoring NULL behavior in set comparisons.
  • Combining several set operators without parentheses.

Best Practices

Best practices:

  • Use UNION ALL unless you explicitly need duplicate removal.
  • Use UNION when you need a distinct combined set.
  • Use EXCEPT for concise set difference checks.
  • Use INTERSECT for concise overlap checks.
  • Project only the columns that define equality for the business question.
  • Add parentheses when mixing set operators.
  • Use a final ORDER BY for deterministic presentation.
  • Compare EXCEPT or INTERSECT with EXISTS or joins when performance or key-based semantics matter.
  • Test with duplicates and NULL values, not only clean sample data.

Interview Practice

PreviousStored procedure transaction patterns with `TRY...CATCH`, output parameters, and error handlingNext UpWHERE vs HAVING and filtering before vs after aggregation