Selectivity, covering indexes, and index maintenance trade-offs Interview Questions

Overview

Selectivity describes how well a predicate narrows a query to a smaller set of rows. A highly selective predicate returns a small percentage of the table, such as one customer by CustomerId. A low-selectivity predicate returns many rows, such as IsActive = 1 when almost every row is active.

Covering indexes are indexes that contain every column needed by a query, so SQL Server can satisfy the query from the index without returning to the base table. Covering can be powerful, but every extra key or included column increases storage, memory, I/O, and write-maintenance cost.

This topic matters because indexing is a balancing act. Good indexes make important reads fast. Too many or too-wide indexes slow inserts, updates, deletes, and merges. Interviewers expect candidates to explain not only how to create an index, but how to decide whether an index is worth its cost.

For interviews, strong candidates can discuss selectivity, SARGability, key column order, included columns, filtered indexes, key lookups, write overhead, index fragmentation, statistics, and workload-based tuning.

Core Concepts

Selectivity

Selectivity measures how much a predicate filters data.

Highly selective:

Code

WHERE OrderId = 12345

This likely returns one row.

Low selectivity:

Code

WHERE IsActive = 1

If 98 percent of rows are active, this predicate does not narrow the table much.

High selectivity usually makes an index more valuable because SQL Server can seek to a small set of rows. Low selectivity may still use an index, but a scan may be cheaper when many rows are needed.

Cardinality Vs Selectivity

Cardinality is a count. Selectivity is a fraction or degree of filtering.

Example table:

Code

Rows: 10,000,000
Rows where Status = 'Cancelled': 50,000

The predicate cardinality is 50,000 rows. The selectivity is 0.5 percent.

SQL Server uses statistics to estimate cardinality, and those estimates influence whether it chooses a seek, scan, join type, memory grant, and parallelism strategy.

SARGable Predicates

A SARGable predicate can use an index search argument efficiently.

Good:

Code

WHERE OrderDate >= @StartDate
  AND OrderDate < @EndDate

Poor:

Code

WHERE CAST(OrderDate AS DATE) = @ReportDate

The second query applies a function to the column, making it harder to seek on a normal index on OrderDate.

Index design and query design work together. A good index cannot fully compensate for predicates that hide searchable values behind functions, implicit conversions, or non-sargable expressions.

Key Column Order

Column order in a composite index matters.

Example:

Code

CREATE INDEX IX_Orders_Customer_OrderDate
ON dbo.Orders (CustomerId, OrderDate DESC);

This index helps:

Code

WHERE CustomerId = @CustomerId
ORDER BY OrderDate DESC;

It is less helpful for:

Code

WHERE OrderDate >= @StartDate
  AND OrderDate < @EndDate;

because OrderDate is not the leading key.

General guideline:

Put equality predicates early.
Then range predicates.
Then ordering/grouping columns when useful.
Keep keys narrow.
Use included columns for output columns that are not needed for seeking or sorting.

Covering Index

A covering index contains all columns needed by a query.

Example query:

Code

SELECT
    OrderDate,
    Status,
    TotalAmount
FROM dbo.Orders
WHERE CustomerId = @CustomerId
ORDER BY OrderDate DESC;

Covering index:

Code

CREATE INDEX IX_Orders_Customer_OrderDate
ON dbo.Orders (CustomerId, OrderDate DESC)
INCLUDE (Status, TotalAmount);

The key columns support filtering and ordering. The included columns satisfy the SELECT list without increasing the key size.

Key Columns Vs Included Columns

Key columns are part of the sorted index key. They can support seeks, range scans, joins, grouping, and ordering.

Included columns are stored at the leaf level of a nonclustered index. They are not part of the sorted key, but they can cover query output and avoid lookups.

Example:

Code

CREATE INDEX IX_Users_Tenant_Email
ON dbo.Users (TenantId, Email)
INCLUDE (DisplayName, Status, LastLoginAt);

TenantId and Email are key columns because they support search. DisplayName, Status, and LastLoginAt are included because the query needs to return them.

Interview rule: key columns are for access and order. Included columns are for coverage.

Key Lookups And Covering Trade-Offs

A key lookup is not automatically a problem.

Example:

Code

SELECT Email, DisplayName, LastLoginAt
FROM dbo.Users
WHERE TenantId = @TenantId
  AND Email = @Email;

If this returns one row, a key lookup to fetch DisplayName and LastLoginAt may be fine.

For this query:

Code

SELECT Email, DisplayName, LastLoginAt
FROM dbo.Users
WHERE TenantId = @TenantId
  AND Status = N'Active';

If it returns 200,000 rows, one lookup per row can be painful. A covering index may help, but it increases write cost and index size.

Filtered Indexes

A filtered index stores only a subset of rows.

Example:

Code

CREATE INDEX IX_Orders_Open_ByCustomer
ON dbo.Orders (CustomerId, OrderDate)
INCLUDE (TotalAmount)
WHERE Status = N'Open';

This can be useful when queries frequently target a small, well-defined subset of a large table.

Filtered indexes can:

Reduce storage.
Reduce maintenance cost compared with a full-table index.
Improve selectivity.
Provide filtered statistics for the subset.

They require query predicates to match the filter closely enough for the optimizer to use them.

Index Maintenance Cost

Indexes are maintained automatically as table data changes.

Costs include:

Additional storage.
Additional logging.
More work during INSERT, UPDATE, DELETE, and MERGE.
More memory and buffer pool usage.
Page splits and fragmentation.
Longer maintenance windows for rebuilds or reorganizations.
Potential concurrency overhead.

Example:

Code

UPDATE dbo.Users
SET Email = @NewEmail
WHERE UserId = @UserId;

If Email appears in three nonclustered indexes, all three may need maintenance.

Read-Heavy Vs Write-Heavy Workloads

Index choices should reflect the workload.

Read-heavy reporting table:

More indexes may be acceptable.
Covering indexes can be valuable.
Columnstore indexes may be considered for analytics.

Write-heavy OLTP table:

Fewer, narrower indexes are usually better.
Indexes should support the most critical queries.
Over-indexing can reduce throughput.

There is no universal best index. There is only a best index for a workload and set of trade-offs.

Duplicate And Overlapping Indexes

Duplicate indexes waste resources.

Example:

Code

CREATE INDEX IX_Orders_CustomerId
ON dbo.Orders (CustomerId);

CREATE INDEX IX_Orders_CustomerId_OrderDate
ON dbo.Orders (CustomerId, OrderDate);

The second index may make the first one redundant, depending on workload and included columns.

Before creating a new index:

Check existing indexes.
Check whether an existing index can be modified.
Check usage DMVs.
Check actual plans.
Avoid blindly accepting missing-index suggestions.

Index Maintenance Operations

Maintenance can include:

Rebuilding indexes.
Reorganizing indexes.
Updating statistics.
Dropping unused indexes.
Adjusting fill factor for some write patterns.
Monitoring fragmentation and page density.

Index rebuilds can update index statistics as a byproduct. Reorganizing an index does not update statistics in the same way. Do not assume every maintenance command solves every optimizer problem.

Index maintenance should be targeted. Rebuilding everything every night can waste resources and may not improve performance.

Measuring Index Value

Useful evidence:

Actual execution plans.
Query duration, CPU, reads, and writes.
Query Store data.
sys.dm_db_index_usage_stats.
sys.dm_db_index_operational_stats.
Waits and blocking patterns.
Workload replay or representative load tests.

Do not create an index just because a single plan suggests it. Consider how often the query runs, how expensive it is, and what write cost the index adds.

Common Mistakes

Common mistakes include:

Indexing low-selectivity columns alone and expecting major improvement.
Creating one index per query without considering overlap.
Adding every selected column as an included column.
Treating missing-index suggestions as complete designs.
Ignoring write overhead.
Ignoring key column order.
Adding indexes to fix non-sargable predicates instead of rewriting the query.
Forgetting filtered indexes for narrow subsets.
Not checking whether an index is actually used.
Rebuilding indexes frequently without measuring benefit.

Best Practices

Best practices:

Start with important workload queries, not isolated guesses.
Make predicates SARGable before adding indexes.
Use high-selectivity columns and common join/filter columns in keys.
Put equality columns before range columns when appropriate.
Use included columns to cover targeted queries without widening keys.
Prefer narrow indexes on write-heavy tables.
Consider filtered indexes for common subsets.
Review overlapping and unused indexes.
Validate with actual execution plans and runtime metrics.
Treat index design as ongoing workload maintenance.

Selectivity, covering indexes, and index maintenance trade-offs

Overview

Core Concepts

Selectivity

Cardinality Vs Selectivity

SARGable Predicates

Key Column Order

Covering Index

Key Columns Vs Included Columns

Key Lookups And Covering Trade-Offs

Filtered Indexes

Index Maintenance Cost

Read-Heavy Vs Write-Heavy Workloads

Duplicate And Overlapping Indexes

Index Maintenance Operations

Measuring Index Value

Common Mistakes

Best Practices

Interview Practice

Beginner Interview Practice

Intermediate Interview Practice

Advanced Interview Practice