DEV_NET_CORE
GET_STARTED
SQLQuery tuning, Query Store, and plan stability

Baselines, workload comparisons, and targeted tuning

Overview

Baselines, workload comparisons, and targeted tuning are the discipline of improving SQL Server performance with evidence instead of guesswork. A baseline captures normal performance for a representative workload. A workload comparison shows what changed between two periods, deployments, compatibility levels, or tuning attempts. Targeted tuning focuses effort on the few queries, indexes, schema choices, or configuration settings that actually move user-facing performance, throughput, stability, or cost.

This topic matters because database performance problems are rarely solved by optimizing random expensive-looking queries. A query that is slow once may not matter to the business. A query that is only moderately slow but runs 100,000 times per hour may dominate CPU, I/O, and user latency. Baselines and workload comparisons help distinguish real regressions from normal variance, workload growth, blocking, bad plans, stale statistics, missing indexes, and application behavior changes.

For interviews, this topic is important because it tests how you troubleshoot production systems. Strong candidates can describe what to measure, how to compare before and after results, how Query Store helps, how to avoid misleading averages, and how to choose focused tuning actions with a measurable success condition.

Core Concepts

What A Baseline Is

A baseline is a recorded picture of normal workload behavior. It gives you a reference point for deciding whether performance has changed.

A useful SQL Server baseline usually includes:

  • Query duration.
  • CPU time.
  • Logical reads.
  • Physical reads.
  • Writes.
  • Execution count.
  • Row count.
  • Wait categories.
  • Plan IDs and plan shapes.
  • Blocking and deadlock frequency.
  • Database and server resource usage.
  • Business timing, such as peak hours, batch windows, and reporting periods.

The best baseline is not one number. It is a set of metrics over representative time windows.

Why Baselines Matter

Without a baseline, every performance discussion becomes a loose story:

  • "It feels slower."
  • "CPU is high."
  • "This query looks expensive."
  • "The new release probably caused it."

With a baseline, you can ask better questions:

  • Did average duration change for the same query?
  • Did execution count increase?
  • Did the plan change?
  • Did logical reads per execution increase?
  • Did waits move from CPU to locks or I/O?
  • Did a deployment, index change, statistics update, or data load line up with the regression?

Baselines turn tuning into comparison.

Workload Comparisons

A workload comparison compares two sets of workload data. Common comparisons include:

  • Before vs after a deployment.
  • Before vs after a database compatibility level change.
  • Before vs after an index change.
  • Before vs after a query rewrite.
  • Peak hour today vs peak hour last week.
  • Current month-end batch vs last month-end batch.
  • Primary workload vs readable secondary workload.
  • Test run A vs test run B.

The key is to compare similar conditions. Comparing a quiet overnight window to a busy morning peak can mislead you.

Query Store As A Baseline Tool

Query Store is one of the most useful SQL Server tools for baselining because it records query text, plan history, runtime statistics, and wait statistics over time. It lets you compare query behavior across runtime intervals and plans.

Common Query Store questions include:

  • Which queries consumed the most CPU in the last hour?
  • Which queries had the highest total duration?
  • Which queries regressed compared with a previous period?
  • Which queries changed plans?
  • Which queries have high variation?
  • Which queries have high lock, memory, CPU, or I/O waits?

Example query for recent top CPU consumers:

Code
SELECT TOP (20)
    q.query_id,
    p.plan_id,
    qt.query_sql_text,
    SUM(rs.avg_cpu_time * rs.count_executions) AS total_cpu,
    SUM(rs.count_executions) AS executions,
    SUM(rs.avg_duration * rs.count_executions) AS total_duration
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q
    ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan AS p
    ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs
    ON rs.plan_id = p.plan_id
JOIN sys.query_store_runtime_stats_interval AS rsi
    ON rsi.runtime_stats_interval_id = rs.runtime_stats_interval_id
WHERE rsi.start_time >= DATEADD(hour, -1, SYSUTCDATETIME())
GROUP BY q.query_id, p.plan_id, qt.query_sql_text
ORDER BY total_cpu DESC;

The important idea is to rank by total workload impact, not just one execution.

Representative Time Windows

Choosing the wrong comparison window is a common tuning mistake.

Good comparison windows are:

  • Similar in business activity.
  • Long enough to smooth noise.
  • Short enough to isolate the change.
  • Aligned with the incident or deployment.
  • Aware of batch jobs and background processes.

Examples:

  • Compare Tuesday 9:00-10:00 AM to last Tuesday 9:00-10:00 AM.
  • Compare the hour before deployment to the hour after deployment only if traffic is stable.
  • Compare the same month-end batch step across months.
  • Compare a test replay with the same data volume and parameter mix.

Avoid comparing unrelated periods unless you explicitly account for differences.

Per-Execution Metrics Vs Total Workload

You need both per-execution and total metrics.

Total metrics answer: "What is hurting the system overall?"

Per-execution metrics answer: "Did this query become less efficient?"

Example:

  • Query A runs once and takes 30 seconds.
  • Query B runs 200,000 times and takes 20 milliseconds each.

Query A has the worst single execution. Query B may consume more total CPU and user time.

Weighted averages help avoid misleading comparisons:

Code
SELECT
    p.query_id,
    SUM(rs.avg_duration * rs.count_executions)
        / NULLIF(SUM(rs.count_executions), 0) AS weighted_avg_duration,
    SUM(rs.count_executions) AS execution_count,
    SUM(rs.avg_logical_io_reads * rs.count_executions) AS total_logical_reads
FROM sys.query_store_plan AS p
JOIN sys.query_store_runtime_stats AS rs
    ON rs.plan_id = p.plan_id
GROUP BY p.query_id
ORDER BY total_logical_reads DESC;

This treats an interval with many executions as more important than an interval with one execution.

Regression Detection

A regression is a meaningful performance drop compared with a baseline. It might show up as:

  • Higher duration.
  • Higher CPU.
  • Higher logical reads.
  • Higher physical reads.
  • More lock waits.
  • More memory grant waits.
  • More tempdb spills.
  • Lower throughput.
  • More timeouts.
  • A different execution plan.

Not every increase is a regression. More total CPU might simply mean more users or more executions. A true efficiency regression usually appears in per-execution cost, plan shape, row estimates, waits, or error rates.

Targeted Tuning

Targeted tuning means selecting changes based on measurable workload impact.

A typical targeted tuning workflow:

  • Identify top workload contributors.
  • Confirm the business symptom.
  • Inspect Query Store, actual plans, waits, and data distribution.
  • Form a hypothesis.
  • Make the smallest useful change.
  • Test with representative parameters and data volume.
  • Compare before and after metrics.
  • Monitor production after release.

The goal is not to make every query perfect. The goal is to remove the bottleneck that matters.

Choosing What To Tune First

Prioritize by impact and risk.

High-value candidates usually have:

  • High total CPU.
  • High total logical reads.
  • High execution count.
  • High user-facing latency.
  • Frequent timeouts.
  • Important business workflow impact.
  • Clear regression after a change.
  • A plan problem with a plausible fix.

Lower priority candidates include:

  • Rare admin queries.
  • Slow queries outside the critical path.
  • Queries with high duration but tiny business impact.
  • Queries already dominated by external waits.
  • Queries where the proposed fix has high risk and low benefit.

Before And After Validation

A tuning change is not done when the query is faster once. You need before-and-after validation.

Compare:

  • Duration per execution.
  • Total duration.
  • CPU per execution.
  • Total CPU.
  • Logical reads.
  • Physical reads.
  • Writes.
  • Execution count.
  • Row counts.
  • Wait categories.
  • Plan shape.
  • Memory grants.
  • tempdb spills.
  • Blocking impact.

A simple before/after table can prevent vague claims:

MetricBeforeAfterBetter?
Avg duration850 ms120 msYes
Avg CPU700 ms95 msYes
Avg logical reads120,0008,500Yes
Executions per hour5,0005,200Similar
Plan changedScan + hash joinSeek + nested loopsExpected

Testing With Representative Parameters

SQL Server performance can vary by parameter value. A query that improves for one parameter can worsen for another.

Representative testing includes:

  • Common parameter values.
  • Rare parameter values.
  • Large tenant and small tenant cases.
  • Empty result cases.
  • Peak data ranges.
  • Recent and historical date ranges.
  • High-cardinality and low-cardinality filters.

This is especially important before forcing a plan, adding a hint, or changing an index.

Targeted Index Tuning

Index tuning should come from workload evidence, not wishful thinking.

Good index candidates are tied to:

  • Frequent predicates.
  • Join columns.
  • Sort and grouping patterns.
  • Covering needs for hot queries.
  • Selectivity and data distribution.
  • Existing index overlap.
  • Write cost and maintenance cost.

Example:

Code
CREATE INDEX IX_Orders_CustomerId_OrderDate
ON dbo.Orders (CustomerId, OrderDate DESC)
INCLUDE (OrderStatus, TotalAmount);

This may be useful if the workload often retrieves recent orders for one customer and needs OrderStatus and TotalAmount without lookups.

Do not create indexes only because one execution plan suggests a missing index. Validate the workload, existing indexes, write cost, and whether modifying an existing index is better.

Query Rewrites

Sometimes the best tuning target is SQL text, not indexing.

Common query issues include:

  • Non-SARGable predicates.
  • Functions wrapped around indexed columns.
  • Implicit conversions.
  • Unnecessary SELECT *.
  • Optional filters that force generic plans.
  • Correlated subqueries that can be rewritten clearly.
  • Joins that multiply rows accidentally.
  • Filters applied after unnecessary work.

Example non-SARGable predicate:

Code
-- Harder to seek efficiently
WHERE YEAR(OrderDate) = 2026;

Better range predicate:

Code
WHERE OrderDate >= '20260101'
  AND OrderDate <  '20270101';

This kind of rewrite can reduce reads without adding a new index.

Workload-Level Trade-Offs

Tuning one query can hurt another. Every index, schema change, query rewrite, or hint has a workload trade-off.

Examples:

  • Adding an index can speed reads but slow writes.
  • Covering one query with a wide index can increase storage and memory pressure.
  • Forcing a plan can help one parameter value and hurt another.
  • Rewriting a query can change locking behavior.
  • Denormalization can improve reads but complicate writes and consistency.
  • Reducing duration can increase CPU if the new plan uses more parallelism.

A good tuning decision considers the whole workload.

Common Mistakes

Common mistakes include:

  • Tuning the slowest single query instead of the largest workload contributor.
  • Comparing nonrepresentative time windows.
  • Looking only at average duration.
  • Ignoring execution count.
  • Ignoring logical reads and CPU.
  • Ignoring waits and blocking.
  • Declaring success after one test execution.
  • Creating duplicate or speculative indexes.
  • Using hints before understanding the root cause.
  • Failing to monitor after deployment.

Best Practices

Best practices include:

  • Enable Query Store before major changes.
  • Capture a representative baseline.
  • Compare similar workload windows.
  • Rank candidates by total workload impact and business priority.
  • Validate with actual execution plans and Query Store history.
  • Test with representative parameter values.
  • Prefer small, targeted, reversible changes.
  • Measure both per-execution efficiency and total workload impact.
  • Document what changed, why it changed, and how success is measured.
  • Revisit tuning decisions as data and workload patterns change.

Interview Practice

PreviousLocking behavior and blockingNext UpKnowing when to change SQL, indexes, or schema instead of forcing hints