Performance tuning without a checklist is just educated guessing. You check what looks wrong, make a change, hope it helps, move on. Sometimes it works. Most of the time you are firing at shadows while the real problem sits three layers deeper in the engine.
This checklist exists because I have been on too many calls where someone spent a week tuning indexes that had nothing to do with the problem — while the actual bottleneck was a 40-second log flush on a misconfigured tempdb. So here it is: the 12 checks that catch 90% of real SQL Server performance problems, in the order I run them in production.
Before You Start: Capture a Baseline
Performance work without a baseline is archaeology — you are digging into past events with no reference for what "normal" looks like. Before you touch anything, snapshot the current state:
-- Snapshot wait stats (reset counter, run workload, then query again for delta)
DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);
-- Wait 30-60 minutes under normal load, then:
SELECT wait_type, waiting_tasks_count,
wait_time_ms / 1000.0 AS wait_time_seconds,
signal_wait_time_ms / 1000.0 AS signal_wait_seconds
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (
'SLEEP_TASK','BROKER_TO_FLUSH','BROKER_TASK_STOP','CLR_AUTO_EVENT',
'REQUEST_FOR_DEADLOCK_SEARCH','RESOURCE_QUEUE','SERVER_IDLE_CHECK',
'SLEEP_DBSTARTUP','SLEEP_DBRECOVER','SLEEP_MASTERDBREADY',
'SLEEP_MASTERMDREADY','SLEEP_MASTERUPGRADED','SLEEP_MSDBSTARTUP',
'SLEEP_TEMPDBSTARTUP','SNI_HTTP_ACCEPT','SP_SERVER_DIAGNOSTICS_SLEEP',
'SQLTRACE_BUFFER_FLUSH','WAITFOR','XE_DISPATCHER_WAIT','XE_TIMER_EVENT'
)
ORDER BY wait_time_ms DESC;
That delta is your baseline. Everything after this checklist gets measured against it.
1. Wait Stats — The First Five Minutes
Wait stats tell you what SQL Server is blocked on right now. Not what queries are slow — what the engine is waiting for. If you do nothing else from this list, do this:
- CXPACKET / CXCONSUMER — Parallelism contention. Queries are splitting across threads and waiting on each other. Check
cost threshold for parallelism(default of 5 is almost always wrong) andMAXDOPsettings. For NUMA servers, set MAXDOP to the number of cores per NUMA node, not the total. - LCK_M_* — Locking waits. Something is holding a lock and blocking others. Run
sys.dm_exec_requestsfiltered toblocking_session_id > 0to find the chain. - PAGEIOLATCH_SH / PAGEIOLATCH_EX — Reading data pages from disk. Either missing indexes (tables scanned instead of seeked) or storage is the actual bottleneck. If
avg disk sec/readin Performance Monitor consistently exceeds 20ms on SSD or 10ms on HDD, the storage path is the problem. - WRITELOG — Transaction log write stalls. Log writes are sequential by nature — if this is your top wait, your log file is on the same physical drive as your data files. Separate them. Bonus: enable instant file initialization for data files so they do not wait on zeroing.
- RESOURCE_SEMAPHORE — Queries waiting for memory grants. Sort spills to tempdb. This usually means the query optimizer is making bad cardinality estimates — see Check 5.
2. Missing Index Requests
SQL Server tracks indexes it wishes existed. Most DBAs either ignore these entirely or add every one blindly. Neither is right. Filter by impact:
SELECT
migs.avg_user_impact,
migs.avg_total_user_cost * migs.user_seeks AS estimated_impact,
mid.statement AS table_name,
mid.equality_columns,
mid.inequality_columns,
mid.included_columns
FROM sys.dm_db_missing_index_groups mig
JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats migs ON mig.index_group_handle = migs.group_handle
WHERE mid.database_id = DB_ID()
AND migs.avg_user_impact > 30 -- only interesting if it helps significantly
ORDER BY migs.avg_total_user_cost * migs.user_seeks DESC;
Add the index with the included columns covering the most expensive seeks. But first: check the existing indexes on that table. If you already have a similar index, the missing index request might be pointing at a missing index on a different column — or it might be a sign that statistics are stale and the optimizer is asking for something it already has.
3. Stale Statistics
Statistics tell the optimizer how many rows to expect. When they are wrong, the query plan is wrong. A table with 10 million rows but statistics last updated when it had 100,000 will generate plans built on false assumptions. Run:
SELECT
OBJECT_NAME(s.object_id) AS table_name,
s.name AS index_name,
STATS_DATE(s.object_id, s.index_id) AS last_updated,
s.rowcnt AS row_count,
s.rowmodctr AS rows_modified_since_stats
FROM sys.sysindexes s
JOIN sys.objects o ON s.id = o.object_id
WHERE o.type = 'U'
AND s.rowmodctr > 500 + (s.rowcnt * 0.1) -- threshold scales with table size
ORDER BY s.rowmodctr DESC;
Any table where rows_modified_since_stats exceeds the threshold gets an UPDATE STATISTICS tablename — or WITH FULLSCAN if the table is small enough that a scan is fast. Auto-update handles most of this, but high-DML tables (especially in ETL workflows) regularly outpace the sampling rate. For more on statistics and how they affect plan quality, see my post on SQL Server Statistics: How to Tune Execution Plans.
4. Index Usage — What Is Actually Being Used
You have indexes you think are helping. Are they? Check:
SELECT
OBJECT_NAME(i.object_id) AS table_name,
i.name AS index_name,
i.type_desc,
s.user_seeks, s.user_scans, s.user_lookups, s.user_updates,
s.last_user_seek, s.last_user_scan
FROM sys.dm_db_index_usage_stats s
JOIN sys.indexes i ON s.object_id = i.object_id AND s.index_id = i.index_id
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY s.user_seeks + s.user_scans ASC;
Indexes with zero seeks and zero scans are being maintained (writes updated) but never read. Drop them. That maintenance cost is real — every INSERT, UPDATE, DELETE has to maintain those pages. For a full breakdown of index usage and how to evaluate what to keep, see SQL Server Index Tuning Strategies.
5. Execution Plans — The Big Three Problems
When you pull an actual plan (not an estimated plan — always get the actual one), you are looking for three things:
- Table scans on large tables — If you see a Scan on a table with millions of rows and you have a predicate, you need an index. See Execution Plan Analysis: Reading What the Optimizer Chose for the full breakdown of scan vs. seek.
- Nested loops on large inputs without index on the join column — Each row from the outer input triggers a lookup on the inner. On a million-row outer, that is a million lookups. Add the missing index.
- Sort and hash operators on large rows — If a Sort is showing millions of rows and 50%+ estimated cost, the optimizer is wrong about the row count. Update statistics (Check 3) or add a hint to force a different join type.
6. Memory Pressure — Check the Buffer Pool
SQL Server is supposed to use memory to cache data pages and avoid disk I/O. But sometimes it is configured wrong, or other processes are stealing it, or the plan cache is bloated with single-use plans:
SELECT
object_name AS counter_name,
cntr_value / 1024.0 AS mb
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Database Cache Memory (KB)', 'Free Memory (KB)', 'Target Server Memory (KB)');
-- Also check for plan cache bloat:
SELECT TOP 5
COUNT(*) AS plan_count,
SUM(CAST(qp.query_plan AS XML).value('count(//RelOp)', 'int')) AS total_operators,
AVG(CAST(qp.query_plan AS XML).value('count(//RelOp)', 'float')) AS avg_operators_per_plan,
MIN(qs.creation_time) AS oldest_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
GROUP BY qt.dbid
ORDER BY plan_count DESC;
If Database Cache Memory is well below Target Server Memory, SQL Server is not getting the memory it expects. Check the max server memory setting. If it looks right, look for memory leaks in SQL Server itself or competing processes on the host. For a deeper dive into CPU and memory pressure patterns, see my post on Diagnosing SQL Server CPU Pressure.
7. Tempdb Contention — The Silent Performance Killer
Tempdb is where work happens: sorts that do not fit in memory, hash joins on large datasets, version store for snapshot isolation, table variables, and intermediate results in complex queries. When contention is high, everything backs up:
SELECT session_id, wait_type, wait_duration_ms, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE 'PAGELATCH%' OR wait_type LIKE 'PAGEIOLATCH%';
-- Also check tempdb usage:
SELECT
SUM(user_object_reserved_page_count) * 8 AS user_object_kb,
SUM(internal_object_reserved_page_count) * 8 AS internal_object_kb,
SUM(version_store_reserved_page_count) * 8 AS version_store_kb
FROM sys.dm_db_file_space_usage
WHERE database_id = 2; -- tempdb
Two fixes that help immediately, regardless of SQL Server version: create multiple equally-sized tempdb data files (one per logical core, up to 8), and enable trace flag 1118 (SQL 2014 and earlier — forces uniform extents, reduces SGAM contention). In SQL 2016+, tempdb is already configured better by default, but verify your data file count.
8. I/O Bottlenecks — Latency, Not Throughput
Most I/O problems are latency problems, not throughput problems. SQL Server does not need a fast sequential read — it needs a fast random read. Throughput benchmarks are irrelevant:
SELECT
DB_NAME(mf.database_id) AS database_name,
mf.physical_name,
num_of_reads, num_of_writes,
io_stall_read_ms, io_stall_write_ms,
io_stall_read_ms / NULLIF(num_of_reads, 0) AS avg_read_latency_ms,
io_stall_write_ms / NULLIF(num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) vfs
JOIN sys.master_files mf ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
ORDER BY io_stall_read_ms + io_stall_write_ms DESC;
Average read latency above 20ms on SSD or 10ms on HDD is a problem. If your log file is on the same physical drive as your data files, fix that first — log writes are sequential and cannot share the head with random data reads. More detail in SQL Server I/O Bottleneck Analysis.
9. Query Store — Turn It On Now
If you are on SQL Server 2016 or later and Query Store is not enabled, enable it. Right now. Query Store captures executed plans, their performance characteristics, and runtime stats — and it lets you force a known-good plan when the optimizer regresses. This alone has saved more production incidents than any other feature added to SQL Server in the last decade:
ALTER DATABASE [YourDatabase] SET QUERY_STORE = ON;
ALTER DATABASE [YourDatabase] SET QUERY_STORE (
OPERATION_MODE = READ_WRITE,
MAX_STORAGE_SIZE_MB = 500,
WAIT_STATS_CAPTURE_MODE = ON,
QUERY_CAPTURE_MODE = AUTO
);
Query Store gives you historical context the optimizer does not have by default. If a query was fast last week and is slow now, you can compare the plans side by side and see exactly what changed. More on this in my post on Using Query Store for Performance Analysis.
10. Parameter Sniffing — The Plan Cache Trick
Parameter sniffing is when SQL Server compiles a plan for a specific parameter value and that plan is wrong for other values of the same parameter. Classic example: a stored procedure that has different plans for a high-cardinality parameter (narrow plan, seek) vs. a low-cardinality parameter (broad scan, different join order). Both plans are valid — neither is always right.
Three ways to fix it without changing application code:
-- Option 1: OPTION (RECOMPILE) at the problem query
-- Use for highly variable parameters where sniffing always hurts
SELECT col1, col2 FROM MyTable WHERE date >= @StartDate OPTION (RECOMPILE);
-- Option 2: Plan guide with OPTIMIZE FOR
-- Use when you know the most common parameter value
EXEC sp_create_plan_guide
@name = N'Guide_For_CommonParameter',
@stmt = N'SELECT col1 FROM MyTable WHERE date >= @StartDate',
@type = N'SQL',
@params = N'@StartDate datetime',
@hints = N'OPTION (OPTIMIZE FOR (@StartDate = ''20260101''))';
-- Option 3: Local variable trick (bypasses sniffing)
DECLARE @StartDateFilter datetime = @StartDate;
SELECT col1, col2 FROM MyTable WHERE date >= @StartDateFilter;
If you want to understand parameter sniffing in the context of execution plans, the execution plan post has examples at Execution Plan Analysis.
11. Check These Server-Level Settings
- max server memory — Leave at least 4GB for the OS, plus 1GB per 4 cores above 16. A max memory set to "available RAM" at installation time is almost always wrong on a production box with other software running.
- cost threshold for parallelism — Default is 5. That means any query estimated to cost more than 5 seconds gets a parallel plan. For most OLTP workloads, 25-50 is a better starting point. Eliminates CXPACKET waits on small queries that do not need parallelism.
- max degree of parallelism (MAXDOP) — Set to number of cores per NUMA node, capped at 8. Zero (unlimited) is almost never the right answer on high-core-count servers.
- backup compression default — Enabled by default in newer versions, but check. Compressed backups finish faster and use less disk I/O, which matters when your backup window is tight.
12. AI Tools in 2026 — Where They Help, Where They Do not
Query Store has built-in performance insights. Azure Advisor flags configuration problems. Several third-party tools use AI to surface plan regressions and suggest missing indexes. I use some of these. Here is what I have found:
Where AI tools actually help: Surfacing unknown unknowns — patterns in your workload you have not considered. Suggesting index additions by analyzing execution frequency and join patterns. Detecting regressions between plan versions that are hard to catch manually. Automated performance baselines and alerting on deviation.
Where experienced DBAs still outperform automation: Root cause diagnosis when multiple wait types are competing. Understanding that an index fix in dev causes a write performance problem in prod. Knowing when the real fix is architecture — partitioning, archiving old data, changing the access pattern — not a missing index. Context that only exists in your environment: why this stored procedure runs at 3 AM with a specific input pattern, why this report query does not need to be real-time.
Use AI to find the problems. Use experience to decide which problems to fix and how. The checklist above is where that experience starts.
Run through these 12 checks and you will have a clear picture of where your SQL Server is spending its time. Most problems surface in checks 1-5. The rest are for the cases that survive the first pass.
If you want a faster way to get a read on your current situation, the 2AM Database Test runs a subset of these checks against your environment and gives you a prioritized list of what to tackle first.
Need hands-on help with a specific performance issue? Database Performance Tuning — scoped engagements, no retainer required.