Production SQL Server workloads often degrade when queries scan large tables or spill sorts and hash operations to tempdb. Start by capturing actual execution plans on representative queries rather than relying on estimated plans alone.
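One simple way to capture an actual plan, including runtime row counts, is to wrap a representative query in `SET STATISTICS XML`; the table and predicate below are placeholders, not from a real schema:

```sql
-- Returns the actual execution plan as XML alongside the result set.
-- dbo.Orders and the CustomerID filter are illustrative placeholders.
SET STATISTICS XML ON;

SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE CustomerID = 42;

SET STATISTICS XML OFF;
```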
The goal is to shift from reactive troubleshooting to proactive pattern recognition. Recent SQL Server releases provide improved cardinality estimator feedback and query store metrics that make this shift practical.
# Reading Execution Plans Efficiently
Focus first on operators that dominate cost: table or index scans, hash joins with high memory grants, and sorts that spill to disk. Compare estimated versus actual row counts to identify stale statistics.
Enable query store and capture plans over a full business cycle. This data reveals regressions after statistics updates or plan recompiles without requiring constant profiler traces.
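Enabling query store is a one-time database setting; a minimal configuration sketch (the database name and retention values are illustrative) looks like:

```sql
-- Turn on query store and keep roughly 30 days of history.
-- Database name and thresholds are placeholders; tune for your workload.
ALTER DATABASE [SalesDB]
SET QUERY_STORE = ON
    (OPERATION_MODE = READ_WRITE,
     CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30),
     QUERY_CAPTURE_MODE = AUTO);
```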
# Indexing for Selectivity and Maintenance
Create indexes that cover the most frequent WHERE and JOIN predicates while keeping key columns narrow. Include columns only when they eliminate key lookups and the resulting index remains under 10-15 percent of table size.
- Rebuild or reorganize indexes when fragmentation exceeds 30 percent or when page density drops below 60 percent.
- Schedule statistics updates after large data loads using FULLSCAN only on high-cardinality columns.
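The guidance above can be sketched against a hypothetical Orders table; index, column, and table names here are assumptions for illustration:

```sql
-- Narrow covering index for a frequent CustomerID + OrderDate predicate;
-- TotalDue is included only to eliminate key lookups.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_OrderDate
ON dbo.Orders (CustomerID, OrderDate)
INCLUDE (TotalDue);

-- Reorganize at moderate fragmentation; rebuild past the ~30 percent mark.
ALTER INDEX IX_Orders_CustomerID_OrderDate ON dbo.Orders REORGANIZE;
-- ALTER INDEX IX_Orders_CustomerID_OrderDate ON dbo.Orders REBUILD;

-- After a large load, refresh statistics on this index with a full scan.
UPDATE STATISTICS dbo.Orders (IX_Orders_CustomerID_OrderDate) WITH FULLSCAN;
```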
# Avoiding Common T-SQL Anti-Patterns
```sql
SELECT * FROM Orders WHERE YEAR(OrderDate) = 2025;
```

Wrapping OrderDate in YEAR() makes the predicate non-sargable, so the optimizer cannot seek on an OrderDate index. Rewrite it as a half-open range predicate on the bare date column so the optimizer can perform an index seek with a range scan.
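The sargable rewrite of the query above (column names carried over from the example) uses a half-open date range:

```sql
-- Equivalent, sargable version: the bare OrderDate column lets the
-- optimizer seek into an index on OrderDate instead of scanning.
SELECT OrderID, CustomerID, OrderDate
FROM dbo.Orders
WHERE OrderDate >= '20250101'
  AND OrderDate <  '20260101';
```

The half-open upper bound also avoids edge cases with time components that a BETWEEN predicate would miss.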
Replace scalar UDFs in SELECT lists with inline table-valued functions or computed columns when possible. Even with scalar UDF inlining in SQL Server 2019 and later, many functions are ineligible for inlining and still execute row by row.
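One common refactoring, sketched here with a hypothetical discount calculation, moves the logic into an inline table-valued function consumed via CROSS APPLY so the optimizer can expand it into the main plan:

```sql
-- Inline table-valued replacement for a per-row scalar UDF.
-- Function name, table, and business rule are illustrative.
CREATE FUNCTION dbo.itvf_Discount (@Total money)
RETURNS TABLE
AS
RETURN (SELECT CASE WHEN @Total > 1000 THEN @Total * 0.05
                    ELSE 0 END AS Discount);
GO

-- CROSS APPLY lets the optimizer inline the logic instead of
-- invoking a scalar function once per row.
SELECT o.OrderID, o.TotalDue, d.Discount
FROM dbo.Orders AS o
CROSS APPLY dbo.itvf_Discount(o.TotalDue) AS d;
```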
# Ongoing Monitoring and Validation
Track wait stats and query store runtime statistics weekly. Correlate high CXPACKET or PAGEIOLATCH waits with specific query hashes before adding more indexes.
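The weekly wait-stats snapshot can come straight from a standard DMV; the exclusion list below just hides a few benign idle waits and is abbreviated, not exhaustive:

```sql
-- Top waits by accumulated wait time since the last restart or clear.
SELECT TOP (10)
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    wait_time_ms - signal_wait_time_ms AS resource_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP', 'BROKER_TO_FLUSH')
ORDER BY wait_time_ms DESC;
```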
Test changes in a restored copy of production using the same compatibility level and resource governor settings. Measure both average and 95th-percentile latency, not just total duration.
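Because query store keeps per-interval aggregates rather than per-execution durations, a 95th-percentile figure can only be approximated from the stored interval averages; the query below is a sketch of that approximation:

```sql
-- Per-query latency profile: overall average duration plus an approximate
-- 95th percentile taken across query store's per-interval averages.
SELECT DISTINCT
    q.query_id,
    AVG(rs.avg_duration) OVER (PARTITION BY q.query_id) AS avg_duration_us,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY rs.avg_duration)
        OVER (PARTITION BY q.query_id)                  AS p95_interval_avg_us
FROM sys.query_store_query         AS q
JOIN sys.query_store_plan          AS p  ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id;
```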
Apply these steps iteratively: capture plans, adjust indexes or rewrite predicates, validate with query store, and repeat. The result is stable throughput without constant firefighting.