📈
AG Monitoring & Diagnostics
10 queries
  • Synchronization health — replica states, sync partner, suspended databases
  • Log send queue & redo rate — catch-up estimation per database
  • Database replica link latency — network throughput vs. commit threshold
  • Always On health session events — last 24h cluster events, failover reasons
  • sp_CheckAG integration — automated health scoring report
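
The catch-up estimation mentioned above comes down to dividing a queue size by its drain rate. The sketch below is a minimal illustration, assuming each row is a dictionary carrying the `log_send_queue_size`, `log_send_rate`, `redo_queue_size`, and `redo_rate` values (in KB and KB per second) from `sys.dm_hadr_database_replica_states`. The `database_name` key, the helper names, and the sample values are hypothetical.

```python
from typing import Optional


def estimate_catch_up_seconds(queue_kb: Optional[int],
                              rate_kb_per_sec: Optional[int]) -> Optional[float]:
    """Return the seconds needed to drain a queue at the given rate.

    Returns None when the queue size is unknown (the DMV reports NULL on
    the primary) or when the rate is zero or unknown, because no estimate
    is possible in those cases.
    """
    if queue_kb is None:
        return None
    if queue_kb <= 0:
        return 0.0
    if not rate_kb_per_sec or rate_kb_per_sec <= 0:
        return None
    return queue_kb / rate_kb_per_sec


def summarize_database(row: dict) -> dict:
    """Build send and redo catch-up estimates for one database replica row."""
    return {
        "database": row["database_name"],
        "send_seconds": estimate_catch_up_seconds(
            row["log_send_queue_size"], row["log_send_rate"]
        ),
        "redo_seconds": estimate_catch_up_seconds(
            row["redo_queue_size"], row["redo_rate"]
        ),
    }


# Hypothetical sample row; real values would come from the DMV.
sample = {
    "database_name": "SalesDB",
    "log_send_queue_size": 20480,  # KB waiting to be sent to the secondary
    "log_send_rate": 4096,         # KB per second
    "redo_queue_size": 0,          # KB waiting to be redone
    "redo_rate": 0,                # KB per second
}
print(summarize_database(sample))
```

Treat the result as a rough estimate: the rates reported by the DMV fluctuate with workload, so a single sample can be misleading.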
Pre-Failover Validation
Checklist
  • Confirm all synchronous-commit replicas are synchronized (synchronization state SYNCHRONIZED)
  • Verify log backup chain is intact on primary and secondary
  • Check cluster quorum — vote count, cloud witness or file share witness
  • Review estimated data loss window (RPO) for each availability database
  • Notify application teams of planned maintenance window
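
The synchronization and quorum items in this checklist can be turned into an automated gate. The following is a minimal sketch, assuming the database rows describe the intended failover target and carry `synchronization_state_desc` and `is_suspended` as exposed by `sys.dm_hadr_database_replica_states`, and that the member rows carry `member_state_desc` and `number_of_quorum_votes` as exposed by `sys.dm_hadr_cluster_members`. The function name, the `database_name` and `member_name` keys, and the simple majority rule are illustrative assumptions, not part of the playbook.

```python
def failover_blockers(database_states, cluster_members):
    """Return a list of human-readable reasons to postpone a planned failover."""
    blockers = []
    for db in database_states:
        name = db["database_name"]
        if db["is_suspended"]:
            blockers.append(f"{name}: data movement is suspended")
        elif db["synchronization_state_desc"] != "SYNCHRONIZED":
            blockers.append(f"{name}: state is {db['synchronization_state_desc']}")

    total_votes = sum(m["number_of_quorum_votes"] for m in cluster_members)
    online_votes = sum(
        m["number_of_quorum_votes"]
        for m in cluster_members
        if m["member_state_desc"] == "UP"
    )
    # Simple majority rule (an assumption): more than half of all
    # configured votes must be online.
    if online_votes * 2 <= total_votes:
        blockers.append(
            f"quorum at risk: {online_votes} of {total_votes} votes online"
        )
    return blockers


# Hypothetical sample data shaped like the DMV output.
databases = [
    {"database_name": "SalesDB", "synchronization_state_desc": "SYNCHRONIZED",
     "is_suspended": False},
    {"database_name": "HRDB", "synchronization_state_desc": "SYNCHRONIZING",
     "is_suspended": False},
]
members = [
    {"member_name": "SQLNODE1", "member_state_desc": "UP", "number_of_quorum_votes": 1},
    {"member_name": "SQLNODE2", "member_state_desc": "UP", "number_of_quorum_votes": 1},
    {"member_name": "Witness", "member_state_desc": "UP", "number_of_quorum_votes": 1},
]
print(failover_blockers(databases, members))
```

An empty list means none of the automated checks objected. The remaining checklist items, such as the backup chain and team notification, still need a manual sign-off.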
🛡
Planned Failover Runbook
Step-by-step
  • Step 1: Drain active connections from primary replica
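  • Step 2: Fail over the availability group to the synchronized secondary via SSMS or T-SQL (ALTER AVAILABILITY GROUP ... FAILOVER); avoid moving the cluster group in Failover Cluster Manager, and reserve forced failover for disaster recovery
  • Step 3: Resume data movement on any databases left in a suspended state after the role change
  • Step 4: Confirm the AG listener resolves to the new primary; redirect connection strings only for clients that bypass the listener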
  • Step 5: Validate application connectivity and query execution
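
For the failover and resume steps, a planned manual failover can be issued in T-SQL on the target secondary with `ALTER AVAILABILITY GROUP ... FAILOVER`, and data movement for a suspended database can be resumed with `ALTER DATABASE ... SET HADR RESUME`. The sketch below only builds those statements as strings so they can be reviewed before anyone runs them. The helper names and the sample AG and database names are hypothetical.

```python
def quote_name(name: str) -> str:
    """Bracket-quote a SQL Server identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def planned_failover_statement(ag_name: str) -> str:
    """Statement to run on the synchronized secondary that becomes the new primary."""
    return f"ALTER AVAILABILITY GROUP {quote_name(ag_name)} FAILOVER;"


def resume_statement(database_name: str) -> str:
    """Statement that resumes data movement for one suspended database."""
    return f"ALTER DATABASE {quote_name(database_name)} SET HADR RESUME;"


# Hypothetical names used purely for illustration.
print(planned_failover_statement("AG1"))
for database in ["SalesDB", "HRDB"]:
    print(resume_statement(database))
```

Generating the statements separately from executing them keeps a reviewable script in the change record for the maintenance window.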
🛠
Post-Failover Validation
Checklist
  • Verify AG is fully synchronized on new primary — all replicas healthy
  • Confirm log backup chain is functioning on new primary
  • Check for blocked processes, long-running queries, or connection spikes
  • Update monitoring alerts to reflect new primary replica
  • Log the failover event with timing and any anomalies for audit trail
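
The audit trail entry from the last checklist item can be captured as a small structured record. This is a minimal sketch under assumed conventions: the field names, the JSON format, and the sample values are all hypothetical and should be adapted to whatever your change-management system expects.

```python
import json
from datetime import datetime, timezone


def build_failover_record(ag_name, old_primary, new_primary,
                          started_at, completed_at, anomalies=None):
    """Return a dictionary describing one failover event for the audit trail."""
    return {
        "availability_group": ag_name,
        "old_primary": old_primary,
        "new_primary": new_primary,
        "started_at": started_at.isoformat(),
        "completed_at": completed_at.isoformat(),
        "duration_seconds": (completed_at - started_at).total_seconds(),
        "anomalies": list(anomalies or []),
    }


# Hypothetical event used purely for illustration.
record = build_failover_record(
    ag_name="AG1",
    old_primary="SQLNODE1",
    new_primary="SQLNODE2",
    started_at=datetime(2024, 1, 1, 2, 0, 0, tzinfo=timezone.utc),
    completed_at=datetime(2024, 1, 1, 2, 0, 42, tzinfo=timezone.utc),
    anomalies=["HRDB resumed manually"],
)
print(json.dumps(record, indent=2))
```

Recording timestamps in UTC avoids ambiguity when replicas sit in different time zones.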
🚀
Contained AG — SQL Server 2022
New feature
  • Automatic database seeding with contained availability groups
  • Distributed availability groups — cross-site, cross-domain scenarios
  • SQL Server 2022-only DMV queries for contained AG diagnostics
  • Availability group replicas across Azure regions — routing URL config
  • Linux availability groups integration (if applicable)
📅
30-Day AG Health Program
Sustained program
  • Week 1: Baseline snapshot — replica health, latency, quorum status
  • Week 2: Failover drill — execute planned failover in non-production window
  • Week 3: Backup chain validation — test restore of AG databases from log backups
  • Week 4: Documentation review — update runbooks, contact lists, SLAs
  • Ongoing: Monthly DMV snapshot and trend analysis for proactive alerting
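
The monthly trend analysis can start as a comparison of the latest snapshot against the Week 1 baseline. The sketch below assumes each snapshot is a dictionary of numeric metrics where higher is worse. The metric names, the 1.5x tolerance, and the sample values are hypothetical choices, not thresholds taken from the playbook.

```python
def flag_regressions(baseline, latest, tolerance=1.5):
    """Return metrics whose latest value exceeds the baseline by the tolerance.

    A metric with a zero baseline is flagged as soon as it becomes positive.
    Metrics missing from the latest snapshot are skipped.
    """
    flagged = {}
    for metric, base_value in baseline.items():
        current = latest.get(metric)
        if current is None:
            continue
        if base_value == 0:
            if current > 0:
                flagged[metric] = (base_value, current)
        elif current > base_value * tolerance:
            flagged[metric] = (base_value, current)
    return flagged


# Hypothetical snapshots used purely for illustration.
week1_baseline = {"log_send_queue_kb": 1000, "redo_queue_kb": 0, "commit_latency_ms": 4}
latest_month = {"log_send_queue_kb": 1200, "redo_queue_kb": 50, "commit_latency_ms": 9}
print(flag_regressions(week1_baseline, latest_month))
```

A fixed ratio is the simplest possible rule. Once several months of snapshots exist, a rolling average or percentile may give fewer false alarms.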

Get the Full Playbook as a PDF

Download the complete 10-page playbook — formatted for printing, sharing with your team, and keeping at your desk for the next failover window.
