Always-On Availability Groups are one of SQL Server's most powerful high-availability features. They're also one of the most complex to monitor. A secondary replica can fall behind. A replica can be disconnected. Synchronization can silently degrade. And if you're not watching closely, you won't know until a failover attempt fails — or worse, until someone calls you at 2 AM because the app is down.
That's where sp_checkAG comes in.
sp_checkAG is a free, open-source stored procedure written by Brian Kelley (Straight Path SQL) designed specifically to surface Availability Group issues before they become outages. This post walks through what it does, how to deploy it, how to interpret its output, and what to do when it flags a problem.
What Is sp_checkAG?
sp_checkAG is a T-SQL stored procedure that queries the AG-related DMVs and rolls them into a single diagnostic output. Rather than forcing you to write custom DMV queries every time something looks wrong, it gives you a consolidated health snapshot in seconds.
The procedure checks for:
- Replica connectivity — Is every replica in a CONNECTED state?
- Synchronization health — Are all databases in a HEALTHY or PARTIALLY_HEALTHY state?
- Redo queue depth — Is the secondary keeping up with the primary, or is it falling behind?
- Send queue depth — Is the log stream being transmitted from primary to secondary without stacking up?
- Failover readiness — For synchronous replicas, is automatic failover actually viable?
- Role mismatches — Is a replica unexpectedly in RESOLVING state?
- Database-level issues — Are individual databases within the AG NOT SYNCHRONIZING when they should be?
Installing sp_checkAG
sp_checkAG is available on GitHub. The recommended installation is on the primary replica.
-- Run on your primary replica (or any instance you use for DBA monitoring)\nUSE DBA;\nGO\n\n-- Paste the sp_checkAG CREATE PROCEDURE script here\n-- Source: https://github.com/bertwagner/sp_checkAG\n-- or: https://straightpathsql.comOnce installed, execution is simple:
EXEC sp_checkAG;Basic Usage and Output
Running sp_checkAG with no parameters returns a result set covering all Availability Groups on the instance.
Key output columns to watch:
connected_state: CONNECTED vs DISCONNECTED — a disconnected replica isn't receiving log at allsynchronization_health: HEALTHY, PARTIALLY_HEALTHY, or NOT_HEALTHYredo_queue_size_kb: Log waiting to be hardened on the secondary — growing redo queue means the secondary is I/O boundsend_queue_size_kb: Log waiting to be sent from primary — growing send queue means a network or endpoint bottleneckis_failover_ready: 1 means automatic failover will fire; 0 means it won't — treat a sync replica with is_failover_ready=0 as a P1
Filtering to a Specific AG
EXEC sp_checkAG @AGName = 'AG_PROD';Checking Database-Level Status
Beyond replica-level health, sp_checkAG also surfaces database-level synchronization states — critical because a replica can appear connected and healthy while individual databases within it are NOT SYNCHRONIZING.
EXEC sp_checkAG @ShowDatabaseDetails = 1;This adds a second result set showing each database within each replica. A database showing NOT SYNCHRONIZING on an otherwise healthy replica is the issue most likely to be missed without database-level visibility.
Real-World Scenarios and Remediation
Redo Queue Growing on a Synchronous Secondary
The secondary can't apply log records fast enough — almost always I/O-bound on the secondary. Move AG log files to faster storage (NVMe preferred). Check for blocking on the secondary — long-running read queries can block the redo thread.
Replica DISCONNECTED After Network Event
Check endpoint status: SELECT name, state_desc FROM sys.endpoints WHERE type = 4;. Common causes: endpoint stopped (ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED), firewall blocking port 5022, or certificate expiry if using certificate auth.
is_failover_ready = 0 on a Synchronous Replica
Your automatic failover won't fire. Check synchronization health, WSFC quorum health, and whether a database was recently added to the AG but not yet fully synchronized.
Database NOT SYNCHRONIZING in an Otherwise Healthy AG
This is the silent gap. Common causes: log chain broken (re-seed needed), full transaction log on primary, or automatic seeding failure. Check sys.dm_hadr_automatic_seeding for recent failures.
Using sp_checkAG in a Monitoring Job
Set it up as a SQL Server Agent job running every 5–10 minutes. Alert when any replica shows CONNECTED_STATE != CONNECTED or synchronization_health != HEALTHY. Pair with Database Mail for alerts and reference the AG Health Playbook for response procedures.
DECLARE @issues INT = 0;\nSELECT @issues = COUNT(*) FROM sys.dm_hadr_availability_replica_states\nWHERE role_desc = 'SECONDARY' AND (connected_state_desc != 'CONNECTED' OR synchronization_health_desc != 'HEALTHY');\nIF @issues > 0 BEGIN EXEC sp_checkAG @ShowDatabaseDetails = 1; ENDsp_checkAG vs. Manual DMV Queries
You can get this information from DMVs directly. The question is whether you want to write and maintain those queries, or use a community tool that already does it correctly. For teams without commercial monitoring, sp_checkAG fills the gap cleanly.
Integration with SQL Server 2025
AG behavior is largely unchanged in SQL Server 2025 at the monitoring layer — sp_checkAG works the same way it does on 2019/2022. The DMVs are stable. One area to watch: SQL Server 2025's OPTIMIZED_LOCKING can affect redo thread behavior on secondaries in high-write workloads during mixed-version AG transitions.
Key Takeaways
- Run
sp_checkAGwith@ShowDatabaseDetails = 1— database-level issues are invisible without it - Redo queue growth on a sync replica = secondary I/O problem. Fix the storage, not the AG
- Send queue growth on primary = network or secondary endpoint bottleneck
is_failover_ready = 0on a sync replica means your automatic failover won't fire. Treat it as P1- Automate it — a SQL Agent job running every 5–10 minutes catches problems before users do
The procedure is freely available from Brian Kelley at Straight Path SQL. Need help setting up AG monitoring or designing a failover runbook? Start with a free database health assessment.