SQL Server Backup and Restore Best Practices for Production DBAs

Here is a story I have heard more times than I care to count. A company runs into a major incident — ransomware, a botched migration, a developer with too much access and not enough sleep. They open their backup tool, find the most recent full backup, start the restore, and discover something terrible: the backup job has been failing silently for three weeks. The files exist. They just cannot be read.

The backup that was not actually a backup. It is the most predictable disaster in database administration, and it keeps happening because teams treat backup as a checkbox rather than a system.

This post covers what a real SQL Server backup strategy looks like — the backup types, the recovery models, the verification steps most teams skip, and the common mistakes that turn a routine incident into a multi-day outage.

Understanding Your Recovery Models First

Everything downstream of your recovery model choice. Get this wrong and the rest of your backup strategy is built on sand.

Simple recovery model truncates the transaction log automatically after each checkpoint. You can do full and differential backups, but you cannot do point-in-time recovery. Simple is fine for dev and test environments. It is not acceptable for production systems where data loss has business consequences.

Full recovery model retains the transaction log until you back it up. Combined with regular log backups, it gives you point-in-time recovery down to the minute or second. This is what most production OLTP databases should use.

Bulk-Logged recovery model is a middle ground — it minimally logs bulk operations to reduce log growth during large data loads. Use it tactically during ETL windows if log growth is a real problem.

The Three-Tier Backup Strategy

Free · 2 Minutes

How healthy is your database, really?

Get your free database health score — spot risks before they become incidents.

Get my health score

A production SQL Server backup strategy has three components working together.

Full backups are your baseline. For most production databases, once per day is right — typically during a low-traffic window. For very large databases (1TB+), consider backup compression aggressively or shift to weekly fulls with more frequent differentials.

BACKUP DATABASE [YourDatabase]
TO DISK = N'\\backup-server\sql-backups\YourDatabase_FULL_20260507.bak'
WITH COMPRESSION,
     CHECKSUM,
     STATS = 10;

Differential backups capture everything changed since the last full backup. Running these every 4-6 hours means your worst-case restore is: apply the last full, apply the last differential, then replay transaction logs. Much faster than replaying 20 hours of logs.

BACKUP DATABASE [YourDatabase]
TO DISK = N'\\backup-server\sql-backups\YourDatabase_DIFF_20260507_1200.bak'
WITH DIFFERENTIAL,
     COMPRESSION,
     CHECKSUM,
     STATS = 10;

Transaction log backups are how you get point-in-time recovery and keep your log file from growing unbounded. For most OLTP systems, every 15-30 minutes is appropriate. High-transaction systems may need every 5 minutes.

BACKUP LOG [YourDatabase]
TO DISK = N'\\backup-server\sql-backups\YourDatabase_LOG_20260507_120000.trn'
WITH COMPRESSION,
     CHECKSUM,
     STATS = 10;

Is your backup strategy actually protecting you?

Take the 2-minute database assessment to find out where your recovery gaps are before an incident does.

Take the Free Assessment

CHECKDB and Backup Verification — The Steps Everyone Skips

Most teams run DBCC CHECKDB irregularly or not at all. This is a significant mistake. CHECKDB detects corruption that can silently propagate into backups. If you have an undetected corruption and your backups are capturing corrupted data, your restore chain is worthless.

Run CHECKDB at minimum weekly on production. For mission-critical databases, nightly is worth the I/O cost.

DBCC CHECKDB ('YourDatabase')
WITH NO_INFOMSGS, ALL_ERRORMSGS;

Beyond CHECKDB, verify your backups. The CHECKSUM option in the backup commands above calculates and validates page checksums during backup. But that does not prove the file is restorable. Periodically restore to a non-production server and run DBCC CHECKDB on the restored database. Quarterly at minimum.

RESTORE VERIFYONLY
FROM DISK = N'\\backup-server\sql-backups\YourDatabase_FULL_20260507.bak'
WITH CHECKSUM;

RESTORE VERIFYONLY confirms the backup is readable and passes checksum validation. Actual test restores are not optional.

Point-in-Time Recovery Walkthrough

This is the scenario: your database has a problem at 3:47 PM. You need to restore to 3:46 PM — right before the incident.

Step 1: Restore the last full backup with NORECOVERY.

RESTORE DATABASE [YourDatabase_Recovery]
FROM DISK = N'\\backup-server\sql-backups\YourDatabase_FULL_20260507.bak'
WITH NORECOVERY,
     MOVE N'YourDatabase' TO N'D:\Data\YourDatabase_Recovery.mdf',
     MOVE N'YourDatabase_log' TO N'D:\Log\YourDatabase_Recovery_log.ldf',
     STATS = 10;

Step 2: Apply the last differential backup taken before 3:47 PM, still with NORECOVERY.

RESTORE DATABASE [YourDatabase_Recovery]
FROM DISK = N'\\backup-server\sql-backups\YourDatabase_DIFF_20260507_1200.bak'
WITH NORECOVERY;

Step 3: Apply transaction log backups in sequence, stopping at your target time.

RESTORE LOG [YourDatabase_Recovery]
FROM DISK = N'\\backup-server\sql-backups\YourDatabase_LOG_20260507_120000.trn'
WITH NORECOVERY;

RESTORE LOG [YourDatabase_Recovery]
FROM DISK = N'\\backup-server\sql-backups\YourDatabase_LOG_20260507_154500.trn'
WITH STOPAT = '2026-05-07T15:46:00',
     RECOVERY;

One critical step before all of this: take a tail-log backup. If the database is still accessible, capture the tail of the log before anything else touches it.

BACKUP LOG [YourDatabase]
TO DISK = N'\\backup-server\sql-backups\YourDatabase_LOG_TAIL_20260507.trn'
WITH NO_TRUNCATE, CHECKSUM;

Backup Compression and Performance

Enable compression. Always. It reduces backup file size by 60-70% on typical OLTP data and speeds up backup time because the bottleneck is usually I/O, not CPU.

Set it at the server level so you do not have to specify it per job:

EXEC sp_configure 'backup compression default', 1;
RECONFIGURE;

For very large databases, consider backup striping — writing to multiple files simultaneously. It can double or triple throughput on systems with fast enough disk.

BACKUP DATABASE [YourDatabase]
TO DISK = N'\\backup-server\sql-backups\YourDatabase_FULL_1.bak',
   DISK = N'\\backup-server\sql-backups\YourDatabase_FULL_2.bak',
   DISK = N'\\backup-server\sql-backups\YourDatabase_FULL_3.bak'
WITH COMPRESSION, CHECKSUM, STATS = 10;

Common Mistakes That Cost You in Production

Backing up to the same disk as your data files. When that disk fails, you lose both the database and the backup. Backups go to a different physical location, and ideally offsite or cloud storage.

Never testing restores. A backup you have never restored is a guess. Put quarterly restore tests on the calendar. Time them so you know how long an actual restore takes before you need to tell your CEO.

Missing tail-log backups in the restore runbook. If you do not have a procedure that starts with capturing the tail of the log, your DBA under pressure at 2 AM will skip it and truncate the transaction log.

Relying on job success emails instead of monitoring backup age. SQL Agent jobs can succeed with zero rows backed up. Query msdb.dbo.backupset and alert when the last backup age exceeds your RPO threshold.

SELECT database_name,
       MAX(backup_finish_date) AS last_backup,
       DATEDIFF(HOUR, MAX(backup_finish_date), GETDATE()) AS hours_since_backup
FROM msdb.dbo.backupset
WHERE type = 'D'
GROUP BY database_name
ORDER BY hours_since_backup DESC;

Retention Policies That Actually Work

The question is not how long to keep backups — it is how long recovery takes and what your compliance obligations are.

A practical starting point for most production OLTP databases:

Full backups: 30 days on-premises, 90 days in cold cloud storage
Differential backups: 7 days
Transaction log backups: 7 days

Adjust based on your data sensitivity, compliance requirements (HIPAA, SOC 2), and actual restore test times. If a restore takes 4 hours, your business needs to know their RTO is 4+ hours.

The first question in the 2am database test is always the same: when did you last successfully restore from backup? If you cannot answer that with a specific date, you have work to do.

Need help designing a bulletproof backup and recovery strategy? See how we work with production DBA teams

Backup Strategies and Recovery Planning PostgreSQL Disaster Recovery Always On AGs Configuration