💾 Backup & Recovery (Immutable + 3-2-1) — Module 6
1) Theory — Backups that actually recover
“We had backups” doesn’t mean “we can recover.” Ransomware operators target snapshots, NAS shares, backup servers and credentials first. Reliable recovery requires immutable copies, separate credentials, tested restore paths, and a design that matches business RPO/RTO needs. Aim for boring, scripted, and repeatable.
- 3-2-1 (or 3-2-1-1-0): 3 copies, 2 media, 1 offsite, 1 immutable/air-gapped, 0 restore errors from testing.
- Immutability: object-lock/WORM or true air-gap; block deletions and edits for the full retention window.
- Credential separation: backup admin identity isolated from AD/SSO; MFA and break-glass.
- Scope: servers, databases, endpoints, and SaaS (M365/Google/Slack/CRM) — vendor retention ≠ backup.
- Tested restores: tabletop + actual restore drills (files, DBs, whole VMs) with time and integrity checks.
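The 3-2-1-1-0 rule above can be checked mechanically against a backup inventory. The following is a minimal sketch, not a vendor tool: the `BackupCopy` fields and the `check_32110` function are hypothetical names invented for illustration, and it assumes the inventory lists every copy of the data, production included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of the data (hypothetical inventory record)."""
    name: str
    media: str           # e.g. "disk", "object", "tape"
    offsite: bool
    immutable: bool      # object-lock/WORM or air-gapped
    restore_errors: int  # errors found in the most recent restore test


def check_32110(copies):
    """Return rule name -> bool for each part of the 3-2-1-1-0 rule."""
    return {
        "3_copies": len(copies) >= 3,
        "2_media": len({c.media for c in copies}) >= 2,
        "1_offsite": any(c.offsite for c in copies),
        "1_immutable": any(c.immutable for c in copies),
        "0_errors": all(c.restore_errors == 0 for c in copies),
    }


# Example inventory (illustrative values only).
copies = [
    BackupCopy("production", "disk", offsite=False, immutable=False, restore_errors=0),
    BackupCopy("local-repo", "disk", offsite=False, immutable=False, restore_errors=0),
    BackupCopy("cloud-locked", "object", offsite=True, immutable=True, restore_errors=0),
]
for rule, ok in check_32110(copies).items():
    print(f"{rule}: {'pass' if ok else 'FAIL'}")
```

A check like this is only as good as the inventory behind it; the `restore_errors` field should come from real restore drills, not from backup job status.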
1.1 Backup architecture essentials
- Media diversity: primary + secondary (disk) + offsite (object storage/tape).
- Immutable tier: S3/Object-Lock, Azure immutability, on-prem WORM, or tape vaulted off-network.
- Network isolation: no write path from production to the immutable tier after a backup is committed; use dedicated networks where possible.
- Encryption: AES-256 at rest; TLS in transit; protect keys in HSM/KMS with dual control.
- Backups of identity: back up AD/Entra/IdP configs and critical SaaS configs; document restore order.
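To make the immutable-tier behaviour concrete, here is a toy in-memory model of write-once semantics. It is a sketch for intuition only: `WormStore` and `RetentionLockError` are hypothetical names, not a real storage API, and real object stores usually implement this with versioning plus retention metadata rather than by refusing overwrites outright.

```python
from datetime import datetime, timedelta, timezone


class RetentionLockError(Exception):
    """Raised when a write or delete hits an object still under retention."""


class WormStore:
    """Toy write-once store: objects cannot change until retention expires."""

    def __init__(self, retention_days):
        self.retention = timedelta(days=retention_days)
        self._objects = {}  # key -> (data, retain_until)

    def _check_unlocked(self, key, now):
        if key in self._objects:
            retain_until = self._objects[key][1]
            if now < retain_until:
                raise RetentionLockError(f"{key} is locked until {retain_until}")

    def put(self, key, data, now):
        # Overwriting a locked object is refused, just like deleting it.
        self._check_unlocked(key, now)
        self._objects[key] = (data, now + self.retention)

    def delete(self, key, now):
        if key not in self._objects:
            raise KeyError(key)
        self._check_unlocked(key, now)
        del self._objects[key]

    def get(self, key):
        return self._objects[key][0]


store = WormStore(retention_days=14)
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
store.put("full-001", b"backup bytes", now=t0)
try:
    store.delete("full-001", now=t0 + timedelta(days=1))
except RetentionLockError as exc:
    print("delete refused:", exc)
```

The point of the model is that the refusal does not depend on who is asking: even a caller with stolen backup-admin credentials gets the same error until the retention window has passed.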
1.2 RPO, RTO, tiers & schedules
- RPO (recovery point objective): the maximum data loss you can tolerate, expressed as time (e.g., 4h). Drives backup frequency (hourly logs, nightly fulls).
- RTO (recovery time objective): how quickly you must restore (e.g., 8h). Drives storage/media and warm-standby choices.
- Tiering: gold (mission-critical warm), silver (standard), bronze (cold/long-term).
- Schedules: GFS (grandfather-father-son), log shipping, snapshots + replication + immutable copy.
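A GFS schedule ultimately comes down to a pruning rule: which restore points survive. The sketch below shows one simple interpretation, assuming "sons" are the most recent dailies, "fathers" are the newest backup in each recent ISO week, and "grandfathers" are the newest backup in each recent calendar month. The function names and the default counts (7/4/12) are illustrative assumptions, not a standard.

```python
from datetime import date, timedelta


def _newest_per_period(dates_newest_first, period_key, limit):
    """Pick the newest date in each of the `limit` most recent periods."""
    picked = {}
    for d in dates_newest_first:
        key = period_key(d)
        if key not in picked:
            if len(picked) == limit:
                break  # all remaining dates belong to older periods
            picked[key] = d
    return set(picked.values())


def gfs_keep(backup_dates, daily=7, weekly=4, monthly=12):
    """Return the set of backup dates to retain under a simple GFS policy."""
    dates = sorted(set(backup_dates), reverse=True)  # newest first
    sons = set(dates[:daily])
    fathers = _newest_per_period(dates, lambda d: tuple(d.isocalendar())[:2], weekly)
    grandfathers = _newest_per_period(dates, lambda d: (d.year, d.month), monthly)
    return sons | fathers | grandfathers


# 400 consecutive nightly backups ending on 2024-06-30.
last = date(2024, 6, 30)
nightly = [last - timedelta(days=i) for i in range(400)]
keep = gfs_keep(nightly)
print(f"keeping {len(keep)} of {len(nightly)} restore points")
```

Whatever policy is chosen, the oldest retained point should be older than the longest plausible attacker dwell time, otherwise every surviving copy may already be infected.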
1.3 Safeguards against ransomware
- MFA + least privilege for backup consoles; separate admin identities and networks.
- Write-once policies on immutable buckets/volumes; retention locks to prevent tampering.
- Credential vaulting and rotation; no domain-joined backup servers where avoidable.
- Monitor indicators: mass snapshot deletions, backup job failures, policy changes.
- Out-of-band copy: physical/tape export or cloud-to-cloud with different creds.
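The "mass snapshot deletions" indicator above lends itself to a sliding-window check over audit events. This is a minimal sketch under stated assumptions: events arrive as `(timestamp, action)` tuples sorted by time, the action string `"snapshot_delete"` is a made-up label, and the threshold of 10 deletions in 5 minutes is an arbitrary starting point to tune per environment.

```python
from collections import deque
from datetime import datetime, timedelta


def detect_mass_deletions(events, threshold=10, window=timedelta(minutes=5)):
    """Return (timestamp, count) alerts when deletions within `window` reach `threshold`.

    `events` is an iterable of (timestamp, action) sorted by timestamp.
    """
    recent = deque()  # timestamps of deletions inside the current window
    alerts = []
    for ts, action in events:
        if action != "snapshot_delete":
            continue
        recent.append(ts)
        # Drop deletions that have fallen out of the window.
        while ts - recent[0] > window:
            recent.popleft()
        if len(recent) >= threshold:
            alerts.append((ts, len(recent)))
    return alerts


# Example: a burst of 12 deletions, 10 seconds apart (illustrative data).
start = datetime(2024, 1, 6, 2, 0, 0)
burst = [(start + timedelta(seconds=10 * i), "snapshot_delete") for i in range(12)]
for ts, count in detect_mass_deletions(burst):
    print(f"ALERT {ts.isoformat()}: {count} snapshot deletions in 5 minutes")
```

In practice the same pattern applies to backup job failures and policy changes; only the action label and thresholds differ.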
1.4 What to back up (don’t forget SaaS)
- Infra: VM images, containers, databases (full + logs), file servers, hypervisor configs.
- Endpoints: key user folders and browser-stored creds (where policy allows).
- SaaS: M365/Google (mail, Drive/SharePoint), Teams/Slack, CRM, code repos, wiki — use API-based backups.
- Configs & keys: firewall/EDR policies, IaC/git, KMS/HSM backups, license keys, runbooks.
1.5 Restore testing & validation
- Drill regularly: quarterly file/DB restore; semi-annual full system; note timing vs RTO.
- Integrity: checksum/hash compare; malware scan backups and restores.
- Document order: identity/DNS/DHCP first; then DBs, apps; finally user services.
- Automation: scripted, repeatable restores; capture exact commands and versions.
- Metrics: track success rate, mean restore time, and error counts per drill; the target is “0 tested errors.”
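The integrity and timing checks above can be scripted with the standard library alone. The sketch below assumes a manifest of expected SHA-256 hashes was recorded at backup time; `verify_restore` and `timed_drill` are hypothetical helper names, and `restore_fn` stands in for whatever command or API call actually performs the restore.

```python
import hashlib
import time
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large restores do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(manifest, restore_root):
    """Return relative paths that are missing or whose hash differs.

    `manifest` maps relative path -> expected SHA-256 hex digest.
    """
    failures = []
    for rel_path, expected in manifest.items():
        target = Path(restore_root) / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel_path)
    return failures


def timed_drill(restore_fn, rto_seconds):
    """Run a restore callable and report elapsed time against the RTO."""
    start = time.monotonic()
    restore_fn()
    elapsed = time.monotonic() - start
    return {"elapsed_s": elapsed, "within_rto": elapsed <= rto_seconds}
```

A drill would typically call `timed_drill` around the real restore, then `verify_restore` on the result, and log both outcomes so that success rate and mean restore time can be tracked over time.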
1.6 Incident playbook — recovering from ransomware
- Contain the outbreak; verify clean state before restore (EDR scans, network isolation).
- Choose a clean restore point (pre-infection) on an immutable/offline copy; confirm it against known indicators of compromise.
- Restore crown jewels first to meet RTO (identity, ERP DBs, core apps); validate functionally.
- Rotate credentials, revoke tokens; rebuild from templates; re-join to clean domains.
- Post-restore hardening: patch, enforce MFA, segment, and verify monitoring before user cutover.
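Choosing the clean restore point in the playbook above can be expressed as a small selection rule. This is a sketch under stated assumptions: the `RestorePoint` fields are hypothetical, the earliest-compromise timestamp comes from the incident responders' investigation, and the optional safety margin is an illustrative knob for cases where the infection time is uncertain.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RestorePoint:
    """One candidate restore point (hypothetical catalog record)."""
    taken_at: datetime
    immutable: bool   # stored on the immutable/offline tier
    scan_clean: bool  # malware scan of this copy found nothing


def pick_clean_point(points, earliest_compromise, safety_margin=timedelta(0)):
    """Return the newest immutable, scanned-clean point taken before compromise.

    Returns None if no candidate qualifies.
    """
    cutoff = earliest_compromise - safety_margin
    candidates = [
        p for p in points
        if p.immutable and p.scan_clean and p.taken_at < cutoff
    ]
    return max(candidates, key=lambda p: p.taken_at, default=None)


# Illustrative catalog: nightly points, compromise first seen on 4 Jan at 09:00.
points = [
    RestorePoint(datetime(2024, 1, 2, 1, 0), immutable=True, scan_clean=True),
    RestorePoint(datetime(2024, 1, 3, 1, 0), immutable=True, scan_clean=True),
    RestorePoint(datetime(2024, 1, 4, 1, 0), immutable=False, scan_clean=True),
    RestorePoint(datetime(2024, 1, 5, 1, 0), immutable=True, scan_clean=False),
]
chosen = pick_clean_point(points, earliest_compromise=datetime(2024, 1, 4, 9, 0))
print("restore from:", chosen.taken_at if chosen else "no clean point available")
```

Picking the newest qualifying point minimizes data loss against the RPO, while the immutability and scan filters guard against restoring a copy the attacker could have altered.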
2) Real-world example
Manufacturing plant, weekend ransomware: The attackers wiped snapshots and tried to delete the backup chains, but immutable object storage (WORM, 14-day retention) blocked the deletions. IR isolated networks Saturday morning; by Sunday 14:00 the team had restored AD, ERP DBs, and shop-floor servers from the immutable tier. Monday's shift started on time. The difference wasn’t “having backups”; it was immutable copies plus practiced restores.
3) Assessment — 18 Professional Questions
Choose the best answer for each question. Answers and feedback appear after you submit.
