Disaster recovery addresses recovery from a technical failure. Business continuity is the broader discipline — how the business keeps serving customers through technical, operational, personnel, or vendor disruptions.
Scope
This plan covers:
- Technical disruptions — outages, ransomware, region failure (covered in depth by Disaster recovery).
- Operational disruptions — key personnel unavailability, communication tool outages.
- Vendor disruptions — LLM provider outage, email provider outage, payment provider outage.
- Business disruptions — funding, legal action, compliance enforcement.
- Pandemic / extended remote-only — already our steady state; no special plan needed.
Business impact analysis (BIA)
Services ranked by impact on customers:

| Service | Max tolerable downtime (MTD) | Impact of loss |
|---|---|---|
| Document upload + denial detection | 4 hours | Customers can’t process new denials; backlog risk |
| Appeal letter generation | 8 hours | Customers can delay to next day; deadlines at risk for hot items |
| Appeal submission (portal/fax/mail) | 24 hours | Submission fallbacks exist (customer-side manual) |
| Tracking + dashboards | 24 hours | Inconvenient but not blocking |
| Webhooks + integrations | 24 hours | Most customers don’t depend on real-time webhooks for denial response |
| Admin panel | 72 hours | Daily ops continue via existing sessions |
Target objectives
| Metric | Target |
|---|---|
| RPO (recovery point objective) | ≤ 5 minutes data loss |
| RTO (recovery time objective) | ≤ 4 hours |
| MBCO (minimum business continuity objective) | Upload + detection restored within 2 hours |
Scenarios
Scenario 1 — technical outage (regional)
Full response is documented in Disaster recovery. Summary:
- Provision new GCP infrastructure in secondary region via Terraform.
- Restore Cloud SQL from cross-region backup.
- Update DNS (Cloudflare) to point at new endpoints.
- Target: RTO ≤ 4 hours.
Scenario 2 — ransomware / data integrity incident
- Isolate affected infrastructure.
- Provision clean infrastructure via Terraform.
- Restore from the latest known-clean backup (before the incident window).
- Rotate all affected secrets.
- See Incident response for triage detail.
Scenario 3 — critical vendor outage
LLM (Anthropic) outage
Impact: New denial detection and new appeal draft generation pause. Existing denials and already-drafted appeals remain accessible.

Mitigation:
- Queue incoming documents; process when API returns.
- Notify affected customers via in-app banner.
- For prolonged outages (>4 hours), degrade to fallback: rule-based extraction for obvious denial patterns; flag documents for HITL review.
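The degraded fallback above can be sketched as a cheap regex screen that only flags likely denials and routes everything to human review. This is a minimal illustration, not the production extractor; the pattern list and `FallbackResult` shape are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Illustrative denial phrases only; a real pattern list would be
# maintained in config and reviewed by the clinical/ops team.
DENIAL_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in [
        r"claim (has been|was) denied",
        r"denial of (coverage|benefits)",
        r"not medically necessary",
        r"prior authorization (was )?not obtained",
    ]
]

@dataclass
class FallbackResult:
    looks_like_denial: bool
    matched_phrases: list = field(default_factory=list)
    # Everything from the fallback path goes to human-in-the-loop review.
    needs_hitl_review: bool = True

def rule_based_denial_check(text: str) -> FallbackResult:
    """Cheap regex screen used only while the LLM API is unavailable."""
    matches = [p.pattern for p in DENIAL_PATTERNS if p.search(text)]
    return FallbackResult(looks_like_denial=bool(matches), matched_phrases=matches)
```

Because precision is much lower than the LLM path, every fallback result is flagged for review rather than acted on automatically.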
Email (Amazon SES) outage
Impact: Notifications delayed (magic links, outcome emails). Users with active sessions continue working.

Mitigation:
- SES alternative already identified: Resend (secondary provider, wired in secrets.yaml as optional).
- Failover can be enabled within 1 hour.
- For critical-path emails (password reset): manual outreach via customer support.
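The SES → Resend failover described above amounts to trying providers in priority order. A minimal sketch, assuming thin wrapper functions around each provider's SDK (the wrapper names and message shape here are illustrative, not our actual module API):

```python
def send_email(message: dict, providers: list) -> str:
    """Try each (name, send_fn) provider in order.

    Returns the name of the provider that accepted the message;
    raises only if every provider fails.
    """
    last_error = None
    for name, send_fn in providers:
        try:
            send_fn(message)  # e.g. a wrapper over the SES or Resend SDK
            return name
        except Exception as exc:  # provider outage, rate limit, etc.
            last_error = exc
    raise RuntimeError("all email providers failed") from last_error
```

In practice the provider order would come from configuration (Resend is already wired in secrets.yaml as optional), so enabling failover is a config flip rather than a deploy.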
Stripe (planned) outage
Impact: Billing actions pause (post Stripe go-live, Q3 2026).

Mitigation:
- Billing operations retry automatically on Stripe recovery.
- Customers see normal service; billing is behind-the-scenes.
GCP region-wide outage
Impact: Full platform outage.

Mitigation:
- See Scenario 1 (DR to secondary region).
- Expected to be rare given Google’s infrastructure history.
DocuSeal outage
Impact: E-signature for new appeals paused. Already-signed appeals are unaffected.

Mitigation:
- Fallback: generate signature-ready PDFs for manual signing, deliverable via existing channels (fax, mail).
Scenario 4 — key personnel unavailability
Denialbase’s small team size means key-person risk is real. Mitigations:

| Role | Primary | Backup | Cross-training |
|---|---|---|---|
| Security Officer | Named person | CTO | Quarterly cross-training on incident response |
| CTO | Named person | Senior engineer | Documented runbooks for every production task currently done only by the CTO |
| Privacy Officer | Named person | Outside counsel on retainer | — |
| CEO | Named person | Board member on retainer | — |
| Engineering on-call | Rotating | Named escalation path | Everyone on rotation has run at least one incident |
Scenario 5 — communication channel disruption
- Slack outage: fall back to email + SMS bridge for incident comms.
- Google Workspace outage: fall back to a pre-provisioned alternate email domain + personal phone numbers for leadership.
- GitHub outage: fall back to local repos + mirrored backups (hourly GitHub → GCS clone).
- Cloudflare outage: direct DNS via GCP Cloud DNS as backup (documented).
Scenario 6 — major compliance enforcement action
In the event of a HIPAA audit / breach finding / investigation:
- Legal counsel engaged within 24 hours.
- All required documentation produced within regulator-specified timelines.
- Customer communication via regulator-approved templates.
- Operations continue; no customer-impacting changes without documented justification.
Exercising the plan
| Exercise | Cadence | Scope | Status |
|---|---|---|---|
| Tabletop — regional outage | Annual | All leadership | Q2 2026 |
| Tabletop — vendor outage | Semi-annual | CTO + senior eng | Q3 2026 |
| Partial drill — DR failover to secondary region | Annual | CTO + eng | Q4 2026 |
| Full drill — BCP activation | Every 2 years | All | 2027 |
| Post-incident review with BCP lens | After any Sev-1 | Actual responders | Ongoing |
Roles during a BCP activation
| Role | Responsibilities |
|---|---|
| Incident Commander | Declares activation, coordinates response, owns customer comms |
| Technical Lead | Executes DR / failover procedures |
| Customer Lead | Manages customer-facing communications, status page, email |
| Legal / Compliance Lead | Regulatory notifications, HIPAA breach assessment |
| Scribe | Documents decisions and timeline for post-mortem |
Activation criteria
BCP is activated when any of the following is true for more than 30 minutes:
- Platform is unreachable for all customers.
- PHI integrity is known or suspected to be compromised.
- Key personnel are unavailable and no primary/backup can cover.
- A regulator has initiated an enforcement action requiring dedicated response.
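The "true for more than 30 minutes" rule above is easy to get wrong under pressure, so it helps to see it spelled out. A minimal sketch, assuming monitoring records when each condition first became true (condition names and the data shape are illustrative):

```python
from datetime import datetime, timedelta

ACTIVATION_WINDOW = timedelta(minutes=30)

def should_activate_bcp(condition_true_since: dict, now: datetime):
    """Return (activate, triggering_condition).

    `condition_true_since` maps condition name -> the datetime it first
    became true, or None if it is currently false. BCP activates once
    any single condition has held continuously for more than 30 minutes.
    """
    for name, since in condition_true_since.items():
        if since is not None and now - since > ACTIVATION_WINDOW:
            return True, name
    return False, None
```

A condition that clears and recurs resets its clock, which matches the intent: brief flapping does not trigger a full BCP activation.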
Communication plan
Internal
- Incident channel in Slack (with email bridge for Slack outages).
- Status board auto-updated from monitoring + manual additions.
- Hourly updates to leadership during active response.
External
- Status page (status.denialbase.com) — updated within 15 min of activation.
- Customer email — sent within 1 hour for customer-affecting incidents.
- In-app banner — enabled by CTO or Security Officer.
- Social media — only for widespread public incidents; managed by marketing + CEO.
Recovery → normal operations
When the acute phase ends:
- Services restored to steady state.
- Status page updated.
- Post-incident review within 5 business days.
- Corrective actions tracked in Risk register.
- BCP itself updated if gaps were found.
Annual review
Reviewed by CEO + CTO + Security Officer at the annual management review, including:
- BIA refresh (services list, RTOs, RPOs).
- Vendor risk re-assessment.
- Succession plans.
- Exercise outcomes.
Related
- Disaster recovery — technical recovery procedures
- Incident response — acute response to specific incident types
- Vendor management — vendor-continuity details
- Risk register — BCP-related risks are tracked here