Disaster recovery covers recovery from a technical failure. Business continuity is the broader discipline: how the business keeps serving customers through technical, operational, personnel, or vendor disruptions.
Owner: CEO · Technical owner: CTO · Policy version: 0.9 (draft)

Scope

This plan covers:
  • Technical disruptions — outages, ransomware, region failure (covered in depth by Disaster recovery).
  • Operational disruptions — key personnel unavailability, communication tool outages.
  • Vendor disruptions — LLM provider outage, email provider outage, payment provider outage.
  • Business disruptions — funding, legal action, compliance enforcement.
  • Pandemic / extended remote-only — already our steady state; no special plan needed.

Business impact analysis (BIA)

Services ranked by impact on customers:
| Service | Max tolerable downtime (MTD) | Impact of loss |
| --- | --- | --- |
| Document upload + denial detection | 4 hours | Customers can’t process new denials; backlog risk |
| Appeal letter generation | 8 hours | Customers can delay to next day; deadlines at risk for hot items |
| Appeal submission (portal/fax/mail) | 24 hours | Submission fallbacks exist (customer-side manual) |
| Tracking + dashboards | 24 hours | Inconvenient but not blocking |
| Webhooks + integrations | 24 hours | Most customers don’t depend on real-time webhooks for denial response |
| Admin panel | 72 hours | Daily ops continue via existing sessions |
Our aggregate Max Tolerable Period of Disruption (MTPD) is 24 hours. Beyond this we risk customer deadline-driven harm.

Target objectives

| Metric | Target |
| --- | --- |
| RPO (recovery point objective) | ≤ 5 minutes of data loss |
| RTO (recovery time objective) | ≤ 4 hours |
| MBCO (minimum business continuity objective) | Upload + detection restored within 2 hours |

Scenarios

Scenario 1 — technical outage (regional)

Full response is documented in Disaster recovery. Summary:
  • Provision new GCP infrastructure in secondary region via Terraform.
  • Restore Cloud SQL from cross-region backup.
  • Update DNS (Cloudflare) to point at new endpoints.
  • Target: RTO ≤ 4 hours.

Scenario 2 — ransomware / data integrity incident

  • Isolate affected infrastructure.
  • Provision clean infrastructure via Terraform.
  • Restore from the latest known-clean backup (before the incident window).
  • Rotate all affected secrets.
  • See Incident response for triage detail.

Scenario 3 — critical vendor outage

LLM provider outage
Impact: New denial detection and new appeal draft generation pause. Existing denials and already-drafted appeals remain accessible.
Mitigation:
  • Queue incoming documents; process them when the API returns.
  • Notify affected customers via in-app banner.
  • For prolonged outages (> 4 hours), degrade to a fallback: rule-based extraction for obvious denial patterns; flag documents for human-in-the-loop (HITL) review.
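The degraded-mode fallback above could be sketched as a simple rule-based classifier. The patterns and field names below are illustrative assumptions, not the production ruleset:

```python
import re

# Hypothetical denial-reason patterns for degraded-mode operation while the
# LLM provider is down. These regexes are illustrative, not the real ruleset.
DENIAL_PATTERNS = {
    "prior_authorization": re.compile(r"prior authorization (was )?not obtained", re.I),
    "medical_necessity": re.compile(r"not medically necessary", re.I),
    "out_of_network": re.compile(r"out[- ]of[- ]network", re.I),
}

def fallback_extract(document_text: str) -> dict:
    """Rule-based extraction used only while the LLM API is unavailable.

    Returns detected denial reasons; anything with no obvious pattern is
    flagged for human-in-the-loop (HITL) review rather than auto-processed.
    """
    reasons = [name for name, pat in DENIAL_PATTERNS.items() if pat.search(document_text)]
    return {
        "denial_reasons": reasons,
        "needs_hitl_review": len(reasons) == 0,  # no obvious pattern -> human review
        "source": "rule_based_fallback",
    }
```

Queued documents would be re-run through the normal LLM pipeline once the API recovers; the fallback output only covers the obvious cases in the meantime.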
Email provider outage
Impact: Notifications are delayed (magic links, outcome emails). Users with active sessions continue working.
Mitigation:
  • An SES alternative is already identified: Resend (secondary provider, wired in secrets.yaml as optional).
  • Failover can be enabled within 1 hour.
  • For critical-path emails (password reset): manual outreach via customer support.
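A minimal sketch of the failover logic, assuming hypothetical `send_via_ses` / `send_via_resend` wrappers (the actual provider wiring lives in secrets.yaml, per above):

```python
# Hypothetical email failover sketch: try the primary provider (SES) first,
# then fall back to the secondary (Resend). The send_via_* callables stand in
# for the real provider clients.

def send_with_failover(message: dict, send_via_ses, send_via_resend) -> str:
    """Attempt SES; on failure, fall back to Resend.

    Returns the name of the provider that accepted the message, or raises
    if both fail (critical-path emails then go to manual outreach).
    """
    try:
        send_via_ses(message)
        return "ses"
    except Exception:
        try:
            send_via_resend(message)
            return "resend"
        except Exception:
            # Both providers down: escalate to manual outreach per the BCP.
            raise RuntimeError("all email providers unavailable; use manual outreach")
```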
Payment provider outage
Impact: Billing actions pause (post Stripe go-live, Q3 2026).
Mitigation:
  • Billing operations retry automatically on Stripe recovery.
  • Customers see normal service; billing is behind the scenes.
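The automatic retry-on-recovery behaviour could look like the following sketch; the queue shape, backoff values, and `charge_fn` stand-in are assumptions, not the production billing code:

```python
import time

def retry_billing_queue(pending_ops, charge_fn, max_attempts=5, base_delay=1.0):
    """Drain queued billing operations once Stripe recovers.

    Each operation is retried with exponential backoff; operations that still
    fail after max_attempts stay queued for the next drain pass. charge_fn
    stands in for the real Stripe call.
    """
    still_pending = []
    for op in pending_ops:
        for attempt in range(max_attempts):
            try:
                charge_fn(op)
                break
            except Exception:
                if attempt == max_attempts - 1:
                    still_pending.append(op)  # keep for the next drain pass
                else:
                    time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    return still_pending
```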
Cloud provider outage (GCP)
Impact: Full platform outage.
Mitigation:
  • See Scenario 1 (DR to secondary region).
  • Expected to be rare given Google’s infrastructure history.
E-signature provider outage
Impact: E-signature for new appeals pauses. Already-signed appeals are unaffected.
Mitigation:
  • Fallback: generate signature-ready PDFs for manual signing, deliverable via existing channels (fax, mail).

Scenario 4 — key personnel unavailability

Denialbase’s small team size means key-person risk is real. Mitigations:
| Role | Primary | Backup | Cross-training |
| --- | --- | --- | --- |
| Security Officer | Named person | CTO | Quarterly cross-training on incident response |
| CTO | Named person | Senior engineer | Documented runbooks for every CTO-only production task |
| Privacy Officer | Named person | Outside counsel on retainer | |
| CEO | Named person | Board member on retainer | |
| Engineering on-call | Rotating | Named escalation path | Everyone on rotation has run at least one incident |
Succession plans are documented in a restricted Google Drive folder; a backup copy is held externally by the corporate attorney.

Scenario 5 — communication channel disruption

  • Slack outage: fall back to email + SMS bridge for incident comms.
  • Google Workspace outage: fall back to a pre-provisioned alternate email domain + personal phone numbers for leadership.
  • GitHub outage: fall back to local repos + mirrored backups (hourly GitHub → GCS clone).
  • Cloudflare outage: direct DNS via GCP Cloud DNS as backup (documented).
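One way the hourly GitHub → GCS mirror could be wired up is sketched below; the repository list, bucket name, and local paths are illustrative assumptions, not the real configuration:

```python
import subprocess

# Assumed names for illustration only; not the real repo list or bucket.
REPOS = ["git@github.com:denialbase/app.git"]
MIRROR_BUCKET = "gs://denialbase-git-mirror"

def mirror_commands(repo_url: str, bucket: str) -> list:
    """Build the commands for one mirror pass: a bare mirror clone of the
    repo, then a sync of the bare repo into GCS. Returned as argv lists so a
    scheduler (e.g. an hourly cron or Cloud Scheduler job) can run them."""
    name = repo_url.rsplit("/", 1)[-1]  # e.g. "app.git"
    local = f"/var/mirrors/{name}"
    return [
        ["git", "clone", "--mirror", repo_url, local],
        ["gsutil", "-m", "rsync", "-r", local, f"{bucket}/{name}"],
    ]

def run_mirror_pass(dry_run: bool = True) -> list:
    """Mirror every repo; with dry_run=True, only return the commands."""
    cmds = [c for repo in REPOS for c in mirror_commands(repo, MIRROR_BUCKET)]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds
```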

Scenario 6 — major compliance enforcement action

In the event of a HIPAA audit / breach finding / investigation:
  • Legal counsel engaged within 24 hours.
  • All required documentation produced within regulator-specified timelines.
  • Customer communication via regulator-approved templates.
  • Operations continue; no customer-impacting changes without documented justification.

Exercising the plan

As of April 2026, the business continuity plan has not yet been formally exercised. First tabletop targeted for Q2 2026, first partial drill Q3 2026.
| Exercise | Cadence | Scope | Status |
| --- | --- | --- | --- |
| Tabletop — regional outage | Annual | All leadership | Q2 2026 |
| Tabletop — vendor outage | Semi-annual | CTO + senior eng | Q3 2026 |
| Partial drill — DR failover to secondary region | Annual | CTO + eng | Q4 2026 |
| Full drill — BCP activation | Every 2 years | All | 2027 |
| Post-incident review with BCP lens | After any Sev-1 | Actual responders | Ongoing |

Roles during a BCP activation

| Role | Responsibilities |
| --- | --- |
| Incident Commander | Declares activation, coordinates response, owns customer comms |
| Technical Lead | Executes DR / failover procedures |
| Customer Lead | Manages customer-facing communications, status page, email |
| Legal / Compliance Lead | Regulatory notifications, HIPAA breach assessment |
| Scribe | Documents decisions and timeline for post-mortem |

Activation criteria

BCP is activated when any of the following is true for more than 30 minutes:
  • Platform is unreachable for all customers.
  • PHI integrity is known or suspected to be compromised.
  • Key personnel are unavailable and no primary/backup can cover.
  • A regulator has initiated an enforcement action requiring dedicated response.
Activation can be declared by the CEO, CTO, or Security Officer. Any of those three can stand down the activation once services are restored.

Communication plan

Internal

  • Incident channel in Slack (with email bridge for Slack outages).
  • Status board auto-updated from monitoring + manual additions.
  • Hourly updates to leadership during active response.

External

  • Status page (status.denialbase.com) — updated within 15 min of activation.
  • Customer email — sent within 1 hour for customer-affecting incidents.
  • In-app banner — enabled by CTO or Security Officer.
  • Social media — only for widespread public incidents; managed by marketing + CEO.

Recovery → normal operations

When the acute phase ends:
  1. Services restored to steady state.
  2. Status page updated.
  3. Post-incident review within 5 business days.
  4. Corrective actions tracked in Risk register.
  5. BCP itself updated if gaps were found.

Annual review

Reviewed by CEO + CTO + Security Officer at the annual management review, including:
  • BIA refresh (services list, RTOs, RPOs).
  • Vendor risk re-assessment.
  • Succession plans.
  • Exercise outcomes.