Noxus implements a multi-layered backup and recovery strategy to ensure that your AI infrastructure is resilient to data loss, service failures, and regional outages.

Backup Strategy

Our backup philosophy is built on the principle of continuous availability. We categorize data into three tiers, each with its own backup method and scope.

Persistence

PostgreSQL
  • Method: Daily snapshots + Point-in-Time Recovery (PITR).
  • Scope: User data, flow definitions, and knowledge base metadata.

Object Storage

S3 / GCS / MinIO
  • Method: Versioning + Cross-region replication.
  • Scope: Knowledge base documents, run-level logs, and artifacts.

Configuration

Secrets & Env
  • Method: Infrastructure-as-Code (IaC) versioning + Secret Manager backups.
  • Scope: auth_config, API keys, and deployment parameters.

Data Layer Resilience

PostgreSQL (Persistence Layer)

PostgreSQL is the source of truth for the platform. For production environments, we recommend:
  • Automated Snapshots: Daily full snapshots with a minimum 30-day retention.
  • PITR: Continuous transaction log (WAL) archiving to allow recovery to any specific second within the retention window.
  • Multi-AZ Failover: Deploy with a synchronous standby in a separate availability zone so that a failover loses no committed transactions and interrupts service only briefly.
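
To make the snapshot-plus-WAL model concrete, here is a minimal Python sketch of how a point-in-time restore chooses its starting point: the newest snapshot taken at or before the target, with WAL replayed from there. The function name `pick_base_snapshot` and the sample timestamps are hypothetical, and the sketch assumes the archived WAL is continuous from the chosen snapshot up to `wal_archived_until`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence


def pick_base_snapshot(
    snapshots: Sequence[datetime],
    target: datetime,
    wal_archived_until: datetime,
) -> Optional[datetime]:
    """Return the newest snapshot taken at or before `target`.

    Returns None when the target is not recoverable: either WAL has not been
    archived up to the target yet, or no retained snapshot precedes it.
    """
    if target > wal_archived_until:
        return None
    candidates = [taken_at for taken_at in snapshots if taken_at <= target]
    return max(candidates) if candidates else None


# Hypothetical data: daily snapshots at 00:00 UTC, retained for 30 days.
now = datetime(2024, 6, 30, 12, 0, tzinfo=timezone.utc)
midnight = datetime(2024, 6, 30, tzinfo=timezone.utc)
snapshots = [midnight - timedelta(days=d) for d in range(30)]
wal_archived_until = now - timedelta(minutes=2)

target = datetime(2024, 6, 28, 15, 30, 12, tzinfo=timezone.utc)
base = pick_base_snapshot(snapshots, target, wal_archived_until)
print("restore from snapshot:", base, "then replay WAL to:", target)
```

In this example the restore would start from the 28 June snapshot and replay WAL up to 15:30:12. A target older than the oldest retained snapshot, or newer than the last archived WAL segment, is reported as not recoverable.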

Object Storage (Liquid Data)

As part of our Liquid Data architecture, object storage handles the bulk of your AI assets:
  • Versioning: Enable bucket versioning to protect against accidental deletions or overwrites.
  • Lifecycle Policies: Automatically transition older logs and artifacts to lower-cost storage classes (e.g., S3 Glacier or GCS Coldline) to reduce storage costs.
  • Replication: For mission-critical deployments, enable cross-region replication to ensure data availability even during a total cloud region failure.
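
As an illustration of the lifecycle policy described above, the following Python sketch builds a lifecycle configuration in the shape that the S3 API expects. The prefixes `logs/` and `artifacts/`, the rule IDs, and the 90-day threshold are assumptions chosen for the example, not Noxus defaults; GCS and MinIO use their own configuration formats.

```python
from typing import Any, Dict


def build_lifecycle_config(
    log_prefix: str = "logs/",            # hypothetical prefix for run-level logs
    artifact_prefix: str = "artifacts/",  # hypothetical prefix for artifacts
    cold_after_days: int = 90,            # assumed age threshold
) -> Dict[str, Any]:
    """Build an S3-style lifecycle configuration that moves old objects to Glacier."""

    def rule(rule_id: str, prefix: str) -> Dict[str, Any]:
        return {
            "ID": rule_id,
            "Status": "Enabled",
            "Filter": {"Prefix": prefix},
            "Transitions": [
                {"Days": cold_after_days, "StorageClass": "GLACIER"},
            ],
        }

    return {
        "Rules": [
            rule("cold-logs", log_prefix),
            rule("cold-artifacts", artifact_prefix),
        ]
    }


config = build_lifecycle_config()
# With boto3 installed and credentials configured, this dictionary could be
# applied with:
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="<your-bucket>", LifecycleConfiguration=config)
print(config)
```

Keeping the policy as plain data makes it easy to review in version control alongside the rest of your infrastructure definitions.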

Recovery Objectives (RTO/RPO)

Noxus is designed to help you meet strict enterprise recovery targets:
Objective | Target | Description
RPO (Recovery Point) | < 5 minutes | The maximum amount of data loss you can tolerate (driven by PITR).
RTO (Recovery Time) | < 1 hour | The maximum time allowed to restore the platform to full operation.
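
The two objectives can be checked against a real incident or a restore drill with simple arithmetic. The sketch below is illustrative only: the `Incident` class and its field names are hypothetical, and the targets are the ones stated in the table above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RPO_TARGET = timedelta(minutes=5)  # maximum tolerated data loss
RTO_TARGET = timedelta(hours=1)    # maximum tolerated time to restore


@dataclass
class Incident:
    last_recoverable_at: datetime  # newest point that could be restored
    failed_at: datetime            # when the outage began
    restored_at: datetime          # when the platform was fully operational


def evaluate(incident: Incident) -> dict:
    """Compare the achieved data loss and downtime with the targets."""
    data_loss = incident.failed_at - incident.last_recoverable_at
    downtime = incident.restored_at - incident.failed_at
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss < RPO_TARGET,
        "rto_met": downtime < RTO_TARGET,
    }


drill = Incident(
    last_recoverable_at=datetime(2024, 6, 30, 11, 57),
    failed_at=datetime(2024, 6, 30, 12, 0),
    restored_at=datetime(2024, 6, 30, 12, 42),
)
print(evaluate(drill))
```

In this drill, three minutes of data would be lost and the restore takes 42 minutes, so both targets are met.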

Disaster Recovery (DR) Patterns

Depending on your deployment model, you can implement several DR patterns:
  • Secondary-region standby: Maintain a secondary deployment in a different region. Data is continuously replicated, and the secondary stack can be scaled up rapidly during a failover.
  • Multi-region active-active: Run Noxus services in multiple regions simultaneously. Traffic is routed to the nearest healthy region, providing the highest level of availability and lowest latency for global users.
  • Isolated (offline) environments: Backups are stored on encrypted, physically separate media and recovered using verified offline procedures.
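
The multi-region routing rule ("nearest healthy region") can be sketched in a few lines of Python. This is a simplified model, not how Noxus or any particular load balancer implements it: the `Region` class, the region names, and the use of measured latency as a stand-in for "nearest" are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Region:
    name: str
    healthy: bool
    latency_ms: float  # latency from the client, used here as a proxy for distance


def pick_region(regions: Sequence[Region]) -> Optional[Region]:
    """Return the healthy region with the lowest latency, or None if all are down."""
    healthy = [region for region in regions if region.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda region: region.latency_ms)


regions = [
    Region("us-east", healthy=False, latency_ms=10.0),  # closest, but failing
    Region("eu-west", healthy=True, latency_ms=40.0),
    Region("ap-south", healthy=True, latency_ms=120.0),
]
chosen = pick_region(regions)
print(chosen.name if chosen else "no healthy region")
```

Here the closest region is unhealthy, so traffic goes to `eu-west`. The same selection applied to a two-region list models a standby failover: once the primary is marked unhealthy, the secondary is chosen.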

Operational Readiness

The Restore Drill: A backup is only as good as its last successful restore. We recommend performing quarterly restoration drills in a non-production environment to validate your runbooks.

1. Automate Everything

Use the Terraform and Helm assets in noxus-infra to automate the provisioning of backup resources.

2. Monitor Backup Health

Set up alerts for failed snapshots or replication lag in your monitoring dashboard.
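
A backup health check of this kind reduces to two threshold comparisons. The sketch below assumes you can obtain the time of the last successful snapshot and the current replication lag from your monitoring system; the function name, the alert labels, and the thresholds (26 hours for a daily snapshot schedule, 5 minutes to match the RPO target) are illustrative assumptions.

```python
from datetime import datetime, timedelta
from typing import List


def backup_alerts(
    last_snapshot_ok_at: datetime,
    replication_lag: timedelta,
    now: datetime,
    max_snapshot_age: timedelta = timedelta(hours=26),  # daily schedule plus slack
    max_lag: timedelta = timedelta(minutes=5),           # aligned with the RPO target
) -> List[str]:
    """Return the names of the alerts that should fire, if any."""
    alerts = []
    if now - last_snapshot_ok_at > max_snapshot_age:
        alerts.append("snapshot_stale")
    if replication_lag > max_lag:
        alerts.append("replication_lag_high")
    return alerts


now = datetime(2024, 6, 30, 12, 0)
print(backup_alerts(
    last_snapshot_ok_at=datetime(2024, 6, 28, 0, 0),  # last success was 60 hours ago
    replication_lag=timedelta(seconds=30),
    now=now,
))
```

In this example only the stale-snapshot alert fires, because the last successful snapshot is 60 hours old while replication lag is well under the threshold.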

3. Document the Runbook

Maintain a clear, step-by-step recovery guide that includes DNS switching and secret restoration.