Backup Strategy
Our backup philosophy is built on the principle of continuous availability. We categorize data into three distinct tiers with specific recovery objectives.
Persistence
PostgreSQL
- Method: Daily snapshots + Point-in-Time Recovery (PITR).
- Scope: User data, flow definitions, and knowledge base metadata.
Object Storage
S3 / GCS / MinIO
- Method: Versioning + Cross-region replication.
- Scope: Knowledge base documents, run-level logs, and artifacts.
Configuration
Secrets & Env
- Method: Infrastructure-as-Code (IaC) versioning + Secret Manager backups.
- Scope: auth_config, API keys, and deployment parameters.
Data Layer Resilience
PostgreSQL (Persistence Layer)
PostgreSQL is the source of truth for the platform. For production environments, we recommend:
- Automated Snapshots: Daily full snapshots with a minimum 30-day retention.
- PITR: Continuous transaction log (WAL) archiving to allow recovery to any specific second within the retention window.
- Multi-AZ Failover: Deploy with a synchronous standby in a separate availability zone so that failover loses no committed transactions and causes minimal downtime.
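To make the interaction between daily snapshots and WAL archiving concrete, here is a minimal Python sketch of how a PITR restore could choose its starting point: the newest snapshot taken at or before the target time, after which archived WAL is replayed up to the target. The function name `pick_base_snapshot`, the 02:00 UTC schedule, and the dates are assumptions for illustration, not part of the Noxus tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 30 days mirrors the minimum retention recommended above.
RETENTION = timedelta(days=30)


def pick_base_snapshot(snapshots, target, now):
    """Return the newest snapshot taken at or before `target`.

    A PITR restore starts from that snapshot and replays archived WAL
    up to `target`. Raises ValueError if the target lies outside the
    retention window or no snapshot precedes it.
    """
    if target > now:
        raise ValueError("target is in the future")
    if now - target > RETENTION:
        raise ValueError("target is older than the retention window")
    candidates = [s for s in snapshots if s <= target]
    if not candidates:
        raise ValueError("no snapshot precedes the target")
    return max(candidates)


now = datetime(2024, 6, 15, 12, 0, tzinfo=timezone.utc)
# Hypothetical daily snapshots taken at 02:00 UTC over the last 30 days.
latest = datetime(2024, 6, 15, 2, 0, tzinfo=timezone.utc)
daily = [latest - timedelta(days=d) for d in range(30)]

target = datetime(2024, 6, 10, 9, 30, 15, tzinfo=timezone.utc)
base = pick_base_snapshot(daily, target, now)
print("restore snapshot:", base.isoformat())
print("WAL to replay:", target - base)
```

The longer the gap between the base snapshot and the target, the more WAL has to be replayed, which is one reason snapshot frequency affects restore time as well as retention cost.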
Object Storage (Liquid Data)
As part of our Liquid Data architecture, object storage handles the bulk of your AI assets:
- Versioning: Enable bucket versioning to protect against accidental deletions or overwrites.
- Lifecycle Policies: Automatically transition older logs and artifacts to lower-cost storage classes (e.g., Glacier or Coldline) to optimize budgets.
- Replication: For mission-critical deployments, enable cross-region replication to ensure data availability even during a total cloud region failure.
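As an illustration of the lifecycle policy described above, the following sketch builds an S3-style lifecycle configuration as a plain Python dictionary. The rule IDs, the `logs/` and `artifacts/` prefixes, and the 30- and 90-day thresholds are hypothetical; adjust them to your own bucket layout and budget.

```python
import json


def lifecycle_rule(rule_id, prefix, days, storage_class):
    """Build one S3-style lifecycle rule that transitions objects under
    `prefix` to `storage_class` once they are `days` old."""
    if days < 0:
        raise ValueError("days must be non-negative")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": storage_class}],
    }


# Hypothetical prefixes and thresholds, not Noxus defaults.
lifecycle_config = {
    "Rules": [
        lifecycle_rule("archive-run-logs", "logs/", 30, "GLACIER"),
        lifecycle_rule("archive-artifacts", "artifacts/", 90, "GLACIER"),
    ]
}

print(json.dumps(lifecycle_config, indent=2))
```

On AWS, a dictionary of this shape is what boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument. GCS uses a different schema, so treat this as a sketch for S3-compatible stores only. With versioning enabled, noncurrent object versions are governed by separate rule keys such as `NoncurrentVersionTransitions`, which this sketch omits.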
Recovery Objectives (RTO/RPO)
Noxus is designed to help you meet strict enterprise recovery targets:

| Objective | Target | Description |
|---|---|---|
| RPO (Recovery Point) | < 5 Minutes | The maximum amount of data loss you can tolerate (driven by PITR). |
| RTO (Recovery Time) | < 1 Hour | The maximum time allowed to restore the platform to full operation. |
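One way to check these targets is to record the timeline of a recovery drill and compare it against the table. The sketch below assumes three timestamps per drill (the last recoverable point, the moment of failure, and the moment service was restored); the function name `evaluate_drill` and the sample times are hypothetical.

```python
from datetime import datetime, timedelta

RPO_TARGET = timedelta(minutes=5)  # from the table above
RTO_TARGET = timedelta(hours=1)    # from the table above


def evaluate_drill(last_recoverable, failure_at, service_restored):
    """Compare one recovery drill against the RPO and RTO targets.

    Data written between `last_recoverable` and `failure_at` is lost (RPO);
    the time from `failure_at` to `service_restored` is the outage (RTO).
    """
    data_loss = failure_at - last_recoverable
    downtime = service_restored - failure_at
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss < RPO_TARGET,
        "rto_met": downtime < RTO_TARGET,
    }


# Hypothetical drill timeline.
result = evaluate_drill(
    last_recoverable=datetime(2024, 6, 15, 11, 57, 30),
    failure_at=datetime(2024, 6, 15, 12, 0, 0),
    service_restored=datetime(2024, 6, 15, 12, 42, 0),
)
print(result["rpo_met"], result["rto_met"])  # True True
```

Here the drill loses 2.5 minutes of data and takes 42 minutes to restore, so both targets are met.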
Disaster Recovery (DR) Patterns
Depending on your deployment model, you can implement several DR patterns:
Active-Passive (Warm Standby)
Maintain a secondary deployment in a different region. Data is continuously replicated, and the secondary stack can be scaled up rapidly during a failover.
Active-Active (Multi-Region)
Run Noxus services in multiple regions simultaneously. Traffic is routed to the nearest healthy region, providing the highest level of availability and lowest latency for global users.
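The routing rule in the active-active pattern can be sketched as "lowest latency among the regions that pass health checks". In practice this decision is usually made by a global load balancer or DNS service rather than application code; the function `pick_region`, the region names, and the latencies below are assumptions used only to show the logic.

```python
def pick_region(latency_ms, healthy):
    """Return the healthy region with the lowest measured latency.

    `latency_ms` maps region name -> latency seen by the client;
    `healthy` is the set of regions currently passing health checks.
    """
    candidates = {r: ms for r, ms in latency_ms.items() if r in healthy}
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=candidates.get)


# Hypothetical regions and latencies.
latency = {"eu-west": 18.0, "us-east": 95.0, "ap-south": 140.0}
print(pick_region(latency, {"eu-west", "us-east", "ap-south"}))  # eu-west
print(pick_region(latency, {"us-east", "ap-south"}))             # us-east
```

When the nearest region fails its health check, traffic shifts to the next-closest region at the cost of higher latency.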
Air-Gapped Recovery
For isolated environments, backups are stored on encrypted, physically separate media and recovered using verified offline procedures.
Operational Readiness
Automate Everything
Use the Terraform and Helm assets in noxus-infra to automate the provisioning of backup resources.
Monitor Backup Health
Set up alerts for failed snapshots or replication lag in your monitoring dashboard.
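A minimal sketch of such an alert rule, assuming your monitoring system can supply the status of the last snapshot and the current replication lag. The `BackupStatus` fields and both thresholds are hypothetical values chosen for illustration, not product defaults.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BackupStatus:
    last_snapshot_ok: bool       # did the most recent snapshot succeed?
    last_snapshot_at: datetime   # when the most recent snapshot finished
    replication_lag: timedelta   # how far the replica trails the primary


# Assumed thresholds.
MAX_SNAPSHOT_AGE = timedelta(hours=26)      # daily schedule plus slack
MAX_REPLICATION_LAG = timedelta(minutes=5)


def backup_alerts(status, now):
    """Return a list of alert messages; an empty list means healthy."""
    alerts = []
    if not status.last_snapshot_ok:
        alerts.append("last snapshot failed")
    if now - status.last_snapshot_at > MAX_SNAPSHOT_AGE:
        alerts.append("no recent snapshot")
    if status.replication_lag > MAX_REPLICATION_LAG:
        alerts.append("replication lag above threshold")
    return alerts


now = datetime(2024, 6, 15, 12, 0)
healthy = BackupStatus(True, datetime(2024, 6, 15, 2, 0), timedelta(seconds=20))
degraded = BackupStatus(False, datetime(2024, 6, 13, 2, 0), timedelta(minutes=12))

print(backup_alerts(healthy, now))   # []
print(backup_alerts(degraded, now))
```

Checking snapshot age separately from snapshot status catches the case where the backup job silently stops running and therefore never reports a failure.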