Backup & Recovery

Noxus implements a multi-layered backup and recovery strategy to ensure that your AI infrastructure is resilient to data loss, service failures, and regional outages.

Backup Strategy

Our backup philosophy is built on the principle of continuous availability. We categorize data into three tiers, each with its own backup method and scope.

Persistence

PostgreSQL

Method: Daily snapshots + Point-in-Time Recovery (PITR).
Scope: User data, flow definitions, and knowledge base metadata.

Object Storage

S3 / GCS / MinIO

Method: Versioning + Cross-region replication.
Scope: Knowledge base documents, run-level logs, and artifacts.

Configuration

Secrets & Env

Method: Infrastructure-as-Code (IaC) versioning + Secret Manager backups.
Scope: auth_config, API keys, and deployment parameters.

Data Layer Resilience

PostgreSQL (Persistence Layer)

PostgreSQL is the source of truth for the platform. For production environments, we recommend:

Automated Snapshots: Daily full snapshots with a minimum 30-day retention.
PITR: Continuous write-ahead log (WAL) archiving, which allows recovery to any point in time within the retention window.
Multi-AZ Failover: Deploy with a synchronous standby in a separate availability zone so that failover loses no committed transactions and completes with minimal downtime.
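The PITR recovery logic amounts to choosing a base snapshot. You restore the newest full snapshot taken at or before the target time, then replay archived WAL up to that time. The Python below is a minimal sketch under that assumption. The `Snapshot` type and `plan_pitr_restore` function are hypothetical illustrations and are not part of Noxus or noxus-infra. On self-managed PostgreSQL 12 and later, the replay stop point is set with `recovery_target_time`. Managed services such as RDS or Cloud SQL expose their own point-in-time restore APIs instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    """A completed daily full snapshot (hypothetical record type)."""
    snapshot_id: str
    taken_at: datetime  # completion time, timezone-aware UTC


def plan_pitr_restore(
    snapshots: List[Snapshot],
    wal_archived_until: datetime,
    target: datetime,
    retention: timedelta = timedelta(days=30),
    now: Optional[datetime] = None,
) -> Snapshot:
    """Return the base snapshot to restore before replaying WAL up to `target`."""
    now = now or datetime.now(timezone.utc)
    window_start = now - retention
    if target < window_start:
        raise ValueError("target is outside the retention window")
    if target > wal_archived_until:
        # WAL after this point was never archived, so it cannot be replayed.
        raise ValueError("WAL is not archived up to the target time")
    # Only snapshots inside the retention window are assumed to still exist.
    candidates = [s for s in snapshots if window_start <= s.taken_at <= target]
    if not candidates:
        raise ValueError("no snapshot was taken at or before the target time")
    # The newest eligible snapshot minimizes the amount of WAL to replay.
    return max(candidates, key=lambda s: s.taken_at)
```

Picking the newest eligible snapshot keeps WAL replay short. That affects how quickly you recover (RTO) as well as how much data you can recover (RPO).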

Object Storage (Liquid Data)

As part of our Liquid Data architecture, object storage handles the bulk of your AI assets:

Versioning: Enable bucket versioning to protect against accidental deletions or overwrites.
Lifecycle Policies: Automatically transition older logs and artifacts to lower-cost storage classes (e.g., Glacier or Coldline) to reduce storage costs.
Replication: For mission-critical deployments, enable cross-region replication to ensure data availability even during a total cloud region failure.
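The function below sketches the lifecycle policy above. It builds an S3 lifecycle configuration in the dictionary shape that boto3's `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)` accepts. The `logs/` and `artifacts/` prefixes, the 90-day transition, and the 180-day noncurrent-version expiry are assumptions, not Noxus defaults. Versioning itself is enabled separately, for example with `put_bucket_versioning(Bucket=..., VersioningConfiguration={"Status": "Enabled"})`. MinIO exposes an S3-compatible lifecycle API but has different transition targets, and GCS uses its own lifecycle format, so adapt the rule shape for those backends.

```python
from typing import Dict, List


def build_lifecycle_rules(
    prefixes: List[str],
    glacier_after_days: int = 90,
    expire_noncurrent_after_days: int = 180,
) -> Dict[str, list]:
    """Build an S3-style lifecycle configuration (thresholds are illustrative)."""
    if glacier_after_days < 1 or expire_noncurrent_after_days < 1:
        raise ValueError("day thresholds must be positive")
    rules = []
    for prefix in prefixes:
        rules.append({
            "ID": f"archive-{prefix.strip('/')}",
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            # Move current object versions to a colder storage class after N days.
            "Transitions": [{"Days": glacier_after_days, "StorageClass": "GLACIER"}],
            # With versioning on, overwritten or deleted objects become noncurrent
            # versions; expire them eventually so they do not accumulate forever.
            "NoncurrentVersionExpiration": {"NoncurrentDays": expire_noncurrent_after_days},
        })
    return {"Rules": rules}


config = build_lifecycle_rules(["logs/", "artifacts/"])
```

The noncurrent-version expiry is a trade-off. It bounds the cost of versioning, but it also means a deleted or overwritten object can only be recovered within that window.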

Recovery Objectives (RTO/RPO)

Noxus is designed to help you meet strict enterprise recovery targets:

Objective	Target	Description
RPO (Recovery Point Objective)	< 5 minutes	The maximum acceptable data loss, measured as the time window before a failure (driven by PITR).
RTO (Recovery Time Objective)	< 1 hour	The maximum time allowed to restore the platform to full operation.
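The targets in the table can be sanity-checked with simple arithmetic. This is a minimal sketch that assumes a restore from a snapshot plus archived WAL.

- **RPO:** PostgreSQL closes a WAL segment at most `archive_timeout` seconds after the previous switch, so worst-case loss is roughly that interval plus the time to upload the segment.
- **RTO:** Recovery time is roughly snapshot restore, plus WAL replay, plus cutover (DNS and secrets).

With a synchronous standby, failover itself loses essentially no data. These figures apply to the archive-restore path. All input numbers below are hypothetical and should come from your own drills.

```python
RPO_TARGET_S = 5 * 60   # "< 5 minutes"
RTO_TARGET_S = 60 * 60  # "< 1 hour"


def worst_case_rpo_s(archive_timeout_s: float, archive_upload_s: float) -> float:
    """Upper bound on data loss when restoring from archived WAL.

    A segment is closed at the latest `archive_timeout_s` after the previous
    switch and must then be uploaded; writes in that window are lost if the
    primary and its local storage are lost.
    """
    return archive_timeout_s + archive_upload_s


def estimated_rto_s(
    snapshot_restore_s: float,
    wal_bytes_to_replay: float,
    replay_bytes_per_s: float,
    cutover_s: float,
) -> float:
    """Rough restore time: snapshot restore + WAL replay + cutover (DNS, secrets)."""
    return snapshot_restore_s + wal_bytes_to_replay / replay_bytes_per_s + cutover_s


rpo = worst_case_rpo_s(archive_timeout_s=60, archive_upload_s=30)
rto = estimated_rto_s(1200, 20 * 1024**3, 20 * 1024**2, 300)
print(f"RPO {rpo:.0f}s ok={rpo < RPO_TARGET_S}; RTO {rto:.0f}s ok={rto < RTO_TARGET_S}")
# prints: RPO 90s ok=True; RTO 2524s ok=True
```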

Disaster Recovery (DR) Patterns

Depending on your deployment model, you can implement several DR patterns:

Active-Passive (Warm Standby)

Maintain a secondary deployment in a different region. Data is continuously replicated, and the secondary stack can be scaled up rapidly during a failover.
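Deciding when to fail over is the part of a warm-standby setup most prone to flapping. A common approach is to require several consecutive failed health checks before promoting the secondary. The `FailoverMonitor` class and its default threshold of three checks are hypothetical. The actual promotion (scaling up the standby, promoting the database replica, switching DNS) is left to your runbook.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Hypothetical policy: fail over after N consecutive failed primary health checks."""
    failure_threshold: int = 3
    consecutive_failures: int = 0
    failed_over: bool = False

    def record(self, primary_healthy: bool) -> bool:
        """Record one health-check result; return True only on the check that triggers failover."""
        if self.failed_over:
            return False  # in this sketch, failback is left to a separate decision
        if primary_healthy:
            self.consecutive_failures = 0  # a single success resets the streak
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.failed_over = True
            return True
        return False
```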

Active-Active (Multi-Region)

Run Noxus services in multiple regions simultaneously. Traffic is routed to the nearest healthy region, providing the highest level of availability and lowest latency for global users.
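The active-active routing rule can be stated simply: among healthy regions, send the client to the one with the lowest measured latency. The function below sketches that rule, and the region names and latency figures are illustrative. In practice, this is usually delegated to a DNS or global load-balancing layer rather than implemented in application code. One example is Route 53 latency-based routing combined with health checks.

```python
from typing import Dict, Set


def pick_region(latency_ms: Dict[str, float], healthy: Set[str]) -> str:
    """Return the lowest-latency region that is currently passing health checks."""
    candidates = {region: ms for region, ms in latency_ms.items() if region in healthy}
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=candidates.__getitem__)


latency = {"eu-west-1": 20.0, "us-east-1": 90.0, "ap-southeast-1": 180.0}
print(pick_region(latency, healthy={"eu-west-1", "us-east-1", "ap-southeast-1"}))  # eu-west-1
print(pick_region(latency, healthy={"us-east-1", "ap-southeast-1"}))  # us-east-1
```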

Air-Gapped Recovery

For isolated environments, backups are stored on encrypted, physically separate media and recovered using verified offline procedures.
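Verified offline recovery usually starts by checking each backup file against a manifest of SHA-256 digests recorded at backup time. The sketch below assumes a hypothetical manifest format: a mapping from file name to hex digest. It reads files in chunks so large dumps do not need to fit in memory.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(backup_dir: Path, manifest: Dict[str, str]) -> List[str]:
    """Return the sorted names of files that are missing or whose digest does not match."""
    problems = []
    for name, expected in manifest.items():
        path = backup_dir / name
        if not path.is_file() or sha256_file(path) != expected:
            problems.append(name)
    return sorted(problems)
```

Store or sign the manifest separately from the backup media. A digest kept next to the data it describes detects accidental corruption but not deliberate tampering.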

Operational Readiness

The Restore Drill: A backup is only as good as its last successful restore. We recommend performing quarterly restoration drills in a non-production environment to validate your runbooks.
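A restore drill is most useful when each step is timed against the RTO. The harness below is a hypothetical sketch. Each step is a callable you supply, such as restoring the snapshot, replaying WAL, restoring secrets, or switching DNS. Any exception aborts the drill, which is itself a finding. The clock is injectable so the harness can be tested deterministically.

```python
import time
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_drill(
    steps: List[Step],
    rto_s: float = 3600.0,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[Tuple[str, float]], float, bool]:
    """Run drill steps in order; return per-step durations, total time, and RTO verdict."""
    timings: List[Tuple[str, float]] = []
    start = clock()
    for name, step in steps:
        t0 = clock()
        step()  # an exception here aborts the drill and propagates to the caller
        timings.append((name, clock() - t0))
    total = clock() - start
    return timings, total, total <= rto_s


# Placeholder no-op steps; replace them with the real actions from your runbook.
timings, total, within_rto = run_drill([
    ("restore-snapshot", lambda: None),
    ("switch-dns", lambda: None),
])
```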

Automate Everything

Use the Terraform and Helm assets in noxus-infra to automate the provisioning of backup resources.

Monitor Backup Health

Set up alerts for failed snapshots or replication lag in your monitoring system.

Document the Runbook

Maintain a clear, step-by-step recovery guide that includes DNS switching and secret restoration.
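The two alert conditions from the monitoring step can be expressed as thresholds, both of which are assumptions to adapt to your monitoring system:

- **Stale snapshot:** fires when the last successful snapshot is more than 26 hours old, which allows a daily schedule plus a two-hour grace period.
- **Replication lag:** fires when lag exceeds 300 seconds, which matches the 5-minute RPO target.

The `backup_alerts` function and its alert names are also hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def backup_alerts(
    last_successful_snapshot: datetime,
    replication_lag_s: float,
    now: Optional[datetime] = None,
    max_snapshot_age: timedelta = timedelta(hours=26),
    max_lag_s: float = 300.0,
) -> List[str]:
    """Return the names of alert conditions currently firing (hypothetical names)."""
    now = now or datetime.now(timezone.utc)
    alerts: List[str] = []
    # A missed or failed daily snapshot shows up as a stale "last success" time.
    if now - last_successful_snapshot > max_snapshot_age:
        alerts.append("snapshot-stale")
    # Lag above the RPO target means a regional failover would lose more data than allowed.
    if replication_lag_s > max_lag_s:
        alerts.append("replication-lag")
    return alerts
```

The snapshot check watches the time of the last success rather than explicit failure events. That way, it also catches a backup job that silently stopped running.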

Storage Architecture

Understand the three storage layers being backed up.

noxus-infra Repo

Access automation scripts for backup and recovery.
