# Operations

> Enterprise observability, scaling, and lifecycle management for Noxus

Noxus provides the operational tooling you need to observe your AI infrastructure and manage it at scale: metrics and tracing, audit and access logs, backups, and worker autoscaling.

## Observability & Monitoring

Noxus exposes your deployment's health and performance data through two industry-standard tools: Prometheus for metrics and OpenTelemetry for distributed tracing.

<CardGroup cols={2}>
  <Card title="Prometheus Metrics" icon="chart-line">
    Standardized `/metrics` endpoints across all services provide real-time counters and histograms. Track flow execution rates, worker utilization, and system-wide throughput.
  </Card>

  <Card title="OpenTelemetry Tracing" icon="diagram-project">
    Distributed tracing powered by **OpenTelemetry** allows you to follow a single request across the frontend, backend, and worker pools to identify bottlenecks.
  </Card>
</CardGroup>
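
As an illustration of what a `/metrics` scrape returns, the sketch below parses the Prometheus text exposition format and sums a counter across its labels. The metric names (`flow_executions_total`, `worker_busy_ratio`) are hypothetical placeholders, not confirmed Noxus metric names, and in production you would normally let Prometheus scrape and aggregate these values for you.

```python
import re

# One sample line of the Prometheus text format looks like:
#   metric_name{label="value",...} 12.5
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text-format output into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def total(samples, name):
    """Sum a metric across all of its label combinations."""
    return sum(value for metric, _, value in samples if metric == name)


# Hypothetical scrape output; real metric names may differ.
EXAMPLE = """\
# HELP flow_executions_total Flows executed (hypothetical name).
# TYPE flow_executions_total counter
flow_executions_total{status="success"} 120
flow_executions_total{status="failed"} 3
worker_busy_ratio 0.75
"""

samples = parse_metrics(EXAMPLE)
print(total(samples, "flow_executions_total"))
```

Check the `/metrics` output of your own deployment for the actual metric and label names before building dashboards or alerts on them.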

***

## Auditability & Compliance

Noxus keeps a detailed record of platform activity to help you meet strict regulatory and security requirements.

### Platform Audit Logs

Every administrative and management action is recorded in a tamper-proof **Audit Log**. This includes:

* **Identity**: User ID, email, and API key used for the action.
* **Context**: Tenant and Workspace identifiers.
* **Action**: The specific operation performed (e.g., `create`, `update`, `delete`, `execute`).
* **Resource**: The type and ID of the resource affected (e.g., `workflow`, `agent`, `knowledge_base`).
* **Payload**: The request body and metadata associated with the change.
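
The fields listed above could be modeled as a single record. The following is a minimal sketch for reasoning about audit data once you have exported it; the class and every field name are hypothetical and do not reflect the actual Noxus schema.

```python
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class AuditEntry:
    """One audit record. All field names here are hypothetical."""
    # Identity
    user_id: str
    email: str
    api_key_id: str
    # Context
    tenant_id: str
    workspace_id: str
    # Action and resource
    action: str          # e.g. "create", "update", "delete", "execute"
    resource_type: str   # e.g. "workflow", "agent", "knowledge_base"
    resource_id: str
    # Payload: request body and metadata associated with the change
    payload: dict = field(default_factory=dict)


def find_entries(entries, action=None, resource_type=None):
    """Return entries matching every filter that is not None."""
    return [
        e for e in entries
        if (action is None or e.action == action)
        and (resource_type is None or e.resource_type == resource_type)
    ]


entries = [
    AuditEntry("u-1", "ana@example.com", "key-1", "t-1", "ws-1",
               "update", "workflow", "wf-42", {"name": "Invoice triage"}),
    AuditEntry("u-2", "ben@example.com", "key-2", "t-1", "ws-1",
               "delete", "agent", "ag-7"),
]

deleted = find_entries(entries, action="delete")
print([asdict(e)["resource_id"] for e in deleted])
```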

### API & Access Logs

Detailed logs of every incoming API call are maintained to track usage patterns and security events:

* **Performance**: Request duration (ms) and response codes.
* **Routing**: HTTP method and exact route accessed.
* **Attribution**: Mapping of every call to a specific user, group, and API key.
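
Access logs with these fields lend themselves to per-route summaries. The sketch below assumes each log record is a dictionary with hypothetical keys `method`, `route`, `status`, and `duration_ms` (the real export format may differ) and computes the request count, 95th-percentile latency, and server-error rate for each route.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(records):
    """Group access-log records by (method, route) and summarize each group."""
    by_route = defaultdict(list)
    for record in records:
        by_route[(record["method"], record["route"])].append(record)

    summary = {}
    for key, rows in by_route.items():
        durations = [row["duration_ms"] for row in rows]
        server_errors = sum(1 for row in rows if row["status"] >= 500)
        summary[key] = {
            "count": len(rows),
            "p95_ms": percentile(durations, 95),
            "error_rate": server_errors / len(rows),
        }
    return summary


# Hypothetical records; the route and key names are placeholders.
records = [
    {"method": "POST", "route": "/example/run", "status": 200, "duration_ms": 100},
    {"method": "POST", "route": "/example/run", "status": 200, "duration_ms": 200},
    {"method": "POST", "route": "/example/run", "status": 500, "duration_ms": 300},
    {"method": "POST", "route": "/example/run", "status": 200, "duration_ms": 400},
    {"method": "GET", "route": "/example/status", "status": 200, "duration_ms": 20},
]

report = summarize(records)
print(report[("POST", "/example/run")])
```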

***

## Maintenance & Backups

Ensure your AI solutions remain available and resilient through automated lifecycle management.

<Steps>
  <Step title="Automated Backups">
    Configure scheduled snapshots for your persistence layer (**PostgreSQL**) and **Object Storage**. We recommend a minimum 30-day retention for production environments.
  </Step>

  <Step title="Disaster Recovery">
    Implement multi-region deployment patterns for critical workloads to minimize downtime during failover and to meet your recovery time (RTO) and recovery point (RPO) objectives.
  </Step>

  <Step title="Liquid Data Lifecycle">
    Define data retention rules to automatically move information between high-performance cache and low-cost object storage based on active usage.
  </Step>
</Steps>
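
To make the 30-day retention recommendation concrete, here is a minimal sketch of a pruning rule. It assumes you track snapshots as a mapping from a snapshot name to the time it was taken; the names and the rule of always keeping the newest snapshot are illustrative assumptions, not built-in Noxus behavior.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # minimum recommended for production


def expired_snapshots(snapshots, now, retention=RETENTION):
    """Return names of snapshots older than the retention window.

    The most recent snapshot is never returned, so at least one
    restore point survives even if backups stopped running.
    """
    cutoff = now - retention
    newest = max(snapshots, key=snapshots.get, default=None)
    return sorted(
        name for name, taken_at in snapshots.items()
        if taken_at < cutoff and name != newest
    )


# Hypothetical snapshot inventory.
now = datetime(2024, 3, 1, tzinfo=timezone.utc)
snapshots = {
    "pg-2024-01-15": datetime(2024, 1, 15, tzinfo=timezone.utc),
    "pg-2024-02-10": datetime(2024, 2, 10, tzinfo=timezone.utc),
    "pg-2024-02-29": datetime(2024, 2, 29, tzinfo=timezone.utc),
}

print(expired_snapshots(snapshots, now))
```

Managed PostgreSQL and object storage services usually provide their own retention settings; prefer those where available and treat a script like this as a fallback or an audit check.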

***

## Scaling & Resource Management

### Dynamic Worker Scaling

Use **KEDA** and the Kubernetes Horizontal Pod Autoscaler (**HPA**) to scale your compute resources based on actual demand:

* **Queue-Driven**: Automatically spin up workers as task volume increases and scale to zero during idle periods.
* **Workload Isolation**: Deploy dedicated worker pools for specific workspaces or high-priority tasks.
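
The arithmetic behind queue-driven scaling can be sketched in a few lines. This is an approximation of how a queue-length target translates into a replica count, not KEDA's or HPA's actual implementation; the function name, parameters, and default limits are assumptions chosen for illustration.

```python
import math


def desired_workers(queue_length, tasks_per_worker, min_workers=0, max_workers=10):
    """Estimate how many workers a queue-driven autoscaler would request.

    queue_length:     tasks currently waiting in the queue
    tasks_per_worker: target backlog each worker should handle
    min_workers:      floor (0 allows scale to zero when idle)
    max_workers:      ceiling that protects the cluster from runaway scaling
    """
    if tasks_per_worker <= 0:
        raise ValueError("tasks_per_worker must be positive")
    if queue_length <= 0:
        return min_workers  # idle: fall back to the floor
    wanted = math.ceil(queue_length / tasks_per_worker)
    return max(min_workers, min(wanted, max_workers))


for backlog in (0, 1, 23, 500):
    print(backlog, desired_workers(backlog, tasks_per_worker=5))
```

A dedicated worker pool for a workspace or a high-priority task type would apply the same calculation to its own queue, with its own floor and ceiling.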

<Card title="Detailed Operations Guide" icon="book-open" href="/deployment/operations/monitoring">
  Explore the full technical guide for monitoring, logging, and scaling your Noxus deployment.
</Card>
