Observability & Monitoring
Noxus leverages industry-standard tools to provide a 360-degree view of your deployment’s health and performance.
Prometheus Metrics
Standardized /metrics endpoints across all services provide real-time counters and histograms. Track flow execution rates, worker utilization, and system-wide throughput.
OpenTelemetry Tracing
Distributed tracing powered by OpenTelemetry allows you to follow a single request across the frontend, backend, and worker pools to identify bottlenecks.
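To make "following a single request" concrete, the sketch below propagates a W3C Trace Context `traceparent` header, which is the default propagation format in OpenTelemetry, using only the Python standard library. Whether a Noxus deployment uses this propagator is an assumption, and the function names are hypothetical; in practice the OpenTelemetry SDK handles this rather than hand-written headers.

```python
import re
import secrets

# W3C Trace Context header layout: version-traceid-parentid-flags (lowercase hex).
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace at the edge of the system (for example, the frontend)."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def child_traceparent(incoming: str) -> str:
    """Continue a trace downstream: keep the trace ID, mint a new span ID."""
    match = _TRACEPARENT.fullmatch(incoming)
    if match is None:
        # Malformed or missing header: start a fresh trace instead of failing.
        return new_traceparent()
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


def trace_id_of(header: str) -> str:
    """Extract the trace ID used to correlate spans across services."""
    match = _TRACEPARENT.fullmatch(header)
    if match is None:
        raise ValueError(f"not a valid traceparent header: {header!r}")
    return match.group(1)


# Hypothetical hop sequence: frontend -> backend -> worker pool.
frontend_header = new_traceparent()
backend_header = child_traceparent(frontend_header)
worker_header = child_traceparent(backend_header)
```

All three headers share one trace ID while each hop gets its own span ID, which is what lets a tracing backend stitch the hops into a single timeline and show where time is spent.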
Auditability & Compliance
Noxus maintains a high-fidelity record of all platform activity, ensuring you can meet strict regulatory and security requirements.
Platform Audit Logs
Every administrative and management action is recorded in a tamper-proof Audit Log. This includes:
- Identity: User ID, email, and API key used for the action.
- Context: Tenant and Workspace identifiers.
- Action: The specific operation performed (e.g., create, update, delete, execute).
- Resource: The type and ID of the resource affected (e.g., workflow, agent, knowledge_base).
- Payload: The request body and metadata associated with the change.
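The audit fields listed above can be pictured as one structured record per action. The following is a minimal sketch, not the actual Noxus schema: the class name, field names, and JSON layout are assumptions chosen only to mirror that list.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)  # frozen: a record cannot be modified after creation
class AuditRecord:
    # Identity: who performed the action
    user_id: str
    email: str
    api_key_id: str
    # Context: where it happened
    tenant_id: str
    workspace_id: str
    # Action: what was done, e.g. "create", "update", "delete", "execute"
    action: str
    # Resource: what was affected, e.g. "workflow", "agent", "knowledge_base"
    resource_type: str
    resource_id: str
    # Payload: request body and metadata associated with the change
    payload: Dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize with sorted keys so identical records produce identical text."""
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical example values, for illustration only.
record = AuditRecord(
    user_id="u-123",
    email="ops@example.com",
    api_key_id="key-9",
    tenant_id="t-1",
    workspace_id="ws-7",
    action="update",
    resource_type="workflow",
    resource_id="wf-42",
    payload={"name": "nightly-report"},
)
line = record.to_json()
```

Emitting one JSON object per line keeps the records easy to ship to a log store and to filter by tenant, user, or resource during a compliance review.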
API & Access Logs
Detailed logs of every incoming API call are maintained to track usage patterns and security events:
- Performance: Request duration (ms) and response codes.
- Routing: HTTP method and exact route accessed.
- Attribution: Mapping of every call to a specific user, group, and API key.
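As an illustration of how these access-log fields support usage and security analysis, the sketch below summarizes entries per API key. The entry structure and field names are hypothetical; the real log format may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class AccessLogEntry:
    method: str         # Routing: HTTP method
    route: str          # Routing: exact route accessed
    status: int         # Performance: response code
    duration_ms: float  # Performance: request duration in milliseconds
    user_id: str        # Attribution
    group: str          # Attribution
    api_key_id: str     # Attribution


def summarize_by_api_key(entries: Iterable[AccessLogEntry]) -> Dict[str, Dict[str, float]]:
    """Count requests, denied calls (401/403), and server errors per API key."""
    summary: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"requests": 0, "denied": 0, "server_errors": 0, "max_duration_ms": 0.0}
    )
    for entry in entries:
        stats = summary[entry.api_key_id]
        stats["requests"] += 1
        if entry.status in (401, 403):
            stats["denied"] += 1          # possible security event
        elif entry.status >= 500:
            stats["server_errors"] += 1   # reliability signal
        stats["max_duration_ms"] = max(stats["max_duration_ms"], entry.duration_ms)
    return dict(summary)


# Hypothetical sample entries.
sample = [
    AccessLogEntry("GET", "/v1/workflows", 200, 35.0, "u-1", "eng", "key-a"),
    AccessLogEntry("POST", "/v1/workflows", 403, 12.5, "u-1", "eng", "key-a"),
    AccessLogEntry("GET", "/v1/agents", 500, 820.0, "u-2", "ops", "key-b"),
]
report = summarize_by_api_key(sample)
```

A sudden rise in the `denied` count for one key is the kind of pattern these logs make visible.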
Maintenance & Backups
Ensure your AI solutions remain available and resilient through automated lifecycle management.
Automated Backups
Configure scheduled snapshots for your persistence layer (PostgreSQL) and Object Storage. We recommend a minimum 30-day retention for production environments.
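A retention policy like the 30-day recommendation comes down to a cutoff date. The sketch below shows one way to select snapshots that are safe to prune; the function name is hypothetical, and always keeping the newest snapshot is an added safety assumption rather than documented Noxus behavior.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Sequence


def expired_snapshots(
    snapshot_times: Sequence[datetime],
    now: datetime,
    retention_days: int = 30,
) -> List[datetime]:
    """Return snapshots older than the retention window, oldest first.

    The most recent snapshot is never returned, so a stalled backup job
    cannot lead to the last remaining copy being pruned.
    """
    if not snapshot_times:
        return []
    cutoff = now - timedelta(days=retention_days)
    newest = max(snapshot_times)
    return sorted(t for t in snapshot_times if t < cutoff and t != newest)


# Hypothetical snapshot history.
now = datetime(2024, 3, 1, tzinfo=timezone.utc)
snapshots = [
    datetime(2024, 1, 1, tzinfo=timezone.utc),   # 60 days old
    datetime(2024, 2, 1, tzinfo=timezone.utc),   # 29 days old
    datetime(2024, 2, 28, tzinfo=timezone.utc),  # 2 days old
]
to_delete = expired_snapshots(snapshots, now)
```

Only the January snapshot falls outside the 30-day window here, so it is the only one selected for deletion.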
Disaster Recovery
Implement multi-region deployment patterns for critical workloads to support zero-downtime failover and to meet your recovery time (RTO) and recovery point (RPO) objectives.
Scaling & Resource Management
Dynamic Worker Scaling
Leverage KEDA and the Kubernetes Horizontal Pod Autoscaler (HPA) to scale your compute resources based on actual demand:
- Queue-Driven: Automatically spin up workers as task volume increases and scale to zero during idle periods.
- Workload Isolation: Deploy dedicated worker pools for specific workspaces or high-priority tasks.
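The queue-driven behavior can be approximated with a simple calculation. The sketch below is a simplified model, not KEDA's implementation: it applies the ceil(metric / target) rule that the HPA uses for average-value targets, and it omits polling intervals, cooldown periods, and activation thresholds. The function and parameter names are hypothetical.

```python
import math


def desired_workers(
    queue_length: int,
    tasks_per_worker: int,
    min_workers: int = 0,
    max_workers: int = 10,
) -> int:
    """Estimate the worker count for a given queue backlog.

    queue_length:     tasks currently waiting in the queue
    tasks_per_worker: target backlog each worker should handle
    min_workers:      floor; 0 allows scale-to-zero when the queue is empty
    max_workers:      ceiling that protects cluster capacity
    """
    if tasks_per_worker <= 0:
        raise ValueError("tasks_per_worker must be positive")
    if queue_length <= 0:
        return min_workers  # idle period: fall back to the floor (possibly zero)
    wanted = math.ceil(queue_length / tasks_per_worker)
    return min(max_workers, max(wanted, min_workers))


# Hypothetical backlog sizes with a target of 5 tasks per worker.
plan = {backlog: desired_workers(backlog, tasks_per_worker=5) for backlog in (0, 1, 12, 1000)}
```

For workload isolation, the same calculation could be run once per dedicated pool, each with its own queue and its own minimum and maximum, so a burst in one workspace does not consume another pool's capacity.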
Detailed Operations Guide
Explore the full technical guide for monitoring, logging, and scaling your Noxus deployment.