# Monitoring

> Enterprise observability, real-time metrics, and health tracking for Noxus

Noxus provides deep visibility into its distributed architecture through standardized health endpoints, Prometheus-compatible metrics, and distributed tracing.

## Observability Architecture

The platform is designed to be monitored at three distinct layers: the **Service Layer**, the **Coordination Layer**, and the **Data Layer**.

```mermaid
flowchart LR
    FE[Noxus Frontend] --> PM[Prometheus]
    BE[Noxus Backend] --> PM
    W[Noxus Workers] --> PM
    RE[Noxus Relays] --> PM
    BE --> OT[OpenTelemetry Collector]
    W --> OT
    RE --> OT
    PM --> GR[Grafana]
```

***

## Key Performance Indicators (KPIs)

To ensure a stable production environment, we recommend monitoring the following signals:

* **Latency**: P95/P99 response times for API endpoints.
* **Error Rates**: 4xx and 5xx response codes.
* **Throughput**: Requests per second (RPS).
* **Queue Depth**: Number of tasks waiting in the broker (Redis/RabbitMQ).
* **Processing Lag**: Time between task creation and execution start.
* **Worker Utilization**: CPU/Memory usage per worker pool.
* **Connection Pressure**: Active vs. maximum allowed connections.
* **Slow Queries**: Queries exceeding the 500 ms threshold.
* **IOPS**: Disk I/O utilization for vector search operations.
* **Memory Saturation**: Percentage of available memory used.
* **Eviction Rate**: Frequency of keys being removed due to memory limits.
* **Command Latency**: Time taken to process coordination requests.

***

## Health & Metrics Endpoints

All Noxus services expose standardized endpoints for automated health checks and metrics collection:

* **Health Checks**: `/status/health` (Used by Kubernetes Liveness/Readiness probes).
* **Prometheus Metrics**: `/metrics` (Exposes internal service counters and histograms).

Noxus provides a set of **Default Grafana Dashboards** in the `noxus-infra` repository. These pre-configured dashboards provide immediate visibility into API performance, worker queue health, and resource utilization across your deployment.

In Kubernetes deployments, the official Helm charts automatically annotate pods for Prometheus scraping, ensuring zero-config observability.

***

## Alerting Strategy

We recommend setting up alerts for the following critical conditions:

1. **Service Availability**: Any core service reporting a non-healthy status.
2. **Queue Backlog**: Task queue depth exceeding defined thresholds for more than 5 minutes.
3. **Database Saturation**: PostgreSQL connection usage exceeding 80%.
4. **Model Provider Failures**: Sustained 5xx errors from external AI providers (OpenAI, Anthropic, etc.).

Pair metrics with centralized logs for faster root cause analysis.

Use monitoring signals to drive automated scaling policies.
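The four alert conditions in the Alerting Strategy list can be sketched as a small evaluation function over periodic metric samples. This is an illustrative sketch only, not part of Noxus: the `Snapshot` type, the `evaluate_alerts` function, the alert names, the one-minute sampling interval, the queue threshold of 1000 tasks, and the 5% provider error limit are all assumptions. Only the 80% connection limit and the 5-minute backlog window come from the recommendations above. In a real deployment these rules would more likely live in your alerting system than in application code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    """One periodic sample of monitoring signals (hypothetical structure)."""
    healthy: bool              # True if every core service reported healthy
    queue_depth: int           # tasks waiting in the broker
    db_connections_used: int   # active PostgreSQL connections
    db_connections_max: int    # maximum allowed connections
    provider_5xx_rate: float   # fraction of provider calls that returned 5xx


def evaluate_alerts(
    history: List[Snapshot],
    queue_threshold: int = 1000,         # assumed value; tune per deployment
    backlog_samples: int = 6,            # six one-minute samples span 5 minutes (assumed interval)
    db_usage_limit: float = 0.80,        # 80% connection usage, as recommended above
    provider_error_limit: float = 0.05,  # assumed value for "sustained" 5xx errors
    provider_samples: int = 3,           # assumed number of consecutive bad samples
) -> List[str]:
    """Return the names of alerts that fire for the most recent samples."""
    if not history:
        return []

    latest = history[-1]
    alerts: List[str] = []

    # 1. Service availability: any non-healthy report fires immediately.
    if not latest.healthy:
        alerts.append("service_availability")

    # 2. Queue backlog: depth above the threshold for every sample in the window.
    window = history[-backlog_samples:]
    if len(window) == backlog_samples and all(
        s.queue_depth > queue_threshold for s in window
    ):
        alerts.append("queue_backlog")

    # 3. Database saturation: connection usage strictly above the limit.
    if latest.db_connections_max > 0:
        usage = latest.db_connections_used / latest.db_connections_max
        if usage > db_usage_limit:
            alerts.append("database_saturation")

    # 4. Model provider failures: error rate above the limit for several samples in a row.
    recent = history[-provider_samples:]
    if len(recent) == provider_samples and all(
        s.provider_5xx_rate > provider_error_limit for s in recent
    ):
        alerts.append("model_provider_failures")

    return alerts
```

Requiring a full window of consecutive samples before firing the backlog and provider alerts is a deliberate choice in this sketch: it keeps a single spike from paging anyone, at the cost of a few minutes of detection delay.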