# Scaling

> Dynamic scaling strategies for control-plane and execution-plane workloads

Noxus uses a decoupled architecture, so you can scale the **Control Plane** (API/Frontend) and the **Execution Plane** (Workers) independently, each according to its own workload profile.

## Service Scaling Model

Each plane uses a scaling strategy suited to its workload, balancing performance against cost.

<CardGroup cols={2}>
  <Card title="Control Plane" icon="server">
    **Frontend & Backend**

    * Scaled via standard **HPA** (Horizontal Pod Autoscaler).
    * Triggers based on CPU and Memory utilization.
    * Optimized for consistent API responsiveness.
  </Card>

  <Card title="Execution Plane" icon="bolt">
    **Worker Pools**

    * Scaled per-pool via **KEDA** or **HPA**.
    * Triggers based on task queue depth or resource usage.
    * Optimized for high-throughput AI processing.
  </Card>
</CardGroup>

***

## Advanced Worker Pool Scaling

Worker pools are the most dynamic part of the Noxus infrastructure. Each pool can scale on queue depth (KEDA) or on resource usage (HPA), which lets it absorb unpredictable AI workloads.

### KEDA-Driven Scaling (Queue-Based)

For most production environments, we recommend using **KEDA** (Kubernetes Event-driven Autoscaling) for worker pools:

* **Scale-to-Zero**: Automatically shut down workers when no tasks are in the queue to save costs.
* **Rapid Bursts**: Spin up dozens of workers shortly after a high-volume batch job is submitted.
* **Queue Awareness**: Scaling is based on the actual number of pending tasks in Redis or RabbitMQ, not just CPU usage.
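
The queue-based behavior above can be illustrated with a minimal sketch. This is a simplified model of queue-depth scaling, not KEDA's actual implementation; the function name `desired_workers` and the parameters `tasks_per_worker`, `min_workers`, and `max_workers` are assumptions made for illustration.

```python
import math


def desired_workers(pending_tasks: int, tasks_per_worker: int,
                    min_workers: int = 0, max_workers: int = 50) -> int:
    """Return a worker count for a pool based on queue depth.

    Hypothetical helper: a simplified model of queue-based scaling.
    """
    if tasks_per_worker <= 0:
        raise ValueError("tasks_per_worker must be positive")
    if pending_tasks <= 0:
        # Empty queue: fall back to the floor (scale-to-zero when it is 0).
        return min_workers
    # One worker per `tasks_per_worker` pending tasks, rounded up.
    wanted = math.ceil(pending_tasks / tasks_per_worker)
    # Clamp to the configured bounds so a burst cannot exceed the cap.
    return max(min_workers, min(wanted, max_workers))


if __name__ == "__main__":
    for depth in (0, 1, 95, 10_000):
        print(depth, "pending ->", desired_workers(depth, tasks_per_worker=10))
```

With `min_workers=0` the pool drops to zero on an empty queue, and a large batch is capped at `max_workers` rather than growing without bound.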

### Resource-Based Scaling (HPA)

For workloads with consistent, long-running tasks, standard HPA can maintain a steady pool of workers based on CPU or memory utilization.
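
For intuition, the replica calculation documented for the Kubernetes HPA is `ceil(currentReplicas * currentMetric / targetMetric)`, with scaling skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below models only that formula; the function name and the `min_replicas` and `max_replicas` defaults are illustrative assumptions, and the real controller also handles stabilization windows, missing metrics, and pod readiness.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_utilization: float,
                         target_utilization: float, min_replicas: int = 1,
                         max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Simplified model of the HPA replica formula (illustrative only)."""
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    ratio = current_utilization / target_utilization
    # Within tolerance of the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(desired, max_replicas))


if __name__ == "__main__":
    # 4 workers at 90% CPU against a 60% target.
    print(hpa_desired_replicas(4, 90, 60))
```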

***

## Multi-Region & Multi-Zone Scaling

For global enterprises, Noxus supports scaling across multiple geographic regions and availability zones.

* **Regional Replicas**: Deploy independent Frontend and Backend replicas in different regions to minimize latency for global users.
* **Zone Resilience**: Distribute worker pools across multiple availability zones to ensure continuous operation during a zone failure.
* **Independent Policies**: Configure unique autoscaling rules for each region based on local traffic patterns.
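
As a rough illustration of independent per-region policies, the sketch below keeps a separate set of limits for each region and applies the same queue-depth rule to each. The region names, the numeric limits, and the `RegionPolicy` and `plan_region` names are all hypothetical; they are not Noxus configuration keys.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionPolicy:
    """Hypothetical per-region autoscaling limits."""
    min_workers: int
    max_workers: int
    tasks_per_worker: int


# Illustrative values only: a region with steady traffic keeps a warm floor,
# while a bursty region scales from zero up to a higher ceiling.
POLICIES = {
    "eu-west": RegionPolicy(min_workers=2, max_workers=40, tasks_per_worker=5),
    "us-east": RegionPolicy(min_workers=0, max_workers=100, tasks_per_worker=10),
}


def plan_region(region: str, pending_tasks: int) -> int:
    """Return the worker count for one region under its own policy."""
    policy = POLICIES[region]
    wanted = math.ceil(max(pending_tasks, 0) / policy.tasks_per_worker)
    return max(policy.min_workers, min(wanted, policy.max_workers))


if __name__ == "__main__":
    for name in POLICIES:
        print(name, plan_region(name, pending_tasks=100))
```

The same backlog of 100 tasks yields different worker counts in each region because each policy carries its own floor, ceiling, and per-worker target.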

***

## Scaling Best Practices

* **Monitor Bottlenecks**: Track PostgreSQL and Redis performance, since the data layer can become a bottleneck before your compute resources do.
* **Right-Size Pools**: Create dedicated worker pools for different task types (e.g., a GPU pool for inference, a high-memory pool for document processing).
* **Test Your Limits**: Conduct regular load tests to understand the scaling latency of your infrastructure (how long it takes to spin up a new worker).
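
One way to put a number on scaling latency during a load test is to time how long it takes for a new worker to become ready after a scale-up is triggered. The helper below is a minimal sketch: `measure_scaling_latency` and the `is_ready` callback are hypothetical names, and the readiness check itself (for example, comparing ready replicas against the desired count) is left to you.

```python
import time
from typing import Callable


def measure_scaling_latency(is_ready: Callable[[], bool],
                            timeout_s: float = 300.0,
                            poll_interval_s: float = 1.0) -> float:
    """Poll `is_ready` until it returns True; return elapsed seconds.

    Raises TimeoutError if readiness is not reached within `timeout_s`.
    """
    start = time.monotonic()  # monotonic clock: immune to wall-clock changes
    while True:
        if is_ready():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            raise TimeoutError(f"not ready after {timeout_s} seconds")
        time.sleep(poll_interval_s)


if __name__ == "__main__":
    # Stand-in readiness check that succeeds on the third poll.
    polls = iter([False, False, True])
    latency = measure_scaling_latency(lambda: next(polls), poll_interval_s=0.1)
    print(f"scaling latency: {latency:.2f}s")
```

Recording this value across several runs gives a baseline for how much queue backlog can build up before new workers start consuming tasks.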

<CardGroup cols={2}>
  <Card title="Kubernetes Guide" icon="square-stack" href="/deployment/kubernetes/overview">
    Learn how to configure scaling parameters in your Helm values.
  </Card>

  <Card title="Storage Architecture" icon="database" href="/deployment/configuration/storage">
    Understand how to scale your data layer alongside your compute.
  </Card>
</CardGroup>
