# Scaling

> Dynamic scaling strategies for control-plane and execution-plane workloads

Noxus is designed with a decoupled architecture that lets you scale the **Control Plane** (API/Frontend) and the **Execution Plane** (Workers) independently, each according to its own workload profile.

## Service Scaling Model

The platform uses a different scaling strategy for each component to balance performance and cost.

**Frontend & Backend**

* Scaled via standard **HPA** (Horizontal Pod Autoscaler).
* Triggers based on CPU and memory utilization.
* Optimized for consistent API responsiveness.

**Worker Pools**

* Scaled per-pool via **KEDA** or **HPA**.
* Triggers based on task queue depth or resource usage.
* Optimized for high-throughput AI processing.

***

## Advanced Worker Pool Scaling

Worker pools are the most dynamic part of the Noxus infrastructure. They support several scaling patterns for handling unpredictable AI workloads.

### KEDA-Driven Scaling (Queue-Based)

For most production environments, we recommend using **KEDA** (Kubernetes Event-driven Autoscaling) for worker pools:

* **Scale-to-Zero**: Automatically shut down workers when no tasks are in the queue to save costs.
* **Rapid Bursts**: Quickly spin up dozens of workers when a high-volume batch job is submitted.
* **Queue Awareness**: Scaling is based on the actual number of pending tasks in Redis or RabbitMQ, not just CPU usage.

### Resource-Based Scaling (HPA)

For workloads with consistent, long-running tasks, standard HPA can maintain a steady pool of workers based on CPU or memory saturation.

***

## Multi-Region & Multi-Zone Scaling

For global enterprises, Noxus supports scaling across multiple geographic regions and availability zones.

* **Regional Replicas**: Deploy independent Frontend and Backend replicas in different regions to minimize latency for global users.
* **Zone Resilience**: Distribute worker pools across multiple availability zones to ensure continuous operation during a zone failure.
* **Independent Policies**: Configure separate autoscaling rules for each region based on local traffic patterns.

***

## Scaling Best Practices

* **Monitor Bottlenecks**: Track PostgreSQL and Redis performance closely, as either can become a bottleneck before your compute resources do.
* **Right-Size Pools**: Create dedicated worker pools for different task types (e.g., a GPU pool for inference, a high-memory pool for document processing).
* **Test Your Limits**: Run regular load tests to measure the scaling latency of your infrastructure (how long it takes to spin up a new worker).
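
To make the two worker-pool strategies described under Advanced Worker Pool Scaling more concrete, the sketch below shows the arithmetic an autoscaler typically performs. It is an illustration only, not Noxus code: the function names, parameters, and default limits are hypothetical. The queue-based function approximates how a queue-length trigger sizes a pool (one worker per fixed number of pending tasks, with scale-to-zero when the queue is empty), and the utilization-based function follows the replica formula documented for the Kubernetes HPA, omitting its tolerance band and stabilization windows.

```python
import math


def queue_based_replicas(pending_tasks: int, tasks_per_worker: int,
                         min_replicas: int = 0, max_replicas: int = 50) -> int:
    """Queue-driven sizing: roughly one worker per `tasks_per_worker` pending tasks.

    All names and defaults here are hypothetical, chosen for illustration.
    """
    if tasks_per_worker <= 0:
        raise ValueError("tasks_per_worker must be positive")
    if pending_tasks <= 0:
        # Empty queue: fall back to the floor, which is zero for scale-to-zero pools.
        return min_replicas
    wanted = math.ceil(pending_tasks / tasks_per_worker)
    return max(min_replicas, min(wanted, max_replicas))


def utilization_based_replicas(current_replicas: int, current_utilization: float,
                               target_utilization: float,
                               min_replicas: int = 1, max_replicas: int = 50) -> int:
    """Resource-driven sizing: ceil(current_replicas * current / target), clamped."""
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    wanted = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(wanted, max_replicas))


if __name__ == "__main__":
    print(queue_based_replicas(0, 10))             # 0  (idle pool scales to zero)
    print(queue_based_replicas(95, 10))            # 10 (burst: ceil(95 / 10))
    print(utilization_based_replicas(4, 90, 60))   # 6  (CPU above target, scale out)
```

The contrast is the point: the queue-based result depends only on backlog, so a pool can jump from zero to its ceiling as soon as a batch lands, while the utilization-based result moves relative to the current replica count and never drops below its floor of one.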