Service Scaling Model
The platform uses different scaling strategies for its components to optimize for both performance and cost.
Control Plane
Frontend & Backend
- Scaled via standard HPA (Horizontal Pod Autoscaler).
- Triggers based on CPU and Memory utilization.
- Optimized for consistent API responsiveness.
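As a minimal sketch of the HPA setup described above, the Backend could be scaled on CPU and Memory utilization like this (the Deployment name noxus-backend, replica bounds, and target thresholds are illustrative assumptions, not fixed Noxus defaults):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: noxus-backend            # hypothetical name; match your Deployment
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: noxus-backend
  minReplicas: 2                 # keep a floor for consistent API responsiveness
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80   # scale out above 80% average memory
```

An equivalent object with its own thresholds would typically be deployed for the Frontend.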
Execution Plane
Worker Pools
- Scaled per-pool via KEDA or HPA.
- Triggers based on task queue depth or resource usage.
- Optimized for high-throughput AI processing.
Advanced Worker Pool Scaling
Worker pools are the most dynamic part of the Noxus infrastructure. They support sophisticated scaling patterns to handle unpredictable AI workloads.
KEDA-Driven Scaling (Queue-Based)
For most production environments, we recommend using KEDA (Kubernetes Event-driven Autoscaling) for worker pools:
- Scale-to-Zero: Automatically shut down workers when no tasks are in the queue to save costs.
- Rapid Bursts: Instantly spin up dozens of workers when a high-volume batch job is submitted.
- Queue Awareness: Scaling is based on the actual number of pending tasks in Redis or RabbitMQ, not just CPU usage.
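A minimal KEDA ScaledObject for a Redis-backed queue might look like the following sketch (the Deployment name, Redis address, and list name are illustrative assumptions; adjust them to your environment):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: noxus-worker-scaler        # hypothetical name
spec:
  scaleTargetRef:
    name: noxus-worker-pool        # hypothetical worker Deployment
  minReplicaCount: 0               # scale-to-zero when the queue is empty
  maxReplicaCount: 50              # allow rapid bursts for batch jobs
  pollingInterval: 15              # seconds between queue-depth checks
  cooldownPeriod: 120              # wait before scaling back down to zero
  triggers:
    - type: redis
      metadata:
        address: redis.noxus.svc.cluster.local:6379   # assumed Redis service address
        listName: noxus-tasks                          # assumed task queue key
        listLength: "5"            # target pending tasks per worker replica
```

Under the hood, KEDA manages an HPA for the target Deployment, so queue-based and resource-based scaling follow the same reconciliation machinery.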
Resource-Based Scaling (HPA)
For workloads with consistent, long-running tasks, standard HPA can be used to maintain a steady pool of workers based on CPU or Memory saturation.
Multi-Region & Multi-Zone Scaling
For global enterprises, Noxus supports scaling across multiple geographic regions and availability zones.
- Regional Replicas: Deploy independent Frontend and Backend replicas in different regions to minimize latency for global users.
- Zone Resilience: Distribute worker pools across multiple availability zones to ensure continuous operation during a zone failure (see the sketch after this list).
- Independent Policies: Configure unique autoscaling rules for each region based on local traffic patterns.
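As a sketch of the Zone Resilience point, a worker Deployment's pod template can spread replicas across zones with a standard Kubernetes topology spread constraint (the pod label is an illustrative assumption):

```yaml
# Pod template fragment for a worker Deployment
spec:
  topologySpreadConstraints:
    - maxSkew: 1                                 # keep per-zone replica counts within 1 of each other
      topologyKey: topology.kubernetes.io/zone   # well-known zone label on nodes
      whenUnsatisfiable: ScheduleAnyway          # prefer spreading, but do not block scheduling
      labelSelector:
        matchLabels:
          app: noxus-worker                      # hypothetical pod label
```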
Scaling Best Practices
- Monitor Bottlenecks: Always keep an eye on PostgreSQL and Redis performance, as these can become bottlenecks before your compute resources do.
- Right-Size Pools: Create dedicated worker pools for different task types (e.g., a GPU pool for inference, a high-memory pool for document processing); see the sketch after this list.
- Test Your Limits: Conduct regular load tests to understand the scaling latency of your infrastructure (how long it takes to spin up a new worker).
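To illustrate right-sizing, a dedicated GPU inference pool might pin its workers to GPU nodes and reserve accelerator and memory resources, as in this sketch (the Deployment name, image, node label, and resource amounts are all assumptions for illustration):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: noxus-worker-gpu           # hypothetical GPU inference pool
spec:
  replicas: 1
  selector:
    matchLabels:
      app: noxus-worker-gpu
  template:
    metadata:
      labels:
        app: noxus-worker-gpu
    spec:
      nodeSelector:
        gpu: "true"                # assumed node label for GPU nodes
      containers:
        - name: worker
          image: noxus/worker:latest   # hypothetical image
          resources:
            requests:
              memory: 8Gi          # headroom for model loading
            limits:
              nvidia.com/gpu: 1    # reserve one GPU per worker
```

A separate high-memory pool would use the same pattern with larger memory requests and no GPU limit, letting each pool scale on the metric that actually constrains its task type.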