
Introduction to Kubernetes Cluster Autoscaling

Apr 27, 2026 | 9 min read

Your deployment scaled to 50 replicas during last night's traffic surge, but 30 pods are still stuck in Pending because the cluster exhausted its available node capacity at 2 a.m. The Kubernetes scheduler can't place workloads without compute resources, your fixed node pool is at maximum capacity, and the system keeps attempting to add replicas. Manual intervention is the only option, which defeats the purpose of automated scaling in the first place.

This type of failure occurs because Kubernetes scales workloads without provisioning the infrastructure to run them. Cluster Autoscaler (CA) closes that gap by scaling the cluster itself.

This guide explores what Cluster Autoscaler is, when to use it, basic implementation patterns across major cloud providers, and when specialized alternatives like Karpenter or vCluster provide a better architectural fit.

What Are the Kubernetes Horizontal Pod Autoscaler and Vertical Pod Autoscaler?

Before exploring how CA provisions nodes, you need to know how Kubernetes scales workloads.

Kubernetes provides two primary mechanisms for workload-level scaling:

  • The Horizontal Pod Autoscaler (HPA) adjusts the number of pod replicas based on observed metrics like CPU utilization, memory consumption, or custom application metrics.
  • The Vertical Pod Autoscaler (VPA) modifies the resource requests and limits of existing pods to match actual usage patterns, preventing over-provisioning or resource starvation.

Both autoscalers operate exclusively at the workload layer. HPA adds more pods when demand increases, while VPA right-sizes individual pods. However, they share a critical assumption: sufficient compute nodes must already exist in the cluster. When node capacity is exhausted, HPA cannot schedule new pod replicas, leaving them in a Pending state indefinitely. Similarly, VPA fails when pods require more resources than any single node can provide. This capacity limit reveals the problem with workload-only scaling.
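For a concrete sense of the workload layer, here's a minimal HPA created imperatively with kubectl (the deployment name and thresholds are placeholders):

kubectl autoscale deployment <deployment-name> \
 --cpu-percent=70 \
 --min=2 \
 --max=50

Once average CPU utilization across the pods exceeds 70 percent, HPA adds replicas up to the maximum; each new replica still needs a node with free capacity to land on.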

To address this scheduling bottleneck, you need to scale at the infrastructure layer, which is what CA helps you do.

What Is the Kubernetes Cluster Autoscaler?

CA dynamically adjusts the number of nodes in a Kubernetes cluster based on pod scheduling demands. When pods cannot be scheduled due to insufficient node capacity, CA provisions additional compute nodes. Conversely, it also removes underutilized nodes when workloads scale down, preventing wasted infrastructure spend.

CA Diagram, courtesy of Damaso Sanoja

As explained, HPA and VPA assume sufficient compute capacity already exists. CA fills this infrastructure gap by interacting with your cloud provider's API to provision or remove nodes automatically, maintaining optimal capacity without manual intervention.

Originally developed by the Kubernetes SIG Autoscaling team, CA now integrates with all major cloud providers, including AWS (via Auto Scaling Groups), GCP (native node pool integration), and Azure (through Virtual Machine Scale Sets). It supports both managed Kubernetes services like AWS EKS, GKE, and AKS, as well as self-managed clusters that expose compatible cloud APIs.

The relationship between Kubernetes' three autoscaling mechanisms becomes clearer when you examine their respective targets:

Scaling Type | Target | Purpose
Horizontal Pod Autoscaler (HPA) | Pods | Adds/removes pod replicas based on metrics like CPU/memory.
Vertical Pod Autoscaler (VPA) | Pods | Adjusts resource requests/limits for existing pods.
Cluster Autoscaler (CA) | Nodes | Adds/removes compute nodes at the infrastructure layer.

CA's node-level focus complements pod-level autoscaling. But does every cluster actually need this infrastructure-layer capability?

Why You Need Cluster Autoscaling

Fixed node pools force you into a trade-off. If you underprovision, traffic spikes leave pods unschedulable during marketing campaigns or batch processing runs. If you overprovision for peak capacity, you waste cloud spend on idle nodes during normal operations.

CA solves this by scaling infrastructure. It provisions nodes when pods can't be scheduled, then removes them when utilization drops. You maintain service availability during demand spikes without paying for idle capacity during quiet periods.

When to Use Cluster Autoscaler

CA works best when you can't predict capacity needs or when your workloads exhibit significant temporal variability. It's a great fit for workloads with unpredictable traffic patterns, dev and staging environments where you want to minimize costs, or applications with different reliability needs running on the same cluster.

The following are the most common patterns where node-level autoscaling becomes operationally necessary:

  • Dynamic workload environments: Applications with diurnal traffic patterns (regional business hours spanning multiple time zones), event-driven architectures processing webhook bursts, or seasonal workloads like tax processing systems that experience predictable annual spikes.
  • Cost optimization in non-production environments: Development and staging clusters that remain idle outside deployment windows, test environments active only during business hours, or tenant-isolated SaaS platforms where customer activity concentrates in specific time zones.
  • Scheduled and predictive scaling: While CA operates reactively by monitoring pod scheduling failures, it complements proactive strategies like cron-triggered placeholder pods or Cluster Proportional Autoscaler for workloads with known capacity curves.
  • Multi-node group clusters: Production environments balancing cost against availability by running stateless workloads on spot instances (often around 70 percent cheaper, but interruptible) while keeping stateful services on on-demand nodes with guaranteed capacity.
  • CI/CD workloads: Build pipelines and test runners with ephemeral resource needs; agents spin up for 10-20 minute job executions, then terminate, leaving extended idle periods between builds.

CA Implementation Examples

CA implementation varies by cloud provider, with each offering different integration patterns and operational constraints. The following examples demonstrate basic setup procedures with a few major cloud providers.

AWS EKS

On AWS EKS, you deploy CA via Helm or Kubernetes manifests targeting EC2 Auto Scaling Groups (ASGs). The autoscaler requires an IAM role with permissions including autoscaling:SetDesiredCapacity, autoscaling:DescribeAutoScalingGroups, and ec2:DescribeInstances.

You can add the chart repository and deploy the autoscaler with the following:

helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
 --set autoDiscovery.clusterName=<cluster-name> \
 --set awsRegion=<region>

Critical requirement: ASGs must include specific resource tags (k8s.io/cluster-autoscaler/<cluster-name>: owned and k8s.io/cluster-autoscaler/enabled: true). Missing tags prevent autodiscovery, causing CA to ignore the ASG entirely. Additionally, using multiple ASGs with different instance types requires careful configuration of node affinity rules to prevent scheduling pods on incompatible nodes.
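For reference, those tags can be applied with the AWS CLI; a minimal sketch (the ASG and cluster names are placeholders):

aws autoscaling create-or-update-tags --tags \
 "ResourceId=<asg-name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true" \
 "ResourceId=<asg-name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/<cluster-name>,Value=owned,PropagateAtLaunch=true"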

Google Kubernetes Engine

Google Kubernetes Engine (GKE) provides native CA support through node pool configuration. You enable autoscaling when creating a node pool:

gcloud container node-pools create <pool-name> \
 --cluster=<cluster-name> \
 --enable-autoscaling \
 --min-nodes=1 \
 --max-nodes=10
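If the node pool already exists, autoscaling can be enabled with a cluster update instead (same placeholder bounds):

gcloud container clusters update <cluster-name> \
 --enable-autoscaling \
 --node-pool=<pool-name> \
 --min-nodes=1 \
 --max-nodes=10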

Operational consideration: Setting overly conservative min/max bounds (a high minimum or a low maximum) defeats CA's cost optimization benefits. Multiple autoscaling node pools can compete for workload placement, leading to inefficient resource distribution. GKE's autoscaler also respects pod disruption budgets, which can prevent scale-down operations if budgets are too restrictive.

Azure AKS

AKS integrates CA at the cluster level, requiring a managed identity or service principal with Virtual Machine Scale Set permissions:

az aks update \
 --resource-group <resource-group> \
 --name <cluster-name> \
 --enable-cluster-autoscaler \
 --min-count 1 \
 --max-count 10

Common pitfall: Mixing manual and autoscaled node pools in the same cluster creates ambiguous scaling behavior. AKS may scale the wrong pool if pod resource requests don't clearly map to node pool specifications. Additionally, the autoscaler's scale-down process respects system pod placement; nodes with kube-system pods often resist termination.
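To keep scaling behavior unambiguous, you can instead enable the autoscaler per node pool; a sketch with placeholder names and bounds:

az aks nodepool update \
 --resource-group <resource-group> \
 --cluster-name <cluster-name> \
 --name <nodepool-name> \
 --enable-cluster-autoscaler \
 --min-count 1 \
 --max-count 5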

When You Should Look Beyond CA

CA operates exclusively at the node group level, requiring predefined instance types within Auto Scaling Groups or node pools. This prevents dynamic selection of optimal instance types based on actual pod requirements.

Additionally, CA's total provisioning time includes both its scan interval (10 seconds by default, tunable to 60 seconds with minimal impact) and cloud provider node launch time, which typically exceeds two minutes. For workloads requiring sub-minute scaling responses, this combined delay becomes operationally significant.
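If you do tune the scan interval, it's passed as a flag on the CA deployment. With the Helm chart from the EKS example, a minimal sketch, assuming the chart's extraArgs passthrough:

helm upgrade cluster-autoscaler autoscaler/cluster-autoscaler \
 --reuse-values \
 --set extraArgs.scan-interval=30s

A longer interval reduces API churn at the cost of slower reaction to pending pods; it doesn't shorten the cloud provider's node launch time.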

Several alternatives address these constraints by rethinking node provisioning strategies.

Karpenter

CA vs. Karpenter diagram

Karpenter addresses CA's speed limitations by provisioning instances directly rather than scaling node groups. When pods become unschedulable, Karpenter selects optimal instance types from the cloud provider's catalog in real time, often completing provisioning in under 60 seconds (actual times vary based on instance availability and cloud provider responsiveness).

Originally built for AWS EKS, Karpenter now supports multiple cloud providers, though the AWS implementation remains the most mature. The architecture eliminates preconfigured node groups but requires learning Karpenter's custom resource model. For workloads needing sub-minute scaling or dynamic instance selection, Karpenter offers advantages over CA's design. A detailed comparison will be covered in a future article.

Virtual Kubelet and KEDA

Virtual Kubelet and Kubernetes Event-Driven Autoscaling (KEDA) offer complementary approaches to serverless Kubernetes.

  • Virtual Kubelet creates virtual nodes backed by serverless platforms like Azure Container Instances or AWS Fargate, eliminating node management.
  • KEDA enables event-driven autoscaling by monitoring sources like message queues, HTTP traffic, and database connections, then scaling pod replicas based on actual demand.

Used together, these tools enable workloads to scale from zero while running on serverless infrastructure. However, because serverless platforms prioritize tenant isolation and stateless execution models, pods face restricted network policies, limited persistent storage options, and cold start latencies that can exceed traditional node boot times when provisioning compute on demand.
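As a starting point, KEDA is commonly installed from its Helm chart; a minimal sketch, assuming the kedacore chart repository:

helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace

Scaling rules are then declared as ScaledObject resources that bind an event source (for example, a queue) to a target workload.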

vCluster Auto Nodes

vCluster Auto Nodes addresses tenant-isolated scaling challenges by virtualizing Kubernetes clusters within a host control plane cluster, then automatically provisioning nodes for each tenant cluster, using Karpenter internally, based on that tenant's workload demands. This isolation model prevents resource contention between teams while maintaining centralized infrastructure management.

Additionally, tenant clusters scale independently without affecting neighboring tenants, and administrators can enforce per-tenant resource quotas and autoscaling policies. However, the virtualization layer adds scheduling overhead and can complicate troubleshooting across tenant and host cluster boundaries.

This approach fits organizations running platform-as-a-service offerings or large-scale development environments requiring strong tenant isolation.

Building Your Autoscaling Strategy

CA solves the node-level scaling problem that pod autoscalers can't address. In this article, you learned how CA monitors pod scheduling failures and provisions nodes automatically, when to use it based on workload patterns, and how to implement it across AWS, GCP, and Azure. For single-tenant applications on managed Kubernetes, CA provides reliable node scaling with minimal operational overhead.

However, tenant-isolated platforms introduce complexity that CA wasn't designed to handle. When teams share infrastructure, you need per-tenant autoscaling with strong isolation and accurate cost attribution. vCluster's Auto Nodes feature extends Karpenter-powered autoscaling to tenant clusters, giving each tenant independent scaling policies while maintaining centralized infrastructure management across any environment: public cloud, private cloud, or bare metal.

Ready to explore autoscaling at scale? Check out vCluster's Auto Nodes documentation to see how virtual clusters scale tenant-isolated infrastructure.
