
Control plane sizing and performance

This page explains how to evaluate and tune a Tenant Cluster's Virtual Control Plane for your environment. It covers the evaluation framework, monitoring thresholds, resource profile starting points, and validated tuning settings. For the test methodology and analysis behind the profiles, see Benchmark methodology.

Scope and caveats

Control plane only

The numbers on this page describe the capacity of the Virtual Control Plane (API server, controller manager, scheduler, and embedded etcd) under simulated load. They don't describe data-plane capacity, networking throughput, storage backend behavior, or the runtime cost of your actual workloads.

The load is simulated, not real. Tests use KWOK fake nodes with no real container runtime or kubelet. kube-burner drives object lifecycles to stress the control plane's bookkeeping, watch fan-out, and etcd write path. The tests don't cover real pod startup and shutdown latency, CNI throughput, container image pulls, NetworkPolicy enforcement, or application behavior. Treat these results as an upper bound: if your workload has significant data-plane demands, its true ceiling is lower.

There is no published mapping from KWOK fake-node counts to real-node counts. Real workloads have data-plane costs that KWOK doesn't model. If your customer-visible claim is a node count, validate it on real nodes at the scale that matters.

How to evaluate your environment

The recommended approach is to run the benchmark framework against a representative subset of your real configuration: your operators, your CRDs, your object density. Measure the control-plane response as it runs. The same SLI checks the framework uses internally make a portable monitoring template you can apply to any Tenant Cluster.

Run the framework

The framework is open source: loft-sh/vCluster-Control-Plane-Load-Testing. It includes per-profile EKS Terraform modules, kube-burner workload configurations, the 13-SLI evaluation checks, and the doubling-ladder and binary-search scripts. See Benchmark methodology for test setup details.

Customize the workload (kube-burner/configs/) to match your tenant — your CRD mix, your operator density, your churn pattern — then run the ladder against your candidate profile.
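
For illustration, here is a minimal kube-burner job in the style of those configs; the job name, namespace, template path, and counts are placeholders, not values from the framework:

kube-burner job — illustrative churn workload (sketch)
jobs:
  - name: churn-deployments            # placeholder job name
    jobIterations: 50                  # one namespace per iteration
    namespacedIterations: true
    namespace: churn
    qps: 20                            # client-side request pacing toward the tenant API server
    burst: 40
    churn: true                        # periodically delete and recreate a slice of the objects
    churnPercent: 10
    churnDuration: 30m
    objects:
      - objectTemplate: templates/deployment.yaml   # placeholder template path
        replicas: 2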

What to monitor

These are the 13 conditions the framework evaluates after each stage. They also make a useful production monitoring set:

| # | Condition | Threshold |
|---|-----------|-----------|
| 1 | Control plane replica Ready | Any pod not Ready → fail |
| 2a | Container restart count | > 2 → fail |
| 2b | OOMKilled | Any event → fail |
| 3a | Sustained 429 responses | > 0.1 req/s → fail |
| 3b | Sustained 5xx responses | > 0.1 req/s → fail |
| 3c | Sustained request terminations | > 0.1 req/s → fail |
| 4 | API P99 latency (non-watch) | > 2 s sustained → fail |
| 5a | etcd leader lost | Any interval → fail |
| 5b | etcd leader changes | Any → fail |
| 5c | etcd proposals failed | Any → fail |
| 5d | etcd WAL fsync P99 | > 25 ms → warn |
| 5e | etcd DB size | > 4 GB → warn; > 12 GB → fail |
| 6 | Node convergence | < 98% of target N → fail |

If your workload trips any condition against a candidate profile, scale up to the next profile or apply the tuning settings below. For production alert thresholds, see Operational guidance.
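
As a sketch of how two of these conditions map to alerts, assuming Prometheus scrapes the tenant API server and etcd (the group and alert names are illustrative; the metric names are the standard upstream ones):

prometheus-rules.yaml — conditions 4 and 5d as alert rules (sketch)
groups:
  - name: tenant-control-plane-sli
    rules:
      - alert: TenantAPIP99LatencyHigh
        # Condition 4: non-watch API P99 latency above 2 s, sustained
        expr: |
          histogram_quantile(0.99, sum by (le) (
            rate(apiserver_request_duration_seconds_bucket{verb!="WATCH"}[5m])
          )) > 2
        for: 10m
      - alert: TenantEtcdWALFsyncSlow
        # Condition 5d: etcd WAL fsync P99 above the 25 ms warn threshold
        expr: |
          histogram_quantile(0.99, sum by (le) (
            rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])
          )) > 0.025
        for: 10m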

When to validate on real nodes

Synthetic testing covers the control-plane portion of sizing. Real validation is needed for:

  • Pod startup latency SLOs — image pulls, scheduling under contention, and kubelet binding latency are not modeled by KWOK.
  • Network programming p99 — Service, EndpointSlice, kube-proxy, and CNI interactions are not modeled.
  • CRD-heavy operator workloads — each CRD adds a separate watch cache and informer that KWOK doesn't exercise.
  • Real container runtime behavior — image pulls, probes, and evictions create event storms that synthetic load cannot reproduce.

If your customer-visible claim is a node count or pod count, validate it on real nodes at the scale that matters.

Choose a resource profile

The following profiles are per-replica resource limits for the Virtual Control Plane StatefulSet. Always run 3 replicas in HA mode for any production Tenant Cluster.

| Scale | Profile | CPU (cores) | Memory | Notes |
|-------|---------|-------------|--------|-------|
| Dev / sandbox | P1 | 1 | 2 GiB | Light, single-team CI environments |
| Small team | P2 | 2 | 4 GiB | Up to a few hundred long-running services |
| Standard production | P3 | 4 | 8 GiB | Most production tenants; recommended starting point |
| Large production | P4 | 8 | 16 GiB | Heavy churn or dense deployments |
| Enterprise / AI Cloud | P5 | 16 | 32 GiB | High-density GPU clusters; operator-rich workloads (cert-manager, external-secrets, GPU Operator, Argo, KServe, etc.) |
| Hyperscale | P6 | 32 | 64 GiB | Latency headroom above P5; same node ceiling, marginal API p99 improvement |

Key principles:

  • P3 (4 CPU / 8 GiB) is the right default for most production Tenant Clusters.
  • Below 4 CPU, node capacity scales roughly linearly with CPU. The control plane saturates CPU before the etcd write path becomes the constraint.
  • Above 4 CPU, the ceiling flattens. The etcd write path becomes the binding constraint, not CPU. Additional resources buy latency headroom, not more nodes.
  • Step up to P5 when you have operator-rich workloads (20+ controllers issuing persistent watches) or are hitting latency-sensitive SLOs under sustained churn.
  • Memory scales with object count, not node count. Pod, Deployment, and Service count drives etcd DB size. Plan memory based on expected object density, not node count alone.
  • Always run 3-replica HA. A single replica can't tolerate even a brief etcd compaction stall without impacting tenant API availability.

To configure a resource profile, set controlPlane.statefulSet.resources in vcluster.yaml:

vcluster.yaml — P5 resource profile, 3-replica HA
controlPlane:
  statefulSet:
    highAvailability:
      replicas: 3
    resources:
      limits:
        cpu: "16"
        memory: 32Gi
      requests:
        cpu: "16"
        memory: 32Gi

Tune your control plane

We validated these settings across all profiles under sustained synthetic load. The same knobs apply to real workloads, though gains depend on your traffic mix.

Go garbage collection pacing

Set on the Virtual Control Plane container:

vcluster.yaml — Go GC tuning (P5 example)
controlPlane:
  statefulSet:
    env:
      - name: GOGC
        value: "200" # default is 100; doubling halves GC overhead
      - name: GOMEMLIMIT
        value: "24000MiB" # ~75% of the pod memory limit

GOGC=200 halves GC overhead at the cost of slightly higher steady-state memory use. GOMEMLIMIT prevents OOM events by giving the Go runtime a hard ceiling to plan against. This combination meaningfully reduces API p99 latency under sustained churn. Set GOMEMLIMIT to about 75% of your profile's memory limit. For example, use 12000MiB for a P4 16 GiB pod, or 6000MiB for a P3 8 GiB pod.

etcd storage class

Embedded etcd is sensitive to WAL fsync latency. Use a high-IOPS StorageClass for the etcd PersistentVolume:

vcluster.yaml — etcd storage
controlPlane:
  backingStore:
    etcd:
      embedded:
        enabled: true
  statefulSet:
    persistence:
      volumeClaim:
        storageClass: <your-high-iops-storageclass>
        size: 100Gi

Create a StorageClass with at least 16K IOPS and 500 MB/s throughput. Generic cloud defaults push WAL fsync p99 into the 25–50 ms range under heavy churn, above the 25 ms warn threshold. For example, AWS gp3 defaults to 3K IOPS and 125 MB/s. A high-IOPS class keeps WAL fsync p99 well within the warn threshold even under sustained load.
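
On AWS with the EBS CSI driver, for example, a gp3 class provisioned above those floors might look like the following sketch (the class name is illustrative):

storageclass.yaml — illustrative high-IOPS gp3 StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: etcd-high-iops        # illustrative name; reference it from vcluster.yaml above
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "16000"               # gp3 default is 3K
  throughput: "500"           # MB/s; gp3 default is 125
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true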

API server concurrency

The Kubernetes defaults (400 / 200) are sufficient for most workloads. For operator-rich Tenant Clusters with many persistent controllers, double them to absorb the extra LIST and WATCH bookkeeping during churn cycles:

vcluster.yaml — API server concurrency (k8s distro)
controlPlane:
  distro:
    k8s:
      apiServer:
        extraArgs:
          - --max-requests-inflight=800
          - --max-mutating-requests-inflight=400

Controller manager and scheduler — client throttling

Both components expose --kube-api-qps and --kube-api-burst settings. The upstream defaults (20 QPS / 30 burst for the controller manager, 50 / 100 for the scheduler) are too conservative under heavy churn:

vcluster.yaml — controller manager and scheduler QPS
controlPlane:
  distro:
    k8s:
      controllerManager:
        extraArgs:
          - --kube-api-qps=200
          - --kube-api-burst=400
      scheduler:
        extraArgs:
          - --kube-api-qps=200
          - --kube-api-burst=400

Syncer outbound rate limits

The vCluster syncer reflects Tenant Cluster objects to the Control Plane Cluster. Under heavy churn, its outbound API rate becomes the next bottleneck after the tenant API server. Set the rate limits using environment variables on the control plane pod:

vcluster.yaml — syncer outbound rate limits
controlPlane:
  statefulSet:
    env:
      - name: VCLUSTER_PHYSICAL_CLIENT_QPS
        value: "400"
      - name: VCLUSTER_PHYSICAL_CLIENT_BURST
        value: "800"

Rate-limit alignment

For high-density Tenant Clusters, raise all three rate-limit stacks together. Lifting one without the others shifts the bottleneck rather than removing it:

| Stack | QPS | Burst |
|-------|-----|-------|
| Tenant controller manager / scheduler → tenant API server | 200 | 400 |
| Syncer → Control Plane Cluster API server | 400 | 800 |
| Client (kubectl, CI pipelines, etc.) | 100 | 200 |

Bootstrap replica order

When provisioning a new HA Tenant Cluster with embedded etcd, bring up replica 0 first. Wait for it to become Ready before scaling to 3. This avoids a race condition where all three replicas try to become the etcd cluster founder simultaneously. The vCluster CLI handles this automatically. In self-managed Helm pipelines, handle it explicitly.
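
One way to make that explicit in a Helm pipeline is a two-step values change, sketched here under the assumption that your pipeline can wait on pod readiness between steps:

vcluster.yaml — two-step HA bootstrap (sketch)
# Step 1: install with a single replica so replica 0 founds the embedded etcd cluster
controlPlane:
  statefulSet:
    highAvailability:
      replicas: 1
---
# Step 2: once replica 0 is Ready, upgrade the release to the full HA count
controlPlane:
  statefulSet:
    highAvailability:
      replicas: 3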

Operational guidance

Plan Control Plane Cluster node capacity. A P5 Virtual Control Plane runs comfortably on a c6a.8xlarge node. P6 needs c6a.16xlarge or larger. Karpenter with consolidationPolicy: WhenEmpty and a 5-minute consolidateAfter interval works well in steady state.
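
A sketch of that Karpenter configuration as a v1 NodePool follows; the pool name, instance-type list, and EC2NodeClass reference are assumptions for illustration:

nodepool.yaml — illustrative Karpenter NodePool for Control Plane Cluster nodes
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: tenant-control-plane        # placeholder pool name
spec:
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 5m
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default               # placeholder EC2NodeClass
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["c6a.8xlarge", "c6a.16xlarge"]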

Alert on the following control-plane metrics in production:

| Metric | Alert threshold |
|--------|-----------------|
| API p99 latency | > 1 s |
| API 5xx rate | > 0.1 req/s |
| etcd WAL fsync p99 | > 25 ms |
| etcd DB size | > 4 GB (warn) / 12 GB (fail) |
| Dropped watches per second | > 0 sustained |
| Node convergence ratio | < 98% |

Tenant isolation is per Tenant Cluster. Each Tenant Cluster is fully isolated, so control-plane sizing is per tenant, not aggregate. A single Control Plane Cluster can host many Tenant Clusters concurrently, within its own capacity limits. When planning, multiply the per-tenant profile by your expected tenant count.