Limited vCluster Tenancy Configuration Support

This feature is only available when using the following worker node types:

Host Nodes

Configure NVIDIA KAI scheduler with vCluster

This guide explains how to configure vCluster to work with the NVIDIA KAI scheduler. KAI is a Kubernetes scheduler that optimizes resource allocation for both GPU and CPU workloads, with advanced queuing, fractional GPU allocation, and topology awareness.

NVIDIA KAI scheduler background

NVIDIA KAI scheduler is the open source version of Run:AI's scheduler technology, which is why it uses podgroups.scheduling.run.ai CRDs. KAI was open sourced by NVIDIA in 2025 under the Apache 2.0 license.

Prerequisites

Administrator access to a Kubernetes cluster: See Accessing Clusters with kubectl for more information. Run the command kubectl auth can-i create clusterrole -A to verify that your current kube-context has administrative privileges.

Info: To obtain a kube-context with admin access, ensure you have the necessary credentials and permissions for your Kubernetes cluster. This typically involves using kubectl config commands or authenticating through your cloud provider's CLI tools.
helm: Helm v3.10 is required for deploying the platform. Refer to the Helm Installation Guide if you need to install it.
kubectl: Kubernetes command-line tool for interacting with the cluster. See Install and Set Up kubectl for installation instructions.

KAI scheduler installed in the host cluster.

Understand the challenge

By default, when syncing workload pods from a vCluster to the host cluster, vCluster sets owner references on the synced pods. With custom schedulers like KAI, this causes issues because:

- KAI's pod-grouper controller watches for pods with schedulerName: kai-scheduler
- The pod-grouper controller traverses the owner references to group related pods
- For vCluster workloads, the owner reference points to the vCluster Service in the host namespace
- The pod-grouper may not have sufficient permissions to access these Service resources

Solution: Configure vCluster for KAI scheduler

To integrate vCluster with the KAI scheduler, turn off owner references so that KAI's pod-grouper can process the synced pods.

Add the following to your vCluster configuration (values.yaml for Helm or vCluster config):

vCluster configuration for KAI scheduler
```
experimental:
  syncSettings:
    setOwner: false
```

This configuration prevents owner reference traversal issues because vCluster no longer sets owner references on synced pods. The KAI scheduler's pod-grouper controller then creates PodGroup resources in the host cluster automatically.

Apply this configuration when creating or updating your vCluster:

Create vCluster with KAI scheduler configuration
```
vcluster create my-vcluster --values kai-scheduler-values.yaml
```

Or for an existing vCluster, update the configuration:

Update existing vCluster configuration
```
vcluster create --upgrade my-vcluster --values kai-scheduler-values.yaml
```

In both commands, kai-scheduler-values.yaml contains the configuration shown previously.
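
One way to produce that file is with a shell heredoc. This is a minimal sketch: the file name matches the commands above, and the content is the setOwner configuration from this guide with nothing else added.

```shell
# Write the vCluster values file referenced by the vcluster create commands.
cat > kai-scheduler-values.yaml <<'EOF'
experimental:
  syncSettings:
    setOwner: false
EOF

# Print the file to confirm its contents before passing it to vcluster.
cat kai-scheduler-values.yaml
```

If you already maintain a values file for your vCluster, merge the experimental.syncSettings block into it instead of overwriting the file.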

Use KAI scheduler for vCluster workloads

After applying this configuration, you can use the KAI scheduler for your vCluster workloads by specifying schedulerName: kai-scheduler in the pod specification.

Example: CPU-only workload

Here's an example of a CPU-only pod using the KAI scheduler:

CPU-only pod with KAI scheduler
```
apiVersion: v1
kind: Pod
metadata:
  name: cpu-only-pod
spec:
  schedulerName: kai-scheduler
  containers:
  - name: main
    image: ubuntu
    args:
    - sleep
    - infinity
    resources:
      requests:
        cpu: 100m
        memory: 250M
```

Example: GPU workload

Here's an example of a GPU pod using the KAI scheduler:

GPU pod with KAI scheduler
```
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  schedulerName: kai-scheduler
  containers:
  - name: main
    image: ubuntu
    command:
    - /bin/bash
    - -c
    - nvidia-smi && sleep infinity
    resources:
      requests:
        nvidia.com/gpu: '1'
      limits:
        nvidia.com/gpu: '1'
```

Example: Fractional GPU workload

KAI scheduler supports GPU sharing through fractional allocation. Here's an example using half a GPU:

Fractional GPU pod with KAI scheduler
```
apiVersion: v1
kind: Pod
metadata:
  name: gpu-sharing-pod
  labels:
    kai.scheduler/queue: test  # Optional: specify queue
  annotations:
    gpu-fraction: "0.5"  # Request half of a GPU
spec:
  schedulerName: kai-scheduler
  containers:
  - name: main
    image: ubuntu
    args: ["sleep", "infinity"]
```

GPU sharing notes

- GPU sharing must be enabled during KAI installation with --set "global.gpuSharing=true"
- KAI doesn't enforce GPU memory isolation, so applications must stay within their own allocation
- As an alternative to gpu-fraction, you can use the gpu-memory: "2000" annotation to request a specific amount of GPU memory in MiB
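
The gpu-memory variant can be sketched by adapting the fractional GPU example. The snippet below assumes the gpu-memory annotation goes in the same place as gpu-fraction; the pod name gpu-memory-pod and the file name gpu-memory-pod.yaml are placeholders chosen for this example.

```shell
# Write a pod manifest that requests GPU memory by size instead of by fraction.
# The pod name and file name are placeholders, not names required by KAI.
cat > gpu-memory-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-memory-pod
  labels:
    kai.scheduler/queue: test  # Optional: specify queue
  annotations:
    gpu-memory: "2000"  # Request 2000 MiB of GPU memory
spec:
  schedulerName: kai-scheduler
  containers:
  - name: main
    image: ubuntu
    args: ["sleep", "infinity"]
EOF
```

You could then apply the manifest inside the vCluster with kubectl apply -f gpu-memory-pod.yaml.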

Verify configuration

To verify that the KAI scheduler is properly configured with vCluster:

Confirm vCluster is using experimental.syncSettings.setOwner=false in its configuration
Check that pods are being scheduled by KAI scheduler:
Verify pod is using KAI scheduler
```
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.schedulerName}'
```
Ensure the KAI scheduler components are running:
Check KAI scheduler components
```
kubectl get pods -n kai-scheduler
```

Scheduler behavior

When using the KAI scheduler with vCluster and setOwner: false, you may observe:

The pod-grouper adds a podgroup annotation to your pod:

Annotations: pod-group-name: pg-pod-name-[uuid]

You might see a non-blocking message in the logs:

Detected pod with no owner but with podgroup annotation

This is expected and doesn't affect scheduling.

The setOwner: false configuration successfully resolves owner reference issues and allows the KAI scheduler to work properly with vCluster workloads in production environments.

How it works

The KAI scheduler's pod-grouper component is a controller that:

- Watches pods with schedulerName: kai-scheduler
- Traverses the owner references to find the topmost owner
- Groups related pods for optimal scheduling decisions
- Applies custom scheduling logic based on workload type

When vCluster syncs pods to the host cluster, it sets an owner reference to the vCluster service by default. By disabling this with setOwner: false, the pod-grouper can process the pods normally without needing to follow service references.
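
To see this on a synced pod, you can inspect its owner references in the host cluster. The sketch below wraps kubectl get with a jsonpath query; the function name show_owner_refs is hypothetical, and the pod and namespace arguments are whatever names the synced pod has in the host cluster. With setOwner: false, the output should be empty for synced pods.

```shell
# Hypothetical helper: print the ownerReferences of a pod in the host cluster.
# Arguments: $1 = pod name in the host cluster, $2 = host namespace.
show_owner_refs() {
  kubectl get pod "$1" -n "$2" -o jsonpath='{.metadata.ownerReferences}'
}

# Example call (replace the placeholders, run with the host kube-context):
#   show_owner_refs <host-pod-name> <host-namespace>
```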

Limitations and considerations

- The experimental.syncSettings.setOwner: false configuration is marked as experimental and may change in future vCluster releases
- If you have other features that rely on pod ownership in vCluster, disabling owner references may affect those features
- The KAI scheduler might require additional configuration for advanced features like GPU sharing and queue management
- For GPU workloads, ensure that the host cluster has the necessary GPU drivers and device plugins installed

Troubleshoot common issues

Common problems

GPU sharing pods stuck in pending:
- Verify GPU sharing is enabled: --set "global.gpuSharing=true" during KAI installation
- Check whether reservation pods are created in the kai-resource-reservation namespace
Pod-grouper permission errors:
- With setOwner: false, these should not occur
- If they persist, check KAI scheduler logs: kubectl logs -n kai-scheduler -l component=scheduler
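
For the GPU sharing case, the checks above can be combined into a small helper. This is a sketch under assumptions: the function name check_gpu_sharing is hypothetical, and it expects the pending pod's name and namespace as they appear in the host cluster.

```shell
# Hypothetical helper for pending GPU sharing pods, run against the host cluster.
# Arguments: $1 = pending pod name in the host cluster, $2 = host namespace.
check_gpu_sharing() {
  # List the reservation pods that KAI creates for shared GPUs.
  kubectl get pods -n kai-resource-reservation
  # Show the pending pod's details, including its scheduling events.
  kubectl describe pod "$1" -n "$2"
}

# Example call (replace the placeholders):
#   check_gpu_sharing <host-pod-name> <host-namespace>
```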
