
Configure NVIDIA KAI scheduler with vCluster

This guide explains how to configure vCluster to work with the NVIDIA KAI scheduler. KAI is a Kubernetes scheduler that optimizes resource allocation for both GPU and CPU workloads, with advanced queuing, fractional GPU allocation, and topology awareness.

NVIDIA KAI scheduler background

NVIDIA KAI scheduler is the open source version of Run:AI's scheduler technology, which is why it uses the podgroups.scheduling.run.ai CRD. KAI was open sourced by NVIDIA in 2025 under the Apache 2.0 license.

Prerequisites

  • Administrator access to a Kubernetes cluster: See Accessing Clusters with kubectl for more information. Run the command kubectl auth can-i create clusterrole -A to verify that your current kube-context has administrative privileges.

    info

    To obtain a kube-context with admin access, ensure you have the necessary credentials and permissions for your Kubernetes cluster. This typically involves using kubectl config commands or authenticating through your cloud provider's CLI tools.

  • helm: Helm v3.10 or newer is required for deploying vCluster. Refer to the Helm Installation Guide if you need to install it.

  • kubectl: Kubernetes command-line tool for interacting with the cluster. See Install and Set Up kubectl for installation instructions.

  • KAI scheduler installed in the host cluster. The host cluster is the physical Kubernetes cluster where virtual clusters are deployed and run; it provides the infrastructure resources (CPU, memory, storage, networking) that virtual clusters use, while maintaining isolation between different virtual environments.
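If KAI is not yet present, it can be installed into the host cluster with Helm. The OCI chart reference and flags below are a sketch based on the upstream KAI scheduler project; verify the exact chart path and version against the KAI scheduler releases before use:

```shell
# Sketch: install the KAI scheduler into the host cluster.
# The OCI chart reference is an assumption -- confirm it against the
# NVIDIA KAI scheduler releases for your version.
helm upgrade --install kai-scheduler \
  oci://ghcr.io/nvidia/kai-scheduler/kai-scheduler \
  --namespace kai-scheduler --create-namespace \
  --set "global.gpuSharing=true"  # optional, required for the fractional GPU examples below
```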

Understand the challenge

By default, when syncing workload pods from a vCluster to the host cluster, vCluster sets ownership references on the synced pods. When using custom schedulers like KAI, this causes issues because:

  • KAI's pod-grouper controller watches for pods with schedulerName: kai-scheduler
  • The pod-grouper controller traverses the owner references to group related pods
  • For vCluster workloads, the owner reference is set to the vCluster service in the host namespace
  • The pod-grouper may not have sufficient permissions to access these Service resources
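Concretely, a pod synced to the host namespace carries an owner reference along these lines (the pod and vCluster names here are illustrative, not taken from a real cluster):

```yaml
# Metadata of a synced pod in the host namespace (illustrative names)
metadata:
  name: cpu-only-pod-x-default-x-my-vcluster
  ownerReferences:
    - apiVersion: v1
      kind: Service   # points at the vCluster service, not a workload controller
      name: my-vcluster
```

Because the owner is a Service rather than a workload controller, KAI's pod-grouper either fails to resolve the chain or needs extra RBAC to read Services.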

Solution: Configure vCluster for KAI scheduler

To properly integrate vCluster with KAI scheduler, you need to configure two things:

  1. Turn off owner references to allow KAI's pod-grouper to process pods
  2. Sync PodGroup resources to prevent controller conflicts

Add the following to your vCluster configuration (values.yaml for Helm or vCluster config):

Enterprise-Only Feature

This feature is an Enterprise feature. See our pricing plans or contact our sales team for more information.

info

For more information on syncing custom resources, see the Custom resources documentation.

Complete vCluster configuration for KAI scheduler
experimental:
  syncSettings:
    setOwner: false

sync:
  fromHost:
    customResources:
      podgroups.scheduling.run.ai:
        enabled: true
        scope: Namespaced

This configuration:

  • Prevents owner reference traversal issues by not setting them on synced pods
  • Syncs PodGroup resources from the host cluster to avoid conflicts when KAI's controllers manage them

Apply this configuration when creating or updating your vCluster:

Create vCluster with KAI scheduler configuration
vcluster create my-vcluster --values kai-scheduler-values.yaml

Or for an existing vCluster, update the configuration:

Update existing vCluster configuration
vcluster create --upgrade my-vcluster --values kai-scheduler-values.yaml

Where kai-scheduler-values.yaml contains the configuration shown previously.

Use KAI scheduler for vCluster workloads

After applying this configuration, you can use the KAI scheduler for your vCluster workloads by specifying schedulerName: kai-scheduler in the pod specification.

Example: CPU-only workload

Here's an example of a CPU-only pod using the KAI scheduler:

CPU-only pod with KAI scheduler
apiVersion: v1
kind: Pod
metadata:
  name: cpu-only-pod
spec:
  schedulerName: kai-scheduler
  containers:
    - name: main
      image: ubuntu
      args:
        - sleep
        - infinity
      resources:
        requests:
          cpu: 100m
          memory: 250M

Example: GPU workload

Here's an example of a GPU pod using the KAI scheduler:

GPU pod with KAI scheduler
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  schedulerName: kai-scheduler
  containers:
    - name: main
      image: ubuntu
      command:
        - /bin/bash
        - -c
        - nvidia-smi && sleep infinity
      resources:
        requests:
          nvidia.com/gpu: '1'
        limits:
          nvidia.com/gpu: '1'

Example: Fractional GPU workload

KAI scheduler supports GPU sharing through fractional allocation. Here's an example using half a GPU:

Fractional GPU pod with KAI scheduler
apiVersion: v1
kind: Pod
metadata:
  name: gpu-sharing-pod
  labels:
    kai.scheduler/queue: test # Optional: specify queue
  annotations:
    gpu-fraction: "0.5" # Request half of a GPU
spec:
  schedulerName: kai-scheduler
  containers:
    - name: main
      image: ubuntu
      args: ["sleep", "infinity"]
GPU sharing notes

  • GPU sharing must be enabled during KAI installation with --set "global.gpuSharing=true"
  • KAI doesn't enforce memory isolation; applications should respect their allocation
  • You can also use the gpu-memory: "2000" annotation to request a specific amount of GPU memory in MiB
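As a sketch of the second option, a pod can request GPU memory by size instead of fraction (the pod name here is illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-memory-pod
  annotations:
    gpu-memory: "2000" # request 2000 MiB of GPU memory instead of a fraction
spec:
  schedulerName: kai-scheduler
  containers:
    - name: main
      image: ubuntu
      args: ["sleep", "infinity"]
```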

Verify configuration

To verify that the KAI scheduler is properly configured with vCluster:

  • Confirm vCluster is using experimental.syncSettings.setOwner=false in its configuration

  • Check that pods are being annotated with the podgroup name:

    Verify pod has podgroup annotation
    kubectl describe pod <pod-name> -n <namespace> | grep pod-group-name
  • Ensure the KAI scheduler components are running:

    Check KAI scheduler components
    kubectl get pods -n kai-scheduler
Scheduler behavior

When using the KAI scheduler with vCluster and setOwner: false, you may observe:

  1. The pod-grouper adds a podgroup annotation to your pod:

     Annotations: pod-group-name: pg-pod-name-[uuid]

  2. You might see a non-blocking message in the logs:

     Detected pod with no owner but with podgroup annotation

This is expected and doesn't affect scheduling.

The setOwner: false configuration resolves the owner reference issues described earlier and allows the KAI scheduler to work with vCluster workloads in production environments.

How it works

The KAI scheduler's pod-grouper component is a controller that:

  • Watches pods with schedulerName: kai-scheduler
  • Traverses the owner references to find the topmost owner
  • Groups related pods for optimal scheduling decisions
  • Applies custom scheduling logic based on workload type

When vCluster syncs pods to the host cluster, it sets an owner reference to the vCluster service by default. By disabling this with setOwner: false, the pod-grouper can process the pods normally without needing to follow service references.
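You can observe this directly on a synced pod in the host namespace; with setOwner: false the field should come back empty (the pod and namespace names are placeholders to fill in):

```shell
# Inspect the owner references of a synced pod in the host namespace.
# With setOwner: false this prints nothing.
kubectl get pod <synced-pod-name> -n <vcluster-namespace> \
  -o jsonpath='{.metadata.ownerReferences}'
```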

Limitations and considerations

  • The experimental.syncSettings.setOwner: false configuration is marked as experimental and may change in future vCluster releases
  • If you have other features that rely on pod ownership in vCluster, disabling owner references may affect those features
  • The KAI scheduler might require additional configuration for advanced features like GPU sharing and queue management
  • For GPU workloads, ensure that the host cluster has the necessary GPU drivers and device plugins installed

Troubleshoot common issues

PodGroup conflict errors

If you see errors like:

Operation cannot be fulfilled on podgroups.scheduling.run.ai "pg-gpu-sharing1-c1ee7e85-52ec-4e4d-b8da-5a579f89817c":
the object has been modified; please apply your changes to the latest version and try again

This indicates multiple controllers are trying to update the same PodGroup resource. Ensure you have:

  1. Configured PodGroup synchronization: The sync.fromHost.customResources configuration is essential to prevent conflicts
  2. Only one KAI scheduler instance: Multiple scheduler instances can cause conflicts
  3. Proper RBAC permissions: vCluster needs permissions to sync PodGroup resources
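A quick RBAC sanity check can be sketched with kubectl auth can-i; the service account name below assumes vCluster's default vc-&lt;name&gt; naming convention, so adjust it to your release:

```shell
# Verify the vCluster service account can read PodGroups in the host cluster.
# "vc-my-vcluster" is an assumed service account name -- adjust to your release.
kubectl auth can-i get podgroups.scheduling.run.ai \
  --as=system:serviceaccount:<vcluster-namespace>:vc-my-vcluster \
  -n <vcluster-namespace>
```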

Verify PodGroup synchronization

Check if PodGroups are being properly synced:

Check PodGroups in host cluster
kubectl get podgroups.scheduling.run.ai -A

# Check PodGroups inside the vCluster (run with your kube-context
# pointed at the vCluster, for example after `vcluster connect`)
kubectl get podgroups.scheduling.run.ai -A

Other problems

  1. Missing PodGroup CRD in vCluster:

    • Ensure KAI scheduler is installed in the host cluster before creating the vCluster
    • The CRD should be automatically copied when PodGroup sync is enabled
  2. GPU sharing pods stuck in pending:

    • Verify GPU sharing is enabled: --set "global.gpuSharing=true" during KAI installation
    • Check if reservation pods are created in kai-resource-reservation namespace
  3. Pod-grouper permission errors:

    • With setOwner: false, these should not occur
    • If they persist, check KAI scheduler logs: kubectl logs -n kai-scheduler -l component=scheduler
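To confirm the PodGroup CRD was copied into the virtual cluster (the first issue above), you can check from the vCluster context:

```shell
# Run against the vCluster kube-context (e.g. after `vcluster connect`)
kubectl get crd podgroups.scheduling.run.ai
```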