Version: v0.27 Stable
Limited vCluster Tenancy Configuration Support

This feature is only available when using the following worker node types:

  • Host Nodes

Configure NVIDIA KAI scheduler with vCluster

    This guide explains how to configure vCluster to work with the NVIDIA KAI scheduler. KAI is a Kubernetes scheduler that optimizes resource allocation for both GPU and CPU workloads, with advanced queuing, fractional GPU allocation, and topology awareness.

    NVIDIA KAI scheduler background

    NVIDIA KAI scheduler is the open source version of Run:AI's scheduler technology, which is why it uses podgroups.scheduling.run.ai CRDs. KAI was open sourced by NVIDIA in 2025 under the Apache 2.0 license.

    Prerequisites​

    • Administrator access to a Kubernetes cluster: See Accessing Clusters with kubectl for more information. Run the command kubectl auth can-i create clusterrole -A to verify that your current kube-context has administrative privileges.

      info

      To obtain a kube-context with admin access, ensure you have the necessary credentials and permissions for your Kubernetes cluster. This typically involves using kubectl config commands or authenticating through your cloud provider's CLI tools.

    • helm: Helm v3.10 is required for deploying the platform. Refer to the Helm Installation Guide if you need to install it.

    • kubectl: Kubernetes command-line tool for interacting with the cluster. See Install and Set Up kubectl for installation instructions.

    • KAI scheduler installed in the host cluster: the physical Kubernetes cluster where virtual clusters are deployed and run. The host cluster provides the infrastructure resources (CPU, memory, storage, networking) that virtual clusters use, while maintaining isolation between different virtual environments.

    Understand the challenge​

    By default, when syncing workload pods from a vCluster to the host cluster, vCluster sets ownership references on the synced pods. When using custom schedulers like KAI, this causes issues because:

    • KAI's pod-grouper controller watches for pods with schedulerName: kai-scheduler
    • The pod-grouper controller traverses the owner references to group related pods
    • For vCluster workloads, the owner reference is set to the vCluster service in the host namespace
    • The pod-grouper may not have sufficient permissions to access these Service resources
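    To make the problem concrete, the metadata of a synced pod might carry an owner reference along these lines (the pod and service names below are placeholders, not taken from a real deployment):

    ```yaml
    # Hypothetical metadata on a workload pod after vCluster syncs it to the host.
    # The owner reference points at the vCluster Service, which KAI's pod-grouper
    # may lack permission to read when traversing owners.
    metadata:
      name: cpu-only-pod-x-default-x-my-vcluster   # placeholder synced-pod name
      ownerReferences:
        - apiVersion: v1
          kind: Service
          name: my-vcluster                        # the vCluster service in the host namespace
    ```

    With `setOwner: false` (described below), this `ownerReferences` entry is simply not added.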

    Solution: Configure vCluster for KAI scheduler​

    To integrate vCluster with the KAI scheduler, a single configuration change is required:

    Turn off owner references so that KAI's pod-grouper can process synced pods properly.

    Add the following to your vCluster configuration (values.yaml for Helm or vCluster config):

    vCluster configuration for KAI scheduler
    experimental:
      syncSettings:
        setOwner: false

    This configuration avoids owner reference traversal issues by not setting owner references on synced pods. The KAI scheduler's pod-grouper controller then creates PodGroup resources in the host cluster automatically.

    Apply this configuration when creating or updating your vCluster:

    Create vCluster with KAI scheduler configuration
    vcluster create my-vcluster --values kai-scheduler-values.yaml

    Or for an existing vCluster, update the configuration:

    Update existing vCluster configuration
    vcluster create --upgrade my-vcluster --values kai-scheduler-values.yaml

    Where kai-scheduler-values.yaml contains the configuration shown previously.

    Use KAI scheduler for vCluster workloads​

    After applying this configuration, you can use the KAI scheduler for your vCluster workloads by specifying schedulerName: kai-scheduler in the pod specification.

    Example: CPU-only workload​

    Here's an example of a CPU-only pod using the KAI scheduler:

    CPU-only pod with KAI scheduler
    apiVersion: v1
    kind: Pod
    metadata:
      name: cpu-only-pod
    spec:
      schedulerName: kai-scheduler
      containers:
        - name: main
          image: ubuntu
          args:
            - sleep
            - infinity
          resources:
            requests:
              cpu: 100m
              memory: 250M

    Example: GPU workload​

    Here's an example of a GPU pod using the KAI scheduler:

    GPU pod with KAI scheduler
    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-pod
    spec:
      schedulerName: kai-scheduler
      containers:
        - name: main
          image: ubuntu
          command:
            - /bin/bash
            - -c
            - nvidia-smi && sleep infinity
          resources:
            requests:
              nvidia.com/gpu: '1'
            limits:
              nvidia.com/gpu: '1'

    Example: Fractional GPU workload​

    KAI scheduler supports GPU sharing through fractional allocation. Here's an example using half a GPU:

    Fractional GPU pod with KAI scheduler
    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-sharing-pod
      labels:
        kai.scheduler/queue: test # Optional: specify queue
      annotations:
        gpu-fraction: "0.5" # Request half of a GPU
    spec:
      schedulerName: kai-scheduler
      containers:
        - name: main
          image: ubuntu
          args: ["sleep", "infinity"]

    GPU sharing notes
    • GPU sharing must be enabled during KAI installation with --set "global.gpuSharing=true"
    • KAI doesn't enforce memory isolation - applications should respect their allocation
    • You can also use gpu-memory: "2000" annotation to request specific MiB
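    As a sketch of the gpu-memory alternative mentioned above, the fractional pod could instead request a fixed amount of GPU memory (the pod name and the 2000 MiB value here are illustrative, not prescribed by KAI):

    ```yaml
    # Hypothetical pod requesting 2000 MiB of GPU memory rather than a fraction.
    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-memory-pod
      annotations:
        gpu-memory: "2000" # Request 2000 MiB of GPU memory
    spec:
      schedulerName: kai-scheduler
      containers:
        - name: main
          image: ubuntu
          args: ["sleep", "infinity"]
    ```

    Use either gpu-fraction or gpu-memory on a given pod, not both.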

    Verify configuration​

    To verify that the KAI scheduler is properly configured with vCluster:

    • Confirm vCluster is using experimental.syncSettings.setOwner=false in its configuration

    • Check that pods are being scheduled by KAI scheduler:

      Verify pod is using KAI scheduler
      kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.schedulerName}'
    • Ensure the KAI scheduler components are running:

      Check KAI scheduler components
      kubectl get pods -n kai-scheduler
    Scheduler behavior

    When using the KAI scheduler with vCluster and setOwner: false, you may observe:

    1. The pod-grouper adds a podgroup annotation to your pod:
       Annotations: pod-group-name: pg-pod-name-[uuid]
    2. You might see a non-blocking message in the logs:
       Detected pod with no owner but with podgroup annotation

    This is expected and doesn't affect scheduling.

    The setOwner: false configuration successfully resolves owner reference issues and allows the KAI scheduler to work properly with vCluster workloads in production environments.

    How it works​

    The KAI scheduler's pod-grouper component is a controller that:

    • Watches pods with schedulerName: kai-scheduler
    • Traverses the owner references to find the topmost owner
    • Groups related pods for optimal scheduling decisions
    • Applies custom scheduling logic based on workload type

    When vCluster syncs pods to the host cluster, it sets an owner reference to the vCluster service by default. By disabling this with setOwner: false, the pod-grouper can process the pods normally without needing to follow service references.
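    For reference, the PodGroup objects created by the pod-grouper use the podgroups.scheduling.run.ai CRD mentioned earlier. A created object might look roughly like the sketch below; the API version and field names can vary between KAI releases, so treat this as illustrative only:

    ```yaml
    # Illustrative PodGroup as created by KAI's pod-grouper (fields are assumptions).
    apiVersion: scheduling.run.ai/v2alpha2
    kind: PodGroup
    metadata:
      name: pg-cpu-only-pod-<uuid>   # matches the pod-group-name annotation on the pod
    spec:
      minMember: 1                   # minimum number of pods that must be schedulable together
    ```

    You can list these objects in the host cluster to confirm the pod-grouper is working.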

    Limitations and considerations​

    • The experimental.syncSettings.setOwner: false configuration is marked as experimental and may change in future vCluster releases
    • If you have other features that rely on pod ownership in vCluster, disabling owner references may affect those features
    • The KAI scheduler might require additional configuration for advanced features like GPU sharing and queue management
    • For GPU workloads, ensure that the host cluster has the necessary GPU drivers and device plugins installed

    Troubleshoot common issues​

    Common problems​

    1. GPU sharing pods stuck in pending:

      • Verify GPU sharing is enabled: --set "global.gpuSharing=true" during KAI installation
      • Check if reservation pods are created in kai-resource-reservation namespace
    2. Pod-grouper permission errors:

      • With setOwner: false, these should not occur
      • If they persist, check KAI scheduler logs: kubectl logs -n kai-scheduler -l component=scheduler