Configure NVIDIA KAI scheduler with vCluster
This guide explains how to configure vCluster to work with the NVIDIA KAI scheduler. KAI is a Kubernetes scheduler that optimizes resource allocation for both GPU and CPU workloads, with advanced queuing, fractional GPU allocation, and topology awareness.
Prerequisites
- Administrator access to a Kubernetes cluster: See Accessing Clusters with kubectl for more information. To verify that your current kube-context has administrative privileges, run:
      kubectl auth can-i create clusterrole -A
  Info: To obtain a kube-context with admin access, ensure you have the necessary credentials and permissions for your Kubernetes cluster. This typically involves using kubectl config commands or authenticating through your cloud provider's CLI tools.
- helm: Helm v3.10 is required for deploying the platform. Refer to the Helm Installation Guide if you need to install it.
- kubectl: Kubernetes command-line tool for interacting with the cluster. See Install and Set Up kubectl for installation instructions.
- KAI scheduler installed in the host cluster
Understand the challenge
By default, when syncing workload pods from a vCluster to the host cluster, vCluster sets owner references on the synced pods. With custom schedulers like KAI, this causes issues because:
- KAI's pod-grouper controller watches for pods with schedulerName: kai-scheduler
- The pod-grouper controller traverses the owner references to group related pods
- For vCluster workloads, the owner reference is set to the vCluster service in the host namespace
- The pod-grouper may not have sufficient permissions to access these Service resources
Solution: Disable owner references in vCluster
To solve this issue, configure vCluster not to set owner references on synced pods by using the experimental.syncSettings.setOwner option. This approach allows the KAI scheduler to work with vCluster pods without requiring any modifications to the scheduler itself.
Add the following to your vCluster configuration (values.yaml for Helm or vCluster config):
    experimental:
      syncSettings:
        setOwner: false
Apply this configuration when creating your vCluster:
    vcluster create my-vcluster --set experimental.syncSettings.setOwner=false
Or, for an existing vCluster, update the configuration:
    vcluster create --upgrade my-vcluster --set experimental.syncSettings.setOwner=false
Use KAI scheduler for vCluster workloads
After applying this configuration, you can use the KAI scheduler for your vCluster workloads by specifying schedulerName: kai-scheduler in the pod specification.
Example: CPU-only workload
Here's an example of a CPU-only pod using the KAI scheduler:
    apiVersion: v1
    kind: Pod
    metadata:
      name: cpu-only-pod
    spec:
      schedulerName: kai-scheduler
      containers:
        - name: main
          image: ubuntu
          args:
            - sleep
            - infinity
          resources:
            requests:
              cpu: 100m
              memory: 250M
Example: GPU workload
Here's an example of a GPU pod using the KAI scheduler:
    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-pod
    spec:
      schedulerName: kai-scheduler
      containers:
        - name: main
          image: ubuntu
          command:
            - /bin/bash
            - -c
            - nvidia-smi && sleep infinity
          resources:
            requests:
              nvidia.com/gpu: '1'
            limits:
              nvidia.com/gpu: '1'
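Both examples above are bare pods, where schedulerName sits directly under spec. For controller-managed workloads the pod specification is nested inside a template instead. The following Python sketch shows where the field goes for common built-in kinds. The helper name with_kai_scheduler and the sample manifest are hypothetical and only illustrate the field paths; this is not part of vCluster or KAI.

```python
import copy

# Kinds whose pod spec lives under spec.template.spec.
TEMPLATE_KINDS = {"Deployment", "StatefulSet", "DaemonSet", "ReplicaSet", "Job"}


def with_kai_scheduler(manifest, scheduler_name="kai-scheduler"):
    """Return a copy of a manifest dict with schedulerName set on its pod spec.

    Hypothetical helper for illustration; the input dict is not modified.
    """
    result = copy.deepcopy(manifest)
    kind = result.get("kind")
    spec = result.setdefault("spec", {})
    if kind == "Pod":
        pod_spec = spec
    elif kind in TEMPLATE_KINDS:
        pod_spec = spec.setdefault("template", {}).setdefault("spec", {})
    elif kind == "CronJob":
        job_spec = spec.setdefault("jobTemplate", {}).setdefault("spec", {})
        pod_spec = job_spec.setdefault("template", {}).setdefault("spec", {})
    else:
        raise ValueError(f"unsupported kind: {kind!r}")
    pod_spec["schedulerName"] = scheduler_name
    return result


# Hypothetical Deployment used only to show the nested path.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "demo"},
    "spec": {"template": {"spec": {"containers": [{"name": "main", "image": "ubuntu"}]}}},
}
patched = with_kai_scheduler(deployment)
print(patched["spec"]["template"]["spec"]["schedulerName"])
```

The helper returns a deep copy so the original manifest can still be applied unchanged to clusters that use the default scheduler.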
Verify configuration
To verify that the KAI scheduler is properly configured with vCluster:
- Confirm that vCluster is using experimental.syncSettings.setOwner=false in its configuration.
- Check that pods are being annotated with the podgroup name:
      # Verify pod has podgroup annotation
      kubectl describe pod <pod-name> -n <namespace> | grep pod-group-name
- Ensure the KAI scheduler components are running:
      # Check KAI scheduler components
      kubectl get pods -n kai-scheduler
When using the KAI scheduler with vCluster and setOwner: false, you may observe:
- The pod-grouper adds a podgroup annotation to your pod:
      Annotations: pod-group-name: pg-pod-name-[uuid]
- You might see a non-blocking message in the logs:
      Detected pod with no owner but with podgroup annotation
  This is expected and doesn't affect scheduling.
The setOwner: false configuration successfully resolves owner reference issues and allows the KAI scheduler to work properly with vCluster workloads in production environments.
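The signals described above can also be checked programmatically. The sketch below is a hypothetical helper, not a vCluster or KAI tool: it inspects a pod object in the JSON shape that kubectl get pod -o json returns and reports the scheduler name, the kinds of any owner references, and the pod-group-name annotation. The sample pod and its annotation value are made up for illustration.

```python
import json


def check_synced_pod(pod):
    """Summarize scheduler, owner references, and podgroup annotation of a pod dict."""
    metadata = pod.get("metadata", {})
    annotations = metadata.get("annotations") or {}
    owners = metadata.get("ownerReferences") or []
    return {
        "uses_kai": pod.get("spec", {}).get("schedulerName") == "kai-scheduler",
        "owner_kinds": [owner.get("kind") for owner in owners],
        "pod_group": annotations.get("pod-group-name"),
    }


# Hypothetical stand-in for: kubectl get pod <pod-name> -n <namespace> -o json
sample = json.loads("""
{
  "kind": "Pod",
  "metadata": {
    "name": "gpu-pod",
    "annotations": {"pod-group-name": "pg-gpu-pod-example"}
  },
  "spec": {"schedulerName": "kai-scheduler"}
}
""")

report = check_synced_pod(sample)
print(report)
```

With setOwner: false in effect, owner_kinds is expected to be empty for synced pods and pod_group to be set once the pod-grouper has processed the pod. To run the check against a real pod, the sample could be replaced with json.load(sys.stdin) and the kubectl output piped in.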
How it works
The KAI scheduler's pod-grouper component is a controller that:
- Watches pods with schedulerName: kai-scheduler
- Traverses the owner references to find the topmost owner
- Groups related pods for optimal scheduling decisions
- Applies custom scheduling logic based on workload type
When vCluster syncs pods to the host cluster, it sets an owner reference to the vCluster service by default. By disabling this with setOwner: false, the pod-grouper can process the pods normally without needing to follow service references.
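The traversal described above can be modeled in a few lines. This is a simplified sketch of the idea, not KAI's actual implementation: it follows only the first owner reference at each level, and the lookup callback, the in-memory store, and all object names are hypothetical. It shows why a Service owner that the controller cannot read stops the walk, while a pod without owner references is simply its own top owner.

```python
def find_top_owner(obj, lookup):
    """Follow owner references upward until an object has none.

    lookup(kind, name) is a hypothetical callback that returns the owner
    object as a dict, or raises PermissionError when access is denied.
    """
    current = obj
    seen = set()
    while True:
        owners = current.get("metadata", {}).get("ownerReferences") or []
        if not owners:
            return current
        ref = owners[0]  # simplification: only the first owner is followed
        key = (ref["kind"], ref["name"])
        if key in seen:
            raise ValueError("owner reference cycle")
        seen.add(key)
        current = lookup(ref["kind"], ref["name"])


# Hypothetical in-memory objects standing in for the host cluster API.
store = {
    ("ReplicaSet", "web-abc"): {
        "kind": "ReplicaSet",
        "metadata": {"name": "web-abc",
                     "ownerReferences": [{"kind": "Deployment", "name": "web"}]},
    },
    ("Deployment", "web"): {"kind": "Deployment", "metadata": {"name": "web"}},
}


def lookup(kind, name):
    if kind == "Service":
        # Models the missing-permission case for the vCluster service.
        raise PermissionError(f"cannot get {kind}/{name}")
    return store[(kind, name)]


regular_pod = {"kind": "Pod", "metadata": {
    "name": "web-abc-1",
    "ownerReferences": [{"kind": "ReplicaSet", "name": "web-abc"}]}}
synced_with_owner = {"kind": "Pod", "metadata": {
    "name": "synced-pod",
    "ownerReferences": [{"kind": "Service", "name": "my-vcluster"}]}}
synced_without_owner = {"kind": "Pod", "metadata": {"name": "synced-pod"}}

print(find_top_owner(regular_pod, lookup)["kind"])
print(find_top_owner(synced_without_owner, lookup)["kind"])
```

Calling find_top_owner(synced_with_owner, lookup) raises PermissionError in this model, which corresponds to the default behavior that setOwner: false avoids.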
Limitations and considerations
- The experimental.syncSettings.setOwner: false configuration is marked as experimental and may change in future vCluster releases
- If you have other features that rely on pod ownership in vCluster, disabling owner references may affect those features
- The KAI scheduler might require additional configuration for advanced features like GPU sharing and queue management
- For GPU workloads, ensure that the host cluster has the necessary GPU drivers and device plugins installed