Skip to main content

Pod Scheduling

vcluster Pod Scheduling
vcluster - Pod Scheduling

Vcluster runs your workloads by replicating pods from the virtual cluster to the host cluster. We call this process synchronization or sync for short. This process is executed by the "syncer" component of the vCluster. To control how vCluster pods are scheduled on the host cluster, you may need to pass additional arguments to the syncer, or set certain helm chart values during vCluster installation and upgrade. Some of these options are described in the chapters below.

Separate vCluster Scheduler

By default, vCluster will reuse the scheduler of the host cluster to schedule workloads. This saves computing resources, but also has some limitations:

  1. Labeling nodes inside the virtual cluster has no effect on scheduling
  2. Draining or tainting nodes inside the virtual cluster has no effect on scheduling
  3. You cannot use custom schedulers inside the vCluster

Sometimes you want to label a node inside the vCluster to modify workload scheduling through features such as affinity or topology spreading. vCluster supports running a scheduler inside the virtual cluster instead of reusing the host cluster's scheduler. vCluster will then only sync pods that already have a node assigned to the host cluster.

You can enable the virtual scheduler via the values.yaml of vCluster:

sync:
nodes:
enabled: true
enableScheduler: true
# Either syncAllNodes or nodeSelector is required
syncAllNodes: true

Then create or upgrade a vCluster with:

vcluster create my-vcluster -f values.yaml

Now you can taint and label nodes inside the virtual cluster without actually modifying the host cluster nodes.

info

If the persistentvolumeclaims syncer is also enabled, relevant csistoragecapacity,csinode, and csidriver objects will be mirrored to the virtual cluster so the scheduler can make storage-aware scheduling decisions.

Reuse Host Scheduler

If you don't want to use a separate scheduler inside the vCluster, you can also customize to a certain degree how the host scheduler will schedule your virtual cluster workloads.

Using priority classes

If you need to use priority classes, you can enable this by adding the following to your values.yaml:

sync:
priorityclasses:
enabled: true

then create or upgrade the vCluster with:

vcluster create my-vcluster --upgrade -f values.yaml

This will pass the necessary flags to the "syncer" container and create or update the ClusterRole used by vCluster to include necessary permissions.

Limiting pod scheduling to selected nodes

Vcluster allows you to limit on which nodes the pods synced by vCluster will run. You can achieve this by combining --node-selector and --enforce-node-selector syncer flags. The --enforce-node-selector flag is enabled by default. When --enforce-node-selector flag is disabled, and a --node-selector is specified nodes will be synced based on the selector, as well as nodes running pod workloads.

When using vCluster helm chart or CLI, there are two options for setting the --node-selector flag.

This first option is recommended if you are not enabling node synchronization, and use the fake nodes, which are enabled by default. In such case, you would write a string representation of your node selector(e.g. "nodeLabel=labelValue") and set it as the value of --node-selector argument for syncer in your values.yaml:

syncer:
extraArgs:
- --node-selector=nodeLabel=labelValue

then create or upgrade the vCluster with:

vcluster create my-vcluster --upgrade -f values.yaml

This second option is recommended if you are enabling synchronization of the real nodes via helm values. This is how you would then set the selector in your values.yaml:

sync:
nodes:
enabled: true
nodeSelector: "nodeLabel=labelValue"

then create or upgrade the vCluster with:

vcluster create my-vcluster --upgrade -f values.yaml
info

When sync of the real nodes is enabled and nodeSelector is set, all nodes that match the selector will be synced into vCluster. Read more about Node sync modes on the Nodes documentation page.

Automatically applying tolerations to all pods synced by vCluster

Kubernetes has a concept of Taints and Tolerations, which is used for controlling scheduling. If you have a use case requiring all pods synced by vCluster to have a toleration set automatically, then you can achieve this with the --enforce-toleration syncer flag. You can pass multiple --enforce-toleration flags with different toleration expressions, and syncer will add them to every new pod that gets synced by vCluster.

This is how toleration is set in yaml format:

- key: "key1"
operator: "Equal"
value: "value1"
effect: "NoSchedule"

We will need to write this information in a string format to pass it as value of the --enforce-toleration flag.

The example above would be represented as key1=value1:NoSchedule. The Exists operator is written as - key1:Effect. And if you need to write a toleration with empty effect, here is an example for that - key1=value1, or just key if value is also empty. There is also a special case of a toleration that contains only operator Exists without effect or key, which would match every taint, and we express this as *.

You can set the --enforce-toleration flags as arguments for syncer in your values.yaml:

syncer:
extraArgs:
- --enforce-toleration=key1=value1:NoSchedule
- --enforce-toleration=anotherKey1=value2:NoSchedule
info

vCluster does not support setting the tolerationSeconds field of a toleration through the syntax that the --enforce-toleration flag uses. If your use case requires this, please raise an issue in the vCluster repo on GitHub.