vcluster can drastically increase security in multi-tenant clusters. It provides three security benefits out of the box:
- Full control-plane isolation with a separate API endpoint and data storage
- DNS isolation as vcluster workloads are not able to resolve any services of the host cluster
- Guarantee that all workloads, services and other namespaced objects are created in a single namespace in the host cluster. If deployed with default settings, vcluster also ensures that no access to any cluster scoped object is required.
Besides these benefits, vcluster by default will not provide any workload or network isolation. Starting with version v0.7.0, vcluster has a feature called isolated mode, which you can enable to prevent vcluster workloads from breaking out of their virtual environment.
In general, we recommend deploying a single vcluster into a namespace and then isolating that namespace, which is far easier than isolating multiple vclusters from each other in a single namespace.
vcluster offers a feature called isolated mode to automatically isolate workloads in a virtual cluster. Isolated mode can be enabled via the --isolate flag of vcluster create or through the helm chart values.
# Creates a new vcluster with isolated workloads
vcluster create my-vcluster --isolate
This feature imposes a couple of restrictions on vcluster workloads to make sure they do not break out of their virtual environment:
- vcluster enforces a Pod Security Standard at the syncer level, which means that, for example, pods that try to run as a privileged container or mount a host path will not be synced to the host cluster. The currently valid options are baseline (the default in isolated mode) and restricted. This works for every Kubernetes version regardless of Pod Security Standard support, because it is implemented in vcluster directly. Rejected pods stay pending in the vcluster, and in newer Kubernetes versions they are denied by the admission controller as well.
- vcluster deploys a resource quota as well as a limit range alongside the vcluster itself. This allows restricting the resource consumption of vcluster workloads. If enabled, sane defaults are chosen for these two resources.
- vcluster deploys a network policy alongside itself that will restrict access of vcluster workloads as well as the vcluster control plane to other pods in the host cluster. (only works if your host cluster CNI supports network policies)
You can adjust isolation settings through helm values. For the default values, check the chart's values.yaml.
If you are using the --isolate flag or isolated mode together with the --expose flag, make sure you raise isolation.resourceQuotas.quota.services.nodeports accordingly, as some LoadBalancer implementations rely on NodePorts.
By default, vcluster does not isolate any workloads in the host cluster and only ensures that they are deployed in the same namespace. However, workloads in a single namespace can be isolated with built-in Kubernetes features or with the isolated mode shown above.
Resource Quota & Limit Range
To ensure a vcluster will not consume too many resources in the host cluster, you can use a single ResourceQuota in the namespace where the virtual cluster is running.
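As a rough sketch, such a quota could look like the manifest below. The name vcluster-quota and the namespace host-namespace-1 are placeholders rather than values vcluster requires, and the memory cap is written as 20Gi.

```yaml
# Sketch of a ResourceQuota for the namespace that hosts the vcluster.
# Name and namespace are placeholders; adjust them to your setup.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: vcluster-quota
  namespace: host-namespace-1
spec:
  hard:
    cpu: "10"       # total CPU requests across all pods in the namespace
    memory: 20Gi    # total memory requests across all pods in the namespace
    pods: "10"      # maximum number of pods in the namespace
```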
With limits of 10 vCores, 20 GB of memory, and 10 pods, for example, the vcluster and all of the pods deployed inside it can never consume more than that. If you use a resource quota, you probably also want a LimitRange that makes sure the needed resources are defined for each pod.
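A LimitRange along these lines is one way to do that. The name, the namespace, and all of the CPU and memory figures are illustrative assumptions, not defaults taken from vcluster.

```yaml
# Sketch of a LimitRange that fills in requests and limits for containers
# that do not declare them. All values are example figures.
apiVersion: v1
kind: LimitRange
metadata:
  name: vcluster-limit-range
  namespace: host-namespace-1
spec:
  limits:
    - type: Container
      default:            # applied as resources.limits when none are set
        cpu: "1"
        memory: 512Mi
      defaultRequest:     # applied as resources.requests when none are set
        cpu: 100m
        memory: 128Mi
```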
Such a limit range ensures that containers that do not set resources.limits get appropriate limits set automatically.
Besides restricting pod resources, it's also necessary to disallow certain potentially harmful pod configurations, such as privileged pods or pods that use hostPath. If you are using Kubernetes v1.23 or higher, you can restrict the namespace where the virtual cluster is running via the Pod Security Admission Controller, which is configured through labels on the namespace.
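As a sketch, a namespace labeled for Pod Security Admission might look like this. The namespace name host-namespace-1 is a placeholder, and the chosen levels (enforcing baseline, warning on restricted) are an assumption to adapt to your workloads.

```yaml
# Sketch: Pod Security Admission labels on the vcluster's host namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: host-namespace-1
  labels:
    # Reject pods that violate the baseline standard
    # (for example privileged containers or hostPath volumes).
    pod-security.kubernetes.io/enforce: baseline
    # Only warn about violations of the stricter restricted standard.
    pod-security.kubernetes.io/warn: restricted
```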
To see all supported levels and modes, please take a look at the Kubernetes docs.
If you are using a Kubernetes version below v1.23, you can use the deprecated PodSecurityPolicies to disallow critical workloads.
Besides this basic workload isolation, you could also dive into more advanced isolation methods, such as isolating the workloads on separate nodes or through another container runtime. Using different nodes for your vcluster workloads can be accomplished through the --node-selector flag on vcluster syncer.
You should also be aware that pods created in the vcluster set their own tolerations, which affects scheduling decisions. To prevent these pods from being scheduled onto undesirable nodes, you can use the --node-selector flag or an admission controller as mentioned above.
Workloads created by vcluster are able to communicate with other workloads in the host cluster through their cluster IPs. This can sometimes be beneficial if you deliberately want to access a host cluster service, which is a good way to share services between vclusters. However, you often want to isolate namespaces and do not want the pods running inside a vcluster to have access to other workloads in the host cluster. This can be accomplished by using Network Policies for the namespace where vcluster is installed or by using the isolated mode shown above.
Network policies do not work in all Kubernetes clusters and need to be supported by the underlying CNI plugin.
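The following is a minimal sketch of such a policy, not the policy that isolated mode deploys. It assumes that pods synced by the vcluster carry a vcluster.loft.sh/managed-by label, that the host cluster's DNS pods are labeled k8s-app: kube-dns in kube-system, and that the private ranges listed cover your pod and service networks. Verify all three in your cluster before relying on it; the names my-vcluster and host-namespace-1 are placeholders.

```yaml
# Sketch: restrict egress of vcluster workloads in the host namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: vcluster-workloads          # placeholder name
  namespace: host-namespace-1       # namespace the vcluster is installed in
spec:
  podSelector:
    matchLabels:
      # Assumed label on pods synced by the vcluster; check with
      # `kubectl get pods --show-labels` in the host namespace.
      vcluster.loft.sh/managed-by: my-vcluster
  policyTypes:
    - Egress
  egress:
    # Allow traffic to pods in the same namespace
    # (other vcluster workloads and the vcluster control plane).
    - to:
        - podSelector: {}
    # Allow DNS lookups against the host cluster's DNS pods.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # Allow traffic to external addresses, but not to private ranges
    # where other host cluster workloads are assumed to live.
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 10.0.0.0/8
              - 172.16.0.0/12
              - 192.168.0.0/16
```

Because the policy selects only the synced workload pods, the vcluster control plane keeps its access to the host API server.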
Running as non root
vcluster can be run as a non-root user. The steps below show how to set the desired UID for the syncer and the control plane. The syncer also passes this UID down to the vcluster DNS deployment.
With the vcluster CLI, put the securityContext settings in a values.yaml file.
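A minimal sketch of such a values.yaml file, mirroring the --set flags of the helm template command on this page. The UID and GID 12345 are example values.

```yaml
# values.yaml (sketch): run the syncer and control plane as a non-root user.
securityContext:
  runAsUser: 12345      # example UID
  runAsGroup: 12345     # example GID
  runAsNonRoot: true
fsGroup: 12345          # group ownership applied to mounted volumes
```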
Then create the vcluster with the following command:
vcluster create my-vcluster -f values.yaml
With helm, update the vcluster.yaml file described in the deployment guide by adding the securityContext block. Then install the helm chart using vcluster.yaml for chart values, as described in the deployment guide.
With kubectl, you will need to add the securityContext blocks to the containers, for example by rendering the chart with helm template as shown below:
kubectl create namespace host-namespace-1
helm template my-vcluster vcluster --repo https://charts.loft.sh --set securityContext.runAsGroup=12345 --set fsGroup=12345 --set securityContext.runAsUser=12345 --set securityContext.runAsNonRoot=true -n host-namespace-1 | kubectl apply -f -
Workload & Network Isolation within the vcluster
The above mentioned methods also work for isolating workloads inside the vcluster itself, as you can just deploy resource quotas, limit ranges, admission controllers and network policies in there. To allow network policies to function correctly, you'll need to enable this in vcluster itself though.
Secret based Service Account tokens
By default, vcluster creates Service Account Tokens for each pod and injects them as an annotation in the respective pod's metadata. This can be a security risk in certain scenarios. To mitigate this, vcluster has a --service-account-token-secrets flag, which creates a separate secret for each pod's Service Account Token and mounts it accordingly using projected volumes. This option is not enabled by default but can be enabled on demand through the extraArgs options of the vcluster chart.
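A sketch of what this might look like in the chart values. The syncer.extraArgs key path is an assumption about the chart layout, so check the values.yaml of your chart version before using it.

```yaml
# values.yaml (sketch): pass the flag to the syncer through extraArgs.
# The syncer.extraArgs key is assumed; verify it against your chart version.
syncer:
  extraArgs:
    - --service-account-token-secrets=true
```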