Isolation & Security

vcluster can drastically increase security in multi-tenant clusters. It provides three security benefits out of the box:

  • Full control-plane isolation with a separate API endpoint and data storage
  • DNS isolation, as vcluster workloads are not able to resolve any services of the host cluster
  • Guarantee that all workloads, services and other namespaced objects are created in a single namespace in the host cluster. If deployed with default settings, vcluster also ensures that no access to any cluster-scoped object is required.

Beyond these benefits, vcluster by default does not provide any workload or network isolation. Starting with version v0.7.0, vcluster offers a feature called isolated mode, which you can enable to prevent vcluster workloads from breaking out of their virtual environment.

In general, we recommend deploying a single vcluster into a namespace and then isolating that namespace, which is far easier than isolating multiple vclusters from each other within a single namespace.

Isolated Mode

vcluster offers a feature called isolated mode to automatically isolate workloads in a virtual cluster. Isolated mode can be enabled via the --isolate flag in vcluster create or through the helm value isolation.enabled: true:

# Creates a new vcluster with isolated workloads
vcluster create my-vcluster --isolate
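
Alternatively, you can enable the same behavior through the helm value mentioned above. A minimal values.yaml sketch, applied with vcluster create my-vcluster -f values.yaml:

isolation:
  enabled: true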

This feature imposes a couple of restrictions on vcluster workloads to make sure they do not break out of their virtual environment:

  1. vcluster enforces a Pod Security Standard on the syncer level, which means that, for example, pods that try to run as a privileged container or mount a host path will not be synced to the host cluster. Currently valid options are baseline (the default in isolated mode) and restricted (see the example after this list). Because this check is implemented in vcluster directly, it works on every Kubernetes version regardless of Pod Security Standard support. Rejected pods stay pending in the vcluster, and on newer Kubernetes versions they are denied by the admission controller as well.
  2. vcluster deploys a resource quota as well as a limit range alongside itself. This allows restricting the resource consumption of vcluster workloads. If enabled, sane defaults for these two resources are chosen.
  3. vcluster deploys a network policy alongside itself that restricts access of vcluster workloads as well as the vcluster control plane to other pods in the host cluster. (This only works if your host cluster's CNI supports network policies.)
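
For example, to enforce the stricter restricted standard instead of the default baseline, you can set the following helm values (a minimal sketch):

isolation:
  enabled: true
  podSecurityStandard: restricted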

You can adjust isolation settings through helm values. The default values are (also check values.yaml):

isolation:
  enabled: false

  podSecurityStandard: baseline

  resourceQuota:
    enabled: true
    quota:
      requests.cpu: 10
      requests.memory: 20Gi
      requests.storage: "100Gi"
      requests.ephemeral-storage: 60Gi
      limits.cpu: 20
      limits.memory: 40Gi
      limits.ephemeral-storage: 160Gi
      services.nodeports: 0
      services.loadbalancers: 1
      count/endpoints: 40
      count/pods: 20
      count/services: 20
      count/secrets: 100
      count/configmaps: 100
      count/persistentvolumeclaims: 20
    scopeSelector:
      matchExpressions:
    scopes:

  limitRange:
    enabled: true
    default:
      ephemeral-storage: 8Gi
      memory: 512Mi
      cpu: "1"
    defaultRequest:
      ephemeral-storage: 3Gi
      memory: 128Mi
      cpu: 100m

  networkPolicy:
    enabled: true
    outgoingConnections:
      ipBlock:
        cidr: 0.0.0.0/0
        except:
          - 100.64.0.0/10
          - 127.0.0.0/8
          - 10.0.0.0/8
          - 172.16.0.0/12
          - 192.168.0.0/16

warning

If you are using the --isolate flag or isolated mode along with the --expose flag, make sure you bump up isolation.resourceQuota.quota.services.nodeports accordingly, as some LoadBalancer implementations rely on NodePorts.
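
For example, when exposing the vcluster through a LoadBalancer, you could raise the quota like this (the value 5 is illustrative; how many NodePorts you need depends on your LoadBalancer implementation):

isolation:
  enabled: true
  resourceQuota:
    quota:
      services.nodeports: 5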

Workload Isolation

vcluster by default will not isolate any workloads in the host cluster; it only ensures that they are all deployed in the same namespace. However, workloads within a single namespace can be isolated with built-in Kubernetes features or by using the isolated mode shown above.

Resource Quota & Limit Range

To ensure a vcluster will not consume too many resources in the host cluster, you can use a single ResourceQuota in the namespace where the virtual cluster is running. This could look like:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: vcluster-quota
spec:
  hard:
    cpu: "10"
    memory: 20Gi
    pods: "10"

This allows the vcluster and all pods deployed inside it to consume at most 10 vCPUs and 20Gi of memory, and to run no more than 10 pods in total. If you use a resource quota, you probably also want a LimitRange that ensures the needed resources are defined for each pod. For example:

apiVersion: v1
kind: LimitRange
metadata:
  name: vcluster-limit-range
spec:
  limits:
    - default:
        memory: 512Mi
        cpu: "1"
      defaultRequest:
        memory: 128Mi
        cpu: 100m
      type: Container

This limit range ensures that containers which do not set resources.requests and resources.limits get appropriate defaults applied automatically.

Pod Security

Besides restricting pod resources, it's also necessary to disallow certain potentially harmful pod configurations, such as privileged pods or pods that use hostPath. If you are using Kubernetes v1.23 or higher, you can restrict the namespace where the virtual cluster is running via the Pod Security Admission Controller:

apiVersion: v1
kind: Namespace
metadata:
  name: my-vcluster-namespace
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

To see all supported levels and modes, please take a look at the Kubernetes docs.

If you are using a Kubernetes version below v1.23, you can use the deprecated PodSecurityPolicies to disallow critical workloads.
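
A minimal PodSecurityPolicy along these lines could look like the sketch below; the name is a placeholder, and keep in mind that a PSP only takes effect once it is authorized for the relevant service accounts via RBAC:

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: vcluster-psp
spec:
  # disallow privileged containers and host namespaces
  privileged: false
  hostNetwork: false
  hostIPC: false
  hostPID: false
  # allowing only these volume types implicitly disallows hostPath
  volumes:
    - configMap
    - emptyDir
    - secret
    - downwardAPI
    - projected
    - persistentVolumeClaim
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny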

If you want more control, you can also use an admission controller that lets you define your own policies, such as OPA, jsPolicy or Kyverno.

Advanced Isolation

Besides this basic workload isolation, you can also dive into more advanced isolation methods, such as running the workloads on separate nodes or through another container runtime. Scheduling vcluster workloads onto dedicated nodes can be accomplished through the --node-selector flag on the vcluster syncer.

You should also be aware that pods created in the vcluster can set their own tolerations, which affects scheduling decisions. To prevent pods from being scheduled onto undesirable nodes, you can use the --node-selector flag or an admission controller as mentioned above.
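
For example, to restrict vcluster workloads to nodes that carry a specific label, you can pass the flag through the syncer's extraArgs (the label dedicated=my-vcluster is an assumption for illustration; use whatever label your dedicated nodes carry):

syncer:
  extraArgs:
    - --node-selector=dedicated=my-vcluster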

Network Isolation

Workloads created by vcluster will be able to communicate with other workloads in the host cluster through their cluster IPs. This can sometimes be beneficial, for example if you purposely want to access a host cluster service, which is a good method for sharing services between vclusters. However, you often want to isolate namespaces so that pods running inside a vcluster cannot access other workloads in the host cluster. This can be accomplished by creating Network Policies in the namespace where vcluster is installed, or by using the isolated mode shown above.
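
A simplified sketch of such a policy, modeled on the one isolated mode deploys, is shown below. The CIDR exceptions are assumptions about your cluster's internal ranges, and depending on your setup you may also need to allow egress to the cluster DNS service:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: vcluster-workloads
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    # allow traffic between pods within this namespace
    - to:
        - podSelector: {}
    # allow external traffic, but not to cluster-internal IP ranges
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 10.0.0.0/8
              - 172.16.0.0/12
              - 192.168.0.0/16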

info

Network policies do not work in all Kubernetes clusters and need to be supported by the underlying CNI plugin.

Other Topics

Running as non-root

vcluster can be run as a non-root user. The steps below show how to set the desired UID for the syncer and control plane. The syncer also passes this UID down to the vcluster DNS deployment.

Create a values.yaml file with the following lines:

fsGroup: 12345
securityContext:
  runAsUser: 12345
  runAsGroup: 12345
  runAsNonRoot: true

Then create the vcluster with the following command:

vcluster create my-vcluster -f values.yaml

Workload & Network Isolation within the vcluster

The methods mentioned above also work for isolating workloads inside the vcluster itself, as you can simply deploy resource quotas, limit ranges, admission controllers and network policies there. However, to allow network policies to function correctly, you'll need to enable network policy syncing in vcluster itself.
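
With recent vcluster helm charts, this is typically done through the sync settings; a sketch (check your chart's values.yaml for the exact option name, as it varies across versions):

sync:
  networkpolicies:
    enabled: true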

Secret-based Service Account tokens

By default, vcluster creates Service Account Tokens for each pod and injects them as an annotation in the respective pod's metadata. This can be a security risk in certain scenarios. To mitigate this, vcluster offers the flag --service-account-token-secrets, which creates separate secrets for each pod's Service Account Token and mounts them accordingly using projected volumes. This option is not enabled by default but can be enabled on demand. To enable it, use the extraArgs option of the vcluster chart as follows:

syncer:
  extraArgs:
    - --service-account-token-secrets=true