# Isolation & Security
vcluster can drastically increase security in multi-tenant clusters. It provides three security benefits out of the box:
- Full control-plane isolation with a separate API endpoint and data storage
- DNS isolation: vcluster workloads are not able to resolve any services of the host cluster
- A guarantee that all workloads, services, and other namespaced objects are created in a single namespace in the host cluster. When deployed with default settings, vcluster also ensures that no access to any cluster-scoped object is required.
Beyond these built-in protections, vcluster does not provide any workload or network isolation by default. Starting with version v0.7.0, vcluster offers a feature called isolated mode, which you can enable to prevent vcluster workloads from breaking out of their virtual environment.
In general, we recommend deploying a single vcluster into a namespace and then isolating that namespace, which is far easier than isolating multiple vclusters from each other within a single namespace.
## Isolated Mode
vcluster offers a feature called isolated mode to automatically isolate workloads in a virtual cluster. Isolated mode can be enabled via the `--isolate` flag in `vcluster create` or through the helm value `isolation.enabled: true`:
```bash
# Creates a new vcluster with isolated workloads
vcluster create my-vcluster --isolate
```
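Equivalently, if you deploy the vcluster helm chart directly, you can enable isolated mode through the chart values; a minimal sketch:

```yaml
# values.yaml -- enables isolated mode for the vcluster chart
isolation:
  enabled: true
```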
This feature imposes a couple of restrictions on vcluster workloads to make sure they do not break out of their virtual environment:
- vcluster enforces a Pod Security Standard at the syncer level, which means that, for example, pods that try to run as a privileged container or mount a host path will not be synced to the host cluster (see the example pod after this list). Currently valid options are baseline (the default in isolated mode) and restricted. Because this is implemented in vcluster directly, it works on every Kubernetes version regardless of Pod Security Standard support. Rejected pods stay pending in the vcluster; on newer Kubernetes versions they are denied by the admission controller as well.
- vcluster deploys a resource quota as well as a limit range alongside the vcluster itself. This restricts the resource consumption of vcluster workloads. If enabled, sensible defaults for these two resources are chosen.
- vcluster deploys a network policy alongside itself that restricts access of vcluster workloads as well as the vcluster control plane to other pods in the host cluster. (This only works if the host cluster CNI plugin supports network policies.)
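For example, under the baseline standard a pod like the following would be rejected by the syncer because it requests privileged mode (an illustrative sketch; the pod name is hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: privileged-pod   # hypothetical example pod
spec:
  containers:
    - name: main
      image: nginx
      securityContext:
        privileged: true   # violates the baseline Pod Security Standard
```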
You can adjust the isolation settings through helm values. The default values are (also check `values.yaml`):
```yaml
isolation:
  enabled: false

  podSecurityStandard: baseline

  resourceQuota:
    enabled: true
    quota:
      requests.cpu: 10
      requests.memory: 20Gi
      requests.storage: "100Gi"
      requests.ephemeral-storage: 60Gi
      limits.cpu: 20
      limits.memory: 40Gi
      limits.ephemeral-storage: 160Gi
      services.nodeports: 0
      services.loadbalancers: 1
      count/endpoints: 40
      count/pods: 20
      count/services: 20
      count/secrets: 100
      count/configmaps: 100
      count/persistentvolumeclaims: 20
    scopeSelector:
      matchExpressions:
    scopes:

  limitRange:
    enabled: true
    default:
      ephemeral-storage: 8Gi
      memory: 512Mi
      cpu: "1"
    defaultRequest:
      ephemeral-storage: 3Gi
      memory: 128Mi
      cpu: 100m

  networkPolicy:
    enabled: true
    outgoingConnections:
      ipBlock:
        cidr: 0.0.0.0/0
        except:
          - 100.64.0.0/10
          - 127.0.0.0/8
          - 10.0.0.0/8
          - 172.16.0.0/12
          - 192.168.0.0/16
```
:::warning
When using the `--isolate` flag or isolated mode together with the `--expose` flag, make sure you bump up `isolation.resourceQuota.quota.services.nodeports` accordingly, as some LoadBalancer implementations rely on NodePorts.
:::
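For example, you might raise the NodePort quota like this (the value 5 is illustrative; size it for your LoadBalancer implementation):

```yaml
isolation:
  enabled: true
  resourceQuota:
    quota:
      services.nodeports: 5   # illustrative; the isolated mode default is 0
```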
## Workload Isolation
By default, vcluster does not isolate any workloads in the host cluster; it only ensures that they are deployed into the same namespace. However, workloads in a single namespace can be isolated with built-in Kubernetes features or by using the isolated mode shown above.
### Resource Quota & Limit Range
To ensure a vcluster will not consume too many resources in the host cluster, you can use a single ResourceQuota in the namespace where the virtual cluster is running. This could look like:
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: vcluster-quota
spec:
  hard:
    cpu: "10"
    memory: 20Gi
    pods: "10"
```
This allows the vcluster and all pods deployed inside it to consume at most 10 vCores and 20Gi of memory, and to run at most 10 pods. If you use a resource quota, you probably also want a LimitRange that ensures the needed resource requests and limits are defined for each pod. For example:
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: vcluster-limit-range
spec:
  limits:
    - default:
        memory: 512Mi
        cpu: "1"
      defaultRequest:
        memory: 128Mi
        cpu: 100m
      type: Container
```
This limit range ensures that containers that do not set `resources.requests` and `resources.limits` automatically get appropriate defaults.
### Pod Security
Besides restricting pod resources, it's also necessary to disallow certain potentially harmful pod configurations, such as privileged pods or pods that mount a hostPath. If you are using Kubernetes v1.23 or higher, you can restrict the namespace where the virtual cluster is running via the Pod Security admission controller:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-vcluster-namespace
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```
To see all supported levels and modes, please take a look at the Kubernetes docs.
If your cluster runs a Kubernetes version below v1.23, you can use the deprecated PodSecurityPolicies to disallow critical workloads.
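A minimal PodSecurityPolicy along those lines might look like the following sketch (the policy name is hypothetical; PSP is deprecated and was removed in Kubernetes v1.25, so treat this only as a reference for older clusters):

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: vcluster-baseline   # hypothetical name
spec:
  privileged: false          # disallow privileged pods
  hostNetwork: false
  hostPID: false
  hostIPC: false
  volumes:                   # allowed volume types; notably excludes hostPath
    - configMap
    - secret
    - emptyDir
    - persistentVolumeClaim
    - projected
    - downwardAPI
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
```

Note that a PodSecurityPolicy only takes effect once the relevant service accounts are authorized to use it via RBAC.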
If you want more control, you can also use an admission controller that lets you define your own policies, such as OPA, jsPolicy, or Kyverno.
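For instance, with Kyverno installed, a policy that blocks hostPath volumes could look roughly like this (a sketch based on Kyverno's standard disallow-host-path pattern; the policy name is illustrative):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-host-path   # illustrative name
spec:
  validationFailureAction: Enforce
  rules:
    - name: no-host-path
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "hostPath volumes are not allowed."
        pattern:
          spec:
            # =(volumes) matches only when volumes is present;
            # X(hostPath): "null" denies any entry that sets hostPath
            =(volumes):
              - X(hostPath): "null"
```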
### Advanced Isolation
Besides this basic workload isolation, you could also dive into more advanced isolation methods, such as isolating the workloads on separate nodes or running them through another container runtime. Using dedicated nodes for your vcluster workloads can be accomplished through the `--node-selector` flag on the vcluster syncer.
You should also be aware that pods created in the vcluster set their own tolerations, which affects scheduling decisions. To prevent pods from being scheduled to undesirable nodes, you can use the `--node-selector` flag or an admission controller as mentioned above.
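A minimal sketch of passing the flag through the chart's `syncer.extraArgs` (the node label `dedicated=vcluster` is a hypothetical example):

```yaml
syncer:
  extraArgs:
    # only schedule vcluster workloads onto nodes carrying this label
    - --node-selector=dedicated=vcluster
```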
## Network Isolation
Workloads created by vcluster are able to communicate with other workloads in the host cluster through their cluster IPs. This can be beneficial if you purposely want to access a host cluster service, which is a good method to share services between vclusters. However, you often want to isolate namespaces so that pods running inside the vcluster have no access to other workloads in the host cluster. This can be accomplished with Network Policies for the namespace where vcluster is installed, or by using the isolated mode shown above.
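A minimal sketch of such a policy, which allows pods in the vcluster namespace to talk to each other, resolve DNS, and reach the public internet while blocking the host cluster's private ranges (the namespace name is hypothetical; the CIDRs mirror the isolated mode defaults above):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: vcluster-egress
  namespace: my-vcluster-namespace   # hypothetical namespace
spec:
  podSelector: {}                    # applies to all pods in the namespace
  policyTypes:
    - Egress
  egress:
    # allow traffic between pods inside the vcluster namespace
    - to:
        - podSelector: {}
    # allow DNS resolution via the cluster DNS service
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # allow everything else except the host cluster's private ranges
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 10.0.0.0/8
              - 172.16.0.0/12
              - 192.168.0.0/16
```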
Network policies do not work in all Kubernetes clusters and need to be supported by the underlying CNI plugin.
## Other Topics
### Running as non-root
vcluster can be run as a non-root user. The steps below show how to set the desired UID for the syncer and control plane. The syncer also passes this UID down to the vcluster DNS deployment.
You can do this with the vcluster CLI, with helm, or with kubectl.

With the vcluster CLI, create a `values.yaml` file with the following lines:
```yaml
fsGroup: 12345
securityContext:
  runAsUser: 12345
  runAsGroup: 12345
  runAsNonRoot: true
```
Then create the vcluster with the following command:
```bash
vcluster create my-vcluster -f values.yaml
```
With helm, update the `vcluster.yaml` file described in the deployment guide. You will need to add the `securityContext` block as shown below:
```yaml
fsGroup: 12345
securityContext:
  runAsUser: 12345
  runAsGroup: 12345
  runAsNonRoot: true
```
Then install the helm chart using `vcluster.yaml` for the chart values as described in the deployment guide.

With kubectl, set the `securityContext` values via `--set` flags when rendering the chart and apply the resulting manifests:
```bash
kubectl create namespace host-namespace-1
helm template my-vcluster vcluster --repo https://charts.loft.sh \
  --set fsGroup=12345 \
  --set securityContext.runAsUser=12345 \
  --set securityContext.runAsGroup=12345 \
  --set securityContext.runAsNonRoot=true \
  -n host-namespace-1 | kubectl apply -f -
```
### Workload & Network Isolation within the vcluster
The methods mentioned above also work for isolating workloads inside the vcluster itself, as you can deploy resource quotas, limit ranges, admission controllers, and network policies there as well. However, for network policies to function correctly, you need to enable network policy syncing in vcluster itself.
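Assuming the chart exposes a `sync.networkpolicies` option (check your chart version's `values.yaml`), enabling the syncing could look like this:

```yaml
sync:
  networkpolicies:
    enabled: true   # sync NetworkPolicy objects from the vcluster to the host
```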
### Secret-based Service Account tokens
By default, vcluster creates a Service Account Token for each pod and injects it as an annotation in the respective pod's metadata. This can be a security risk in certain scenarios. To mitigate this, vcluster provides the `--service-account-token-secrets` flag, which creates a separate secret for each pod's Service Account Token and mounts it using projected volumes. This option is not enabled by default but can be enabled on demand through the `extraArgs` option of the vcluster chart as follows:
```yaml
syncer:
  extraArgs:
    - --service-account-token-secrets=true
```