Overview

Virtual clusters are Kubernetes clusters that run on top of other Kubernetes clusters. Compared to fully separate "real" clusters, virtual clusters do not have their own node pools or networking. Instead, they schedule workloads inside the underlying cluster while having their own control plane.

Figure: vCluster architecture diagram

Components

By default, vClusters run as a single pod (scheduled by a StatefulSet) that consists of two containers:

- Control plane: contains the API server, the controller manager, and a connection (or mount) to the data store. By default, vClusters use the API server and controller manager of k3s and store their state in SQLite.
- Syncer: because a vCluster has no actual worker nodes or network of its own, the syncer copies low-level resources (such as Pods) created inside the vCluster to the underlying host cluster.
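As a quick way to verify this, the following sketch (using the Python Kubernetes client) prints the containers of the vCluster pod as seen from the host side. The pod name "my-vcluster-0" and the host namespace "team-x" are placeholders for your own deployment, not names vCluster mandates.

```python
# Sketch: list the containers of a vCluster pod as seen from the host cluster.
# The pod name "my-vcluster-0" and namespace "team-x" are placeholders.
from kubernetes import client, config

config.load_kube_config()  # kubeconfig context pointing at the host cluster
core = client.CoreV1Api()

pod = core.read_namespaced_pod(name="my-vcluster-0", namespace="team-x")
for container in pod.spec.containers:
    # By default this should show the k3s-based control plane and the syncer.
    print(container.name, container.image)
```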

Host Cluster & Namespace

Every vCluster runs on top of another Kubernetes cluster, called the host cluster. Each vCluster runs as a regular StatefulSet inside a namespace of the host cluster. This namespace is called the host namespace. Everything that you create inside the vCluster lives either inside the vCluster itself or inside the host namespace.

It is possible to run multiple vClusters inside the same namespace, and you can even run vClusters inside another vCluster (vCluster nesting).
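To make this concrete, here is a minimal sketch using the Python Kubernetes client that inspects a vCluster from the host cluster's perspective. The StatefulSet name "my-vcluster" and the host namespace "team-x" are placeholders.

```python
# Sketch: inspect a vCluster from the host cluster's point of view.
# "my-vcluster" and the host namespace "team-x" are placeholders.
from kubernetes import client, config

config.load_kube_config()  # kubeconfig context pointing at the host cluster
apps = client.AppsV1Api()
core = client.CoreV1Api()

# The vCluster control plane is just a regular StatefulSet in the host namespace ...
sts = apps.read_namespaced_stateful_set(name="my-vcluster", namespace="team-x")
print("StatefulSet:", sts.metadata.name, "replicas:", sts.spec.replicas)

# ... and everything the vCluster creates on the host side lives in that same namespace.
for pod in core.list_namespaced_pod(namespace="team-x").items:
    print("Pod in host namespace:", pod.metadata.name)
```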

Kubernetes Resources

The core idea of virtual clusters is to provision isolated Kubernetes control planes (e.g. API servers) that run on top of "real" Kubernetes clusters. When you work with the virtual cluster's API server, resources initially exist only in the virtual cluster. However, some low-level Kubernetes resources need to be synchronized to the underlying cluster.

High-Level = Purely Virtual

Generally, all Kubernetes resource objects that you create using the vCluster API server are stored in the data store of the vCluster (SQLite by default, see external datastore for more options). That applies in particular to all higher-level Kubernetes resources, such as Deployments, StatefulSets, and CRDs. These objects only exist inside the virtual cluster and never reach the API server or data store (etcd) of the underlying host cluster.
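The following sketch illustrates this split with the Python Kubernetes client. The kubeconfig context names ("my-vcluster" for the virtual cluster, "host" for the underlying cluster) and the namespaces are placeholders for your own setup.

```python
# Sketch: high-level objects stay purely virtual.
# Context names "my-vcluster" and "host" plus the namespaces are placeholders.
from kubernetes import client, config

vc_apps = client.AppsV1Api(api_client=config.new_client_from_config(context="my-vcluster"))
host_apps = client.AppsV1Api(api_client=config.new_client_from_config(context="host"))

# Deployments created through the vCluster API server are visible here ...
print([d.metadata.name for d in vc_apps.list_namespaced_deployment("default").items])

# ... but they never appear in the host namespace; only the Pods they
# eventually produce are synced down to the host cluster.
print([d.metadata.name for d in host_apps.list_namespaced_deployment("team-x").items])
```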

Low-Level = Sync'd Resources

To be able to actually start pods, vCluster synchronizes certain low-level resources (e.g. Pods, ConfigMaps mounted in Pods) to the underlying host namespace, so that the scheduler of the underlying host cluster can schedule these pods.
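A minimal sketch of what that looks like in practice, again with placeholder context and namespace names:

```python
# Sketch: the same Pod seen from both sides of the sync.
# Context names and namespaces are placeholders.
from kubernetes import client, config

vc = client.CoreV1Api(api_client=config.new_client_from_config(context="my-vcluster"))
host = client.CoreV1Api(api_client=config.new_client_from_config(context="host"))

# Inside the vCluster, the Pod appears under its original name and namespace ...
for pod in vc.list_namespaced_pod("default").items:
    print("virtual:", pod.metadata.name)

# ... while the syncer creates a copy in the host namespace, which is the
# object the host cluster's scheduler actually schedules onto a node.
for pod in host.list_namespaced_pod("team-x").items:
    print("synced :", pod.metadata.name)
```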

Design Principles

vCluster has been designed following these principles:

1. Lightweight / Low-Overhead

vClusters should be as lightweight as possible to minimize resource overhead inside the underlying host cluster.

Implementation: This is mainly achieved by bundling the vCluster inside a single Pod using k3s as a control plane.

2. No Performance Degradation

Workloads running inside a vCluster (even inside nested vClusters) should run with the same performance as workloads running directly on the underlying host cluster. Computing power, access to underlying persistent storage, and network performance should not be degraded at all.

Implementation: This is mainly achieved by synchronizing pods, which means that pods are actually scheduled and started just like regular pods of the underlying host cluster. In other words, a pod running inside the vCluster behaves exactly the same as the same pod running directly on the host cluster in terms of computing power, storage access, and networking.
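One way to convince yourself of this is to look at where the pods actually run. The sketch below reads the node assignment through the vCluster API server; the context name "my-vcluster" and the namespace are placeholders.

```python
# Sketch: Pods created inside the vCluster are scheduled onto real nodes of
# the host cluster, so compute, storage, and network go through the host
# directly. The context name "my-vcluster" is a placeholder.
from kubernetes import client, config

vc = client.CoreV1Api(api_client=config.new_client_from_config(context="my-vcluster"))

for pod in vc.list_namespaced_pod("default").items:
    # node_name is filled in once the host scheduler has placed the synced Pod.
    print(pod.metadata.name, "->", pod.spec.node_name)
```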

3. Reduce Requests On Host Cluster

vClusters should greatly reduce the number of requests to the Kubernetes API server of the underlying host cluster by ensuring that all high-level resources remain in the virtual cluster only without ever reaching the underlying host cluster.

Implementation: This is mainly achieved by using a separate API server that handles all requests to the vCluster and a separate data store that stores all objects inside the vCluster. The syncer synchronizes only a few low-level resources to the underlying cluster which requires very few API server requests. All of this happens in an asynchronous, non-blocking fashion (as pretty much everything in Kubernetes is designed to be).

4. Flexible & Easy Provisioning

vCluster should not make any assumptions about how it is being provisioned. Users should be able to create vClusters on top of any Kubernetes cluster without installing any server-side component first; that is, provisioning should be possible with any client-only deployment tool (vCluster CLI, Helm, kubectl, kustomize, ...). An operator or CRDs may be added to manage vClusters (e.g. using Argo to provision them), but a server-side management plane should never be required for spinning up a vCluster.

Implementation: This is mainly achieved by running vCluster as a simple StatefulSet + Service (see the kubectl deployment method for details), which can be deployed using any Kubernetes tool.

5. No Admin Privileges Required

To provision a vCluster, a user should never be required to have any cluster-wide permissions. If a user has the RBAC permissions to deploy a simple web application to a namespace, they should also be able to deploy vClusters to the namespace.

Implementation: This is mainly achieved by running vCluster as a simple StatefulSet + Service (see the kubectl deployment method for details), which practically every user is allowed to deploy as long as they have any namespaced Kubernetes access at all.
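The sketch below checks exactly this with a SelfSubjectAccessReview against the host cluster: whether the current user may create StatefulSets and Services in a given namespace. The namespace name "team-x" is a placeholder.

```python
# Sketch: verify that namespaced permissions are sufficient. This only asks
# the host API server whether the current user may create StatefulSets and
# Services in namespace "team-x" (a placeholder); no cluster-wide rights needed.
from kubernetes import client, config

config.load_kube_config()  # kubeconfig context pointing at the host cluster
authz = client.AuthorizationV1Api()

def can_i(verb, group, resource, namespace):
    review = client.V1SelfSubjectAccessReview(
        spec=client.V1SelfSubjectAccessReviewSpec(
            resource_attributes=client.V1ResourceAttributes(
                namespace=namespace, verb=verb, group=group, resource=resource)))
    return authz.create_self_subject_access_review(review).status.allowed

print("create statefulsets:", can_i("create", "apps", "statefulsets", "team-x"))
print("create services    :", can_i("create", "", "services", "team-x"))
```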

6. Single Namespace Encapsulation

Each vCluster and all the workloads and data inside it should be encapsulated into a single namespace. Even if the vCluster contains hundreds of namespaces, everything is encapsulated into a single host namespace in the underlying host cluster.

Implementation: This is mainly achieved by using a separate API server and data store, and by the design of the syncer, which synchronizes everything into a single host namespace and renames resources during the sync to prevent naming conflicts when mapping multiple vCluster namespaces onto a single namespace in the host cluster.
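To see the encapsulation, compare both views: several namespaces inside the vCluster, but only one namespace on the host. The context and namespace names below are placeholders.

```python
# Sketch: Pods from several virtual namespaces all land in one host namespace.
# Context names and namespaces are placeholders.
from kubernetes import client, config

vc = client.CoreV1Api(api_client=config.new_client_from_config(context="my-vcluster"))
host = client.CoreV1Api(api_client=config.new_client_from_config(context="host"))

# Separate namespaces inside the vCluster ...
for ns in ("frontend", "backend"):
    for pod in vc.list_namespaced_pod(ns).items:
        print("virtual", ns, pod.metadata.name)

# ... but a single namespace on the host side. The renamed Pod names prevent
# collisions between identically named Pods from different virtual namespaces.
for pod in host.list_namespaced_pod("team-x").items:
    print("host team-x", pod.metadata.name)
```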

7. Easy Cleanup

vClusters should not have any hard wiring with the underlying cluster. Deleting a vCluster, or merely deleting the vCluster's host namespace, should always be possible without any negative impact on the underlying cluster (no namespaces stuck in the Terminating state or anything comparable). It should also guarantee that all vCluster-related resources are deleted cleanly and immediately, without leaving any orphan resources behind.

Implementation: This is mainly achieved by not adding any control plane or server-side elements to the provisioning of vClusters. A vCluster is just a StatefulSet and a few other Kubernetes resources. All synchronized resources in the host namespace carry an appropriate owner reference, which means that if you delete the vCluster itself, everything that belongs to it is automatically deleted by Kubernetes as well (the same mechanism Deployments and StatefulSets use to clean up their Pods).
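You can inspect those owner references directly. The sketch below prints them for the synced pods in the host namespace; the namespace name "team-x" is a placeholder.

```python
# Sketch: synced objects in the host namespace carry an owner reference, so
# deleting the vCluster lets Kubernetes garbage-collect them automatically.
# The namespace "team-x" is a placeholder.
from kubernetes import client, config

config.load_kube_config()  # kubeconfig context pointing at the host cluster
core = client.CoreV1Api()

for pod in core.list_namespaced_pod("team-x").items:
    for ref in (pod.metadata.owner_references or []):
        print(pod.metadata.name, "is owned by", ref.kind, ref.name)
```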