Embedded etcd
Embedded etcd is an Enterprise feature. See our pricing plans or contact our sales team for more information.
When using this backing store option, etcd is deployed as part of the vCluster control plane pod to reduce the overall footprint.
controlPlane:
  backingStore:
    etcd:
      embedded:
        enabled: true
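Assuming you use the vCluster CLI, this configuration can be applied when creating the virtual cluster; the release name and namespace below are placeholders, and flag names may vary between CLI versions, so verify them with vcluster create --help:
# Hypothetical release name and namespace; --values points to the vcluster.yaml shown above
vcluster create my-vcluster --namespace vcluster-my-team --values vcluster.yaml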
How embedded etcd works
Embedded etcd starts the etcd binary with the Kubernetes control plane inside the vCluster pod. This enables vCluster to run in high availability (HA) scenarios without requiring a separate StatefulSet or Deployment.
vCluster fully manages embedded etcd and provides these capabilities:
- Dynamic scaling: Scales the etcd cluster up or down based on vCluster replica count.
- Automatic recovery: Recovers etcd in failure scenarios such as corrupted members.
- Seamless migration: Migrates from SQLite or deployed etcd to embedded etcd automatically.
- Simplified deployment: Requires no additional StatefulSets or Deployments.
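For example, a minimal vcluster.yaml sketch for an HA setup combines embedded etcd with multiple control plane replicas. The statefulSet.highAvailability.replicas field below is an assumption about where the replica count is configured; verify the exact field against the config reference for your vCluster version:
controlPlane:
  backingStore:
    etcd:
      embedded:
        enabled: true
  statefulSet:
    highAvailability:
      # Assumed field for the number of control plane replicas; verify for your version
      replicas: 3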
Scaling behavior
vCluster dynamically builds the etcd cluster based on the number of desired replicas. For example, when you scale vCluster from 1 to 3 replicas, vCluster automatically adds the new replicas as members to the existing single-member cluster. Similarly, vCluster removes etcd members when you scale down the cluster.
When scaling down breaks quorum (such as scaling from 3 to 1 replicas), vCluster rebuilds the etcd cluster without data loss or interruption. This enables dynamic scaling up and down of vCluster.
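For example, using the placeholder release name and namespace that appear in the recovery steps below, scaling the StatefulSet changes the etcd membership accordingly:
# Scale up: vCluster joins the new pods as etcd members
kubectl scale statefulset my-vcluster --replicas=3 -n vcluster-my-team
# Scale down: vCluster removes the corresponding etcd members
kubectl scale statefulset my-vcluster --replicas=1 -n vcluster-my-team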
Disaster recovery
When embedded etcd encounters failures, vCluster provides both automatic and manual recovery options to restore cluster capabilities.
Automatic recovery
vCluster recovers the etcd cluster automatically in most failure scenarios by removing and re-adding the failing member. Automatic recovery occurs in these cases:
- Unresponsive member: Etcd member is unresponsive for more than 2 minutes.
- Detected issues: Corruption or another alarm is detected on the etcd member.
vCluster attempts to recover only a single replica at a time. If recovering an etcd member results in quorum loss, vCluster does not recover the member automatically.
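To check whether automatic recovery is taking place, you can inspect the vCluster pod logs for etcd-related messages; the pod and namespace names below are placeholders:
kubectl logs my-vcluster-0 -n vcluster-my-team | grep -i etcd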
Manual recovery
Recover a single replica
When a single etcd replica fails, vCluster can recover the replica automatically in most cases, including:
- Replica database corruption
- Replica database deletion
- Replica PersistentVolumeClaim (PVC) deletion
- Replica removal from the etcd cluster using etcdctl member remove ID
- Replica stuck as a learner
If vCluster cannot recover the single replica automatically, wait at least 10 minutes before deleting the replica pod and PVC. This causes vCluster to rejoin the member to the etcd cluster.
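For example, assuming the failing replica is my-vcluster-1 in the vcluster-my-team namespace and its PVC follows the data-<release>-<ordinal> naming pattern used elsewhere on this page:
# Delete the PVC first; it stays in Terminating until the pod is gone
kubectl delete pvc data-my-vcluster-1 -n vcluster-my-team
# Delete the pod; the StatefulSet recreates it with a fresh PVC and vCluster rejoins the etcd member
kubectl delete pod my-vcluster-1 -n vcluster-my-team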
Recover the entire cluster
In rare cases, the entire etcd cluster requires manual recovery. This occurs when the majority of etcd member replicas become corrupted or deleted simultaneously (such as 2 of 3, 3 of 5, or 4 of 7 replicas). In this scenario, etcd fails to start and vCluster cannot recover automatically.
Normal pod restarts or terminations do not require manual recovery. These events trigger automatic leader election within the etcd cluster.
Recovery procedures depend on whether the first replica (the pod ending with -0) is among the failing replicas. Use one of the following two procedures when some replicas are still functioning.
If the first replica is not failing, use these steps:
Scale the StatefulSet to one replica:
kubectl scale statefulset my-vcluster --replicas=1 -n vcluster-my-team
Verify only one pod is running:
kubectl get pods -l app=vcluster -n vcluster-my-team
Monitor the rebuild process:
kubectl logs -f my-vcluster-0 -n vcluster-my-team
Watch for log messages indicating etcd is ready and the cluster is in good condition.
Scale back up to your target replica count:
kubectl scale statefulset my-vcluster --replicas=3 -n vcluster-my-team
Verify all replicas are running:
kubectl get pods -l app=vcluster -n vcluster-my-team
kubectl logs my-vcluster-0 -n vcluster-my-team | grep "cluster is ready"
If the first replica is failing, use these steps:
Stop all vCluster instances:
kubectl scale statefulset my-vcluster --replicas=0 -n vcluster-my-team
Confirm all pods have terminated:
kubectl get pods -l app=vcluster -n vcluster-my-team
Delete the corrupted PVC for the first replica:
kubectl delete pvc data-my-vcluster-0 -n vcluster-my-team
Verify the PVC has been deleted:
kubectl get pvc -l app=vcluster -n vcluster-my-team
Create a new PVC by copying from a working replica (save it as pvc-restore.yaml):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-my-vcluster-0
  namespace: vcluster-my-team
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  dataSource:
    name: data-my-vcluster-1
    kind: PersistentVolumeClaim
  storageClassName: gp2
Apply the PVC:
kubectl apply -f pvc-restore.yaml
Start with one replica to verify the restored data:
kubectl scale statefulset my-vcluster --replicas=1 -n vcluster-my-team
Monitor the startup:
kubectl logs -f my-vcluster-0 -n vcluster-my-team
After it's stable, scale up to the desired number of replicas.
Complete data loss recovery
This recovery method results in data loss up to the last backup point. Only proceed if you have verified that all etcd replicas are corrupted and no working replicas remain.
When the majority of etcd member replicas become corrupted or deleted simultaneously, the entire cluster requires recovery from backup.
Verify all PVCs are corrupted or inaccessible:
kubectl get pvc -l app=vcluster -n vcluster-my-team
kubectl describe pvc data-my-vcluster-0 data-my-vcluster-1 data-my-vcluster-2 -n vcluster-my-team
Stop all vCluster instances before beginning recovery:
kubectl scale statefulset my-vcluster --replicas=0 -n vcluster-my-team
Delete all corrupted PVCs:
kubectl delete pvc data-my-vcluster-0 data-my-vcluster-1 data-my-vcluster-2 -n vcluster-my-team
Follow a backup restoration procedure. This typically involves restoring PVCs from your backup solution (Velero, CSI snapshots, or similar tools).
Restore from snapshot:
kubectl apply -f backup-restore.yaml
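The contents of backup-restore.yaml depend on your backup tooling. As an illustration only, a PVC restored from a CSI VolumeSnapshot might look like the following; the snapshot name etcd-backup-0 and the storage class are hypothetical:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-my-vcluster-0
  namespace: vcluster-my-team
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  dataSource:
    # Hypothetical snapshot created by your backup tooling
    name: etcd-backup-0
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  storageClassName: gp2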
Scale up to a single replica to verify the restoration:
kubectl scale statefulset my-vcluster --replicas=1 -n vcluster-my-team
Monitor logs and verify the cluster starts successfully:
kubectl logs -f my-vcluster-0 -n vcluster-my-team
After it's verified, scale to the desired number of replicas.
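As in the earlier procedure, you can confirm readiness from the logs before scaling further:
kubectl logs my-vcluster-0 -n vcluster-my-team | grep "cluster is ready"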
Config reference
- embedded (object, required, pro): Embedded defines to use embedded etcd as a storage backend for the virtual cluster.
  - enabled (boolean, default: false, pro): Enabled defines if the embedded etcd should be used.
  - migrateFromDeployedEtcd (boolean, default: false, pro): MigrateFromDeployedEtcd signals that vCluster should migrate from the deployed external etcd to embedded etcd.
  - snapshotCount (integer, pro): SnapshotCount defines the number of snapshots to keep for the embedded etcd. Defaults to 10000 if less than 1.
  - extraArgs (string[], default: [], pro): ExtraArgs are additional arguments to pass to the embedded etcd.
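Putting these fields together, a fuller configuration might look like the following sketch. The extraArgs entry is only an illustration of passing an upstream etcd flag and is not required:
controlPlane:
  backingStore:
    etcd:
      embedded:
        enabled: true
        migrateFromDeployedEtcd: false
        snapshotCount: 10000
        extraArgs:
          # Illustrative upstream etcd flag; adjust or omit for your environment
          - --quota-backend-bytes=8589934592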