Embedded etcd

Enterprise-Only Feature

This is an Enterprise feature. See our pricing plans or contact our sales team for more information.

With this backing store option, etcd runs as part of the vCluster control plane pod to reduce the overall footprint.

controlPlane:
  backingStore:
    etcd:
      embedded:
        enabled: true
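
As a sketch, if you save this snippet as vcluster.yaml, you can pass it to a recent vCluster CLI when creating the virtual cluster; my-vcluster and vcluster-my-team are placeholder values:

vcluster create my-vcluster --namespace vcluster-my-team -f vcluster.yaml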

How embedded etcd works

Embedded etcd starts the etcd binary with the Kubernetes control plane inside the vCluster pod. This enables vCluster to run in high availability (HA) scenarios without requiring a separate StatefulSet or Deployment.

vCluster fully manages embedded etcd and provides these capabilities:

  • Dynamic scaling: Scales the etcd cluster up or down based on vCluster replica count.
  • Automatic recovery: Recovers etcd in failure scenarios such as corrupted members.
  • Seamless migration: Migrates from SQLite or deployed etcd to embedded etcd automatically.
  • Simplified deployment: Requires no additional StatefulSets or Deployments.
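
For example, a minimal sketch of a high availability setup combines embedded etcd with multiple control plane replicas (assuming the statefulSet.highAvailability.replicas field of the vCluster config schema):

controlPlane:
  backingStore:
    etcd:
      embedded:
        enabled: true
  statefulSet:
    highAvailability:
      replicas: 3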

Scaling behavior

vCluster dynamically builds the etcd cluster based on the number of desired replicas. For example, when you scale vCluster from 1 to 3 replicas, vCluster automatically adds the new replicas as members to the existing single-member cluster. Similarly, vCluster removes etcd members when you scale down the cluster.

When scaling down breaks quorum (such as scaling from 3 to 1 replicas), vCluster rebuilds the etcd cluster without data loss or interruption. This enables dynamic scaling up and down of vCluster.
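
For example, scaling the vCluster StatefulSet changes the etcd membership accordingly (my-vcluster and vcluster-my-team are placeholders):

kubectl scale statefulset my-vcluster --replicas=3 -n vcluster-my-team

vCluster adds the new pods as etcd members; scaling back down with --replicas=1 triggers the quorum rebuild described above.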

Disaster recovery

When embedded etcd encounters failures, vCluster provides both automatic and manual recovery options to restore cluster capabilities.

Automatic recovery

vCluster recovers the etcd cluster automatically in most failure scenarios by removing and re-adding the failing member. Automatic recovery occurs in these cases:

  • Unresponsive member: The etcd member is unresponsive for more than 2 minutes.
  • Detected issues: Corruption or another alarm is detected on the etcd member.

vCluster attempts to recover only a single replica at a time. If recovering an etcd member results in quorum loss, vCluster does not recover the member automatically.
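
If a member is not recovered automatically, you can check which replica is failing by listing the vCluster pods, using the same label selector and placeholder namespace as the commands later on this page:

kubectl get pods -l app=vcluster -n vcluster-my-team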

Manual recovery

Recover a single replica

When a single etcd replica fails, vCluster can recover the replica automatically in most cases, including:

  • Replica database corruption
  • Replica database deletion
  • Replica PersistentVolumeClaim (PVC) deletion
  • Replica removal from the etcd cluster using etcdctl member remove ID
  • Replica stuck as a learner

If vCluster cannot recover the single replica automatically, wait at least 10 minutes, then delete the replica pod and its PVC. This causes the StatefulSet to recreate the replica and vCluster to re-add it as an etcd member.
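
For example, for a failing second replica (pod and PVC names follow the StatefulSet naming conventions used elsewhere on this page):

kubectl delete pvc data-my-vcluster-1 -n vcluster-my-team
kubectl delete pod my-vcluster-1 -n vcluster-my-team

The PVC stays in a Terminating state until the pod is deleted; the StatefulSet then recreates the pod with a fresh volume, and vCluster re-adds it as an etcd member.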

Recover the entire cluster

In rare cases, the entire etcd cluster requires manual recovery. This occurs when the majority of etcd member replicas become corrupted or deleted simultaneously (such as 2 of 3, 3 of 5, or 4 of 7 replicas). In this scenario, etcd fails to start and vCluster cannot recover automatically.

note

Normal pod restarts or terminations do not require manual recovery. These events trigger automatic leader election within the etcd cluster.

Recovery procedures depend on whether the first replica (the pod ending with -0) is among the failing replicas. The commands below use my-vcluster as the vCluster name and vcluster-my-team as the namespace; replace these with your own values.

Use the following procedure when some replicas are still functioning:

  1. Scale the StatefulSet to one replica:

    kubectl scale statefulset my-vcluster --replicas=1 -n vcluster-my-team

    Verify only one pod is running:

    kubectl get pods -l app=vcluster -n vcluster-my-team
  2. Monitor the rebuild process:

    kubectl logs -f my-vcluster-0 -n vcluster-my-team

    Watch for log messages indicating etcd is ready and the cluster is in good condition.

  3. Scale back up to your target replica count:

    kubectl scale statefulset my-vcluster --replicas=3 -n vcluster-my-team

    Verify all replicas are running:

    kubectl get pods -l app=vcluster -n vcluster-my-team
    kubectl logs my-vcluster-0 -n vcluster-my-team | grep "cluster is ready"

Complete data loss recovery

warning

This recovery method loses all data written after the last backup. Only proceed if you have verified that all etcd replicas are corrupted and no working replicas remain.

When the majority of etcd member replicas become corrupted or deleted simultaneously, the entire cluster requires recovery from backup.

  1. Verify all PVCs are corrupted or inaccessible:

    kubectl get pvc -l app=vcluster -n vcluster-my-team

    kubectl describe pvc data-my-vcluster-0 data-my-vcluster-1 data-my-vcluster-2 -n vcluster-my-team
  2. Stop all vCluster instances before beginning recovery:

    kubectl scale statefulset my-vcluster --replicas=0 -n vcluster-my-team
  3. Delete all corrupted PVCs:

    kubectl delete pvc data-my-vcluster-0 data-my-vcluster-1 data-my-vcluster-2 -n vcluster-my-team
  4. Follow a backup restoration procedure. This typically involves restoring PVCs from your backup solution (Velero, CSI snapshots, or similar tools); an example restore manifest follows these steps.

    Restore from snapshot:

    kubectl apply -f backup-restore.yaml
  5. Scale up to a single replica to verify the restoration:

    kubectl scale statefulset my-vcluster --replicas=1 -n vcluster-my-team

    Monitor logs and verify the cluster starts successfully:

    kubectl logs -f my-vcluster-0 -n vcluster-my-team

    After you verify the restoration, scale back up to the desired number of replicas.
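
As a sketch of what the backup-restore.yaml in step 4 might contain when restoring from a CSI VolumeSnapshot (the snapshot name, storage class, and size below are assumptions; Velero and similar tools define their own restore resources):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-my-vcluster-0
  namespace: vcluster-my-team
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard            # assumption: your CSI storage class
  resources:
    requests:
      storage: 5Gi                      # assumption: match the original PVC size
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: my-vcluster-0-snapshot        # assumption: an existing snapshot of this volume

Repeat the restore for each replica's PVC before scaling the StatefulSet back up.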

Config reference

embedded (object, required) pro

Embedded defines whether to use embedded etcd as a storage backend for the virtual cluster.

enabled (boolean, required, default: false) pro

Enabled defines if the embedded etcd should be used.

migrateFromDeployedEtcd (boolean, required, default: false) pro

MigrateFromDeployedEtcd signals that vCluster should migrate from the deployed external etcd to embedded etcd.

snapshotCount (integer, required) pro

SnapshotCount defines the number of snapshots to keep for the embedded etcd. Defaults to 10000 if set to less than 1.

extraArgs (string[], required, default: []) pro

ExtraArgs are additional arguments to pass to the embedded etcd.
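
Putting the reference together, a sketch of a fuller embedded etcd configuration (the extraArgs entry shows a standard etcd flag as an illustration; migrateFromDeployedEtcd is only relevant when moving off a previously deployed etcd):

controlPlane:
  backingStore:
    etcd:
      embedded:
        enabled: true
        migrateFromDeployedEtcd: true        # only when moving from deployed etcd
        snapshotCount: 10000
        extraArgs:
          - --quota-backend-bytes=8589934592 # example: raise the etcd backend quota to 8 GiB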