Version: v0.34 Stable

Distributed Compute Aggregation

One management plane for GPU compute from every source you own or contract. Whether a source is a cloud region, a co-located data center, or a compute supplier, each site runs its own Standalone Control Plane Cluster, and your central Platform instance sees the full fleet.

Typical stack: a central Platform (global), a Standalone Control Plane Cluster per DC, private GPU nodes co-located with each Standalone instance, and vMetal for owned bare metal sites.

Figure: Compute aggregation architecture. A central control plane manages tenant clusters that schedule work onto distributed compute capacity sources.

What makes this path different: Every compute source, regardless of where it lives or who operates it, follows the same onboarding playbook. One ops team, one management plane, no per-provider tooling.

Day 0: Design decisions

| Decision | Read next | Outcome |
| --- | --- | --- |
| Design the central-regional topology | Multi-region Platform, Standalone deployment | Central Platform registers each DC's Standalone instance as a connected cluster. Tenant control planes run co-located with their GPU nodes at each site. |
| Define the supplier onboarding playbook | Standalone HA, Private Nodes, Auto Nodes | Standardize the steps to register a new compute source: install Standalone, connect to central Platform, join GPU inventory, validate. |
| Plan networking between sites | VPN | Each site's tenant clusters connect to their private nodes over VPN. Define whether tenants can span sites or are bound to a single DC. |
| Define tenant isolation at each site | Private Nodes, vNode docs | Each tenant cluster at a site gets its own dedicated GPU nodes. vNode adds runtime isolation for untrusted workloads. |
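
The isolation and site-binding decisions above can be checked on paper before anything is deployed. The following is a minimal Python sketch, not part of any vCluster or Platform API: `GpuNode`, `TenantCluster`, `may_span_sites`, and `validate_placement` are hypothetical names introduced only for illustration. It assumes that "dedicated" means a GPU node belongs to at most one tenant cluster, and that a tenant bound to a single DC may only use nodes at its home site.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GpuNode:
    name: str
    site: str  # DC or supplier site where the node physically lives


@dataclass
class TenantCluster:
    name: str
    home_site: str
    nodes: list[GpuNode] = field(default_factory=list)
    may_span_sites: bool = False  # hypothetical flag for the Day 0 networking decision


def validate_placement(tenants: list[TenantCluster]) -> list[str]:
    """Return readable violations; an empty list means the plan is consistent."""
    violations: list[str] = []
    owner_by_node: dict[GpuNode, str] = {}
    for tenant in tenants:
        for node in tenant.nodes:
            # Dedicated nodes: the first tenant to claim a node owns it.
            owner = owner_by_node.setdefault(node, tenant.name)
            if owner != tenant.name:
                violations.append(
                    f"{node.name} is assigned to both {owner} and {tenant.name}"
                )
            # Site binding: a single-DC tenant may only use nodes at its home site.
            if not tenant.may_span_sites and node.site != tenant.home_site:
                violations.append(
                    f"{tenant.name} is bound to {tenant.home_site} "
                    f"but uses {node.name} at {node.site}"
                )
    return violations


# Illustrative plan with made-up site and tenant names.
east_node = GpuNode("gpu-a1", "dc-east")
west_node = GpuNode("gpu-b1", "dc-west")
plan = [
    TenantCluster("team-ml", "dc-east", [east_node]),
    TenantCluster("team-viz", "dc-east", [east_node, west_node]),
]
for problem in validate_placement(plan):
    print(problem)
```

Running a check like this against the planned inventory for each site makes the "span sites or bound to a single DC" decision explicit per tenant rather than implicit in the network layout.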

Day 1: Stand up the first compute source

  1. Deploy vCluster Platform in the central region. Configure Platform HA and backup.
  2. At the first compute site, install vCluster Standalone and move to HA.
  3. Register the site's Standalone Control Plane Cluster with central Platform.
  4. Join the site's GPU inventory as private nodes. Configure Auto Nodes or vMetal for automated provisioning and reclaim.
  5. Configure templates and quotas in central Platform for this compute source.
  6. Provision a test tenant cluster at the site. Validate that it appears in central Platform and that workloads land on the correct GPU nodes.
  7. Document the supplier onboarding playbook. Repeat steps 2-6 for each additional compute source.
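
One way to keep the playbook identical across sites is to encode steps 2-6 as an ordered checklist that stops at the first failure. The sketch below is a hypothetical illustration in Python. The step labels paraphrase the list above, and `onboard` and the per-step callables are assumptions, standing in for whatever tooling actually performs each step. No vCluster CLI or API calls are shown.

```python
from typing import Callable

# Steps 2-6 from the list above, as illustrative labels.
PLAYBOOK = [
    "install Standalone and move to HA",
    "register with central Platform",
    "join GPU inventory as private nodes",
    "configure templates and quotas",
    "provision and validate a test tenant cluster",
]

# A step action takes a site name and returns True on success.
StepFn = Callable[[str], bool]


def onboard(site: str, actions: dict[str, StepFn]) -> tuple[bool, list[str]]:
    """Run the playbook in order for one site, stopping at the first failing step."""
    completed: list[str] = []
    for step in PLAYBOOK:
        if not actions[step](site):
            return False, completed
        completed.append(step)
    return True, completed


# Illustrative run: every step succeeds except GPU inventory join.
actions: dict[str, StepFn] = {step: (lambda site: True) for step in PLAYBOOK}
actions["join GPU inventory as private nodes"] = lambda site: False

ok, done = onboard("dc-east", actions)
print("onboarded:", ok)
print("completed steps:", done)
```

Returning the list of completed steps gives the ops team a clear resume point when a supplier site fails partway through onboarding.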

Day 2: Operate

| Operation | Read next |
| --- | --- |
| Monitor the distributed fleet | Fleet monitoring |
| Onboard a new compute source | Supplier onboarding playbook (internal), Standalone deployment |
| Upgrade across sites | Upgrade vCluster, upgrade Platform |
| Handle site failures | Platform HA, multi-region Platform |
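
For fleet monitoring and site-failure handling, it can help to picture the fleet-wide view as a roll-up of per-site reports. The following Python sketch is only a conceptual illustration: `SiteReport` and `fleet_summary` are hypothetical names, the fields are assumed, and the numbers are made up. It does not reflect how Platform actually collects or exposes fleet data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteReport:
    site: str
    reachable: bool      # whether the central view can currently reach the site
    gpus_total: int
    gpus_allocated: int


def fleet_summary(reports: list[SiteReport]) -> dict:
    """Aggregate per-site GPU counts, listing unreachable sites separately."""
    total = 0
    allocated = 0
    unreachable: list[str] = []
    for report in reports:
        if not report.reachable:
            # Capacity at an unreachable site is excluded from the totals.
            unreachable.append(report.site)
            continue
        total += report.gpus_total
        allocated += report.gpus_allocated
    return {
        "gpus_total": total,
        "gpus_allocated": allocated,
        "gpus_free": total - allocated,
        "unreachable_sites": unreachable,
    }


# Illustrative reports with made-up site names and counts.
reports = [
    SiteReport("dc-east", True, 64, 48),
    SiteReport("supplier-a", True, 32, 8),
    SiteReport("dc-west", False, 16, 16),
]
print(fleet_summary(reports))
```

Excluding unreachable sites from the totals, rather than counting their last known capacity, is a design choice in this sketch; either approach can be reasonable depending on how site failures should surface to tenants.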