Get the Public Cloud Experience on Your DGX Systems
Run elastic, multi-tenant Kubernetes clusters directly on NVIDIA DGX hardware. vCluster brings the elasticity, automation, and developer experience of the public cloud to your on-prem GPU infrastructure.

What’s Holding Back Your NVIDIA GPU Investment
Before vCluster, DGX deployments were either too rigid or too fragmented—limiting utilization and slowing teams down. With vCluster, NVIDIA GPU infrastructure becomes cloud-native, efficient, and fully automated.
Fragmented, Rigid, and Hard to Scale
Most DGX systems today are deployed either as one giant shared cluster or as many small ones; both approaches come with trade-offs that limit performance and efficiency.
- Manual provisioning and upgrades across DGX nodes
- Separate clusters per team for isolation
- Idle GPUs and wasted capacity
- Fragmented workflows outside standard DevOps pipelines
- Complex compliance and security management
Elastic, Kubernetes-Native AI Infrastructure
vCluster transforms DGX environments into cloud-like Kubernetes platforms—secure, dynamic, and automated—with unified control over GPU resources.
- Automated provisioning and lifecycle management via BCM + vCluster
- Isolated virtual clusters without VM overhead
- GPUs scale dynamically with Auto Nodes
- Seamless GitOps and Terraform integration
- Strong governance and high utilization
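As one illustration of the Terraform integration, a virtual cluster can be provisioned declaratively alongside the rest of your infrastructure. The sketch below is a minimal example, not a prescribed setup: it assumes the standard hashicorp/helm provider pointed at your host cluster and installs the public vCluster Helm chart; the name `team-a` is a placeholder.

```hcl
# Minimal sketch: provision a vCluster virtual cluster with Terraform.
# Assumes the hashicorp/helm provider is already configured against the
# host (management) cluster; "team-a" is a placeholder tenant name.
resource "helm_release" "team_a_vcluster" {
  name             = "team-a"
  namespace        = "team-a"
  create_namespace = true

  repository = "https://charts.loft.sh"
  chart      = "vcluster"
}
```

Once applied, developers can reach the virtual cluster with the vCluster CLI (for example, `vcluster connect team-a -n team-a`), keeping day-to-day access inside the same workflow as any cloud cluster.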
The Building Blocks of Cloud-Grade DGX Operations
Each capability represents a step in modernizing DGX infrastructure, from automation and elasticity to security and hybrid connectivity.
- 1. Automate GPU Lifecycle Management: Integrate vCluster with NVIDIA Base Command Manager (BCM) to automate provisioning, scaling, and lifecycle operations for DGX and SuperPOD systems. Manage GPU capacity through a Kubernetes-native control plane.
- 2. Isolate Tenants Without VMs: Combine Private Nodes with the vNode Runtime to deliver secure, isolated environments for each team. Prevent container breakouts and ensure GPU isolation without reverting to virtual machines.
- 3. Scale GPU Workloads on Demand: Use Auto Nodes to dynamically scale GPU and CPU resources across clouds, data centers, and bare metal. Automatically add or remove DGX capacity as workloads fluctuate, maximizing utilization and reducing idle time.
- 4. Connect Hybrid Environments Securely: Enable vCluster VPN to create private, encrypted communication between control planes and worker nodes, ideal for hybrid or burst-to-cloud GPU deployments.
- 5. Simplify Network Management for Tenants: Use the Netris Integration to automate network isolation and lifecycle management. Each tenant receives its own secure network path with policies and firewalls managed declaratively.
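Capabilities like Private Nodes are enabled per virtual cluster through its `vcluster.yaml` configuration. The fragment below is an illustrative sketch rather than a verbatim schema reference: the `privateNodes` key reflects recent vCluster releases and should be verified against the documentation for the version you run.

```yaml
# Illustrative vcluster.yaml sketch (key names assumed from recent
# vCluster releases; verify against the docs for your version).
privateNodes:
  enabled: true   # give this tenant dedicated, VM-free worker nodes
```

Auto Nodes, sleep policies, and network integrations are layered on top of the same per-tenant configuration, so changing a tenant's isolation or scaling behavior is a declarative edit rather than a re-provisioning exercise.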
Empowering Every AI Team on DGX
vCluster enables teams across data science, platform engineering, and operations to collaborate efficiently on shared GPU infrastructure without sacrificing speed, cost efficiency, or security.
Launch isolated development environments in minutes, then put them to sleep when idle to reclaim GPU resources.
Scale DGX workloads dynamically across hybrid clusters to meet real-time compute demands.
Unify provisioning, monitoring, and upgrades across all DGX systems with one control plane.
Optimization
Track GPU usage per tenant, automate scaling, and eliminate idle cost waste.
Proven Gains in Efficiency and Scale
Organizations running AI infrastructure on NVIDIA DGX with vCluster achieve measurable improvements in utilization, velocity, and simplicity.
Faster Provisioning
- Cut environment setup from days to minutes through declarative automation.
Higher GPU Utilization
- 60–85% GPU usage sustained across production clusters.
Reduced Idle Time
- Idle GPU hours lowered by up to 70% with Auto Nodes and Sleep Mode.
Simplified Operations
- 50–70% fewer manual management tasks with vCluster Platform.
Talk with our team about turning your DGX clusters into a scalable, self-service platform for AI and ML workloads.
