Bare Metal Kubernetes Clusters in Minutes, Built for AI and HPC
vCluster gave Nscale the foundation to build what Matt Prior described as “one Kubernetes to rule them all.” Nscale runs a large underlay Kubernetes cluster on bare metal that it owns and operates. Customer clusters are provisioned on top as virtual clusters.
Instead of deploying a separate physical Kubernetes cluster for every tenant, Nscale uses vCluster to give each customer their own control plane. Each virtual cluster includes its own API server, etcd, RBAC, and CRDs. Workloads are scheduled directly onto the underlying bare metal nodes. Customers get Kubernetes isolation and flexibility without the performance penalties of virtualization.
Rapid Bare Metal Cluster Provisioning
Because the underlay infrastructure already exists, Nscale can bring up new customer Kubernetes environments in minutes rather than the much longer timelines associated with provisioning standalone clusters. In live demos, Nscale has shown the ability to provision 10 bare metal nodes with 80 GPUs in roughly two minutes.
Secure Tenant Isolation on Shared Infrastructure
Running customer workloads directly on the underlay cluster requires strong isolation. Nscale built multiple layers of protection around this model.
- Dedicated customer nodes. Each virtual cluster is assigned dedicated nodes, which prevents noisy-neighbor interference between tenants.
- Tenant network isolation. Network policies prevent cross-cluster traffic while preserving the high-performance communication required for AI workloads.
- Sandboxed workloads. Nscale is adopting vNode to place pods in isolated sandboxes. This improves security while allowing a better developer experience than restrictive pod security policies alone.
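Tenant network isolation of this kind is typically enforced with Kubernetes NetworkPolicy objects on the underlay. The sketch below is illustrative only; the namespace and policy names are hypothetical, not Nscale's actual configuration:

```yaml
# Illustrative default-deny ingress policy for one tenant's namespace.
# Pods in the namespace may talk to each other; cross-tenant traffic is blocked.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tenant-isolation
  namespace: tenant-a        # hypothetical namespace backing one virtual cluster
spec:
  podSelector: {}            # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}    # allow same-namespace traffic only
```

High-bandwidth InfiniBand/RDMA traffic generally travels over dedicated fabric interfaces rather than the pod network, so a policy like this can lock down Kubernetes-level traffic without constraining the training fabric.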
Shared Services Without Per-Tenant Overhead
Nscale can run critical infrastructure services once on the underlay cluster and expose them across customer environments. GPU drivers, device plugins, storage classes, network drivers, monitoring, and exporters are centrally managed by Nscale. Customers can immediately consume GPU and storage resources without configuring low-level infrastructure themselves.
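Because drivers, device plugins, and storage classes are managed centrally, a tenant workload can consume GPUs and storage declaratively. A minimal sketch, in which the StorageClass name and container image are hypothetical placeholders:

```yaml
# Hypothetical tenant workload: requests one GPU via the shared NVIDIA device
# plugin and persistent storage through a centrally managed StorageClass.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: training-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-nvme       # hypothetical platform-provided class
  resources:
    requests:
      storage: 100Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: train
spec:
  containers:
    - name: trainer
      image: example.com/trainer:latest   # hypothetical image
      resources:
        limits:
          nvidia.com/gpu: 1       # resource exposed by the shared device plugin
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: training-data
```

The tenant never installs GPU drivers or a CSI driver; those exist once on the underlay, and the virtual cluster simply schedules against them.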
Optimized for HPC-Style AI Workloads
Nscale uses its underlay architecture to optimize for InfiniBand and RDMA-based communication. Nodes are labeled according to their topology within the network fabric. Nscale’s placement system attempts to allocate nodes as close together as possible. This minimizes latency and maximizes performance for distributed AI training workloads.
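Topology-aware placement like this is commonly expressed with node labels plus pod affinity. In the sketch below, the topology label key is hypothetical, standing in for whatever fabric-topology labels Nscale actually applies:

```yaml
# Sketch: co-locate a distributed training job's pods on nodes sharing the
# same (hypothetical) InfiniBand leaf switch, minimizing fabric hops.
apiVersion: v1
kind: Pod
metadata:
  name: worker-0
  labels:
    job: llm-train
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              job: llm-train
          topologyKey: example.com/ib-leaf   # hypothetical fabric-topology label
  containers:
    - name: worker
      image: example.com/trainer:latest      # hypothetical image
```

With nodes labeled by their position in the fabric, the scheduler packs all pods carrying the same `job` label into one topology domain, which is the effect described above: minimal latency between workers in a distributed training run.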
Many large-scale training jobs still rely on Slurm. Nscale layered managed Slurm services on top of these virtual Kubernetes clusters. Customers can run both cloud-native and HPC-style workloads on the same platform.
With vCluster, Nscale built a platform that delivers cloud-like self-service and multi-tenancy while achieving the bare metal performance AI infrastructure demands.