Case Studies

How Nscale Builds Cloud-Scale AI Platforms with vCluster

"Using vCluster lets us provision bare metal Kubernetes clusters in minutes while still giving customers the isolation and flexibility they expect from the cloud. It’s a key part of how we deliver AI infrastructure with HPC-class performance."

Matt Prior
Director of Cloud Native Platform Engineering @ Nscale

Matt Prior, Director of Cloud Native Platform Engineering at Nscale, says: “We provision a large bare-metal underlay cluster and then create customer clusters on top using vCluster.”

Without vCluster

Building Cloud-Like AI Infrastructure on Bare Metal Is Incredibly Hard

Nscale is building a next-generation AI cloud that spans the full stack, from data centers and bare metal infrastructure to managed Kubernetes, Slurm, and AI services. Its goal is ambitious: deliver cloud-like flexibility for AI workloads while preserving the raw performance characteristics of HPC infrastructure.

That challenge is harder than it sounds.

Large-scale AI training workloads increasingly behave like traditional HPC jobs. They require massive parallel compute, ultra-low-latency networking, high-bandwidth storage, and careful node placement across the fabric. Yet unlike traditional HPC users, AI customers expect self-service access, fast provisioning, Kubernetes-native tooling, and support for a wide variety of frameworks and access patterns.

For Nscale, that created a difficult platform problem. How do you give every customer an isolated Kubernetes environment without introducing the operational sprawl, provisioning delays, and performance overhead that come with spinning up separate physical clusters or relying on heavy virtualization?

What Nscale Needed

  • A multi-tenant Kubernetes architecture that could run directly on bare metal without the overhead of virtual machines
  • Fast cluster provisioning for AI workloads that need to scale on demand
  • Strong tenant isolation without giving up access to GPUs, RDMA, and HPC networking
  • A way to centralize platform services like drivers, storage, and monitoring across shared infrastructure
  • Support for both Kubernetes-native AI workloads and traditional Slurm-based batch jobs

With vCluster

Bare Metal Kubernetes Clusters in Minutes, Built for AI and HPC

vCluster gave Nscale the foundation to build what Matt Prior described as “one Kubernetes to rule them all.” Nscale runs a large underlay Kubernetes cluster on bare metal that it owns and operates. Customer clusters are provisioned on top as virtual clusters.

Instead of deploying a separate physical Kubernetes cluster for every tenant, Nscale uses vCluster to give each customer their own control plane. Each virtual cluster includes its own API server, etcd, RBAC, and CRDs. Workloads are scheduled directly onto the underlying bare metal nodes. Customers get Kubernetes isolation and flexibility without the performance penalties of virtualization.
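As an illustrative sketch of this model, a vCluster values file can give a tenant its own deployed etcd while syncing only that tenant's bare metal nodes into the virtual cluster. The keys below follow the vCluster v0.20+ `vcluster.yaml` layout, and the tenant label is hypothetical; Nscale's actual configuration is not public.

```yaml
# Illustrative vcluster.yaml sketch (v0.20+ schema); label names are hypothetical.
controlPlane:
  backingStore:
    etcd:
      deploy:
        enabled: true          # each tenant control plane gets its own etcd
sync:
  fromHost:
    nodes:
      enabled: true            # expose the real bare metal nodes to the tenant
      selector:
        labels:
          tenant.example.com/customer: acme   # hypothetical dedicated-node label
```

A tenant cluster built from such a file would then be provisioned with the vCluster CLI, e.g. `vcluster create acme -n tenant-acme -f vcluster.yaml`.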

Rapid Bare Metal Cluster Provisioning

Because the underlay infrastructure already exists, Nscale can bring up new customer Kubernetes environments in minutes rather than the much longer timelines associated with provisioning standalone clusters. In live demos, Nscale has shown the ability to provision 10 bare metal nodes with 80 GPUs in roughly two minutes.

Secure Tenant Isolation on Shared Infrastructure

Running customer workloads directly on the underlay cluster requires strong isolation. Nscale built multiple layers of protection around this model.

  • Dedicated customer nodes. Each virtual cluster is assigned dedicated nodes, which prevents noisy-neighbor interference between tenants
  • Tenant network isolation. Network policies prevent cross-cluster traffic while preserving the high-performance communication required for AI workloads
  • Sandboxed workloads. Nscale is adopting vNode to place pods in isolated sandboxes. This improves security while allowing a better developer experience than restrictive pod security policies alone
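The network-isolation layer described above can be sketched with a standard Kubernetes NetworkPolicy that denies ingress from other namespaces while leaving intra-tenant traffic open. The namespace name is hypothetical; Nscale's actual policies are not public, and high-bandwidth RDMA traffic typically runs over separate fabric interfaces rather than the pod network.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tenant-isolation
  namespace: tenant-acme        # hypothetical tenant namespace
spec:
  podSelector: {}               # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}       # allow traffic only from pods in this same namespace
```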

Shared Services Without Per-Tenant Overhead

Nscale can run critical infrastructure services once on the underlay cluster and expose them across customer environments. GPU drivers, device plugins, storage classes, network drivers, monitoring, and exporters are centrally managed by Nscale. Customers can immediately consume GPU and storage resources without configuring low-level infrastructure themselves.

Optimized for HPC-Style AI Workloads

Nscale uses its underlay architecture to optimize for InfiniBand and RDMA-based communication. Nodes are labeled according to their topology within the network fabric. Nscale’s placement system attempts to allocate nodes as close together as possible. This minimizes latency and maximizes performance for distributed AI training workloads.
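Topology-aware placement of this kind can be sketched as a greedy search over node labels: prefer free nodes that share a leaf-switch group, then fall back to a single spine group, and only then spread arbitrarily. This is a minimal sketch under assumed label names ("leaf", "spine"); Nscale's actual placement system is not public.

```python
from collections import defaultdict

def place(nodes, count):
    """Pick `count` free nodes, preferring nodes that share a leaf group,
    then nodes within one spine group, then any free nodes.

    `nodes` maps node name -> {"leaf": str, "spine": str, "free": bool}.
    Returns a sorted list of node names, or None if too few nodes are free.
    """
    free = {name: meta for name, meta in nodes.items() if meta["free"]}
    if len(free) < count:
        return None

    # 1) Try to fit the whole request into a single leaf-switch group.
    by_leaf = defaultdict(list)
    for name, meta in free.items():
        by_leaf[meta["leaf"]].append(name)
    for members in sorted(by_leaf.values(), key=len):
        if len(members) >= count:          # smallest leaf that still fits
            return sorted(members)[:count]

    # 2) Fall back to fitting the request within a single spine group.
    by_spine = defaultdict(list)
    for name, meta in free.items():
        by_spine[meta["spine"]].append(name)
    for members in sorted(by_spine.values(), key=len):
        if len(members) >= count:
            return sorted(members)[:count]

    # 3) Last resort: any free nodes, regardless of fabric topology.
    return sorted(free)[:count]
```

Real schedulers weigh many more signals (fragmentation, GPU health, reservations), but the leaf-first preference is what keeps distributed training traffic on the shortest fabric paths.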

Many large-scale training jobs still rely on Slurm. Nscale layered managed Slurm services on top of these virtual Kubernetes clusters. Customers can run both cloud-native and HPC-style workloads on the same platform.

With vCluster, Nscale built a platform that delivers cloud-like self-service and multi-tenancy while achieving the bare metal performance AI infrastructure demands.

Why vCluster

A Foundation for Cloud-Scale AI Infrastructure

Nscale selected vCluster because it enabled a model that would have been extremely difficult to replicate with separate physical clusters or traditional virtualization. vCluster allows Nscale to unify operations, preserve bare metal performance, and deliver isolated Kubernetes environments at the speed AI customers expect.

  • Bare metal performance without virtualization overhead. Customer pods run directly on the underlying hosts. This preserves GPU and networking performance for demanding AI and HPC workloads
  • Fast provisioning at cloud speed. New Kubernetes environments can be created in minutes, helping Nscale deliver an on-demand experience on top of bare metal infrastructure
  • True multi-tenant Kubernetes isolation. Each tenant receives an isolated control plane with its own RBAC, CRDs, and API surface, without the cost and complexity of per-customer physical clusters
  • Centralized platform operations. By managing drivers, storage, monitoring, and core services on the underlay cluster, Nscale reduces duplication and simplifies lifecycle management at scale
  • Built for the convergence of AI and HPC. vCluster supports the flexible Kubernetes experience AI users expect while fitting into an architecture optimized for Slurm, RDMA, and high-performance networking

With vCluster as the foundation, Nscale is building an AI platform that combines the operational efficiency of a shared Kubernetes substrate with the performance characteristics of dedicated HPC infrastructure.

Looking Ahead: Extending the Platform for the Next Era of AI Infrastructure

Nscale is just getting started. As demand for large-scale AI training and inference infrastructure continues to grow, the company is expanding the platform it has built on top of this underlay model. This includes supporting larger GPU deployments, broader AI platform services, and additional infrastructure abstractions on the same foundation.

For organizations building next-generation AI clouds, the Nscale approach offers a compelling model. Use vCluster to deliver isolated Kubernetes environments on top of a shared bare metal substrate while maintaining the performance characteristics required for modern AI workloads.
