Summary
- Kubernetes's lack of native tenant isolation forces a choice: physical isolation (one cluster per tenant) is secure but slow and expensive, creating operational bottlenecks at scale.
- Control plane virtualization offers a modern alternative, providing strong logical isolation with a dedicated API server, etcd, and CRDs for each tenant without the overhead of full physical clusters.
- Virtual clusters spin up in seconds at near-zero marginal cost, compared to minutes or hours for physical clusters, which duplicate resource overhead for every tenant.
- For building scalable GPU clouds or internal AI factories, vCluster delivers the speed, cost-efficiency, and centralized management needed to serve hundreds of isolated tenants.
You've built your Kubernetes platform. You've carefully crafted your RBAC policies, set up your namespaces, and onboarded your first few tenants. Then the cracks start appearing.
Tenants can't install their own operators because CRDs are cluster-wide resources — and that, as one platform engineer put it, "kills one of the biggest value adds of K8s." Your team is drowning in access tickets because "you're somewhat in the business of gating access, as an admin-access bearer — and this totally kills infra teams, velocity-wise and morale-wise." And when something breaks, tenants can't debug their own infrastructure because they can't see the systems running around them.
The hard truth: Kubernetes was never designed for tenant isolation. And the two most prominent architectural responses to this problem — Mirantis's physical cluster-per-tenant model and vCluster's control plane virtualization model — make fundamentally different bets on how to solve it.
This article breaks down both approaches, compares them across the dimensions that matter most (isolation, cost, speed, GPU support, Day 2 ops), and gives you a clear framework for choosing the right model at scale.
The Core Architectural Fork
Mirantis (MKE / k0smotron / k0rdent AI): Physical Isolation
Mirantis's approach is philosophically straightforward: give every tenant their own real Kubernetes cluster. With Mirantis Kubernetes Engine (MKE) or k0smotron managing the lifecycle, each tenant gets a dedicated API server, etcd, controller manager, and worker nodes. Separation is guaranteed at the infrastructure level.
For their AI play, Mirantis has launched k0rdent AI, part of the NVIDIA AI Cloud Ready Initiative, targeting GPU cloud deployments. They've also partnered with Netris to handle the networking complexity that comes with managing fleets of separate clusters.
The strength of this model is undeniable: you get the strongest possible tenant isolation because tenants are physically separated. For legacy enterprise environments with hard compliance mandates and a small, stable set of high-value tenants, this is defensible.
But the costs are steep. Provisioning a full cluster per tenant takes minutes to hours. The resource duplication (control plane VMs, etcd nodes, dedicated workers) means your cost per tenant is high and largely fixed. Scaling to hundreds of tenants means provisioning hundreds of clusters, each requiring its own monitoring stack, update cadence, and operational runbook. As one r/kubernetes user put it, "Deployment per tenant is a big anti-pattern. I've had to help two companies deal [with] the fallout from following that pattern."
vCluster: Control Plane Virtualization
vCluster, built by vCluster Labs (formerly Loft Labs), takes the opposite approach. Instead of provisioning real clusters, vCluster virtualizes the Kubernetes control plane itself — running a certified K8s distribution (API server, controller manager, and etcd) as a lightweight pod inside a namespace on a host cluster.
Every tenant gets their own isolated control plane: their own API server, their own etcd, their own RBAC, their own CRDs. They get cluster-admin inside their tenant cluster and can install any operator they want without touching other tenants or the host cluster. The shared blast radius that makes namespace-based tenant isolation so risky simply doesn't exist here.
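To make the CRD point concrete, here is a toy Python model of why a shared control plane forces tenants to coordinate while per-tenant control planes do not. It is a sketch, not the Kubernetes API: `ControlPlane`, `CRDConflict`, and the CRD name `certificates.example.io` are hypothetical, and a version mismatch is treated as a hard conflict purely to keep the illustration short.

```python
from dataclasses import dataclass, field


class CRDConflict(Exception):
    """Raised when a CRD name is already registered with a different version."""


@dataclass
class ControlPlane:
    """Toy stand-in for one API server: CRDs are keyed by name, cluster-wide."""
    name: str
    crds: dict[str, str] = field(default_factory=dict)  # CRD name -> installed version

    def install_crd(self, crd_name: str, version: str, owner: str) -> None:
        existing = self.crds.get(crd_name)
        if existing is not None and existing != version:
            # Simplification: a real cluster would need the tenants to coordinate.
            raise CRDConflict(
                f"{owner} wants {crd_name}@{version}, "
                f"but {self.name} already serves {existing}"
            )
        self.crds[crd_name] = version


def shared_cluster_demo() -> str:
    # Namespace-based tenancy: both tenants talk to the same control plane.
    shared = ControlPlane("shared-host")
    shared.install_crd("certificates.example.io", "v1", owner="tenant-a")
    try:
        shared.install_crd("certificates.example.io", "v2", owner="tenant-b")
    except CRDConflict:
        return "conflict"
    return "ok"


def per_tenant_demo() -> str:
    # Virtualized control planes: each tenant has its own CRD registry.
    tenant_a = ControlPlane("tenant-a-control-plane")
    tenant_b = ControlPlane("tenant-b-control-plane")
    tenant_a.install_crd("certificates.example.io", "v1", owner="tenant-a")
    tenant_b.install_crd("certificates.example.io", "v2", owner="tenant-b")
    return "ok"
```

In the shared case the second tenant's install collides with the first. In the per-tenant case both installs succeed because neither registry knows about the other.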
What changes is the cost and speed profile. A new tenant cluster spins up in seconds, not hours. The marginal cost per tenant is near-zero — just a few pods. This makes it economically viable to run hundreds or thousands of isolated tenant environments on shared GPU infrastructure, maximizing revenue per GPU rack.
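The cost shape described above can be sketched with a simple linear model. All figures below are placeholder assumptions chosen only to show the shape of the curves; they are not measured or published prices, and the class and function names are illustrative.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TenancyModel:
    """Linear monthly cost: a shared fixed part plus a per-tenant part."""
    name: str
    fixed_monthly_cost: float
    monthly_cost_per_tenant: float

    def monthly_cost(self, tenants: int) -> float:
        return self.fixed_monthly_cost + tenants * self.monthly_cost_per_tenant


# ASSUMPTION: hypothetical numbers for illustration only.
PHYSICAL = TenancyModel("cluster per tenant",
                        fixed_monthly_cost=0.0, monthly_cost_per_tenant=600.0)
VIRTUAL = TenancyModel("virtual control plane",
                       fixed_monthly_cost=1500.0, monthly_cost_per_tenant=15.0)


def break_even_tenants(a: TenancyModel, b: TenancyModel) -> Optional[int]:
    """Smallest tenant count at which `b` is strictly cheaper than `a`.

    Returns None when `b` has no per-tenant saving over `a`.
    """
    per_tenant_saving = a.monthly_cost_per_tenant - b.monthly_cost_per_tenant
    if per_tenant_saving <= 0:
        return None
    crossover = (b.fixed_monthly_cost - a.fixed_monthly_cost) / per_tenant_saving
    return max(0, math.floor(crossover) + 1)
```

With these placeholder inputs, `break_even_tenants(PHYSICAL, VIRTUAL)` returns 3, and at 200 tenants the two models cost 120000.0 and 4500.0 per month respectively. Real inputs will move the crossover, but the per-tenant term dominates as the fleet grows.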
The production numbers back it up: 100K+ GPU nodes powered, 40M+ tenant clusters created, and 50+ customers including CoreWeave, JPMorganChase, and Adobe. vCluster is also named in the NVIDIA DGX SuperPOD reference architecture.
Head-to-Head: Mirantis vs vCluster
Use-Case Fit Matrix: Which Should You Choose?
Choose Mirantis When:
- You are upgrading a legacy enterprise single-cluster environment and need to maintain a strict 1:1 application-to-cluster mapping during a transition period.
- Your compliance or security model mandates physically separate infrastructure per tenant, and cost efficiency is not the primary driver.
- You have a small, static set of high-value tenants (think: two internal teams or three business units) and provisioning speed is irrelevant.
- You are already heavily invested in the Mirantis toolchain and the switching cost outweighs the operational benefits of virtualization.
Choose vCluster When:
- You are building a GPU cloud, neocloud, or internal AI factory and need to serve tens, hundreds, or thousands of isolated tenant clusters at near-zero marginal cost. vCluster was purpose-built for this.
- Speed matters. You need tenants — whether paying customers or internal development teams — to get isolated Kubernetes environments in seconds, not hours.
- Cost per tenant is a business metric. Every dollar of wasted control plane overhead on a dedicated cluster is a dollar not spent on actual GPU compute.
- You need to give tenants genuine autonomy: let them install their own operators, manage their own RBAC, and deploy their own CRDs without filing a ticket or waiting for a cluster admin to greenlight it.
- You need centralized Day 2 operations across a large fleet without building a custom management platform from scratch.
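The two checklists above can be condensed into a small decision helper. This is one possible reading of the criteria, not an official selection tool: the `Requirements` fields, the `recommend` function, and the default threshold of five tenants are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """Hypothetical inputs mirroring the checklists; field names are illustrative."""
    tenant_count: int
    mandates_physical_separation: bool
    needs_fast_provisioning: bool
    tenants_need_own_crds: bool
    invested_in_mirantis: bool = False


PHYSICAL = "physical cluster per tenant"
VIRTUAL = "virtual control plane per tenant"
EITHER = "either; decide on cost and existing tooling"


def recommend(req: Requirements, small_fleet_threshold: int = 5) -> str:
    # A hard mandate for separate infrastructure overrides every other factor.
    if req.mandates_physical_separation:
        return PHYSICAL

    # Small, static tenant set plus existing tooling investment: stay put.
    small_static = (req.tenant_count <= small_fleet_threshold
                    and not req.needs_fast_provisioning)
    if small_static and req.invested_in_mirantis:
        return PHYSICAL

    # Scale, speed, or tenant autonomy each point toward virtualization.
    if (req.tenant_count > small_fleet_threshold
            or req.needs_fast_provisioning
            or req.tenants_need_own_crds):
        return VIRTUAL

    return EITHER
```

For example, `recommend(Requirements(200, False, True, True))` returns the virtual control plane option, while any input with `mandates_physical_separation=True` returns the physical option regardless of fleet size.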
Beyond Provisioning: Day 2 Operations and the Full Stack Advantage
Spinning up a tenant cluster is just the beginning. The real question is: what does managing 200 of them look like six months in?
With Mirantis's physical cluster model, each cluster is an independent operational unit. You need to instrument each one for observability, manage each one's update lifecycle, and back up each one's etcd independently. The management overhead scales linearly with tenant count — which is exactly why "deployment per tenant" keeps coming up as an anti-pattern in the Kubernetes community.
vCluster was designed with this problem in mind. It provides a single pane of glass — UI, CLI, and API — for managing your entire fleet of tenant clusters across clouds and data centers. Features include:
- Centralized fleet management with templates, quotas, and GitOps/IaC integration (Terraform, Argo CD, CI/CD)
- Self-service tenant portal that gives end-users an EKS/GKE-like experience without burdening your platform team
- Built-in Day 2 tooling — observability, automated updates, backups, disaster recovery, and compliance out of the box
And for teams building full GPU cloud infrastructure, the vCluster Labs stack goes further than any competitor:
- vMetal: Zero-touch bare metal provisioning for GPU servers — PXE boot, OS install, machine registration, and Auto Nodes (Bare Metal Karpenter) that provisions GPU nodes on demand as tenants schedule workloads. No dependency on k3s, kubeadm, or k0s.
- vCluster: Tenant cluster orchestration on top of the provisioned infrastructure.
- vNode: Kernel-native workload isolation (seccomp, cgroups, namespaces, AppArmor) that delivers container breakout protection without any VM or hypervisor overhead — preserving bare metal GPU performance while hardening the tenant boundary.
- Certified Stacks: Pre-validated AI environments (Run:AI, Ray, Jupyter, Slurm via Slinky) that turn a blank tenant cluster into a production AI platform in minutes, not weeks.
This is the complete path — raw GPU racks → bare metal provisioning → tenant clusters → workload isolation → production AI platform — in one integrated stack. Lintasarta used it to launch Indonesia's leading GPU cloud in 90 days with 170+ tenant clusters. Boost Run launched in under 45 days with zero new platform engineering hires.
No other vendor, whether Mirantis or anyone else in the broader market, delivers that full-stack path without requiring you to stitch together four different vendors.
The Bottom Line
Both Mirantis and vCluster solve real problems. Mirantis's physical isolation model is battle-tested and appropriate for legacy enterprise environments where a small number of high-value, physically separate clusters makes compliance simpler. If you're upgrading a monolithic single-cluster architecture or have hard mandates against shared infrastructure, it's a defensible choice.
The defining infrastructure challenge of this moment is building GPU clouds, neoclouds, and internal AI factories that serve hundreds of isolated tenants at speed and scale. For this challenge, Mirantis's architecture is the wrong shape for the problem. The cost per tenant is too high, the provisioning is too slow, and the operational overhead compounds as you grow.
vCluster's control plane virtualization delivers the isolation you need (full cluster-admin, dedicated API server and etcd per tenant, no shared blast radius) at the economics the GPU cloud era demands (near-zero marginal cost, seconds to provision, centralized Day 2 management at any scale).
Ready to see it in action? Visit vCluster and spin up your first isolated tenant cluster in minutes — no new infrastructure required.
Frequently Asked Questions
What is the main difference between Mirantis and vCluster for Kubernetes tenant isolation?
The main difference is their isolation model: Mirantis uses physically separate Kubernetes clusters for each tenant, while vCluster uses control plane virtualization to create isolated tenant environments as lightweight pods on a shared host cluster. Mirantis's approach provides the strongest possible hardware-level isolation, but it comes at a high cost with slow provisioning times. vCluster's approach offers strong logical isolation at a near-zero marginal cost. Its clusters spin up in seconds, making it ideal for large-scale deployments like GPU clouds.
Why is simple namespace-based tenancy not enough for Kubernetes?
Namespace-based tenancy is often insufficient because it doesn't provide true isolation for cluster-wide resources like Custom Resource Definitions (CRDs), leading to conflicts and security risks between tenants. It also forces platform teams to become a bottleneck for managing permissions. When tenants share a single control plane, they cannot install their own operators or manage their own CRDs without potentially impacting others, which limits their autonomy and makes debugging difficult.
How does vCluster provide strong tenant isolation without physical clusters?
vCluster provides strong isolation by giving each tenant their own virtualized Kubernetes control plane (API server, controller manager, etcd) running as a StatefulSet within a namespace on the host cluster. This architecture ensures that tenants are completely separated at the API level. Each tenant has cluster-admin privileges within their own virtual cluster, allowing them to manage their own RBAC, install operators, and use CRDs without affecting the host cluster or other tenants.
When should I choose Mirantis over vCluster?
You should choose the Mirantis model when your compliance or security model strictly mandates physically separate hardware per tenant and when cost and speed are not your primary concerns. This approach is a defensible choice for legacy enterprise environments with a small, static number of high-value tenants, or for organizations already heavily invested in the Mirantis ecosystem.
What makes vCluster a better fit for GPU clouds and AI platforms?
vCluster is better for GPU clouds and AI platforms because it allows providers to maximize resource utilization by serving hundreds or thousands of isolated tenants from shared GPU infrastructure at a near-zero marginal cost. The speed (seconds to provision) and cost-efficiency of vCluster are critical for building profitable infrastructure-as-a-service offerings. Its architecture is purpose-built for this scale and is named in the NVIDIA DGX SuperPOD reference architecture.
How does vCluster simplify Day 2 operations at scale?
vCluster simplifies Day 2 operations by providing a centralized platform for managing the entire fleet of tenant clusters from a single pane of glass. Unlike the physical cluster model where each cluster is a separate management silo, the vCluster Platform offers a unified UI, CLI, and API for all tenant environments. This includes built-in tooling for observability, automated updates, backups, disaster recovery, and tenant self-service, drastically reducing the operational burden on platform teams.