You’re building an AI Cloud. You’ll serve many types of customers, including financial institutions. The moment that happens, the Digital Operational Resilience Act (DORA) becomes part of your world.
DORA entered into application on 17 January 2025. It requires financial entities, including banks, insurers, asset managers, and market infrastructure firms, to withstand, respond to, and recover from ICT disruptions. It covers ICT risk management, incident handling and reporting, resilience testing, third-party risk management, and oversight of critical ICT third-party providers (CTPPs). The European Supervisory Authorities (EBA, EIOPA, and ESMA) have begun designating CTPPs, which makes the oversight dimension of DORA operational rather than theoretical and raises supervisory expectations on providers in the financial sector’s ICT supply chain.
Even if you are not the regulated financial entity yourself, you sit inside your customers’ ICT supply chain. You show up in their register of information. You fall inside their resilience testing scope. And if your footprint grows large enough, you may eventually surface in the broader DORA oversight conversation around critical ICT providers.
DORA is less about qualifying a static architecture and more about proving operational resilience as an ongoing capability. The question is not only whether workloads are isolated. It is whether critical services can be recovered, tested, governed, monitored, and explained under pressure.
vCluster can help you implement technical patterns that support DORA readiness, especially around Tenant Isolation, scoped recovery, repeatable environments, and clearer operational boundaries. Broader DORA obligations still sit with the financial entity and its governance, legal, risk, and reporting processes. This article focuses on the enabling architecture layer: what vCluster supports, and what it does not.
Why DORA Is Challenging for Sovereign AI Clouds
Shared AI infrastructure is designed for density and speed. DORA pushes those same platforms to add strong governance, incident visibility, repeatable recovery, and disciplined third-party controls on top.
In practice, teams struggle with:
- Designing and exercising backup and recovery that can be credibly demonstrated, not just documented.
- Producing clean incident evidence quickly enough to support classification and reporting deadlines.
- Separating critical workloads from non-critical workloads without duplicating entire clusters.
- Running resilience tests in environments that actually match production.
- Defining clear third-party failure domains and dependency boundaries for customer due diligence.
- Reducing concentration risk without creating infrastructure sprawl.
- Maintaining auditability across dynamic AI workloads and multiple Tenant Clusters.
Traditional approaches force a painful tradeoff: either create a separate physical cluster for every regulated tenant, zone, or test case (expensive, slow to provision, operationally heavy) or rely on namespace-level separation (insufficient for the Tenant Isolation and resilience expectations implied by DORA). vCluster helps operators avoid that tradeoff by providing scoped, recoverable, auditable Tenant Clusters on shared infrastructure.
Breaking Down DORA into Platform Challenges
Instead of treating DORA as a long list of controls, it is more useful to map it to platform-level challenges for AI Clouds. Several of those challenges align with capabilities that vCluster helps operators implement.
1. Backup, Restore, and Recovery
DORA makes backup policies, restoration procedures, and recovery methods a first-class part of the ICT risk management framework. It is not enough to say backups exist. Recovery has to be designed, exercised, and credible. EIOPA guidance on Article 12 also emphasizes that restoration of backed-up data should rely on appropriately segregated ICT systems, which raises the bar beyond simple snapshotting.
vCluster supports this pattern at the Tenant Cluster boundary:
- Manual Snapshot & Restore backs up Tenant Cluster state to S3-compatible or OCI targets on demand.
- Automatic (Scheduled) Snapshots add retention policies and recurring capture.
- Persistent Volume Snapshots (Beta) extend recovery from control-plane state into workload data via CSI.
- High-Availability Mode, Embedded etcd, and Multi-Region Mode provide control-plane resilience and geographic failure-domain separation.
- Virtual Cluster Status Page and vCluster Debug Shell make recovery playbooks easier to exercise and verify.
Instead of treating backup and restore as a whole-cluster exercise, teams can rehearse recovery at the Tenant Cluster boundary. This pattern fits regulated multi-tenant AI platforms well. Designing, testing, and evidencing recovery against DORA expectations remains the financial entity’s responsibility.
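To make "designed, exercised, and credible" checkable, a platform team could track, per Tenant Cluster, whether the newest snapshot is within the recovery point objective (RPO) and whether a restore has been rehearsed recently. The Python sketch below is a minimal illustration under stated assumptions: the `TenantRecoveryRecord` type, its field names, and the thresholds are hypothetical, and the records would be filled from your own snapshot and drill tooling. Nothing here is a vCluster API or a DORA-defined metric.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class TenantRecoveryRecord:
    """Hypothetical per-Tenant-Cluster record assembled from your own tooling."""
    tenant_cluster: str
    last_snapshot_at: Optional[datetime]       # newest successful snapshot
    last_restore_drill_at: Optional[datetime]  # newest successful restore rehearsal


def recovery_findings(
    record: TenantRecoveryRecord,
    now: datetime,
    rpo: timedelta,
    max_drill_age: timedelta,
) -> list[str]:
    """Return human-readable gaps; an empty list means no gap was found."""
    findings = []
    if record.last_snapshot_at is None:
        findings.append("no snapshot recorded")
    elif now - record.last_snapshot_at > rpo:
        findings.append("newest snapshot is older than the RPO")
    if record.last_restore_drill_at is None:
        findings.append("restore has never been rehearsed")
    elif now - record.last_restore_drill_at > max_drill_age:
        findings.append("last restore drill is older than the allowed interval")
    return findings


# Example with assumed thresholds: 24-hour RPO, restore drill every 90 days.
now = datetime(2025, 6, 1, 12, 0, tzinfo=timezone.utc)
record = TenantRecoveryRecord("payments-prod", now - timedelta(hours=30), None)
print(recovery_findings(record, now, timedelta(hours=24), timedelta(days=90)))
```

A report like this does not prove recoverability by itself, but it turns the question "when did we last rehearse a restore for this tenant?" into something that can be answered per Tenant Cluster rather than per physical cluster.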
2. Incident Management and Operational Boundaries
DORA includes explicit requirements around incident management, classification, and reporting of major ICT-related incidents. The hard part operationally is rarely the existence of a process on paper. It is getting clean evidence quickly enough to support classification, escalation, and communication within mandated timeframes.
vCluster does not classify or report incidents, but it can make the underlying evidence easier to produce by localizing it:
- Audit Logging provides a centralized audit trail scoped per Tenant Cluster.
- Prometheus ServiceMonitor and Metrics Server expose control-plane and workload telemetry for detection.
- Central HostPath Mapper simplifies log collection across Tenant Clusters without breaking scope.
- Virtual Cluster Status Page and vCluster Debug Shell shorten the path from detection to investigation.
When each tenant or regulated workload has its own Virtual Control Plane boundary, incident analysis becomes less like forensics inside one giant shared cluster and more like working inside a scoped system with clearer ownership.
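As an illustration of what scoped evidence can look like, the Python sketch below summarizes one Tenant Cluster's audit log for an incident window. It assumes the log is available as JSON lines in the upstream Kubernetes `audit.k8s.io/v1` event format (fields such as `stage`, `verb`, `user.username`, and `stageTimestamp`). How the log is exported from a given Tenant Cluster is an assumption here, and the function names are hypothetical.

```python
import json
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable

# Kubernetes API verbs that change state.
WRITE_VERBS = {"create", "update", "patch", "delete", "deletecollection"}


def parse_ts(value: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp such as 2025-06-01T12:00:00.000000Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_window(lines: Iterable[str], start: datetime, end: datetime) -> dict:
    """Count completed requests and write requests per user inside [start, end)."""
    writes = Counter()
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Keep only the final stage of each request to avoid double counting.
        if event.get("stage") != "ResponseComplete":
            continue
        if not (start <= parse_ts(event["stageTimestamp"]) < end):
            continue
        total += 1
        if event.get("verb") in WRITE_VERBS:
            writes[event.get("user", {}).get("username", "unknown")] += 1
    return {"events": total, "writes_by_user": dict(writes)}


# Synthetic sample events for one Tenant Cluster (not real log data).
sample = [
    json.dumps({"stage": "ResponseComplete", "verb": "delete",
                "user": {"username": "alice"},
                "stageTimestamp": "2025-06-01T12:05:00.000000Z"}),
    json.dumps({"stage": "ResponseComplete", "verb": "get",
                "user": {"username": "bob"},
                "stageTimestamp": "2025-06-01T12:06:00.000000Z"}),
    json.dumps({"stage": "RequestReceived", "verb": "delete",
                "user": {"username": "alice"},
                "stageTimestamp": "2025-06-01T12:05:00.000000Z"}),
    json.dumps({"stage": "ResponseComplete", "verb": "patch",
                "user": {"username": "carol"},
                "stageTimestamp": "2025-06-01T14:00:00.000000Z"}),
]
start = datetime(2025, 6, 1, 12, 0, tzinfo=timezone.utc)
end = datetime(2025, 6, 1, 13, 0, tzinfo=timezone.utc)
print(summarize_window(sample, start, end))
```

Because the input is already limited to one Tenant Cluster, the summary needs no tenant filter. That is the practical benefit of a per-tenant audit boundary: the scoping happens before the analysis starts.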
3. Governance, Access Control, and Separation of Duties
DORA’s ICT risk management framework begins with governance and organisation. In cloud terms, that means identity, access, change control, and clear separation of responsibilities between platform operators and tenant teams.
vCluster supports this model with:
- Identity through SSO (Okta, SAML, OIDC, LDAP, GitHub, Google) and Dynamic OIDC Clients.
- Tenant-scoped boundaries via Projects, Permission Management, and Extra Access Rules.
- Reduced blast radius with Least Privilege Mode for the Platform Agent and Egress-Only Agent Connections.
- Baseline hardening via Certificate Rotation CLI, Network Policies (default deny), and the CIS Hardening Guide.
The platform team governs the substrate. The tenant team operates inside a bounded environment. That technical separation aligns with the kind of operational clarity regulated reviews tend to look for, although board-level governance and formal separation-of-duties attestations sit with the financial entity.
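One simple control that follows from this split is to check that no identity holds both a platform-level role and an administrative role inside a Tenant Cluster. The Python sketch below is a hypothetical illustration: the `Binding` type, the role names `platform-admin` and `tenant-admin`, and the input format are assumptions for the example, not vCluster objects, and the bindings would come from an export of your own access model.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Binding(NamedTuple):
    subject: str  # user or group identity from your SSO provider
    role: str     # role name; the names below are hypothetical
    scope: str    # "platform" or the name of a Tenant Cluster


PLATFORM_ROLES = {"platform-admin"}
TENANT_ROLES = {"tenant-admin"}


def sod_conflicts(bindings: Iterable[Binding]) -> dict[str, list[str]]:
    """Map each subject with a platform role to the Tenant Clusters it also administers."""
    platform_subjects = set()
    tenant_scopes = defaultdict(set)
    for b in bindings:
        if b.role in PLATFORM_ROLES:
            platform_subjects.add(b.subject)
        elif b.role in TENANT_ROLES:
            tenant_scopes[b.subject].add(b.scope)
    return {
        subject: sorted(tenant_scopes[subject])
        for subject in sorted(platform_subjects)
        if subject in tenant_scopes
    }


bindings = [
    Binding("alice", "platform-admin", "platform"),
    Binding("alice", "tenant-admin", "bank-a-prod"),
    Binding("bob", "tenant-admin", "bank-a-prod"),
]
print(sod_conflicts(bindings))
```

Whether such an overlap is acceptable (for example, for break-glass accounts) is a governance decision; the check only surfaces it.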
4. Digital Operational Resilience Testing
DORA requires a digital operational resilience testing programme: general testing requirements, plus advanced threat-led penetration testing (TLPT) for certain entities. The hidden blocker for most platforms is not willingness to test. It is environment drift. If recovery, failover, and security tests happen in environments that do not match production, the evidence is weak from the start.
vCluster helps operators keep environments repeatable:
- Templates, Template Cloning, and Template Versioning standardize Tenant Cluster shapes across dev, staging, resilience test, and production.
- vcluster.yaml unified config keeps configuration declarative and diff-able.
- GitOps and IaC via Argo CD, Terraform, and Cluster API (CAPI) turn environment creation into an auditable pipeline.
- Externally Deployed Tenant Cluster Adoption brings existing clusters under consistent governance so test scope matches reality.
Fast, isolated Tenant Clusters also make restore drills and pre-production resilience tests less disruptive to the rest of the platform. Performing TLPT, scoping tests to critical or important functions, and interpreting results against DORA remain obligations of the financial entity.
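Because the configuration is declarative, drift between a resilience-test environment and production can be detected mechanically. The Python sketch below compares two configurations that have already been parsed into nested dictionaries and reports every difference that is not explicitly allowed. The function names and the keys in the sample dictionaries are illustrative placeholders, not a statement of the vcluster.yaml schema.

```python
from typing import Any

MISSING = "<missing>"  # marker for a path present on only one side


def flatten(config: dict, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted paths, e.g. {'a': {'b': 1}} -> {'a.b': 1}."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def drift(baseline: dict, candidate: dict,
          allowed: frozenset = frozenset()) -> dict[str, tuple]:
    """Return {path: (baseline_value, candidate_value)} for unexpected differences."""
    a, b = flatten(baseline), flatten(candidate)
    result = {}
    for path in sorted(set(a) | set(b)):
        if path in allowed:
            continue  # differences that are expected between environments
        left, right = a.get(path, MISSING), b.get(path, MISSING)
        if left != right:
            result[path] = (left, right)
    return result


# Illustrative configs; the key names are placeholders.
production = {"controlPlane": {"replicas": 3, "auditLog": True}, "region": "eu-central"}
resilience_test = {"controlPlane": {"replicas": 1, "auditLog": True}, "region": "eu-west"}
print(drift(production, resilience_test, allowed=frozenset({"region"})))
```

Keeping the list of allowed differences explicit and short is the design point: anything outside it is a reason to question whether a test result transfers to production.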
5. Third-Party Risk and Concentration Management
DORA’s third-party risk section covers general principles, concentration risk, key contractual provisions, and the register of information. It sits alongside a live oversight framework for designated critical ICT third-party providers. Customers will ask harder questions about dependency boundaries, failure domains, subcontracting chains, and which services support critical or important functions.
vCluster does not remove those questions, but it gives operators multiple isolation models so technical answers can match workload criticality:
- Shared Nodes and Dedicated Nodes for efficient shared environments with workload-pinned compute.
- Private Nodes and vCluster Standalone for stronger workload and infrastructure boundaries, including separate CNI, CSI, and kernel.
- Multi-Region Mode and Air-Gapped Mode help align deployments with failure-domain and sovereignty expectations.
- External Database, HashiCorp Vault, and External Secrets Operator let secrets and state sit behind customer-controlled systems.
- Netris Integration supports sovereign bare-metal networking for regulated footprints.
6. Environment Consistency and Auditability
DORA requires systematic documentation, traceability of actions, and the ability to produce consistent evidence across environments during supervisory reviews. When environments drift, so does evidence quality.
vCluster helps operators keep consistency and evidence aligned:
- Templates and Template Versioning enforce identical Tenant Cluster architecture across environments.
- Audit Logging produces a scoped, API-level record of actions per Tenant Cluster.
- FIPS 140-2 Compliant Images and the CIS Hardening Guide align control-plane hardening with recognized baselines.
- Externally Deployed Tenant Cluster Adoption helps ensure externally deployed Tenant Clusters are governed through the same evidence pipeline.
Where vCluster Helps
vCluster introduces Tenant Clusters on top of a shared Control Plane Cluster. Each Tenant Cluster includes its own Virtual Control Plane, which enables scoped recovery, clear audit boundaries, and strong Tenant Isolation while still sharing underlying infrastructure efficiently.
For operators of sovereign AI Clouds serving the financial sector, this pattern supports the technical side of DORA readiness, especially around Tenant Isolation, scoped recovery, repeatable environments, and clearer operational boundaries. Broader DORA obligations remain with the financial entity and its governance, legal, and reporting processes.
Scoped Recovery with Dedicated Tenant Clusters
Instead of treating backup and restore as a whole-cluster exercise, each tenant or critical workload runs inside its own Tenant Cluster. This enables:
- Recovery scoped to a Tenant Cluster boundary rather than the full physical cluster.
- Snapshot and restore via Manual Snapshot & Restore and Automatic (Scheduled) Snapshots, with data recovery via Persistent Volume Snapshots (Beta).
- Control-plane resilience via High-Availability Mode, Embedded etcd, and Multi-Region Mode.
- Multiple tenancy models to match the required resilience level: Shared Nodes, Dedicated Nodes, Private Nodes, or vCluster Standalone.
Clear Incident Boundaries and Audit Trails
Tenant Clusters create a natural and auditable scope for incident evidence:
- Each Tenant Cluster has its own API audit log and RBAC history via Audit Logging.
- Prometheus ServiceMonitor, Metrics Server, and Central HostPath Mapper centralize telemetry without losing scope.
- Virtual Cluster Status Page and vCluster Debug Shell shorten the path from detection to investigation.
- Platform teams manage the Control Plane Cluster and policy; tenant teams operate inside their Tenant Clusters with scoped administrative access.
This pattern aligns with DORA’s separation-of-duties expectations at the technical layer and helps make incident classification, reporting, and post-incident review easier to evidence. The classification and reporting decisions themselves remain the financial entity’s responsibility.
Repeatable Environments for Resilience Testing
Audit readiness and resilience testing are where DORA-regulated engagements often lose time. vCluster helps reduce that burden:
- Tenant Clusters can be templated and versioned through Templates, Template Cloning, and Template Versioning.
- Deployment is declarative through vcluster.yaml unified config, Argo CD, Terraform, and Cluster API (CAPI).
- What gets validated in staging is what runs in production.
- Existing clusters can be brought under consistent governance via Externally Deployed Tenant Cluster Adoption.
Flexible Tenant Isolation for Concentration Risk
One of the biggest challenges in DORA readiness is balancing resilience with cost and operational efficiency. Without vCluster, teams typically create many physical clusters to achieve the separation DORA implies for critical or important functions. This leads to infrastructure sprawl, higher costs, slower provisioning, and a larger operational surface to secure and audit. With vCluster, isolation can be graduated instead:
- Critical workloads can move onto Private Nodes or vCluster Standalone footprints with their own CNI, CSI, and kernel.
- Less sensitive environments stay on more efficient Shared Nodes or Dedicated Nodes models.
- Failure-domain and sovereignty expectations can be supported with Multi-Region Mode, Air-Gapped Mode, and Netris Integration.
Operators can align isolation strength with workload criticality instead of applying the most expensive model everywhere. Concentration risk at the entity level still needs to be managed through contracts, exit plans, and the register of information.
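The idea of matching isolation strength to criticality can be written down as an explicit policy. The Python sketch below is a hypothetical example: the four criticality tiers and the tier-to-model mapping are assumptions chosen for illustration, and only the tenancy model names come from the list above. A real policy would be set by the operator together with each customer's risk function.

```python
from enum import Enum


class Criticality(Enum):
    LOW = 1        # e.g. sandboxes and experiments
    STANDARD = 2   # non-critical production
    IMPORTANT = 3  # supports an important function
    CRITICAL = 4   # supports a critical function


# Hypothetical policy: one tenancy model per tier.
ISOLATION_POLICY = {
    Criticality.LOW: "Shared Nodes",
    Criticality.STANDARD: "Dedicated Nodes",
    Criticality.IMPORTANT: "Private Nodes",
    Criticality.CRITICAL: "vCluster Standalone",
}


def plan_isolation(workloads: dict[str, Criticality]) -> dict[str, list[str]]:
    """Group workload names by the tenancy model the policy assigns to them."""
    plan: dict[str, list[str]] = {}
    for name in sorted(workloads):
        model = ISOLATION_POLICY[workloads[name]]
        plan.setdefault(model, []).append(name)
    return plan


workloads = {
    "fraud-scoring": Criticality.CRITICAL,
    "model-training-dev": Criticality.LOW,
    "reporting": Criticality.STANDARD,
    "notebook-sandbox": Criticality.LOW,
}
print(plan_isolation(workloads))
```

Writing the mapping down also gives customers a concrete artifact to review during due diligence, since it states which workloads share infrastructure and which do not.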
What Architecture Does Not Solve
Neither vCluster nor any other infrastructure solution is a DORA compliance product, and none satisfies DORA obligations by itself. Those obligations apply to financial entities and, where designated, to critical ICT third-party providers.
vCluster operates at the technical architecture layer. The following items sit outside that scope and must be addressed through other means:
- DORA governance framework and board-level risk management accountability.
- Incident classification and reporting to competent authorities within mandated timeframes.
- Maintenance of the register of information for ICT third-party arrangements.
- Negotiation of DORA-mandated contractual provisions with suppliers and customers.
- Performance of threat-led penetration testing (TLPT) where applicable.
- Entity-level organizational controls, personnel arrangements, and supervisory reporting.
- Contractual exit plans, subcontracting-chain oversight, and concentration risk management at the entity level.
Where this article refers to DORA provisions, the intent is to describe technical patterns that support readiness, not to claim that any product, including vCluster, satisfies those provisions on its own. With that boundary established, the rest of the article looks at where the technical architecture layer can meaningfully accelerate the work.
How vCluster Accelerates DORA Readiness
vCluster does not replace DORA compliance processes. It helps accelerate how quickly teams can implement the technical architecture those processes depend on, which is often the biggest bottleneck on the path to readiness.
Key acceleration points:
- Faster provisioning of Tenant Clusters: new isolated environments are created in seconds, not hours. Teams iterate on resilience architecture rapidly instead of waiting on infrastructure tickets.
- Scoped backup and recovery: snapshot and restore at the Tenant Cluster boundary make recovery rehearsal practical, not theoretical.
- Standardized Tenant Cluster architecture: consistent, repeatable patterns across all tenants, workloads, and environments. What gets tested in staging is what runs in production.
- Simpler audit story: scoped API audit logs, scoped RBAC history, and clear platform-versus-tenant responsibility boundaries help reduce friction during supervisory reviews.
- Reduced infrastructure duplication: fewer physical clusters to provision, secure, patch, and audit. Every cluster eliminated is one fewer cluster an auditor or regulator needs to assess.
- Flexible isolation per workload: Tenant Isolation levels match workload criticality instead of overbuilding every environment.
- Lower operational overhead: platform teams focus on resilience controls and policy enforcement instead of infrastructure management.
The result is a faster, more predictable path to DORA readiness at the technical architecture layer.
Reference Architecture for a Sovereign AI Cloud Aligned with DORA’s Technical Resilience Expectations
A typical architecture that supports this alignment includes:
- An EU-controlled Control Plane Cluster, or vCluster Standalone deployments where harder isolation is needed.
- Multiple Tenant Clusters segmented by customer, environment (dev, staging, resilience test, production), or criticality of the supported function.
- Centralized audit, metrics, and log pipelines using Audit Logging, Prometheus ServiceMonitor, and Central HostPath Mapper, with scope preserved per Tenant Cluster.
- Snapshot and restore targets on S3-compatible or OCI storage via Manual Snapshot & Restore and Automatic (Scheduled) Snapshots, with data recovery via Persistent Volume Snapshots (Beta).
- Secrets handled through HashiCorp Vault or External Secrets Operator.
- Repeatable deployment through Templates, Template Versioning, Argo CD, Terraform, or Cluster API (CAPI).
- Critical workloads moved onto Private Nodes or vCluster Standalone footprints; less sensitive environments retained on Shared Nodes or Dedicated Nodes models.
- Multi-Region Mode and Air-Gapped Mode where failure-domain or sovereignty requirements demand it.

This approach balances resilience, Tenant Isolation, scalability, and operational efficiency. vCluster provides the isolation, recovery, and environment management layer; the surrounding infrastructure and organizational controls address contractual, governance, and entity-level DORA obligations.
Getting to Readiness Faster
For sovereign AI Clouds working toward DORA, the main bottleneck is rarely understanding the regulation. It is operationalizing resilience without freezing platform delivery. Backup and restore need to be real. Incident evidence needs to be scoped and accessible. Test environments need to match production closely enough to matter. Third-party boundaries need to be clear enough to survive customer due diligence.
vCluster helps by:
- Enabling scoped recovery and strong Tenant Isolation without infrastructure sprawl, which is often the most operationally complex piece of DORA readiness for shared AI infrastructure.
- Providing clean, auditable separation between platform and tenant responsibilities.
- Making environments easier to standardize, replicate, and test.
- Supporting multiple isolation levels (Shared Nodes, Dedicated Nodes, Private Nodes, vCluster Standalone) so teams can match the right model to each workload’s resilience profile.
With the technical architecture handled, DORA readiness becomes less of a paperwork exercise and more of an ongoing operational discipline, shaped by governance, legal, and reporting processes that continue to sit with the financial entity.
Want to see how this fits your AI Cloud architecture? Speak with the vCluster team to explore a reference architecture and accelerate your DORA readiness timeline.
