Securing the Edge: Confidential Computing in Distributed AI

In the world of distributed AI, trust is the only currency that matters.

TL;DR: Running heavy AI workloads at the edge means trusting hardware you do not control. This is how we used NVIDIA Confidential Computing to protect data while it is loaded in memory and being processed.

In distributed AI, your threat model changes the moment your workload leaves the data centre. Confidential Computing with hardware attestation is not a nice-to-have for edge deployments - it is the only architecture that can credibly guarantee data sovereignty when your AI is running on infrastructure you do not own.

Razor AI Engineering Team

Security Practice

The Challenge: Zero Trust in a Distributed Environment

Confidential Computing is increasingly the standard approach for securing high-performance compute environments, especially for LLM and other AI workloads.

Standard encryption protects data at rest (storage) and in transit (network). However, the moment data is loaded into memory for processing, it is typically held in plaintext and exposed to anything with privileged access to the host. For our client, this gap was unacceptable. They needed a solution where the CPU, RAM, GPU, VRAM, NVLink, and PCIe bus communications were cryptographically protected, preventing unauthorised observation even from the host OS or a malicious administrator.

We identified NVIDIA Confidential Computing as the appropriate solution. By leveraging hardware-based Trusted Execution Environments (TEEs), we could ensure that the memory, CPU state, and GPU execution remained isolated from the host.

Phase 1: Proof of Concept with AMD SEV-SNP

Because of the complexity of the stack, we needed to validate every attack vector. Our journey began with deep technical sessions with NVIDIA solution architects to validate our approach to GPU attestation and encryption layers.

We started by configuring a Confidential Virtual Machine (CVM) on an AMD-based server using AMD SEV-SNP with KVM. This was not a "plug-and-play" operation; it required significant updates and patching of the Linux kernel on Ubuntu to a specific version that supported the necessary confidential computing features.
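As an illustration of the kind of host-side check this phase involves, the sketch below reads the KVM module parameter that recent Linux kernels use to report SEV-SNP support. It is a minimal sketch, not the tooling used on the project: the sysfs path is an assumption that should be confirmed against the kernel actually in use, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// snpParamPath is where recent Linux kernels expose the kvm_amd SEV-SNP
// module parameter. Assumption: confirm this path on your own kernel build.
const snpParamPath = "/sys/module/kvm_amd/parameters/sev_snp"

// paramEnabled interprets a kernel module parameter value.
// Boolean parameters are usually reported as "Y" or "N"; "1" is accepted too.
func paramEnabled(raw string) bool {
	v := strings.TrimSpace(raw)
	return v == "Y" || v == "y" || v == "1"
}

// snpEnabled reads the parameter file and reports whether SEV-SNP is on.
// A missing file is returned as an error rather than treated as "disabled",
// so the caller can tell "not supported by this kernel" from "switched off".
func snpEnabled(path string) (bool, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return false, fmt.Errorf("reading %s: %w", path, err)
	}
	return paramEnabled(string(data)), nil
}

func main() {
	ok, err := snpEnabled(snpParamPath)
	if err != nil {
		fmt.Println("SEV-SNP check could not run:", err)
		return
	}
	fmt.Println("SEV-SNP enabled in KVM:", ok)
}
```

A check like this only confirms that the hypervisor side is switched on. It says nothing about whether a guest has actually launched as a confidential VM, which is what attestation is for.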

This phase confirmed that we could successfully configure a CVM and verify GPU attestation against NVIDIA’s documentation, giving us the green light to move to production hardware.

Phase 2: Scaling to Supermicro H100 Clusters

The production environment was significantly more powerful. We moved to configuring four Supermicro GPU SuperServer SYS-821GE-TNHR units. These are beasts of computation, designed for LLM training and inference, each equipped with eight NVIDIA H100 GPUs in the SXM form factor.

Enabling Confidential Computing on this specific architecture presented unique hurdles. Unlike the AMD-based proof of concept, this platform relies on Intel TDX (Trust Domain Extensions) for the CPU-side TEE, and we encountered boot issues when enabling it.

Razor worked closely with engineers from both Intel and Supermicro to troubleshoot the problem. We discovered the issue lay in the firmware; by installing the correct firmware versions for both the BIOS and the GPUs, we successfully enabled the host server for Confidential Computing.
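Because the fix came down to firmware versions, a minimum-version check is a natural thing to automate. The sketch below compares dotted numeric version strings. It assumes firmware versions can be expressed in that form, which is not true of every vendor's scheme, and the version numbers in `main` are placeholders, not the versions required for this hardware.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits a dotted numeric version such as "2.1.0" into integers.
func parseVersion(v string) ([]int, error) {
	parts := strings.Split(strings.TrimSpace(v), ".")
	out := make([]int, len(parts))
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return nil, fmt.Errorf("invalid version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// atLeast reports whether have >= minimum, comparing component by component.
// Missing trailing components count as zero, so "2.1" equals "2.1.0".
func atLeast(have, minimum string) (bool, error) {
	h, err := parseVersion(have)
	if err != nil {
		return false, err
	}
	m, err := parseVersion(minimum)
	if err != nil {
		return false, err
	}
	for i := 0; i < len(h) || i < len(m); i++ {
		var a, b int
		if i < len(h) {
			a = h[i]
		}
		if i < len(m) {
			b = m[i]
		}
		if a != b {
			return a > b, nil
		}
	}
	return true, nil
}

func main() {
	// Placeholder versions for illustration only.
	ok, err := atLeast("2.3.1", "2.1")
	fmt.Println("firmware meets minimum:", ok, "error:", err)
}
```

Comparing numerically rather than as strings matters here: a plain string comparison would rank "2.10" below "2.9".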

Automating Trust with Go

Validating a complex hardware stack manually is neither scalable nor secure. To streamline this, Razor developed a custom Go-based application to perform system-level checks on all components required for Confidential Computing.

This tool acts as the gatekeeper for the distributed cluster:

Validation: It generates a detailed host validation report and securely transmits it to the client’s control servers.
Deployment: If, and only if, validation succeeds, the system automatically downloads and launches a pre-built Confidential Virtual Machine.
Attestation: Inside the CVM, a secure service provides attestation endpoints. This allows the client to verify the integrity of the CVM itself, while GPU attestation is performed against NVIDIA services to confirm the trusted state of each H100 GPU.
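The validation and deployment steps above can be sketched as a small gate: run every check, send the report regardless of outcome, and launch only when every check passed. This is a simplified illustration under stated assumptions, not Razor's implementation. The type names, check names, and the `send` and `launch` callbacks are all hypothetical stand-ins for the real transport and CVM launcher.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// CheckResult records the outcome of one host-level check.
type CheckResult struct {
	Name   string `json:"name"`
	Passed bool   `json:"passed"`
	Detail string `json:"detail,omitempty"`
}

// Report is the (hypothetical) validation report sent to the control servers.
type Report struct {
	Host    string        `json:"host"`
	Checks  []CheckResult `json:"checks"`
	AllPass bool          `json:"all_pass"`
}

// Check is one named validation step returning pass/fail and a detail string.
type Check struct {
	Name string
	Run  func() (bool, string)
}

// validate runs every check and builds the report.
// An empty check list fails closed: nothing verified means nothing trusted.
func validate(host string, checks []Check) Report {
	r := Report{Host: host, AllPass: len(checks) > 0}
	for _, c := range checks {
		ok, detail := c.Run()
		r.Checks = append(r.Checks, CheckResult{Name: c.Name, Passed: ok, Detail: detail})
		if !ok {
			r.AllPass = false
		}
	}
	return r
}

// gate always sends the report, then launches only if every check passed.
func gate(r Report, send func([]byte) error, launch func() error) error {
	payload, err := json.Marshal(r)
	if err != nil {
		return fmt.Errorf("encoding report: %w", err)
	}
	if err := send(payload); err != nil {
		return fmt.Errorf("sending report: %w", err)
	}
	if !r.AllPass {
		return errors.New("host validation failed; CVM not launched")
	}
	return launch()
}

func main() {
	checks := []Check{
		{Name: "cpu-tee-enabled", Run: func() (bool, string) { return true, "" }},
		{Name: "gpu-cc-mode", Run: func() (bool, string) { return false, "placeholder failure" }},
	}
	report := validate("edge-node-01", checks)
	err := gate(report,
		func(b []byte) error { fmt.Println("report:", string(b)); return nil },
		func() error { fmt.Println("launching CVM"); return nil },
	)
	fmt.Println("gate result:", err)
}
```

Sending the report before checking the result means the control servers see failures as well as successes. A failure to deliver the report also blocks the launch, so a host that cannot report is never trusted.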

The Result: Verified, Encrypted AI

By rigorously implementing and testing these layers, the client gained the ability to run workloads inside encrypted Confidential Virtual Machines backed by verified GPU attestation.

This architecture ensures that no host operator or external actor can access or tamper with customer data. The client can now deploy AI workloads to their distributed cluster with total confidence, knowing that the environment is cryptographically isolated and verified before a single byte of data is processed.
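To make "verified before a single byte of data is processed" concrete, the sketch below shows two comparisons a client might apply to attestation evidence: a nonce check for freshness and a measurement check for identity. It is a minimal sketch under strong assumptions. The `Evidence` type is hypothetical, and the code assumes the evidence signature and certificate chain have already been verified against the hardware vendor's keys, which is the part that real attestation depends on and which is omitted here.

```go
package main

import (
	"crypto/rand"
	"crypto/subtle"
	"errors"
	"fmt"
)

// Evidence is a hypothetical attestation result whose signature has already
// been verified elsewhere. The field names are illustrative only.
type Evidence struct {
	Nonce       []byte // challenge echoed back by the attestation service
	Measurement []byte // launch measurement of the CVM image
}

// newNonce returns n random bytes for the client to send as a challenge.
func newNonce(n int) ([]byte, error) {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		return nil, err
	}
	return b, nil
}

// verify checks freshness (nonce) and identity (measurement).
// Empty reference values are rejected so the check cannot pass vacuously.
func verify(ev Evidence, nonce, expected []byte) error {
	if len(nonce) == 0 || subtle.ConstantTimeCompare(ev.Nonce, nonce) != 1 {
		return errors.New("nonce mismatch: evidence may be replayed")
	}
	if len(expected) == 0 || subtle.ConstantTimeCompare(ev.Measurement, expected) != 1 {
		return errors.New("measurement mismatch: unexpected CVM image")
	}
	return nil
}

func main() {
	nonce, err := newNonce(32)
	if err != nil {
		fmt.Println("could not generate nonce:", err)
		return
	}
	expected := []byte("placeholder-measurement") // illustrative value only
	ev := Evidence{Nonce: nonce, Measurement: expected}
	fmt.Println("verification error:", verify(ev, nonce, expected))
}
```

The nonce ties the evidence to this specific request, so an old report from a previously healthy machine cannot be replayed. Data would be released to the workload only after `verify` returns nil.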

The security implications of this architecture are significant. According to Gartner's 2025 AI Privacy & Trust report, 73% of enterprise AI projects involve sensitive or regulated data, yet fewer than 18% of organisations have implemented hardware-level attestation or confidential computing controls. By deploying NVIDIA Confidential Computing with TEE-backed attestation, this implementation places the client in the top tier of AI security posture - a critical differentiator in regulated industries such as defence, financial services, and healthcare.

Next Steps

Are you looking to implement Confidential Computing for your AI infrastructure? Get in touch to see how we can help secure your compute stack.
