Everyone is talking about AI, and we all have by now witnessed the magic that LLMs are capable of. In this blog post, I'm taking a closer look at how AI and confidential computing fit together. I'll explain the basics of "Confidential AI" and describe the three big use cases that I see: secure outsourcing of AI workloads, IP protection for AI models, and privacy-preserving AI training and inference.
It may be surprising, but until very recently, "Confidential AI" didn't really exist. This was because confidential-computing environments simply couldn't be extended to AI accelerators. There was no way for a secure enclave or Confidential VM (CVM) to establish trust in an accelerator, i.e., a GPU, and bootstrap a secure channel to it. A malicious host system could always mount a man-in-the-middle attack and intercept and alter any communication to and from a GPU. Thus, confidential computing couldn't practically be applied to anything involving deep neural networks or large language models (LLMs).
This has changed now: Nvidia's latest Hopper H100 GPUs come with comprehensive confidential-computing features. Nvidia announced and detailed these features earlier this year, for example in presentations at the Open Confidential Computing Conference (OC3) and the Confidential Computing Summit (CCS). The GPUs have been on the market since late 2022, but it was only the Nvidia CUDA Toolkit 12.2 update from July 2023 that added software support for confidential computing.
In the following, I'll give a technical summary of how Nvidia implements confidential computing. If you're more interested in the use cases, you may want to skip ahead to the "Use cases for Confidential AI" section.
Nvidia's whitepaper gives an overview of the confidential-computing capabilities of the H100 and some technical details. Here's my brief summary of how the H100 implements confidential computing. All in all, there are no surprises.
For remote attestation, every H100 possesses a unique private key that is "burned into the fuses" at production time. For the corresponding public key, Nvidia's certificate authority issues a certificate. Abstractly, this is also how it's done for confidential computing-enabled CPUs from Intel and AMD.
During boot, the GPU measures ("hashes") its firmware. For remote attestation, the GPU signs its firmware measurement together with some Diffie-Hellman parameters. Typically, this is verified by the CUDA driver in the CVM that the GPU is attached to. Verification can be done offline, as well as with the help of the Nvidia Remote Attestation Services (NRAS). After successful verification, the verifier uses the Diffie-Hellman parameters to establish a shared symmetric session key with the GPU. Easy...
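To make the flow more tangible, here's a rough Python sketch of what the verifier's side of such a handshake could look like, using the cryptography library. It's purely illustrative: the message format, curve choice, and key-derivation details are my assumptions, not Nvidia's actual protocol.

```python
# Illustrative only: verify a (hypothetical) signed attestation statement from the
# GPU and derive a shared session key from the Diffie-Hellman parameters it contains.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def verify_and_derive_key(gpu_identity_pubkey: ec.EllipticCurvePublicKey,
                          firmware_measurement: bytes,
                          gpu_dh_pubkey_bytes: bytes,
                          signature: bytes,
                          expected_measurement: bytes) -> bytes:
    # 1. The GPU signs its firmware measurement together with its ephemeral
    #    Diffie-Hellman public key using the device key burned into its fuses.
    #    We check that signature against the key certified by Nvidia's CA.
    signed_blob = firmware_measurement + gpu_dh_pubkey_bytes
    gpu_identity_pubkey.verify(signature, signed_blob, ec.ECDSA(hashes.SHA256()))

    # 2. Only proceed if the firmware measurement matches a known-good value.
    if firmware_measurement != expected_measurement:
        raise ValueError("GPU firmware measurement does not match the expected value")

    # 3. Complete the Diffie-Hellman exchange and derive a symmetric session key.
    #    (In practice, the verifier's public key is sent to the GPU so that both
    #    sides end up with the same key.)
    gpu_dh_pubkey = ec.EllipticCurvePublicKey.from_encoded_point(
        ec.SECP384R1(), gpu_dh_pubkey_bytes)
    verifier_private = ec.generate_private_key(ec.SECP384R1())
    shared_secret = verifier_private.exchange(ec.ECDH(), gpu_dh_pubkey)
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                info=b"cvm-gpu-session").derive(shared_secret)
```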
As the CPU prevents the GPU from directly accessing a CVM's memory, CVM and GPU communicate via a shared memory region outside the CVM. They use AES-GCM with the session key to protect this communication against the host system. The GPU transparently copies and decrypts all inputs to its internal memory. From then onwards, everything runs in plaintext inside the GPU. This encrypted communication between CVM and GPU appears to be the main source of overhead.
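Conceptually, protecting that shared region boils down to standard authenticated encryption with the negotiated session key. Here's a minimal sketch; the buffer layout and nonce handling are my own simplifications, not Nvidia's actual wire format.

```python
# Illustrative only: AES-GCM protection of buffers placed in the shared (untrusted)
# memory region between a CVM and the GPU.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_for_gpu(session_key: bytes, plaintext: bytes) -> bytes:
    """Encrypt a buffer before copying it to the shared staging memory."""
    aead = AESGCM(session_key)   # key negotiated during attestation
    nonce = os.urandom(12)       # must never repeat for the same key
    return nonce + aead.encrypt(nonce, plaintext, None)

def decrypt_from_gpu(session_key: bytes, blob: bytes) -> bytes:
    """Decrypt a result buffer that the GPU placed in the shared memory."""
    aead = AESGCM(session_key)
    nonce, ciphertext = blob[:12], blob[12:]
    return aead.decrypt(nonce, ciphertext, None)  # raises if the data was tampered with
```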
Further, an H100 in confidential-computing mode will block direct access to its internal memory and disable performance counters, which could be used for side-channel attacks.
Together, remote attestation, encrypted communication, and memory isolation provide everything that's required to extend a confidential-computing environment from a CVM or a secure enclave to a GPU.
With the foundations out of the way, let's take a look at the use cases that Confidential AI enables.
Secure outsourcing of AI workloads
First and probably foremost, we can now comprehensively protect AI workloads from the underlying infrastructure. For example, this enables companies to outsource AI workloads to an infrastructure they can't or don't want to fully trust. Think of a bank or a government institution outsourcing AI workloads to a cloud provider. There are several reasons why outsourcing can make sense. One of them is that it's difficult and expensive to acquire larger amounts of AI accelerators for on-prem use.
IP protection for AI models
A use case related to this is intellectual property (IP) protection for AI models. This can be important when a valuable proprietary AI model is deployed to a customer site or physically integrated into a third-party offering. With Confidential AI, an AI model can be deployed in such a way that it can be invoked but not copied or altered. For example, Confidential AI could make on-prem or edge deployments of the highly valuable ChatGPT model possible.
Privacy-preserving AI training and inference
In general, confidential computing enables the creation of "black box" systems that verifiably preserve privacy for data sources. This works roughly as follows: Initially, some software X is designed to keep its input data private. X is then run in a confidential-computing environment. Data sources use remote attestation to check that it really is the right instance of X they are talking to before providing their inputs. If X is designed correctly, the sources have assurance that their data will remain private. Note that this is only a rough sketch. See our whitepaper on the foundations of confidential computing for a more in-depth explanation and examples.
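To make this concrete, here's a minimal sketch of the data source's side of the pattern. The report format and helper names are made up for illustration; a real deployment would use the attestation tooling of the respective platform.

```python
# Illustrative only: a data source releases its input to software X only after
# checking that the attested measurement matches the build of X it trusts.
import json

# Measurement (hash) of the reviewed build of X -- placeholder value.
EXPECTED_MEASUREMENT = "c0ffee..."

def measurement_from_report(report_json: str) -> str:
    """Extract the software measurement from a (hypothetical) attestation report."""
    return json.loads(report_json)["measurement"]

def release_data(report_json: str, data: bytes, send_over_secure_channel) -> None:
    # The hardware signature over the report is assumed to have been verified
    # already, e.g., against the CPU/GPU vendor's certificate chain.
    if measurement_from_report(report_json) != EXPECTED_MEASUREMENT:
        raise RuntimeError("attested software is not the expected instance of X")
    send_over_secure_channel(data)  # the channel must be bound to the attested instance
```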
With confidential computing-enabled GPUs (CGPUs), one can now create a software X that efficiently performs AI training or inference and verifiably keeps its input data private. For example, one could build a "privacy-preserving ChatGPT" (PP-ChatGPT) where the web frontend runs inside CVMs and the GPT AI model runs on securely connected CGPUs. Users of this application could verify the identity and integrity of the system via remote attestation, before setting up a secure connection and sending queries. If the system has been constructed well, the users would have high assurance that neither OpenAI (the company behind ChatGPT) nor Azure (the infrastructure provider for ChatGPT) could access their data. This would address a common concern that enterprises have with SaaS-style AI applications like ChatGPT.
Similarly, one can create a software X that trains an AI model on data from multiple sources and verifiably keeps that data private. This way, individuals and companies can be encouraged to share sensitive data. It also helps with compliance: for example, data that contains personally identifiable information (PII) would normally need to be anonymized before training, which may degrade data quality.
This enables new business models, where data is "rented out" (or "donated") repeatedly for AI training, without anyone learning the actual data.
Given the above, a natural question is: How do users of our imaginary PP-ChatGPT and other privacy-preserving AI apps know if "the system was constructed well"? Probably the simplest answer is: If the entire software is open source, then users can review it and convince themselves that an app does indeed preserve privacy.
However, it's largely impractical for users to review a SaaS application's code before using it. But there are solutions to this. At Edgeless Systems, for instance, we ensure that our software builds are reproducible, and we publish the hashes of our software on the public transparency-log of the sigstore project. With this mechanism, we publicly commit to each new release of our product Constellation. If we did the same for PP-ChatGPT, most users probably would just want to ensure that they were talking to a recent "official" build of the software running on proper confidential-computing hardware and leave the actual review to security experts.
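For illustration, a basic version of such a check boils down to comparing the hash of the artifact you actually received with the value that was committed to the transparency log. The file name and published hash below are placeholders, and real tooling (e.g., sigstore's clients) does considerably more, such as verifying signatures and log inclusion proofs.

```python
# Illustrative only: check a downloaded release against a published reference hash.
import hashlib

PUBLISHED_SHA256 = "d4c9..."  # value committed to the transparency log (placeholder)

def verify_release(path: str) -> bool:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == PUBLISHED_SHA256
```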
If you are interested in additional mechanisms to help users establish trust in a confidential-computing app, check out the talk from Conrad Grobler (Google) at OC3 2023.
By enabling comprehensive confidential-computing features in their professional H100 GPU, Nvidia has opened an exciting new chapter for confidential computing and AI. Finally, it's possible to extend the magic of confidential computing to complex AI workloads. I see huge potential for the use cases described above and can't wait to get my hands on an enabled H100 in one of the clouds. At Edgeless Systems, we're already working on adding support for CGPUs to our "confidential Kubernetes" Constellation. With that, you'll be able to run end-to-end confidential AI workloads while enjoying the scale and flexibility of Kubernetes. We'll share more about that soon.
Author: Felix Schuster