Nvidia Hopper H100
Nvidia's Hopper H100 is the first AI accelerator (or GPU) to incorporate comprehensive confidential-computing features. This makes it possible to build state-of-the-art "Confidential AI" applications. Before the H100, this was not possible, because confidential-computing features were available only in CPUs from AMD and Intel. The H100 reached the market in late 2022, but software support for confidential computing arrived only with the Nvidia CUDA Toolkit 12.2 update in July 2023.
Technical details
For remote attestation, every H100 has a unique private key that is "burned into the fuses" at production time. Nvidia's certificate authority (CA) issues a certificate for the corresponding public key. Conceptually, confidential-computing-enabled CPUs from Intel and AMD use the same approach.
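The chain of trust described above can be sketched in a few lines of Python. This is a toy model under explicit assumptions: Python's standard library has no signature primitives, so textbook RSA with publicly known Mersenne primes stands in for real key material, and the names `ToyKey`, `verify`, and `verify_certificate` are hypothetical. It is not Nvidia's certificate format or signature algorithm. It only shows the structure: a verifier that trusts the CA's public key can accept the device's public key, while the device's private key never leaves the chip.

```python
import hashlib
from dataclasses import dataclass

E = 65537  # conventional RSA public exponent


def digest_int(data: bytes) -> int:
    """SHA-256 of data as an integer (always below 2**256)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def verify(n: int, data: bytes, signature: int) -> bool:
    """Check a textbook-RSA signature against the public modulus n."""
    return pow(signature, E, n) == digest_int(data)


@dataclass
class ToyKey:
    """Textbook RSA key pair. Toy only: no padding, publicly known primes."""
    n: int
    d: int

    @classmethod
    def from_primes(cls, p: int, q: int) -> "ToyKey":
        return cls(n=p * q, d=pow(E, -1, (p - 1) * (q - 1)))

    def public_bytes(self) -> bytes:
        return self.n.to_bytes((self.n.bit_length() + 7) // 8, "big")

    def sign(self, data: bytes) -> int:
        return pow(digest_int(data), self.d, self.n)


# Hypothetical keys built from Mersenne primes so every modulus exceeds 2**256.
ca_key = ToyKey.from_primes(2**127 - 1, 2**521 - 1)      # stands in for Nvidia's CA
device_key = ToyKey.from_primes(2**107 - 1, 2**607 - 1)  # stands in for the fused GPU key

# Production time: the CA signs ("certifies") the device's public key.
certificate = {
    "subject_public_key": device_key.public_bytes(),
    "signature": ca_key.sign(device_key.public_bytes()),
}


def verify_certificate(ca_n: int, cert: dict) -> bool:
    """A verifier that trusts only the CA's public modulus checks the certificate."""
    return verify(ca_n, cert["subject_public_key"], cert["signature"])
```

A verifier that has pinned `ca_key.n` can accept `certificate`, recover the device modulus with `int.from_bytes(certificate["subject_public_key"], "big")`, and from then on check anything the device signs with `verify(...)`.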
During boot, the GPU measures ("hashes") its firmware. For remote attestation, the GPU signs this firmware measurement together with its Diffie-Hellman key-exchange parameters. Typically, the CUDA driver in the confidential VM (CVM) to which the GPU is attached verifies this signed report, checking the signature against the certificate issued by Nvidia's CA. Verification can be done either offline or with the help of the Nvidia Remote Attestation Service (NRAS). After successful verification, the verifier uses the Diffie-Hellman parameters to establish a shared symmetric session key with the GPU.
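The attestation and key-exchange steps above can be modeled as a toy Python sketch. Every cryptographic choice here is an assumption made to keep the example stdlib-only and is not secure: textbook RSA with a fixed, publicly known modulus stands in for the fused signing key, finite-field Diffie-Hellman modulo the Mersenne prime 2**521 - 1 stands in for the real key exchange, hashing the shared secret with SHA-256 stands in for a proper key-derivation function, and a verifier-chosen nonce is added for freshness. `ToyGpu`, `verifier_check`, and the reference measurement are hypothetical; none of this is a CUDA or NRAS API, and the real protocol and message formats differ.

```python
import hashlib
import secrets

E = 65537
P_DH = 2**521 - 1  # toy Diffie-Hellman modulus (a Mersenne prime)
G_DH = 3           # toy generator
DH_LEN = 66        # bytes needed to encode any value below P_DH


def digest_int(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


class ToyGpu:
    """Hypothetical GPU model: fused signing key plus boot-time firmware measurement."""

    def __init__(self, firmware: bytes):
        p, q = 2**107 - 1, 2**607 - 1        # toy "fused" primes (shared by all instances)
        self.n = p * q                       # public modulus, certified by the CA
        self._d = pow(E, -1, (p - 1) * (q - 1))
        self.measurement = hashlib.sha256(firmware).digest()  # measured at boot
        self._dh_secret = secrets.randbelow(P_DH - 3) + 2
        self.session_key = None

    def attest(self, nonce: bytes) -> dict:
        """Sign the measurement together with the GPU's Diffie-Hellman public value."""
        dh_pub = pow(G_DH, self._dh_secret, P_DH)
        body = self.measurement + dh_pub.to_bytes(DH_LEN, "big") + nonce
        return {"measurement": self.measurement, "dh_pub": dh_pub,
                "signature": pow(digest_int(body), self._d, self.n)}

    def finish_key_exchange(self, verifier_dh_pub: int) -> None:
        shared = pow(verifier_dh_pub, self._dh_secret, P_DH)
        self.session_key = hashlib.sha256(shared.to_bytes(DH_LEN, "big")).digest()


def verifier_check(gpu_n: int, expected_measurement: bytes, nonce: bytes, report: dict):
    """CVM-side verifier: check signature and measurement, then derive the session key."""
    body = report["measurement"] + report["dh_pub"].to_bytes(DH_LEN, "big") + nonce
    if pow(report["signature"], E, gpu_n) != digest_int(body):
        raise ValueError("bad signature")
    if report["measurement"] != expected_measurement:
        raise ValueError("unexpected firmware measurement")
    secret = secrets.randbelow(P_DH - 3) + 2
    shared = pow(report["dh_pub"], secret, P_DH)
    session_key = hashlib.sha256(shared.to_bytes(DH_LEN, "big")).digest()
    return pow(G_DH, secret, P_DH), session_key
```

In use, the verifier sends a fresh nonce, calls `verifier_check` on the GPU's report, and returns its own Diffie-Hellman public value so the GPU can call `finish_key_exchange`; both sides then hold the same session key without ever sending it. A GPU running different firmware still produces a valid signature but is rejected because its measurement does not match the reference value.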
Because the CPU prevents the GPU from directly accessing a CVM's memory, the CVM and the GPU communicate via a shared memory region outside the CVM. They protect this communication against the host system with AES-GCM under the session key. The GPU transparently copies all inputs from this region into its internal memory and decrypts them there. From then on, everything runs in plaintext inside the GPU. This encrypted communication between CVM and GPU appears to be the main source of performance overhead.
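The shared-buffer pattern can be illustrated with a small Python sketch. Python's standard library has no AES-GCM, so this sketch swaps in a toy encrypt-then-MAC construction built from HMAC-SHA256 (a counter-mode keystream plus an HMAC tag). It is not AES-GCM and is not meant for real use, but it has the same shape: a session key, a unique 96-bit nonce per message, ciphertext, and an authentication tag. `seal`, `open_sealed`, and `Channel` are hypothetical names, not CUDA driver APIs.

```python
import hashlib
import hmac
import secrets

TAG_LEN = 32


def _subkey(session_key: bytes, label: bytes) -> bytes:
    """Derive separate encryption and MAC keys from the session key."""
    return hmac.new(session_key, label, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    blocks = [hmac.new(key, nonce + i.to_bytes(4, "big"), hashlib.sha256).digest()
              for i in range((length + 31) // 32)]
    return b"".join(blocks)[:length]


def seal(session_key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Toy AEAD stand-in: XOR with a keystream, then MAC the nonce and ciphertext."""
    ks = _keystream(_subkey(session_key, b"enc"), nonce, len(plaintext))
    ct = bytes(a ^ b for a, b in zip(plaintext, ks))
    tag = hmac.new(_subkey(session_key, b"mac"), nonce + ct, hashlib.sha256).digest()
    return ct + tag


def open_sealed(session_key: bytes, nonce: bytes, sealed: bytes) -> bytes:
    ct, tag = sealed[:-TAG_LEN], sealed[-TAG_LEN:]
    expected = hmac.new(_subkey(session_key, b"mac"), nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: shared buffer was modified")
    ks = _keystream(_subkey(session_key, b"enc"), nonce, len(ct))
    return bytes(a ^ b for a, b in zip(ct, ks))


class Channel:
    """Hypothetical CVM -> GPU channel through host-visible shared memory."""

    def __init__(self, session_key: bytes):
        self.key = session_key
        self.send_counter = 0    # CVM side: one nonce per message, never reused
        self.recv_counter = 0    # GPU side: expects messages in the order sent
        self.shared_buffer = {}  # region outside the CVM; the host can read/write it
        self.gpu_memory = {}     # decrypted copies inside the GPU

    def cvm_send(self, name: str, data: bytes) -> None:
        nonce = self.send_counter.to_bytes(12, "big")
        self.send_counter += 1
        self.shared_buffer[name] = seal(self.key, nonce, data)

    def gpu_receive(self, name: str) -> None:
        nonce = self.recv_counter.to_bytes(12, "big")
        self.gpu_memory[name] = open_sealed(self.key, nonce, self.shared_buffer[name])
        self.recv_counter += 1  # advance only after successful authentication
```

Because both sides derive nonces from implicit counters instead of reading them from the shared buffer, a host that modifies, replays, or reorders a buffer causes authentication to fail rather than feeding altered data into GPU memory.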
Further, an H100 in confidential-computing mode blocks direct access to its internal memory and disables performance counters, which could otherwise be used for side-channel attacks.
Together, remote attestation, encrypted communication, and memory isolation provide everything that's required to extend a confidential-computing environment from a CVM or a secure enclave to a GPU. However, note that the H100 does not implement runtime encryption for its on-card memory.