Continuum AI
Continuum protects user prompts from the AI provider and prevents model weights from being leaked.
Continuum is a framework for deploying LLMs and other AI models. It enables the creation of ChatGPT-style services in which both user prompts and model weights are shielded throughout.
With Continuum, the infrastructure and the service provider can never access user prompts and model weights in plaintext.
Continuum integrates with well-known inference servers such as NVIDIA Triton, vLLM, and Hugging Face TGI.
Continuum will be released as open source in H2/2024.
Protect your weights while deploying to untrusted environments.
Provide model weights as encrypted files. Only attested workloads gain access to decrypted weights.
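The flow described above can be sketched as attestation-gated key release: the decryption key for the weights is handed out only to a workload whose attested measurement matches an expected value. The sketch below is illustrative, not Continuum's actual implementation; the XOR keystream stands in for a real AEAD cipher such as AES-GCM, and the measurement values are made up.

```python
import hashlib
import hmac
import os

def keystream(key, n):
    # Derive a pseudo-random keystream from the key. Illustrative only;
    # a real deployment would use an authenticated cipher like AES-GCM.
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def encrypt(key, data):
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

decrypt = encrypt  # XOR is its own inverse

# Hypothetical expected measurement of the trusted inference workload.
EXPECTED_MEASUREMENT = hashlib.sha256(b"trusted-inference-workload").hexdigest()

def release_key(attestation_report, key):
    # The key-release service hands out the weight-decryption key only if
    # the workload's attested measurement matches the expected value.
    if hmac.compare_digest(attestation_report["measurement"], EXPECTED_MEASUREMENT):
        return key
    return None

# The model owner encrypts the weights before shipping them.
key = os.urandom(32)
weights = b"stand-in for the model weight file"
ciphertext = encrypt(key, weights)

# An attested workload gets the key; a tampered one does not.
good = {"measurement": hashlib.sha256(b"trusted-inference-workload").hexdigest()}
bad = {"measurement": hashlib.sha256(b"tampered-workload").hexdigest()}

assert release_key(bad, key) is None
assert decrypt(release_key(good, key), ciphertext) == weights
```

A real attestation report would additionally carry a signature chained to the hardware vendor's root certificate, which the key-release service verifies before comparing measurements.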
By deploying LLMs with Continuum, user requests and responses remain inaccessible to both the inference service provider and the underlying infrastructure.
With Continuum, you can offer best-in-class privacy guarantees to users of your platform.
The video shows how the user experience can be augmented with the ability to verify that a service is running on Continuum.
Continuum comes with different SDKs that allow you to integrate verification in a way that best fits your application.
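The SDK interfaces are not yet public; the sketch below uses hypothetical names to illustrate what client-side verification might look like: the client obtains an attestation report from the service and compares the attested workload measurement against a published reference value.

```python
import hashlib
import hmac

# Hypothetical reference measurement published by the service operator.
REFERENCE_MEASUREMENT = hashlib.sha256(b"continuum-workload-v1").hexdigest()

def verify_attestation(report):
    """Return True only if the attested workload matches the reference.

    A real verifier would additionally validate the report's signature
    against the hardware vendor's certificate chain; that step is
    elided here for brevity.
    """
    return hmac.compare_digest(report["measurement"], REFERENCE_MEASUREMENT)

# Report as it might be returned by the service during connection setup.
report_from_service = {
    "measurement": hashlib.sha256(b"continuum-workload-v1").hexdigest(),
}

assert verify_attestation(report_from_service)
assert not verify_attestation({"measurement": "deadbeef"})
```

An application would typically run this check once when establishing a session and refuse to send prompts if verification fails.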
Use TLS encryption
TLS only protects data in transit: the service provider still has plaintext access to prompts and weights on its VMs. Malicious or defective workloads can leak prompts and models, e.g., by sending them over the network.
Use disk encryption
When encrypting disks with, e.g., LUKS, the service provider holds the disk decryption key. Malicious or defective workloads can still leak model weights, e.g., by sending them over the network.
Run on premise
When running on premise, user prompts and weights remain accessible to the machine operator and the service operator.
Leave your email or send us your questions.