Secure AI inference techniques:
A tutorial

Chen-Mou Cheng

Chang Gung University

September 28, 2024

Outline

Secure AI inference
TEE: Trusted Execution Environment
FHE: Fully Homomorphic Encryption
MPC: (secure) Multi-Party Computation
ZKP: Zero-Knowledge Proof

Secure AI inference

ChatGPT is great! I love asking it all kinds of questions
How can I ask ChatGPT a private question without telling OpenAI the secret?

TEE

Hardware isolation for secure execution
- Ex: Intel SGX, ARM TrustZone
Low latency but limited scalability

Ex: LLMaaS (with SGX)

https://ieeexplore.ieee.org/document/10601537
Model Owner doesn't trust Cloud (but Data Owner does)
- Break up Transformer computation
- Run each piece inside Enclave trusted by Model Owner
Pros: Straightforward, very little overhead
Cons: Hard to scale to larger models and to GPU/ASIC accelerators

LLMaaS workflow

FHE

Computation on encrypted data
Strong privacy but very high overhead

$$\tilde T\text{ is such that }\forall m,d.\,\text{Dec}_{sk}\left(\tilde T_m\big(\text{Enc}_{pk}(d)\big)\right)=T_m(d)$$

Data Owner: $ c_1\leftarrow\text{Enc}_{pk}(\color{blue}d\color{black}) $; sends $ c_1 $ to Model Owner
Model Owner: $ c_2\leftarrow\tilde T_{\color{red}m\color{black}}(c_1) $; returns $ c_2 $ to Data Owner
Data Owner: $ r\leftarrow\text{Dec}_{\color{blue}sk\color{black}}(c_2) $

Source: https://ieeexplore.ieee.org/document/9936637
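The three-step protocol above can be sketched in code. Real FHE schemes (e.g., BFV or CKKS) need a dedicated library, so this sketch swaps in textbook Paillier, which is only *additively* homomorphic. That is enough to evaluate a linear model on encrypted data, but it is not FHE. The primes, the function names (`keygen`, `enc`, `dec`, `model_owner_eval`) and the toy linear "model" are all illustrative assumptions, and the parameters are far too small to be secure (Python 3.9+).

```python
import math
import secrets

def keygen(p=1_000_003, q=1_000_033):
    """Toy Paillier key pair with g = n + 1; these primes are far too small for real use."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)              # with g = n + 1, mu = lam^(-1) mod n
    return n, (n, lam, mu)            # (public key, secret key)

def enc(n, m):
    """Enc_pk(m) = (n + 1)^m * r^n mod n^2 for a random unit r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return pow(n + 1, m % n, n2) * pow(r, n, n2) % n2

def dec(sk, c):
    n, lam, mu = sk
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m  # decode as a signed integer

def model_owner_eval(n, enc_d, weights, bias):
    """Computes Enc(w . d + bias) from ciphertexts and the public key only."""
    n2 = n * n
    acc = enc(n, bias)
    for c, w in zip(enc_d, weights):
        # Enc(a)^w = Enc(w*a) and Enc(a) * Enc(b) = Enc(a + b); negative w uses a modular inverse
        acc = acc * pow(c, w, n2) % n2
    return acc

# Data Owner encrypts its private input d
pk, sk = keygen()
d = [3, 1, 4]
c1 = [enc(pk, x) for x in d]
# Model Owner applies its private weights and bias (playing the role of m) to c1
c2 = model_owner_eval(pk, c1, weights=[2, -5, 7], bias=10)
# Data Owner decrypts the result
r = dec(sk, c2)
print(r)  # 39 = 2*3 - 5*1 + 7*4 + 10
```

The Model Owner never sees `d` and the Data Owner never sees the weights; what a purely additive scheme cannot do is multiply two ciphertexts, which is what nonlinear layers and FHE's much higher overhead are about.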

MPC

Joint computation while keeping inputs private
- Ex: Garbled Circuit
High computation and communication overhead

Data Owner holds $ \color{blue}d\color{black} $; Model Owner holds $ \color{red}m\color{black} $
- The parties exchange messages over several rounds: $ c_1 $ and $ c_3 $ from Data Owner to Model Owner, $ c_2 $ and $ c_4 $ back, and so on ($ \ldots $)
- At the end, Data Owner obtains $ T_{\color{red}m\color{black}}(\color{blue}d\color{black}) $
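For a concrete feel of such a multi-round joint computation, here is a minimal sketch of a *different* MPC technique than the garbled circuit on the next slide: two-party additive secret sharing with Beaver multiplication triples, used to compute a dot product of the Data Owner's input and the Model Owner's weights. Both parties are simulated in one process, the triples come from a hypothetical trusted dealer, and only semi-honest behavior is considered; the modulus and all function names are assumptions.

```python
import secrets

P = 2**61 - 1  # prime modulus for the shares (illustrative choice)

def share(x):
    """Split x into two random additive shares mod P."""
    s0 = secrets.randbelow(P)
    return s0, (x - s0) % P

def reconstruct(s0, s1):
    v = (s0 + s1) % P
    return v - P if v > P // 2 else v  # decode as a signed integer

def dealer_triple():
    """Hypothetical trusted dealer: shares of random a, b and of c = a*b."""
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    return share(a), share(b), share(a * b % P)

def beaver_mul(x_sh, y_sh):
    """Shares of x*y from shares of x and y (party 0 = Data Owner, party 1 = Model Owner)."""
    (a0, a1), (b0, b1), (c0, c1) = dealer_triple()
    # Both parties open e = x - a and f = y - b; a and b act as one-time masks.
    e = (x_sh[0] - a0 + x_sh[1] - a1) % P
    f = (y_sh[0] - b0 + y_sh[1] - b1) % P
    z0 = (c0 + e * b0 + f * a0 + e * f) % P  # only party 0 adds the public e*f
    z1 = (c1 + e * b1 + f * a1) % P
    return z0, z1

def secure_dot(d, w):
    """Shares of the dot product of Data Owner's d and Model Owner's w."""
    z0 = z1 = 0
    for di, wi in zip(d, w):
        s0, s1 = beaver_mul(share(di), share(wi))
        z0, z1 = (z0 + s0) % P, (z1 + s1) % P  # addition is local, no interaction needed
    return z0, z1

z0, z1 = secure_dot([3, 1, 4], [2, -5, 7])
print(reconstruct(z0, z1))  # 29 = 2*3 - 5*1 + 7*4
```

Each multiplication costs one round of opening `e` and `f`, which is where the communication overhead comes from; additions are free.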

Garbled Circuit

Data Owner (holds $ \color{blue}a=1\color{black} $) garbles $ f $ and sends Model Owner (holds $ \color{red}b=0\color{black} $) the garbled table:

$$\begin{array}{|c|}\hline \text{Garbled Table} \\\hline\hline \text{Enc}_{\color{blue}X_0^a,X_0^b\color{black}}(\color{blue}X_{f(0,0)}^c\color{black}) \\\hline \text{Enc}_{\color{blue}X_0^a,X_1^b\color{black}}(\color{blue}X_{f(0,1)}^c\color{black}) \\\hline \text{Enc}_{\color{blue}X_1^a,X_0^b\color{black}}(\color{blue}X_{f(1,0)}^c\color{black}) \\\hline \text{Enc}_{\color{blue}X_1^a,X_1^b\color{black}}(\color{blue}X_{f(1,1)}^c\color{black}) \\\hline\end{array}$$

Then
- Data Owner → Model Owner: its input key $ X_1^a $
- Data Owner → Model Owner: $ \text{OT}\left(\color{blue}X_0^b\color{black},\color{blue}X_1^b\color{black}\right) $, from which Model Owner obtains only $ \color{red}X_0^b\color{black} $
- Model Owner computes $ \text{Dec}_{X_1^a,\color{red}X_0^b\color{black}}(?) $ on the garbled table to recover $ X_{f(1,0)}^c $
- Model Owner → Data Owner: $ X_{f(1,0)}^c $
- Data Owner → Model Owner: $ f(1,0) $ (optional)
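The single-gate exchange above can be sketched as follows. Assumptions: $ \text{Enc} $ is a hash-derived one-time pad with an all-zero tag so the evaluator can recognize the one row it can open, the gate $ f $ is AND, and oblivious transfer is *not* implemented: the Model Owner is simply handed $ X_b^b $, which is what OT would deliver without revealing $ b $. All names are illustrative.

```python
import hashlib
import secrets

LABEL = 16              # bytes per wire label X
PAD = bytes(LABEL)      # all-zero tag so the evaluator can recognise the right row

def pad_for(ka, kb):
    # 32-byte one-time pad derived from both input labels (hash-based Enc)
    return hashlib.sha256(ka + kb).digest()

def xor(x, y):
    return bytes(i ^ j for i, j in zip(x, y))

def garble(f):
    """Garbler (Data Owner): labels (X_0, X_1) for wires a, b, c and a shuffled table."""
    X = {w: (secrets.token_bytes(LABEL), secrets.token_bytes(LABEL)) for w in "abc"}
    table = [xor(pad_for(X["a"][i], X["b"][j]), X["c"][f(i, j)] + PAD)
             for i in (0, 1) for j in (0, 1)]
    secrets.SystemRandom().shuffle(table)  # evaluator must not know which row is which
    return X, table

def evaluate(table, xa, xb):
    """Evaluator (Model Owner): only the matching row decrypts to label || zeros."""
    for row in table:
        plain = xor(pad_for(xa, xb), row)
        if plain[LABEL:] == PAD:
            return plain[:LABEL]
    raise ValueError("no row decrypted correctly")

def f(x, y):
    return x & y                  # example gate (AND), an assumption

X, table = garble(f)
a, b = 1, 0
xa = X["a"][a]                    # Data Owner sends the key for its own input
xb = X["b"][b]                    # stand-in for OT: Model Owner receives only X_b^b
xc = evaluate(table, xa, xb)      # = X^c_{f(1,0)}; a random-looking label to the evaluator
print(X["c"].index(xc))           # Data Owner decodes the returned label: prints 0
```

The Model Owner sees one random-looking label per wire, so it learns neither `a` nor the output bit unless the Data Owner chooses to send $ f(1,0) $.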

Ex: MARILL

https://arxiv.org/abs/2408.03561
Adapt fine-tuning to minimize MPC work in inference
- Layer freezing
- LoRA to reduce matrix dimensions in MPC
- Head merging instead of pruning in MPC

$$\text{head}_j=\text{softmax}\left(\frac{\sum_{\ell=jm}^{(j+1)m-1}Q_\ell K_\ell^T}{\sqrt{md}}\right)\left(V_{jm}||\ldots||V_{(j+1)m-1}\right)\in\mathbb R^{b\times md}$$

Vs. standard fine-tuned model in inference
- $\text{3.6--11.3}\times$ better runtime
- $\text{2.4--6.9}\times$ better communication
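A quick numerical check of the head-merging formula, as a sketch: summing the $ m $ score matrices $ Q_\ell K_\ell^T $ is the same as one wide attention head over the concatenated $ Q $, $ K $, $ V $, so $ h $ heads need only $ h/m $ softmax evaluations, and softmax is one of the more expensive operations in MPC. The sizes and random inputs below are arbitrary assumptions, not MARILL's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
b, d, h, m = 5, 4, 6, 3   # sequence length, head dim, #heads, heads per merged group (assumed)

Q = [rng.standard_normal((b, d)) for _ in range(h)]
K = [rng.standard_normal((b, d)) for _ in range(h)]
V = [rng.standard_normal((b, d)) for _ in range(h)]

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def merged_head(j):
    """head_j as in the formula: sum of m score matrices, scaled by sqrt(m*d)."""
    idx = range(j * m, (j + 1) * m)          # heads jm, ..., (j+1)m - 1
    scores = sum(Q[l] @ K[l].T for l in idx) / np.sqrt(m * d)
    return softmax(scores) @ np.concatenate([V[l] for l in idx], axis=1)

def concat_head(j):
    """The same head written as one wide attention head over concatenated Q, K, V."""
    idx = range(j * m, (j + 1) * m)
    Qc, Kc, Vc = (np.concatenate([X[l] for l in idx], axis=1) for X in (Q, K, V))
    return softmax(Qc @ Kc.T / np.sqrt(Qc.shape[1])) @ Vc

heads = [merged_head(j) for j in range(h // m)]   # h // m softmax calls instead of h
print(heads[0].shape)                             # (5, 12), i.e. b x md
print(all(np.allclose(merged_head(j), concat_head(j)) for j in range(h // m)))  # True
```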

State of the art in 2023

https://eprint.iacr.org/2023/1678

MARILL workflow

MARILL techniques

Inference performance

ZKP

Verifier learns nothing beyond "Statement S is true"
- Consider special S: $ y=f(x,\color{red}w\color{black}) $ for a secret witness $ \color{red}w\color{black} $
- The dialogue can be simulated without knowing $ \color{red}w\color{black} $
Ex: Colorblind test with two balls of different colors
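The colorblind test can be simulated as a sketch. Assumed setup: a colorblind Verifier holds two balls, hides them, secretly swaps them or not, and asks the Prover whether a swap happened. A Prover who really sees colors always answers correctly; one who cannot distinguish them survives each round with probability 1/2. The transcript consists of pairs (swap bit, answer) with answer = swap bit, which the Verifier could have produced alone; that is the simulation argument. The function name and round counts are assumptions.

```python
import random

def run_protocol(prover_can_distinguish, rounds=20, seed=None):
    """Verifier accepts only if Prover detects every secret swap."""
    rng = random.Random(seed)
    for _ in range(rounds):
        swapped = rng.random() < 0.5          # Verifier's secret coin
        if prover_can_distinguish:
            answer = swapped                  # sees the colors, so always right
        else:
            answer = rng.random() < 0.5       # can only guess
        if answer != swapped:
            return False                      # Verifier rejects
    return True                               # Verifier accepts

honest = run_protocol(True, seed=0)
trials = 10_000
cheat_rate = sum(run_protocol(False, rounds=5, seed=s) for s in range(trials)) / trials
print(honest)      # True
print(cheat_rate)  # close to 2**-5 = 0.03125
```

Soundness error halves with every round, so 20 rounds leave a cheating Prover about a one-in-a-million chance.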

Serialization via commitment

(How to play rock-paper-scissors over the internet)

Cryptographic commitment
- Commit: $ c=\text{commit}(r,m) $
- Verify: $ \text{verify}\Big(r,m,c\Big)? $
Security properties
- Hiding: difficult to find $ m $ given $ c=\text{commit}\left(r,m\right) $
- Binding: difficult to find $ r',m'\neq m $ s.t. $ \text{verify}\Big(r',m',\text{commit}(r,m)\Big) $
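A minimal sketch of the commit/verify syntax above, assuming a hash-based instantiation $ \text{commit}(r,m)=\text{SHA-256}(r\,||\,m) $ with a fixed 32-byte $ r $; its hiding and binding rest on SHA-256 behaving like a random oracle. It is then used to serialize rock-paper-scissors: Alice commits first, Bob plays in the clear, Alice opens.

```python
import hashlib
import hmac
import secrets

def commit(r: bytes, m: bytes) -> bytes:
    # r has a fixed length (32 bytes), so r || m is unambiguous
    return hashlib.sha256(r + m).digest()

def verify(r: bytes, m: bytes, c: bytes) -> bool:
    return hmac.compare_digest(commit(r, m), c)

# Rock-paper-scissors over the internet
r_alice = secrets.token_bytes(32)
move_alice = b"rock"
c = commit(r_alice, move_alice)          # Alice -> Bob (hiding: Bob learns nothing yet)
move_bob = b"paper"                      # Bob -> Alice, in the clear
ok = verify(r_alice, move_alice, c)      # Alice opens with (r, m)
switched = verify(r_alice, b"scissors", c)
print(ok)        # True: honest opening
print(switched)  # False: binding stops Alice from changing her move
```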

Proving $ f(x)=y $

Syntax of (polynomial) functional commitment
- $ c=\text{commit}(r,f) $
- $ (y,\pi)=\text{eval}(r,f,x) $
- $ \text{verify}\Big(c,x,y,\pi\Big) $ iff $ \exists r\text{ s.t. }c=\text{commit}(r,f) $ and $ f(x)=y $
Example (Merkle tree)
- Leaves $ y_0,y_1,\ldots,y_{n-1} $ define $ f:\Big\{0,1,\ldots,n-1\Big\}\rightarrow Y $
- Authentication path encodes (binary expansion of) $ i\in\Big\{0,1,\ldots,n-1\Big\} $ and thus proves $ f(i)=y_i $
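The Merkle-tree example can be sketched with the same `commit` / `eval` / `verify` syntax. Assumptions: SHA-256 with 0x00/0x01 prefixes to separate leaf and inner-node hashes, a power-of-two number of leaves, and no randomness $ r $, so this version is binding but not hiding. `eval_` carries a trailing underscore only to avoid shadowing Python's built-in `eval`.

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def build_levels(leaves):
    """All tree levels, leaves first; assumes len(leaves) is a power of two."""
    level = [h(b"\x00" + y) for y in leaves]   # 0x00/0x01 prefixes separate leaves from inner nodes
    levels = [level]
    while len(level) > 1:
        level = [h(b"\x01" + level[k] + level[k + 1]) for k in range(0, len(level), 2)]
        levels.append(level)
    return levels

def commit(leaves):
    return build_levels(leaves)[-1][0]          # root c commits to f: i -> y_i

def eval_(leaves, i):
    """(y_i, pi): pi is the list of siblings on the path from leaf i to the root."""
    path, idx = [], i
    for level in build_levels(leaves)[:-1]:
        path.append(level[idx ^ 1])
        idx >>= 1
    return leaves[i], path

def verify(c, i, y, pi):
    node = h(b"\x00" + y)
    for sibling in pi:
        # low bit of i: 0 = current node is a left child, 1 = right child
        node = h(b"\x01" + sibling + node) if i & 1 else h(b"\x01" + node + sibling)
        i >>= 1
    return node == c

leaves = [b"y0", b"y1", b"y2", b"y3", b"y4", b"y5", b"y6", b"y7"]
c = commit(leaves)
y, pi = eval_(leaves, 5)
print(verify(c, 5, y, pi))            # True
print(verify(c, 5, b"forged", pi))    # False
print(verify(c, 4, y, pi))            # False: the path also encodes the index
```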

Toy ZKP example

Prover: "I know integers $ \color{red}p\color{black},\color{red}q\color{black} $ s.t. $ n=\color{red}pq\color{black} $"
Prover commits to three polynomial functions: \[ \color{red}r_p\color{black}X+\color{red}p\color{black},\color{red}r_q\color{black}X+\color{red}q\color{black},\text{ and }\color{red}r_pr_q\color{black}X^2+(\color{red}r_pq\color{black}+\color{red}pr_q\color{black})X+n \]
Verifier challenges Prover with random $ r $ and checks whether $ (\color{red}r_p\color{black}r+\color{red}p\color{black})(\color{red}r_q\color{black}r+\color{red}q\color{black})\stackrel{?}{=}\color{red}r_pr_q\color{black}r^2+(\color{red}r_pq\color{black}+\color{red}pr_q\color{black})r+n $
Lemma [Schwartz-Zippel] \[ \text{Pr}_{r\in k}\Big(f(r)=g(r)\Big)\leq\frac{d}{|k|}\text{ for }f\neq g\text{ with }\deg f,\deg g\leq d \]
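The check above can be run as a sketch over a prime field $ k=\mathbb F_P $ with $ P=2^{61}-1 $ (an assumption). Commitments are not implemented: the prover simply fixes its polynomials before the challenge $ r $ is drawn, and the third polynomial is split as $ g(X)+n $ so that its constant term is forced to be the public $ n $. Like the slide, it is a toy: it does not rule out trivial factors such as $ p=1 $ or wrap-around mod $ P $.

```python
import secrets

P = 2**61 - 1  # prime, so arithmetic mod P is the field k in the lemma

def ev(coeffs, x):
    """Evaluate sum(coeffs[k] * x**k) mod P."""
    return sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P

def prover_commit(p, q):
    # Stand-in for commitments: the prover fixes these polynomials before seeing r.
    rp, rq = secrets.randbelow(P), secrets.randbelow(P)
    f1 = [p % P, rp]                               # r_p X + p
    f2 = [q % P, rq]                               # r_q X + q
    g = [0, (rp * q + p * rq) % P, rp * rq % P]    # third polynomial minus its constant term n
    return f1, f2, g

def verifier_check(n, polys):
    f1, f2, g = polys
    r = secrets.randbelow(P)                       # random challenge from k
    return ev(f1, r) * ev(f2, r) % P == (ev(g, r) + n) % P

n = 3233                                            # = 61 * 53
honest = verifier_check(n, prover_commit(61, 53))
cheat = verifier_check(n, prover_commit(59, 53))
print(honest)  # True
print(cheat)   # False: here f1*f2 - (g + n) is the nonzero constant 3127 - 3233
```

For any cheating choice where $ f_1f_2\neq g+n $ as polynomials, the lemma bounds the probability of passing by $ 2/P $.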

Ex: zkLLM

https://arxiv.org/abs/2404.16109
Guarantee authenticity of LLM outputs


$$\text{Attention}(\mathbf Q,\mathbf K,\mathbf V):=\text{Softmax}\left(\frac{\mathbf{QK}^T}{\sqrt d}\right)\mathbf V$$
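For reference, here is a plaintext sketch of the attention formula above, the computation whose output zkLLM aims to authenticate; no proof is generated here. The exponentials and row-wise normalization are non-arithmetic operations, which is presumably one reason attention is a notable cost when proving inference in zero knowledge. Shapes and inputs are arbitrary assumptions.

```python
import numpy as np

def attention(Q, K, V):
    """Attention(Q, K, V) = Softmax(Q K^T / sqrt(d)) V, with Softmax applied row-wise."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # stability; leaves the result unchanged
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out, W = attention(Q, K, V)
print(out.shape)                         # (4, 8)
print(np.allclose(W.sum(axis=1), 1.0))   # True: each softmax row sums to 1
```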

Thank you!

Questions or comments?
