Secure AI inference techniques:
A tutorial

Chen-Mou Cheng

Chang Gung University

September 28, 2024

Outline

  • Secure AI inference
  • TEE: Trusted Execution Environment
  • FHE: Fully Homomorphic Encryption
  • MPC: (secure) Multi-Party Computation
  • ZKP: Zero-Knowledge Proof

Secure AI inference

  • ChatGPT is great!  Love asking it all kinds of questions
  • How can I ask ChatGPT a private question without telling OpenAI the secret?

TEE

  • Hardware isolation for secure execution

    • Ex: Intel SGX, ARM TrustZone

  • Low latency but limited scalability

Ex: LLMaaS (with SGX)

  • https://ieeexplore.ieee.org/document/10601537
  • Model Owner doesn't trust Cloud (but Data Owner does)
    • Break up Transformer computation
    • Run each piece inside Enclave trusted by Model Owner
  • Pros: Straightforward, very little overhead
  • Cons: Scalability to larger models & GPU/ASIC

LLMaaS workflow

FHE

  • Computation on encrypted data

  • Strong privacy but very high overhead

​$$\tilde T\text{ is such that }\forall m,d.\,\text{Dec}_{sk}\left(\tilde T_m\big(\text{Enc}_{pk}(d)\big)\right)=T_m(d)$$

Data Owner

$$\begin{aligned}c_1 & \leftarrow \text{Enc}_{pk}(\color{blue}d\color{black}) \\ r & \leftarrow\text{Dec}_{\color{blue}sk\color{black}}(c_2)\end{aligned}$$

$$\xrightarrow{\hspace*{2em}c_1\hspace*{2em}}$$

$$\xleftarrow{\hspace*{2em}c_2\hspace*{2em}}$$

Model Owner

$$\begin{aligned}c_2\leftarrow\tilde T_{\color{red}m\color{black}}(c_1)\end{aligned}$$

Source: https://ieeexplore.ieee.org/document/9936637

MPC

  • Joint computation while keeping inputs private

    • Ex: Garbled Circuit

  • High computational+communication overhead

Data Owner

$$\color{blue}d\color{black}$$

$$\downarrow$$

$$T_{\color{red}m\color{black}}(\color{blue}d\color{black})$$

Model Owner $$\color{red}m\color{black}$$

$$\xrightarrow{\hspace*{2em}c_1\hspace*{2em}}$$

$$\xleftarrow{\hspace*{2em}c_2\hspace*{2em}}$$

$$\xrightarrow{\hspace*{2em}c_3\hspace*{2em}}$$

$$\xleftarrow{\hspace*{2em}c_4\hspace*{2em}}$$

$$\hspace*{2.5em}\vdots\hspace*{2.5em}$$

Garbled Circuit

$$\xrightarrow{\hspace*{1em}\begin{array}{|c|}\hline \text{Garbled Table} \\\hline\hline \text{Enc}_{\color{blue}X_0^a,X_0^b\color{black}}(\color{blue}X_{f(0,0)}^c\color{black}) \\\hline \text{Enc}_{\color{blue}X_0^a,X_1^b\color{black}}(\color{blue}X_{f(0,1)}^c\color{black}) \\\hline \text{Enc}_{\color{blue}X_1^a,X_0^b\color{black}}(\color{blue}X_{f(1,0)}^c\color{black}) \\\hline \text{Enc}_{\color{blue}X_1^a,X_1^b\color{black}}(\color{blue}X_{f(1,1)}^c\color{black}) \\\hline\end{array}\hspace*{1em}}$$

$$\xrightarrow{\hspace*{4.5em}X_1^a\hspace*{4.5em}}$$

$$\xrightarrow{\hspace*{3em}\text{OT}\left(\color{blue}X_0^b\color{black},\color{blue}X_1^b\color{black}\right)\hspace*{3em}}$$

$$\xleftarrow{\hspace*{4em}X_{f(1,0)}^c\hspace*{4em}}$$

$$\xrightarrow{\hspace*{2.5em}f(1,0)\text{ (optional)}\hspace*{2.5em}}$$

Data Owner

$$\color{blue}a=1\color{black}$$

Model Owner $$\color{red}b=0\color{black}$$

$$\text{Dec}_{X_1^a,\color{red}X_0^b\color{black}}(?)$$

Ex: MARILL

  • https://arxiv.org/abs/2408.03561
  • Hack fine-tuning to minimize MPC in inference
    • Layer freezing
    • LoRA to reduce matrix dimensions in MPC
    • Head merging instead of pruning in MPC

$$\text{head}_j=\text{softmax}\left(\frac{\sum_{\ell=jm}^{(j+1)m}Q_\ell K_\ell^T}{\sqrt{md}}\right)\left(V_{jm}||\ldots||V_{(j+1)m}\right)\in\mathbb R^{b\times md}$$

  • Vs. standard fine-tuned model in inference
    • \(\text{3.6--11.3}\times\) better runtime
    • \(\text{2.4--6.9}\times\) better communication

State of the art in 2023

MARILL workflow

MARILL techniques

Inference performance

ZKP

  • Learns nothing beyond "Statement S is true"
    • Consider special S: \( y=f(x,\color{red}w\color{black}) \)
    • Can simulate the dialogues without knowing \(\color{red}w\color{black}\)
  • Ex: Colorblind test with two balls of different colors

Serialization via commitment

(How to play rock-paper-scissors over internet)

  • Cryptographic commitment
    • Commit: \( c=\text{commit}(r,m) \)
    • Verify: \( \text{verify}\Big(r,m,c\Big)? \)
  • Security properties
    • Hiding: difficult to find \( m \) given \( c=\text{commit}\left(r,m\right) \)
    • Binding: difficult to find \( r',m'\neq m \) s.t. \( \text{verify}\Big(r',m',\text{commit}(r,m)\Big) \)

Proving \( f(x)=y \)

  • Syntax of (polynomial) functional commitment
    • \( c=\text{commit}(r,f) \)
    • \( (y,\pi)=\text{eval}(r,f,x) \)
    • \( \text{verify}\Big(c,x,y,\pi\Big) \) iff \( \exists r\text{ s.t. }c=\text{commit}(r,f) \) and \( f(x)=y \)
  • Example (Merkle tree)
    • Leaves \( y_0,y_1,\ldots,y_{n-1} \) define \( f:\Big\{0,1,\ldots,n-1\Big\}\rightarrow Y \)
    • Authentication path encodes (binary expansion of) \( i\in\Big\{0,1,\ldots,n-1\Big\} \) and thus proves \( f(i)=y_i \)

Toy ZKP example

  • Prover: "I know integers \( \color{red}p\color{black},\color{red}q\color{black} \) s.t. \( n=\color{red}pq\color{black} \)"
  • Prover commits to three polynomial functions: \[ \color{red}r_p\color{black}X+\color{red}p\color{black},\color{red}r_q\color{black}X+\color{red}q\color{black},\text{ and }\color{red}r_pr_q\color{black}X^2+(\color{red}r_pq\color{black}+\color{red}pr_q\color{black})X+n \]
  • Verifier challenges Prover with random \( r \) and checks whether \( (\color{red}r_p\color{black}r+\color{red}p\color{black})(\color{red}r_q\color{black}r+\color{red}q\color{black})\stackrel{?}{=}\color{red}r_pr_q\color{black}r^2+(\color{red}r_pq\color{black}+\color{red}pr_q\color{black})r+n \)
  • Lemma [Schwartz-Zippel] \[ \text{Pr}_{r\in k}\Big(f(r)=g(r)\Big)\leq\frac{d}{|k|}\text{ for }f\neq g\text{ with }\deg f,\deg g\leq d \]

Ex: zkLLM

Text

Text

$$\text{Attention}(\mathbf Q,\mathbf K,\mathbf V):=\text{Softmax}\left(\frac{\mathbf{QK}^T}{\sqrt d}\right)\mathbf V$$

Thank you!

Questions or comments?

Secure inference

By Chen-Mou Cheng

Secure inference

  • 121