Sumcheck in GPU

Sumcheck Primer

  • Given a polynomial \(g: \mathbb{F}^\mu \rightarrow \mathbb{F}\), compute the sum over the boolean hypercube \(\mathbb{B}_\mu = \{0,1\}^\mu\)
\begin{aligned} H = \sum_{(x_1, \dots, x_\mu) \in \mathbb{B}_\mu} g(x_1, x_2, \dots, x_\mu) \end{aligned}
  • Naively, a verifier would require \(|\mathbb{B}_\mu|=2^\mu\) evaluations of \(g(\cdot)\)
  • The sumcheck protocol requires only \(\mathcal{O}(\mu + \lambda)\) verifier work
  • Here \(\lambda\) is the cost of evaluating \(g(\cdot)\) at some \(r \in \mathbb{F}^{\mu}\)
  • The prover's work is \(\mathcal{O}(2^\mu)\), i.e. linear in the number of constraints

Round 1: the prover sends
\(g_1(X_1) := \sum_{x_2, \dots, x_\mu \in \{0,1\}}g(X_1, x_2, \dots, x_\mu)\)
and the verifier checks \(H \stackrel{?}{=} g_1(0) + g_1(1)\), then replies with a random challenge \(r_1\).

Round 2: the prover sends
\(g_2(X_2) := \sum_{x_3, \dots, x_\mu \in \{0,1\}}g(r_1, X_2, x_3, \dots, x_\mu)\)
and the verifier checks \(g_1(r_1) \stackrel{?}{=} g_2(0) + g_2(1)\), then sends \(r_2\).

Round 3: the prover sends
\(g_3(X_3) := \sum_{x_4, \dots, x_\mu \in \{0,1\}}g(r_1, r_2, X_3, x_4, \dots, x_\mu)\)
and the verifier checks \(g_2(r_2) \stackrel{?}{=} g_3(0) + g_3(1)\), and so on. In the final round the prover sends
\(g_\mu(X_\mu) := g(r_1, r_2, \dots, r_{\mu-1}, X_\mu)\)
and the verifier checks \(g_{\mu-1}(r_{\mu-1}) \stackrel{?}{=} g_\mu(0) + g_\mu(1)\) and, with a single evaluation of \(g\),
\(g_{\mu}(r_{\mu}) \stackrel{?}{=} g(r_1, r_2, \dots, r_\mu)\).

(Figure: interaction between the prover \(\mathcal{P}\) and the verifier \(\mathcal{V}\): \(\mathcal{P}\) sends \(g_1, g_2, g_3, \dots, g_\mu\), and \(\mathcal{V}\) responds with challenges \(r_1, r_2, \dots, r_{\mu-1}\).)
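The protocol above can be sketched in a few lines of Python. This is a minimal illustration, not an implementation from the deck: the prime field, the example polynomial \(g\), and the use of random verifier challenges in place of hash-derived ones are all assumptions made for the sketch; since \(g\) here is multilinear, each round polynomial is linear and can be sent as the pair \((g_i(0), g_i(1))\).

```python
# Toy interactive sumcheck over a small prime field (illustrative sketch).
import itertools, random

P  = 2**31 - 1                                   # toy prime field F_p
mu = 3

def g(x1, x2, x3):                               # example multilinear g: F^3 -> F
    return (x1 * x2 + x3) % P

def round_poly(prefix, n_free):
    """g_i(X): sum of g(prefix, X, suffix) over boolean suffixes,
    returned as its evaluations at X = 0 and X = 1 (g_i is linear)."""
    evals = []
    for X in (0, 1):
        s = 0
        for suffix in itertools.product((0, 1), repeat=n_free):
            s = (s + g(*prefix, X, *suffix)) % P
        evals.append(s)
    return evals

# claimed sum H over the whole hypercube
H = sum(g(*x) for x in itertools.product((0, 1), repeat=mu)) % P

claim, prefix = H, []
for i in range(mu):
    g0, g1 = round_poly(prefix, mu - i - 1)      # prover's i-th round message
    assert (g0 + g1) % P == claim                # verifier: g_i(0)+g_i(1) =? claim
    r = random.randrange(P)                      # verifier's challenge r_i
    claim = ((1 - r) * g0 + r * g1) % P          # claim becomes g_i(r_i)
    prefix.append(r)
assert claim == g(*prefix)                       # final check: g(r_1, ..., r_mu)
print("sumcheck verified")
```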

Sumcheck for MLE Polynomials

  • Given a vector \(\vec{f}\in \mathbb{F}^{2^\mu}\), we can write its (unique) multi-linear extension \(f(\mathbf{X})\) as the polynomial interpolating
\(f(0, 0, \dots, 0, 0) \leftarrow \vec{f}[0]\)
\(f(0, 0, \dots, 0, 1) \leftarrow \vec{f}[1]\)
\(f(0, 0, \dots, 1, 0) \leftarrow \vec{f}[2]\)
\(\vdots\)
\(f(1, 1, \dots, 1, 0) \leftarrow \vec{f}[2^\mu-2]\)
\(f(1, 1, \dots, 1, 1) \leftarrow \vec{f}[2^\mu-1]\)
that is,
\begin{aligned} f(\mathbf{X}) := & \ (1-X_\mu)(1-X_{\mu-1})\dots(1-X_2)(1-X_1)\,\vec{f}[0] \ + \\ & \ (1-X_\mu)(1-X_{\mu-1})\dots(1-X_2)\,X_1\,\vec{f}[1] \ + \\ & \ (1-X_\mu)(1-X_{\mu-1})\dots X_2\,(1-X_1)\,\vec{f}[2] \ + \\ & \ \vdots \\ & \ X_\mu X_{\mu-1} \dots X_2\,(1-X_1)\,\vec{f}[2^\mu-2] \ + \\ & \ X_\mu X_{\mu-1} \dots X_2\,X_1\,\vec{f}[2^\mu-1]. \end{aligned}
  • MLEs make the sumcheck prover's computation faster and parallelisable
  • Computing round polynomials is easy:
\begin{aligned} r_1(X_\mu) = (1-X_\mu) \Big( \vec{f}[0] + \vec{f}[1] + \dots + \vec{f}[2^{\mu-1}-1] \Big) + X_\mu \Big( \vec{f}[2^{\mu-1}] + \dots + \vec{f}[2^{\mu}-1] \Big) \end{aligned}

Sums of halves \(\rightarrow\) easy!
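A minimal sketch of this round-polynomial computation, assuming a toy prime field and a tiny hard-coded table (both illustrative, not taken from the deck):

```python
# First round polynomial of an MLE table: two half-sums determine r_1(X_mu).
P = 2**31 - 1
f = [3, 1, 4, 1, 5, 9, 2, 6]                   # evaluations of f over {0,1}^3
half = len(f) // 2

lo = sum(f[:half]) % P                         # f[0] + ... + f[2^(mu-1) - 1]
hi = sum(f[half:]) % P                         # f[2^(mu-1)] + ... + f[2^mu - 1]

def r1(X):                                     # r_1(X_mu) = (1-X)*lo + X*hi
    return ((1 - X) * lo + X * hi) % P

assert (r1(0) + r1(1)) % P == sum(f) % P       # consistency with the claimed sum
```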

  • Updating the original polynomial with the challenge \(\alpha_1\) is tricky
\(f(\alpha_1, 0, \dots, 0, 0) \leftarrow (1-\alpha_1)\vec{f}[0] + \alpha_1\vec{f}[2^{\mu-1}]\)
\(f(\alpha_1, 0, \dots, 0, 1) \leftarrow (1-\alpha_1)\vec{f}[1] + \alpha_1\vec{f}[2^{\mu-1}+1]\)
\(f(\alpha_1, 0, \dots, 1, 0) \leftarrow (1-\alpha_1)\vec{f}[2] + \alpha_1\vec{f}[2^{\mu-1}+2]\)
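A matching sketch of this update step under the same toy assumptions: after the verifier sends \(\alpha_1\), every pair of entries that differ only in \(X_\mu\) is merged into one entry of the new, half-sized table.

```python
# Folding an MLE table with the round-1 challenge (illustrative sketch).
import random

P     = 2**31 - 1
f     = [3, 1, 4, 1, 5, 9, 2, 6]                # evaluations over {0,1}^3
alpha = random.randrange(P)                     # round-1 challenge
half  = len(f) // 2

# f'(x) = f(alpha_1, x) = (1 - alpha_1) * f[x] + alpha_1 * f[x + 2^(mu-1)]
f_next = [((1 - alpha) * f[j] + alpha * f[half + j]) % P for j in range(half)]
print(f_next)                                   # table for the next round, size 2^(mu-1)
```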

Sumcheck for MLE Polynomials

  • Round computation can be parallelised
  • Round \(i+1\) depends on the round \(i\) challenge
  • Need to process all terms to compute the challenge

(Figure: the round pipeline on \(f(\mathbf{X})\): in round \(i\), the current table is split into halves \(f_{i,e}\) and \(f_{i,o}\), each half is summed, the round polynomial is hashed to obtain the challenge \(\alpha_i\), and only then can round \(i+1\) begin.)

Sumcheck for R1CS

\begin{bmatrix} \\ & \bar{A} & \\ \\ \end{bmatrix} \hspace{-6pt} \begin{bmatrix} \\ \vec{z}\\ \\ \end{bmatrix} \circ \begin{bmatrix} \\ & \bar{B} & \\ \\ \end{bmatrix} \hspace{-6pt} \begin{bmatrix} \\ \vec{z}\\ \\ \end{bmatrix} = \begin{bmatrix} \\ & \bar{C} & \\ \\ \end{bmatrix} \hspace{-6pt} \begin{bmatrix} \\ \vec{z}\\ \\ \end{bmatrix}

where each matrix is of size \(2^\mu \times 2^\mu\). Taking MLEs \(A, B, C: \mathbb{F}^{2\mu} \rightarrow \mathbb{F}\) of the matrices and \(z: \mathbb{F}^{\mu} \rightarrow \mathbb{F}\) of the witness, define

F(x) := \Bigg(\sum_{y \in \mathbb{B}_\mu}A(x,y)z(y)\Bigg)\Bigg(\sum_{y \in \mathbb{B}_\mu}B(x,y)z(y)\Bigg) - \sum_{y \in \mathbb{B}_\mu}C(x,y)z(y)

We need to prove that \(F\) is 0 on all points of \(\mathbb{B}_\mu\). A random linear combination reduces this to a single sum: if, for random weights \(\gamma_x\),

\sum_{x\in \mathbb{B}_\mu}\gamma_x F(x) = 0, \quad \text{then w.h.p.} \quad F(x) = 0 \quad \forall x \in \mathbb{B}_\mu

Instantiating \(\gamma_x = \textsf{eq}(x, \tau)\) for a random \(\tau\), the prover runs sumcheck on

G(x) := \left(\Bigg(\sum_{y \in \mathbb{B}_\mu}A(x,y)z(y)\Bigg)\Bigg(\sum_{y \in \mathbb{B}_\mu}B(x,y)z(y)\Bigg) - \sum_{y \in \mathbb{B}_\mu}C(x,y)z(y)\right) \textsf{eq}(x, \tau)

Writing \(a(x), b(x), c(x)\) for the MLEs of the three matrix-vector products, this is

G(x) := a(x)b(x)\textsf{eq}(x) - c(x)\textsf{eq}(x), \qquad \textsf{combine}_G(a,b,c,d) := abd - cd

  • We need sumcheck on sums of products of MLE polynomials
  • Extending sumcheck from one polynomial to sums of products is trivial (a sketch follows the figure below)
(Figure: one round of sumcheck for a sum of products. Each input MLE table \(f_1(\mathbf{X}), \dots, f_5(\mathbf{X})\) is split into even/odd halves \(f_{j,e}, f_{j,o}\); for every pair of entries the prover evaluates the line through them at the points \(X = 0, 1, 2, 3, 4, 5\) via the combinations \(L(1,0), L(0,1), L(-1,2), L(-2,3), L(-3,4), L(-4,5)\) of \(f_{j,e}\) and \(f_{j,o}\), applies \(\textsf{combine}\) at each point, and sums the results to obtain the round polynomial \(r_i(X)\), which is hashed to produce the challenge \(\alpha_i\). Each table is then folded with \(L_{(1-\alpha, \alpha)}\), i.e. \(f'_j = (1-\alpha_i) f_{j,e} + \alpha_i f_{j,o}\), giving the inputs \(f'_1(\mathbf{X}), \dots, f'_5(\mathbf{X})\) for round \(i+1\), which produces \(r_{i+1}(X)\) and \(\alpha_{i+1}\).)
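As a rough illustration of the figure above, here is a sketch of one prover round for a sum of products using \(\textsf{combine}_G(a,b,c,d) = abd - cd\). Since \(\textsf{combine}_G\) has degree 3 per round variable, four evaluation points suffice here (the figure's six points would accommodate a round polynomial of degree up to 5); the field, the table values, and the choice to bind the low-order variable via the even/odd split are all illustrative assumptions.

```python
# One sumcheck round for a sum of products of MLE tables (illustrative sketch).
import random

P  = 2**31 - 1
TS = [0, 1, 2, 3]                                  # evaluation points for r_i(X)

def combine_G(a, b, c, d):
    return (a * b * d - c * d) % P

def line(f_e, f_o, t):
    """L(1-t, t): the line through (0, f_e) and (1, f_o), evaluated at t."""
    return ((1 - t) * f_e + t * f_o) % P

def round_and_fold(tables, alpha):
    """Round polynomial r_i evaluated at TS, plus the tables folded with alpha."""
    n = len(tables[0])
    r_evals = [0] * len(TS)
    for j in range(0, n, 2):                       # pair (even, odd) entries
        for k, t in enumerate(TS):
            vals = [line(f[j], f[j + 1], t) for f in tables]
            r_evals[k] = (r_evals[k] + combine_G(*vals)) % P
    folded = [[line(f[j], f[j + 1], alpha) for j in range(0, n, 2)] for f in tables]
    return r_evals, folded

# four MLE tables a, b, c, eq over {0,1}^3
tables = [[random.randrange(P) for _ in range(8)] for _ in range(4)]
claim = sum(combine_G(a, b, c, d) for a, b, c, d in zip(*tables)) % P
r, tables = round_and_fold(tables, alpha=random.randrange(P))
assert (r[0] + r[1]) % P == claim                  # verifier's round check
```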

(Figure: the L2 cache, whose capacity is marked as \(\ell\).)

Sumcheck in GPU

  • Main hurdle: the L2 cache is only 96 MB
  • It can fit an R1CS sumcheck instance of size \(\le 2^{19}\)
  • For larger instances, we need to read from and write to memory to update the state
  • Alternatively, we can break a large sumcheck instance into several smaller instances:

\begin{aligned} \textsf{sumcheck}_{2^{21}}(f) = &\ \textsf{sumcheck}_{2^{19}}(f^{(1)}) + \beta \cdot \textsf{sumcheck}_{2^{19}}(f^{(2)}) \ + \\ &\ \beta^2 \cdot \textsf{sumcheck}_{2^{19}}(f^{(3)}) + \beta^3 \cdot \textsf{sumcheck}_{2^{19}}(f^{(4)}) \end{aligned}

where \(f^{(1)}, \dots, f^{(4)}\) denote the four chunks of the large instance.

*This isn't quite what we need: we need to show that the independent sumchecks sum to a given claimed value.
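A small sketch of this decomposition (the sizes, the random \(\beta\), and all names are illustrative assumptions): the claimed sum of the big instance equals the plain sum of the chunk claims, while the \(\beta\)-combination above batches the chunk sumchecks into one claim.

```python
# Splitting one large sum into chunk sums and batching them (illustrative sketch).
import random

P = 2**31 - 1
mu, k = 8, 2                                      # 2^8 entries, 2^2 = 4 chunks
f = [random.randrange(P) for _ in range(2**mu)]

chunk_size = 2**(mu - k)
chunks = [f[i * chunk_size:(i + 1) * chunk_size] for i in range(2**k)]

c  = sum(f) % P                                   # claim for the big instance
cs = [sum(ch) % P for ch in chunks]               # claims of the small instances
assert sum(cs) % P == c                           # what we actually need to show

beta = random.randrange(P)                        # batching challenge
batched_claim = sum(b * pow(beta, j, P) for j, b in enumerate(cs)) % P
# a single sumcheck on sum_j beta^j * f^(j) would prove `batched_claim`,
# but proving that the individual c_j add up to c needs separate handling
```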

(Figure: the sub-instances are streamed through the L2 cache one at a time; for each chunk \(j\) the prover computes its first-round polynomial \(r_1^{(j)}(X)\) and receives its own challenge \(\alpha_1^{(j)}\). This is not the most optimal use of the L2 cache, and the verifier generates \((k + n)\) challenges.)

(Figure: an alternative schedule relocates data within the L2 cache between sub-instances, so that chunks of length \(\ell\) keep the cache full while the round polynomials \(r_1^{(1)}(X), \dots, r_1^{(5)}(X)\) and challenges \(\alpha_1^{(1)}, \dots, \alpha_1^{(5)}\) are produced. This gives optimal usage of the L2 cache, with the verifier generating \((2k + n)\) challenges.)

Path Forward

  • Idea 1: Independent sumchecks
    • Still need to show \(\sum_j c_j = c\), which is additional overhead
    • Better ideas to show that?
  • Idea 2: Multi-variate round polynomials (see the sketch after this list)
    • First round polynomial:
      • \(r_1(X) = \sum_{x_j \in \{0,1\}, j \neq 1}f(X, x_2, x_3, \dots, x_\mu)\)
    • What if round polynomials were bi-variate polynomials:
      • \(r_1(X, Y) = \sum_{x_j \in \{0,1\}, j \neq 1, 2}f(X, Y, x_3, x_4, \dots, x_\mu)\)
  • Idea 3: Bulletproofs-like folding
    • Convert a sumcheck problem to a Bulletproofs problem (a lot of MSMs)
    • Run \(k\) Bulletproofs rounds until the sumcheck size is \(\le 2^\ell\)
  • Idea 4: Ternary hypercube
    • Would help reduce the number of sumcheck rounds, but is that useful to us?
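A minimal sketch of Idea 2 for a multilinear \(f\), under toy assumptions (small hard-coded table, toy prime field): the bi-variate first-round polynomial is determined by the four quarter-sums of the evaluation table, and one challenge pair then removes two variables at once.

```python
# Bi-variate first round polynomial r_1(X, Y) for a multilinear f (sketch).
P = 2**31 - 1
f = list(range(16))                      # evaluations of f over {0,1}^4
q = len(f) // 4                          # (X, Y) taken as the top two index bits

S00, S01, S10, S11 = (sum(f[i * q:(i + 1) * q]) % P for i in range(4))

def r1(X, Y):                            # r_1(X, Y) = sum_{x_3, x_4} f(X, Y, x_3, x_4)
    return ((1 - X) * (1 - Y) * S00 + (1 - X) * Y * S01
            + X * (1 - Y) * S10 + X * Y * S11) % P

# consistency with the claimed sum: summing r_1 over {0,1}^2 recovers sum(f)
assert sum(r1(a, b) for a in (0, 1) for b in (0, 1)) % P == sum(f) % P
```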

Sumcheck in GPU

By Suyash Bagad