Sumcheck in GPU
Sumcheck Primer
- Given a polynomial \(g: \mathbb{F}^\mu \rightarrow \mathbb{F}\) in variables \(X = \{x_i\}_{i \in [\mu]}\), compute the sum \(v := \sum_{x \in B_\mu} g(x)\) over the Boolean hypercube \(B_\mu = \{0,1\}^\mu\)
- Naively, a verifier would require \(|B_\mu|=2^\mu\) evaluations of \(g(.)\)
- Sumcheck protocol requires \(\mathcal{O}(\mu + \lambda)\) verifier work
- Here \(\lambda\) is the cost to evaluate \(g(.)\) at some \(r \in \mathbb{F}^{\mu}\)
- Prover's work is \(\mathcal{O}(2^\mu)\), i.e. linear in the number of constraints
Prover \(\mathcal{P}\) sends a round polynomial; verifier \(\mathcal{V}\) checks it and replies with a random challenge:
- Round 1: \(\mathcal{P}\) sends \(g_1(\textcolor{orange}{X_1}) := \sum_{x_2, \dots, x_\mu}g(\textcolor{orange}{X_1},x_2, \dots, x_\mu)\); \(\mathcal{V}\) checks \(v \stackrel{?}{=} g_1(0) + g_1(1)\) and replies with \(r_1\)
- Round 2: \(\mathcal{P}\) sends \(g_2(\textcolor{orange}{X_2}) := \sum_{x_3, \dots, x_\mu}g(\textcolor{green}{r_1}, \textcolor{orange}{X_2}, x_3, \dots, x_\mu)\); \(\mathcal{V}\) checks \(g_1(\textcolor{green}{r_1}) \stackrel{?}{=} g_2(0) + g_2(1)\) and replies with \(r_2\)
- Round 3: \(\mathcal{P}\) sends \(g_3(\textcolor{orange}{X_3}) := \sum_{x_4, \dots, x_\mu}g(\textcolor{green}{r_1}, \textcolor{green}{r_2}, \textcolor{orange}{X_3}, x_4, \dots, x_\mu)\); \(\mathcal{V}\) checks \(g_2(\textcolor{green}{r_2}) \stackrel{?}{=} g_3(0) + g_3(1)\) and replies with \(r_3\)
- \(\vdots\)
- Round \(\mu\): \(\mathcal{P}\) sends \(g_\mu(\textcolor{orange}{X_\mu}) := g(\textcolor{green}{r_1}, \textcolor{green}{r_2}, \dots, \textcolor{green}{r_{\mu-1}}, \textcolor{orange}{X_\mu})\); \(\mathcal{V}\) checks \(g_{\mu-1}(\textcolor{green}{r_{\mu-1}}) \stackrel{?}{=} g_\mu(0) + g_\mu(1)\)
- Final check: \(\mathcal{V}\) samples \(r_\mu\) and checks \(g_{\mu}(\textcolor{green}{r_{\mu}}) \stackrel{?}{=} g(\textcolor{green}{r_1}, \textcolor{green}{r_2}, \dots, \textcolor{green}{r_\mu})\)
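The protocol flow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an optimised prover: the field modulus `P`, the example polynomial `g`, and the degree bound `DEG` are all hypothetical choices made for the sketch, and prover and verifier run in one function for brevity. Each round polynomial is sent as its evaluations at \(0, 1, \dots, \texttt{DEG}\), and the verifier evaluates it at the challenge by Lagrange interpolation.

```python
import itertools
import random

P = 2**61 - 1   # illustrative prime modulus (assumption, not from the slides)
MU = 3          # number of variables of the example polynomial
DEG = 2         # per-variable degree bound of the example polynomial


def g(x):
    """Hypothetical example polynomial g(x1, x2, x3) over F_P."""
    x1, x2, x3 = x
    return (2 * x1 * x1 * x2 + x2 * x3 + 5 * x3 + 1) % P


def lagrange_eval(evals, r):
    """Evaluate at r the polynomial of degree < len(evals) with value evals[k] at k."""
    total = 0
    n = len(evals)
    for k in range(n):
        num, den = 1, 1
        for j in range(n):
            if j != k:
                num = num * (r - j) % P
                den = den * (k - j) % P
        total = (total + evals[k] * num * pow(den, -1, P)) % P
    return total


def round_poly(prefix):
    """Prover: evaluations at 0..DEG of the round polynomial, earlier variables fixed to prefix."""
    n_free = MU - len(prefix) - 1
    evals = []
    for t in range(DEG + 1):
        s = 0
        for tail in itertools.product((0, 1), repeat=n_free):
            s += g(tuple(prefix) + (t,) + tail)
        evals.append(s % P)
    return evals


def sumcheck(claim, rng):
    """Run all MU rounds; return True iff every verifier check passes."""
    challenges = []
    expected = claim % P
    for _ in range(MU):
        evals = round_poly(challenges)
        if (evals[0] + evals[1]) % P != expected:   # g_i(0) + g_i(1) check
            return False
        r = rng.randrange(P)                        # verifier's challenge
        expected = lagrange_eval(evals, r)          # becomes g_i(r_i)
        challenges.append(r)
    return expected == g(tuple(challenges))         # final oracle check
```

Calling `sumcheck(v, random.Random(0))` with the true sum `v` of `g` over \(\{0,1\}^3\) accepts, while any other claim is rejected in the first round.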
Sumcheck for MLE Polynomials
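- Given a vector \(\vec{f}\in \mathbb{F}^{2^\mu}\), we can write its (unique) multi-linear extension \(f(\mathbf{X})\) as: \(f(\mathbf{X}) = \sum_{b \in \{0,1\}^\mu} f_b \prod_{i=1}^{\mu} \big(b_i X_i + (1 - b_i)(1 - X_i)\big)\), where \(f_b\) is the entry of \(\vec{f}\) indexed by \(b\)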
- MLEs make the sumcheck prover's computation faster and parallelisable
- Computing round polynomials is easy
Sums of halves \(\rightarrow\) easy!
- Updating original polynomial with challenge \(\alpha_1\) is tricky
Round computation can be parallelised
Round \(i+1\) depends on round \(i\) challenge
Need to process all terms to compute challenge
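A minimal sketch of the MLE case, assuming the evaluation table is laid out with \(X_1\) as the most significant index bit (so the two halves of the table are the \(X_1 = 0\) and \(X_1 = 1\) slices). The modulus `P` and all function names are illustrative assumptions. Each round polynomial is linear, so it is determined by the two half sums; updating the table with the challenge is an element-wise fold of the two halves, which is the step that has to touch every entry before the next round can start.

```python
import random

P = 2**61 - 1   # illustrative prime modulus (assumption)


def run_mle_sumcheck(table, claim, rng):
    """Interleaved prover/verifier for sum_x f(x), f given by its evaluation table.

    Returns (accepted, challenge_point, final_folded_value).
    """
    table = [t % P for t in table]
    expected = claim % P
    point = []
    while len(table) > 1:
        half = len(table) // 2
        lo, hi = table[:half], table[half:]
        g0, g1 = sum(lo) % P, sum(hi) % P          # "sums of halves"
        if (g0 + g1) % P != expected:
            return False, point, None
        alpha = rng.randrange(P)                   # needs the whole round's sums first
        expected = (g0 + alpha * (g1 - g0)) % P    # linear round polynomial at alpha
        # Fold: f'(x) = (1 - alpha) * f(0, x) + alpha * f(1, x), independent per entry.
        table = [(a + alpha * (b - a)) % P for a, b in zip(lo, hi)]
        point.append(alpha)
    return expected == table[0], point, table[0]


def mle_eval_direct(table, point):
    """Reference evaluation of the MLE at point via the eq-weights formula."""
    mu = len(point)
    total = 0
    for idx, val in enumerate(table):
        w = 1
        for i, r in enumerate(point):
            bit = (idx >> (mu - 1 - i)) & 1        # X_1 is the most significant bit
            w = w * (r if bit else (1 - r)) % P
        total = (total + val * w) % P
    return total
```

The per-entry fold and the two half sums are embarrassingly parallel within a round; the only serial dependency is that the challenge for round \(i+1\) cannot be drawn until round \(i\)'s sums are complete.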
Sumcheck for R1CS
We need to prove that \(F\) is 0 on all points of \(\mathbb{B}_\mu\)
- We need sumcheck on sums of products of MLE polynomials
- Trivial to extend sumcheck for one polynomial to sums of products
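As a hedged illustration of the extension to products, the sketch below runs sumcheck on \(\sum_x a(x)\,b(x)\) for two MLEs given as tables. The modulus `P`, the table layout (first variable is the most significant index bit), and the function names are assumptions of this sketch. The only change from the single-MLE case is that each round polynomial now has degree 2, so the prover sends three evaluations (at 0, 1, 2) instead of two. A sum of several such products would simply add the per-term round evaluations.

```python
import random

P = 2**61 - 1            # illustrative prime modulus (assumption)
INV2 = pow(2, -1, P)     # inverse of 2 in F_P


def fold(table, alpha):
    """Fix the first variable of an MLE table to alpha."""
    half = len(table) // 2
    return [(lo + alpha * (hi - lo)) % P
            for lo, hi in zip(table[:half], table[half:])]


def round_evals(a, b):
    """Evaluations at t = 0, 1, 2 of g(t) = sum_x a(t, x) * b(t, x)."""
    half = len(a) // 2
    evals = []
    for t in (0, 1, 2):
        s = 0
        for i in range(half):
            at = a[i] + t * (a[half + i] - a[i])   # a is linear in the first variable
            bt = b[i] + t * (b[half + i] - b[i])
            s += at * bt
        evals.append(s % P)
    return evals


def interp_deg2(evals, r):
    """Evaluate at r the degree-2 polynomial with values evals at 0, 1, 2."""
    e0, e1, e2 = evals
    return (e0 * (r - 1) * (r - 2) * INV2
            - e1 * r * (r - 2)
            + e2 * r * (r - 1) * INV2) % P


def product_sumcheck(a, b, claim, rng):
    """Interleaved prover/verifier for the claim sum_x a(x) * b(x) == claim."""
    a = [x % P for x in a]
    b = [x % P for x in b]
    expected = claim % P
    while len(a) > 1:
        evals = round_evals(a, b)
        if (evals[0] + evals[1]) % P != expected:
            return False
        r = rng.randrange(P)
        expected = interp_deg2(evals, r)
        a, b = fold(a, r), fold(b, r)
    return expected == a[0] * b[0] % P
```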
Sumcheck in GPU
- Main hurdle: L2 cache size is just 96MB
- Can fit an R1CS sumcheck instance of size \(\le 2^{19}\)
- For larger instances, we need to read and write to memory to update the state
- Alternatively, we can break a large sumcheck instance into several smaller instances
*This isn't quite what we need: we also need to show that the sums claimed by the independent sumchecks add up to the claimed total.
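The splitting idea and its caveat can be sketched as follows. The reading here is an assumption: an instance over \(k + n\) variables is cut into \(2^k\) contiguous chunks of \(2^n\) entries, with the first \(k\) variables as the most significant index bits. Chunk \(j\) is then the original MLE with its first \(k\) variables fixed to the bits of \(j\), and its sum is \(c_j\). Running the chunks independently proves each \(c_j\), but the relation \(\sum_j c_j = c\) still has to be checked separately. The modulus `P` and the helper names are illustrative.

```python
P = 2**61 - 1   # illustrative prime modulus (assumption)


def fold(table, alpha):
    """Fix the first variable of an MLE table to alpha."""
    half = len(table) // 2
    return [(lo + alpha * (hi - lo)) % P
            for lo, hi in zip(table[:half], table[half:])]


def mle_eval(table, point):
    """Evaluate the MLE of table at point by folding one variable at a time."""
    for r in point:
        table = fold(table, r)
    return table[0] % P


def split_instances(table, k):
    """Cut a table of size 2^(k+n) into 2^k contiguous chunks of size 2^n."""
    size = len(table) >> k
    return [table[j * size:(j + 1) * size] for j in range(1 << k)]


def chunk_sums(table, k):
    """Claimed sums c_j of the 2^k independent sub-instances."""
    return [sum(chunk) % P for chunk in split_instances(table, k)]
```

For example, with a table of 32 entries and `k = 2`, `chunk_sums` returns four values whose sum modulo `P` equals the sum of the whole table; that final equality is the extra statement the verifier must be convinced of.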
Not an optimal use of the L2 cache
Verifier generates \((k + n)\) challenges
Relocation of data within L2 cache
Optimal usage of L2 cache
Verifier generates \((2k + n)\) challenges
Path Forward
- Idea 1: Independent sumchecks
  - Still need to show \(\sum_j c_j = c\), which is additional overhead
  - Better ideas to show that?
- Idea 2: Multi-variate round polynomials
  - First round polynomial:
    - \(r_1(X) = \sum_{x_j \in \{0,1\}, j \neq 1}f(X, x_2, x_3, \dots, x_\mu)\)
  - What if round polynomials were bi-variate polynomials:
    - \(r_1(X, Y) = \sum_{x_j \in \{0,1\}, j \neq 1, 2}f(X, Y, x_3, x_4, \dots, x_\mu)\)
- Idea 3: Bulletproofs-like folding
  - Convert a sumcheck problem to a bulletproofs problem (lots of MSMs)
  - Run \(k\) bulletproofs rounds until sumcheck size is \(\le 2^l\)
- Idea 4: Ternary hypercube
  - Will help reduce rounds in sumcheck but is that useful to us?
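To make Idea 2 concrete, here is a small sketch for a single multilinear \(f\) given as a table, assuming \(X\) and \(Y\) are the two most significant index bits. The modulus `P` and the function names are illustrative assumptions. For multilinear \(f\), the bi-variate round polynomial \(r_1(X, Y)\) is bilinear, so it is determined by the four quarter sums, and the table can be folded with two challenges in a single pass rather than two.

```python
P = 2**61 - 1   # illustrative prime modulus (assumption)


def fold(table, alpha):
    """Standard one-variable fold, kept for comparison."""
    half = len(table) // 2
    return [(lo + alpha * (hi - lo)) % P
            for lo, hi in zip(table[:half], table[half:])]


def quarters(table):
    """Slices for (X, Y) = (0,0), (0,1), (1,0), (1,1)."""
    q = len(table) // 4
    return [table[i * q:(i + 1) * q] for i in range(4)]


def bivariate_round(table):
    """r_1 at the four Boolean points: the quarter sums s00, s01, s10, s11."""
    return [sum(part) % P for part in quarters(table)]


def bilinear_weights(alpha, beta):
    """eq-weights of (alpha, beta) against (0,0), (0,1), (1,0), (1,1)."""
    return [(1 - alpha) * (1 - beta) % P, (1 - alpha) * beta % P,
            alpha * (1 - beta) % P, alpha * beta % P]


def eval_bivariate(sums, alpha, beta):
    """Evaluate the bilinear r_1 at (alpha, beta) from its quarter sums."""
    return sum(w * s for w, s in zip(bilinear_weights(alpha, beta), sums)) % P


def fold_two(table, alpha, beta):
    """Fix the first two variables to (alpha, beta) in one pass over the table."""
    w00, w01, w10, w11 = bilinear_weights(alpha, beta)
    f00, f01, f10, f11 = quarters(table)
    return [(w00 * a + w01 * b + w10 * c + w11 * d) % P
            for a, b, c, d in zip(f00, f01, f10, f11)]
```

The verifier's check for such a round would be that the four quarter sums add up to the running claim, and the next claim is `eval_bivariate(sums, alpha, beta)`. Whether halving the number of passes over the table outweighs the larger round messages is exactly the open question behind this idea.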
Sumcheck in GPU
By Suyash Bagad