Sumcheck in GPU
Sumcheck Primer
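- Given a polynomial $g: \mathbb{F}^\mu \to \mathbb{F}$ and variables $X = \{x_i\}_{i \in [\mu]}$, compute the sum $v = \sum_{x \in B_\mu} g(x)$ over the Boolean hypercube $B_\mu = \{0,1\}^\mu$
- Naively, a verifier would require $|B_\mu| = 2^\mu$ evaluations of $g(\cdot)$
- The sumcheck protocol requires only $O(\mu + \lambda)$ verifier work
- Here $\lambda$ is the cost of evaluating $g(\cdot)$ at some $r \in \mathbb{F}^\mu$
- Prover's work is $O(2^\mu)$, i.e. linear in the number of constraints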
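Interaction between Prover P and Verifier V:
- Round 1: P sends $g_1(X_1) := \sum_{x_2, \dots, x_\mu} g(X_1, x_2, \dots, x_\mu)$; V checks $v \stackrel{?}{=} g_1(0) + g_1(1)$ and replies with $r_1$
- Round 2: P sends $g_2(X_2) := \sum_{x_3, \dots, x_\mu} g(r_1, X_2, x_3, \dots, x_\mu)$; V checks $g_1(r_1) \stackrel{?}{=} g_2(0) + g_2(1)$ and replies with $r_2$
- Round 3: P sends $g_3(X_3) := \sum_{x_4, \dots, x_\mu} g(r_1, r_2, X_3, x_4, \dots, x_\mu)$; V checks $g_2(r_2) \stackrel{?}{=} g_3(0) + g_3(1)$
- ⋮
- Round μ: after receiving $r_{\mu-1}$, P sends $g_\mu(X_\mu) := g(r_1, r_2, \dots, r_{\mu-1}, X_\mu)$; V checks $g_{\mu-1}(r_{\mu-1}) \stackrel{?}{=} g_\mu(0) + g_\mu(1)$
- Final check: V samples $r_\mu$ and checks $g_\mu(r_\mu) \stackrel{?}{=} g(r_1, r_2, \dots, r_\mu)$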
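The round structure above can be sketched end to end in a few lines of Python. This is a minimal illustration, not an optimized prover: the modulus `P`, the example polynomial `g`, and the constants `MU` and `DEG` are all assumptions chosen for the demo, and the prover computes each round polynomial by brute force.

```python
import itertools
import random

P = 2**61 - 1          # prime modulus, chosen only for illustration
MU, DEG = 3, 2         # number of variables, max degree in any one variable

def g(x):
    # Hypothetical example polynomial: degree <= 2 in each of its 3 variables.
    x1, x2, x3 = x
    return (2 * x1 * x1 * x2 + x2 * x3 + x1 + 5) % P

def round_poly_evals(prefix):
    """Prover: evaluations of g_i at 0..DEG with earlier variables fixed to prefix."""
    rest = MU - len(prefix) - 1
    evals = []
    for t in range(DEG + 1):
        s = 0
        for tail in itertools.product((0, 1), repeat=rest):
            s += g(list(prefix) + [t] + list(tail))
        evals.append(s % P)
    return evals

def interpolate(evals, r):
    """Lagrange-evaluate at r the polynomial taking value evals[t] at t = 0, 1, ..."""
    total = 0
    for i, yi in enumerate(evals):
        num, den = 1, 1
        for j in range(len(evals)):
            if j != i:
                num = num * (r - j) % P
                den = den * (i - j) % P
        total += yi * num * pow(den, -1, P)
    return total % P

def sumcheck(claim, rng):
    """Run all MU rounds with an honest prover; return the verifier's verdict."""
    prefix = []
    for _ in range(MU):
        evals = round_poly_evals(prefix)             # prover's message g_i
        if (evals[0] + evals[1]) % P != claim % P:   # check g_i(0) + g_i(1)
            return False
        r = rng.randrange(P)                         # verifier's challenge r_i
        claim = interpolate(evals, r)                # next claim is g_i(r_i)
        prefix.append(r)
    return claim == g(prefix)                        # single evaluation of g

v = sum(g(list(x)) for x in itertools.product((0, 1), repeat=MU)) % P
accepted_honest = sumcheck(v, random.Random(0))
accepted_wrong = sumcheck((v + 1) % P, random.Random(0))
```

With the true sum `v` the verifier accepts; with `v + 1` the very first round check fails, since the honest round polynomial still sums to `v`.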
Sumcheck for MLE Polynomials
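- Given a vector $f \in \mathbb{F}^{2^\mu}$, we can write its (unique) multilinear extension $f(X)$ as: $f(X) = \sum_{b \in \{0,1\}^\mu} f_b \prod_{i=1}^{\mu} \big(b_i X_i + (1 - b_i)(1 - X_i)\big)$
- MLEs make the sumcheck prover's computation faster and parallelisable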
- Computing round polynomials is easy
Sums of halves → easy!
- Updating original polynomial with challenge α1 is tricky
- The computation within a round can be parallelised
- Round i+1 depends on the challenge from round i
- All terms must be processed before the challenge can be computed
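A minimal sketch of the two prover steps for an MLE given as an evaluation table: the round polynomial from "sums of halves", and the update of the table with the challenge. The modulus `P`, the function names, and the convention that the first variable is the most significant index bit are assumptions made for this illustration.

```python
import random

P = 2**61 - 1  # prime modulus, chosen only for illustration

def round_evals(table):
    """Round polynomial is linear: g(0) and g(1) are the sums of the two halves."""
    half = len(table) // 2
    return sum(table[:half]) % P, sum(table[half:]) % P

def fold(table, alpha):
    """Bind the leading variable to alpha: new[i] = (1 - alpha)*lo[i] + alpha*hi[i]."""
    half = len(table) // 2
    return [(lo + alpha * (hi - lo)) % P
            for lo, hi in zip(table[:half], table[half:])]

def mle_eval_direct(table, point):
    """Evaluate the MLE from its defining sum (used only as a cross-check)."""
    mu = len(point)
    total = 0
    for idx, val in enumerate(table):
        w = val
        for i, r in enumerate(point):
            bit = (idx >> (mu - 1 - i)) & 1   # variable i is bit mu-1-i of idx
            w = w * (r if bit else (1 - r)) % P
        total += w
    return total % P

def run_sumcheck(table, rng):
    claim = sum(table) % P
    cur, point = list(table), []
    while len(cur) > 1:
        s0, s1 = round_evals(cur)            # each half-sum is a parallel reduction
        if (s0 + s1) % P != claim:
            return False
        alpha = rng.randrange(P)             # available only after the reduction
        claim = (s0 + alpha * (s1 - s0)) % P
        cur = fold(cur, alpha)               # element-wise, hence parallel
        point.append(alpha)
    return claim == cur[0] == mle_eval_direct(table, point)

rng = random.Random(1)
table = [rng.randrange(P) for _ in range(8)]
ok = run_sumcheck(table, rng)
```

Both the reduction and the fold are data-parallel within a round, but the fold needs `alpha`, which in turn needs the finished reduction. That is the sequential dependency between rounds.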
Sumcheck for R1CS
We need to prove that $F$ is 0 on all points of $B_\mu$
- We need sumcheck on sums of products of MLE polynomials
- Trivial to extend sumcheck for one polynomial to sums of products
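To illustrate the extension to products, here is a sketch of sumcheck over $a(x) \cdot b(x) - c(x)$ for three MLE tables. The round polynomial now has degree 2, so the prover sends three evaluations instead of two. The modulus, the function names, and the exact combination of tables are assumptions for illustration, and the random weighting typically used to turn "zero on every point" into a single sum claim is omitted.

```python
import random

P = 2**61 - 1                 # prime modulus, chosen only for illustration
INV2 = pow(2, -1, P)

def at(lo, hi, t):
    """Value at t of the line through (0, lo) and (1, hi)."""
    return (lo + t * (hi - lo)) % P

def fold(table, alpha):
    half = len(table) // 2
    return [at(table[i], table[i + half], alpha) for i in range(half)]

def round_evals(a, b, c):
    """Evaluations at t = 0, 1, 2 of the degree-2 round polynomial of a*b - c."""
    half = len(a) // 2
    evals = []
    for t in range(3):
        s = 0
        for i in range(half):
            s += (at(a[i], a[i + half], t) * at(b[i], b[i + half], t)
                  - at(c[i], c[i + half], t))
        evals.append(s % P)
    return evals

def eval_quadratic(e, r):
    """Interpolate the quadratic through (0, e[0]), (1, e[1]), (2, e[2]) at r."""
    l0 = (r - 1) * (r - 2) % P * INV2 % P
    l1 = (-r * (r - 2)) % P
    l2 = r * (r - 1) % P * INV2 % P
    return (e[0] * l0 + e[1] * l1 + e[2] * l2) % P

def run_sumcheck(a, b, c, rng):
    claim = sum(x * y - z for x, y, z in zip(a, b, c)) % P
    while len(a) > 1:
        e = round_evals(a, b, c)
        if (e[0] + e[1]) % P != claim:
            return False
        r = rng.randrange(P)
        claim = eval_quadratic(e, r)
        a, b, c = fold(a, r), fold(b, r), fold(c, r)   # fold every table
    return claim == (a[0] * b[0] - c[0]) % P

a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
c = [x * y for x, y in zip(a, b)]      # a*b - c vanishes on the hypercube
first_round = round_evals(a, b, c)
ok = run_sumcheck(a, b, c, random.Random(2))
```

Each table is folded with the same challenge, so the per-round memory traffic grows with the number of MLE tables involved.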
Sumcheck in GPU
[Figure: L2 cache]
- Main hurdle: L2 cache size is just 96 MB
- Can fit an R1CS sumcheck instance of size $\le 2^{19}$
- For larger instances, we need to read and write to memory to update the state
- Alternatively, we can break a large sumcheck instance into several smaller instances
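The slide does not say how the $2^{19}$ bound was derived. One set of assumptions that reproduces it is 32-byte field elements and four resident MLE tables per instance; both numbers are guesses made for this back-of-the-envelope sketch, not figures from the talk.

```python
L2_BYTES = 96 * 2**20   # 96 MB L2 cache, as stated on the slide
ELEM_BYTES = 32         # assumption: one field element occupies 32 bytes
NUM_TABLES = 4          # assumption: four MLE tables are resident at once

def footprint(log_size):
    """Bytes needed to hold every table of an instance with 2**log_size rows."""
    return NUM_TABLES * ELEM_BYTES * 2**log_size

def max_log_size():
    """Largest log-size whose tables fit entirely in the L2 cache."""
    mu = 0
    while footprint(mu + 1) <= L2_BYTES:
        mu += 1
    return mu

largest = max_log_size()
```

Under these assumptions an instance of size $2^{19}$ needs 64 MB and fits, while $2^{20}$ needs 128 MB and does not.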
*This isn't quite what we need: we must also show that the sums of the independent sumchecks add up to the claimed total.
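A small sketch of the splitting idea and of the extra obligation it creates. Fixing the first `k` variables cuts the table into $2^k$ contiguous chunks, each an independent instance with its own sum $c_j$; the verifier must still be convinced that the $c_j$ add up to the original claim $c$. The use of `k` for the number of fixed variables, and all function names, are assumptions for illustration.

```python
P = 2**61 - 1  # prime modulus, chosen only for illustration

def split_instance(table, k):
    """Split a size-2**mu table into 2**k chunks by fixing the first k variables."""
    chunk = len(table) >> k
    return [table[j * chunk:(j + 1) * chunk] for j in range(1 << k)]

table = [(7 * i * i + 3) % P for i in range(16)]   # toy instance with mu = 4
chunks = split_instance(table, 2)                  # four instances of size 4

c = sum(table) % P                                 # original claimed sum
sub_sums = [sum(ch) % P for ch in chunks]          # one claim c_j per chunk
consistent = sum(sub_sums) % P == c                # the additional check
```

Each chunk can be processed with cache-resident state, but the prover now makes $2^k$ separate claims, and the relation between them has to be proven rather than assumed.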
Suboptimal use of L2 cache
Verifier generates (k+n) challenges
Relocation of data within L2 cache
Optimal usage of L2 cache
Verifier generates (2k+n) challenges
Path Forward
- Idea 1: Independent sumchecks
  - Still need to show $\sum_j c_j = c$, which is additional overhead
  - Better ideas to show that?
- Idea 2: Multi-variate round polynomials
  - First round polynomial: $r_1(X) = \sum_{x_j \in \{0,1\},\, j \neq 1} f(X, x_2, x_3, \dots, x_\mu)$
  - What if round polynomials were bi-variate polynomials: $r_1(X, Y) = \sum_{x_j \in \{0,1\},\, j \neq 1, 2} f(X, Y, x_3, x_4, \dots, x_\mu)$
- Idea 3: Bulletproofs-like folding
  - Convert a sumcheck problem to a bulletproofs problem (lots of MSMs)
  - Run $k$ bulletproofs rounds until the sumcheck size is $\le 2^l$
- Idea 4: Ternary hypercube
  - Will help reduce the number of rounds in sumcheck, but is that useful to us?
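Idea 2 can be made concrete for a multilinear $f$: the bivariate round polynomial $r_1(X, Y)$ is then bilinear, so it is determined by four values, namely the sums of the four quarters of the table. The sketch below assumes that the first two variables are the two most significant index bits; the modulus and function names are illustrative only.

```python
import random

P = 2**61 - 1  # prime modulus, chosen only for illustration

def fold(table, alpha):
    """Ordinary single-variable fold, kept here for comparison."""
    half = len(table) // 2
    return [(table[i] + alpha * (table[i + half] - table[i])) % P
            for i in range(half)]

def bivariate_round(table):
    """r1 at (0,0), (0,1), (1,0), (1,1): the sums of the four quarters."""
    q = len(table) // 4
    return [sum(table[j * q:(j + 1) * q]) % P for j in range(4)]

def eval_bilinear(e, a, b):
    """Evaluate the bilinear polynomial with corner values e at (a, b)."""
    return ((1 - a) * (1 - b) * e[0] + (1 - a) * b * e[1]
            + a * (1 - b) * e[2] + a * b * e[3]) % P

def fold2(table, a, b):
    """Bind the first two variables to (a, b) in one pass over the table."""
    q = len(table) // 4
    return [eval_bilinear([table[i], table[i + q],
                           table[i + 2 * q], table[i + 3 * q]], a, b)
            for i in range(q)]

rng = random.Random(3)
table = [rng.randrange(P) for _ in range(16)]
e = bivariate_round(table)
a, b = rng.randrange(P), rng.randrange(P)

sum_ok = sum(e) % P == sum(table) % P                     # verifier's round check
claim_ok = eval_bilinear(e, a, b) == sum(fold2(table, a, b)) % P
same_as_two_folds = fold2(table, a, b) == fold(fold(table, a), b)
```

One bivariate round shrinks the table by a factor of four and replaces two sequential rounds, at the cost of a larger prover message. For products of MLEs the message would grow further, since the round polynomial is no longer bilinear.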
Sumcheck in GPU
By Suyash Bagad