Parker Rule
MGGG
April 22, 2020
GerryChain is slow...
Why?
ouch! 😬
For a typical ReCom run on Pennsylvania, random spanning tree operations take up ~70% of runtime.
...and calls to a single NetworkX function take ~40% of runtime!
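A quick way to reproduce this kind of measurement with Python's built-in profiler (a minimal sketch, assuming a GerryChain `chain` object is already configured, e.g. as in the workflow sketch later in the deck; the name filters are illustrative):

```python
# Hedged sketch: profile a short run and see where the time goes.
import cProfile
import pstats

cProfile.run("for partition in chain: pass", "recom.prof")
stats = pstats.Stats("recom.prof").sort_stats("cumulative")
stats.print_stats("networkx")  # cumulative time inside NetworkX calls
stats.print_stats("tree")      # spanning-tree helpers (filter string is illustrative)
```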
This is not particularly surprising.
DeFord et al. 2020
~50x slowdown relative to graph-tool 😢
...but NetworkX is superior in other ways (API, documentation, ease of installation, popularity).
xkcd
What if we could make the slow bit faster? 🤔
(say, by 50x!)
Speeding up just the biggest NetworkX call...
Speeding up all of random_spanning_tree()...
Very optimistically...
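Reading these as Amdahl's-law estimates (an assumption on my part): if a fraction $p$ of the runtime is sped up by a factor $s \approx 50$, the overall speedup is

$$\text{speedup} = \frac{1}{(1-p) + p/s}, \qquad p = 0.4 \Rightarrow \approx 1.6\times, \qquad p = 0.7 \Rightarrow \approx 3.2\times$$

so even a 50x win on the tree code alone buys only a few-fold overall improvement.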
This stuff also matters!
Abandoning this branch was probably wise.
Solution: make all the parts faster.
sites.tufts.edu/vrdi
Use a compiled language!
Less computation per step...
but usually more steps.
Pennsylvania flip-to-ReCom ratio: ~100x
1k ReCom steps in ~10 minutes
100k flip steps in ~10 minutes
Contiguity checks are the slow bit this time.
(depth-first search at each step)
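A minimal sketch of that check using NetworkX connectivity (GerryChain's actual implementation may differ in details):

```python
# Hedged sketch: after flipping `flipped_node` out of its district, the
# remaining district nodes must still induce a connected subgraph.
# A search like this runs at every flip step.
import networkx as nx

def flip_keeps_district_contiguous(graph, district_nodes, flipped_node):
    remaining = set(district_nodes) - {flipped_node}
    if not remaining:
        return False  # flipping away the last node would empty the district
    return nx.is_connected(graph.subgraph(remaining))
```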
Contiguity checks are a slow bit this time.
Creating new partitions also takes much of the runtime!
Each new partition of Pennsylvania costs ~20KB
(with optimizations!)
github.com/mggg/GerryChain
Storing 1 million Pennsylvania flip steps (easily computable in two hours) costs ~20GB of RAM.
Even with operating system RAM compression, large ensembles (millions of plans) are extremely challenging to work with!
Solution 1: compute statistics on the fly rather than saving every plan and computing statistics ex post facto.
Pros:
Trivial to implement
Potential O(1) memory usage (for pre-binned histograms, etc.)
Cons:
Loss of flexibility
(what if you don't know what statistics you want?)
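A minimal sketch of Solution 1, assuming an iterable `chain` of partitions with a `cut_edges` updater (the statistic and bin width are illustrative):

```python
# Hedged sketch: accumulate a pre-binned histogram while the chain runs
# instead of storing every plan. Memory is O(number of bins), independent
# of the number of steps.
from collections import Counter

BIN_WIDTH = 25          # illustrative bin width
histogram = Counter()

for partition in chain:                      # `chain` assumed from earlier setup
    cut_edge_count = len(partition["cut_edges"])
    histogram[cut_edge_count // BIN_WIDTH] += 1
```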
Solution 2: save the deltas between each step, plus the occasional snapshot of the full plan for fast reconstruction.
Pros:
Entire chain preserved—more flexibility
Cons:
More complicated to implement
Doesn't work well for ReCom (naïvely)
Doesn't scale well to billions of steps
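A minimal sketch of Solution 2 for a flip chain, assuming GerryChain's `Partition.flips` and `Partition.assignment` attributes (the snapshot interval is illustrative); as noted above, this doesn't transfer naïvely to ReCom, where each delta can touch two whole districts:

```python
# Hedged sketch: store only the per-step flips plus an occasional full
# snapshot, so any plan can be rebuilt by replaying deltas from the
# nearest snapshot.
SNAPSHOT_INTERVAL = 10_000   # illustrative

deltas, snapshots = [], {}
for step, partition in enumerate(chain):     # `chain` assumed from earlier setup
    deltas.append(dict(partition.flips or {}))
    if step % SNAPSHOT_INTERVAL == 0:
        snapshots[step] = {node: partition.assignment[node]
                           for node in partition.graph.nodes}
```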
The answer probably depends on scale.
Thousands of plans — save everything
Millions of plans — use a more compact representation; save statistics
Billions of plans — save (rich) statistics only
Trillions of plans — save summary statistics
GerryChain theoretically supports all of these scales, but it is oriented around working with thousands of plans.
Optimization
Richer ensembles
Developing new chains
High coverage (small grid experiments)
GerryChain is a fantastic preprocessing and analysis tool. Why throw that out?
Load and filter geodata
Define chain
Run chain
🕐🕓🕖
Make a nice boxplot
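For concreteness, a condensed sketch of that workflow (file name, column names, and parameters are placeholders, not the exact benchmark setup):

```python
# Hedged sketch of the standard GerryChain workflow; "pa_vtds.json",
# "TOTPOP", and "CD_2011" are placeholders.
from functools import partial
import matplotlib.pyplot as plt
from gerrychain import Graph, GeographicPartition, MarkovChain
from gerrychain.updaters import Tally, cut_edges
from gerrychain.proposals import recom
from gerrychain.constraints import within_percent_of_ideal_population
from gerrychain.accept import always_accept

# Load and filter geodata
graph = Graph.from_json("pa_vtds.json")
initial = GeographicPartition(
    graph,
    assignment="CD_2011",
    updaters={"population": Tally("TOTPOP", alias="population"),
              "cut_edges": cut_edges},
)

# Define chain
ideal_pop = sum(initial["population"].values()) / len(initial)
chain = MarkovChain(
    proposal=partial(recom, pop_col="TOTPOP", pop_target=ideal_pop,
                     epsilon=0.02, node_repeats=2),
    constraints=[within_percent_of_ideal_population(initial, 0.02)],
    accept=always_accept,
    initial_state=initial,
    total_steps=1000,
)

# Run chain (the slow part) and record one statistic per step
cut_edge_counts = [len(p["cut_edges"]) for p in chain]

# Make a nice boxplot
plt.boxplot(cut_edge_counts)
plt.show()
```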
GerryChain is a fantastic preprocessing and analysis tool. Why throw that out?
Load and filter geodata
Define chain
Load chain results
Make a nice boxplot
Run optimized chain
🔥
Counting polyomino tilings (districts on a grid)
mggg.org/table
Zach Schutzman's polyomino tiling enumeration code
github.com/zschutzman/enumerator
| Julia? | C++? |
| --- | --- |
| Optimized for scientific computing (built-in linear algebra, graph analysis, and tabular data libraries) | More mature and well-known |
| Python-like syntax (+ REPL!) | Easier to use with Python |
Dependencies are annoying, particularly in the C++ context.
Most people at MGGG (including myself) primarily use Python, and Julia is easier to learn with a Python background.
Originally written to compute ~1-billion-step flip chains for the ReCom paper (DeFord, Duchin, Solomon 2019)
Supports basic on-the-fly computation of statistics
A quick benchmark...
100k Pennsylvania flip steps
(2% population tolerance, no compactness constraint, no statistics)
GerryChain: 22 min, 34 sec
Flips.jl: 13.6 sec
~100x speedup!
Originally written to compute ~10-million-step chains for the reversible ReCom paper (Cannon, Duchin, Randall, Rule; forthcoming)
Integrated with GerryChain via Python-Julia bindings
Another benchmark...
1k Pennsylvania ReCom steps
(2% population tolerance, no compactness constraint, no statistics)
GerryChain: 6 min, 33 sec
Flips.jl: 14.1 sec
~28x speedup!
Goal: use GerryChain for preprocessing and analysis.
Calling Julia from Python is finicky, so it is best to minimize interaction. (This also cuts down on overhead!)
Strategy: Loose coupling (JSON serialization)
Extremely simple (~30 lines)
Takes in an initial partition; returns a block of partitions
Converting the output of the Julia chain into GerryChain partitions is effective up to the memory bottleneck.
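A sketch of the Python side of that coupling, assuming the Julia chain dumps a JSON list of node-to-district assignments (file name and layout are illustrative, not Flips.jl's actual format):

```python
# Hedged sketch: load a block of plans written by the Julia chain and rebuild
# GerryChain Partition objects. `graph` and `updaters` are assumed to come
# from the usual GerryChain preprocessing step.
import json
from gerrychain import Partition

with open("chain_output.json") as f:
    assignment_blocks = json.load(f)   # illustrative: list of {node: district} dicts

partitions = [
    Partition(graph, assignment, updaters)   # key types may need converting to match graph labels
    for assignment in assignment_blocks
]
```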
If we save only a few partitions and record statistics, this strategy works well for up to ~100m Pennsylvania ReCom steps on the HPC cluster.
Computing statistics and initializing GerryChain partitions is the CPU bottleneck; this can be largely alleviated through parallelization.
Markov chains are inherently serial—the next step depends on the last step.
However, proposals starting from the same step can be parallelized. This can result in large wall-clock speedups for chains with low acceptance probabilities (roughly proportional to the number of cores used).
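A library-agnostic sketch of that idea (`propose` and `accept` below are hypothetical stand-ins for the chain actually being run):

```python
# Hedged sketch: draw several independent proposals from the same state in
# parallel and advance with the first accepted one.
from concurrent.futures import ProcessPoolExecutor
import random

def propose(state, rng):
    """Hypothetical stand-in: perturb `state` to get a candidate plan."""
    return state

def accept(candidate, rng):
    """Hypothetical stand-in: accept/reject decision (low acceptance rate)."""
    return rng.random() < 0.05

def try_one(state, seed):
    rng = random.Random(seed)
    candidate = propose(state, rng)
    return candidate if accept(candidate, rng) else None

def parallel_step(state, rng, n_workers=8):
    seeds = [rng.getrandbits(64) for _ in range(n_workers)]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        for result in pool.map(try_one, [state] * n_workers, seeds):
            if result is not None:
                return result        # first accepted proposal wins
    return state                     # all rejected: self-loop
```

To keep the chain's distribution intact, the rejected attempts before the first acceptance would still be counted as repeated (self-loop) states.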
Coming soon!
(see github.com/mggg/GerryChainJulia)
Collaboration between Bhushan Suwal and me
Pure Julia—removing Python entirely should allow for faster, longer chains (but can still be called from Python if desired)
Parallelization support
Built-in support for many common statistics and analyses
May still depend on/integrate with Python geodata tools for preprocessing
Coming soon!
(see github.com/mggg/GerryChainJulia)
Please be our beta testers!
Julia has rich support for cluster computing and integrates with the Slurm job manager on the Tufts HPC cluster with just a few lines of code.
Account limit: ~450 cores 🤯
Could be used for large-scale optimization experiments
Question: How do vote patterns map to seat shares?
Hypothesis: Spatiality matters.
(Diffuse patterns and overly packed patterns perform worse.)
from preprint of Political Geometry
Goal (6x6 grid → 6 districts, 12 club voters):
evaluate 1,251,677,700 vote distributions over 451,206 districtings
What is the best vote distribution for clubs over all districtings?
What is the worst?
First-past-the-post voting introduces nonlinearities that make this problem hard.
Approach:
Decompose ~450,000 districtings into ~2,500 polyominoes
Remove vote distributions that are redundant by symmetry
Compute all polyomino-vote distribution pairings to find best and worst expected seat shares
Solved in ~3 hours over hundreds of cores using Distributed.jl!