The Hard Thing About Hard Geo Things

James Milner
Innovation Developer
@JamesLMilner
james@loxodrome.io
www.loxodrome.io









As inputs increase, processing time increases

'Big Data' is staying

IoT
Drones
Mobile Apps
I'm going to focus more on processing than storage
Storage is a financial problem


There are many roles in GIS, but one thing I think we share is that we are data-heavy people
How do we gain insight from spatial data in a reasonable timeframe?

"This Point in Polygon process runs in 12 seconds with 1000 polygons"
That's fine, but many geo algorithms run in worse than linear time!

Linear = O(n)

Big O Notation
"how quickly the runtime grows, relative to the input as the input gets arbitrarily large" - Parker Phinney
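To make the quote concrete, here is a small sketch that prints rough step counts for three common growth rates. The function name `growth` and the input sizes are made up for illustration. Doubling the input doubles the O(n) count but quadruples the O(n^2) count.

```python
import math

def growth(n):
    """Return rough step counts for three common growth rates at input size n."""
    return {
        "O(n)": n,
        "O(n log n)": round(n * math.log2(n)),
        "O(n^2)": n * n,
    }

# Illustrative input sizes only; each one is double the previous.
for n in (1_000, 2_000, 4_000):
    print(n, growth(n))
```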

Example Time

matches = []
for postcode in postcodes:
    for grid in osgrid:
        # one bounding-box test per (postcode, grid cell) pair
        if pointInBoundingBox(postcode, grid):
            matches.append((postcode, grid))
I want to find out which OS grid cell each of the 1.8 million UK postcodes falls in
O(n * m), which behaves like O(n²) when n and m are of similar size
In the most naive solution, we must check every point against every polygon!
n = Number of Polygons
m = Number of Points
Many geo problems exhibit this kind of complexity
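A minimal runnable sketch of the naive approach, assuming points are (x, y) tuples and each polygon is reduced to a bounding box (min_x, min_y, max_x, max_y). The helper names and the tiny dataset are invented for illustration. It counts the checks to show that the work is n * m.

```python
def point_in_bounding_box(point, box):
    """True if (x, y) lies inside box = (min_x, min_y, max_x, max_y), edges included."""
    x, y = point
    min_x, min_y, max_x, max_y = box
    return min_x <= x <= max_x and min_y <= y <= max_y

def naive_match(points, boxes):
    """Check every point against every box; return (matches, number_of_checks)."""
    matches = []
    checks = 0
    for point in points:
        for box in boxes:
            checks += 1
            if point_in_bounding_box(point, box):
                matches.append((point, box))
    return matches, checks

# Made-up data: a 2 x 2 grid of 10-unit cells and three points.
boxes = [(0, 0, 10, 10), (10, 0, 20, 10), (0, 10, 10, 20), (10, 10, 20, 20)]
points = [(1, 1), (15, 5), (12, 18)]
matches, checks = naive_match(points, boxes)
print(checks)  # 12: 3 points x 4 boxes
```

With 1.8 million points the same loop performs 1.8 million checks for every single grid cell.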

How do we handle complex problems + large datasets?

Pre-process Where Possible



Store/Use What's Necessary

Check The Algorithm

Write for Asymptotic Input

Profile, Benchmark, Optimise
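One way to benchmark before optimising is Python's standard `timeit` module. This is a sketch only: `all_pairs` is a stand-in for whatever geo routine you suspect is slow, and `time_it` is a hypothetical helper name. Timing the same function at growing input sizes shows how the runtime scales.

```python
import timeit

def all_pairs(n):
    """Deliberately quadratic stand-in workload: touches every (i, j) pair once."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += 1
    return total

def time_it(func, n, repeat=3):
    """Best-of-`repeat` wall-clock seconds for a single call of func(n)."""
    return min(timeit.repeat(lambda: func(n), number=1, repeat=repeat))

# Illustrative sizes; timings will differ from machine to machine.
for n in (200, 400, 800):
    print(n, f"{time_it(all_pairs, n):.4f}s")
```

If each doubling of n roughly quadruples the time, the routine is behaving quadratically. The standard library's `cProfile` module can then show where that time is spent.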


Examine Your Options














Stay Lean, Try Scripting/CLI


Leverage Spatial Indexing
Spatial indexes are specific structures that allow us to search geo data quickly
(Quadtree Demo)
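A minimal point quadtree sketch, to show the idea behind the demo rather than a production index. The class, its parameters (`capacity`, `MAX_DEPTH`) and the sample points are assumptions made for illustration. A query skips any quadrant whose square cannot overlap the search box, so most points are never examined.

```python
class Quadtree:
    """Minimal point quadtree over the half-open square [x, x+size) x [y, y+size)."""

    MAX_DEPTH = 16  # stop splitting here so many identical points cannot recurse forever

    def __init__(self, x, y, size, capacity=4, depth=0):
        self.x, self.y, self.size = x, y, size
        self.capacity = capacity
        self.depth = depth
        self.points = []
        self.children = None  # becomes a list of four sub-quadrants after a split

    def _contains(self, px, py):
        return self.x <= px < self.x + self.size and self.y <= py < self.y + self.size

    def insert(self, px, py):
        """Add a point; return False if it lies outside this node's square."""
        if not self._contains(px, py):
            return False
        if self.children is None:
            if len(self.points) < self.capacity or self.depth >= self.MAX_DEPTH:
                self.points.append((px, py))
                return True
            self._split()
        return any(child.insert(px, py) for child in self.children)

    def _split(self):
        """Create four children and push this node's points down into them."""
        half = self.size / 2
        self.children = [
            Quadtree(self.x + dx, self.y + dy, half, self.capacity, self.depth + 1)
            for dx in (0, half) for dy in (0, half)
        ]
        old, self.points = self.points, []
        for px, py in old:
            self.insert(px, py)

    def query(self, min_x, min_y, max_x, max_y):
        """Return points with min_x <= x <= max_x and min_y <= y <= max_y."""
        # Prune: skip this node entirely if its square cannot overlap the box.
        if (min_x >= self.x + self.size or max_x < self.x or
                min_y >= self.y + self.size or max_y < self.y):
            return []
        found = [(px, py) for px, py in self.points
                 if min_x <= px <= max_x and min_y <= py <= max_y]
        if self.children is not None:
            for child in self.children:
                found.extend(child.query(min_x, min_y, max_x, max_y))
        return found

# Made-up points in a 100 x 100 square.
tree = Quadtree(0, 0, 100)
for p in [(5, 5), (20, 80), (50, 50), (51, 49), (75, 25), (99, 99)]:
    tree.insert(*p)
print(sorted(tree.query(40, 40, 80, 60)))  # [(50, 50), (51, 49)]
```

In practice you would more likely reach for an existing spatial index in your database or GIS library than write one yourself.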
Will It Parallelize?
"The Intel® Pentium® 4 processor of 2004 was just a single core processor, but today we are talking about 8, 10, 12, or even 15 cores in a workstation, and these cores can execute instructions independent of each other."



Unfortunately, not all problems are parallelizable
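Problems where each input can be handled on its own, such as testing points against one bounding box, do split cleanly. A sketch using the standard `concurrent.futures` module follows; the function names, chunk count and data are assumptions for illustration. The points are cut into chunks, each chunk is counted separately, and the partial counts are summed.

```python
from concurrent.futures import ThreadPoolExecutor

def count_in_box(args):
    """Count the points of one chunk that fall inside box (edges included)."""
    chunk, (min_x, min_y, max_x, max_y) = args
    return sum(1 for x, y in chunk if min_x <= x <= max_x and min_y <= y <= max_y)

def parallel_count(points, box, executor, n_chunks=4):
    """Split points into chunks, count each chunk on the executor, add the results."""
    size = max(1, -(-len(points) // n_chunks))  # ceiling division
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    return sum(executor.map(count_in_box, [(chunk, box) for chunk in chunks]))

# Made-up data: a 100 x 100 lattice of integer points.
points = [(i % 100, i // 100) for i in range(10_000)]
box = (10, 10, 19, 19)

# A thread pool keeps the sketch runnable anywhere; see the note below.
with ThreadPoolExecutor(max_workers=4) as pool:
    total = parallel_count(points, box, pool)
print(total)  # 100
```

For CPU-bound work in CPython, threads will not use several cores. Passing a `ProcessPoolExecutor` from the same module, created under an `if __name__ == "__main__":` guard, is the usual way to do that.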
Can I Distribute It?

Takeaways
- The amount of data we may have to deal with is increasing
- Many geographic problems are complex
- Performance and algorithms matter
- Try to figure out the best tool for the job
- Don't be scared to script and program
- We should embrace parallel processing and distribution
A special thank you to
Ed Boiling (Google)
Vlad Metodiev (OS)