The Hard Thing About Hard Geo Things
James Milner
Innovation Developer
@JamesLMilner
james@loxodrome.io
www.loxodrome.io
As inputs increase, processing time increases
'Big Data' is staying
IoT
Drones
Mobile Apps
I'm going to focus more on processing than storage
Storage is a financial problem
There are many roles in GIS, but one thing I think we share is that we are data-heavy people
How do we gain insight from spatial data in a reasonable timeframe?
"This Point in Polygon process runs in 12 seconds with 1000 polygons"
That's fine, but many geo algorithms run in worse than linear time!
Linear = O(n)
Big O Notation
"how quickly the runtime grows, relative to the input as the input gets arbitrarily large" - Parker Phinney
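To make the definition concrete, here is a minimal sketch that counts basic steps for a single pass over the input and for an all-pairs comparison. The function names are illustrative and not from the talk.

```python
def linear_steps(n):
    """Count the steps of a single pass over n items: O(n)."""
    steps = 0
    for _ in range(n):
        steps += 1
    return steps


def quadratic_steps(n):
    """Count the steps of comparing every item with every item: O(n^2)."""
    steps = 0
    for _ in range(n):
        for _ in range(n):
            steps += 1
    return steps


# Double the input each time and watch how the step counts grow.
for n in (100, 200, 400):
    print(n, linear_steps(n), quadratic_steps(n))
```

Doubling the input doubles the linear count but quadruples the quadratic one, which is the growth that Big O describes.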
Example Time
matches = []
for postcode in postcodes:
    for grid in osgrid:
        if pointInBoundingBox(postcode, grid):
            # Record the match and move on to the next postcode
            matches.append((postcode, grid))
            break
I want to find out which OS Grid cell each of the 1.8 million UK postcodes resides in
O(n * m) ≃ O(n²)
In the most naive solution, we must check all points against all polygons!
n = Number of Polygons
m = Number of Points
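A runnable sketch of the naive approach, assuming points are (x, y) tuples and each grid cell is a (min_x, min_y, max_x, max_y) bounding box. These data shapes, the function names, and the toy data are assumptions for illustration, not the real postcode or OS Grid datasets.

```python
def point_in_bounding_box(point, box):
    """Half-open test, so a point on a shared edge falls in exactly one cell."""
    x, y = point
    min_x, min_y, max_x, max_y = box
    return min_x <= x < max_x and min_y <= y < max_y


def assign_naive(points, boxes):
    """Return ({point index: box index}, number of checks made)."""
    assignments = {}
    checks = 0
    for i, point in enumerate(points):
        for j, box in enumerate(boxes):
            checks += 1
            if point_in_bounding_box(point, box):
                assignments[i] = j
                break  # stop at the first containing box
    return assignments, checks


# Hypothetical toy data: a 2 x 2 grid of 10-unit cells and three points.
boxes = [(0, 0, 10, 10), (10, 0, 20, 10), (0, 10, 10, 20), (10, 10, 20, 20)]
points = [(1, 1), (15, 5), (19, 19)]
assignments, checks = assign_naive(points, boxes)
print(assignments, checks)
```

Even with the early exit, the worst case is still len(points) * len(boxes) checks, which is the O(n * m) behaviour described above.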
Many geo problems exhibit this kind of complexity
How do we handle complex problems + large datasets?
Pre-process Where Possible
Store/Use What's Necessary
Check The Algorithm
Write for Asymptotic Input
Profile, Benchmark, Optimise
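As one possible way to benchmark, the sketch below times a stand-in workload at growing input sizes using only the standard library. The helper best_time and the workload all_pairs are hypothetical names for illustration.

```python
import time


def best_time(func, *args, repeat=3):
    """Return the fastest of `repeat` wall-clock runs of func(*args), in seconds."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


def all_pairs(n):
    """Stand-in workload: touch every pair of n items."""
    count = 0
    for _ in range(n):
        for _ in range(n):
            count += 1
    return count


# Time the same workload at doubling input sizes to see how runtime grows.
for n in (200, 400, 800):
    print(f"n={n:4d}  best of 3: {best_time(all_pairs, n):.4f}s")
```

To find where the time goes inside a whole script, Python's built-in profiler can be run as `python -m cProfile -s cumulative script.py`, where script.py stands for your own file.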
Examine Your Options
Stay Lean, Try Scripting/CLI
Leverage Spatial Indexing
Spatial indexes are specific structures that allow us to search geo data quickly
(Quadtree Demo)
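For readers without the live demo, here is a minimal point quadtree sketch. It is an illustration under assumptions (half-open boxes given as (min_x, min_y, max_x, max_y), a node capacity of 4, a depth limit of 16), not the code used in the demo.

```python
class Quadtree:
    """A minimal point quadtree over the box (min_x, min_y, max_x, max_y)."""

    def __init__(self, bounds, capacity=4, depth=0, max_depth=16):
        self.bounds = bounds
        self.capacity = capacity
        self.depth = depth
        self.max_depth = max_depth
        self.points = []
        self.children = None  # four sub-trees once this node splits

    def _contains(self, point):
        x, y = point
        min_x, min_y, max_x, max_y = self.bounds
        return min_x <= x < max_x and min_y <= y < max_y

    def _intersects(self, box):
        min_x, min_y, max_x, max_y = self.bounds
        q_min_x, q_min_y, q_max_x, q_max_y = box
        return (q_min_x < max_x and q_max_x > min_x
                and q_min_y < max_y and q_max_y > min_y)

    def insert(self, point):
        """Insert a point; return False if it lies outside this node."""
        if not self._contains(point):
            return False
        if self.children is None:
            # Keep the point here if there is room, or if we may not split further.
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append(point)
                return True
            self._split()
        return any(child.insert(point) for child in self.children)

    def _split(self):
        """Create four equal quadrants and push this node's points down."""
        min_x, min_y, max_x, max_y = self.bounds
        mid_x, mid_y = (min_x + max_x) / 2, (min_y + max_y) / 2
        quadrants = [
            (min_x, min_y, mid_x, mid_y), (mid_x, min_y, max_x, mid_y),
            (min_x, mid_y, mid_x, max_y), (mid_x, mid_y, max_x, max_y),
        ]
        self.children = [
            Quadtree(q, self.capacity, self.depth + 1, self.max_depth)
            for q in quadrants
        ]
        for p in self.points:
            any(child.insert(p) for child in self.children)
        self.points = []

    def query(self, box):
        """Return all stored points inside the half-open query box."""
        if not self._intersects(box):
            return []  # skip this whole branch without testing its points
        q_min_x, q_min_y, q_max_x, q_max_y = box
        found = [(x, y) for x, y in self.points
                 if q_min_x <= x < q_max_x and q_min_y <= y < q_max_y]
        if self.children is not None:
            for child in self.children:
                found.extend(child.query(box))
        return found


tree = Quadtree((0, 0, 100, 100))
for p in [(5, 5), (20, 80), (55, 45), (60, 60), (90, 10), (99, 99)]:
    tree.insert(p)
print(sorted(tree.query((50, 40, 100, 100))))
```

The speed-up comes from query discarding any quadrant that does not overlap the search box, so most points are never tested individually.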
Will It Parallelize?
"The Intel® Pentium® 4 processor of 2004 was just a single core processor, but today we are talking about 8, 10, 12, or even 15 cores in a workstation, and these cores can execute instructions independent of each other."
Unfortunately, not all problems are parallelizable
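Point-in-bounding-box matching is one problem that does parallelize, because each point can be tested independently of the others. The sketch below splits the points into chunks and hands each chunk to a separate process using Python's standard library. The function names and data shapes ((x, y) points, (min_x, min_y, max_x, max_y) boxes) are assumptions for illustration.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def point_in_bounding_box(point, box):
    x, y = point
    min_x, min_y, max_x, max_y = box
    return min_x <= x < max_x and min_y <= y < max_y


def match_chunk(chunk, boxes):
    """Naively assign each point in one chunk to the first box that contains it."""
    matches = []
    for point in chunk:
        for j, box in enumerate(boxes):
            if point_in_bounding_box(point, box):
                matches.append((point, j))
                break
    return matches


def split_into_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous, roughly equal slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def match_parallel(points, boxes, workers=4):
    """Run match_chunk on each chunk in its own process and merge the results."""
    chunks = split_into_chunks(points, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order, so results line up with the input.
        results = pool.map(partial(match_chunk, boxes=boxes), chunks)
        return [match for chunk_result in results for match in chunk_result]
```

When running this as a script, call match_parallel from under an `if __name__ == "__main__":` guard so that worker processes can start cleanly on every platform. Starting processes and copying data to them has a cost, so this tends to pay off only when each chunk carries a meaningful amount of work.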
Can I Distribute It?
Takeaways
- The amount of data we may have to deal with is increasing
- Many geographic problems are complex
- Performance and algorithms matter
- Try to figure out the best tool for the job
- Don't be scared to script and program
- We should embrace parallel processing and distribution
A special thank you to
Ed Boiling (Google)
Vlad Metodiev (OS)