The Hard Thing About Hard Geo Things

James Milner

Innovation Developer 

@JamesLMilner

james@loxodrome.io

www.loxodrome.io

 

 


As inputs increase, processing time increases

'Big Data' is staying

IoT
Drones
Mobile Apps

I'm going to focus more on processing than storage

Storage is a financial problem

There are many roles in GIS, but one thing I think we share is that we are data-heavy people

How do we gain insight from spatial data in a reasonable timeframe?

"This Point in Polygon process runs in 12 seconds with 1000 polygons"

That's fine, but many geo algorithms run in worse than linear time!

Linear = O(n)

Big O Notation 

"how quickly the runtime grows, relative to the input as the input gets arbitrarily large"  - Parker Phinney
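To make the definition concrete, here is a minimal sketch that counts the basic steps taken by a single loop (linear) and by a nested loop (quadratic) as the input grows. The function names are illustrative and are not from the talk.

```python
def linear_steps(n):
    """Count steps for one pass over n items: O(n)."""
    steps = 0
    for _ in range(n):
        steps += 1
    return steps


def quadratic_steps(n):
    """Count steps when every item is compared with every item: O(n^2)."""
    steps = 0
    for _ in range(n):
        for _ in range(n):
            steps += 1
    return steps


for n in (10, 100, 1000):
    print(n, linear_steps(n), quadratic_steps(n))
```

Multiplying the input by 10 multiplies the linear count by 10, but the quadratic count by 100.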

Example Time

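cells = {}  # postcode -> OS grid cell that contains it
for postcode in postcodes:
    for grid in osgrid:
        if pointInBoundingBox(postcode, grid):
            cells[postcode] = grid
            break  # grid cells do not overlap, so stop at the first hit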

I want to find out which OS Grid Cell each of the 1.8 million UK postcodes falls in

O(n * m), which is roughly O(n²) when n and m are of similar size

 

In the most naive solution, we must check all points against all polygons!

 

n = Number of Polygons

m = Number of Points
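A runnable sketch of the naive approach may help show where the n * m cost comes from. It assumes points are (x, y) tuples and grid cells are non-overlapping (min_x, min_y, max_x, max_y) bounding boxes with half-open edges. The names point_in_bounding_box and assign_points_naive are hypothetical stand-ins, and the data is made up.

```python
def point_in_bounding_box(point, box):
    """Return True if (x, y) lies inside (min_x, min_y, max_x, max_y)."""
    x, y = point
    min_x, min_y, max_x, max_y = box
    return min_x <= x < max_x and min_y <= y < max_y


def assign_points_naive(points, boxes):
    """Test every point against every box; worst case is m * n checks."""
    assignments = {}
    checks = 0
    for i, point in enumerate(points):
        for j, box in enumerate(boxes):
            checks += 1
            if point_in_bounding_box(point, box):
                assignments[i] = j
                break  # cells do not overlap, so the first hit is enough
    return assignments, checks


# A toy 2 x 2 grid of 10-unit cells and three points (made-up data).
boxes = [(0, 0, 10, 10), (10, 0, 20, 10), (0, 10, 10, 20), (10, 10, 20, 20)]
points = [(1, 1), (15, 15), (25, 25)]
assignments, checks = assign_points_naive(points, boxes)
print(assignments, checks)
```

A point that matches no cell still costs a full pass over every box, which is why the worst case stays at m * n checks.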

Many geo problems exhibit this kind of complexity

How do we handle complex problems + large datasets?

Pre-process Where Possible

Store/Use What's Necessary

Check The Algorithm

Write for Asymptotic Input

Profile, Benchmark, Optimise
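As a small illustration of benchmarking before optimising, the sketch below uses Python's standard timeit module to compare two ways of summing a list. The function total_with_loop and the data are made up for the example, and the timings will differ from machine to machine.

```python
import timeit


def total_with_loop(values):
    """Sum a list with an explicit Python loop."""
    total = 0
    for v in values:
        total += v
    return total


values = list(range(10_000))

# Run each candidate 100 times and record the total elapsed seconds.
loop_time = timeit.timeit(lambda: total_with_loop(values), number=100)
builtin_time = timeit.timeit(lambda: sum(values), number=100)

print(f"loop: {loop_time:.4f}s  built-in sum: {builtin_time:.4f}s")
```

Measuring first shows where the time actually goes, so effort is spent on the slow part rather than on a guess.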

Examine Your Options

Stay Lean, Try Scripting/CLI

Leverage Spatial Indexing

Spatial indexes are specific structures that allow us to search geo data quickly

(Quadtree Demo)
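The idea behind a spatial index can be sketched with a uniform grid, which is simpler than the quadtree in the demo but rests on the same principle: only look at data near the query. The GridIndex class below is a hypothetical illustration, not a library API, and the points are made up.

```python
from collections import defaultdict


class GridIndex:
    """Bucket points into fixed-size square cells keyed by (col, row)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _key(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, point):
        self.cells[self._key(*point)].append(point)

    def query(self, min_x, min_y, max_x, max_y):
        """Return points inside the box, visiting only overlapping cells."""
        col0, row0 = self._key(min_x, min_y)
        col1, row1 = self._key(max_x, max_y)
        found = []
        for col in range(col0, col1 + 1):
            for row in range(row0, row1 + 1):
                for x, y in self.cells.get((col, row), []):
                    if min_x <= x <= max_x and min_y <= y <= max_y:
                        found.append((x, y))
        return found


index = GridIndex(cell_size=10)
for p in [(1, 1), (15, 15), (25, 25), (12, 18)]:
    index.insert(p)

print(sorted(index.query(10, 10, 20, 20)))
```

The query touches only the cells that overlap the search box, so most of the stored points are never inspected. A quadtree applies the same idea but subdivides space adaptively where the data is dense.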

Will It Parallelize?

"The Intel® Pentium® 4 processor of 2004 was just a single core processor, but today we are talking about 8, 10, 12, or even 15 cores in a workstation, and these cores can execute instructions independent of each other."

Unfortunately, not all problems are parallelizable
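Counting points inside a box is one problem that does parallelize well, because each chunk of points can be processed independently and the partial counts simply added up. The sketch below uses Python's standard concurrent.futures module; the helper names and data are made up. It uses threads to stay portable, on the assumption that for CPU-bound work in CPython you would pass a ProcessPoolExecutor instead, which offers the same map method.

```python
from concurrent.futures import ThreadPoolExecutor


def count_in_box(points, box):
    """Count the points that fall inside (min_x, min_y, max_x, max_y)."""
    min_x, min_y, max_x, max_y = box
    return sum(1 for x, y in points if min_x <= x < max_x and min_y <= y < max_y)


def split(items, n_chunks):
    """Split a list into n_chunks roughly equal, independent slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_count(points, box, executor, n_chunks=4):
    """Count per chunk on the executor's workers, then sum the partial counts."""
    chunks = split(points, n_chunks)
    partials = executor.map(count_in_box, chunks, [box] * len(chunks))
    return sum(partials)


points = [(x, y) for x in range(100) for y in range(100)]
box = (0, 0, 50, 50)

with ThreadPoolExecutor(max_workers=4) as executor:
    total = parallel_count(points, box, executor)

print(total)
```

Problems where each step depends on the result of the previous one cannot be split this way, which is the limitation the slide refers to.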

Can I Distribute It?

Takeaways

  • The amount of data we may have to deal with is increasing
  • Many geographic problems are complex
  • Performance and algorithms matter
  • Try to figure out the best tool for the job
  • Don't be scared to script and program
  • We should embrace parallel processing and distribution

A special thank you to

Ed Boiling (Google)

Vlad Metodiev (OS)
