anthony.fox AT ccri.com


What is GeoMesa?


  • Distributed Spatio-Temporal Database built on Accumulo
  • Standard GeoTools DataStore API
  • Standard OGC access
  • GeoServer Plugins and WPS analytics
  • LocationTech Open Source


High Velocity Spatio-Temporal Data


Twitter (100-150k tweets/second)
Foursquare (1M checkins/day)
Geolocated clickstreams
Satellite Imagery
Near-real-time traffic sensors
FAA flight information

Distributed Databases

ID  SYMBOL  DATE                      CLOSE
1   MSFT    2014-05-20T00:00:00.000Z  39.64

Flexible Schema
Query planning pushed into Application layer
Presents challenges
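
To make "query planning pushed into the application layer" concrete, here is a minimal sketch of a sorted key-value table with range scans, loosely in the style of an Accumulo-like store. The `symbol|date` key layout, the `put`/`scan`/`row_key` helpers, and the in-memory list are all assumptions for illustration, not Accumulo's or GeoMesa's API. Only the MSFT row comes from the slide; the other row is hypothetical.

```python
import bisect

# Hypothetical in-memory stand-in for a sorted key-value table.
# Entries are kept in lexicographic key order, so a range scan is a slice.
table = []  # list of (key, value) tuples, sorted by key


def put(key, value):
    """Insert an entry while keeping the table sorted by key."""
    bisect.insort(table, (key, value))


def scan(start, end):
    """Return all entries with start <= key < end (a range scan)."""
    lo = bisect.bisect_left(table, (start,))
    hi = bisect.bisect_left(table, (end,))
    return table[lo:hi]


def row_key(symbol, date):
    """Application-chosen key layout (an assumption): symbol, then ISO date."""
    return f"{symbol}|{date}"


put(row_key("MSFT", "2014-05-20T00:00:00.000Z"), "39.64")  # row from the slide
put(row_key("XYZ", "2014-05-20T00:00:00.000Z"), "1.00")    # hypothetical row

# The application, not the database, turns "MSFT in May 2014" into a key range.
may_msft = scan("MSFT|2014-05", "MSFT|2014-06")
```

The store only knows how to scan sorted keys. Any query that does not map onto a contiguous key range needs a different key layout or a full scan, which is one of the challenges the slide alludes to.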

Distributing Data


Space Filling Curves
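
As an illustration of the idea, here is a minimal sketch of one common space-filling curve, the Z-order (Morton) curve, which maps a 2D position to a single sortable integer by interleaving bits. The slides do not say which curve or resolution is used, so the choice of Z-order, the 16-bit default, and the function names are assumptions.

```python
def interleave_bits(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one Z-order value."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return z


def z_index(lon, lat, bits=16):
    """Map a lon/lat pair (degrees) to a position on a Z-order curve."""
    cells = 1 << bits
    # Scale each coordinate to an integer cell in [0, cells - 1].
    x = min(int((lon + 180.0) / 360.0 * cells), cells - 1)
    y = min(int((lat + 90.0) / 180.0 * cells), cells - 1)
    return interleave_bits(x, y, bits)
```

Because the result is a single integer, it can serve as (part of) a row key in a sorted key-value store. Points that are close in space often, though not always, end up close on the curve.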


Query Planning


Analytics

Interpolated line select
Density computations
WPS
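
For a sense of what a density computation involves, here is a minimal sketch that counts points per grid cell. The slides do not describe how the density process is implemented, so this plain grid histogram, the `density_grid` name, and the degree-based cell size are assumptions.

```python
from collections import Counter


def density_grid(points, cell_size_deg):
    """Count (lon, lat) points per square grid cell of cell_size_deg degrees."""
    counts = Counter()
    for lon, lat in points:
        col = int((lon + 180.0) // cell_size_deg)  # column index from the west edge
        row = int((lat + 90.0) // cell_size_deg)   # row index from the south edge
        counts[(col, row)] += 1
    return counts


grid = density_grid([(0.1, 0.1), (0.2, 0.3), (10.5, 10.5)], cell_size_deg=1.0)
```

Each cell count can be computed independently per partition and summed afterwards, which is what makes this kind of analytic a natural fit for a distributed store.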

Near-Real-Time Architecture

Challenges

Geospatial Joins
 select coffee_shop.name, tweets.handle
 from coffee_shop, tweets
 where dwithin(tweets.location, coffee_shop.location, 500 meters)
- joining on geometries from two tables
- relational databases leverage shared memory and bitset intersections
- no such luck in distributed databases
- goal: compute result while limiting network overhead


- Naive algorithm: buffer all coffee shop locations, generate an index query for each buffered polygon, run each query against the tweets table
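
The naive algorithm can be sketched as follows. This is an assumption-laden illustration, not GeoMesa code: the buffer is approximated by a bounding box, `query_tweets` is a hypothetical stand-in for an index query against the tweets table, and records are plain dicts with `lon`, `lat`, `name`, and `handle` fields.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def buffer_bbox(lon, lat, radius_m):
    """Approximate bounding box around the buffer (small radii, away from poles)."""
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    dlon = dlat / math.cos(math.radians(lat))
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)


def query_tweets(tweets, bbox):
    """Hypothetical stand-in for an index query against the tweets table."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return [t for t in tweets
            if min_lon <= t["lon"] <= max_lon and min_lat <= t["lat"] <= max_lat]


def naive_join(coffee_shops, tweets, radius_m=500.0):
    """One query per coffee shop, then an exact distance check on candidates."""
    results = []
    for shop in coffee_shops:
        bbox = buffer_bbox(shop["lon"], shop["lat"], radius_m)
        for t in query_tweets(tweets, bbox):
            if haversine_m(shop["lon"], shop["lat"], t["lon"], t["lat"]) <= radius_m:
                results.append((shop["name"], t["handle"]))
    return results


shops = [{"name": "Cafe A", "lon": -78.48, "lat": 38.03}]
tweets = [
    {"handle": "@near", "lon": -78.481, "lat": 38.031},
    {"handle": "@far", "lon": -78.0, "lat": 38.5},
]
pairs = naive_join(shops, tweets)
```

The cost is one query per coffee shop, so network round trips grow with the size of the smaller table. That is the overhead the goal above aims to limit.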