anthony.fox AT ccri.com


What is GeoMesa?


  • Distributed Spatio-Temporal Database
  • Built on Accumulo 
  • Standard Geotools DataStore API
  • Standard OGC access
  • Geoserver Plugins and WPS analytics
  • LocationTech Open Source


High Velocity Spatio-Temporal Data


  • Twitter 100-150k tweets/second
  • Foursquare 1M checkins/day
  • Geolocated clickstreams
  • Satellite imagery
  • Near-real time traffic sensors
  • FAA flight information

Distributed Databases

  • Very flexible schemas but not schema-less
  • Horizontally scalable
  • Query planning pushed to application layer
  • Implicit lexicographic index on keys
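
The implicit lexicographic index can be illustrated with a minimal sketch: if rows are kept sorted by key, any key prefix or key interval becomes a contiguous range scan. The row keys, values, and function names below are hypothetical and only stand in for a sorted key-value store; this is not the Accumulo API.

```python
from bisect import bisect_left

# Hypothetical row keys (spatial prefix ~ date); a sorted key-value store
# keeps rows in lexicographic key order.
rows = sorted([
    ("dr5ru~2014-06-01", "tweet-a"),
    ("dr5rv~2014-06-01", "tweet-b"),
    ("dr5x1~2014-06-02", "tweet-c"),
    ("9q8yy~2014-06-01", "tweet-d"),
])
keys = [k for k, _ in rows]

def range_scan(start, end):
    """Return values whose key k satisfies start <= k < end."""
    lo = bisect_left(keys, start)
    hi = bisect_left(keys, end)
    return [v for _, v in rows[lo:hi]]

def prefix_scan(prefix):
    """A prefix scan is a range scan over [prefix, prefix + highest char)."""
    return range_scan(prefix, prefix + "\uffff")
```

Because the only index is the sort order of the keys, the application has to encode whatever it wants to query by (here, a location prefix followed by a date) into the key itself.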

Distributed Databases

  • Design tradeoffs

Distributing Data

Space Filling Curves
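
As an illustration, here is a minimal sketch of one common space-filling curve, the Z-order (Morton) curve, which maps a two-dimensional grid cell to a single sortable number by interleaving bits. The slides do not state which curve or resolution is used, so the choice of Z-order, the 16-bit default, and the function names are assumptions made for this example.

```python
def interleave(x, y, bits=16):
    """Interleave the bits of x and y into one Z-order (Morton) value."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return z

def z_index(lon, lat, bits=16):
    """Map lon/lat onto a 2^bits x 2^bits grid, then interleave the cell."""
    cells = 1 << bits
    # Clamp so that lon=180 or lat=90 falls into the last cell.
    x = min(int((lon + 180.0) / 360.0 * cells), cells - 1)
    y = min(int((lat + 90.0) / 180.0 * cells), cells - 1)
    return interleave(x, y, bits)
```

Points that are close on the curve are close in space, so storing the curve value as a key prefix lets a lexicographically sorted store act as a coarse spatial index.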


Query Planning
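
One way to picture query planning over a space-filling curve is as turning a bounding box into a small set of key ranges to scan. The sketch below assumes a Z-order curve on a tiny grid and simply enumerates the covered cells; the function names and the brute-force approach are assumptions for illustration, and a real planner would more likely subdivide the space recursively than enumerate cells.

```python
def interleave(x, y, bits):
    """Interleave the bits of x and y into one Z-order value."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def plan_ranges(xmin, ymin, xmax, ymax, bits=3):
    """Return inclusive (start, end) Z-value ranges covering a cell-aligned box."""
    zs = sorted(interleave(x, y, bits)
                for x in range(xmin, xmax + 1)
                for y in range(ymin, ymax + 1))
    ranges = []
    for z in zs:
        if ranges and z == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], z)   # extend the current run
        else:
            ranges.append((z, z))             # start a new run
    return ranges
```

A box aligned with the curve collapses into a single range, for example `plan_ranges(0, 0, 1, 1)` gives `[(0, 3)]`, while a box that straddles a curve boundary, such as `plan_ranges(1, 0, 2, 1)`, splits into several ranges. The planner's job is to trade the number of ranges against the number of extra rows scanned.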


Analytics

  • Interpolated line select
  • Density computations
  • WPS

Near-Real Time Architecture

Challenges

Geospatial Joins
 select coffee_shop.name, tweets.handle
 from coffee_shop, tweets
 where dwithin(tweets.location, coffee_shop.location, 500 meters)
- joining on geometries from two tables
- relational databases leverage shared memory and bitset intersections
- no such luck in distributed databases
- goal: compute result while limiting network overhead


- naive algorithm: buffer each coffee shop location by 500 meters, generate an index query for each resulting polygon, run each query against the tweets table
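
The naive algorithm can be sketched in a few lines, under stated assumptions: points are (name, lon, lat) tuples, the buffer is approximated by its bounding box, and `query_tweets` is a hypothetical stand-in for an index query against the tweets table (here it just filters an in-memory list). None of these names come from GeoMesa.

```python
import math

EARTH_RADIUS_M = 6371000.0

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def buffer_bbox(lon, lat, radius_m):
    """Approximate bounding box of a circular buffer (not valid near the poles)."""
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    dlon = dlat / math.cos(math.radians(lat))
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)

def query_tweets(tweets, bbox):
    """Hypothetical stand-in for one index query against the tweets table."""
    xmin, ymin, xmax, ymax = bbox
    return [t for t in tweets if xmin <= t[1] <= xmax and ymin <= t[2] <= ymax]

def naive_join(coffee_shops, tweets, radius_m=500.0):
    """One buffered query per coffee shop, then an exact distance check."""
    results = []
    for name, lon, lat in coffee_shops:
        candidates = query_tweets(tweets, buffer_bbox(lon, lat, radius_m))
        for handle, tlon, tlat in candidates:
            if haversine_m(lon, lat, tlon, tlat) <= radius_m:
                results.append((name, handle))
    return results

# Made-up sample data for illustration only.
shops = [("Cafe A", -78.4767, 38.0293)]
tweets = [("@near", -78.4780, 38.0300), ("@far", -78.4000, 38.0300)]
matches = naive_join(shops, tweets)
```

The sketch issues one query per coffee shop, so in a distributed setting the number of round trips grows with the size of the smaller table, which is the network overhead the goal above is trying to limit.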