anthony.fox AT ccri.com


What is GeoMesa?
- Distributed Spatio-Temporal Database
- Built on Accumulo
- Standard GeoTools DataStore API
- Standard OGC access
- GeoServer plugins and WPS analytics
- LocationTech Open Source
High Velocity Spatio-Temporal Data
- Twitter 100-150k tweets/second
- Foursquare 1M checkins/day
- Geolocated clickstreams
- Satellite imagery
- Near-real time traffic sensors
- FAA flight information
Distributed Databases
- Very flexible schemas but not schema-less
- Horizontally scalable
- Query planning pushed to application layer
- Implicit lexicographic index on keys
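As a minimal sketch of what an implicit lexicographic index buys you: rows are kept sorted by key, so any query that can be phrased as a key range becomes a binary search followed by a sequential scan. The Python below simulates this with a sorted list. The key layout and the `range_scan` helper are hypothetical and are not Accumulo's API.

```python
from bisect import bisect_left

# Hypothetical key space, standing in for a table whose rows are
# stored in lexicographic key order.
rows = sorted([
    ("tweet#2014-01-01#a", "v1"),
    ("tweet#2014-01-02#b", "v2"),
    ("tweet#2014-02-01#c", "v3"),
    ("checkin#2014-01-01#d", "v4"),
])
keys = [k for k, _ in rows]

def range_scan(start, end):
    """Return all (key, value) pairs with start <= key < end."""
    lo = bisect_left(keys, start)  # first key >= start
    hi = bisect_left(keys, end)    # first key >= end
    return rows[lo:hi]

# All tweets from January 2014, found without touching other rows.
january_tweets = range_scan("tweet#2014-01", "tweet#2014-02")
```

The design consequence is that the application has to encode whatever it wants to query by (here, type and date) into the key itself.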

Distributed Databases
- Design tradeoffs
Distributing Data
Space Filling Curves
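The figures for these slides did not survive, so here is a small sketch of one common space-filling curve, the Z-order (Morton) curve, which maps a two-dimensional position to a single sortable number by interleaving bits. Whether this is the exact curve shown on the original slides is an assumption, and the function names `interleave` and `z_key` are hypothetical.

```python
def interleave(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one Z-order value."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return z

def z_key(lon, lat, bits=16):
    """Quantize lon/lat onto a 2^bits x 2^bits grid and return its Z value."""
    cells = 1 << bits
    # Clamp so that lon == 180 or lat == 90 falls into the last cell.
    x = min(int((lon + 180.0) / 360.0 * cells), cells - 1)
    y = min(int((lat + 90.0) / 180.0 * cells), cells - 1)
    return interleave(x, y, bits)
```

Because the resulting value is a plain integer, it can be used as (part of) a row key, so that the lexicographic ordering of keys doubles as a coarse spatial ordering.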

Query Planning
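One way query planning over a space-filling-curve index can work is to turn a bounding box into a small set of contiguous key ranges and scan only those. The sketch below does this by brute force on a tiny grid using a Z-order curve. It illustrates the general idea only and is not a description of GeoMesa's actual planner; `bbox_to_ranges` and its cell-coordinate arguments are hypothetical.

```python
def interleave(x, y, bits=4):
    """Z-order value of grid cell (x, y): x in even bits, y in odd bits."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def bbox_to_ranges(xmin, ymin, xmax, ymax, bits=4):
    """Cover an inclusive, cell-aligned bounding box with contiguous Z ranges."""
    zs = sorted(interleave(x, y, bits)
                for x in range(xmin, xmax + 1)
                for y in range(ymin, ymax + 1))
    ranges = []
    for z in zs:
        if ranges and z == ranges[-1][1] + 1:
            ranges[-1][1] = z      # extend the current range
        else:
            ranges.append([z, z])  # start a new range
    return [tuple(r) for r in ranges]
```

A box aligned with the curve collapses to a single range, while a misaligned box of the same size needs several, which is why the number of ranges (and therefore scans) depends on how the query lines up with the curve.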

Analytics
Interpolated line select
Density computations
WPS
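To make the density computation concrete, here is a minimal sketch that bins points into a regular grid and counts them per cell, which is the basic ingredient of a heatmap. It is an illustration only and does not reproduce GeoMesa's WPS process; the `density_grid` function and its parameters are hypothetical.

```python
from collections import Counter

def density_grid(points, bbox, cols, rows):
    """Count (lon, lat) points per cell of a cols x rows grid laid over bbox."""
    xmin, ymin, xmax, ymax = bbox
    counts = Counter()
    for lon, lat in points:
        if not (xmin <= lon <= xmax and ymin <= lat <= ymax):
            continue  # ignore points outside the requested extent
        # Clamp so points on the upper edges fall into the last cell.
        col = min(int((lon - xmin) / (xmax - xmin) * cols), cols - 1)
        row = min(int((lat - ymin) / (ymax - ymin) * rows), rows - 1)
        counts[(col, row)] += 1
    return counts

grid = density_grid(
    [(0.1, 0.1), (0.2, 0.2), (0.9, 0.9), (5.0, 5.0)],
    bbox=(0.0, 0.0, 1.0, 1.0), cols=2, rows=2,
)
```

In a distributed setting, per-cell counts like these can be computed close to the data and merged afterwards, so only the small grid crosses the network rather than the raw points.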
Near-Real Time Architecture
Challenges
Geospatial Joins
select coffee_shop.name, tweets.handle
from coffee_shop, tweets
where dwithin(tweets.location, coffee_shop.location, 500 meters)
- Relational databases leverage shared memory and bitset intersections
- No such luck in distributed databases
- Goal: compute the result while limiting network overhead
- Naive algorithm: buffer each coffee shop location into a polygon, generate an index query for each polygon, and run each query against the tweets table
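The naive algorithm can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions: the buffer is approximated by a bounding box rather than a true polygon, `query_tweets` is a hypothetical stand-in for one index query against the tweets table, and the dictionaries stand in for table rows.

```python
import math

EARTH_RADIUS_M = 6_371_000.0

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def buffer_bbox(lon, lat, radius_m):
    """Approximate a circular buffer by its bounding box in degrees.

    Simplification: not valid at the poles or across the antimeridian.
    """
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    dlon = dlat / math.cos(math.radians(lat))
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)

def query_tweets(tweets, bbox):
    """Hypothetical stand-in for one index query against the tweets table."""
    xmin, ymin, xmax, ymax = bbox
    return [t for t in tweets
            if xmin <= t["lon"] <= xmax and ymin <= t["lat"] <= ymax]

def naive_join(coffee_shops, tweets, radius_m=500.0):
    """Buffer every shop, issue one query per buffer, then refine by distance."""
    results = []
    for shop in coffee_shops:  # one query round trip per coffee shop
        bbox = buffer_bbox(shop["lon"], shop["lat"], radius_m)
        for t in query_tweets(tweets, bbox):
            d = haversine_m(shop["lon"], shop["lat"], t["lon"], t["lat"])
            if d <= radius_m:
                results.append((shop["name"], t["handle"]))
    return results

shops = [{"name": "shop_a", "lon": 0.0, "lat": 0.0}]
tweets = [
    {"handle": "@near", "lon": 0.001, "lat": 0.0},
    {"handle": "@corner", "lon": 0.004, "lat": 0.004},
    {"handle": "@far", "lon": 0.1, "lat": 0.0},
]
pairs = naive_join(shops, tweets)
```

The cost is visible in the outer loop: the number of queries grows with the number of coffee shops, and each one is a separate trip to the tweets table, which is exactly the network overhead the goal above tries to limit.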
Copy of GeoMesa
By mattccri
