anthony.fox AT ccri.com


What is GeoMesa?
- Distributed Spatio-Temporal Database
- Built on Accumulo
- Standard GeoTools DataStore API
- Standard OGC access
- GeoServer Plugins and WPS analytics
- LocationTech Open Source
High Velocity Spatio-Temporal Data
- Twitter 100-150k tweets/second
- Foursquare 1M checkins/day
- Geolocated clickstreams
- Satellite imagery
- Near-real time traffic sensors
- FAA flight information
Distributed Databases
- Very flexible schemas but not schema-less
- Horizontally scalable
- Query planning pushed to application layer
- Implicit lexicographic index on keys
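Because the only built-in index is the lexicographic sort order of the keys, numeric key components must be encoded so that byte order matches numeric order. A minimal illustration, assuming keys are plain strings; `encode_key` is a hypothetical helper, not a GeoMesa or Accumulo API:

```python
# Keys in a sorted key-value store compare character by character, not numerically.
raw = ["9", "10", "100"]
print(sorted(raw))  # ['10', '100', '9']

def encode_key(n: int, width: int = 10) -> str:
    """Hypothetical helper: zero-pad a non-negative integer so that
    lexicographic order of the strings matches numeric order."""
    if n < 0 or n >= 10 ** width:
        raise ValueError("value out of range for fixed-width encoding")
    return str(n).zfill(width)

padded = [encode_key(n) for n in (9, 10, 100)]
print(sorted(padded))  # ['0000000009', '0000000010', '0000000100']
```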

Leveraging Distributed Databases
- Partitioning - distribute queries across multiple machines
- Striping - distribute computations across multiple machines
- Custom traversal using server side iterators
- Custom analytics embedded in server side iterators
- Ad-hoc interactive MapReduce
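Partitioning and striping can be sketched by prefixing each row key with a shard number: records spread across machines, and one logical range scan fans out into one physical scan per shard that can run in parallel. This is a sketch under assumptions; the shard count, key layout, and helper names are hypothetical and not GeoMesa's actual key schema.

```python
import zlib

NUM_SHARDS = 4  # assumed stripe count, chosen for illustration

def shard_of(feature_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Hypothetical helper: stable hash (CRC32) of the id modulo the shard count.
    CRC32 is used instead of hash() because it is stable across processes."""
    return zlib.crc32(feature_id.encode("utf-8")) % num_shards

def row_key(feature_id: str, index_value: str) -> str:
    """Hypothetical key layout: shard prefix, then index value, then feature id."""
    return f"{shard_of(feature_id):02d}_{index_value}_{feature_id}"

def scan_ranges(start: str, end: str, num_shards: int = NUM_SHARDS):
    """Expand one logical [start, end) range into one range per shard,
    so each stripe can be scanned in parallel."""
    return [(f"{s:02d}_{start}", f"{s:02d}_{end}") for s in range(num_shards)]

keys = sorted(row_key(f"tweet-{i}", "0042") for i in range(8))
ranges = scan_ranges("0040", "0050")
print(ranges[0])  # ('00_0040', '00_0050')
```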
Distributing Data
Space Filling Curves

Linearize the Keyspace
Space Filling Curves
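One common way to linearize a two-dimensional keyspace is a Z-order (Morton) curve: quantize longitude and latitude to integer cells and interleave their bits, so nearby points tend to share long key prefixes. The sketch below assumes a Z-order curve with a fixed number of bits per dimension; the curve and precision GeoMesa actually uses may differ.

```python
def quantize(value: float, lo: float, hi: float, bits: int) -> int:
    """Map a value in [lo, hi] onto an integer cell index in [0, 2**bits - 1]."""
    cells = 1 << bits
    idx = int((value - lo) / (hi - lo) * cells)
    return min(max(idx, 0), cells - 1)

def interleave(x: int, y: int, bits: int) -> int:
    """Interleave bits: x goes to even bit positions, y to odd bit positions."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def z_index(lon: float, lat: float, bits: int = 16) -> int:
    """Linearize a lon/lat point into a single integer key (assumed precision)."""
    x = quantize(lon, -180.0, 180.0, bits)
    y = quantize(lat, -90.0, 90.0, bits)
    return interleave(x, y, bits)
```

The resulting integer can then be zero-padded or written as fixed-width bytes so that the store's lexicographic key order follows the curve.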

Query Planning

Grid from SFC

Stripe

Grid from SFC
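Query planning with a space-filling curve turns a bounding box over the grid into a small set of contiguous key ranges to scan. The brute-force sketch below enumerates every grid cell in the box, computes its Z-order value, and merges consecutive values into ranges; the function names are hypothetical, and a production planner would typically recurse over quadrants instead of enumerating cells.

```python
def interleave(x: int, y: int, bits: int) -> int:
    """Z-order (Morton) code: x bits on even positions, y bits on odd positions."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def plan_ranges(x0: int, y0: int, x1: int, y1: int, bits: int):
    """Return inclusive (start, end) key ranges covering grid cells
    [x0, x1] x [y0, y1]: enumerate cells, sort their codes, merge runs."""
    zs = sorted(interleave(x, y, bits)
                for x in range(x0, x1 + 1)
                for y in range(y0, y1 + 1))
    ranges = []
    for z in zs:
        if ranges and z == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], z)  # extend the current run
        else:
            ranges.append((z, z))            # start a new range
    return ranges

print(plan_ranges(0, 0, 1, 1, 2))  # [(0, 3)]
print(plan_ranges(1, 0, 2, 0, 2))  # [(1, 1), (4, 4)]
```

The second call shows why planning matters: two adjacent cells can sit in different parts of the curve, so one box can become several scans.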
Analytics
- Web Processing Services
- Deployed in GeoServer
- Discoverable via GetCapabilities
- Computed in parallel via server side iterators
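Discoverability works through the standard OGC GetCapabilities request, which lists the WPS processes a server exposes. A small sketch that builds the key-value-pair request URL; the host and the `/geoserver/ows` endpoint path are assumptions for a local install, and `wps_get_capabilities_url` is a hypothetical helper:

```python
from urllib.parse import urlencode

def wps_get_capabilities_url(base_url: str) -> str:
    """Hypothetical helper: build an OGC WPS GetCapabilities KVP request URL."""
    params = {"service": "WPS", "request": "GetCapabilities"}
    return f"{base_url}?{urlencode(params)}"

# Assumed local GeoServer endpoint, for illustration only.
url = wps_get_capabilities_url("http://localhost:8080/geoserver/ows")
print(url)  # http://localhost:8080/geoserver/ows?service=WPS&request=GetCapabilities
```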
Parallel Associative Computations
- Any computation whose partial results combine through an associative operation can be parallelized
- Compute sparse density matrix per partition/stripe
- Reduce via associative operation in client (GeoServer)
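A minimal sketch of the density pattern, assuming points arrive as planar (x, y) pairs already split into partitions and a hypothetical cell size. Each partition builds a sparse density matrix as a `Counter` keyed by grid cell, and the client merges them by element-wise addition. Because partial results may come back from the stripes in any order, the combine step here is commutative as well as associative.

```python
from collections import Counter
from functools import reduce

def density_partition(points, cell_size):
    """Per-partition step: sparse density matrix as a Counter keyed by (col, row)."""
    grid = Counter()
    for x, y in points:
        grid[(int(x // cell_size), int(y // cell_size))] += 1
    return grid

def merge(a, b):
    """Client-side combine: element-wise sum of two sparse matrices."""
    return a + b

# Hypothetical partitions standing in for per-stripe scan results.
partitions = [[(0.5, 0.5), (1.5, 0.2)], [(0.1, 0.9)], [(1.9, 1.9)]]
partials = [density_partition(p, 1.0) for p in partitions]
total = reduce(merge, partials, Counter())
print(total[(0, 0)])  # 2
```

Merging in any order, or computing over all points at once, yields the same matrix, which is what lets the work run in server-side iterators.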

KNN
1. Fix an initial bbox
2. While results.size == 0:
   - Compute kNN per stripe/partition
   - If results.size == 0, expand the bbox and take the symmetric difference with the union of previous bounds
3. Reduce client side
- Brute force map-reduce
- Faster than best-first search using R-trees in most cases
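The expanding-box search can be sketched as follows, assuming planar points split into in-memory stripes, a square box centered on the query point, and a doubling growth factor (all assumptions). Each round searches only the ring between the previous box and the expanded one, which is the symmetric difference with the previous bounds, and the client reduces the per-stripe candidates. One deliberate change from the slide: instead of stopping at the first non-empty result, this sketch stops once it has k results and the k-th lies within the searched box, since a point just outside the first box can be nearer than one in its corner.

```python
import heapq
import math

def knn_in_ring(stripe, q, k, inner, outer):
    """Map step: a stripe's k nearest points whose Chebyshev distance
    from q lies in (inner, outer], i.e. inside the new box but not the old one."""
    ring = [p for p in stripe
            if inner < max(abs(p[0] - q[0]), abs(p[1] - q[1])) <= outer]
    return heapq.nsmallest(k, ring, key=lambda p: math.dist(p, q))

def knn(stripes, q, k, r0=1.0, max_r=1e6):
    """Expanding-box kNN over partitioned data (k >= 1); returns nearest first."""
    inner, outer = -1.0, r0  # inner < 0 so the first box includes q itself
    found = []
    while True:
        for stripe in stripes:
            found.extend(knn_in_ring(stripe, q, k, inner, outer))
        # Reduce client side: keep the global k nearest.
        found = heapq.nsmallest(k, found, key=lambda p: math.dist(p, q))
        # Any unsearched point is farther than `outer`, so this result is exact.
        if len(found) == k and math.dist(found[-1], q) <= outer:
            return found
        if outer >= max_r:
            return found  # fewer than k points exist within max_r
        inner, outer = outer, outer * 2

stripes = [[(0, 0), (5, 5)], [(1, 1), (2, 2)], [(10, 10)]]
print(knn(stripes, (0.2, 0.1), 2))  # [(0, 0), (1, 1)]
```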
Analytics: Interpolated Time/Space Query
- Find features that intersect in both space and time
- Deployed in GeoServer as a WPS process
- Use path-prediction and gap-filling for sparse datasets
- Examples:
  - Find other people traveling the same route at the same time as you (who was stuck in that traffic jam?)
  - Find people you may have interacted with on a recent road trip from DC to NYC
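The space-and-time intersection can be sketched with linear interpolation as a simple stand-in for the gap-filling step: sample both tracks at a common time step over their overlapping span and report the times when they are within a distance threshold. Linear interpolation, planar coordinates, and the names `position_at` and `encounters` are assumptions for illustration; the actual path-prediction used by the WPS process may be more sophisticated.

```python
import math
from bisect import bisect_right

def position_at(track, t):
    """Linearly interpolate (x, y) at time t from a time-sorted list of
    (t, x, y) samples; returns None outside the track's time span."""
    times = [s[0] for s in track]
    if t < times[0] or t > times[-1]:
        return None
    i = bisect_right(times, t)  # first sample strictly after t
    if i == len(times):
        return track[-1][1:]
    (t0, x0, y0), (t1, x1, y1) = track[i - 1], track[i]
    f = (t - t0) / (t1 - t0)
    return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))

def encounters(a, b, max_dist, step):
    """Times, on a common grid over the tracks' overlap, at which the
    interpolated positions are within max_dist of each other."""
    start, end = max(a[0][0], b[0][0]), min(a[-1][0], b[-1][0])
    n = int((end - start) // step)  # negative when the tracks do not overlap
    hits = []
    for j in range(n + 1):
        t = start + j * step
        if math.dist(position_at(a, t), position_at(b, t)) <= max_dist:
            hits.append(t)
    return hits

# Two hypothetical tracks moving in opposite directions along parallel lanes.
a = [(0, 0.0, 0.0), (10, 10.0, 0.0)]
b = [(0, 10.0, 1.0), (10, 0.0, 1.0)]
print(encounters(a, b, 1.5, 1))  # [5]
```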
Tweeting the NJ Turnpike

Possible Interactions
Larger Buffer

Roadmap
- 1.0 in June
- Full integration with GeoServer and Accumulo security models
- PKI-based authentication and authorization
- Avro binary encoding
- Relational projections
- Secondary indexes
- 1.x in the Fall
- Deep integration between Accumulo/GeoMesa and the Hadoop ecosystem
- Suite of WPS executed across distributed compute resources
- On-demand parallel tile rendering and caching
- Data and query statistics
- Optimized query planning
Links
https://www.locationtech.org/projects/technology.geomesa
http://geomesa.org
Mailing Lists
Some Challenges
- Geospatial (Self-) Joins
select coffee_shop.name, tweet.handle
from coffee_shop, tweets as tweet
where dwithin(tweet.location, coffee_shop.location, 500 meters)
- Joining on geometries from two tables
- Relational databases leverage shared memory and bitset intersections
- Compute result while limiting network overhead
- Candidate solutions:
  - Lift data out of Accumulo into a pre-existing Spark or Storm topology
  - Discretize space and group by bins
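The "discretize space and group by bins" idea can be sketched as a grid-binned distance join, assuming planar coordinates in meters and hypothetical names throughout. With a cell size equal to the join distance d, any match for a tweet must lie in the tweet's own cell or one of its eight neighbors, so only those bins are compared exactly instead of every pair.

```python
import math
from collections import defaultdict

def cell(x, y, size):
    """Grid cell containing (x, y) for square cells of side `size`."""
    return (math.floor(x / size), math.floor(y / size))

def binned_dwithin_join(shops, tweets, d):
    """Return (shop_name, tweet_handle) pairs whose planar distance is <= d.
    shops and tweets are iterables of (name, x, y), coordinates in meters."""
    bins = defaultdict(list)
    for name, x, y in shops:
        bins[cell(x, y, d)].append((name, x, y))
    pairs = []
    for handle, x, y in tweets:
        cx, cy = cell(x, y, d)
        # With cell side d, a match can only be in this cell or an adjacent one.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for name, sx, sy in bins.get((cx + dx, cy + dy), ()):
                    if math.hypot(x - sx, y - sy) <= d:
                        pairs.append((name, handle))
    return pairs

shops = [("Blue Bottle", 0.0, 0.0), ("Peets", 2000.0, 0.0)]
tweets = [("@a", 300.0, 400.0), ("@b", 1800.0, 0.0), ("@c", 5000.0, 5000.0)]
print(binned_dwithin_join(shops, tweets, 500.0))  # [('Blue Bottle', '@a'), ('Peets', '@b')]
```

In a Spark or Storm topology the cell id would play the role of the grouping key, with one side replicated to neighboring cells so each group can be joined locally.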
Copy of Geomesa w/Tube
By anthonyccri
