anthony.fox AT ccri.com


What is GeoMesa?



  • Distributed Spatio-Temporal Database
  • Built on Accumulo
  • Standard Geotools DataStore API
  • Standard OGC access
  • Geoserver Plugins and WPS analytics
  • LocationTech Open Source


High Velocity Spatio-Temporal Data


  • Twitter 100-150k tweets/second
  • Foursquare 1M checkins/day
  • Geolocated clickstreams
  • Satellite imagery
  • Near-real time traffic sensors
  • FAA flight information

Distributed Databases

  • Very flexible schemas but not schema-less
  • Horizontally scalable
  • Query planning pushed to application layer
  • Implicit lexicographic index on keys
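
The "implicit lexicographic index" point can be illustrated with a toy sorted key-value store. This is a minimal sketch, not the Accumulo API: the class `SortedKeyStore`, its methods, and the `dc~`/`nyc~` key layout are all hypothetical names chosen for the example. It shows how a prefix query becomes a contiguous range scan once keys are kept in sorted order.

```python
import bisect


class SortedKeyStore:
    """Toy stand-in for a sorted key-value store (hypothetical, not Accumulo)."""

    def __init__(self):
        self._keys = []      # always kept in lexicographic order
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start, end):
        """Yield (key, value) pairs with start <= key < end."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


store = SortedKeyStore()
for k in ["dc~0003", "nyc~0001", "dc~0001", "nyc~0002"]:
    store.put(k, k.upper())

# A prefix query is just a range scan over adjacent keys.
dc_rows = list(store.scan("dc~", "dc~\uffff"))
```

Because the store has no other index, the application decides how to lay out keys so that the rows a query needs end up next to each other.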

Leveraging Distributed Databases

  • Partitioning - distribute queries across multiple machines
  • Striping - distribute computations across multiple machines
  • Custom traversal using server side iterators
  • Custom analytics embedded in server side iterators
  • Ad-hoc interactive MapReduce
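
The partitioning idea above can be sketched as a fan-out query over shards. This is an illustration under stated assumptions: the shard count, the CRC-based `shard_of` function, and the in-memory lists standing in for tablet servers are hypothetical, and threads stand in for separate machines.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4  # hypothetical shard count


def shard_of(feature_id):
    """Assign a feature to a shard by hashing its id."""
    return zlib.crc32(feature_id.encode("utf-8")) % NUM_SHARDS


def build_partitions(features):
    """features: iterable of (id, x, y) tuples."""
    partitions = [[] for _ in range(NUM_SHARDS)]
    for feature in features:
        partitions[shard_of(feature[0])].append(feature)
    return partitions


def scan_partition(partition, bbox):
    """Filter one partition; in a real system this runs next to the data."""
    xmin, ymin, xmax, ymax = bbox
    return [f for f in partition
            if xmin <= f[1] <= xmax and ymin <= f[2] <= ymax]


def parallel_query(partitions, bbox):
    """Send the same bbox to every partition and merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        chunks = list(pool.map(scan_partition, partitions,
                               [bbox] * len(partitions)))
    return sorted(f for chunk in chunks for f in chunk)
```

Hashing spreads neighbouring features across shards, so a single hot region does not land on one machine, at the cost of every query touching every shard.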

Distributing Data

 


Space Filling Curves


Linearize the Keyspace
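
One way to linearize a two-dimensional keyspace is a Z-order (Morton) curve, which interleaves the bits of the column and row numbers. The slides do not say which curve is used, so treat the choice of Z-order, the function names, and the 16-bit resolution below as assumptions made for illustration.

```python
def interleave(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one Z-order value.

    Bit i of x goes to position 2*i, bit i of y to position 2*i + 1.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_index(lon, lat, bits=16):
    """Map a lon/lat pair to a cell on a 2^bits x 2^bits grid, then to a Z value."""
    cells = 1 << bits
    col = min(int((lon + 180.0) / 360.0 * cells), cells - 1)
    row = min(int((lat + 90.0) / 180.0 * cells), cells - 1)
    return interleave(col, row, bits)
```

Points that are close in space tend to share a long prefix of their Z value, so they sort near each other in a lexicographically ordered store.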



Query Planning


Grid from SFC
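
With a space-filling curve in place, planning a bounding-box query amounts to finding the curve ranges that cover the box. The sketch below assumes a Z-order curve on a small grid and enumerates cells by brute force, which is fine for illustration but not how a production planner would do it. All names are hypothetical.

```python
def interleave(x, y, bits):
    """Z-order value of grid cell (x, y): x bits at even positions, y at odd."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def covering_ranges(col_min, row_min, col_max, row_max, bits=3):
    """Return inclusive (start, end) Z ranges covering a box of grid cells."""
    zs = sorted(interleave(c, r, bits)
                for c in range(col_min, col_max + 1)
                for r in range(row_min, row_max + 1))
    ranges = []
    for z in zs:
        if ranges and z == ranges[-1][1] + 1:
            ranges[-1][1] = z          # extend the current run
        else:
            ranges.append([z, z])      # start a new run
    return [tuple(r) for r in ranges]
```

A box aligned with the curve, such as cells (0,0) to (1,1), collapses to a single range, while a box that straddles a curve boundary splits into several. Each range becomes one scan against the sorted keys.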


Stripe




Analytics

  • Web Processing Services
  • Deployed in Geoserver
  • Discoverable via GetCapabilities
  • Computed in parallel via server side iterators

Interpolated SpaceTime Query

Select friends that I could have interacted with during my recent trip from Washington DC to New York City
  • Interpolated in space (along trip axis) and time.
  • Multiple overlapping queries with deduplication
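
The interpolated query can be sketched as three steps: sample the trip in space and time, build one space-time window per sample, and deduplicate matches across the overlapping windows. This is a minimal in-memory illustration with planar coordinates and numeric timestamps. The function names, the square window shape, and linear interpolation are assumptions, not details taken from the slides.

```python
def interpolate_trip(start, end, steps):
    """start/end: (x, y, t). Return steps + 1 evenly spaced (x, y, t) samples."""
    return [tuple(s + (e - s) * i / steps for s, e in zip(start, end))
            for i in range(steps + 1)]


def query_windows(samples, radius, half_window):
    """Build one (bbox, time interval) query per interpolated sample."""
    return [((x - radius, y - radius, x + radius, y + radius),
             (t - half_window, t + half_window))
            for x, y, t in samples]


def run_queries(features, windows):
    """features: iterable of (fid, x, y, t). Matches are deduplicated by fid."""
    seen = {}
    for (xmin, ymin, xmax, ymax), (tmin, tmax) in windows:
        for fid, x, y, t in features:
            if xmin <= x <= xmax and ymin <= y <= ymax and tmin <= t <= tmax:
                seen.setdefault(fid, (fid, x, y, t))
    return sorted(seen.values())
```

Choosing a radius larger than half the sample spacing makes adjacent windows overlap, which avoids gaps along the trip axis and is why the deduplication step is needed.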

Parallel Density Computations

  • Compute sparse density matrix per partition/stripe
  • Reduce via associative operation in client (Geoserver)
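
The two steps above can be sketched with sparse per-partition grids that are merged by addition. Here `Counter` stands in for the sparse density matrix and plain lists stand in for partitions. These choices, and the function names, are assumptions for illustration.

```python
from collections import Counter
from functools import reduce


def partial_density(points, cell_size):
    """Sparse density grid for one partition: only occupied cells are stored."""
    grid = Counter()
    for x, y in points:
        grid[(int(x // cell_size), int(y // cell_size))] += 1
    return grid


def merge(a, b):
    """Cell-wise addition; associative and commutative, so merge order is free."""
    return a + b


def density(partitions, cell_size):
    """Compute each partition's grid, then reduce them on the client."""
    partials = (partial_density(p, cell_size) for p in partitions)
    return reduce(merge, partials, Counter())
```

Because the merge is associative, partial grids can be combined in whatever order they arrive from the servers.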

KNN


  1. Fix bbox
  2. While results.size == 0
       Compute knn per stripe/partition
       If results.size == 0, expand bbox, take the symmetric difference with the union of previous bounds, and loop
  3. Reduce client side
  • Brute force map-reduce
  • Faster than best-first search using R-trees in most cases
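
The expanding-window search above can be sketched as follows. This is a hedged illustration, not the GeoMesa implementation: partitions are plain lists of planar points, the window doubles on each pass, and the stopping rule is stricter than the slide's `results.size == 0` test. It also waits until the k-th candidate lies within the searched half-width so the answer is exact. All names and defaults are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import heapq
import math


def knn_in_partition(points, center, k, bbox, seen_bbox):
    """Top-k points inside bbox but outside the already-searched seen_bbox."""
    def inside(p, b):
        return b is not None and b[0] <= p[0] <= b[2] and b[1] <= p[1] <= b[3]
    fresh = [p for p in points if inside(p, bbox) and not inside(p, seen_bbox)]
    return heapq.nsmallest(k, fresh, key=lambda p: math.dist(p, center))


def knn(partitions, center, k, half_width=1.0, max_half_width=1024.0):
    results, seen_bbox = [], None
    while half_width <= max_half_width:
        bbox = (center[0] - half_width, center[1] - half_width,
                center[0] + half_width, center[1] + half_width)
        # Per-partition top-k over the new ring only.
        for part in partitions:
            results.extend(knn_in_partition(part, center, k, bbox, seen_bbox))
        # Client-side reduce: keep the global top-k so far.
        results = heapq.nsmallest(k, results,
                                  key=lambda p: math.dist(p, center))
        # Every point closer than half_width has been seen, so stop once
        # the k-th candidate is within that distance.
        if len(results) == k and math.dist(results[-1], center) <= half_width:
            break
        seen_bbox = bbox
        half_width *= 2
    return results
```

Excluding the previous box on each pass is the "symmetric difference" step: only the newly added ring is scanned, so no point is examined twice.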

Some Challenges

  • Geospatial (Self-) Joins
 select coffee_shop.name, tweet.handle
 from coffee_shop, tweet
 where dwithin(tweet.location, coffee_shop.location, 500 meters)
    • Joining on geometries from two tables
    • Relational databases leverage shared memory and bitset intersections
    • Compute result while limiting network overhead
    • Candidate solution:
      • Lift data out of Accumulo into a pre-existing Spark or Storm topology
      • Discretize space and group by bins
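
The "discretize space and group by bins" idea can be sketched in memory. This is a single-process illustration of the binning step only, without the Spark or Storm topology the candidate solution mentions. It assumes planar coordinates in the same units as the distance, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def bin_of(x, y, cell):
    """Grid cell containing (x, y) for square cells of side `cell`."""
    return (math.floor(x / cell), math.floor(y / cell))


def dwithin_join(shops, tweets, distance):
    """shops: (name, x, y); tweets: (handle, x, y).

    Returns sorted (name, handle) pairs no more than `distance` apart.
    """
    # Group one side by bin; cell size equals the join distance.
    bins = defaultdict(list)
    for name, x, y in shops:
        bins[bin_of(x, y, distance)].append((name, x, y))

    pairs = []
    for handle, tx, ty in tweets:
        bx, by = bin_of(tx, ty, distance)
        # Any match must sit in the tweet's own bin or one of its 8 neighbours.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for name, sx, sy in bins.get((bx + dx, by + dy), ()):
                    if math.hypot(tx - sx, ty - sy) <= distance:
                        pairs.append((name, handle))
    return sorted(pairs)
```

In a distributed setting the bin id would serve as the grouping key, so only records that share or border a bin need to meet on the same worker, which limits network traffic.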

Roadmap

  • 1.0 in June
    • Full integration with Geoserver and Accumulo security models
    • PKI-based authentication and authorization
    • Avro binary encoding
    • Relational projections
    • Secondary indexes
  • 1.x in the Fall
    • Geoserver backed by Accumulo/GeoMesa + Hadoop
      • Suite of WPS executed across distributed compute resources
      • On-demand parallel tile rendering and caching
    • Data and query statistics
    • Optimized query planning

Links

https://www.locationtech.org/projects/technology.geomesa
http://geomesa.org


Mailing Lists

geomesa-users@locationtech.org


geomesa-dev@locationtech.org

