About you

The Hitchhiker's Guide to the Open Source Data Science Galaxy


June 10, 2016

John Tukey

 "Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise."

The Data Science Wheel




Python (Data Wrangling)

  • SQLAlchemy: ORM and abstraction layer for database queries (over a dozen SQL dialects)
  • GDAL/OGR: abstraction layer for raster (GDAL) and vector (OGR) formats (200+)
  • Fiona: a Pythonic API over OGR for reading and writing vector data
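As a minimal sketch of the abstraction-layer idea, the snippet below runs parameterized SQL through SQLAlchemy against an in-memory SQLite database. The table name `cities` and its rows are hypothetical sample data, not part of the original slides; in principle only the connection URL would change to target another dialect.

```python
from sqlalchemy import create_engine, text

# In-memory SQLite database (illustrative); another dialect would use a different URL.
engine = create_engine("sqlite:///:memory:")

# engine.begin() opens a transaction and commits it on exit.
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE cities (name TEXT, lat REAL, lon REAL)"))
    conn.execute(
        text("INSERT INTO cities (name, lat, lon) VALUES (:name, :lat, :lon)"),
        [
            {"name": "Paris", "lat": 48.8566, "lon": 2.3522},
            {"name": "Quito", "lat": -0.1807, "lon": -78.4678},
        ],
    )

# Bound parameters (:min_lat) keep the query portable across dialects.
with engine.connect() as conn:
    rows = conn.execute(
        text("SELECT name FROM cities WHERE lat > :min_lat ORDER BY name"),
        {"min_lat": 0},
    ).fetchall()

northern = [row[0] for row in rows]
print(northern)
```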

Python (Feature Engineering)

  • geopy: geocoding addresses (18 APIs)
  • censusgeocode: Python wrapper for the US Census Geocoder
  • ip2geotools: geocoding IP addresses (14 APIs)
  • pyproj: cartographic transformations and geodetic computations
  • cartopy: cartographic Python library with matplotlib support
  • scipy.spatial: spatial algorithms and data structures (k-d trees, Voronoi diagrams, Minkowski distance...)
  • scikit-image: image processing, feature engineering
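One common spatial feature is "distance to the nearest facility". The sketch below builds it with a k-d tree from scipy.spatial, assuming the coordinates are already in a projected system with units of metres (for example after a pyproj transformation). The `stations` and `customers` arrays are hypothetical sample data.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical projected coordinates (x, y) in metres.
stations = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 200.0], [300.0, 300.0]])
customers = np.array([[10.0, 5.0], [290.0, 310.0]])

tree = cKDTree(stations)

# Feature 1: Euclidean distance to, and index of, the nearest station.
dist, idx = tree.query(customers, k=1)

# Feature 2: number of stations within a 150 m radius of each customer.
neighbours = tree.query_ball_point(customers, r=150.0)
counts = [len(n) for n in neighbours]

print(idx.tolist(), counts)
```

Euclidean distances are only meaningful here because the points are assumed to be projected; raw latitude/longitude pairs would need a geodetic distance instead.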

Waldo Tobler

 "everything is related to everything else, but near things are more related than distant things."

Python (Model)

  • scikit-learn: machine learning, data mining
    • sklearn.cluster
    • sklearn.metrics.pairwise (distance metrics)
  • networkx: complex graphs and networks
  • osmnx: OpenStreetMap street networks as networkx graphs
  • prophet: time series forecasting
  • PySAL: Python Spatial Analysis Library (also esda, giddy, spaghetti)
  • PyTorch: Deep learning platform
  • rpy2: interface with R packages
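To illustrate how clustering and distance metrics combine for spatial data, here is a hedged sketch using DBSCAN from sklearn.cluster with the haversine metric. The coordinates, the 5 km radius, and `min_samples=2` are illustrative assumptions, not values from the slides.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical (latitude, longitude) pairs in degrees.
coords_deg = np.array([
    [48.8566, 2.3522],   # three points around Paris
    [48.8600, 2.3400],
    [48.8500, 2.3600],
    [40.4168, -3.7038],  # two points around Madrid
    [40.4200, -3.7000],
    [51.5074, -0.1278],  # one isolated point in London
])

EARTH_RADIUS_KM = 6371.0
eps_km = 5.0  # assumed neighbourhood radius

# The haversine metric expects radians and returns distances in radians,
# so the radius is divided by the Earth's radius.
model = DBSCAN(
    eps=eps_km / EARTH_RADIUS_KM,
    min_samples=2,
    metric="haversine",
    algorithm="ball_tree",
)
labels = model.fit_predict(np.radians(coords_deg))

print(labels.tolist())  # -1 marks noise (points with no close neighbour)
```

Points closer together than the radius end up in the same cluster, which is a small-scale echo of Tobler's observation that near things are more related than distant things.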


R packages: rpostgis, dplyr, tidyr, sp, shapefiles, raster, geojson, geosphere, leafletR, cartography, choroplethr, amdai, ggmap, caret, glm, forecast, deepboost

Python (Visualization)

