The Data Scientist Open Source Cartography Toolbox
Francois Dion
Chief Data Scientist
Dion Research LLC
About me
About you
The Hitchhiker's Guide to the Open Source Data Science Galaxy
Francois Dion
June 10, 2016
John Tukey
"Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise."
The Data Science Wheel
Why?
Apps
- QGIS: Open source GIS
- Blender: 3D creation tools
- BlenderGIS: Blender addons to make the bridge between Blender and geographic data
- BlenderFlowmapTools: for creating 2D vector fields/flowmaps
- Jupyter Notebook: the data science platform
Databases
- PostgreSQL: flexible database engine
- PostGIS: GIS extensions for PostgreSQL
- crankshaft: CartoDB Spatial Analysis
- imposm: import OSM data into PostgreSQL
Python (Data Wrangling)
- SQLAlchemy: ORM and abstraction layer for database queries (over a dozen SQL dialects)
- Psycopg2: PostgreSQL driver
- GeoAlchemy2: GIS extensions for SQLAlchemy
- GDAL/OGR: abstraction layer for raster and for vector formats (200+)
- fiona: another OGR API
- numpy/pandas: array / panel data (dataframes)
- NetCDF4: disk persistence format
- shapely: Manipulation & analysis of geometric objects
- geopandas: geographic pandas extensions
- dask-geopandas: scaling geopandas
- cartoframes: CARTO dataframes
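To make the shapely bullet concrete, here is a minimal sketch of geometric manipulation and analysis. The rectangle and the two points are hypothetical, in an arbitrary planar coordinate system, and are not taken from the talk.

```python
from shapely.geometry import Point, Polygon

# Hypothetical rectangular "district" (planar units, not lon/lat)
district = Polygon([(0, 0), (4, 0), (4, 3), (0, 3)])
inside = Point(1, 1)    # hypothetical location inside the district
outside = Point(5, 1)   # hypothetical location east of the district

area = district.area                              # surface of the polygon
centroid = district.centroid                      # a shapely Point
contains_inside = district.contains(inside)       # point-in-polygon test
contains_outside = district.contains(outside)
distance_to_outside = district.distance(outside)  # shortest planar distance
```

geopandas applies the same predicates to a whole column of geometries, so a sketch like this usually scales to a dataframe without much change.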
Python (Feature Engineering)
- Geopy: Geocoding addresses (18 APIs)
- censusgeocode: Python wrapper for the US Census Geocoder
- ip2geotools: Geocoding IPs (14 APIs)
- pyproj: Performs cartographic transformations and geodetic computations.
- cartopy: cartographic python library with matplotlib support
- scipy.spatial: Spatial algorithms and data structures (kdtree, voronoi, minkowski...)
- scikit-image: image processing, feature engineering
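A typical spatial feature is "distance to the nearest X". The sketch below uses a k-d tree from scipy.spatial to compute it. The station and event coordinates are made up, and they are assumed to be already projected to a planar system (for example with pyproj), since the tree measures Euclidean distance.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical planar coordinates (e.g. meters after projection)
stations = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
events = np.array([[1.0, 1.0], [9.0, 2.0], [2.0, 8.0]])

# Build the tree once, then query every event in one call
tree = cKDTree(stations)
nearest_dist, nearest_idx = tree.query(events)

# nearest_dist[i] is the Euclidean distance from event i to its closest
# station; nearest_idx[i] is that station's row in `stations`.
```

Both arrays can be added directly as new columns of a pandas dataframe of events.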
Waldo Tobler
"everything is related to everything else, but near things are more related than distant things."
Python (Model)
- scikit-learn: machine learning, data mining
- sklearn.cluster
- sklearn.metrics.pairwise (distances)
- networkx: complex graphs and networks
- osmnx: OpenStreetMap + networkx
- prophet: time series
- PySAL: Python Spatial Analysis Library (also esda, giddy, spaghetti)
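As one way to put Tobler's "near things are more related" to work, here is a hedged sketch of density-based clustering with sklearn.cluster.DBSCAN and the haversine metric. The coordinates, the 5 km radius, and the mean Earth radius of about 6371 km are illustrative assumptions, not values from the talk.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical [lat, lon] points in degrees: two tight groups plus one outlier
coords_deg = np.array([
    [36.00, -79.00], [36.01, -79.01], [36.02, -79.00],  # group A
    [40.00, -75.00], [40.01, -75.01], [40.00, -75.02],  # group B
    [38.00, -77.00],                                    # isolated point
])

EARTH_RADIUS_KM = 6371.0             # assumed mean Earth radius
eps_km = 5.0                         # assumed neighborhood radius
coords_rad = np.radians(coords_deg)  # haversine expects [lat, lon] in radians

model = DBSCAN(
    eps=eps_km / EARTH_RADIUS_KM,    # radius expressed in radians
    min_samples=2,
    metric="haversine",
    algorithm="ball_tree",
)
labels = model.fit_predict(coords_rad)  # -1 marks noise
```

Working in radians with the haversine metric avoids projecting the data first, at the cost of treating the Earth as a sphere.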
Python (Visualization)
- matplotlib and mpl_toolkits.basemap: old reliable (descartes)
- ipyleaflet: Jupyter notebook widget for interactive maps
- Folium: Leaflet.js maps from Python (branca)
- pptk: point processing toolkit
- OSMnx: OpenStreetMap + NetworkX
- Altair: visualization tools with Vega (altair, pdvega, altair_widgets, altair_matplotlib)
- PyViz: visualization tools with Bokeh (hvPlot, DataShader, EarthSim, GeoViews)
- Dash: visualization tools with Plotly (cufflinks, scattergeo, mapbox, figure_factory)
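For the "old reliable" end of this list, a minimal matplotlib sketch: a labeled scatter of points in longitude/latitude, rendered off-screen. The city coordinates are approximate and only illustrative, and no basemap or projection is applied here.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

# Approximate, illustrative coordinates (degrees)
names = ["Winston-Salem", "Raleigh", "Charlotte"]
lons = [-80.24, -78.64, -80.84]
lats = [36.10, 35.78, 35.23]

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(lons, lats)
for name, lon, lat in zip(names, lons, lats):
    ax.annotate(name, (lon, lat), textcoords="offset points", xytext=(5, 5))
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
n_collections = len(ax.collections)  # one scatter layer was drawn

# Render to an in-memory PNG instead of opening a window
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
plt.close(fig)
```

The interactive libraries above (ipyleaflet, Folium, GeoViews) take the same kind of point data and add tiles, zoom, and hover.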
Thank you!
fdion@dionresearch.com
https://slides.com/fdion