Data Visualization of C'ville+Albemarle EMS Data

CVJS Meetup - Nov 2015

Michael Holroyd, Ph.D.



Demo: http://meekohi.com/rids
Code: https://github.com/cville/fire-rescue-map

What is this stuff?

node.js

 server-side event-driven javascript

mongoDB

 "noSQL" document-based database

leaflet.js

  front-end library for mapping

d3.js

  data-driven visualization toolbox

Quick History

  • 1968 - American Telephone and Telegraph Company (AT&T) establishes "911" as unique emergency number.

  • 1972 - AT&T adds "selective routing" features.

  • 1979 - Several states enact legislation requiring establishment of 911 numbers.

  • 1987 - About 50% of the US population has access to 911.

  • 2015 - About 98% of the US has access to some kind of 911.

Emergency Communications Center


  • You will be pleased to know that our radio tower on top of Carter's Mountain won "Tower of the Month"  ;D

1. Get the Data


MongoDB


  • NoSQL document-based database

  • I don't generally recommend MongoDB for projects, but it can get you up and running very quickly without any schema.


MongoClient.connect(url, function(err, db) {
  var incidents = db.collection('incidents');
  ...


Scraper Pattern

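var _ = require('underscore');
var moment = require('moment');
var async = require('async');
var request = require('request');
var cheerio = require('cheerio');

// One date string per day, from today back to days_ago days ago
var dates = _.range(0,days_ago+1).map(function(i){
  return moment().subtract(i,'days').format("YYYY-MM-DD");
});

// Fetch at most 5 pages at a time
async.eachLimit(dates,5,function(date,cb){
  var url = "http://warhammer.mcc.virginia.edu/calls.php?fdate="+date;
  request(url, function(err, resp, body){
    var $ = cheerio.load(body); // jQuery-style handle on the page
    ...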
  • Build an array of urls you want to scrape

  • Pass to async.js to do them in parallel

  • request() makes the actual HTTP request

  • cheerio parses the HTML and gives you a jQuery-style API, so you can use selectors etc.
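The pattern above can be sketched without any dependencies. This is only an illustration, not the code from the repo: `recentDates`, `eachLimit` (a hand-rolled stand-in for the async.js function of the same name), `fakeFetch` and the placeholder URL are all made up here, and dates are computed in UTC rather than local time.

```javascript
// Build "YYYY-MM-DD" strings for `now` and the previous daysAgo days (UTC).
function recentDates(daysAgo, now) {
  var dates = [];
  for (var i = 0; i <= daysAgo; i++) {
    var d = new Date(now.getTime() - i * 24 * 60 * 60 * 1000);
    dates.push(d.toISOString().slice(0, 10));
  }
  return dates;
}

// Run worker(item) over items with at most `limit` calls in flight.
// Resolves with the results in input order; rejects on the first failure.
function eachLimit(items, limit, worker) {
  var results = new Array(items.length);
  var next = 0;
  function runOne() {
    if (next >= items.length) return Promise.resolve();
    var index = next++;
    return Promise.resolve(worker(items[index])).then(function (value) {
      results[index] = value;
      return runOne(); // this runner picks up the next unclaimed item
    });
  }
  var runners = [];
  for (var i = 0; i < Math.min(limit, items.length); i++) runners.push(runOne());
  return Promise.all(runners).then(function () { return results; });
}

// Hypothetical stand-in for request(): waits a little, returns the URL length.
function fakeFetch(url) {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve(url.length); }, 10);
  });
}

var dates = recentDates(2, new Date(Date.UTC(2015, 10, 15)));
var urls = dates.map(function (date) {
  return "http://example.invalid/calls?fdate=" + date; // placeholder URL
});
```

Calling `eachLimit(urls, 2, fakeFetch)` keeps two fetches in flight until the list is exhausted. In the real scraper the worker would call request() and hand the body to cheerio.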

why upsert?

incidents.update({
  incident_id:incident.incident_id,
  unit:incident.unit_id
},incident,{upsert:true},row_cb);

  • upsert with a unique identifier makes sure the code is idempotent (useful because the next step is going to cost money, and you don't want to lose data)

  • General advice: do as little parsing as possible at data collection time. It is much easier to munge the raw data locally later on than to re-scrape everything.
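To see why the upsert makes re-runs safe, here is a tiny in-memory sketch. It does not touch MongoDB: `makeStore` is a hypothetical stand-in for the collection, and the sample rows are invented for illustration.

```javascript
// In-memory stand-in for a collection, keyed by incident_id + unit.
function makeStore() {
  var rows = new Map();
  return {
    upsert: function (incident) {
      var key = incident.incident_id + "|" + incident.unit_id;
      rows.set(key, incident); // insert if new, overwrite if seen before
    },
    count: function () { return rows.size; }
  };
}

var store = makeStore();
var scraped = [ // made-up sample rows
  { incident_id: "A1", unit_id: "E5", address: "100 MAIN ST" },
  { incident_id: "A1", unit_id: "M2", address: "100 MAIN ST" }
];

// First scrape, then a full re-scrape of the same day.
scraped.forEach(function (row) { store.upsert(row); });
scraped.forEach(function (row) { store.upsert(row); });

console.log(store.count()); // prints 2: the re-scrape added no duplicates
```

A plain insert would have left four rows here, so a crashed scraper could not simply be started over.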

2. Geolocation

Geocoding converts addresses like "1600 Amphitheatre Parkway, Mountain View, CA" into geographic coordinates like (37.423021,-122.083739).

var async = require('async');
var bottleneck = require('bottleneck');
var geocoder = require('node-geocoder')('google','https',{apiKey:secrets.google_apikey});
var geocoderBottleneck = new bottleneck(0,200); // no concurrency cap, at least 200ms between jobs

async.eachLimit(incidents,5,function(incident,cb){ // keep under the rate limit...
  geocoder.geocode(
    incident.address+" Charlottesville, VA",
    function(err, location){
      // store the geocoder result on the incident; cb goes through the bottleneck
      incidentsCollection.update(
        {_id:incident._id},
        {"$set":{location:location}},
        geocoderBottleneck.submit(cb,null,null)
      );
    }
  );
});
https://github.com/nchaulet/node-geocoder
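The "keep under the rate limit" part boils down to spacing out job starts. The following is a rough sketch of that idea only, not how the bottleneck library is implemented; `makeLimiter` and `job` are hypothetical names, and the 50 ms spacing is just for the demo.

```javascript
// Returns submit(job): consecutive jobs start at least minTime ms apart.
function makeLimiter(minTime) {
  var nextStart = 0; // earliest timestamp at which the next job may start
  return function submit(job) {
    var now = Date.now();
    var startAt = Math.max(now, nextStart);
    nextStart = startAt + minTime; // reserve the following slot
    return new Promise(function (resolve) {
      setTimeout(function () { resolve(job()); }, startAt - now);
    });
  };
}

var limiter = makeLimiter(50);
var starts = [];

// Stand-in for a geocoding call: records when it actually ran.
function job() {
  starts.push(Date.now());
  return starts.length;
}

// All three are submitted at once, but they run roughly 50 ms apart.
var done = Promise.all([limiter(job), limiter(job), limiter(job)]);
```

With a 200 ms spacing, as in the slide, this caps the geocoder at about five requests per second no matter how many incidents are queued.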

2. Geocoder


I'm pretty sure there are free services out there. Not sure why I went with Google API...


3. outputData

  • For smaller datasets it makes sense to just load everything into memory and create visualizations dynamically using JS on the client-side

  • As your dataset gets bigger, long load times and long processing times make this less attractive

  • This dataset is somewhere in-between... room for improvement
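One way to keep the client-side payload small is to aggregate before writing the JSON file. The sketch below is an assumption about what such a step could look like, not the repo's actual output code. The output fields (`incidents`, `averageResponseTime`, `location`) match what the map code reads, but `summarizeByLocation`, the `responseMinutes` input field and the sample rows are invented.

```javascript
// Collapse raw incidents into one record per geocoded location.
function summarizeByLocation(incidents) {
  var byKey = {};
  incidents.forEach(function (inc) {
    if (!inc.location) return; // skip rows that were never geocoded
    var key = inc.location.latitude + "," + inc.location.longitude;
    var entry = byKey[key];
    if (!entry) {
      entry = byKey[key] = { location: inc.location, incidents: 0, totalMinutes: 0 };
    }
    entry.incidents += 1;
    entry.totalMinutes += inc.responseMinutes; // hypothetical field name
  });
  return Object.keys(byKey).map(function (key) {
    var e = byKey[key];
    return {
      location: e.location,
      incidents: e.incidents,
      averageResponseTime: e.totalMinutes / e.incidents
    };
  });
}

var sample = [ // made-up rows
  { location: { latitude: 38.03, longitude: -78.48 }, responseMinutes: 4 },
  { location: { latitude: 38.03, longitude: -78.48 }, responseMinutes: 8 },
  { location: { latitude: 38.05, longitude: -78.5 }, responseMinutes: 10 },
  { responseMinutes: 7 } // no location: dropped
];

var summary = summarizeByLocation(sample);
console.log(JSON.stringify(summary));
```

The browser then loads one small record per address instead of every individual call.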

4. visualize!

leaflet.js

 open-source JS library for interactive maps


  • Handles base-layers
  • Supports tiled raster layers (worth using for anything over ~100k elements)
  • Plays nice with d3.js for interactive SVG elements


4. visualize!

d3.json("data/rids.json", function(oa){
  var d3Overlay = L.d3SvgOverlay( function(selection,projection){
    selection.selectAll('#mapID circle').data(oa)
      .enter().append("circle")
      .attr("opacity",0.7)
      // radius grows with the square root of the incident count
      .attr("r", function(d){
        var endScale = Math.pow(2, 11-m.getZoom());
        return 1.0*Math.sqrt(d.incidents)*endScale;
      })
      // green (fast) to red (slow) by average response time
      .attr("fill", function(d){
        return svColorScale(d.averageResponseTime);
      })
      .attr("cx",function(d){
        var point = projection.latLngToLayerPoint([d.location.latitude, d.location.longitude]);
        return point.x;
      })
      .attr("cy",function(d){
        var point = projection.latLngToLayerPoint([d.location.latitude, d.location.longitude]);
        return point.y;
      });
  }, {zoomAnimate:true});
  d3Overlay.addTo(m);
});
http://meekohi.com/rids

4. visualize!


  • Once your data is set up, visualization is pretty simple:

var svColorScale = d3.scale.linear().domain([0, 30]).range(["#66FF66","#FF6666"]);
g.selectAll("scatter-dots")
  .data(calltimesByTime)
  .enter().append("svg:circle")
      .attr("cx", function (d,i) { return x(d[0]); } )
      .attr("cy", function (d) { return y(d[2]); } )
      .attr("r", 1)
      .attr("fill", function(d){
        return svColorScale(d[2]);
      })
      .attr("opacity",0.6);
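For intuition about what `svColorScale` returns, here is a dependency-free sketch of a linear color scale. It is not d3's implementation: `linearColorScale`, `hexToRgb` and `rgbToHex` are hypothetical helpers, and clamping values outside the domain is an assumption made here.

```javascript
// "#RRGGBB" -> [r, g, b]
function hexToRgb(hex) {
  return [1, 3, 5].map(function (i) { return parseInt(hex.slice(i, i + 2), 16); });
}

// [r, g, b] -> "#rrggbb"
function rgbToHex(rgb) {
  return "#" + rgb.map(function (c) {
    var s = Math.round(c).toString(16);
    return s.length < 2 ? "0" + s : s;
  }).join("");
}

// Map a number in `domain` to a color blended between the two `range` colors.
function linearColorScale(domain, range) {
  var from = hexToRgb(range[0]);
  var to = hexToRgb(range[1]);
  return function (value) {
    var t = (value - domain[0]) / (domain[1] - domain[0]);
    t = Math.max(0, Math.min(1, t)); // assumption: clamp outside the domain
    return rgbToHex(from.map(function (c, i) { return c + t * (to[i] - c); }));
  };
}

var scale = linearColorScale([0, 30], ["#66FF66", "#FF6666"]);
console.log(scale(0));  // prints "#66ff66" (green)
console.log(scale(15)); // prints "#b3b366" (halfway)
console.log(scale(30)); // prints "#ff6666" (red)
```

So a call with a 15-minute response time lands on a muddy yellow-green, halfway between the two endpoint colors.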

5. brainstorm


  • Everything is up on GitHub under the "cville" group.

  • Interested in other angles to explore this dataset.
    • (without being super creepy)

  • ACFR has mentioned that they actually don't have very good ways of looking at their own data. Tried to find out what would be useful for them, but didn't get any concrete ideas.

Thanks!


Arqball (computer vision kung-fu):
http://arqspin.com

Learnstream (link curation and sharing):
http://learnstream.com

Scenethink (simple event calendars):
http://scenethink.com

Michael Holroyd
http://meekohi.com