Visualizing Statistical and Machine Learning Concepts

Michael Freeman, University of Washington

@mf_viz

#stratadata

Today's Objective

Develop a process for designing visual explanations of statistical and machine learning concepts.

Faculty Member at the UW Information School

Author of Programming Skills for Data Science (bit.ly/ps4ds)

What is division?

With the people around you, take 5 minutes and draw a visual explanation of this concept.

In order to visualize concepts, we need to isolate specific ideas, identify underlying data structures, and leverage corresponding algorithms.

Process

Concepts → Ideas

Ideas → Data

Data → Algorithm

Concepts → Ideas

What foundational ideas underlie your statistical concept?

"What is the Central Limit Theorem?"

Central Limit Theorem

"The distribution of the sample mean"
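For reference, here is the standard textbook statement of that one-line definition. It assumes independent, identically distributed draws X_1, ..., X_n from a population with mean mu and finite variance sigma^2:

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{\;d\;} \mathcal{N}(0, 1)
\quad \text{as } n \to \infty
```

Informally, for large n the sample mean is approximately normal with mean mu and standard deviation sigma / sqrt(n), whatever the shape of the population.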

What foundational ideas underlie this statistical concept?

Ideas Underlying CLT

Variation within a population

Sampling and how it varies

Repeated sampling from your population

Distributions and normality

Distributions of sample means
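Each of these ideas maps to a step in a small simulation. The sketch below is a hypothetical illustration, not code from the talk. It assumes a deliberately skewed population (an exponential distribution with mean 1 and standard deviation 1), draws many samples of size 30, and keeps each sample's mean. The helper names (`mulberry32`, `drawExponential`, `sampleMeans`) are illustrative; `mulberry32` is a common small seeded generator used here only so runs are reproducible.

```javascript
// Seeded PRNG (mulberry32) so repeated runs give the same samples.
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// Variation within a population: exponential with mean 1 and sd 1 (skewed, not normal)
function drawExponential(rand) {
  return -Math.log(1 - rand()); // 1 - rand() is in (0, 1], so the log is finite
}

// Repeated sampling: draw `numSamples` samples of size `sampleSize`, keep each sample's mean
function sampleMeans(rand, sampleSize, numSamples) {
  const means = [];
  for (let s = 0; s < numSamples; s++) {
    let sum = 0;
    for (let i = 0; i < sampleSize; i++) sum += drawExponential(rand);
    means.push(sum / sampleSize);
  }
  return means;
}

const rand = mulberry32(2019);
const sampleSize = 30;
const means = sampleMeans(rand, sampleSize, 5000);

// Distribution of sample means: its center and spread
const meanOfMeans = means.reduce((s, m) => s + m, 0) / means.length;
const sdOfMeans = Math.sqrt(
  means.reduce((s, m) => s + (m - meanOfMeans) ** 2, 0) / (means.length - 1)
);
const predictedSd = 1 / Math.sqrt(sampleSize); // sigma / sqrt(n) with sigma = 1

// Normality check: share of means within one predicted sd of the population mean (~68% if normal)
const withinOneSd =
  means.filter((m) => Math.abs(m - 1) < predictedSd).length / means.length;

console.log(meanOfMeans, sdOfMeans, predictedSd, withinOneSd);
```

Plotting `means` as a histogram (or a beeswarm) is what turns these numbers into the visual: a roughly bell-shaped pile centered near 1, even though the population itself is skewed.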

Ideas → Data

What data expresses your idea?

"What is hierarchical modeling?"

Hierarchical Modeling

What data expresses these ideas?

Source code available here

Data Generation Demo
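This is not the demo from the talk. It is a hypothetical sketch of the kind of data a hierarchical model describes: observations nested in groups, where each group's mean varies around an overall mean (spread `tau`) and individual values vary around their group's mean (spread `sigma`). All parameter values and the helpers `mulberry32`, `drawNormal`, and `generateHierarchicalData` are illustrative assumptions.

```javascript
// Seeded PRNG (mulberry32) so the generated data is reproducible.
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Normal draw via the Box-Muller transform, scaled to the given mean and sd
function drawNormal(rand, mean, sd) {
  const u1 = 1 - rand(); // in (0, 1], keeps Math.log finite
  const u2 = rand();
  return mean + sd * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Two-level generative process:
//   groupMean_g ~ Normal(mu, tau)           (variation between groups)
//   value_gi    ~ Normal(groupMean_g, sigma) (variation within a group)
function generateHierarchicalData(rand, { numGroups, perGroup, mu, tau, sigma }) {
  const groupMeans = [];
  const records = [];
  for (let g = 0; g < numGroups; g++) {
    const groupMean = drawNormal(rand, mu, tau);
    groupMeans.push(groupMean);
    for (let i = 0; i < perGroup; i++) {
      records.push({ group: g, value: drawNormal(rand, groupMean, sigma) });
    }
  }
  return { groupMeans, records };
}

// Hypothetical parameters, chosen only for illustration
const { groupMeans, records } = generateHierarchicalData(mulberry32(1), {
  numGroups: 8,
  perGroup: 50,
  mu: 50,
  tau: 5,
  sigma: 3,
});

// Observed mean per group, to compare against the true (normally hidden) group means
const observedMeans = groupMeans.map((_, g) => {
  const vals = records.filter((r) => r.group === g).map((r) => r.value);
  return vals.reduce((s, v) => s + v, 0) / vals.length;
});

console.log(groupMeans, observedMeans);
```

Generating data from known parameters like this makes it possible to show a hierarchical model "recovering" the group means, since the true values are available for comparison.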

Data → Algorithm

What algorithm enables you to express your data?

"How do you interpret the performance of models with binary outcomes?"

Beeswarm Plot

What algorithms are necessary to express this data?

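// Bind data to a selection of circles
// (assumes `g` is an SVG <g> selection and `x` is a scale from data values to pixels)
const circles = g.selectAll("circle").data(data);

// Append a circle for each new datum and set its visual attributes
circles.enter().append("circle")
  .attr("cx", function(d) { return x(d.value); }) // horizontal position encodes the value
  .attr("cy", 0)                                   // every circle shares one vertical position, so nearby values overlap
  .attr("r", 10);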
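The naive placement above puts every circle on the same line, so any two values whose pixel positions are closer than one diameter draw on top of each other. The sketch below counts those collisions. `linearScale` is a hypothetical stand-in for `d3.scaleLinear()` (a plain, unclamped linear mapping), and the data values are made up for illustration.

```javascript
// Hypothetical stand-in for d3.scaleLinear(): maps [d0, d1] linearly onto [r0, r1] (no clamping)
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Count pairs of circles that overlap when every circle shares the same cy
function countOverlaps(data, x, radius) {
  const cxs = data.map((d) => x(d.value));
  let overlaps = 0;
  for (let i = 0; i < cxs.length; i++) {
    for (let j = i + 1; j < cxs.length; j++) {
      // Same cy, so only the horizontal gap between centers matters
      if (Math.abs(cxs[i] - cxs[j]) < 2 * radius) overlaps++;
    }
  }
  return overlaps;
}

// Hypothetical data: model scores in [0, 1]
const data = [0.10, 0.12, 0.15, 0.5, 0.52, 0.9].map((value) => ({ value }));
const x = linearScale([0, 1], [0, 600]);
const overlappingPairs = countOverlaps(data, x, 10);
console.log(overlappingPairs); // 3
```

With a radius of 10 pixels, centers at 60, 72, and 90 produce two overlapping pairs and centers at 300 and 312 produce a third; that hidden overlap is the problem the next step has to solve.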

Beeswarm Plot

What algorithms are necessary to express this data?

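// Construct a set of simulation forces on the data
const simulation = d3.forceSimulation(data)
    // Pull each node toward the x position that encodes its value
    .force("x", d3.forceX(function(d) { return x(d.value); }).strength(1))
    // Pull every node toward the vertical center of the chart
    .force("y", d3.forceY(settings.height / 2))
    // Treat each node as a circle of radius 8 and push overlapping nodes apart
    // (use the drawn circle radius here if circles should not overlap at all)
    .force("collide", d3.forceCollide(8))
    // Stop the internal timer so the layout can be computed synchronously
    .stop();

// Advance the simulation 100 steps; each datum then carries its computed d.x and d.y
for (let i = 0; i < 100; i++) simulation.tick();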
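A force simulation is one way to resolve overlaps; it is not the only one. As a point of comparison, the sketch below uses a different technique, a deterministic "greedy" beeswarm: circles are placed one at a time in order of x, and each takes the vertical offset closest to the center line that does not collide with any circle already placed. `greedyBeeswarm` and its inputs (pixel x positions and a radius) are hypothetical names for this illustration.

```javascript
// Greedy beeswarm: returns a vertical offset (relative to the center line) for each x position
function greedyBeeswarm(xs, radius) {
  const diameter = 2 * radius;
  const placed = []; // circles already positioned, as { x, y }
  const ys = new Array(xs.length);
  // Place circles from left to right
  const order = xs.map((_, i) => i).sort((a, b) => xs[a] - xs[b]);

  for (const i of order) {
    const x = xs[i];
    // Only circles closer than one diameter horizontally can collide
    const neighbors = placed.filter((p) => Math.abs(p.x - x) < diameter);

    // Candidate offsets: the center line, plus the spots just touching each neighbor
    const candidates = [0];
    for (const p of neighbors) {
      const dy = Math.sqrt(diameter * diameter - (p.x - x) ** 2);
      candidates.push(p.y + dy, p.y - dy);
    }
    candidates.sort((a, b) => Math.abs(a) - Math.abs(b));

    // First candidate (closest to center) that clears every neighbor
    const y = candidates.find((c) =>
      neighbors.every(
        (p) => (p.x - x) ** 2 + (p.y - c) ** 2 >= diameter * diameter - 1e-9
      )
    );
    placed.push({ x, y });
    ys[i] = y;
  }
  return ys;
}

const xs = [100, 105, 110, 112, 300, 301];
const ys = greedyBeeswarm(xs, 5);
console.log(ys);
```

To draw the result, each circle would use `cx = xs[i]` and `cy = settings.height / 2 + ys[i]`. Unlike the force layout, this keeps every circle exactly at its data-driven x and gives the same answer on every run; the trade-off is that this naive version checks all previously placed circles, which is O(n²) in the worst case.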

Beeswarm Plot

What algorithms are necessary to express this data?

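// Source (https://github.com/d3/d3-force/blob/master/src/collide.js)
// Excerpt: the callback that d3.forceCollide passes to quadtree.visit for each node.
// node, xi, yi, ri, ri2, strength and jiggle are defined in the enclosing force code.
// Returning true tells visit() to skip a quadrant that is too far away to collide.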
function apply(quad, x0, y0, x1, y1) {
  var data = quad.data, rj = quad.r, r = ri + rj;
  if (data) {
    if (data.index > node.index) {
      var x = xi - data.x - data.vx,
          y = yi - data.y - data.vy,
          l = x * x + y * y;
      if (l < r * r) {
        if (x === 0) x = jiggle(), l += x * x;
        if (y === 0) y = jiggle(), l += y * y;
        l = (r - (l = Math.sqrt(l))) / l * strength;
        node.vx += (x *= l) * (r = (rj *= rj) / (ri2 + rj));
        node.vy += (y *= l) * r;
        data.vx -= x * (r = 1 - r);
        data.vy -= y * r;
      }
    }
    return;
  }
  return x0 > xi + r || x1 < xi - r || y0 > yi + r || y1 < yi - r;
}
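The excerpt above is dense, so here is a simplified sketch of the same pairwise push: when two circles overlap, they are moved apart along the line joining their centers, and the smaller circle moves more (its share is the other circle's radius squared over the sum of squared radii, mirroring `rj *= rj` and `ri2 + rj` in the excerpt). Unlike d3, this sketch updates positions directly instead of velocities, checks every pair instead of using a quadtree, and replaces `jiggle()` with a fixed tiny nudge. `resolveCollisions` and the `{ x, y, r }` node shape are hypothetical.

```javascript
// One pass of pairwise collision resolution over nodes shaped { x, y, r }
function resolveCollisions(nodes, strength = 1) {
  for (let i = 0; i < nodes.length; i++) {
    const a = nodes[i];
    for (let j = i + 1; j < nodes.length; j++) {
      const b = nodes[j];
      const minDist = a.r + b.r;
      let dx = a.x - b.x;
      let dy = a.y - b.y;
      // Hypothetical deterministic stand-in for d3's jiggle() when centers coincide
      if (dx === 0 && dy === 0) dx = 1e-6;
      const dist = Math.sqrt(dx * dx + dy * dy);
      if (dist >= minDist) continue; // no overlap, nothing to do

      // Scale factor that turns the center-to-center vector into the overlap to remove
      const k = ((minDist - dist) / dist) * strength;
      // a's share of the push is b.r^2 / (a.r^2 + b.r^2): smaller circles move more
      const shareA = (b.r * b.r) / (a.r * a.r + b.r * b.r);
      a.x += dx * k * shareA;
      a.y += dy * k * shareA;
      b.x -= dx * k * (1 - shareA);
      b.y -= dy * k * (1 - shareA);
    }
  }
}

const big = { x: 0, y: 0, r: 10 };
const small = { x: 3, y: 4, r: 5 };
resolveCollisions([big, small]);
console.log(big, small);
```

Here the two centers start 5 apart but need 15; after one pass they are exactly 15 apart, with the large circle moving 2 units and the small one 8. With many nodes, fixing one pair can create a new overlap elsewhere, which is why the force layout repeats this work over many ticks.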

All 3 stages need to be done well

Concepts → Ideas

Ideas → Data

Data → Algorithm

In order to visualize concepts, we need to isolate specific ideas, identify underlying data structures, and leverage corresponding algorithms.

Thank You

Book: bit.ly/ps4ds

Twitter: @mf_viz

Presentation Resources: mfviz.com/strata-2019
