Visualizing Statistical and Machine Learning Concepts
Michael Freeman, University of Washington
@mf_viz
#stratadata
Today's Objective
Develop a process for designing visual explanations of statistical and machine learning concepts.
Faculty Member at the UW Information School
Author of Programming Skills for Data Science (bit.ly/ps4ds)
What is division?
With the people around you, take 5 minutes and draw a visual explanation of this concept.
In order to visualize concepts, we need to isolate specific ideas, identify underlying data structures, and leverage corresponding algorithms.
Process
Concepts → Ideas
Ideas → Data
Data → Algorithm
Concepts → Ideas
What foundational ideas underlie your statistical concept?
"What is the Central Limit Theorem?"
Central Limit Theorem
"Distribution of the sample mean"
What foundational ideas underlie this statistical concept?
Ideas Underlying CLT
Variation within a population
Sampling and how it varies
Repeated sampling from your population
Distributions and normality
Distributions of sample means
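The ideas listed above can be sketched as a small simulation. This is only an illustration under assumptions that are not from the talk: the population is exponential (chosen because it is clearly skewed), the sample size is 30, and sampling is repeated 1,000 times. The helper names (`mulberry32`, `drawSample`, `sampleMeans`) are hypothetical, and the seeded generator is there only so that runs are repeatable.

```javascript
// Small seeded pseudo-random generator so that every run gives the same data.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const random = mulberry32(42);

const mean = (values) => values.reduce((sum, v) => sum + v, 0) / values.length;
const sd = (values) => {
  const m = mean(values);
  return Math.sqrt(mean(values.map((v) => (v - m) ** 2)));
};

// Idea: variation within a population (assumed skewed, not normal)
const population = Array.from({ length: 10000 }, () => -Math.log(1 - random()));

// Idea: sampling and how it varies (draw n values with replacement)
function drawSample(pop, n) {
  return Array.from({ length: n }, () => pop[Math.floor(random() * pop.length)]);
}

// Idea: repeated sampling, keeping the mean of each sample
const sampleMeans = Array.from({ length: 1000 }, () =>
  mean(drawSample(population, 30))
);

// Idea: the distribution of sample means is centered near the population
// mean and is much narrower than the population itself
console.log("population mean:", mean(population), "sd:", sd(population));
console.log("sample means mean:", mean(sampleMeans), "sd:", sd(sampleMeans));
```

Each constant in the sketch maps to one idea on the slide, so each could be drawn as its own step: the population, one sample, many samples, and finally the histogram of `sampleMeans`.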
Ideas → Data
What data expresses your idea?
"What is hierarchical modeling?"
Hierarchical Modeling
What data expresses these ideas?
Source code available here
Data Generation Demo
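One possible shape for such data, sketched under assumptions that are not taken from the demo itself: observations are nested in groups, each group receives its own intercept drawn from a shared normal distribution, and all groups share one slope. Every name and constant here (`generateHierarchicalData`, `groupIntercept`, the value 50, and so on) is hypothetical.

```javascript
// Small seeded pseudo-random generator so that every run gives the same data.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Normally distributed value via the Box-Muller transform.
function randomNormal(random, mu, sigma) {
  const u1 = 1 - random(); // in (0, 1], so Math.log is always finite
  const u2 = random();
  return mu + sigma * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Build a flat array of rows; each row remembers which group it belongs to.
function generateHierarchicalData(random, { groups = 5, perGroup = 20 } = {}) {
  const rows = [];
  for (let g = 0; g < groups; g++) {
    // Group level: each group has its own intercept (assumed mean 50, sd 10)
    const groupIntercept = randomNormal(random, 50, 10);
    for (let i = 0; i < perGroup; i++) {
      // Observation level: shared slope of 2 plus individual noise (sd 3)
      const x = random() * 10;
      const y = groupIntercept + 2 * x + randomNormal(random, 0, 3);
      rows.push({ group: g, groupIntercept, x, y });
    }
  }
  return rows;
}

const rows = generateHierarchicalData(mulberry32(7), { groups: 5, perGroup: 20 });
console.log(rows.slice(0, 3));
```

A flat array of objects like `rows` is the form that a D3 data join expects, so a generated dataset of this kind can be bound to marks directly.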
Data → Algorithm
What algorithm enables you to express your data?
"How do you interpret performance of models with binary outcomes?"
Beeswarm Plot
What algorithms are necessary to express this data?
// Bind data to a selection of circles
let circles = g.selectAll("circle").data(data);
// Append new circles to the chart and set their visual attributes
circles.enter().append("circle")
    .attr("cx", function(d) { return x(d.value); })
    .attr("cy", function(d) { return 0; })
    .attr("r", 10);
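The snippet above relies on a scale `x` that is defined elsewhere. A plain JavaScript stand-in is sketched here to show what such a scale does. It is an assumption that `x` is linear (for example, one created with `d3.scaleLinear`); the domain `[0, 1]`, the range `[0, 500]`, and the sample data are hypothetical.

```javascript
// Map a value from a data domain [d0, d1] to a pixel range [r0, r1].
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Hypothetical data: one value per observation
const data = [{ value: 0.1 }, { value: 0.5 }, { value: 0.52 }, { value: 0.9 }];

// Hypothetical scale: values in [0, 1] spread across 500 pixels
const x = linearScale([0, 1], [0, 500]);

// The same attributes the circles receive above, computed without the DOM
const positions = data.map((d) => ({ cx: x(d.value), cy: 0, r: 10 }));
console.log(positions);
```

Because every circle gets `cy` of 0, circles whose values are close together (0.5 and 0.52 here) are drawn on top of each other, which is the problem a layout algorithm has to solve.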
Beeswarm Plot
What algorithms are necessary to express this data?
// Construct a set of simulation forces on the data
const simulation = d3.forceSimulation(data)
    .force("x", d3.forceX(function(d) { return x(d.value); }).strength(1))
    .force("y", d3.forceY(settings.height / 2))
    .force("collide", d3.forceCollide(8))
    .stop();
// Advance the simulation manually for 100 ticks so the nodes settle into position
for (let i = 0; i < 100; i++) simulation.tick();
Beeswarm Plot
What algorithms are necessary to express this data?
// Source (https://github.com/d3/d3-force/blob/master/src/collide.js)
// Excerpt: node, xi, yi, ri, ri2, strength, and jiggle are defined outside this function
function apply(quad, x0, y0, x1, y1) {
  var data = quad.data, rj = quad.r, r = ri + rj;
  if (data) {
    if (data.index > node.index) {
      var x = xi - data.x - data.vx,
          y = yi - data.y - data.vy,
          l = x * x + y * y;
      if (l < r * r) {
        if (x === 0) x = jiggle(), l += x * x;
        if (y === 0) y = jiggle(), l += y * y;
        l = (r - (l = Math.sqrt(l))) / l * strength;
        node.vx += (x *= l) * (r = (rj *= rj) / (ri2 + rj));
        node.vy += (y *= l) * r;
        data.vx -= x * (r = 1 - r);
        data.vy -= y * r;
      }
    }
    return;
  }
  return x0 > xi + r || x1 < xi - r || y0 > yi + r || y1 < yi - r;
}
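The core idea in the excerpt above (detect two overlapping circles, then push them apart along the line between their centers) can be sketched in a much simpler form. This is not the d3-force algorithm: it has no quadtree and no velocities, it checks every pair directly, and it separates coincident circles with a fixed offset instead of a random jiggle. The function name `relax`, the pull factors, and the sample nodes are all hypothetical.

```javascript
// Naive relaxation: pull each node toward its target, then separate overlaps.
function relax(nodes, radius, iterations = 200) {
  const minDist = 2 * radius;
  for (let k = 0; k < iterations; k++) {
    // Pull each node toward its target x and toward the axis y = 0
    for (const n of nodes) {
      n.x += (n.targetX - n.x) * 0.1;
      n.y += (0 - n.y) * 0.05;
    }
    // Check every pair and push overlapping circles apart
    for (let i = 0; i < nodes.length; i++) {
      for (let j = i + 1; j < nodes.length; j++) {
        const a = nodes[i];
        const b = nodes[j];
        let dx = b.x - a.x;
        let dy = b.y - a.y;
        // Coincident centers have no direction; use a fixed vertical offset
        if (dx === 0 && dy === 0) dy = 1e-3;
        const dist = Math.sqrt(dx * dx + dy * dy);
        if (dist < minDist) {
          // Move each circle half of the overlap, in opposite directions
          const shift = (minDist - dist) / 2;
          a.x -= (dx / dist) * shift;
          a.y -= (dy / dist) * shift;
          b.x += (dx / dist) * shift;
          b.y += (dy / dist) * shift;
        }
      }
    }
  }
  return nodes;
}

// Hypothetical data: three circles share a target x, one sits far away
const nodes = [100, 100, 100, 200].map((targetX) => ({ targetX, x: targetX, y: 0 }));
relax(nodes, 10);
console.log(nodes);
```

Checking every pair costs quadratic time in the number of nodes. The `quad` argument and the bounding-box test on the last line of the d3 excerpt are what let the library skip most of those comparisons.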
All 3 stages need to be done well
Concepts → Ideas
Ideas → Data
Data → Algorithm
In order to visualize concepts, we need to isolate specific ideas, identify underlying data structures, and leverage corresponding algorithms.
Thank You
Book: bit.ly/ps4ds
Twitter: @mf_viz
Presentation Resources: mfviz.com/strata-2019