Visualizing Statistical and Machine Learning Concepts

Michael Freeman, University of Washington

@mf_viz

#ODSC

Today's Objective

Develop a process for designing and building visual explanations of statistical and machine learning concepts.

What is division?

With the people around you, take 5 minutes and draw a visual explanation of this concept.

In order to visualize concepts, we need to isolate specific ideas, identify underlying data structures, and leverage corresponding algorithms.

Process

Concepts → Ideas

Ideas → Data

Data → Algorithm

Concepts → Ideas

What foundational ideas underlie your statistical concept?

"What is the Central Limit Theorem?"

Central Limit Theorem

"Distribution of the sample mean"

What foundational ideas underlie this statistical concept?

Ideas Underlying CLT

Variation within a population

Sampling and how it varies

Repeated sampling from your population

Distributions and normality

Distributions of sample means
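
The ideas above can be sketched as a small simulation: draw repeated samples from a population, take the mean of each, and look at how those means are distributed. This is only an illustration, not code from the talk. The exponential population, the sample size of 30, the 2,000 repetitions, and every function name below are assumptions chosen for the example.

```javascript
// Seeded pseudo-random generator (mulberry32) so repeated runs give the same draws.
function makeRng(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Variation within a population: one draw from a skewed (exponential) population
// with mean 1 and standard deviation 1. The population is a hypothetical choice.
function drawFromPopulation(rng) {
  return -Math.log(1 - rng());
}

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function standardDeviation(values) {
  const m = mean(values);
  return Math.sqrt(mean(values.map((v) => (v - m) ** 2)));
}

// Repeated sampling: take numSamples samples of size sampleSize and keep each mean.
function simulateSampleMeans(rng, sampleSize, numSamples) {
  const means = [];
  for (let i = 0; i < numSamples; i++) {
    const sample = Array.from({ length: sampleSize }, () => drawFromPopulation(rng));
    means.push(mean(sample));
  }
  return means;
}

const cltMeans = simulateSampleMeans(makeRng(42), 30, 2000);

// The sample means should center near the population mean (1), with a spread
// near 1 / sqrt(30), even though the population itself is skewed.
console.log("mean of sample means:", mean(cltMeans).toFixed(3));
console.log("sd of sample means:", standardDeviation(cltMeans).toFixed(3));
```

The array of sample means is the data a histogram of the sampling distribution would be drawn from.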

Process

Concepts → Ideas

Ideas → Data

Data → Algorithm

Ideas → Data

What data expresses your idea?

"What is hierarchical modeling?"

Hierarchical Modeling

What data expresses these ideas?

Source code available here

Data Generation Demo
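
As a rough sketch of the kind of data that expresses hierarchical structure, the block below generates observations nested within groups: each group gets its own mean drawn around a grand mean, and each observation is drawn around its group's mean. This is not the demo's source code. The function names, the normal distributions, and the parameter values are all assumptions made for illustration.

```javascript
// Seeded pseudo-random generator (mulberry32) so the generated data is repeatable.
function makeRng(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Normal draw via the Box-Muller transform.
function randomNormal(rng, mu, sigma) {
  const u = 1 - rng(); // in (0, 1], so Math.log(u) is finite
  const v = rng();
  return mu + sigma * Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Hypothetical generator: group means vary around grandMean (groupSd),
// and observations vary around their own group mean (withinSd).
function generateHierarchicalData(rng, options) {
  const { numGroups, perGroup, grandMean, groupSd, withinSd } = options;
  const rows = [];
  for (let g = 0; g < numGroups; g++) {
    const groupMean = randomNormal(rng, grandMean, groupSd);
    for (let i = 0; i < perGroup; i++) {
      rows.push({
        group: g,
        groupMean: groupMean,
        value: randomNormal(rng, groupMean, withinSd),
      });
    }
  }
  return rows;
}

const hierarchicalData = generateHierarchicalData(makeRng(7), {
  numGroups: 5,
  perGroup: 20,
  grandMean: 50,
  groupSd: 10,
  withinSd: 2,
});

console.log(hierarchicalData.slice(0, 3));
```

Each row records its group, so the same array can drive both a pooled view (all values together) and a per-group view.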

Process

Concepts → Ideas

Ideas → Data

Data → Algorithm

Data → Algorithm

What algorithm enables you to express your data?

"What is conditional probability?"

Bouncing

What algorithms are necessary to express this data?

// Assumes mySvg (a D3 selection of an <svg>), myData (an array of {x, y} objects),
// xScale, yScale, and radius are defined elsewhere.

// Select circles inside the svg and bind data to the selection
var bubbles = mySvg.selectAll('circle')
    .data(myData);

// Use D3.js to create circles for new data and position them
bubbles
    .enter()
    .append('circle')
    .attr('cx', (d) => xScale(d.x))
    .attr('cy', (d) => yScale(d.y))
    .attr('r', radius)
    // Merge entering circles with existing (updating) ones and stage a transition
    .merge(bubbles)
    .transition()
    .delay(() => Math.random() * 50)
    .ease(d3.easeBounce)
    .attr('cx', (d) => xScale(d.x))
    .attr('cy', (d) => yScale(d.y));

Bouncing

What algorithms are necessary to express this data?

// D3.js bounceOut algorithm
var b1 = 4 / 11,
    b2 = 6 / 11,
    b3 = 8 / 11,
    b4 = 3 / 4,
    b5 = 9 / 11,
    b6 = 10 / 11,
    b7 = 15 / 16,
    b8 = 21 / 22,
    b9 = 63 / 64,
    b0 = 1 / b1 / b1;

// Maps normalized time t in [0, 1] to eased progress using four parabolic arcs
export function bounceOut(t) {
    return (t = +t) < b1 ? b0 * t * t :
        t < b3 ? b0 * (t -= b2) * t + b4 :
        t < b6 ? b0 * (t -= b5) * t + b7 :
        b0 * (t -= b8) * t + b9;
}
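
To see what the easing function above produces, the sketch below evaluates it at a few time steps and uses the result to interpolate a vertical position. It repeats the function without `export` so that it runs as a plain script. The helper `easedPosition` and the start and end coordinates are hypothetical names and values introduced only for this example.

```javascript
// Constants and function reproduced from the bounceOut listing, minus `export`.
var b1 = 4 / 11,
    b2 = 6 / 11,
    b3 = 8 / 11,
    b4 = 3 / 4,
    b5 = 9 / 11,
    b6 = 10 / 11,
    b7 = 15 / 16,
    b8 = 21 / 22,
    b9 = 63 / 64,
    b0 = 1 / b1 / b1;

function bounceOut(t) {
    return (t = +t) < b1 ? b0 * t * t :
        t < b3 ? b0 * (t -= b2) * t + b4 :
        t < b6 ? b0 * (t -= b5) * t + b7 :
        b0 * (t -= b8) * t + b9;
}

// Hypothetical helper: interpolate between two coordinates using eased progress.
function easedPosition(start, end, t) {
    return start + (end - start) * bounceOut(t);
}

// Sample the curve: progress reaches 1 (the "floor") at t = 4/11, rebounds
// to 0.75 at t = 6/11, and settles at 1 when t = 1.
var sampleTimes = [0, 2 / 11, 4 / 11, 6 / 11, 8 / 11, 1];
sampleTimes.forEach(function (t) {
    console.log(t.toFixed(3), easedPosition(0, 300, t).toFixed(1));
});
```

Each arc is a parabola, which is why the eased motion reads as a ball falling, hitting the floor, and rebounding to successively smaller heights.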
	    

All three stages need to be done well

Concepts → Ideas

Ideas → Data

Data → Algorithm

In order to visualize concepts, we need to isolate specific ideas, identify underlying data structures, and leverage corresponding algorithms.

Thank you

All materials available at mfviz.com/odsc-2017

@mf_viz
