Spherical Cows

in Data Science

Scott Ernst

2016

Hypothetical Modeling & Simulation

Spherical Cows?

A Dairy Farmer

wants to increase milk production

He Asks 3 Experts for Advice

Physicist

Psychologist

Engineer

Engineer

"You need to increase milking efficiency.
Install larger tubes, retrofitted pumps,
and flow-optimized piping."

Psychologist

"Your cows need to be happier during milking.
Paint the stalls green."

Physicist

"If we assume a spherical cow..."

All Models Are Wrong

Some Models Are Useful

George Box

/All_models_are_wrong

Cunningly chosen parsimonious models

often do provide remarkably useful approximations

George Box

/All_models_are_wrong

Remember Fourier?

y(x) = A \cdot \sin(x) + B \cdot \sin(2x) + C \cdot \sin(3x) + \dots
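
As a rough illustration of the series above, the Python sketch below sums sine terms for an arbitrary list of coefficients. The function names `sine_series` and `square_wave_coeffs` are hypothetical, and the square-wave coefficients are an illustrative choice that does not come from the slides. The point is only that enough sine terms can approximate a shape that looks nothing like a sine.

```python
import math

def sine_series(x, coeffs):
    """Evaluate coeffs[0]*sin(x) + coeffs[1]*sin(2x) + ... at x."""
    return sum(c * math.sin((k + 1) * x) for k, c in enumerate(coeffs))

def square_wave_coeffs(n_terms):
    """Illustrative coefficients: 4/(pi*k) for odd k, 0 for even k."""
    return [4 / (math.pi * k) if k % 2 else 0.0 for k in range(1, n_terms + 1)]

# More terms bring the sum at x = pi/2 closer to the square wave's value of 1.
for n in (1, 5, 50, 400):
    print(n, sine_series(math.pi / 2, square_wave_coeffs(n)))
```

With a single term the value is noticeably above 1; as terms are added the sum settles toward 1, which is the same flexibility that makes a many-parameter model easy to overfit.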

Example: Dinosaurs

Meaningful Differences?

Invalidated Previous Work 

Is Economics Research Replicable? 60 Published Papers from 13 Journals Say "Usually Not"

Finance and Economics Discussion Series
Divisions of Research & Statistics and Monetary Affairs
Federal Reserve Board, Washington, D.C.

http://www.federalreserve.gov/econresdata/feds/2015/files/2015083pap.pdf

Over half of psychology studies fail reproducibility test

The Reproducibility Project

& The Nature Publishing Group

https://osf.io/ezcuj/

Researchers often base their study design on implicit knowledge, without necessarily intending to ... This implicit process can push the results in one direction or another.

John Ioannidis, a biologist at Harvard, in an investigation of the validity of published results

http://newmr.org/blog/most-published-research-findings-are-probably-false-john-ioannidis/

"most published research findings are probably false"

According to some estimates, three-quarters of published scientific papers in the field of machine learning are bunk...

Sandy Pentland, a computer scientist at the Massachusetts Institute of Technology.

http://www.economist.com/news/briefing/21588057-scientists-think-science-self-correcting-alarming-degree-it-not-trouble

Data Science

1. Preparation

2. Processing

3. Presentation

Failures Happen When...

1. Preparation

  • Faulty collection

  • Unsound design

2. Processing

  • Misuse of tools/algorithms

  • Incorrectly groomed data

3. Presentation

  • Poor selection emphasis

  • Bad layout and/or format
Overfitting

Karen Rubin
Building a Quantitative Trading Strategy To Beat the S&P500
PyCon 2016: https://www.youtube.com/watch?v=ll6Tq-wTXXw

Spherical Cows

To The Rescue!

Hypothetical Modeling
& Simulation

OVERLOAD!

Don't Panic

Structure of a Simulation

1. Input

  • Settings / Configuration

  • Initial / Boundary Conditions

2. Evolve

  • Computation

  • Modifies State Data

3. Output

  • Calculations / Analysis

  • Save State Data
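
The three stages above can be sketched as a tiny Python program. This is a minimal skeleton under stated assumptions, not the code from the talk: the function names, the settings, and the toy "object moving at constant velocity" state are all hypothetical placeholders chosen to keep the example short.

```python
import json

def load_input():
    """1. Input: settings plus initial conditions (hypothetical values)."""
    settings = {"steps": 5, "dt": 1.0}
    state = {"time": 0.0, "position": 0.0, "velocity": 2.0}
    return settings, state

def evolve(state, dt):
    """2. Evolve: computation that modifies the state data in place."""
    state["position"] += state["velocity"] * dt
    state["time"] += dt

def write_output(history):
    """3. Output: run a simple analysis and serialize the saved states."""
    distance = history[-1]["position"] - history[0]["position"]
    return json.dumps({"distance": distance, "history": history})

def run():
    settings, state = load_input()
    history = [dict(state)]          # save a copy of the initial state
    for _ in range(settings["steps"]):
        evolve(state, settings["dt"])
        history.append(dict(state))  # save a copy of the state after each step
    return write_output(history)

print(run())
```

Keeping the three stages in separate functions means the evolve step can be swapped out while the input and output handling stay the same.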

Example

Airplane Boarding

/File:Lufthansa_737_interior.jpg

Boeing 737

/File:American_Airlines.Boeing_737-800.LAX.2007.jpg

Boeing 737

Seating Chart

4 Rows

23 Rows

154 Passengers
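
The seating numbers above are consistent with 4 rows of 4 seats plus 23 rows of 6 seats (4 × 4 + 23 × 6 = 154). The seats-per-row figures and seat letters are an assumption made here to match the passenger count; they are not stated on the slides. Under that assumption, a seating chart for a boarding simulation could be built like this, with `build_seating_chart` being a hypothetical helper name:

```python
from itertools import product

# Assumed cabin layout: the seat letters per row are NOT given in the slides.
FRONT_ROWS, FRONT_SEATS = 4, "ACDF"     # assumption: 4 seats across
MAIN_ROWS, MAIN_SEATS = 23, "ABCDEF"    # assumption: 6 seats across

def build_seating_chart():
    """Return a list of (row_number, seat_letter) tuples, front to back."""
    front = list(product(range(1, FRONT_ROWS + 1), FRONT_SEATS))
    main_rows = range(FRONT_ROWS + 1, FRONT_ROWS + MAIN_ROWS + 1)
    main = list(product(main_rows, MAIN_SEATS))
    return front + main

seats = build_seating_chart()
print(len(seats))  # 154 under the assumed layout
```

Each tuple can then serve as a passenger's assigned seat when testing different boarding orders.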

Code & Results...

Time for the Details
