Three principles of data science: predictability, computability, and stability
Bin Yu and Karl Kumbier
The data science life cycle guides decision making

Stability assumptions initiate the data science life cycle
- Formulating the domain question or problem
- Collecting data
- Cleaning and preprocessing data
- Exploratory data analysis
Does an alternative "appropriate" analysis produce similar findings to the performed analysis?
The PCS framework to communicate and evaluate human judgment calls
- Computability: Can I tractably build/train my model?
  - Computational constraints
- Predictability: Does my model capture external reality?
  - Prediction & evaluation functions
  - Internal v. external testing data
- Stability: Are my results consistent with respect to "reasonable" perturbations?
  - Stability target
  - Data/model perturbations, generative models
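
As a rough illustration of the stability question, the sketch below perturbs the data by bootstrap resampling and checks how much a stability target moves. Everything here is an assumption made for illustration: the synthetic data, the choice of a least-squares slope as the target, the bootstrap as the "reasonable" perturbation, and the hypothetical names `slope` and `bootstrap_targets`. It is not the authors' procedure.

```python
import random
import statistics


def slope(xs, ys):
    """Least-squares slope of y on x (the stability target in this sketch)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def bootstrap_targets(xs, ys, n_perturb=200, seed=0):
    """Recompute the target on bootstrap resamples (one kind of data perturbation)."""
    rng = random.Random(seed)
    n = len(xs)
    targets = []
    for _ in range(n_perturb):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        targets.append(slope([xs[i] for i in idx], [ys[i] for i in idx]))
    return targets


# Synthetic data (assumed): y = 2x + standard normal noise.
data_rng = random.Random(1)
xs = [data_rng.gauss(0, 1) for _ in range(300)]
ys = [2.0 * x + data_rng.gauss(0, 1) for x in xs]

targets = bootstrap_targets(xs, ys)
spread = statistics.stdev(targets)  # small spread suggests a stable target
print(f"mean slope {statistics.fmean(targets):.3f}, spread {spread:.3f}")
```

A small spread relative to the size of the target suggests the finding is stable under this particular perturbation. Other perturbations (subsampling, alternative preprocessing, alternative models) would need their own justification.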
PCS inference: evaluating uncertainty with justified perturbations
- Formulate problem (e.g. target of interest, perturbations)
- Screen out models with low prediction accuracy
- Generate target value perturbation distributions
- Summarize target value perturbation distribution
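
The four steps above can be sketched in code, under loudly stated assumptions. The target of interest is a regression slope, the candidate models are three simple fits, the perturbation is a bootstrap of the training data, screening keeps models whose held-out error is within 10% of the best, and the summary is a 2.5%-97.5% percentile range. All of these choices, and the names `MODELS` and `pcs_inference`, are hypothetical and stand in for whatever a real analysis would justify.

```python
import random
import statistics


def fit_ols(xs, ys):
    """Intercept and slope by least squares."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def fit_origin(xs, ys):
    """Slope only, with the intercept fixed at zero."""
    return 0.0, sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def fit_constant(xs, ys):
    """Mean-only model: slope fixed at zero."""
    return statistics.fmean(ys), 0.0


# Step 1 (formulate): target = slope; perturbations = candidate models x bootstrap.
MODELS = {"ols": fit_ols, "origin": fit_origin, "constant": fit_constant}


def mse(params, xs, ys):
    a, b = params
    return statistics.fmean([(y - (a + b * x)) ** 2 for x, y in zip(xs, ys)])


def pcs_inference(train, test, n_perturb=200, tol=0.10, seed=0):
    xs, ys = train
    # Step 2 (screen): drop models with poor held-out prediction accuracy.
    errors = {name: mse(fit(xs, ys), *test) for name, fit in MODELS.items()}
    best = min(errors.values())
    kept = [name for name, err in errors.items() if err <= (1 + tol) * best]
    # Step 3 (generate): target values across data and model perturbations.
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_perturb):
        idx = [rng.randrange(n) for _ in range(n)]
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        slopes.extend(MODELS[name](bx, by)[1] for name in kept)
    # Step 4 (summarize): percentile range of the perturbation distribution.
    slopes.sort()
    lo = slopes[int(0.025 * (len(slopes) - 1))]
    hi = slopes[int(0.975 * (len(slopes) - 1))]
    return kept, (lo, hi)


# Synthetic data (assumed): y = 2x + standard normal noise.
data_rng = random.Random(1)
x_all = [data_rng.gauss(0, 1) for _ in range(300)]
y_all = [2.0 * x + data_rng.gauss(0, 1) for x in x_all]
train = (x_all[:200], y_all[:200])
test = (x_all[200:], y_all[200:])

kept, (lo, hi) = pcs_inference(train, test)
print(kept, round(lo, 3), round(hi, 3))
```

The screening step comes before the perturbation step, so a model that predicts poorly (here, the mean-only fit) never contributes to the reported range.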

Feature selection in linear model setting: simulation setup
Feature selection in linear model setting: simulation results (n = 1000)

PCS documentation transparently reports human judgment calls
- Domain problem formulation (narrative)
- Data collection and storage (narrative)
- Data cleaning and visualization (narrative, code, visualizations)
- PCS inference (narrative, code, visualizations)
- Conclusions/recommendations (narrative, visualizations)
PCS discussion