Trey Causey
@treycausey
treycausey.com
slides.com/treycausey/pydata2015
Flickr: michaelmattiphotography
Flickr: sukiweb
http://www3.cs.stonybrook.edu/~aychakrabort/
Assumptions at every step.
treycausey.com/software_dev_skills.html
(for data scientists)
... what could go wrong?
def mean(values): return sum(values) / len(values)
import pytest
def test_mean(): assert mean([1, 2, 3, 4, 5]) == 3
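The one-line `mean` can pass a single example test and still rest on unstated assumptions about its input. Below is a minimal sketch of a few inputs that break those assumptions, assuming Python 3. The test names are illustrative and not from the talk, and plain `try`/`except` is used so the block runs without any test runner installed.

```python
import math


def mean(values):
    return sum(values) / len(values)


def test_mean_empty_input():
    # An empty list has no mean; the function raises instead of returning.
    try:
        mean([])
    except ZeroDivisionError:
        pass
    else:
        raise AssertionError("expected ZeroDivisionError for empty input")


def test_mean_propagates_nan():
    # A single missing value (NaN) silently poisons the whole result.
    assert math.isnan(mean([1.0, float("nan"), 3.0]))


def test_mean_rejects_strings():
    # Numbers read from a file as strings fail only at call time.
    try:
        mean(["1", "2", "3"])
    except TypeError:
        pass
    else:
        raise AssertionError("expected TypeError for string input")
```

Each test pins down one assumption (non-empty, no missing values, numeric type) that the single-example test never exercises.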
... deterministic answers may not exist
Test properties, not specific values
Make assumptions about data shape & type
Test probabilistically
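One way to read these three points together: when the exact output is random, a test can still check its shape, its type, and whether its summary statistics fall inside a tolerance. The sketch below assumes a hypothetical `draw_sample` function standing in for any stochastic step. The seed, sample size, and tolerances are illustrative choices, not values from the talk.

```python
import random
import statistics


def draw_sample(n, seed):
    """Hypothetical stand-in for a stochastic pipeline step."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def test_sample_properties():
    n = 10_000
    sample = draw_sample(n, seed=42)

    # Shape and type: properties that hold for every run.
    assert len(sample) == n
    assert all(isinstance(x, float) for x in sample)

    # Probabilistic: the standard error of the mean is 1 / sqrt(n) = 0.01,
    # so a tolerance of 0.05 allows five standard errors.
    assert abs(statistics.fmean(sample)) < 0.05
    assert abs(statistics.stdev(sample) - 1.0) < 0.05
```

Fixing the seed makes the test repeatable, and the tolerance is wide enough that a different seed would very rarely fail.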
For "defensive" data analysis
"The raison d’être for engarde is the fact of life that data are messy."
Property-based testing inspired by Haskell's QuickCheck
(and be slightly diabolical about it)
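To illustrate the idea without depending on any particular library, here is a hand-rolled sketch of property-based testing: generate many random inputs, add a few deliberately awkward ones, and assert properties that must hold for all of them. The generator `random_int_lists` and the chosen properties are assumptions made for this example. A real property-based testing library would also shrink failing inputs to a minimal case, which this sketch does not do.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def random_int_lists(n_cases, seed=0):
    """Hypothetical generator of non-empty random integer lists."""
    rng = random.Random(seed)
    for _ in range(n_cases):
        size = rng.randint(1, 50)
        yield [rng.randint(-10**6, 10**6) for _ in range(size)]


# The "slightly diabolical" part: edge cases chosen by hand.
DIABOLICAL_CASES = [[0], [-10**6], [7, 7, 7, 7], [-10**6, 10**6]]


def test_mean_properties():
    cases = list(random_int_lists(200)) + DIABOLICAL_CASES
    for values in cases:
        m = mean(values)
        # The mean always lies between the smallest and largest value.
        assert min(values) <= m <= max(values)
        # Reordering the input does not change the mean.
        assert mean(list(reversed(values))) == m
        # Shifting every value by a constant shifts the mean by the same amount.
        shifted = mean([v + 10 for v in values])
        assert math.isclose(shifted, m + 10, rel_tol=1e-9, abs_tol=1e-9)
```

None of these assertions names a specific expected value, so the same test covers hundreds of inputs.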
engarde: is_monotonic(), within_n_std(), within_set()
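The kind of check these names describe can be sketched in plain Python. The functions below are hypothetical stand-ins written for this example: they operate on lists rather than pandas DataFrames and are not engarde's API or signatures. Each one asserts its condition and returns the data unchanged so checks can be chained.

```python
import statistics


def check_monotonic(values, increasing=True):
    """Assert that values never decrease (or never increase)."""
    pairs = list(zip(values, values[1:]))
    if increasing:
        ok = all(a <= b for a, b in pairs)
    else:
        ok = all(a >= b for a, b in pairs)
    assert ok, "values are not monotonic"
    return values


def check_within_n_std(values, n=3):
    """Assert that every value lies within n standard deviations of the mean."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    outliers = [v for v in values if abs(v - mu) > n * sd]
    assert not outliers, f"values beyond {n} std: {outliers}"
    return values


def check_within_set(values, allowed):
    """Assert that every value is one of the allowed values."""
    unexpected = set(values) - set(allowed)
    assert not unexpected, f"unexpected values: {sorted(unexpected)}"
    return values
```

A usage sketch: `check_within_set(check_monotonic([1, 2, 2, 5]), {1, 2, 5})` returns the list when both assumptions hold and raises `AssertionError` as soon as one does not.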
scikit-learn, SciPy, NumPy have excellent test suites
pandas, SciPy, NumPy have excellent testing methods
Numerical computing is tricky.
Try to use existing tools as much as possible.
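A small standard-library sketch of why exact equality is fragile for floating-point results and why tolerance-based comparison helps. The test names are illustrative. For arrays and data frames, `numpy.testing.assert_allclose` and `pandas.testing.assert_frame_equal` apply the same idea.

```python
import math


def naive_total(values):
    """Accumulate left to right, one rounding error per addition."""
    total = 0.0
    for v in values:
        total += v
    return total


def test_exact_equality_is_fragile():
    # 0.1, 0.2, and 0.3 have no exact binary representation.
    assert 0.1 + 0.2 != 0.3
    # Comparing within a relative tolerance expresses the real intent.
    assert math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-9)


def test_existing_tools_handle_accumulation():
    values = [0.1] * 10
    # Rounding errors accumulate in the hand-written loop.
    assert naive_total(values) != 1.0
    # math.fsum tracks the lost low-order bits and returns the exact sum.
    assert math.fsum(values) == 1.0
```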
@treycausey