Testing for data scientists

Trey Causey







  • Day
    • Data scientist at Dato
  • Nights/weekends
    • Sports analytics fan & consultant
    • Blog at thespread.us


  • Why testing for data scientists?
  • Kinds of tests
  • Quick review of unit testing
  • Unique problems for data scientists
  • Promising Python packages

Life of a data scientist

Flickr: michaelmattiphotography

Flickr: sukiweb

Work looks like this


Data is messy...

... so is your code


Assumptions at every step.

Software development skills for data scientists

  • Writing modular, reusable code
  • Documentation
  • Version control
  • Testing
  • Logging

What is testing?

Ned Batchelder
"Getting Started Testing"

PyCon 2014

When to test?

  • Extract
  • Transform
  • Model -- the Wild West
  • Load
  • Repeat

Testing helps you:

  • Find bugs
  • Check your assumptions
    (by making them explicit)
  • Write simpler code
  • Work with others
  • Be surprised less

Most relevant kinds of tests

(for data scientists)

  • Unit tests
  • Regression (why?!?) tests
  • Integration tests

Unit tests

  • Test one "unit" of code
  • No dependencies on 
    other code you've written
  • Don't require access
    to databases, APIs, etc.
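A minimal sketch of what "no database access" can look like in practice. The function name `average_order_value` and the `fetch_orders` callable are hypothetical, invented for illustration. The stand-in uses `unittest.mock` from the Python 3 standard library (the separate `mock` package plays the same role on older Pythons).

```python
from unittest.mock import Mock


def average_order_value(fetch_orders):
    """Return the mean order amount from whatever fetch_orders() returns.

    fetch_orders is a hypothetical callable that would normally hit a
    database; passing it in makes the arithmetic testable on its own.
    """
    amounts = [order["amount"] for order in fetch_orders()]
    return sum(amounts) / len(amounts)


def test_average_order_value():
    # Replace the database call with a canned answer.
    fake_fetch = Mock(return_value=[{"amount": 10.0}, {"amount": 30.0}])
    assert average_order_value(fake_fetch) == 20.0
    # The unit under test should ask for the data exactly once.
    fake_fetch.assert_called_once_with()
```

The test exercises only the averaging logic. Whether the real query works is a question for an integration test.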

The standard Python unit testing landscape

  • py.test
  • unittest, unittest2
  • nose
  • fixtures
  • mock

A simple function...

... what could go wrong?

def mean(values):
    return sum(values) / len(values)

A simple test

import pytest


def test_mean():
    assert mean([1, 2, 3, 4, 5]) == 3
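Two things that could go wrong with a function this small are integer division on older Pythons and empty input. The sketch below is one possible hardened variant, not the version from the slides: raising `ValueError` on empty input is an assumption about the behaviour you want.

```python
def mean(values):
    """Arithmetic mean that rejects empty input explicitly (an assumed policy)."""
    values = list(values)
    if not values:
        raise ValueError("mean() requires at least one value")
    # float() keeps the division from truncating under Python 2.
    return sum(values) / float(len(values))


def test_mean_of_integers_is_not_truncated():
    assert mean([1, 2]) == 1.5


def test_mean_of_empty_input_raises():
    try:
        mean([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")
```

Each test names one assumption, so a failure points straight at the assumption that broke.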


Why py.test?

  • Less boilerplate
  • Fewer classes
  • Gets you testing quickly
  • Easy to interpret errors

When & what to test?

  • When you change code, add a test.
  • Test the outcome, not the implementation
  • When you find a bug, add a test.
  • Help identify complexity
  • Don't test code that's already tested!

Test-driven development?

Write failing tests first,
fix code until tests pass.

Testing for data science can be a little different...


... deterministic answers may not exist


Laziness (not the good kind)

  • Extract data once, build many models
  • Data is representative of the future
  • Using specific samples to spot-check

Better ways to test


Test properties, not specific values
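As a rough illustration, a property test checks statements that must hold for any input instead of one hand-picked answer. The two properties below (the mean lies between min and max, and shifting the data shifts the mean) are examples chosen for this sketch, and the tolerances are assumptions sized for the value range used.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def test_mean_properties():
    rng = random.Random(0)  # fixed seed so the test is repeatable
    for _ in range(200):
        size = rng.randint(1, 50)
        values = [rng.uniform(-1e6, 1e6) for _ in range(size)]
        m = mean(values)

        # Property 1: the mean lies between the smallest and largest value
        # (small slack allows for floating-point rounding).
        assert min(values) - 1e-6 <= m <= max(values) + 1e-6

        # Property 2: adding a constant to every value adds it to the mean.
        shifted = mean([v + 10.0 for v in values])
        assert math.isclose(shifted, m + 10.0, abs_tol=1e-6)
```

Neither property mentions a specific expected mean, so the test survives changes to the input data.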


Make assumptions about data shape & type
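One way to write those assumptions down is a check function that fails fast. This is a sketch using plain pandas; the column names and the tiny `games` frame are made up for illustration.

```python
import pandas as pd


def check_games_frame(df):
    """Fail fast if the frame does not look the way downstream code assumes."""
    expected_columns = ["team", "points", "won"]  # hypothetical schema
    assert list(df.columns) == expected_columns, "unexpected columns"
    assert len(df) > 0, "no rows"
    assert pd.api.types.is_numeric_dtype(df["points"]), "points must be numeric"
    assert pd.api.types.is_bool_dtype(df["won"]), "won must be boolean"
    assert df["points"].notnull().all(), "points has missing values"
    return df


# Made-up example data.
games = pd.DataFrame(
    {"team": ["A", "B"], "points": [43, 8], "won": [True, False]},
    columns=["team", "points", "won"],
)
checked = check_games_frame(games)
```

Returning the frame unchanged lets the check sit in the middle of a pipeline.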


Test probabilistically
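A sketch of one way to do this: check that a random result falls inside a band around its expected value. The function `simulate_conversions` is hypothetical, and the five-standard-error band is an assumed tolerance, not a rule.

```python
import math
import random


def simulate_conversions(rate, n, rng):
    """Count successes in n independent trials with success probability rate."""
    return sum(1 for _ in range(n) if rng.random() < rate)


def test_conversion_rate_is_plausible():
    rng = random.Random(42)  # fixed seed keeps the test repeatable
    rate, n = 0.2, 10000
    observed = simulate_conversions(rate, n, rng) / n

    # Standard error of a binomial proportion.
    std_error = math.sqrt(rate * (1 - rate) / n)

    # A wide band: a correct implementation should almost never land outside it.
    assert abs(observed - rate) < 5 * std_error
```

A wider band means fewer false alarms but lets smaller bugs through. Without a fixed seed, expect a rare failure by chance alone.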


Some promising packages


  • engarde
  • Hypothesis
  • Feature Forge


engarde

Tom Augspurger


For "defensive" data analysis


"The raison d’être for engarde is the fact of life that data are messy."

Great for ETL on changing data

  • Built with pandas
  • Very lightweight
  • Use for functions that
    accept & return a DataFrame
  • Just add decorators!
    (or use DataFrame.pipe)
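To show the shape of the idea without depending on the package, here is a hand-rolled stand-in written in plain pandas. It is not engarde's code or API: `require_no_missing`, `check_no_missing`, and the column names are hypothetical.

```python
import functools

import pandas as pd


def require_no_missing(func):
    """Hypothetical decorator: assert the returned DataFrame has no NaNs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        assert not result.isnull().any().any(), "result contains missing values"
        return result
    return wrapper


@require_no_missing
def add_point_margin(df):
    """Accepts a DataFrame and returns a DataFrame, so the check can wrap it."""
    out = df.copy()
    out["margin"] = out["points_for"] - out["points_against"]
    return out


def check_no_missing(df):
    """The same check as a plain function, usable with DataFrame.pipe."""
    assert not df.isnull().any().any(), "frame contains missing values"
    return df


# Made-up example data.
clean = pd.DataFrame({"points_for": [24.0, 17.0], "points_against": [10.0, 20.0]})
with_margin = add_point_margin(clean)
piped = clean.pipe(check_no_missing).pipe(add_point_margin)
```

The decorator keeps the check next to the function definition. The `pipe` form keeps it visible in the call chain.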



Hypothesis

David R. MacIver


Property-based testing inspired
by Haskell's QuickCheck

How it works

Generate data randomly
according to some specs


(and be slightly diabolical about it)
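The mechanism can be sketched in a few lines of standard-library Python. This is a toy illustration of the idea, not the package's implementation or API: `diabolical_float_lists` and `find_counterexample` are hypothetical names, and real tools also shrink a failing example to a simpler one, which this sketch does not.

```python
import random

# Values that tend to expose bugs; mixed in on purpose.
EDGE_FLOATS = [0.0, -0.0, 1.0, -1.0, 1e-300, 1e300, -1e300]


def diabolical_float_lists(rng, max_size=20):
    """Yield lists of floats forever, deliberately favouring awkward cases."""
    while True:
        # Empty, one-element and two-element lists are over-represented.
        size = rng.choice([0, 1, 2, rng.randint(0, max_size)])
        yield [
            rng.choice(EDGE_FLOATS) if rng.random() < 0.3
            else rng.uniform(-1e3, 1e3)
            for _ in range(size)
        ]


def find_counterexample(prop, generator, n_examples=500):
    """Return the first generated input for which prop fails or raises."""
    for _, example in zip(range(n_examples), generator):
        try:
            ok = prop(example)
        except Exception:
            ok = False
        if not ok:
            return example
    return None


def naive_mean(values):
    return sum(values) / len(values)


def mean_is_bounded(values):
    m = naive_mean(values)
    return min(values) <= m <= max(values)


counterexample = find_counterexample(
    mean_is_bounded, diabolical_float_lists(random.Random(0))
)
```

Because the generator over-samples empty and tiny lists, the empty-input failure of `naive_mean` turns up within a handful of examples.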


  • Very flexible
  • Plugin support
  • Finds corner cases fast

  • Ideal for code that will be
    accepting input "from the wild"

Other features

  • Works with existing testing frameworks
  • Works with Faker
  • Has a datetime plugin
  • Tests many Python WTFs
  • Experimental NumPy support

Feature Forge

Daniel Moisset
Javier Mansilla
Rafael Carrascosa

Declare feature schemas & test them

Other features


  • Designed with ML and sklearn in mind
  • Can build feature creation & testing pipelines

  • Supports experiments for testing variety of
    features and evaluating many models,
    storing results to a database

Probabilistic testing

Model testing is the Wild West


engarde: is_monotonic(), within_n_std(), within_set()

Don't test algorithms you haven't personally implemented


scikit-learn, SciPy, NumPy have excellent test suites

Testing algorithms you have implemented

pandas, SciPy, NumPy have excellent testing methods
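For example, `numpy.testing.assert_allclose` compares floating-point results within a tolerance instead of exactly. The `standardize` function below is a hypothetical stand-in for an algorithm you wrote yourself, and the tolerances are assumptions.

```python
import numpy as np
from numpy.testing import assert_allclose


def standardize(x):
    """Rescale values to zero mean and unit (population) standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def test_standardize():
    z = standardize([1.0, 2.0, 3.0, 4.0])
    assert_allclose(z.mean(), 0.0, atol=1e-12)
    assert_allclose(z.std(), 1.0, rtol=1e-12)


def test_float_sum():
    # Exact equality fails for this classic case...
    assert 0.1 + 0.2 != 0.3
    # ...but a tolerance-based comparison passes.
    assert_allclose(0.1 + 0.2, 0.3)
```

pandas offers a similar helper for whole frames, `pandas.testing.assert_frame_equal`.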


Numerical computing is tricky.


Try to use existing tools as much as possible.

What I didn't cover


  • Testing MCMC code
    • Follow @tdhopper
  • Continuous integration
  • See @digitallogic's answer here:
    http://bit.ly/testing_algorithms