Christopher Gandrud
IQSS Tech Talk
22 March 2017
Talk is most applicable to R statistical software development
Part of a larger IQSS effort ("Social Science Software Toolkit")
We want your contributions, especially for other languages.
Software involves many interconnected parts and user behaviours.
Difficult to anticipate how a change will affect software behaviour.
So, sooner or later, your software will fail to meet expectations.
Spelling out your expectations so that they can be automatically tested makes them clear to your collaborators.
It also lets everyone know when a collaborator has broken something.
Tests help ensure that software actually follows its stated API, avoiding API drift.
By enforcing an API across software updates, tests enhance backwards compatibility.
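For example, a short regression test (using the testthat package discussed below) can pin both a function's output and its argument names, so the test fails if a later update silently changes the API. The fahrenheit_to_celsius() function here is purely a hypothetical illustration:
library(testthat)

# Hypothetical function whose API we want to keep stable across updates
fahrenheit_to_celsius <- function(temp_f) {
    (temp_f - 32) * 5 / 9
}

test_that("fahrenheit_to_celsius keeps its stated API", {
    # Pin a known output value
    expect_equal(fahrenheit_to_celsius(212), 100)
    # Pin the argument name, so renaming it breaks the test
    expect_equal(names(formals(fahrenheit_to_celsius)), "temp_f")
})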
Proliferation of testing types, e.g. V-Model:
Software necessarily has limitations.
Let your users know when they have reached these limitations, why, and suggest what to do about it.
Let them know as soon as possible.
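As a minimal sketch of failing early with an informative message in base R (check_positive() is a hypothetical example, not part of any package discussed here):
# Hypothetical input check: stop or warn as soon as a limitation is hit,
# say why, and suggest what to do about it
check_positive <- function(x) {
    if (!is.numeric(x))
        stop("x must be numeric. Supply a numeric vector, e.g. c(1, 2, 3).",
             call. = FALSE)
    if (any(x <= 0))
        warning("Non-positive values in x were dropped.", call. = FALSE)
    x[x > 0]
}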
# Initialize Zelig5 least squares object
z5 <- zls$new()
# Estimate ls model
z5$zelig(Fertility ~ Education, data = swiss)
# Simulate quantities of interest
z5$sim()
# Plot quantities of interest
z5$graph()
Warning message:
In par(old.par) : calling par(new=TRUE) with no plot
Missing setx()
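For comparison, a sketch of the complete call sequence with the missing step included, assuming the standard Zelig 5 workflow of zelig(), setx(), sim(), and graph():
# Initialize Zelig5 least squares object
z5 <- zls$new()
# Estimate ls model
z5$zelig(Fertility ~ Education, data = swiss)
# Set explanatory variable values (the step missing above)
z5$setx()
# Simulate quantities of interest
z5$sim()
# Plot quantities of interest
z5$graph()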
# Initialize Zelig5 least squares object
z5 <- zls$new()
# Estimate ls model
z5$zelig(Fertility ~ Education, data = swiss)
# Simulate quantities of interest
z5$sim()
Warning message:
No simulations drawn, likely due to insufficient inputs.
Error in models4[[model]]: invalid subscript type 'symbol'.
Estimation model type was not specified.
Select estimation model type with the "model" argument.
z.out <- zelig(y ~ x1 + x2, data = example)
No estimation model type specified
The sum of the different software behaviours that are tested.
Maximise the testing surface.
But there is a trade-off between maximising the testing surface and keeping test run time reasonable.
The longer your tests take to run, the less likely you are to run them.
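One way to manage this trade-off with testthat is to skip slow tests where check time is limited, while still running them locally and on continuous integration. A minimal sketch (the slow test itself is hypothetical):
library(testthat)

test_that("simulated means are centred on zero", {
    # Skip this slow test on CRAN, where check time is limited
    skip_on_cran()

    sims <- replicate(1000, mean(rnorm(1e4)))
    expect_true(abs(mean(sims)) < 0.01)
})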
Obligatory xkcd comic
The proportion of source code that is run during testing.
Proxy for the testing surface.
But high test coverage does not mean that your tests are an accurate proxy for your expectations.
# Create function to find mean
my_mean <- function(x) {
mean <- x
return(mean)
}
# Test
testthat::expect_error(my_mean(1:10), NA)
# Create function to find mean
my_mean <- function(x) {
mean <- x
return(mean)
}
# Test
testthat::expect_equal(my_mean(1:10), 5.5)
Error: my_mean(1:10) not equal to 5.5.
Lengths differ: 10 vs 1
Executable code can be included in documentation examples with Roxygen2.
These examples are executed as part of the CRAN check.
#' Find the mean of a numeric vector
#'
#' @param x a numeric vector
#'
#' @examples
#' my_mean(1:10)
# Create function to find mean
my_mean <- function(x) {
mean <- x
return(mean)
}
But the only implicit expectation is that, given the numeric vector 1:10, the function will not return an error.
The testthat package allows you to specify a broader range of expectations including:
expect_equal
expect_equivalent
expect_match
expect_true
expect_false
expect_error
expect_warning
and more.
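A few of these in use (a minimal sketch with illustrative values):
library(testthat)

expect_true(is.numeric(1:10))
expect_false(any(is.na(1:10)))
expect_match("Fertility ~ Education", "Education")
expect_warning(as.numeric("a"), "NAs introduced by coercion")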
library(testthat)
# Create function to find mean
my_mean <- function(x) {
mean <- x
return(mean)
}
test_that("Correct mean of a numeric vector is returned", {
expect_equal(my_mean(1:10), 5.5)
})
Error: my_mean(1:10) not equal to 5.5.
Lengths differ: 10 vs 1
# Create function to find mean
my_mean <- function(x) {
if (!is.numeric(x)) stop('x must be numeric.', call. = FALSE)
mean <- sum(x) / length(x)
return(mean)
}
# Test
test_that('my_mean failure test when supplied character string', {
expect_error(my_mean('A'), 'x must be numeric.')
})
devtools::use_testthat()
Creates, in the package directory, a tests/testthat/ folder and a tests/testthat.R file, and adds testthat to the Suggests field of the DESCRIPTION.
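The generated tests/testthat.R file contains boilerplate along these lines (shown here for a hypothetical package called mypackage):
library(testthat)
library(mypackage)

test_check("mypackage")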
Run tests locally with:
testthat::test_package()
# or
devtools::test()
# or
devtools::check(args = c('--as-cran'))
or in RStudio
Now every time you push changes to GitHub, Travis CI automatically builds the package and runs your tests.
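For a typical R package, the .travis.yml needed to trigger these builds can be minimal; a sketch assuming Travis CI's standard R language support:
# Minimal .travis.yml for an R package
language: r
cache: packages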
Once a package uses testthat you can find and explore code coverage with the covr package:
library(covr)
cov <- package_coverage()
shine(cov)
codecov.io
1. Add to .travis.yml:
r_github_packages:
- jimhester/covr
after_success:
- Rscript -e 'covr::codecov()'
2. Login to codecov with GitHub username and add package repo:
Always start at master
Create feature/hotfix branch
Create test
Create feature/fix
Run test locally
Did it pass?
If yes, merge into master
Push master to GitHub to initiate CI
Once CI passes and enough changes have accumulated, release a new version.
IQSSdevtools::check_best_practices()
Documentation:
  readme: yes
  news: yes
  bugreports: yes
  vignettes: yes
  pkgdown_website: no
License:
  gpl3_license: yes
Version_Control:
  git: yes
  github: yes
Testing:
  uses_testthat: yes
  uses_travis: yes
  uses_appveyor: yes
  build_check:
    build_check_completed: yes
    no_check_warnings: yes
    no_check_errors: yes
    no_check_notes: yes
  test_coverage: 86
Background:
  package_name: Zelig
  package_version: 5.0-18
  package_commit_sha: d5a8dcf0c9655ea187d2533fa977919be90612f6
  iqssdevtools_version: 0.0.0.9000
  check_time: 2017-03-17 16:16:55