Failing Faster

Christopher Gandrud

IQSS Tech Talk

22 March 2017

Caveat

Talk is most applicable to R statistical software development

But, We Want You

Part of a larger IQSS effort ("Social Science Software Toolkit")

 

We especially want your contributions for other languages.

What is software development?

(often) failure management

Complexity

Software involves many interconnected parts and user behaviours.

 

Difficult to anticipate how a change will affect software behaviour.

 

So your software will fail to meet expectations.

When do you want to fail?

As soon as possible

How can we fail faster?

Test-Driven Development

We all Test

But (frequently) not:

  • systematically
  • automatically
  • regularly

Testing Basics

What is a test?

Comparison of an output to an expectation

Fail if the output does not meet the expectation
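In R, the idea in miniature might look like this (a hypothetical check, not from the talk):

# A test in miniature: compare an output to an expectation; fail if they differ
output      <- sum(1:10)
expectation <- 55L

if (!identical(output, expectation)) {
    stop('Test failed: expected ', expectation, ' but got ', output)
}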

Expectation -> Collaboration

Spelling out your expectations so that they can be automatically tested makes them clear to collaborators.

 

-- Lets everyone know when a collaborator has broken something --

Enforce API & Backwards Compatibility

Tests help ensure that software actually follows its stated API, avoiding API drift.

 

By enforcing an API across software updates, tests enhance backwards compatibility.
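As an illustration (a hypothetical function and test, not Zelig code), a test can pin down the parts of a return value that make up the API:

library(testthat)

# Hypothetical function whose return names are part of its documented API
summarise_sample <- function(x) {
    list(mean = sum(x) / length(x), n = length(x))
}

test_that('summarise_sample keeps its documented return structure', {
    out <- summarise_sample(1:10)
    expect_named(out, c('mean', 'n'))  # renaming these would break the API
    expect_equal(out$mean, 5.5)
})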

Types of Tests

  • Unit tests: test individual units (lowest level) of source code
    • In R: usually individual functions or classes

 

  • Integration tests: test units in combination
    • In R: functions that work in combination in an expected workflow
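A rough sketch of the difference, using testthat and made-up functions:

library(testthat)

# Unit test: one function checked in isolation
to_celsius <- function(f) (f - 32) * 5 / 9

test_that('to_celsius converts the boiling point of water', {
    expect_equal(to_celsius(212), 100)
})

# Integration test: functions checked together in an expected workflow
drop_missing <- function(x) x[!is.na(x)]
sample_mean  <- function(x) sum(x) / length(x)

test_that('drop_missing then sample_mean gives the expected result', {
    expect_equal(sample_mean(drop_missing(c(1, NA, 3))), 2)
})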

Other Types

There is a proliferation of testing types (e.g. in the V-Model), including:

  • Require tests: test whether your software is able to do some required task

 

  • Failure tests: test whether your software fails to do tasks it is not "required" to do
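One way to read the distinction in testthat terms (hypothetical safe_log() function; the failure-test pattern reappears later in the talk):

library(testthat)

safe_log <- function(x) {
    if (any(x <= 0)) stop('x must be positive.', call. = FALSE)
    log(x)
}

# Require test: the software does a required task
test_that('safe_log computes logs of positive numbers', {
    expect_equal(safe_log(exp(1)), 1)
})

# Failure test: the software refuses a task it is not "required" to do
test_that('safe_log fails informatively for non-positive input', {
    expect_error(safe_log(-1), 'x must be positive.')
})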


Aside: Limitations

Software necessarily has limitations.

 

Let your users know when they have reached these limitations and why, and suggest what to do about it.

 

Let them know as soon as possible.

Fail Fast

# Initialize Zelig5 least squares object
z5 <- zls$new()

# Estimate ls model
z5$zelig(Fertility ~ Education, data = swiss)

# Simulate quantities of interest
z5$sim()

# Plot quantities of interest
z5$graph()

> Invalid call: missing setx()

graph() returns:

Warning message:
In par(old.par) : calling par(new=TRUE) with no plot

Fail Fast

# Initialize Zelig5 least squares object
z5 <- zls$new()

# Estimate ls model
z5$zelig(Fertility ~ Education, data = swiss)

# Simulate quantities of interest
z5$sim()

> Invalid call: missing setx()

Now sim() warns immediately:

Warning message:
No simulations drawn, likely due to insufficient inputs.

Be Informative

> Invalid call: no estimation model type specified

z.out <- zelig(y ~ x1 + x2, data = example)

Not informative:

Error in models4[[model]]: invalid subscript type 'symbol'.

Better:

Estimation model type was not specified.
Select estimation model type with the "model" argument.

Testing surface

Definition:

The sum of the different software behaviours that are tested.

 

Aim:

Maximise the testing surface.

Trade off

But there is a trade-off between maximising the testing surface and keeping test run time reasonable.

 

The longer your tests take to run, the less likely you are to run them.

Obligatory xkcd comic

This is not CRAN

Coverage

Definition:

The proportion of source code that is run during testing.

 

Why?:

A proxy for the testing surface.

Proxy

High test coverage does not guarantee that your tests are an accurate proxy for the testing surface.

# Create function to find the mean
# (deliberately broken: it just returns x, not the mean)
my_mean <- function(x) {
    mean <- x
    return(mean)
}

# Test: expect_error(..., NA) asserts that *no* error is thrown
testthat::expect_error(my_mean(1:10), NA)

100% coverage, But Poor Proxy

Effective Tests

Have well-designed expectations

# Create function to find mean
my_mean <- function(x) {
    mean <- x
    return(mean)
}

# Test
testthat::expect_equal(my_mean(1:10), 5.5)

Well-designed expectations


Error message:

Error: my_mean(1:10) not equal to 5.5.
Lengths differ: 10 vs 1

(R) Testing Tools

Dynamic Documentation

Executable code can be included as documentation examples with Roxygen2.

Example code is executed as part of R CMD check (including on CRAN).

#' Find the mean of a numeric vector
#' 
#' @param x a numeric vector
#' 
#' @examples
#' my_mean(1:10) 

# Create function to find mean
my_mean <- function(x) {
    mean <- x
    return(mean)
}

This keeps documentation and function source code in the same place.

But the only implicit expectation is that, given the numeric vector 1:10, the function will not throw an error.

Expectations

Better than nothing, but not great.

testthat

The testthat package allows you to specify a broader range of expectations including:

expect_equal
expect_equivalent
expect_match
expect_true
expect_false
expect_error
expect_warning

and more.
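A few of these in action (illustrative values only):

library(testthat)

expect_true(is.numeric(1:10))
expect_false(is.character(1:10))
expect_match('failing faster', 'faster')           # regular expression match
expect_equal(sqrt(2)^2, 2)                         # equal within numerical tolerance
expect_warning(as.numeric('a'), 'NAs introduced')  # warning with a matching message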

library(testthat)

# Create function to find mean
my_mean <- function(x) {
    mean <- x
    return(mean)
}

test_that("Correct mean of a numeric vector is returned", {
    expect_equal(my_mean(1:10), 5.5)
})

Require Testing (Example Replay)

Error message:

Error: my_mean(1:10) not equal to 5.5.
Lengths differ: 10 vs 1

# Create function to find mean
my_mean <- function(x) {
    if (!is.numeric(x)) stop('x must be numeric.', call. = FALSE) 
    mean <- sum(x) / length(x)
    return(mean)
}

# Failure test
test_that('my_mean failure test when supplied a character string', {
    expect_error(my_mean('A'), 'x must be numeric.')
})

Failure testing

Hard to Test

Some behaviours (e.g. calls to external services or interactive input) are hard to test directly. Stubs and mocks can sometimes stand in for them.
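A low-tech version of the stub idea, sketched with a hypothetical summarise_remote() whose external step is passed in as an argument so a test can substitute a stand-in (download_data here is just a placeholder default):

library(testthat)

# Hypothetical: summarise data fetched from a remote source.
# Passing the fetch step in as an argument makes it easy to replace in tests.
summarise_remote <- function(url, fetch = download_data) {
    x <- fetch(url)
    sum(x) / length(x)
}

test_that('summarise_remote averages fetched data (stubbed fetch step)', {
    stub_fetch <- function(url) 1:10  # stands in for the real download
    expect_equal(summarise_remote('http://example.com', fetch = stub_fetch), 5.5)
})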

Set up Test Suite

In the package directory:

devtools::use_testthat()

Creates:

  • tests/testthat.R:
    • test suite set up, including what packages to load 
  • tests/testthat:
    • R files with tests
  • testthat in DESCRIPTION Suggests
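For example, a test file in tests/testthat/ might look like this (hypothetical file name and contents; my_mean() comes from the package under test):

# tests/testthat/test-my_mean.R
library(testthat)

test_that('my_mean returns the mean of a numeric vector', {
    expect_equal(my_mean(1:10), 5.5)
})

test_that('my_mean fails informatively for non-numeric input', {
    expect_error(my_mean('A'), 'x must be numeric.')
})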

Running Tests Locally

Run tests locally with:

testthat::test_package('yourpackage')  # replace with your package's name

# or

devtools::test()

# or

devtools::check(args = c('--as-cran'))

or in RStudio

Continuous Integration

  • Original aim: avoid "integration hell" by merging changes into master as often as possible

 

  • Also refers to build servers that build the software and (can) run included tests.
    • Useful for testing remotely on "clean" systems
    • Can test on multiple operating systems

 

  • Travis CI: Linux/macOS
  • AppVeyor: Windows

Setup Steps

  1. Have your package source code on GitHub
  2. Include .travis.yml and appveyor.yml in your project's root directory
    1. Can automate with devtools::use_travis() and devtools::use_appveyor()
  3. Log in to the services and tell them to watch your package's GitHub repo, e.g. in Travis CI:

Setup Steps

Now every time you push changes to GitHub:

Dynamically Report CI results with README badges 

Code Coverage

Once a package uses testthat, you can find and explore code coverage with the covr package:

library(covr)

cov <- package_coverage()

shine(cov)

shine(cov) returns a Shiny app to explore code coverage.

Travis CI + codecov.io

Save code coverage results from the Travis build, then display and track them with codecov.io.


Setup

1. Add to .travis.yml:

r_github_packages:
    - jimhester/covr

after_success:
    - Rscript -e 'covr::codecov()'

2. Log in to codecov.io with your GitHub username and add the package repo:

Whenever Travis builds, Codecov updates.

Dynamically Report CodeCov results with README badges 

Workflow

Zelig Test-Driven Workflow

Want: Bugfix/new feature

  1. Always start at master
  2. Create feature/hotfix branch
  3. Create test
  4. Create feature/fix
  5. Run test locally
  6. Did it pass? If yes, merge into master
  7. Push master to GitHub to initiate CI
  8. CI passes + accumulated changes

Future: IQSSdevtools

IQSSdevtools (R) Report Card

IQSSdevtools::check_best_practices()
Documentation:
  readme: yes
  news: yes
  bugreports: yes
  vignettes: yes
  pkgdown_website: no
License:
  gpl3_license: yes
Version_Control:
  git: yes
  github: yes
Testing:
  uses_testthat: yes
  uses_travis: yes
  uses_appveyor: yes
  build_check:
    build_check_completed: yes
    no_check_warnings: yes
    no_check_errors: yes
    no_check_notes: yes
  test_coverage: 86
Background:
  package_name: Zelig
  package_version: 5.0-18
  package_commit_sha: d5a8dcf0c9655ea187d2533fa977919be90612f6
  iqssdevtools_version: 0.0.0.9000
  check_time: 2017-03-17 16:16:55


Suggestions?

 

Variations for Other Languages?
