Mutation Testing: How good are your unit tests, really?

Mark Robinson

How do I test the quality of a test suite?

That's QA's problem

I do TDD, I know my tests are good

Are you sure?
What about the tests you didn't write?
How do you have confidence in test refactors?

Pull requests enforce test quality

Do not always catch everything
Time intensive activity

We measure code coverage

Line
Branch
Statement
Data, Path, Modified Condition etc.

None of these coverage metrics tell you which parts of your code have been tested

What code coverage does tell you

This code was executed as part of a test

This code was not executed as part of a test

Executing code and testing code is not the same

import org.junit.Test;

public class CalculatorTest {

    @Test
    public void seniorEngineerSaysMustHaveTestCoverage() {
        // Executes add() but asserts nothing, so it passes whatever add() returns.
        int result = Calculator.add(5, 2);
    }

}

public class Calculator {

    public static int add(int first, int second) {
        return first + second;
    }

}
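For contrast, a version of the test that actually checks the result might look like the sketch below. It uses plain Java with a hypothetical check helper instead of a test framework so that it stands alone; in a real project the check would normally be a JUnit assertion. The name CalculatorCheck is an assumption made for illustration.

```java
// Minimal sketch: the same call as above, but now the result is verified.
// CalculatorCheck and its check() helper are hypothetical names.
class Calculator {
    static int add(int first, int second) {
        return first + second;
    }
}

class CalculatorCheck {
    // Throws if the actual value differs from the expected one.
    // This is what turns "executing" the code into "testing" it.
    static void check(int expected, int actual) {
        if (expected != actual) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }

    // Fails if add() is changed to return anything other than the sum.
    static void addReturnsSumOfBothArguments() {
        check(7, Calculator.add(5, 2));
    }
}
```

Both versions give the same line coverage for add(), but only this one can fail.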

Code coverage tells you what code has not been tested

All OK?

In 1971 Richard Lipton proposed a good solution to the problem

"Fault diagnosis of computer programs"

If we want to know if a test suite has properly checked some code...

1) Introduce a bug!

2) See if the test suite fails

Here's a bug:

(But our tests still pass!)

  public void countIfGreaterThanNine(int number) {
    if (number > 10) { // bug: the name promises "greater than nine"
      count++;
    }
  }
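Why the tests can still pass is easier to see when the intended and buggy conditions sit side by side. The sketch below rewrites the check as a pure function and compares two hypothetical test suites. All names are illustrative, and the idea that the existing tests only use values far from the boundary is an assumption.

```java
import java.util.function.IntPredicate;

// Sketch: the intended and the buggy condition, plus two illustrative test suites.
class GreaterThanNine {
    // Intended behaviour: accept numbers strictly greater than nine.
    static boolean intended(int number) {
        return number > 9;
    }

    // The buggy version: the boundary is off by one.
    static boolean buggy(int number) {
        return number > 10;
    }

    // A weak suite that only uses values far from the boundary.
    // Both versions pass it, so the bug goes unnoticed.
    static boolean weakSuitePasses(IntPredicate impl) {
        return !impl.test(5) && impl.test(15);
    }

    // A stronger suite that also checks the values either side of the boundary.
    // Only the intended version passes it.
    static boolean boundarySuitePasses(IntPredicate impl) {
        return weakSuitePasses(impl) && !impl.test(9) && impl.test(10);
    }
}
```

The value 10 is the only input on which the two versions disagree, so a suite has to include it to notice the bug.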

We can introduce these bugs in many ways, called mutation operators

>= to <=
>= to ==
== to !=
a == b to false
object.aMethod() to // object.aMethod()
object.aMethod() to object.anotherMethod()
null returns
etc etc

Applying mutation operators to code creates a mutant

We can create a lot of mutants and do it automatically
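As a rough illustration of automatic mutant generation, the sketch below applies a small table of operators to one line of source text. This is a simplification for the sake of the example: production tools usually mutate a syntax tree or compiled bytecode, not raw text. MutantGenerator and its operator table are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: text-level mutation of a single source line.
class MutantGenerator {
    // Each pair is {original token, replacement token}.
    static final String[][] OPERATORS = {
        {">=", "<="},
        {">=", "=="},
        {"==", "!="},
    };

    // Returns one mutated copy of the line per applicable operator,
    // replacing only the first occurrence of the original token.
    static List<String> mutate(String sourceLine) {
        List<String> mutants = new ArrayList<>();
        for (String[] op : OPERATORS) {
            int at = sourceLine.indexOf(op[0]);
            if (at >= 0) {
                mutants.add(sourceLine.substring(0, at) + op[1]
                        + sourceLine.substring(at + op[0].length()));
            }
        }
        return mutants;
    }
}
```

Calling mutate on a line containing one >= comparison yields two mutants, one per matching operator; a line with no matching token yields none.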

Killed: Test Suite Fails

Survived: Test Suite Passes

Killing is good!
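The killed and survived outcomes can be sketched as a small loop: run the suite against each mutant and record whether it still passes. MutationRun, the hand-written mutants of a "number > 9" check, and the example suite are all assumptions made for illustration, not a real tool's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntPredicate;
import java.util.function.Predicate;

// Sketch: classify hand-written mutants as killed or survived.
class MutationRun {
    // A mutant is killed when the suite fails against it
    // and survives when the suite still passes.
    static Map<String, String> classify(Map<String, IntPredicate> mutants,
                                        Predicate<IntPredicate> suitePasses) {
        Map<String, String> outcome = new LinkedHashMap<>();
        for (Map.Entry<String, IntPredicate> m : mutants.entrySet()) {
            outcome.put(m.getKey(), suitePasses.test(m.getValue()) ? "SURVIVED" : "KILLED");
        }
        return outcome;
    }

    // Illustrative mutants of the condition "number > 9".
    static Map<String, IntPredicate> exampleMutants() {
        Map<String, IntPredicate> mutants = new LinkedHashMap<>();
        mutants.put("> to >=", n -> n >= 9);
        mutants.put("> to <", n -> n < 9);
        mutants.put("condition to false", n -> false);
        return mutants;
    }

    // Illustrative suite: one value on each side, but nothing at the boundary.
    static boolean exampleSuite(IntPredicate isGreaterThanNine) {
        return !isGreaterThanNine.test(5) && isGreaterThanNine.test(15);
    }
}
```

With this suite, the "> to <" and "condition to false" mutants are killed, while the "> to >=" mutant survives because no test uses the value 9.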

If the test suite can find these artificial bugs, can it find real ones?

The competent programmer hypothesis

Programmers are generally competent enough to produce code which is at least almost correct

Three types of Bugs

Built the wrong thing
Built it wrong
Oops

The coupling effect

Tests that can distinguish a program differing from a correct one by only simple errors can also implicitly distinguish more complex errors

A. Offutt. 1989. The coupling effect: fact or fiction. In Proceedings of the ACM SIGSOFT '89 third symposium on Software testing, analysis, and verification (TAV3), Richard A. Kemmerer (Ed.). http://dx.doi.org/10.1145/75308.75324

Strong empirical evidence for this

So if your tests can find these mutants, they will probably find real bugs

But what about this?

public void someFunction(int i) {
    if (i <= 100) {
        throw new IllegalArgumentException();
    }
    if (i > 100) { // changed from >= to >
        doSomething();
    }
}

It is not possible to write a test to kill this mutant: once the first check has passed, i is always greater than 100, so i >= 100 and i > 100 give the same result

The mutant is said to be equivalent
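One way to build intuition for equivalence is to compare an original and a mutant over a range of inputs. The sketch below assumes an original whose second condition is i >= 100 and a mutant that changes it to i > 100, both behind a guard that rejects i <= 100. EquivalenceProbe is hypothetical, and doSomething() is modelled as a boolean return value. Sampling like this can only fail to find a difference; it does not prove equivalence.

```java
import java.util.function.IntPredicate;

// Sketch: probe whether any input in a range distinguishes original from mutant.
class EquivalenceProbe {
    // Original: returns true when doSomething() would have been called.
    static boolean original(int i) {
        if (i <= 100) {
            throw new IllegalArgumentException();
        }
        return i >= 100;
    }

    // Mutant: >= changed to >. After the guard, i is at least 101.
    static boolean mutant(int i) {
        if (i <= 100) {
            throw new IllegalArgumentException();
        }
        return i > 100;
    }

    // Encodes the outcome as text so that exceptions can be compared too.
    static String observe(IntPredicate impl, int i) {
        try {
            return String.valueOf(impl.test(i));
        } catch (IllegalArgumentException e) {
            return "IllegalArgumentException";
        }
    }

    // Returns true if no input in [from, to] produces different outcomes.
    static boolean indistinguishable(int from, int to) {
        for (int i = from; i <= to; i++) {
            String expected = observe(EquivalenceProbe::original, i);
            String actual = observe(EquivalenceProbe::mutant, i);
            if (!expected.equals(actual)) {
                return false;
            }
        }
        return true;
    }
}
```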

Equivalent mutants can highlight redundancy...

public void someFunction(int i) {
    if (i <= 100) {
        throw new IllegalArgumentException();
    }
    if (i >= 100) { // always true here, so the check is redundant
        doSomething();
    }
}

public void someFunction(int i) {
    if (i <= 100) {
        throw new IllegalArgumentException();
    }
    doSomething(); // redundant check removed
}

Mutation testing highlights code which definitely is tested

Gives a very high confidence in the test suite

It can highlight redundant code

It can sometimes find bugs

It effectively tests your tests

Does it fit?

A development activity, not a QA step

...mutation testing with productive mutants does not add a significant overhead to the software development process...

Petrović, Goran, Marko Ivanković, et al. "An Industrial Application of Mutation Testing: Lessons, Challenges, and Research Directions." Proceedings of the International Workshop on Mutation Analysis (Mutation). IEEE Press, Piscataway, NJ, USA. 2018.

Based on testing at Google

How do I try it out?

<plugin>
    <groupId>org.pitest</groupId>
    <artifactId>pitest-maven</artifactId>
    <version>1.4.1</version>
</plugin>

mvn clean install org.pitest:pitest-maven:mutationCoverage

Tips

Mutation testing takes time
- Target specific parts of the code base on large projects
- On CI, probably do not run on every commit
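To target specific parts of a large code base, the plugin declaration can carry a configuration section that limits which classes are mutated and which tests are run. The fragment below is a sketch: targetClasses and targetTests are parameters of the PIT Maven plugin, but the com.example.billing package is a placeholder assumption to be replaced with a real package.

```xml
<plugin>
    <groupId>org.pitest</groupId>
    <artifactId>pitest-maven</artifactId>
    <version>1.4.1</version>
    <configuration>
        <!-- Only mutate classes in this (placeholder) package -->
        <targetClasses>
            <param>com.example.billing.*</param>
        </targetClasses>
        <!-- Only run tests from this (placeholder) package against the mutants -->
        <targetTests>
            <param>com.example.billing.*</param>
        </targetTests>
    </configuration>
</plugin>
```

Narrowing the target to one package at a time keeps a run short enough to be useful during development.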

The Demo
