Mutation Testing at Scale

@giorgionatili

$ whoami

Engineering leader at Amazon (Kindle rendering team)
Organizer of Droidcon Boston (and maybe Seattle)
Organizer of SwiftFest Boston and Seattle
Meetups and community enthusiast
Lead of the System Architecture & Design learning track at Amazon

Disambiguate Software Quality

Quality starts with clean code

int d; // elapsed time in days

VS

int elapsedTimeInDays;

Measuring Code Quality

Clean code is testable

Adding tests cleans the code

Rigorous TDD

Tests as First-Class Citizens

Anatomy of a Good Test

Self-descriptive
Simple
SOLID

namespace stringutil {
   
   std::string tail(const std::string& word) {
      if (word.length() == 0) return "";
      return word.substr(1);
   }
}

What to test?

TEST(AString, AllTheLettersAfterTheHeadShouldBeTheTail) {
   ASSERT_THAT(tail("xyz"), Eq("yz"));
}

TEST(AString, TheTailOfAnEmptyStringShouldBeEmpty) {
   ASSERT_THAT(tail(""), Eq(""));
}

TEST(AString, TheTailOfASingleCharacterStringShouldBeEmpty) {
   ASSERT_THAT(tail("X"), Eq(""));
}

3S Based Tests

A Good Test Suite

Reliable
Accurate
Fast

Potential Test Suite Quality Metrics

Line coverage
Tests reliability
Execution speed

Which are the right metrics?

Automate Quality Checks

Is adding tests beneficial?

What is the right test coverage?

TEST(AString, AllTheLettersAfterTheHeadShouldBeTheTail) {
   ASSERT_THAT(tail("xyz"), Eq("yz"));
}

TEST(AString, TheTailOfAnEmptyStringShouldBeEmpty) {
   ASSERT_THAT(tail(""), Eq(""));
}

TEST(AString, TheTailOfASingleCharacterStringShouldBeEmpty) {
   tail("X");
}

What is the test coverage?
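
The third test above calls tail("X") without asserting anything, so it runs the same lines as before and can never fail. A minimal sketch of why line coverage cannot see the difference, using hand-rolled checks instead of GoogleTest; brokenTail and the two test helpers are hypothetical names introduced only for this illustration:

```cpp
#include <cassert>
#include <string>

// Implementation from the slides.
std::string tail(const std::string& word) {
    if (word.length() == 0) return "";
    return word.substr(1);
}

// Hypothetical faulty variant: returns the whole word instead of its tail.
std::string brokenTail(const std::string& word) {
    if (word.length() == 0) return "";
    return word.substr(0);
}

using TailFn = std::string (*)(const std::string&);

// Mirrors the assertion-free test: executes the code, checks nothing.
bool testWithoutAssertion(TailFn fn) {
    fn("X");
    return true;  // cannot fail, whatever fn returns
}

// Mirrors the asserting test: same lines executed, result verified.
bool testWithAssertion(TailFn fn) {
    return fn("X") == "";
}

void demonstrate() {
    assert(testWithoutAssertion(tail));
    assert(testWithoutAssertion(brokenTail));  // the fault goes unnoticed
    assert(testWithAssertion(tail));
    assert(!testWithAssertion(brokenTail));    // the fault is detected
}
```

Both helpers execute exactly the same lines of the function under test, so a line-coverage report would look identical for either of them.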

The Oracle Problem

Different Approach

Learning from earlier mistakes to prevent them from happening again
Simulate earlier mistakes and see whether the resulting defects get discovered

Fault-Based Testing

Not Always Black or White!

Fuzzing All the Things

Goals

Measure the degree to which a system, component, or function can work with an invalid or stressful input
Deviate from the normal expected input of a program to analyze the consequences

bool checkEvenOdd(int num){
    return num % 2 == 0;
}

Input Validation

#include <cstring>

// Assumes c_array is non-null and NUL-terminated; nothing here enforces it
bool isDigit(const char *c_array){
    for (size_t k = 0; k < strlen(c_array); k++) {
        if (c_array[k] < '0' ||
            c_array[k] > '9') {
            return false;
        }
    }
    return true;
}

Yet Pretty Open

The Heartbleed Bug

heartbleed.com

Benefits

Find bugs early
Discover security issues
Discover fragile areas of the codebase

Approaches

Dumb fuzzers (mutation)
Intelligent fuzzers (generation)
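
A minimal sketch of the "dumb" mutational approach, assuming a hypothetical target (isDigitsBuggy, which wrongly accepts ':') and a reference oracle built on std::isdigit. It overwrites one random byte of a valid seed per iteration and records every input on which target and oracle disagree. Real fuzzers such as AFL or libFuzzer add coverage feedback and crash detection that are not modeled here:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical target with an off-by-one fault: ':' (the character after '9') passes.
bool isDigitsBuggy(const std::string& s) {
    for (char c : s) {
        if (c < '0' || c > ':') return false;
    }
    return true;
}

// Reference oracle built on the standard library.
bool isDigitsReference(const std::string& s) {
    return std::all_of(s.begin(), s.end(),
                       [](unsigned char c) { return std::isdigit(c) != 0; });
}

// Overwrite one random byte of the seed per iteration and collect the
// inputs on which the target disagrees with the oracle.
std::vector<std::string> fuzz(const std::string& seed, int iterations, unsigned rngSeed) {
    assert(!seed.empty());
    std::mt19937 rng(rngSeed);
    std::uniform_int_distribution<std::size_t> position(0, seed.size() - 1);
    std::uniform_int_distribution<int> byte(0, 255);
    std::vector<std::string> findings;
    for (int i = 0; i < iterations; ++i) {
        std::string input = seed;
        input[position(rng)] = static_cast<char>(byte(rng));
        if (isDigitsBuggy(input) != isDigitsReference(input)) {
            findings.push_back(input);
        }
    }
    return findings;
}
```

Calling fuzz("12345", 10000, 42u) returns the mutated inputs that expose the fault; with this target every finding contains a ':'.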

Drawbacks

Fuzz testing alone cannot give a complete picture of a system's security threats or bugs
Fuzz testing mostly detects simple faults or threats
Effective fuzzing requires significant time
Setting boundary-value conditions with random inputs is problematic

Tools

Fuzzing Frameworks
- boofuzz
- BDFuzz
Mutational Fuzzers (alter existing data samples to create new test data)
- AFL / libFuzzer
- Radamsa

Fuzzing Doesn't Listen

Mutation Testing

Unexpected Program Mutations

What Is It?

Mutation testing evaluates the quality of existing software tests
The idea is to modify (i.e., mutate) code covered by tests in a small way and check whether the existing test set detects or rejects the change

Mutation Testing Framework

Alter source code in one very small way
Run unit tests
Record if any tests fail
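
The three steps above can be sketched in a few lines, assuming hand-written mutants of a hypothetical maxOf function rather than automatically generated ones. A real framework alters the source or bitcode and rebuilds; here each mutant is simply a lambda that differs from the original in one operator:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

using MaxFn = std::function<int(int, int)>;

struct Mutant {
    std::string change;  // description of the single small change
    MaxFn impl;          // the program with that change applied
};

struct Outcome {
    std::string change;
    bool killed;  // true when at least one test failed against the mutant
};

// Code under test (hypothetical example).
int maxOf(int a, int b) { return a > b ? a : b; }

// Stand-in for "run unit tests": true when every check passes.
bool runTests(const MaxFn& candidate) {
    return candidate(1, 2) == 2 && candidate(5, 3) == 5;
}

// Hand-written mutants, each differing from maxOf in one operator.
std::vector<Mutant> makeMutants() {
    return {
        {"a > b  ->  a < b",  [](int a, int b) { return a < b ? a : b; }},
        {"a > b  ->  a >= b", [](int a, int b) { return a >= b ? a : b; }},
    };
}

// Run the suite once per mutant and record whether any test failed.
std::vector<Outcome> analyze(const std::vector<Mutant>& mutants) {
    assert(runTests(maxOf));  // the suite must pass on the unmodified code first
    std::vector<Outcome> outcomes;
    for (const Mutant& mutant : mutants) {
        outcomes.push_back({mutant.change, !runTests(mutant.impl)});
    }
    return outcomes;
}
```

With these two mutants, the first is killed and the second survives: replacing > with >= in maxOf does not change its result for any input, so no test could ever tell them apart.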

Mutants

Each transformation results in a new program, called a mutant, that differs from the original
When the existing tests detect and reject such a modification, the mutant is said to be killed

Killing Mutants

Metrics

Test suite effectiveness is measured by its ability to detect those mutants
The mutation score is the ratio of killed mutants to the total number of mutants
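
As a small worked example of the ratio above (the counts in the usage note are made up for illustration, not taken from a real run):

```cpp
#include <cassert>

// Mutation score = killed mutants / total mutants, in the range [0, 1].
double mutationScore(int killed, int total) {
    assert(total > 0 && killed >= 0 && killed <= total);
    return static_cast<double>(killed) / total;
}
```

For instance, a hypothetical run that kills 3 out of 4 mutants has a score of mutationScore(3, 4), i.e. 0.75.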

What About Test Coverage?

This is where mutation testing comes into play!

Different Mutations

Statement mutation
Value mutation
Decision mutation

// Initial code:
if(a < b) {
  c = 10;
} else {
  c = 20;
}

// Changed code:
if(a < b) {
  d = 10;
} else {
  d = 20;
}

Statement Mutation

// Initial code:
int mod = 1000000007;
int a = 12345678;
int b = 98765432;
int c = (a + b) % mod;

// Mutated code:
int mod = 1007;
int a = 12345678;
int b = 98765432;
int c = (a + b) % mod;

Value Mutation

// Initial code:
if(a < b) {
  c = 10;
} else {
  c = 20;
}

// Mutated code:
if(a > b) {
  c = 10;
} else {
  c = 20;
}

Decision Mutation

Dedicated mutation operators

int greatestCommonDivisor(int x, int y) {

    int tmp;
    while(y != 0) {
        tmp = x % y; // The % operator can be replaced
        x = y;       // with +, -, * or /
        y = tmp;
    }
    return x;
}

Arithmetic Operator Replacement

int greatestCommonDivisor(int x, int y) {

    int tmp;
    while(y != 0) {    // The != operator can be
        tmp = x % y;   // replaced by <, >, <=, >= or ==
        x = y;
        y = tmp;
    }
    return x;
}

Relational Operator Replacement
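
A sketch of how two of these replacements behave against a small test suite, assuming the loop condition is passed in as a comparator so that a mutant is just a different function object (gcdWith and the wrapper names are hypothetical). Only the < and > mutants are modeled: replacements that let y == 0 reach the loop body would evaluate x % 0, which is undefined behaviour:

```cpp
#include <cassert>
#include <functional>

// The slide's loop with the relational operator supplied by the caller.
template <typename Compare>
int gcdWith(int x, int y, Compare keepLooping) {
    while (keepLooping(y, 0)) {
        int tmp = x % y;
        x = y;
        y = tmp;
    }
    return x;
}

int original(int x, int y)      { return gcdWith(x, y, std::not_equal_to<int>()); }
int mutantLess(int x, int y)    { return gcdWith(x, y, std::less<int>()); }     // != -> <
int mutantGreater(int x, int y) { return gcdWith(x, y, std::greater<int>()); }  // != -> >

void demonstrate() {
    assert(original(12, 8) == 4);
    assert(mutantLess(12, 8) != 4);      // killed: the loop never runs
    assert(mutantGreater(12, 8) == 4);   // survives a positive-only test
    assert(original(12, -8) == 4);
    assert(mutantGreater(12, -8) != 4);  // killed once a negative input is tested
}
```

The > mutant is a useful finding: it survives until the suite exercises a negative argument, which points at an input class the tests had not covered.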

if(a && b) 
// Potential mutations
if(a || b)
if(a & b)
if(a | b)
if(a ^ b)
if(false)
if(true)
if(a)
if(b)

Conditional Operator Replacement
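
Which of these mutants a test suite kills depends entirely on the inputs it tries. A minimal sketch, assuming a and b are plain bool values without side effects and modeling each mutant from the list above as a lambda (the Mutant struct and function names are hypothetical):

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct Mutant {
    std::string text;
    std::function<bool(bool, bool)> eval;
};

// The replacements listed on the slide for `a && b`.
std::vector<Mutant> mutantsOfAnd() {
    return {
        {"a || b", [](bool a, bool b) { return a || b; }},
        {"a & b",  [](bool a, bool b) { return a & b; }},
        {"a | b",  [](bool a, bool b) { return a | b; }},
        {"a ^ b",  [](bool a, bool b) { return a ^ b; }},
        {"false",  [](bool, bool) { return false; }},
        {"true",   [](bool, bool) { return true; }},
        {"a",      [](bool a, bool) { return a; }},
        {"b",      [](bool, bool b) { return b; }},
    };
}

// Returns the mutants that agree with `a && b` on every given input.
std::vector<std::string> survivors(const std::vector<std::pair<bool, bool>>& inputs) {
    std::vector<std::string> alive;
    for (const Mutant& mutant : mutantsOfAnd()) {
        bool killed = false;
        for (const auto& in : inputs) {
            if (mutant.eval(in.first, in.second) != (in.first && in.second)) killed = true;
        }
        if (!killed) alive.push_back(mutant.text);
    }
    return alive;
}

void demonstrate() {
    std::vector<std::pair<bool, bool>> weak{std::make_pair(true, true)};
    std::vector<std::pair<bool, bool>> strong{std::make_pair(true, true),
                                              std::make_pair(true, false),
                                              std::make_pair(false, true)};
    assert(survivors(weak).size() == 6);    // only `a ^ b` and `false` are killed
    assert(survivors(strong).size() == 1);  // only `a & b` is left
}
```

Testing only (true, true) leaves six of the eight mutants alive. Adding (true, false) and (false, true) kills all but a & b, which behaves exactly like a && b for side-effect-free bool operands and therefore cannot be killed by any test.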

Many Others

Assignment Operator Replacement
Unary Operator Insertion
Scalar Variable Replacement
Absolute Value Insertion

Mutation Testing

Identifies areas of code that are not tested properly
Identifies hidden defects that can’t be detected using other testing methods
Assesses the quality of the test cases
Assesses error propagation in the program

Mutation Testing + Mutation Analysis

A Lot of Data

Mutation testing based on LLVM

Supported Languages

Java, JVM
C, C++
JavaScript
Rust
Swift
Ruby
PHP
C#
Clojure
Python
Scala

Let's Focus On

Java, JVM
C, C++
JavaScript
Rust
Swift
Ruby
PHP
C#
Clojure
Python
Scala

LLVM

Available Tools

Dextool Mutate, a plugin for the Dextool framework
MuCPP, which generates mutants at the source-code level
Mull, an LLVM-based tool with a focus on C and C++
CCMutator, based on higher-order mutation operators implemented as opt passes on LLVM IR
XEMU, based on the QEMU software emulator

What is Mull

An open-source tool for mutation testing based on LLVM
An extendable tool to analyze the effectiveness of your test suite
A command-line tool that produces a SQLite database or an HTML report of the tested program

mull-cxx -test-framework=GoogleTest -mutators=conditional \
         -reporters=Elements -report-dir=./report \
         -report-name=MULL-TEST-ONE \
         -workers=4 -compdb-path compile_commands \
         -disable-cache=0 \
         ./bin/core-test

How to Run Mull

Why Mull

Efficient mutant generation
Support for dry-run mode
Effective sandbox model
Support for failing fast

Supported Mutators

Mathematical
Conditional negator
Remove void function
Replace call
Scalar value replacement
Many others! :)

$ ./mull-cxx --help

Explore the Mutators

--mutators=<value>                            - Choose mutators:
    =all                                        -   default, experimental
    =arithmetic                                 -   cxx_arithmetic_add_to_sub, cxx_arithmetic_add_assign_to_sub_assign, cxx_arithmetic_post_inc_to_post_dec, cxx_arithmetic_pre_inc_to_pre_dec, cxx_arithmetic_sub_to_add, cxx_arithmetic_sub_assign_to_add_assign, cxx_arithmetic_post_dec_to_post_inc, cxx_arithmetic_pre_dec_to_pre_inc, cxx_arithmetic_mul_to_div, cxx_arithmetic_mul_assign_to_div_assign, cxx_arithmetic_div_to_mul, cxx_arithmetic_div_assign_to_mul_assign, cxx_arithmetic_rem_to_div, cxx_arithmetic_rem_assign_to_div_assign
    =bitwise                                    -   cxx_bitwise_lshift_to_rshift, cxx_bitwise_lshift_assign_to_rshift_assign, cxx_bitwise_rshift_to_lshift, cxx_bitwise_rshift_assign_to_lshift_assign, cxx_bitwise_and_to_or, cxx_bitwise_and_assign_to_or_assign, cxx_bitwise_or_to_and, cxx_bitwise_or_assign_to_and_assign, cxx_bitwise_xor_to_or, cxx_bitwise_xor_assign_to_or_assign
    =conditional                                -   and_or_replacement_mutator, negate_mutator, conditionals_boundary_mutator, negate_relational
    =conditionals_boundary_mutator              -   cxx_relational_le_to_lt, cxx_relational_lt_to_le, cxx_relational_ge_to_gt, cxx_relational_gt_to_ge
    =constant                                   -   scalar_value_mutator
    =cxx                                        -   conditionals_boundary_mutator, negate_relational, arithmetic, numbers
    =default                                    -   cxx_arithmetic_add_to_sub, negate_mutator, remove_void_function_mutator
    =experimental                               -   and_or_replacement_mutator, numbers, replace_call_mutator, scalar_value_mutator, conditionals_boundary_mutator, negate_relational, arithmetic, bitwise
    =functions                                  -   replace_call_mutator, remove_void_function_mutator
    =math                                       -   cxx_arithmetic_add_to_sub, cxx_arithmetic_sub_to_add, cxx_arithmetic_mul_to_div, cxx_arithmetic_div_to_mul
    =negate_relational                          -   cxx_relational_gt_to_le, cxx_relational_ge_to_lt, cxx_relational_lt_to_ge, cxx_relational_le_to_gt, cxx_relational_eq_to_ne, cxx_relational_ne_to_eq
    =numbers                                    -   cxx_number_init_const, cxx_number_assign_const
    =and_or_replacement_mutator                 -   Replaces && with ||, || with &&
    =cxx_arithmetic_add_assign_to_sub_assign    -   Replaces += with -=
    =cxx_arithmetic_add_to_sub                  -   Replaces + with -
    =cxx_arithmetic_div_assign_to_mul_assign    -   Replaces /= with *=
    =cxx_arithmetic_div_to_mul                  -   Replaces / with *
    =cxx_arithmetic_mul_assign_to_div_assign    -   Replaces *= with /=
    =cxx_arithmetic_mul_to_div                  -   Replaces * with /
    =cxx_arithmetic_post_dec_to_post_inc        -   Replaces x-- with x++
    =cxx_arithmetic_post_inc_to_post_dec        -   Replaces x++ with x--
    =cxx_arithmetic_pre_dec_to_pre_inc          -   Replaces --x with ++x
    =cxx_arithmetic_pre_inc_to_pre_dec          -   Replaces ++x with --x
    =cxx_arithmetic_rem_assign_to_div_assign    -   Replaces %= with /=
    =cxx_arithmetic_rem_to_div                  -   Replaces % with /
    =cxx_arithmetic_sub_assign_to_add_assign    -   Replaces -= with +=
    =cxx_arithmetic_sub_to_add                  -   Replaces - with +
    =cxx_bitwise_and_assign_to_or_assign        -   Replaces &= with |=
    =cxx_bitwise_and_to_or                      -   Replaces & with |
    =cxx_bitwise_lshift_assign_to_rshift_assign -   Replaces <<= with >>=
    =cxx_bitwise_lshift_to_rshift               -   Replaces << with >>
    =cxx_bitwise_or_assign_to_and_assign        -   Replaces |= with &=
    =cxx_bitwise_or_to_and                      -   Replaces | with &
    =cxx_bitwise_rshift_assign_to_lshift_assign -   Replaces >>= with <<=
    =cxx_bitwise_rshift_to_lshift               -   Replaces >> with <<
    =cxx_bitwise_xor_assign_to_or_assign        -   Replaces ^= with |=
    =cxx_bitwise_xor_to_or                      -   Replaces ^ with |
    =cxx_number_assign_const                    -   Replaces 'a = b' with 'a = 42'
    =cxx_number_init_const                      -   Replaces 'T a = b' with 'T a = 42'
    =cxx_relational_eq_to_ne                    -   Replaces == with !=
    =cxx_relational_ge_to_gt                    -   Replaces >= with >
    =cxx_relational_ge_to_lt                    -   Replaces >= with <
    =cxx_relational_gt_to_ge                    -   Replaces > with >=
    =cxx_relational_gt_to_le                    -   Replaces > with <=
    =cxx_relational_le_to_gt                    -   Replaces <= with >
    =cxx_relational_le_to_lt                    -   Replaces <= with <
    =cxx_relational_lt_to_ge                    -   Replaces < with >=
    =cxx_relational_lt_to_le                    -   Replaces < with <=
    =cxx_relational_ne_to_eq                    -   Replaces != with ==
    =negate_mutator                             -   Negates conditionals !x to x and x to !x
    =remove_void_function_mutator               -   Removes calls to a function returning void
    =replace_call_mutator                       -   Replaces call to a function with 42
    =scalar_value_mutator                       -   Replaces zeros with 42, and non-zeros with 0
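
To make one of these operators concrete, the sketch below imitates by hand what remove_void_function_mutator describes: the call to a function returning void disappears. This is not Mull output; Account, audit, deposit and the callAudit switch are hypothetical names used only to simulate the mutant in plain C++:

```cpp
#include <cassert>
#include <vector>

struct Account {
    int balance = 0;
    std::vector<int> auditLog;
};

// A function returning void: its only effect is the entry added to the log.
void audit(Account& account, int amount) {
    account.auditLog.push_back(amount);
}

// callAudit == false imitates the mutant in which the call was removed.
void deposit(Account& account, int amount, bool callAudit = true) {
    account.balance += amount;
    if (callAudit) audit(account, amount);
}

// Weak test: checks the balance only.
bool balanceTest(bool callAudit) {
    Account account;
    deposit(account, 50, callAudit);
    return account.balance == 50;
}

// Stronger test: also checks the side effect of the void call.
bool auditTest(bool callAudit) {
    Account account;
    deposit(account, 50, callAudit);
    return account.balance == 50 && account.auditLog.size() == 1;
}

void demonstrate() {
    assert(balanceTest(true) && balanceTest(false));  // the mutant survives
    assert(auditTest(true) && !auditTest(false));     // the mutant is killed
}
```

A surviving mutant of this kind usually means that a side effect is executed by the tests but never verified.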

Mull's Approach

Mutations can be done either at a high level (i.e., source code) or at a lower level (i.e., bitcode)
Mull applies mutations at a lower level because:
- The same engine can be used to support any LLVM-based language
- The execution time for each mutation is lower

Under the Hood

Loads LLVM bitcode into memory
Inserts instrumentation code into each function
Compiles instrumented LLVM bitcode to machine code
Prepares the machine code for execution by the LLVM JIT engine
At an IR code level, it finds the matching tests
Runs each test using the LLVM JIT engine and collects code coverage information

Drawbacks

Compiling with bitcode enabled is straightforward for a small project but painful for big projects
Mutating the bitcode generates noise because not all the mutations have a representation in the source code
Some mutations generate the same behavior
Mutation testing is time-consuming and requires brain power
It is not a solution for black-box testing

What?!?

Then Why?

To identify potential areas of improvement
To find bugs beyond those exposed by the usual human interaction
To optimize error handling strategies
To assess the quality and health status of the codebase
To estimate the remaining unknown bugs of a program

How is it possible?

Reports and Metrics

Discover

Inspect

Improve

mull-cxx -test-framework=GoogleTest -mutators=math \
         -reporters=Elements -report-dir=./report -report-name=TEST \
         -workers=4 -compdb-path compile_cmd.json -disable-cache=0 \
         -compilation-flags="\
            -isystem /opt/clang+llvm-9.0.0/include/c++/v1 \
            -isystem /opt/clang+llvm-9.0.0/lib/clang/9.0.0/include \
            -isystem /usr/include" \
         ./bin/core-test

Generate Reports

Loading bitcode files (threads: 4): 4/4. Finished in 267ms.
Compiling instrumented code (threads: 4): 4/4. Finished in 11ms.
Loading dynamic libraries (threads: 1): 1/1. Finished in 0ms.
Searching tests (threads: 1): 1/1. Finished in 2ms.
Preparing original test run (threads: 1): 1/1. Finished in 145ms.
Running original tests (threads: 4): 30/30. Finished in 187ms.
Applying function filter: no debug info (threads: 4): 3496/3496. Finished in 14ms.
Applying function filter: file path (threads: 4): 3313/3313. Finished in 22ms.
Instruction selection (threads: 4): 3313/3313. Finished in 23ms.
Searching mutants across functions (threads: 4): 3313/3313. Finished in 369ms.
Applying filter: no debug info (threads: 4): 12355/12355. Finished in 12ms.
Applying filter: file path (threads: 4): 12355/12355. Finished in 35ms.
Applying filter: junk (threads: 4): 12355/12355. Finished in 3657ms.
Prepare mutations (threads: 1): 1/1. Finished in 0ms.
Cloning functions for mutation (threads: 4): 4/4. Finished in 769ms.
Removing original functions (threads: 4): 4/4. Finished in 194ms.
Redirect mutated functions (threads: 4): 4/4. Finished in 11ms.
Applying mutations (threads: 1): 409/409. Finished in 11ms.
Compiling original code (threads: 4): 4/4. Finished in 3625ms.
Running mutants (threads: 4): 409/409. Finished in 4586ms.

Exploring Logs

Mutation Score

(fmt / include / fmt / core.h with math mutators)

Survived Mutant

(fmt / include / fmt / core.h with math mutators)

Compiling instrumented code (threads: 4): 4/4. Finished in 4612ms.
Loading dynamic libraries (threads: 1): 1/1. Finished in 0ms.
Searching tests (threads: 1): 1/1. Finished in 1ms.
Preparing original test run (threads: 1): 1/1. Finished in 86ms.
Running original tests (threads: 4): 30/30. Finished in 203ms.
Applying function filter: no debug info (threads: 4): 3496/3496. Finished in 15ms.
Applying function filter: file path (threads: 4): 3313/3313. Finished in 23ms.
Instruction selection (threads: 4): 3313/3313. Finished in 21ms.
Searching mutants across functions (threads: 4): 3313/3313. Finished in 608ms.
Applying filter: no debug info (threads: 4): 20586/20586. Finished in 15ms.
Applying filter: file path (threads: 4): 20586/20586. Finished in 58ms.
Applying filter: junk (threads: 4): 20586/20586. Finished in 3969ms.
Prepare mutations (threads: 1): 1/1. Finished in 1ms.
Cloning functions for mutation (threads: 4): 4/4. Finished in 1040ms.
Removing original functions (threads: 4): 4/4. Finished in 204ms.
Redirect mutated functions (threads: 4): 4/4. Finished in 13ms.
Applying mutations (threads: 1): 446/446. Finished in 10ms.
Compiling original code (threads: 4): 4/4. Finished in 3808ms.
Running mutants (threads: 4): 446/446. Finished in 5704ms.

Total execution time: 21046ms

Time Constraints

Estimate Remaining Bugs

Switching Perspectives

Count how many open bugs are in your backlog
Label or categorize 30% of them
Run Mull, then categorize and count the bugs it surfaces
Calculate the ratio between categorized and uncategorized bugs

total = 300 known bugs

labeled = 100 categorized bugs

found = 100 total bugs discovered with mutation

labeledFound = 30 existing bugs discovered with mutation

Existing Data

\[ \frac{\text{labeled}}{\text{unknown}} = \frac{\text{labeledFound}}{\text{notLabeledFound}} \]

unknown > 200 potential unknown bugs

Simple Ratio
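
A worked version of the ratio with the numbers from the slides, as a sketch. It assumes that notLabeledFound is the number of bugs found by mutation that were not among the labeled ones (found - labeledFound), which the slides imply but do not state; BugCounts and estimateUnknownBugs are hypothetical names:

```cpp
#include <cassert>

struct BugCounts {
    int labeled;       // backlog bugs that were categorized
    int found;         // bugs surfaced by the mutation run
    int labeledFound;  // bugs that appear in both groups
};

// Solves labeled / unknown = labeledFound / notLabeledFound for unknown.
double estimateUnknownBugs(const BugCounts& counts) {
    assert(counts.labeledFound > 0 && counts.found >= counts.labeledFound);
    // Assumption: every found bug that was not labeled counts as notLabeledFound.
    const int notLabeledFound = counts.found - counts.labeledFound;
    return static_cast<double>(counts.labeled) * notLabeledFound / counts.labeledFound;
}
```

With labeled = 100, found = 100 and labeledFound = 30, notLabeledFound is 70 and the estimate is 100 * 70 / 30, roughly 233, which matches the "more than 200" figure above.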

Don't Panic

Scaling Mutation Testing

Cultural Changes

Code quality is essential to release a successful product
It's possible to objectively evaluate code quality
Automating quality checks is key for effective teams
Tests are code and should be held to the same standards as production code

Technical Challenges

Everyone worked with obsolete compilers
Integrate the checks in your build tools
Minimize the junk in your data
Support every operating system

Outdated Compilers

Compilers can be updated
Software architecture can simplify compiler updates
Updates are like a fresh start

Pipeline Integration

Build infrastructure can integrate any tool
Be thoughtful about when to trigger mutation testing
Analyze your data early and often
Modularize your pipeline

Dev Environment

Invest time to simplify the usage of the tools
Be inclusive, support all the dev platforms
Write exhaustive documentation

Report Analysis

Review the data in isolation and share your findings
Collect the findings and learn from them
Implement a data model to learn from errors

Get ready for a new challenge

Final Remarks

Terminology

A fault is an erroneous part of a program
A mutation is a fault that is introduced into a program
A mutant is a program created from the original one with a potential failure
A variant is a program that shows a deviation at runtime from the original program
A redundant fault is a duplicated fault

Ubiquity

Java, JVM
C, C++
JavaScript
Rust
Swift
Ruby
PHP
C#
Clojure
Python
Scala

Bitcode and Bytecode

Same same but different (JVM instructions are stack-oriented, whereas LLVM bitcode is not)
LLVM bitcode is closer to machine-level code, but isn't bound by a particular architecture

Resources

Fuzzing: github.com/secfigo/Awesome-Fuzzing
LLVM command line guide: llvm.org/docs/CommandGuide/
Mutation Testing: github.com/theofidry/awesome-mutation-testing
Rahul Gopinath papers: rahul.gopinath.org/publications/
