Mutation Testing at Scale

@giorgionatili

$ whoami

  • Engineering leader at Amazon (Kindle rendering team)
  • Organizer of Droidcon Boston (and maybe Seattle)
  • Organizer of SwiftFest Boston and Seattle
  • Meetups and community enthusiast
  • Lead of the System Architecture & Design learning track at Amazon


Disambiguate Software Quality


Quality starts with clean code

int d; // elapsed time in days

VS

int elapsedTimeInDays;

Measuring Code Quality

Clean code is testable

Adding tests cleans the code

Rigorous TDD

Tests as First-Class Citizens


Anatomy of a Good Test

  • Self-descriptive
  • Simple
  • SOLID
#include <string>

namespace stringutil {

   // Returns every character after the first one ("" for an empty string)
   std::string tail(const std::string& word) {
      if (word.empty()) return "";
      return word.substr(1);
   }
}

What to test?

#include <gtest/gtest.h>
#include <gmock/gmock.h>

using namespace testing;
using stringutil::tail;

TEST(AString, AllTheLettersAfterTheHeadShouldBeTheTail) {
   ASSERT_THAT(tail("xyz"), Eq("yz"));
}

TEST(AString, TheTailOfAnEmptyStringShouldBeEmpty) {
   ASSERT_THAT(tail(""), Eq(""));
}

TEST(AString, TheTailOfASingleCharacterStringShouldBeEmpty) {
   ASSERT_THAT(tail("X"), Eq(""));
}

3S Based Tests

A Good Test Suite

  • Reliable
  • Accurate
  • Fast

Potential Test Suite Quality Metrics

  • Line coverage
  • Test reliability
  • Execution speed

Which are the right metrics?

Automate Quality Checks


Is adding tests beneficial?

What is the right test coverage?

TEST(AString, AllTheLettersAfterTheHeadShouldBeTheTail) {
   ASSERT_THAT(tail("xyz"), Eq("yz"));
}

TEST(AString, TheTailOfAnEmptyStringShouldBeEmpty) {
   ASSERT_THAT(tail(""), Eq(""));
}

TEST(AString, TheTailOfASingleCharacterStringShouldBeEmpty) {
   tail("X");  // no assertion: the line is covered, but nothing is checked
}

What is the test coverage?

The Oracle Problem

Different Approach

  • Learning from earlier mistakes to prevent them from happening again
  • Simulate earlier mistakes and see whether the resulting defects get discovered

Fault-Based Testing

Not Always Black or White!

Fuzzing All the Things


Goals

  • Measure the degree to which a system, component, or function can work with an invalid or stressful input
  • Deviate from the normal expected input of a program to analyze the consequences
// Returns true when num is even
bool checkEvenOdd(int num){
    return num % 2 == 0;
}

Input Validation

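#include <cstring>

// Returns true when every character of c_array is in '0'..'9'.
// Note: a null pointer crashes strlen, and "" is accepted as a digit string.
bool isDigit(const char *c_array){
    for (std::size_t k = 0; k < std::strlen(c_array); k++) {
        if (c_array[k] < '0' || c_array[k] > '9') {
            return false;
        }
    }
    return true;
}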

Yet Pretty Open

The Heartbleed Bug

Benefits

  • Early bug detection
  • Discover security issues
  • Discover fragile areas of the codebase

Approaches

  • Dumb fuzzers (mutation)
  • Intelligent fuzzers (generation)
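To make the "dumb" (mutation-based) approach concrete, here is a minimal C++ sketch that is not taken from any of the tools listed below. It starts from a valid seed, blindly overwrites, inserts, or deletes one byte per run, and feeds the result to the isDigit logic from the earlier slide. Real fuzzers mostly watch for crashes; to keep the sketch self-contained it instead assumes a differential oracle, a hypothetical referenceIsDigit built on std::isdigit, and counts the inputs where the two disagree.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstring>
#include <random>
#include <string>

// Target under test: the isDigit logic from the earlier slide.
bool isDigit(const char* c_array) {
    for (std::size_t k = 0; k < std::strlen(c_array); k++) {
        if (c_array[k] < '0' || c_array[k] > '9') return false;
    }
    return true;
}

// Hypothetical independent oracle used for differential checking.
bool referenceIsDigit(const std::string& s) {
    return std::all_of(s.begin(), s.end(),
                       [](unsigned char ch) { return std::isdigit(ch) != 0; });
}

struct FuzzStats {
    int runs = 0;
    int rejected = 0;    // inputs for which isDigit returned false
    int mismatches = 0;  // inputs where target and oracle disagree
};

// "Dumb" mutation fuzzer: each input is the seed with one blind byte-level
// edit (overwrite, insert, or delete), with no knowledge of the input format.
FuzzStats fuzzIsDigit(const std::string& seed, int iterations, unsigned rngSeed) {
    std::mt19937 rng(rngSeed);
    std::uniform_int_distribution<int> byteDist(1, 255);  // no '\0', so strlen sees the whole input
    std::uniform_int_distribution<int> opDist(0, 2);
    FuzzStats stats;
    for (int i = 0; i < iterations; ++i) {
        std::string input = seed;
        std::uniform_int_distribution<std::size_t> posDist(0, input.size());
        const std::size_t pos = posDist(rng);
        const char byte = static_cast<char>(byteDist(rng));
        switch (opDist(rng)) {
            case 0: if (pos < input.size()) input[pos] = byte; break;     // overwrite
            case 1: input.insert(pos, 1, byte); break;                    // insert
            default: if (pos < input.size()) input.erase(pos, 1); break;  // delete
        }
        const bool got = isDigit(input.c_str());
        ++stats.runs;
        if (!got) ++stats.rejected;
        if (got != referenceIsDigit(input)) ++stats.mismatches;
    }
    return stats;
}
```

With a fixed RNG seed the run is repeatable; fuzzIsDigit("12345", 2000, 42) should report zero mismatches and a large number of rejected inputs. It never produces a null pointer, one input the slide's function cannot handle, which shows how a dumb fuzzer can miss whole classes of inputs.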

Drawbacks

  • Fuzz testing alone cannot provide a complete picture of overall security threats or bugs
  • Fuzz testing can detect only simple faults or threats
  • It requires significant time to be effective
  • Reaching specific boundary-value conditions with random inputs is very difficult

Tools

  • Fuzzing Frameworks
    • boofuzz
    • BDFuzz
  • Mutational Fuzzers (alter existing data samples to create new test data)
    • AFL / libFuzzer
    • Radamsa
 

Fuzzing Doesn't Listen

Mutation Testing


Unexpected Program Mutations

What Is It?

  • Mutation testing evaluates the quality of existing software tests
  • The idea is to modify (i.e., mutate) code covered by tests in a small way and check whether the existing test set detects or rejects the change

Mutation Testing Framework

  • Alter source code in one very small way
  • Run unit tests
  • Record if any tests fail
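The three steps can be sketched by hand for the tail function from earlier. In this illustration (not how any particular tool works internally) the mutants are hand-written functions, each test is a predicate over the implementation under test, and a mutant counts as killed when at least one test fails. All names are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

using TailFn = std::function<std::string(const std::string&)>;
using TestFn = std::function<bool(const TailFn&)>;  // returns true when the test passes

// Original implementation from the earlier slide.
std::string tail(const std::string& word) {
    if (word.empty()) return "";
    return word.substr(1);
}

// Step 1: alter the code in one very small way (hand-written mutants).
std::string tailMutantValue(const std::string& word) {   // substr(1) -> substr(0)
    if (word.empty()) return "";
    return word.substr(0);
}
std::string tailMutantLength(const std::string& word) {  // substr(1) -> substr(1, 1)
    if (word.empty()) return "";
    return word.substr(1, 1);
}

// Steps 2 and 3: run the tests against one mutant and record whether any fails.
bool isKilled(const TailFn& mutant, const std::vector<TestFn>& tests) {
    for (const auto& test : tests) {
        if (!test(mutant)) return true;  // a failing test kills the mutant
    }
    return false;  // every test passed: the mutant survived
}

std::vector<bool> runMutants(const std::vector<TailFn>& mutants,
                             const std::vector<TestFn>& tests) {
    std::vector<bool> killed;
    for (const auto& m : mutants) killed.push_back(isKilled(m, tests));
    return killed;
}

// The three asserting tests from the slides.
const std::vector<TestFn> assertingTests = {
    [](const TailFn& f) { return f("xyz") == "yz"; },
    [](const TailFn& f) { return f("") == ""; },
    [](const TailFn& f) { return f("X") == ""; },
};

// Mirrors the test that calls tail("X") without asserting anything.
const std::vector<TestFn> nonAssertingTests = {
    [](const TailFn& f) { f("X"); return true; },
};
```

Both mutants are killed by the three asserting tests, while the test that only calls tail("X") lets both survive even though it covers the same code.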

Mutants

  • Each transformation results in a new program, called a mutant, that differs from the original

  • When the existing tests detect and reject such a modification, the mutant is said to be killed

Killing Mutants

Metrics

  • Test suite effectiveness is measured by its ability to detect those mutants

  • The mutation score is the ratio of killed mutants to the total number of mutants 
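Written out, the mutation score is killed / total. A minimal helper, assuming each mutant's outcome has already been recorded as a boolean (true = killed); the function name is hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Mutation score: killed mutants divided by the total number of mutants.
// killed[i] is true when at least one test failed against mutant i.
double mutationScore(const std::vector<bool>& killed) {
    assert(!killed.empty() && "no mutants were generated");
    const auto killedCount = std::count(killed.begin(), killed.end(), true);
    return static_cast<double>(killedCount) / static_cast<double>(killed.size());
}
```

For example, three killed mutants out of four give a score of 0.75. Equivalent mutants, which no test can kill, keep the score below 1.0 even for a strong suite.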

What About Test Coverage?

This is where mutation testing comes into play!

Different Mutations

  • Statement mutation
  • Value mutation
  • Decision mutation
// Initial code:
if(a < b) {
  c = 10;
} else {
  c = 20;
}

// Changed code:
if(a < b) {
  d = 10;
} else {
  d = 20;
}

Statement Mutation

// Initial code:
int mod = 1000000007;
int a = 12345678;
int b = 98765432;
int c = (a + b) % mod;

// Mutated code:
int mod = 1007;
int a = 12345678;
int b = 98765432;
int c = (a + b) % mod;

Value Mutation
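Whether this value mutant is killed depends on how precise the assertions are. A small sketch (function and check names are hypothetical) that computes both versions and applies a weak range check and an exact-value check:

```cpp
#include <cassert>

// The slide's computation, with the modulus as a parameter so the
// original (1000000007) and the mutant (1007) can be compared.
int sumMod(int a, int b, int mod) {
    return (a + b) % mod;  // 12345678 + 98765432 = 111111110 fits in a 32-bit int
}

// Weak check: only verifies the result lies in [0, 1000000007).
bool weakCheck(int c) { return c >= 0 && c < 1000000007; }

// Strong check: pins the exact expected value.
bool strongCheck(int c) { return c == 111111110; }
```

The original yields 111111110 and the mutant yields 744; both pass the range check, so only the exact-value assertion kills the mutant.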

// Initial code:
if(a < b) {
 c = 10;
} else {
 c = 20;
}

// Mutated code:
if(a > b) {
 c = 10;
} else {
 c = 20;
}

Decision Mutation
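A decision mutant is killed only by inputs on which the two conditions disagree. A sketch of the slide's example, with hypothetical function names:

```cpp
#include <cassert>

// Original decision from the slide.
int decideOriginal(int a, int b) {
    int c;
    if (a < b) { c = 10; } else { c = 20; }
    return c;
}

// Decision mutant: a < b replaced by a > b.
int decideMutant(int a, int b) {
    int c;
    if (a > b) { c = 10; } else { c = 20; }
    return c;
}

// An input kills the mutant only if the two versions disagree on it.
bool kills(int a, int b) { return decideOriginal(a, b) != decideMutant(a, b); }
```

Inputs with a < b or a > b kill the mutant, but a test that only uses a == b does not, because both conditions are false and both versions assign 20.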

Dedicated mutation operators

int greatestCommonDivisor(int x, int y) {

    int tmp;
    while(y != 0) {
        tmp = x % y; // The % operator can be replaced
        x = y;       // with +, -, *, /
        y = tmp;
    }
    return x;
}

Arithmetic Operator Replacement
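As a concrete sketch of arithmetic operator replacement on the slide's function (named greatestCommonDivisor here, since Euclid's algorithm computes the greatest common divisor), the hand-written mutant below swaps % for /:

```cpp
#include <cassert>

// Euclid's algorithm, as on the slide.
int greatestCommonDivisor(int x, int y) {
    while (y != 0) {
        int tmp = x % y;
        x = y;
        y = tmp;
    }
    return x;
}

// Hand-written AOR mutant: % replaced by /.
// Caution: for inputs such as (1, 1) this loop never terminates, one reason
// mutation tools typically run mutants under a timeout.
int gcdDivMutant(int x, int y) {
    while (y != 0) {
        int tmp = x / y;
        x = y;
        y = tmp;
    }
    return x;
}
```

The GCD of (12, 8) is 4 while the mutant returns 8, so a test on (12, 8) kills it; a test on (5, 5) does not, because both versions return 5.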

int greatestCommonDivisor(int x, int y) {

    int tmp;
    while(y != 0) {    // The != operator can be
        tmp = x % y;   // replaced by <, >, <=, >=, ==
        x = y;      
        y = tmp;
    }
    return x;
}

Relational Operator Replacement
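Relational mutants can be harder to kill. In this sketch the hand-written mutant replaces != with >; for non-negative inputs every remainder is non-negative, so both loops behave identically and only a negative argument tells them apart.

```cpp
#include <cassert>

// Euclid's algorithm, as on the slide.
int greatestCommonDivisor(int x, int y) {
    while (y != 0) { int tmp = x % y; x = y; y = tmp; }
    return x;
}

// Hand-written ROR mutant: != replaced by >.
int gcdGreaterMutant(int x, int y) {
    while (y > 0) { int tmp = x % y; x = y; y = tmp; }
    return x;
}
```

With (12, -8) the original returns 4 (C++ % truncates toward zero, so 12 % -8 is 4), whereas the mutant skips the loop and returns 12.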

if(a && b) 
// Potential mutations
if(a || b)
if(a & b)
if(a | b)
if(a ^ b)
if(false)
if(true)
if(a)
if(b) 

Conditional Operator Replacement
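Evaluating each conditional mutant against the truth table of a && b shows which test inputs are needed to kill it. A small sketch with hypothetical names:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct Mutant {
    std::string name;
    std::function<bool(bool, bool)> eval;
};

// The mutants of `a && b` listed on the slide, written as lambdas.
const std::vector<Mutant> kMutants = {
    {"a || b", [](bool a, bool b) -> bool { return a || b; }},
    {"a & b",  [](bool a, bool b) -> bool { return a & b; }},
    {"a | b",  [](bool a, bool b) -> bool { return a | b; }},
    {"a ^ b",  [](bool a, bool b) -> bool { return a ^ b; }},
    {"false",  [](bool, bool) -> bool { return false; }},
    {"true",   [](bool, bool) -> bool { return true; }},
    {"a",      [](bool a, bool) -> bool { return a; }},
    {"b",      [](bool, bool b) -> bool { return b; }},
};

// Returns the mutants that no (a, b) pair in `inputs` distinguishes from a && b.
std::vector<std::string> survivors(const std::vector<std::pair<bool, bool>>& inputs) {
    std::vector<std::string> alive;
    for (const Mutant& m : kMutants) {
        bool killed = false;
        for (const auto& [a, b] : inputs) {
            if (m.eval(a, b) != (a && b)) killed = true;
        }
        if (!killed) alive.push_back(m.name);
    }
    return alive;
}
```

The full truth table kills every mutant except a & b, which behaves exactly like a && b for plain bool operands without side effects: an equivalent mutant. A single (true, true) test leaves six of the eight mutants alive.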

Many Others

  • Assignment Operator Replacement
  • Unary Operator Insertion
  • Scalar Variable Replacement
  • Absolute Value Insertion

Mutation Testing 

  • Identifies areas of code that are not tested properly
  • Identifies hidden defects that can’t be detected using other testing methods
  • Assesses the quality of the test cases
  • Assesses error propagation in the program

Mutation Testing

+

Mutation Analysis

A Lot of Data

Mutation testing based on LLVM


Supported Languages

Java, JVM

C, C++

JavaScript

Rust

Swift

Ruby

PHP

C#

Clojure

Python

Scala

Let's Focus On

Java, JVM

C, C++

JavaScript

Rust

Swift

Ruby

PHP

C#

Clojure

Python

Scala

LLVM

Available Tools

  • Dextool Mutate, plugin based on Dextool
  • MuCPP, based on source-level mutant generation
  • Mull, an LLVM-based tool with a focus on C and C++
  • CCMutator, based on higher-order mutation operators implemented as opt passes on LLVM IR
  • Xemu, based on the QEMU software emulator

What is Mull

  • An open-source tool for mutation testing based on LLVM 
  • An extendable tool to analyze the effectiveness of your test suite
  • A command-line tool that produces a SQLite database or an HTML report of the tested program
mull-cxx -test-framework=GoogleTest -mutators=conditional \
         -reporters=Elements -report-dir=./report \
         -report-name=MULL-TEST-ONE \
         -workers=4 -compdb-path compile_commands \
         -disable-cache=0 \
         ./bin/core-test

How to Run Mull

Why Mull

  • Efficiency in generating a mutation
  • Support for dry-run mode
  • Effective sandbox model
  • Support for failing fast
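The slides do not describe how Mull's sandbox works. As a generic illustration only, one common way to isolate mutants on POSIX systems is to run each test in a forked child process with an alarm, so that crashes and infinite loops are reported instead of stopping the whole run. Names are hypothetical.

```cpp
#include <cassert>
#include <csignal>
#include <cstdlib>
#include <sys/wait.h>
#include <unistd.h>

enum class Outcome { Survived, Killed, Timeout, Crashed, SandboxError };

// Runs one test against the current mutant in a child process, so a crash
// or an infinite loop in the mutant cannot take down the test runner.
Outcome runSandboxed(bool (*test)(), unsigned timeoutSeconds) {
    const pid_t pid = fork();
    if (pid < 0) return Outcome::SandboxError;  // fork failed
    if (pid == 0) {                             // child process
        alarm(timeoutSeconds);                  // default SIGALRM action terminates the child
        _exit(test() ? 0 : 1);                  // exit code 0 means the test passed
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return Outcome::SandboxError;
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status) == 0 ? Outcome::Survived : Outcome::Killed;
    }
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGALRM) return Outcome::Timeout;
    return Outcome::Crashed;  // any other signal, e.g. SIGSEGV or SIGABRT
}
```

Many tools count timed-out and crashed runs as killed, since the test suite did notice that something was wrong; the sketch keeps them separate so they can be reported.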

Supported Mutators

  • Mathematical 
  • Conditional negator 
  • Remove void function
  • Replace call
  • Scalar value replacement
  • Many others! :)
$ ./mull-cxx --help

Explore the Mutators

--mutators=<value>                            - Choose mutators:
    =all                                        -   default, experimental
    =arithmetic                                 -   cxx_arithmetic_add_to_sub, cxx_arithmetic_add_assign_to_sub_assign, cxx_arithmetic_post_inc_to_post_dec, cxx_arithmetic_pre_inc_to_pre_dec, cxx_arithmetic_sub_to_add, cxx_arithmetic_sub_assign_to_add_assign, cxx_arithmetic_post_dec_to_post_inc, cxx_arithmetic_pre_dec_to_pre_inc, cxx_arithmetic_mul_to_div, cxx_arithmetic_mul_assign_to_div_assign, cxx_arithmetic_div_to_mul, cxx_arithmetic_div_assign_to_mul_assign, cxx_arithmetic_rem_to_div, cxx_arithmetic_rem_assign_to_div_assign
    =bitwise                                    -   cxx_bitwise_lshift_to_rshift, cxx_bitwise_lshift_assign_to_rshift_assign, cxx_bitwise_rshift_to_lshift, cxx_bitwise_rshift_assign_to_lshift_assign, cxx_bitwise_and_to_or, cxx_bitwise_and_assign_to_or_assign, cxx_bitwise_or_to_and, cxx_bitwise_or_assign_to_and_assign, cxx_bitwise_xor_to_or, cxx_bitwise_xor_assign_to_or_assign
    =conditional                                -   and_or_replacement_mutator, negate_mutator, conditionals_boundary_mutator, negate_relational
    =conditionals_boundary_mutator              -   cxx_relational_le_to_lt, cxx_relational_lt_to_le, cxx_relational_ge_to_gt, cxx_relational_gt_to_ge
    =constant                                   -   scalar_value_mutator
    =cxx                                        -   conditionals_boundary_mutator, negate_relational, arithmetic, numbers
    =default                                    -   cxx_arithmetic_add_to_sub, negate_mutator, remove_void_function_mutator
    =experimental                               -   and_or_replacement_mutator, numbers, replace_call_mutator, scalar_value_mutator, conditionals_boundary_mutator, negate_relational, arithmetic, bitwise
    =functions                                  -   replace_call_mutator, remove_void_function_mutator
    =math                                       -   cxx_arithmetic_add_to_sub, cxx_arithmetic_sub_to_add, cxx_arithmetic_mul_to_div, cxx_arithmetic_div_to_mul
    =negate_relational                          -   cxx_relational_gt_to_le, cxx_relational_ge_to_lt, cxx_relational_lt_to_ge, cxx_relational_le_to_gt, cxx_relational_eq_to_ne, cxx_relational_ne_to_eq
    =numbers                                    -   cxx_number_init_const, cxx_number_assign_const
    =and_or_replacement_mutator                 -   Replaces && with ||, || with &&
    =cxx_arithmetic_add_assign_to_sub_assign    -   Replaces += with -=
    =cxx_arithmetic_add_to_sub                  -   Replaces + with -
    =cxx_arithmetic_div_assign_to_mul_assign    -   Replaces /= with *=
    =cxx_arithmetic_div_to_mul                  -   Replaces / with *
    =cxx_arithmetic_mul_assign_to_div_assign    -   Replaces *= with /=
    =cxx_arithmetic_mul_to_div                  -   Replaces * with /
    =cxx_arithmetic_post_dec_to_post_inc        -   Replaces x-- with x++
    =cxx_arithmetic_post_inc_to_post_dec        -   Replaces x++ with x--
    =cxx_arithmetic_pre_dec_to_pre_inc          -   Replaces --x with ++x
    =cxx_arithmetic_pre_inc_to_pre_dec          -   Replaces ++x with --x
    =cxx_arithmetic_rem_assign_to_div_assign    -   Replaces %= with /=
    =cxx_arithmetic_rem_to_div                  -   Replaces % with /
    =cxx_arithmetic_sub_assign_to_add_assign    -   Replaces -= with +=
    =cxx_arithmetic_sub_to_add                  -   Replaces - with +
    =cxx_bitwise_and_assign_to_or_assign        -   Replaces &= with |=
    =cxx_bitwise_and_to_or                      -   Replaces & with |
    =cxx_bitwise_lshift_assign_to_rshift_assign -   Replaces <<= with >>=
    =cxx_bitwise_lshift_to_rshift               -   Replaces << with >>
    =cxx_bitwise_or_assign_to_and_assign        -   Replaces |= with &=
    =cxx_bitwise_or_to_and                      -   Replaces | with &
    =cxx_bitwise_rshift_assign_to_lshift_assign -   Replaces >>= with <<=
    =cxx_bitwise_rshift_to_lshift               -   Replaces >> with <<
    =cxx_bitwise_xor_assign_to_or_assign        -   Replaces ^= with |=
    =cxx_bitwise_xor_to_or                      -   Replaces ^ with |
    =cxx_number_assign_const                    -   Replaces 'a = b' with 'a = 42'
    =cxx_number_init_const                      -   Replaces 'T a = b' with 'T a = 42'
    =cxx_relational_eq_to_ne                    -   Replaces == with !=
    =cxx_relational_ge_to_gt                    -   Replaces >= with >
    =cxx_relational_ge_to_lt                    -   Replaces >= with <
    =cxx_relational_gt_to_ge                    -   Replaces > with >=
    =cxx_relational_gt_to_le                    -   Replaces > with <=
    =cxx_relational_le_to_gt                    -   Replaces <= with >
    =cxx_relational_le_to_lt                    -   Replaces <= with <
    =cxx_relational_lt_to_ge                    -   Replaces < with >=
    =cxx_relational_lt_to_le                    -   Replaces < with <=
    =cxx_relational_ne_to_eq                    -   Replaces != with ==
    =negate_mutator                             -   Negates conditionals !x to x and x to !x
    =remove_void_function_mutator               -   Removes calls to a function returning void
    =replace_call_mutator                       -   Replaces call to a function with 42
    =scalar_value_mutator                       -   Replaces zeros with 42, and non-zeros with 0
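As an illustration of one entry above, remove_void_function_mutator is described as removing calls to a function returning void. The sketch below writes such a mutant by hand for a hypothetical buildPacket function; it is not Mull output.

```cpp
#include <cassert>
#include <vector>

// Hypothetical code under test: appends a simple checksum value.
void appendChecksum(std::vector<int>& packet) {
    int sum = 0;
    for (int v : packet) sum += v;
    packet.push_back(sum % 256);
}

std::vector<int> buildPacket(const std::vector<int>& payload) {
    std::vector<int> packet = payload;
    appendChecksum(packet);
    return packet;
}

// Hand-written equivalent of the mutant: the call to the void function
// appendChecksum has been removed.
std::vector<int> buildPacketMutant(const std::vector<int>& payload) {
    std::vector<int> packet = payload;
    return packet;
}

// Weak test: only looks at the payload, so the mutant survives.
bool weakTest(std::vector<int> (*build)(const std::vector<int>&)) {
    return build({1, 2, 3}).front() == 1;
}

// Strong test: also checks the side effect of the removed call.
bool strongTest(std::vector<int> (*build)(const std::vector<int>&)) {
    const std::vector<int> expected = {1, 2, 3, 6};
    return build({1, 2, 3}) == expected;
}
```

A test that only inspects the payload lets the mutant survive; checking the appended checksum, the side effect of the removed call, kills it.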

Mull's Approach

  • Mutations can be done either at a high level (i.e., source code) or at a lower level (i.e., bitcode)
  • Mull applies mutations at a lower level because:
    • The same engine can be used to support any LLVM-based language
    • The execution time for each mutation is lower 

Under the Hood

  • Loads LLVM bitcode into memory
  • Inserts instrumentation code into each function
  • Compiles instrumented LLVM bitcode to machine code
  • Prepares the machine code for execution by the LLVM JIT engine
  • At an IR code level, it finds the matching tests
  • Runs each test using the LLVM JIT engine and collects code coverage information

Drawbacks

  • Compiling with bitcode enabled is straightforward for a small project but painful for big projects
  • Mutating the bitcode generates noise because not all the mutations have a representation in the source code
  • Some mutations produce behavior identical to the original program or to another mutant
  • Mutation testing is time-consuming and requires brain power
  • It is not a solution for black-box testing

What?!?

Then Why?

  • To identify potential areas of improvement 
  • To find bugs beyond the usual human interaction
  • To optimize error handling strategies
  • To assess the quality and health status of the codebase
  • To estimate the remaining unknown bugs of a program

How is it possible?

Reports and Metrics


Discover

Inspect

Improve

 

mull-cxx -test-framework=GoogleTest -mutators=math \
         -reporters=Elements -report-dir=./report -report-name=TEST \
         -workers=4 -compdb-path compile_cmd.json -disable-cache=0 \
         -compilation-flags="\
            -isystem /opt/clang+llvm-9.0.0/include/c++/v1 \
            -isystem /opt/clang+llvm-9.0.0/lib/clang/9.0.0/include \
            -isystem /usr/include" \
         ./bin/core-test

Generate Reports

Loading bitcode files (threads: 4): 4/4. Finished in 267ms.
Compiling instrumented code (threads: 4): 4/4. Finished in 11ms.
Loading dynamic libraries (threads: 1): 1/1. Finished in 0ms.
Searching tests (threads: 1): 1/1. Finished in 2ms.
Preparing original test run (threads: 1): 1/1. Finished in 145ms.
Running original tests (threads: 4): 30/30. Finished in 187ms.
Applying function filter: no debug info (threads: 4): 3496/3496. Finished in 14ms.
Applying function filter: file path (threads: 4): 3313/3313. Finished in 22ms.
Instruction selection (threads: 4): 3313/3313. Finished in 23ms.
Searching mutants across functions (threads: 4): 3313/3313. Finished in 369ms.
Applying filter: no debug info (threads: 4): 12355/12355. Finished in 12ms.
Applying filter: file path (threads: 4): 12355/12355. Finished in 35ms.
Applying filter: junk (threads: 4): 12355/12355. Finished in 3657ms.
Prepare mutations (threads: 1): 1/1. Finished in 0ms.
Cloning functions for mutation (threads: 4): 4/4. Finished in 769ms.
Removing original functions (threads: 4): 4/4. Finished in 194ms.
Redirect mutated functions (threads: 4): 4/4. Finished in 11ms.
Applying mutations (threads: 1): 409/409. Finished in 11ms.
Compiling original code (threads: 4): 4/4. Finished in 3625ms.
Running mutants (threads: 4): 409/409. Finished in 4586ms.

Exploring Logs

Mutation Score

(fmt/include/fmt/core.h with math mutators)

Survived Mutant

(fmt/include/fmt/core.h with math mutators)

Compiling instrumented code (threads: 4): 4/4. Finished in 4612ms.
Loading dynamic libraries (threads: 1): 1/1. Finished in 0ms.
Searching tests (threads: 1): 1/1. Finished in 1ms.
Preparing original test run (threads: 1): 1/1. Finished in 86ms.
Running original tests (threads: 4): 30/30. Finished in 203ms.
Applying function filter: no debug info (threads: 4): 3496/3496. Finished in 15ms.
Applying function filter: file path (threads: 4): 3313/3313. Finished in 23ms.
Instruction selection (threads: 4): 3313/3313. Finished in 21ms.
Searching mutants across functions (threads: 4): 3313/3313. Finished in 608ms.
Applying filter: no debug info (threads: 4): 20586/20586. Finished in 15ms.
Applying filter: file path (threads: 4): 20586/20586. Finished in 58ms.
Applying filter: junk (threads: 4): 20586/20586. Finished in 3969ms.
Prepare mutations (threads: 1): 1/1. Finished in 1ms.
Cloning functions for mutation (threads: 4): 4/4. Finished in 1040ms.
Removing original functions (threads: 4): 4/4. Finished in 204ms.
Redirect mutated functions (threads: 4): 4/4. Finished in 13ms.
Applying mutations (threads: 1): 446/446. Finished in 10ms.
Compiling original code (threads: 4): 4/4. Finished in 3808ms.
Running mutants (threads: 4): 446/446. Finished in 5704ms.

Total execution time: 21046ms

Time Constraints

Estimate Remaining Bugs

Switching Perspectives

  • Count how many open bugs are in your backlog
  • Label or categorize 30% of them 
  • Run mull, then categorize and count the bugs
  • Calculate the ratio between categorized and not categorized bugs

total = 300 known bugs

labeled = 100 categorized bugs

found = 100 total bugs discovered with mutation

labeledFound  = 30 existing bugs discovered with mutation

Existing Data

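$$\frac{\text{labeled}}{\text{unknown}} = \frac{\text{labeledFound}}{\text{notLabeledFound}}$$

notLabeledFound = found - labeledFound = 100 - 30 = 70

unknown = labeled × notLabeledFound / labeledFound = 100 × 70 / 30 ≈ 233

unknown > 200 potential unknown bugs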

Simple Ratio
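The simple ratio can be checked in a few lines. A sketch using the slide's variable names (the function name is hypothetical); it is the same idea as the capture-recapture (Lincoln-Petersen) estimate, where the total population is roughly labeled × found / labeledFound and the unknown bugs are that total minus the labeled ones.

```cpp
#include <cassert>

// Solves labeled / unknown = labeledFound / notLabeledFound for unknown.
double estimateUnknownBugs(int labeled, int found, int labeledFound) {
    assert(labeledFound > 0 && labeledFound <= found);
    const int notLabeledFound = found - labeledFound;
    return static_cast<double>(labeled) * notLabeledFound / labeledFound;
}
```

With labeled = 100, found = 100 and labeledFound = 30 it returns about 233.3, matching the "more than 200" estimate.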

Don't Panic

Scaling Mutation Testing


Cultural Changes

  • Code quality is essential to release a successful product 
  • It's possible to objectively evaluate code quality
  • Automating quality checks is key for effective teams
  • Tests are code and should be implemented with the same criteria  

Technical Challenges

  • Everyone has worked with obsolete compilers
  • Integrate the checks in your build tools
  • Minimize the junk in your data
  • Support every operating system

Outdated Compilers

  • Compilers can be updated
  • Software architecture can simplify compilers update
  • Updates are like a fresh start

Pipeline Integration

  • Build infrastructure can integrate any tool
  • Be thoughtful about when to trigger mutation testing
  • Analyze your data early in the process and often
  • Modularize your pipeline

Dev Environment

  • Invest time to simplify the usage of the tools
  • Be inclusive, support all the dev platforms
  • Write exhaustive documentation

Report Analysis 

  • Review the data in isolation and share your findings
  • Collect the findings and learn from them 
  • Implement a data model to learn from errors

Get ready for a new challenge

Final Remarks

Terminology

  • A fault is an erroneous part of a program
  • A mutation is a fault that is deliberately introduced into a program
  • A mutant is a program created from the original one with a potential failure
  • A variant is a program that shows a deviation at runtime from the original program
  • A redundant fault is a duplicated fault

Ubiquity

Java, JVM

C, C++

JavaScript

Rust

Swift

Ruby

PHP

C#

Clojure

Python

Scala

Bitcode and Bytecode

  • Same same but different (JVM bytecode is stack-based, whereas LLVM bitcode encodes a register-based SSA intermediate representation)
  • LLVM bitcode is closer to machine-level code, but isn't bound by a particular architecture

Resources


Thank You!