Mutation Testing at Scale

@giorgionatili

$ whoami

  • Engineering leader at Amazon (Kindle rendering team)
  • Organizer of Droidcon Boston (and maybe Seattle)
  • Organizer of SwiftFest Boston and Seattle
  • Meetup and community enthusiast
  • Lead of the System Architecture & Design learning track at Amazon

Disambiguate Software Quality

Quality starts with clean code

int d; // elapsed time in days

VS

int elapsedTimeInDays;

Measuring Code Quality

Clean code is testable

Adding tests cleans the code

Rigorous TDD

Tests as First-Class Citizens

Anatomy of a Good Test

  • Self-descriptive
  • Simple
  • SOLID
namespace stringutil {
   
   std::string tail(const std::string& word) {
      if (word.length() == 0) return "";
      return word.substr(1);
   }
}

What to test?

TEST(AString, AllTheLettersAfterTheHeadShouldBeTheTail) {
   ASSERT_THAT(tail("xyz"), Eq("yz"));
}

TEST(AString, TheTailOfAnEmptyStringShouldBeEmpty) {
   ASSERT_THAT(tail(""), Eq(""));
}

TEST(AString, TheTailOfASingleCharacterStringShouldBeEmpty) {
   ASSERT_THAT(tail("X"), Eq(""));
}

3S Based Tests

A Good Test Suite

  • Reliable
  • Accurate
  • Fast

Potential Test Suite Quality Metrics

  • Line coverage
  • Tests reliability
  • Execution speed

Which are the right metrics?

Automate Quality Checks

Is adding tests beneficial?

What is the right test coverage?

TEST(AString, AllTheLettersAfterTheHeadShouldBeTheTail) {
   ASSERT_THAT(tail("xyz"), Eq("yz"));
}

TEST(AString, TheTailOfAnEmptyStringShouldBeEmpty) {
   ASSERT_THAT(tail(""), Eq(""));
}

TEST(AString, TheTailOfASingleCharacterStringShouldBeEmpty) {
   tail("X");
}

What is the test coverage?
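
The point of the last test can be sketched without a test framework. This is a minimal illustration, assuming plain functions stand in for the GoogleTest cases; `brokenTail`, `testWithoutAssertion`, and `testWithAssertion` are hypothetical names that do not appear in the original slides. The check without an assertion executes the same lines, so it counts toward line coverage, yet it also passes for a deliberately broken implementation.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Type of the function under test: takes a word, returns its tail.
using TailFn = std::function<std::string(const std::string&)>;

// Correct implementation, mirroring stringutil::tail from the slides.
std::string tail(const std::string& word) {
    if (word.empty()) return "";
    return word.substr(1);
}

// Hypothetical broken implementation, used only for illustration.
std::string brokenTail(const std::string& word) {
    if (word.empty()) return "";
    return word;  // bug: forgets to drop the first character
}

// Executes the code but checks nothing, so it passes whenever
// the call does not crash or throw.
bool testWithoutAssertion(const TailFn& fn) {
    fn("X");
    return true;
}

// Same input, but with an oracle: the expected result is checked.
bool testWithAssertion(const TailFn& fn) {
    return fn("X") == "";
}
```

Both checks cover the same lines of `tail`; only the second one can tell `tail` and `brokenTail` apart.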

The Oracle Problem

Different Approach

  • Learning from earlier mistakes to prevent them from happening again
  • Simulate earlier mistakes and see whether the resulting defects get discovered

Fault-Based Testing

Not Always Black or White!

Fuzzing All the Things

Goals

  • Measure the degree to which a system, component, or function can work with an invalid or stressful input
  • Deviate from the normal expected input of a program to analyze the consequences
bool checkEvenOdd(int num){
    return num % 2 == 0 ? true : false;
}

Input Validation

bool isDigit(char *c_array){
    for (int k = 0; k < strlen(c_array); k++) {
        if ((int)c_array[k]<(int)'0' ||
            (int)c_array[k]>(int)'9') {
            return false;
        }
    }
    return true;
}

Yet
Pretty

Open

The Heartbleed bug

Benefits

  • Early bug detection
  • Discover security issues
  • Discover fragile areas of the codebase

Approaches

  • Dumb fuzzers (mutation)
  • Intelligent fuzzers (generation)
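
A dumb (mutational) fuzzer can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any of the real tools work: `isDigit` is adapted from the earlier slide to take a `std::string` (so embedded NUL bytes are handled differently than in the `char*` version), and `allDigitsReference`, `mutate`, and `fuzzIsDigit` are hypothetical helpers introduced here. Each iteration overwrites one random byte of a seed sample and compares the function under test against a reference oracle.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <random>
#include <string>

// Function under test, adapted from the isDigit slide to take a std::string.
bool isDigit(const std::string& text) {
    for (char c : text) {
        if (c < '0' || c > '9') return false;
    }
    return true;
}

// Reference oracle built on the standard library.
bool allDigitsReference(const std::string& text) {
    for (unsigned char c : text) {
        if (!std::isdigit(c)) return false;
    }
    return true;
}

// Dumb mutation: overwrite one randomly chosen byte with a random value.
std::string mutate(std::string sample, std::mt19937& rng) {
    if (sample.empty()) return sample;
    std::uniform_int_distribution<std::size_t> position(0, sample.size() - 1);
    std::uniform_int_distribution<int> byte(0, 255);
    sample[position(rng)] = static_cast<char>(byte(rng));
    return sample;
}

// Feeds mutated samples to isDigit and counts disagreements with the oracle.
int fuzzIsDigit(const std::string& seedSample, int iterations, unsigned seed) {
    std::mt19937 rng(seed);
    int disagreements = 0;
    for (int i = 0; i < iterations; ++i) {
        const std::string input = mutate(seedSample, rng);
        if (isDigit(input) != allDigitsReference(input)) ++disagreements;
    }
    return disagreements;
}
```

A call such as `fuzzIsDigit("2024", 1000, 42u)` reports how often the two implementations disagree. Fixing the seed keeps a failing run reproducible.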

Drawbacks

  • Fuzz testing alone cannot provide a complete picture of a system's security threats or bugs
  • Fuzz testing can detect only simple faults or threats
  • Effective fuzzing requires significant time
  • Setting boundary value conditions with random inputs is very problematic

Tools

  • Fuzzing Frameworks
    • boofuzz
    • BDFuzz
  • Mutational Fuzzers (alter existing data samples to create new test data)
    • AFL / libFuzzer
    • Radamsa
 

Fuzzing Doesn't Listen

Mutation Testing

Unexpected Program Mutations

What Is It?

  • Mutation testing evaluates the quality of existing software tests
  • The idea is to modify (i.e., mutate) code covered by tests in a small way and check whether the existing test set detects or rejects the change

Mutation Testing Framework

  • Alter source code in one very small way
  • Run unit tests
  • Record if any tests fail

Mutants

  • Each transformation results in a new program, called a mutant, that differs from the original

  • Detecting and rejecting such a modification by the existing tests is denoted as killing a mutant

Killing Mutants
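
Killing a mutant can be sketched with the `tail` function from the earlier slides. This is a hand-written illustration, assuming a test suite is modeled as a function that returns true when every check passes; `tailMutant`, `weakSuite`, `strongerSuite`, and `kills` are hypothetical names, and a real tool would generate the mutant automatically.

```cpp
#include <cassert>
#include <functional>
#include <string>

using TailFn = std::function<std::string(const std::string&)>;

// Original program, mirroring stringutil::tail from the earlier slide.
std::string tail(const std::string& word) {
    if (word.length() == 0) return "";
    return word.substr(1);
}

// Mutant: the constant 1 in substr(1) is replaced with 0.
std::string tailMutant(const std::string& word) {
    if (word.length() == 0) return "";
    return word.substr(0);
}

// Only checks the empty string, where original and mutant agree.
bool weakSuite(const TailFn& fn) {
    return fn("") == "";
}

// Also checks inputs on which the mutant behaves differently.
bool strongerSuite(const TailFn& fn) {
    return fn("") == "" && fn("xyz") == "yz" && fn("X") == "";
}

// A mutant is killed when a suite that passes on the original fails on it.
bool kills(const std::function<bool(const TailFn&)>& suite,
           const TailFn& original, const TailFn& mutant) {
    return suite(original) && !suite(mutant);
}
```

Here `kills(weakSuite, tail, tailMutant)` is false (the mutant survives), while `kills(strongerSuite, tail, tailMutant)` is true.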

Metrics

  • Test suite effectiveness is measured by its ability to detect those mutants

  • The mutation score is the ratio of killed mutants to the total number of mutants 
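
The ratio above can be written down directly. A minimal sketch, assuming each mutant ends up either killed or survived; `MutantStatus` and `mutationScore` are names chosen for this example, and returning 0.0 for an empty set is a choice made here rather than a convention taken from the slides.

```cpp
#include <cassert>
#include <vector>

// Outcome of running the test suite against one mutant.
enum class MutantStatus { Killed, Survived };

// Mutation score = killed mutants / total mutants.
// Returns 0.0 for an empty set to avoid dividing by zero.
double mutationScore(const std::vector<MutantStatus>& results) {
    if (results.empty()) return 0.0;
    int killed = 0;
    for (MutantStatus status : results) {
        if (status == MutantStatus::Killed) ++killed;
    }
    return static_cast<double>(killed) / static_cast<double>(results.size());
}
```

For example, three killed mutants out of four give a score of 0.75.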

What About Test Coverage?

This is where mutation testing comes into play!

Different Mutations

  • Statement mutation
  • Value mutation
  • Decision mutation
// Initial code:
if (a < b) {
  c = 10;
} else {
  c = 20;
}

// Changed code:
if (a < b) {
  d = 10;
} else {
  d = 20;
}

Statement Mutation

// Initial code:
int mod = 1000000007;
int a = 12345678;
int b = 98765432;
int c = (a + b) % mod;

// Mutated code:
int mod = 1007;
int a = 12345678;
int b = 98765432;
int c = (a + b) % mod;

Value Mutation
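
Whether a value mutation is detected depends on the inputs the tests use. A small sketch, assuming the snippet above is wrapped in a function; `addMod`, `addModMutant`, and the two check functions are hypothetical names introduced for this example.

```cpp
#include <cassert>

// Original: sum modulo 1000000007.
int addMod(int a, int b) {
    const int mod = 1000000007;
    return (a + b) % mod;
}

// Value mutant: the constant is changed to 1007.
int addModMutant(int a, int b) {
    const int mod = 1007;
    return (a + b) % mod;
}

// Inputs far below both moduli: original and mutant agree.
bool smallInputsCheck(int (*fn)(int, int)) {
    return fn(1, 2) == 3;
}

// Inputs from the slide: the sum exceeds 1007, so the mutant differs.
bool largeInputsCheck(int (*fn)(int, int)) {
    return fn(12345678, 98765432) == 111111110;
}
```

`smallInputsCheck` passes for both versions, so the mutant survives it; `largeInputsCheck` passes only for `addMod` and kills the mutant.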

// Initial code:
if (a < b) {
  c = 10;
} else {
  c = 20;
}

// Mutated code:
if (a > b) {
  c = 10;
} else {
  c = 20;
}

Decision Mutation
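
The same idea applies to decision mutations. A sketch, assuming the branch above is wrapped in a function that returns `c`; `choose`, `chooseMutant`, and the check functions are hypothetical names. With equal inputs both versions take the else branch, so only a test with `a < b` (or `a > b`) can kill this mutant.

```cpp
#include <cassert>

// Original decision.
int choose(int a, int b) {
    int c;
    if (a < b) {
        c = 10;
    } else {
        c = 20;
    }
    return c;
}

// Decision mutant: < replaced with >.
int chooseMutant(int a, int b) {
    int c;
    if (a > b) {
        c = 10;
    } else {
        c = 20;
    }
    return c;
}

// Equal inputs: both versions return 20, so the mutant survives.
bool equalInputsCheck(int (*fn)(int, int)) {
    return fn(5, 5) == 20;
}

// Ordered inputs: the original returns 10, the mutant returns 20.
bool orderedInputsCheck(int (*fn)(int, int)) {
    return fn(1, 2) == 10;
}
```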

Dedicated mutation operators

int greatestCommonDenominator(int x, int y) {
    
    int tmp;
    while(y != 0) {
        tmp = x % y; // The % operator can be replaced
        x = y;       // with +, -, *, or /
        y = tmp;
    }
    return x;
}

Arithmetic Operator Replacement
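
Arithmetic operator replacement can be simulated by making the operator pluggable. This is a hand-rolled sketch, not how a mutation tool works internally: `gcdWith`, `GcdResult`, and `killedBy` are hypothetical names, and the step budget stands in for the timeout a tool would apply, since some mutants never reach `y == 0`. The budget is kept tiny on purpose because the demo inputs need only three steps and some mutants grow quickly.

```cpp
#include <cassert>
#include <functional>

using BinaryOp = std::function<long long(long long, long long)>;

struct GcdResult {
    bool finished;    // false when the step budget ran out
    long long value;  // meaningful only when finished is true
};

// Euclid's loop with the operator on the mutated line made pluggable,
// so each arithmetic mutant is just a different BinaryOp.
GcdResult gcdWith(const BinaryOp& op, long long x, long long y, int maxSteps = 4) {
    for (int step = 0; y != 0; ++step) {
        if (step == maxSteps) return {false, 0};
        const long long tmp = op(x, y);  // y is never 0 here
        x = y;
        y = tmp;
    }
    return {true, x};
}

// A mutant is killed when it times out or returns an unexpected value.
bool killedBy(const BinaryOp& mutantOp, long long x, long long y, long long expected) {
    const GcdResult result = gcdWith(mutantOp, x, y);
    return !result.finished || result.value != expected;
}
```

With the inputs (12, 18) and the expected value 6, the `/`, `+`, `-`, and `*` mutants are all killed. With (5, 5) and the expected value 5, the `/` mutant returns 5 as well and survives, which shows how much the choice of test inputs matters.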

int greatestCommonDenominator(int x, int y) {
    
    int tmp;
    while(y != 0) {    // The != operator can be
        tmp = x % y;   // replaced by <, >, <=, >=, or ==
        x = y;      
        y = tmp;
    }
    return x;
}

Relational Operator Replacement
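
Some relational mutants behave exactly like the original on part of the input space. A sketch, assuming the function above is copied by hand with `!=` replaced by `>`; `gcd` and `gcdMutant` are names chosen for this example. For non-negative inputs the two versions always agree, so tests restricted to those inputs cannot kill the mutant. A negative input exposes the difference.

```cpp
#include <cassert>

// Original loop condition: y != 0.
int gcd(int x, int y) {
    int tmp;
    while (y != 0) {
        tmp = x % y;
        x = y;
        y = tmp;
    }
    return x;
}

// Relational mutant: != replaced with >.
int gcdMutant(int x, int y) {
    int tmp;
    while (y > 0) {
        tmp = x % y;
        x = y;
        y = tmp;
    }
    return x;
}
```

`gcd(12, 18)` and `gcdMutant(12, 18)` both return 6. With `(4, -2)` the original returns -2, while the mutant skips the loop and returns 4.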

if(a && b) 
// Potential mutations
if(a || b)
if(a & b)
if(a | b)
if(a ^ b)
if(false)
if(true)
if(a)
if(b) 

Conditional Operator Replacement
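
Which conditional mutants a test suite kills depends on the input combinations it exercises. A small sketch, assuming each mutated condition is written by hand as a function of two booleans; all names below are hypothetical. Note that for plain `bool` operands `a & b` yields the same value as `a && b`, so that mutant cannot be killed by checking results alone.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Condition = bool (*)(bool, bool);
using Inputs = std::vector<std::pair<bool, bool>>;

bool original(bool a, bool b)     { return a && b; }
bool orMutant(bool a, bool b)     { return a || b; }
bool onlyAMutant(bool a, bool)    { return a; }
bool onlyBMutant(bool, bool b)    { return b; }
bool alwaysTrueMutant(bool, bool) { return true; }

// A mutant is killed when at least one input pair makes its result
// differ from the original condition.
bool isKilled(Condition mutant, const Inputs& inputs) {
    for (const auto& input : inputs) {
        if (mutant(input.first, input.second) !=
            original(input.first, input.second)) {
            return true;
        }
    }
    return false;
}
```

With only `(true, true)` and `(false, false)` as inputs, `alwaysTrueMutant` is killed but `orMutant`, `onlyAMutant`, and `onlyBMutant` survive. Adding `(true, false)` and `(false, true)` kills those three as well.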

Many Others

  • Assignment Operator Replacement
  • Unary Operator Insertion
  • Scalar Variable Replacement
  • Absolute Value Insertion

Mutation Testing 

  • Identifies areas of code that are not tested properly
  • Identifies hidden defects that can’t be detected using other testing methods
  • Assesses the quality of the test cases
  • Assesses error propagation in the program

Mutation Testing + Mutation Analysis

A Lot of Data

Mutation testing based on LLVM

Supported Languages

Java, JVM

C, C++

JavaScript

Rust

Swift

Ruby

PHP

C#

Clojure

Python

Scala

Let's Focus On

Java, JVM

C, C++

JavaScript

Rust

Swift

Ruby

PHP

C#

Clojure

Python

Scala

LLVM

Available Tools

  • Dextool Mutate, plugin based on Dextool
  • MuCPP, based on source code mutants generation
  • Mull, an LLVM-based tool with a focus on C and C++
  • CCMutator, based on higher-order mutation operators implemented as opt passes on LLVM IR
  • Xemu, based on QEMU software emulator

What is Mull

  • An open-source tool for mutation testing based on LLVM 
  • An extendable tool to analyze the effectiveness of your test suite
  • A command-line tool that produces a SQLite database or an HTML report of the tested program
mull-cxx -test-framework=GoogleTest -mutators=conditional \
         -reporters=Elements -report-dir=./report \
         -report-name=MULL-TEST-ONE \
         -workers=4 -compdb-path compile_commands \
         -disable-cache=0 \
         ./bin/core-test

How to Run Mull

Why Mull

  • Efficiency in generating a mutation
  • Support for dry-run mode
  • Effective sandbox model
  • Support for failing fast

Supported Mutators

  • Mathematical 
  • Conditional negator 
  • Remove void function
  • Replace call
  • Scalar value replacement
  • Many others! :)
$ ./mull-cxx --help

Explore the Mutators

--mutators=<value>                            - Choose mutators:
    =all                                        -   default, experimental
    =arithmetic                                 -   cxx_arithmetic_add_to_sub, cxx_arithmetic_add_assign_to_sub_assign, cxx_arithmetic_post_inc_to_post_dec, cxx_arithmetic_pre_inc_to_pre_dec, cxx_arithmetic_sub_to_add, cxx_arithmetic_sub_assign_to_add_assign, cxx_arithmetic_post_dec_to_post_inc, cxx_arithmetic_pre_dec_to_pre_inc, cxx_arithmetic_mul_to_div, cxx_arithmetic_mul_assign_to_div_assign, cxx_arithmetic_div_to_mul, cxx_arithmetic_div_assign_to_mul_assign, cxx_arithmetic_rem_to_div, cxx_arithmetic_rem_assign_to_div_assign
    =bitwise                                    -   cxx_bitwise_lshift_to_rshift, cxx_bitwise_lshift_assign_to_rshift_assign, cxx_bitwise_rshift_to_lshift, cxx_bitwise_rshift_assign_to_lshift_assign, cxx_bitwise_and_to_or, cxx_bitwise_and_assign_to_or_assign, cxx_bitwise_or_to_and, cxx_bitwise_or_assign_to_and_assign, cxx_bitwise_xor_to_or, cxx_bitwise_xor_assign_to_or_assign
    =conditional                                -   and_or_replacement_mutator, negate_mutator, conditionals_boundary_mutator, negate_relational
    =conditionals_boundary_mutator              -   cxx_relational_le_to_lt, cxx_relational_lt_to_le, cxx_relational_ge_to_gt, cxx_relational_gt_to_ge
    =constant                                   -   scalar_value_mutator
    =cxx                                        -   conditionals_boundary_mutator, negate_relational, arithmetic, numbers
    =default                                    -   cxx_arithmetic_add_to_sub, negate_mutator, remove_void_function_mutator
    =experimental                               -   and_or_replacement_mutator, numbers, replace_call_mutator, scalar_value_mutator, conditionals_boundary_mutator, negate_relational, arithmetic, bitwise
    =functions                                  -   replace_call_mutator, remove_void_function_mutator
    =math                                       -   cxx_arithmetic_add_to_sub, cxx_arithmetic_sub_to_add, cxx_arithmetic_mul_to_div, cxx_arithmetic_div_to_mul
    =negate_relational                          -   cxx_relational_gt_to_le, cxx_relational_ge_to_lt, cxx_relational_lt_to_ge, cxx_relational_le_to_gt, cxx_relational_eq_to_ne, cxx_relational_ne_to_eq
    =numbers                                    -   cxx_number_init_const, cxx_number_assign_const
    =and_or_replacement_mutator                 -   Replaces && with ||, || with &&
    =cxx_arithmetic_add_assign_to_sub_assign    -   Replaces += with -=
    =cxx_arithmetic_add_to_sub                  -   Replaces + with -
    =cxx_arithmetic_div_assign_to_mul_assign    -   Replaces /= with *=
    =cxx_arithmetic_div_to_mul                  -   Replaces / with *
    =cxx_arithmetic_mul_assign_to_div_assign    -   Replaces *= with /=
    =cxx_arithmetic_mul_to_div                  -   Replaces * with /
    =cxx_arithmetic_post_dec_to_post_inc        -   Replaces x-- with x++
    =cxx_arithmetic_post_inc_to_post_dec        -   Replaces x++ with x--
    =cxx_arithmetic_pre_dec_to_pre_inc          -   Replaces --x with ++x
    =cxx_arithmetic_pre_inc_to_pre_dec          -   Replaces ++x with --x
    =cxx_arithmetic_rem_assign_to_div_assign    -   Replaces %= with /=
    =cxx_arithmetic_rem_to_div                  -   Replaces % with /
    =cxx_arithmetic_sub_assign_to_add_assign    -   Replaces -= with +=
    =cxx_arithmetic_sub_to_add                  -   Replaces - with +
    =cxx_bitwise_and_assign_to_or_assign        -   Replaces &= with |=
    =cxx_bitwise_and_to_or                      -   Replaces & with |
    =cxx_bitwise_lshift_assign_to_rshift_assign -   Replaces <<= with >>=
    =cxx_bitwise_lshift_to_rshift               -   Replaces << with >>
    =cxx_bitwise_or_assign_to_and_assign        -   Replaces |= with &=
    =cxx_bitwise_or_to_and                      -   Replaces | with &
    =cxx_bitwise_rshift_assign_to_lshift_assign -   Replaces >>= with <<=
    =cxx_bitwise_rshift_to_lshift               -   Replaces >> with <<
    =cxx_bitwise_xor_assign_to_or_assign        -   Replaces ^= with |=
    =cxx_bitwise_xor_to_or                      -   Replaces ^ with |
    =cxx_number_assign_const                    -   Replaces 'a = b' with 'a = 42'
    =cxx_number_init_const                      -   Replaces 'T a = b' with 'T a = 42'
    =cxx_relational_eq_to_ne                    -   Replaces == with !=
    =cxx_relational_ge_to_gt                    -   Replaces >= with >
    =cxx_relational_ge_to_lt                    -   Replaces >= with <
    =cxx_relational_gt_to_ge                    -   Replaces > with >=
    =cxx_relational_gt_to_le                    -   Replaces > with <=
    =cxx_relational_le_to_gt                    -   Replaces <= with >
    =cxx_relational_le_to_lt                    -   Replaces <= with <
    =cxx_relational_lt_to_ge                    -   Replaces < with >=
    =cxx_relational_lt_to_le                    -   Replaces < with <=
    =cxx_relational_ne_to_eq                    -   Replaces != with ==
    =negate_mutator                             -   Negates conditionals !x to x and x to !x
    =remove_void_function_mutator               -   Removes calls to a function returning void
    =replace_call_mutator                       -   Replaces call to a function with 42
    =scalar_value_mutator                       -   Replaces zeros with 42, and non-zeros with 0

Mull's Approach

  • Mutations can be done either at a high level (i.e., source code) or at a lower level (i.e., bitcode)
  • Mull applies mutations at a lower level because:
    • The same engine can be used to support any LLVM-based language
    • The execution time for each mutation is lower 

Under the Hood

  • Loads LLVM bitcode into memory
  • Inserts instrumentation code into each function
  • Compiles instrumented LLVM bitcode to machine code
  • Prepares the machine code for execution by the LLVM JIT engine
  • At an IR code level, it finds the matching tests
  • Runs each test using the LLVM JIT engine and collects code coverage information

Drawbacks

  • Compiling with bitcode enabled is straightforward for a small project but painful for big projects
  • Mutating the bitcode generates noise because not all the mutations have a representation in code
  • Some mutations generate the same behavior 
  • Mutation testing is time-consuming and requires brain power
  • It is not a solution for black-box testing

What?!?

Then Why?

  • To identify potential areas of improvement 
  • To find bugs behind the usual human interaction
  • To optimize error handling strategies
  • To assess the quality and health status of the codebase
  • To estimate the remaining unknown bugs of a program

How is it possible?

Reports and Metrics

Discover

Inspect

Improve

 

mull-cxx -test-framework=GoogleTest -mutators=math \
         -reporters=Elements -report-dir=./report -report-name=TEST \
         -workers=4 -compdb-path compile_cmd.json -disable-cache=0 \
         -compilation-flags="\
            -isystem /opt/clang+llvm-9.0.0/include/c++/v1 \
            -isystem /opt/clang+llvm-9.0.0/lib/clang/9.0.0/include \
            -isystem /usr/include" \
         ./bin/core-test

Generate Reports

Loading bitcode files (threads: 4): 4/4. Finished in 267ms.
Compiling instrumented code (threads: 4): 4/4. Finished in 11ms.
Loading dynamic libraries (threads: 1): 1/1. Finished in 0ms.
Searching tests (threads: 1): 1/1. Finished in 2ms.
Preparing original test run (threads: 1): 1/1. Finished in 145ms.
Running original tests (threads: 4): 30/30. Finished in 187ms.
Applying function filter: no debug info (threads: 4): 3496/3496. Finished in 14ms.
Applying function filter: file path (threads: 4): 3313/3313. Finished in 22ms.
Instruction selection (threads: 4): 3313/3313. Finished in 23ms.
Searching mutants across functions (threads: 4): 3313/3313. Finished in 369ms.
Applying filter: no debug info (threads: 4): 12355/12355. Finished in 12ms.
Applying filter: file path (threads: 4): 12355/12355. Finished in 35ms.
Applying filter: junk (threads: 4): 12355/12355. Finished in 3657ms.
Prepare mutations (threads: 1): 1/1. Finished in 0ms.
Cloning functions for mutation (threads: 4): 4/4. Finished in 769ms.
Removing original functions (threads: 4): 4/4. Finished in 194ms.
Redirect mutated functions (threads: 4): 4/4. Finished in 11ms.
Applying mutations (threads: 1): 409/409. Finished in 11ms.
Compiling original code (threads: 4): 4/4. Finished in 3625ms.
Running mutants (threads: 4): 409/409. Finished in 4586ms.

Exploring Logs

Mutation Score

(fmt / include / fmt / core.h with math mutators)

Survived Mutant

(fmt / include / fmt / core.h with math mutators)

Compiling instrumented code (threads: 4): 4/4. Finished in 4612ms.
Loading dynamic libraries (threads: 1): 1/1. Finished in 0ms.
Searching tests (threads: 1): 1/1. Finished in 1ms.
Preparing original test run (threads: 1): 1/1. Finished in 86ms.
Running original tests (threads: 4): 30/30. Finished in 203ms.
Applying function filter: no debug info (threads: 4): 3496/3496. Finished in 15ms.
Applying function filter: file path (threads: 4): 3313/3313. Finished in 23ms.
Instruction selection (threads: 4): 3313/3313. Finished in 21ms.
Searching mutants across functions (threads: 4): 3313/3313. Finished in 608ms.
Applying filter: no debug info (threads: 4): 20586/20586. Finished in 15ms.
Applying filter: file path (threads: 4): 20586/20586. Finished in 58ms.
Applying filter: junk (threads: 4): 20586/20586. Finished in 3969ms.
Prepare mutations (threads: 1): 1/1. Finished in 1ms.
Cloning functions for mutation (threads: 4): 4/4. Finished in 1040ms.
Removing original functions (threads: 4): 4/4. Finished in 204ms.
Redirect mutated functions (threads: 4): 4/4. Finished in 13ms.
Applying mutations (threads: 1): 446/446. Finished in 10ms.
Compiling original code (threads: 4): 4/4. Finished in 3808ms.
Running mutants (threads: 4): 446/446. Finished in 5704ms.

Total execution time: 21046ms

Time Constraints

Estimate Remaining Bugs

Switching Perspectives

  • Count how many open bugs are in your backlog
  • Label or categorize 30% of them 
  • Run Mull, then categorize and count the bugs
  • Calculate the ratio between categorized and uncategorized bugs

total = 300 known bugs

labeled = 100 categorized bugs

found = 100 total bugs discovered with mutation

labeledFound  = 30 existing bugs discovered with mutation

Existing Data

labeled / unknown = labeledFound / notLabeledFound

unknown > 200 potential unknown bugs

Simple Ratio
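
The ratio can be solved for the unknown quantity. This is a sketch under one assumption that the slides do not state explicitly: `notLabeledFound` is taken to be `found - labeledFound`, which is 100 - 30 = 70 with the data above. Rearranging the ratio gives unknown = labeled × notLabeledFound / labeledFound = 100 × 70 / 30 ≈ 233, consistent with "unknown > 200". The function name `estimateUnknownBugs` is hypothetical.

```cpp
#include <cassert>
#include <stdexcept>

// Solves labeled / unknown = labeledFound / notLabeledFound for unknown,
// assuming notLabeledFound = found - labeledFound.
double estimateUnknownBugs(int labeled, int found, int labeledFound) {
    if (labeledFound <= 0 || labeledFound > found) {
        throw std::invalid_argument("labeledFound must be in the range 1..found");
    }
    const int notLabeledFound = found - labeledFound;
    return static_cast<double>(labeled) * notLabeledFound / labeledFound;
}
```

Calling `estimateUnknownBugs(100, 100, 30)` yields roughly 233 potential unknown bugs.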

Don't Panic

Scaling Mutation Testing

Cultural Changes

  • Code quality is essential to release a successful product 
  • It's possible to objectively evaluate code quality
  • Automating quality checks is key for effective teams
  • Tests are code and should be implemented with the same criteria  

Technical Challenges

  • Everyone worked with obsolete compilers
  • Integrate the checks in your build tools
  • Minimize the junk in your data
  • Support every operating system

Compilers Outdated

  • Compilers can be updated
  • Software architecture can simplify compiler updates
  • Updates are like a fresh start

Pipeline Integration

  • Build infrastructure can integrate any tool
  • Be thoughtful about when to trigger mutation testing
  • Analyze your data early in the process and often
  • Modularize your pipeline

Dev Environment

  • Invest time to simplify the usage of the tools
  • Be inclusive, support all the dev platforms
  • Write exhaustive documentation

Report Analysis 

  • Review the data in isolation and share your findings
  • Collect the findings and learn from them 
  • Implement a data model to learn from errors

Get ready for a new challenge

Final Remarks

Terminology

  • A fault is an erroneous part of a program
  • A mutation is a fault that is introduced in a program
  • A mutant is a program created from the original one with a potential failure
  • A variant is a program that shows a deviation at runtime from the original program
  • A redundant fault is a duplicated fault

Ubiquity

Java, JVM

C, C++

JavaScript

Rust

Swift

Ruby

PHP

C#

Clojure

Python

Scala

Bitcode and Bytecode

  • Same same but different (JVM instructions are stack-oriented, whereas LLVM bitcode is not)
  • LLVM bitcode is closer to machine-level code, but isn't bound by a particular architecture

Resources

@giorgionatili

Thank You!

Mutation Testing at Scale

By Giorgio Natili

Mutation testing is a technique for evaluating the quality of existing test suites. It involves modifying a program's logic in small ways, such as negating conditionals or changing a logical connector, to introduce faults. When the application code changes, it should produce different results and cause the unit tests to fail. If a unit test does not fail in this situation, it may indicate an issue in the test suite. Mutation testing typically uses a set of language-specific source code transformations, called operators, to introduce faults, but these operators may not generate enough mutations to cover the specific domain of the system under test. Join Giorgio in this session to learn more about mutation testing and how to create operators to generate mutations specific to a custom domain.