Property-Based Testing
Abstract Machines

Francesco Komauli

Università degli Studi di Milano
Facoltà di Scienze e Tecnologie
Corso di Laurea Magistrale in Informatica

April 9, 2018

Introduction

In recent years we have witnessed increasing attention toward machine-assisted theorem proving

A challenge proposed by the research community: evaluating the current state of
machine-assisted reasoning about
programming languages

The purpose is “making the use of proof tools common practice in programming language research”

Meta-Theory of Programming Languages

The study of the properties that the calculi underlying a programming language should satisfy

  • Type soundness

  • Compiler correctness

  • Information-flow control

  • Etc...

Formal machine-checkable proofs are the most trustworthy evidence of the correctness of a software model

Motivations

Verifying the properties of a non-trivial model through interactive theorem-proving requires a significant effort

Proof attempts are not the best way to
debug an incorrect model, because proof assistants give little assistance in such cases

A complementary approach to the formal verification of programming-language meta-theory:
using model-checking prior to theorem proving to
find errors earlier in the verification process

Model Checking

In model-checking, the user's effort is limited to specifying a model and the properties it should satisfy

While verification concerns proving certain properties, validation tries to refute them, giving some confidence that the design of a model is correct, though not the full confidence that formal verification would give

Property-Based Testing (PBT)

A form of bounded model-checking in the context of programming language testing, which originated with the testing library QuickCheck [Claessen and Hughes, 2000]

Properties are executable predicates,
tested via random or exhaustive data generation

Searching for counterexamples
is a task delegated to the PBT library
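As a minimal illustration (not taken from the thesis model), this is how a property looks in FsCheck, the library used later for the functional implementation; the reverse-of-reverse property is the classic QuickCheck example:

    open FsCheck

    // A property is an executable predicate over generated inputs:
    // here, reversing a list twice yields the original list.
    let revRevIsOriginal (xs: int list) =
        List.rev (List.rev xs) = xs

    // FsCheck generates random integer lists and searches for a counterexample.
    Check.Quick revRevIsOriginal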

Objectives of This Thesis

We want to evaluate the efficacy of different data generation strategies when applying PBT to programming-language meta-theory

The field of high-level languages has already been deeply investigated by the research community, so our attention is directed towards typed assembly languages and abstract machines

The problem of data generation is more challenging for less structured languages

Work Done

Our work focuses on the List-Machine benchmark
[Appel et al., 2012]

Two implementations of the model to compare the random generation approach with the exhaustive one:

  • a relational implementation with the logic programming language \(\alpha\)Prolog and its model checker \(\alpha\)Check [Cheney and Momigliano, 2017]
  • a functional implementation developed in F#, with random generation offered by FsCheck

The List-Machine

An assembly language with a small instruction set, comprising instructions for fetching values from memory and storing them back, and for (un)conditional branches

It includes a simple type system that classifies list types, together with a subtyping relation, which guarantees that the execution of well-typed programs does not cause the machine to get stuck

The List-Machine Instruction Set

\(\iota_1, \iota_2, \ldots : I\) instructions:

  • jump \(l\): jump to label \(l\)
  • branch-if-nil \(v\ l\): if \(v =\) nil then jump to \(l\)
  • fetch-field \(v\ 0\ v'\): fetch the head of \(v\) into \(v'\)
  • fetch-field \(v\ 1\ v'\): fetch the tail of \(v\) into \(v'\)
  • cons \(v_0\ v_1\ v'\): make a cons cell in \(v'\)
  • halt: stop executing
  • \(\iota_1;\ \iota_2\): sequential composition

The model defines an operational semantics describing how the machine changes its state when executing an instruction
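In the functional implementation this instruction set maps naturally onto an algebraic datatype; a minimal F# sketch (type and constructor names are ours, not necessarily those of the thesis code):

    // Labels and variables of the list-machine (representations are ours).
    type Label = L of int
    type Var = V of int

    // One constructor per instruction form.
    type Instr =
        | Jump of Label                    // jump l
        | BranchIfNil of Var * Label       // branch-if-nil v l
        | FetchField of Var * int * Var    // fetch-field v 0/1 v'
        | Cons of Var * Var * Var          // cons v0 v1 v'
        | Halt                             // halt
        | Seq of Instr * Instr             // ι1; ι2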

The List-Machine Types

Subtyping diagram: both Nil and Listcons \(\tau'\) are subtypes of List \(\tau\), the latter whenever \(\tau' <: \tau\)

Lists can either be empty (Nil) or contain at least one element of type \(\tau\) (Listcons), while List and Listcons type parameters are covariant

The model defines a typechecking relation which uses these types
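A sketch of how the types and the subtyping relation can be encoded functionally, again in F# and with our own names; the rules follow the ones stated above (reflexivity, Nil below List, and covariance):

    // τ ::= nil | list τ | listcons τ
    type Ty =
        | TNil                 // nil: the empty list
        | TList of Ty          // list τ: possibly empty list of τ
        | TListCons of Ty      // listcons τ: non-empty list of τ

    // τ' <: τ
    let rec subtype t' t =
        match t', t with
        | _ when t' = t -> true                      // reflexivity
        | TNil, TList _ -> true                      // nil <: list τ
        | TList a, TList b -> subtype a b            // covariance of list
        | TListCons a, TList b -> subtype a b        // listcons τ' <: list τ
        | TListCons a, TListCons b -> subtype a b    // covariance of listcons
        | _ -> false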

Property Example: Progress

Progress: the machine cannot get stuck when executing a well-typed instruction

\[
\frac{\models_{\text{prog}} p : \Pi \qquad \Pi \vdash_{\text{instr}} \Gamma\ \{ \iota \}\ \Gamma' \qquad r : \Gamma}
     {\text{step-or-halt}(p,\ r,\ \iota)}
\]

Program \( p \) typechecks with the program typing \( \Pi \)

The Hoare-style triple \(\Gamma\ \{ \iota \}\ \Gamma'\) states that instruction \(\iota\) transforms the typing environment \(\Gamma\) into \(\Gamma'\) under the program typing \(\Pi\)

The store \(r\) matches the typing environment \(\Gamma\)

Instruction \(\iota\) can either make a step with \(r\) as its store,
or it is a halt instruction
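In the functional implementation such a rule becomes an executable predicate. A rough sketch of its shape follows; Program, ProgramTyping, Env, Store and the helpers typecheckProgram, checkInstr, storeMatches and stepOrHalt are illustrative stand-ins for the corresponding parts of the model, not the actual thesis code:

    // Progress as a boolean predicate: the premises act as preconditions,
    // step-or-halt is the conclusion being checked.
    let progressHolds (p: Program) (pi: ProgramTyping)
                      (gamma: Env) (r: Store) (i: Instr) : bool =
        let premises =
            typecheckProgram p pi                            // ⊨prog p : Π
            && (checkInstr pi gamma i |> Option.isSome)      // Π ⊢instr Γ {ι} Γ'
            && storeMatches r gamma                          // r : Γ
        not premises || stepOrHalt p r i                     // step-or-halt(p, r, ι)

With purely random inputs the premises are rarely satisfied, which is why non-trivial properties like this one call for the custom generators discussed in the evaluation below.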

PBT Model Implementations

Besides progress, several other properties were encoded and checked against the model implementations

 

Checks exposed some errors that were introduced in the encoding of the abstract machine, such as typos or unsound semantics

We kept correcting the models until no more counterexamples were found by the test suites

Mutation Testing

Not all benchmarks are supplied with a set of errors that we can use to assess the efficacy of a tool

To provide such errors we employ mutation testing:
mutations are (manually) applied to the source code, then we run checks to see whether the PBT suite finds the introduced errors


When a check finds a counterexample, it is said to
kill the mutant
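For example, a hypothetical mutant of the subtyping sketch given earlier reverses a single rule; any check on a property that exercises subtyping should then report a counterexample and kill it:

    // MUTANT: the nil rule is reversed (list τ <: nil instead of nil <: list τ).
    let rec subtypeMutant t' t =
        match t', t with
        | _ when t' = t -> true
        | TList _, TNil -> true          // mutation: direction of the rule flipped
        | TList a, TList b -> subtypeMutant a b
        | TListCons a, TList b -> subtypeMutant a b
        | TListCons a, TListCons b -> subtypeMutant a b
        | _ -> false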

Experimental Results

Evaluation: Functional - Random

PBT libraries are highly configurable, and the execution time of checks is more predictable than with exhaustive search

Non-trivial properties need custom generators to be adequately tested, and shrinkers to reduce the size of counterexamples

When an abstract machine evolves over time, custom generators and shrinkers must be rewritten

The chosen shrinking strategy may often produce only a non-optimal local minimum
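As an illustration, a size-bounded custom generator and a shrinker for the Ty type sketched earlier could look as follows in FsCheck; the thesis generators for entire well-typed programs are considerably more involved:

    open FsCheck

    // Random generator for list-machine types, bounded by a size parameter.
    let rec genTy size =
        if size <= 0 then Gen.constant TNil
        else
            let smaller = genTy (size - 1)
            Gen.oneof [
                Gen.constant TNil
                Gen.map TList smaller
                Gen.map TListCons smaller
            ]

    // Shrinker: propose structurally smaller types for a failing case.
    let rec shrinkTy ty =
        seq {
            match ty with
            | TNil -> ()
            | TList t ->
                yield TNil
                yield t
                for t' in shrinkTy t do yield TList t'
            | TListCons t ->
                yield TNil
                yield t
                for t' in shrinkTy t do yield TListCons t'
        }

    // Package generator and shrinker into an Arbitrary instance.
    let arbTy = Arb.fromGenShrink (Gen.sized genTy, shrinkTy)

A property over Ty can then be checked with Check.Quick (Prop.forAll arbTy property); when a counterexample is found, FsCheck repeatedly applies the shrinker to report a smaller one.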

Evaluation: Logic - Exhaustive

All properties can be immediately checked, because generators are automatically derived from the model

When an abstract machine evolves over time, PBT with exhaustive generators can be seamlessly extended

Counterexamples found with exhaustive generation and iterative deepening are always global minima

\(\alpha\)Check performs a naive exhaustive search without heuristics, so the search space may blow up exponentially and unpredictably
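To see why exhaustively found counterexamples are globally minimal, here is a rough F# sketch of depth-bounded enumeration with iterative deepening over the Ty type defined earlier; \(\alpha\)Check performs the analogous bounded search over its relational specifications, so this is only an illustration of the idea, not its implementation:

    // Exhaustively enumerate all list-machine types up to a given depth.
    let rec tysUpTo depth =
        seq {
            yield TNil
            if depth > 0 then
                for t in tysUpTo (depth - 1) do
                    yield TList t
                    yield TListCons t
        }

    // Iterative deepening: test depth 0, 1, 2, ... in order, so the first
    // counterexample reported has minimal depth.
    let findCounterexample maxDepth (property: Ty -> bool) =
        seq { 0 .. maxDepth }
        |> Seq.tryPick (fun d -> tysUpTo d |> Seq.tryFind (property >> not))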

Ongoing Work: List-Machine

The list-machine model was further developed in
[Appel et al., 2012] by adding indirect jumps to its instruction set.

We want to study how the employed tools respond to
specification changes in the model under test

Ongoing Work: Information-Flow Control

We are implementing an abstract machine with dynamic secure information-flow control [Hritcu et al., 2013], whose hardest-to-find bugs have minimal counterexamples that are too large for naive exhaustive testing. We thus want to find out how well \(\alpha\)Check can perform in this challenging case study

Future Work: WebAssembly

More ambitiously, we may try to re-discover some of the bugs that were present in the specification of WebAssembly, as found in its remarkable formalization in Isabelle [Watt, 2018]

Property-Based Testing
Abstract Machines

Thank You
