Francesco Komauli
Developer
Università degli Studi di Milano
Facoltà di Scienze e Tecnologie
Corso di Laurea Magistrale in Informatica
April 9, 2018
In recent years we have witnessed increasing attention toward machine-assisted theorem proving
A challenge proposed by the research community: evaluating the current state of
machine-assisted reasoning about
programming languages
The purpose is “making the use of proof tools common practice in programming language research”
The study of properties that calculi underlying a programming language should satisfy
Type soundness
Compiler correctness
Information-flow control
Etc...
Formal machine-checkable proofs are the most trustworthy evidence of the correctness of a software model
Verifying the properties of a non-trivial model through interactive theorem-proving requires a significant effort
Proof attempts are not the best way to debug an incorrect model, because proof assistants offer little help in such cases
Complementary approach to formal verification of programming languages meta-theory:
the use of model-checking prior to theorem proving to
find errors earlier in the verification process
In model-checking, the user's effort is limited to specifying a model and the properties it should satisfy
While verification concerns proving certain properties, validation tries to refute them, giving some confidence that the design of a model is correct, though not the full confidence that formal verification would give
A form of bounded model-checking in the scope of programming language testing, originated with the testing library QuickCheck [Claessen and Hughes, 2000]
Properties are executable predicates,
tested via random or exhaustive data generation
Searching for counterexamples
is a task delegated to the PBT library
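As a minimal illustration of PBT with random generation (a Python sketch, not the tooling used in this work; the generator and property names are hypothetical):

```python
import random

def prop_reverse_involutive(xs):
    # the property under test: reversing a list twice yields the original
    return list(reversed(list(reversed(xs)))) == xs

def random_list(rng, max_len=10):
    # hypothetical generator: small lists of small integers
    return [rng.randint(0, 100) for _ in range(rng.randint(0, max_len))]

def check(prop, gen, trials=1000, seed=42):
    # search for a counterexample by random generation
    rng = random.Random(seed)
    for _ in range(trials):
        xs = gen(rng)
        if not prop(xs):
            return xs  # counterexample found: the property is refuted
    return None  # no counterexample within the given trials
```

Note that `check` returning `None` is not a proof: it only gives confidence that the property holds for the inputs that were tried.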
We want to evaluate the efficacy of different data generation strategies when applying PBT to programming languages meta-theory
The field of high-level languages has already been deeply investigated by the research community, so our attention is directed towards typed assembly languages and abstract machines
The problem of data generation is more challenging for less structured languages
Our work focuses on the List-Machine benchmark
[Appel et al., 2012]
Two implementations of the model allow us to compare the random generation approach with the exhaustive one:
Assembly language consisting of a small instruction set, comprising instructions for fetching values from memory and storing them back, and for (un)conditional branches
It includes a simple type system that classifies list types, together with a subtyping relation, which guarantees that executing well-typed programs does not cause the machine to get stuck
\(\iota_1, \iota_2, \ldots\ : I\)  instructions
jump \(l\ : I\)  jump to label \(l\)
branch-if-nil \(v\ l\ : I\)  if \(v =\ \)nil then jump to \(l\)
fetch-field \(v\ 0\ v'\ : I\)  fetch the head of \(v\) into \(v'\)
fetch-field \(v\ 1\ v'\ : I\)  fetch the tail of \(v\) into \(v'\)
cons \(v_0\ v_1\ v'\ : I\)  make a cons cell in \(v'\)
halt \(: I\)  stop executing
\(\iota_1 ; \iota_2\ : I\)  sequential composition
The model defines operational semantics describing how the machine changes its state when executing an instruction
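The step relation can be sketched as follows (a Python sketch with an illustrative encoding, not the benchmark's own representation: values are `None` for nil or head/tail pairs, the store is a dict from variables to values, and a program maps labels to flattened instruction lists):

```python
def step(program, store, instrs):
    """One small-step transition: returns (store', instrs'), or None when
    the machine is stuck or has halted."""
    if not instrs:
        return None
    op, *args = instrs[0]
    rest = list(instrs[1:])
    if op == "jump":
        (l,) = args
        return (store, program[l]) if l in program else None
    if op == "branch-if-nil":
        v, l = args
        if v not in store:
            return None  # stuck: undefined variable
        if store[v] is None:  # nil: take the branch
            return (store, program[l]) if l in program else None
        return (store, rest)  # non-nil: fall through
    if op == "fetch-field":
        v, i, v2 = args
        cell = store.get(v)
        if not isinstance(cell, tuple):
            return None  # stuck: fetching a field of nil or of an undefined variable
        return ({**store, v2: cell[i]}, rest)
    if op == "cons":
        v0, v1, v2 = args
        if v0 not in store or v1 not in store:
            return None
        return ({**store, v2: (store[v0], store[v1])}, rest)
    return None  # halt: no further step

def run(program, store, instrs, fuel=1000):
    # iterate `step` until the machine halts, gets stuck, or runs out of fuel
    while fuel > 0 and instrs:
        nxt = step(program, store, instrs)
        if nxt is None:
            break
        store, instrs = nxt
        fuel -= 1
    return store, instrs
```

A well-typed program, such as a loop that walks a list down to nil, only ever stops at halt; ill-typed programs can reach the stuck `None` cases.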
Subtyping relation \(\tau' <: \tau\)
Lists can either be empty (Nil) or contain at least one element of type \(\tau\) (Listcons), and the List and Listcons type parameters are covariant
The model defines a typechecking relation which uses these types
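The subtyping relation can be sketched as an executable check (a Python sketch with an illustrative type encoding: `"nil"`, `("list", t)`, `("listcons", t)`):

```python
def subtype(t1, t2):
    """Decide t1 <: t2 for the list-machine types (illustrative encoding)."""
    if t1 == t2:
        return True  # reflexivity
    if t1 == "nil":
        # nil is a subtype of every list type
        return isinstance(t2, tuple) and t2[0] == "list"
    if isinstance(t1, tuple) and isinstance(t2, tuple):
        c1, a1 = t1
        c2, a2 = t2
        if c2 == "list" and c1 in ("list", "listcons"):
            return subtype(a1, a2)  # covariance in the element type
        if c1 == "listcons" and c2 == "listcons":
            return subtype(a1, a2)  # covariance in the element type
    return False
```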
Progress
The machine cannot get stuck
when executing a well-typed instruction
Program \( p \) typechecks with the program typing \( \Pi \)
The Hoare triple \(\Pi \vdash \Gamma \{ \iota \} \Gamma' \) states that, under the program typing \(\Pi\), instruction \(\iota\) updates the typing environment \(\Gamma\) to \(\Gamma'\)
The store \(r\) matches the typing environment \(\Gamma\)
instruction \(\iota\) can either make a step with \(r\) as store,
or it is a halt instruction
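Putting these premises together, the progress statement reads (in notation approximating that of [Appel et al., 2012]):

\[
\vdash_{\mathrm{prog}} p : \Pi
\quad\wedge\quad
\Pi \vdash \Gamma \{ \iota \} \Gamma'
\quad\wedge\quad
r : \Gamma
\;\implies\;
\bigl(\exists\, r', \iota'.\; p \vdash (r, \iota) \longmapsto (r', \iota')\bigr)
\;\vee\;
\iota = \mathsf{halt}
\]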
Besides progress, several other properties were encoded and checked against the model implementations
Checks exposed some errors that were introduced in the encoding of the abstract machine, such as typos or unsound semantics
We kept correcting the models until no more counterexamples were found by the test suites
Not all benchmarks are supplied with a set of errors that we can use to assess the efficacy of a tool
To provide such errors we employ mutation testing:
mutations are (manually) applied to the source code, then we run checks to see whether the PBT suite finds the introduced errors
When a check finds a counterexample, it is said to
kill the mutant
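A toy sketch of this workflow (Python; the functions are illustrative, not taken from the benchmark): a mutation flips one operator, and a check over small inputs kills the mutant by finding an input where it disagrees with the specification.

```python
def max_spec(a, b):
    # reference behavior: the larger of the two arguments
    return a if a >= b else b

def max_mutant(a, b):
    # hand-applied mutation: `>=` flipped to `<=`
    return a if a <= b else b

def find_killing_input(candidate, bound=7):
    # exhaustively test small inputs against the specification
    for a in range(-bound, bound + 1):
        for b in range(-bound, bound + 1):
            if candidate(a, b) != max_spec(a, b):
                return (a, b)  # counterexample: the mutant is killed
    return None  # the mutant survives this check
```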
PBT libraries are highly configurable and execution time of checks is more predictable
Non-trivial properties need custom generators to be adequately tested, and shrinkers to reduce the size of counterexamples
When an abstract machine evolves over time, custom generators and shrinkers need to be rewritten
The chosen shrinking strategy may often produce a non-optimal local minimum rather than a smallest counterexample
All properties can be immediately checked, because generators are automatically derived from the model
When an abstract machine evolves with time, PBT with exhaustive generators can be seamlessly extended
Counterexamples found with exhaustive generation and iterative deepening are always a global minimum
\(\alpha\)Check performs a naive exhaustive search without any heuristics, thus the search space may explode exponentially and unpredictably
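Exhaustive generation with iterative deepening can be sketched as follows (Python; the data domain and property are illustrative): candidates are enumerated by size, so the first counterexample found is a global minimum, but the number of candidates grows exponentially with the size bound.

```python
def lists_of_size(n, alphabet=(0, 1)):
    # exhaustively enumerate all lists of length n over a tiny alphabet;
    # there are len(alphabet) ** n of them, hence the exponential blow-up
    if n == 0:
        yield []
        return
    for x in alphabet:
        for rest in lists_of_size(n - 1, alphabet):
            yield [x] + rest

def smallest_counterexample(prop, max_size=10):
    # iterative deepening on the size bound: sizes 0, 1, 2, ...
    for n in range(max_size + 1):
        for xs in lists_of_size(n):
            if not prop(xs):
                return xs  # first counterexample found is size-minimal
    return None  # property holds for all candidates up to max_size
```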
The list-machine model was further developed in [Appel et al., 2012] by adding indirect jumps to its instruction set.
We want to study how the employed tools respond to
changes in the specification of the model under test
Implementing an abstract machine with dynamic secure information-flow control [Hritcu et al., 2013], whose hardest-to-find bugs have minimal counterexamples that are too large, well beyond the scope of naive exhaustive testing. We thus want to find out how well \(\alpha\)Check can perform in this challenging case study
More ambitiously, we may try to re-discover some of the bugs that were present in the specification of WebAssembly, as found in its remarkable formalization in Isabelle [Watt, 2018]