Property-Based Testing

Matthias Benkort

github/KtorZ

Cardano

WHY DO WE TEST?

... TO PROVE A PROGRAM IS CORRECT

WRONG

WE TEST TO FIND BUGS

Program testing can be used to show the presence of bugs, but never their absence!

E. Dijkstra

Things humans are good at

Catching bugs

Finding edge cases

Evaluating the behavior of complex programs

Extrapolating

Things to do when testing

Writing bugs

WHAT IS PROPERTY TESTING?

property: something always true

[Diagram: data → property → counter-example?]

λ> quickCheck $ \(x :: Integer) -> x === 1
*** Failed! Falsified (after 1 test):  
x: 0
0 /= 1

λ> quickCheck $ \(xs :: [Char]) -> length xs === 0
*** Failed! Falsified (after 3 tests and 2 shrinks):    
xs: "a" 
1 /= 0

λ> quickCheck $ \(x :: Integer) (y :: Integer) -> x /= 0 ==> y+x =/= y*x
+++ OK, passed 100 tests; 12 discarded.

λ> quickCheck $ \(x :: Integer) (y :: Integer) -> x /= 0 ==> y+x =/= y*x
*** Failed! Falsified (after 3 tests):                  
x: 2
y: 2
4 == 4

\forall x \in \mathbb{Z}.\; x = 1

\forall xs \in \alpha^*.\; \mathit{length}(xs) = 0

\forall (x, y) \in \mathbb{Z} \times \mathbb{Z}.\; x \neq 0 \implies x + y \neq x \cdot y

QuickCheck-style property-based testing libraries exist for:

C	Coq	Go	Lua	PHP	Scala
C++	D	Io	Node.js	Python	Scheme
C#	Elm	Java	Objective-C	R	Smalltalk
Chicken	Elixir	JavaScript	OCaml	Racket	Swift
Clojure	Erlang	Julia	Perl	Ruby	TypeScript
Lisp	F#	Logtalk	Prolog	Rust	Visual Basic

sort :: Ord a => [a] -> [a]

length(xs) = length(sort(xs))

sort(sort(xs)) = sort(xs)

sort(xs) = xs
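These candidate properties translate almost literally into QuickCheck. The sketch below is only an illustration, assuming the `QuickCheck` package is available; the `prop_*` names are ours, not the talk's. The third property is deliberately wrong, so QuickCheck is expected to falsify it.

```haskell
import Data.List (sort)
import Test.QuickCheck

-- Sorting preserves the number of elements.
prop_lengthPreserved :: [Int] -> Property
prop_lengthPreserved xs = length (sort xs) === length xs

-- Sorting twice is the same as sorting once (idempotence).
prop_idempotent :: [Int] -> Property
prop_idempotent xs = sort (sort xs) === sort xs

-- Deliberately wrong: only holds for inputs that are already sorted,
-- so QuickCheck should report a shrunk counter-example.
prop_identity :: [Int] -> Property
prop_identity xs = sort xs === xs

main :: IO ()
main = do
  quickCheck prop_lengthPreserved
  quickCheck prop_idempotent
  quickCheck prop_identity
```

Running `main` checks all three; only `prop_identity` should report a counter-example.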

-- example-based tests: hand-picked inputs with known outputs
assert (sort ([] :: [Int]) == [])
assert (sort [2,1] == [1,2])
assert (sort [1,2,1] == [1,1,2])

λ> quickCheck . verbose $ \(xs :: [Int]) -> xs === sort xs
Passed: []        
Passed: [-1,1]    
Failed: [-3,1,-1] 
-------------------- Shrinking [-3,1,-1]
Passed: []        
Failed: [1,-1] 
-------------------- Shrinking [1,-1]
Passed: []        
Passed: [-1]      
Passed: [1]       
Failed: [0,-1] 
-------------------- Shrinking [0,-1]
Passed: []        
Passed: [-1]      
Passed: [0]       
Passed: [0,1]     
Passed: [0,0] 

*** Failed! Falsified (after 3 tests and 2 shrinks):    
[0,-1]
[0,-1] /= [-1,0]

Generating

Labelling

Shrinking

Generating (random) data from composable primitives.
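As a sketch of what "composable primitives" means in QuickCheck, assuming the `QuickCheck` package: small generators such as `choose` and `listOf1` are combined with `frequency` into a generator for a richer type. The `Shape` type, its generators and the `area` property are hypothetical, made up for illustration.

```haskell
import Test.QuickCheck

-- Hypothetical domain type, used only for illustration.
data Shape = Circle Double | Rect Double Double
  deriving (Show, Eq)

-- Primitive generator: a strictly positive dimension.
genDim :: Gen Double
genDim = choose (0.1, 100)

-- Composed generator: one circle for every three rectangles, on average.
genShape :: Gen Shape
genShape = frequency
  [ (1, Circle <$> genDim)
  , (3, Rect <$> genDim <*> genDim)
  ]

-- Composed again: non-empty lists of shapes.
genShapes :: Gen [Shape]
genShapes = listOf1 genShape

area :: Shape -> Double
area (Circle r) = pi * r * r
area (Rect w h) = w * h

-- forAll uses our generator instead of an Arbitrary instance.
prop_areaPositive :: Property
prop_areaPositive = forAll genShapes (all ((> 0) . area))

main :: IO ()
main = do
  sample genShape -- prints a handful of random shapes
  quickCheck prop_areaPositive
```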

Shrinking complex data structures down to "minimal" counter-examples.
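A minimal sketch of custom shrinking, assuming the `QuickCheck` package: the hypothetical `Interval` type reuses the built-in `Int` shrinker one field at a time while preserving its own invariant.

```haskell
import Test.QuickCheck

-- Hypothetical type used only for illustration: an interval starting at `lo`
-- and spanning `len` units (len is kept non-negative).
data Interval = Interval { lo :: Int, len :: Int }
  deriving (Show, Eq)

instance Arbitrary Interval where
  arbitrary = Interval <$> arbitrary <*> (getNonNegative <$> arbitrary)
  -- Shrink one field at a time using QuickCheck's built-in Int shrinker,
  -- discarding candidates that would break the len >= 0 invariant.
  shrink (Interval l n) =
    [ Interval l' n | l' <- shrink l ]
      ++ [ Interval l n' | n' <- shrink n, n' >= 0 ]

-- Deliberately false property: "every interval is shorter than 10".
prop_short :: Interval -> Bool
prop_short i = len i < 10

main :: IO ()
main = quickCheck prop_short
```

Because QuickCheck only keeps shrink candidates that still fail, `main` should end up reporting an interval with `lo` shrunk to 0 and `len` at the smallest failing value, 10.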

Labelling properties to measure how effectively they exercise the input space, and to reason about them.
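QuickCheck exposes labelling through combinators such as `classify`, `tabulate` and `cover`. A sketch, assuming QuickCheck 2.12 or later (where `tabulate`, this form of `cover` and `checkCoverage` exist); the length buckets and the 5% threshold are arbitrary choices for illustration.

```haskell
import Data.List (sort)
import Test.QuickCheck

-- A length-preservation property for sort, instrumented with labels.
prop_sortLength :: [Int] -> Property
prop_sortLength xs =
  classify (null xs) "empty" $
  classify (length xs == 1) "singleton" $
  tabulate "length" [bucket (length xs)] $
  -- Require at least 5% of test cases to use lists longer than 10 elements;
  -- only enforced when the property is run through checkCoverage.
  cover 5 (length xs > 10) "longer than 10" $
    length (sort xs) === length xs
  where
    bucket :: Int -> String
    bucket n
      | n < 5 = "0-4"
      | n < 20 = "5-19"
      | otherwise = "20+"

main :: IO ()
main = quickCheck (checkCoverage prop_sortLength)
```

Running `main` prints the distribution of labels next to the test count, in the same spirit as the percentage tables shown for the SQLite and chain-synchronization tests.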

GREAT TEST RUNNER

WHAT ABOUT THE REAL WORLD?

Idempotence

Oracles

Labyrinth

Roundtrips
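Three of these patterns, sketched as QuickCheck properties. This assumes the `QuickCheck` package; `insertionSort` and the run-length codec are hypothetical examples, not code from the talk.

```haskell
import Data.List (insert, nub, sort)
import qualified Data.List.NonEmpty as NE
import Test.QuickCheck

-- Idempotence: applying a function twice gives the same result as applying it once.
prop_nubIdempotent :: [Int] -> Property
prop_nubIdempotent xs = nub (nub xs) === nub xs

-- Oracle: check an implementation against a trusted reference.
-- A hand-rolled insertion sort (under test) is compared with Data.List.sort (the oracle).
insertionSort :: [Int] -> [Int]
insertionSort = foldr insert []

prop_sortOracle :: [Int] -> Property
prop_sortOracle xs = insertionSort xs === sort xs

-- Roundtrip: decoding an encoded value gives back the original.
-- Hypothetical run-length codec, used only for illustration.
encodeRLE :: String -> [(Char, Int)]
encodeRLE = map (\g -> (NE.head g, NE.length g)) . NE.group

decodeRLE :: [(Char, Int)] -> String
decodeRLE = concatMap (\(c, n) -> replicate n c)

prop_rleRoundtrip :: String -> Property
prop_rleRoundtrip s = decodeRLE (encodeRLE s) === s

main :: IO ()
main = do
  quickCheck prop_nubIdempotent
  quickCheck prop_sortOracle
  quickCheck prop_rleRoundtrip
```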

Testing The Ugly

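[Diagram: an Actual Operation takes the Actual State to the next Actual State; a Model Operation takes the Model State to the next Model State; a Semantic Interpretation maps each actual state onto the corresponding model state.]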

{}
  write a   →  { "a": ... }
  write b   →  { "a": ..., "b": ... }
  read a    →  { "a": ..., "b": ... }
  read a    →  { "a": ..., "b": ... }
  delete b  →  { "a": ... }

\forall cs \in \mathit{Command}^*.\; \forall c \in cs.\; \mathit{model}(c) = \mathit{abstraction}(\mathit{actual}(c))
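The formula can be illustrated on the write/read/delete example with a deliberately tiny, hypothetical key-value store. In this sketch (assuming the `QuickCheck` and `containers` packages) a `Map` plays the model, an association list stands in for the actual system, and `abstraction` is the semantic interpretation; a buggy `Delete` shows the kind of disagreement the property catches. It is a hand-rolled illustration, not the state-machine tooling used for the SQLite tests.

```haskell
import qualified Data.Map.Strict as Map
import Test.QuickCheck

type Key = Char

-- Hypothetical commands for a tiny key-value store.
data Command = Write Key Int | Read Key | Delete Key
  deriving (Show, Eq)

instance Arbitrary Command where
  arbitrary = oneof [Write <$> genKey <*> arbitrary, Read <$> genKey, Delete <$> genKey]
    where
      -- A small key space makes overwrites and deletes of the same key likely.
      genKey = elements "ab"
  shrink (Write k v) = [Write k v' | v' <- shrink v]
  shrink _ = []

-- Model state and model operation: a plain Map.
type Model = Map.Map Key Int

stepModel :: Command -> Model -> (Maybe Int, Model)
stepModel (Write k v) m = (Nothing, Map.insert k v m)
stepModel (Read k) m = (Map.lookup k m, m)
stepModel (Delete k) m = (Nothing, Map.delete k m)

-- Actual state: stand-in for the real system, an association list
-- with the most recent write first.
type Actual = [(Key, Int)]

type Step = Command -> Actual -> (Maybe Int, Actual)

stepActual :: Step
stepActual (Write k v) s = (Nothing, (k, v) : s)
stepActual (Read k) s = (lookup k s, s)
stepActual (Delete k) s = (Nothing, filter ((/= k) . fst) s)

-- Buggy variant: Delete only drops the most recent entry for the key,
-- so an older write becomes visible again.
stepActualBuggy :: Step
stepActualBuggy (Delete k) s = (Nothing, dropFirst s)
  where
    dropFirst [] = []
    dropFirst (e@(k', _) : rest)
      | k' == k = rest
      | otherwise = e : dropFirst rest
stepActualBuggy c s = stepActual c s

-- Semantic interpretation: map an actual state onto the model.
-- Map.fromList keeps the last value per key, hence the reverse.
abstraction :: Actual -> Model
abstraction = Map.fromList . reverse

-- For every command sequence and after every command, the model's response and
-- state agree with the actual response and the abstraction of the actual state.
prop_modelAgrees :: Step -> [Command] -> Property
prop_modelAgrees step = go Map.empty []
  where
    go _ _ [] = property True
    go m a (c : cs) =
      let (rm, m') = stepModel c m
          (ra, a') = step c a
       in counterexample ("after " ++ show c) ((rm, m') === (ra, abstraction a'))
            .&&. go m' a' cs

main :: IO ()
main = do
  quickCheck (prop_modelAgrees stepActual)      -- expected to pass
  quickCheck (prop_modelAgrees stepActualBuggy) -- expected to find a counter-example
```

Keeping the key space to `"ab"` is a deliberate choice: with few keys, random command sequences often write the same key twice and then delete it, which is exactly where the buggy store diverges from the model.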

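-- Excerpt: command and response types of the SQLite state-machine tests.
-- Domain types (Checkpoint, Metadata, TxHistory, SlotId, ...) are not shown;
-- `Hash "Tx"` relies on the DataKinds extension.

-- Commands the test generates and runs against both the model and SQLite.
data Cmd
    = CreateCheckpoint Checkpoint Metadata TxHistory
    | PutCheckpoint Checkpoint
    | ReadCheckpoint
    | ListCheckpoints
    | RollbackTo SlotId
    | PutMetadata Metadata
    | ReadMetadata
    | PutTxHistory TxHistory
    | ReadTxHistory SortOrder (Range SlotId) (Maybe TxStatus)
    | RemovePendingTx (Hash "Tx")
    | PutPrivateKey XPrv
    | ReadPrivateKey
    | PutDelegationCertificate DelegationCertificate SlotId

-- Possible successful results of a command.
data Success
    = Unit ()
    | Checkpoint (Maybe Checkpoint)
    | Metadata (Maybe Metadata)
    | TxHistory TxHistory
    | PrivateKey (Maybe XPrv)
    | BlockHeaders [BlockHeader]
    | Point SlotId

-- Expected, domain-level failures.
data Err
    = NoCheckpoint
    | CheckpointAlreadyExists
    | CannotRemovePendingTx ErrRemovePendingTx

-- Each command yields either an error or a success.
newtype Resp
    = Resp (Either Err Success)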

Sqlite State machine tests
    +++ OK, passed 800 tests:
        54.2% UnsuccessfulReadTxHistory
        53.1% SuccessfulReadCheckpoint
        51.9% TxUnsortedInputs
        51.7% TxUnsortedOutputs
        31.4% SuccessfulReadTxHistory
        28.5% UnsuccessfulReadCheckpoint
        26.9% ReadTxHistoryAfterDelete
        25.2% ReadMetaAfterPutCert
        24.9% PutCheckpointTwice
        20.0% RolledBackOnce
        17.4% RemovePendingTxTwice
        17.0% SuccessfulReadPrivateKey

Sqlite State machine tests
Falsifiable (after 74 tests and 17 shrinks):
   [ Command (CreateCheckpoint (...))
   , Command (PutDelegationCertificate (...))
   , Command (ReadCheckpoint (...))
   , Command (RollbackTo (...))
   , Command (PutDelegationCertificate (...))
   , Command (ReadMetadata (...))
   ]
   
PostconditionFailed ... (Model /= Sqlite)

HTTP (REST)

Node Chain Synchronization
    +++ OK, passed 100000 tests:
        64.697% started from scratch
        56.047% advanced more than k blocks
        53.147% started with an empty chain
        41.140% started with more than k blocks
        10.611% switched to a shorter chain
         7.015% switched to a longer chain
         6.882% rewinded without switch
         0.857% rolled back full k

        96.319% Intersection hit rate GREAT (75% - 100%)
         3.673% Intersection hit rate GOOD  (50% - 75%)
         0.008% Intersection hit rate POOR  (10% - 50%)

The local state is eventually synced with the node.

Node Chain Synchronization
    Assertion failed (after 1 test and 9 shrinks):
    Initial consumer chain: genesis
    Initial node chain:     genesis
    Applied blocks:         genesis 0001 0002 0003 0004 0005 0006 
    Node chain:             genesis 0001 0002 0003 0004 0005 0006 0007 
    Logs:                 
    [CONSUMER] nextBlocks SlotId 0.0
    [PRODUCER] getTipId
        ↳ genesis
    [NETWORK ] NodeAddBlocks [0001]
    [PRODUCER] getBlock "genesis"
        ↳ 0.0:genesis
    [NETWORK ] NodeAddBlocks [0002]
    [CONSUMER] AwaitReply
    [CONSUMER] nextBlocks 0.0
    [PRODUCER] getTipId
        ↳ 0002
    [NETWORK ] NodeAddBlocks [0003]
    [PRODUCER] getBlock "0002"
        ↳ 0.2:0002->0001
    [NETWORK ] NodeAddBlocks [0004]
    [PRODUCER] getBlock "0001"
        ↳ 0.1:0001->genesis
    ...

Thank you.

Property Based Testing

More from Matthias Benkort

... TO PROVE
A PROGRAM
IS CORRECT