Property-Based Testing

Matthias Benkort

github/KtorZ

Cardano

WHY DO WE TEST?

... TO PROVE A PROGRAM IS CORRECT

 

WRONG

WE TEST TO FIND BUGS

Program testing can be used to show the presence of bugs, but never to show their absence!

E. Dijkstra

Things humans are good at

Catching bugs

Finding edge-cases

Evaluating behaviors of complex programs

Extrapolating

Things to do when testing

Writing bugs

WHAT IS PROPERTY TESTING?

property

? counter-example ?

data

something always true

λ> quickCheck $ \(x :: Integer) -> x === 1
*** Failed! Falsified (after 1 test):  
x: 0
0 /= 1
λ> quickCheck $ \(xs :: [Char]) -> length xs === 0
*** Failed! Falsified (after 3 tests and 2 shrinks):    
xs: "a" 
1 /= 0
λ> quickCheck $ \(x :: Integer) (y :: Integer) -> x /= 0 ==> y+x =/= y*x
+++ OK, passed 100 tests; 12 discarded.
λ> quickCheck $ \(x :: Integer) (y :: Integer) -> x /= 0 ==> y+x =/= y*x
*** Failed! Falsified (after 3 tests):                  
x: 2
y: 2
4 == 4
\forall x \in \mathbb{Z}. x = 1
\forall xs \in \alpha^*. length(xs) = 0
\forall (x,y) \in \mathbb{Z}\times\mathbb{Z}. x \ne 0 \implies y + x \ne y \cdot x
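
The two runs of the last property differ only by luck: the single counter-example with x /= 0 is x = y = 2, and 100 random cases may or may not hit it. A minimal sketch of how one might raise the number of cases, assuming a recent QuickCheck that provides `withMaxSuccess` and `=/=`; the property names are hypothetical.

```haskell
import Test.QuickCheck

-- The (false) property from the session above:
-- for x /= 0, the sum never equals the product.
prop_sumDiffersFromProduct :: Integer -> Integer -> Property
prop_sumDiffersFromProduct x y =
    x /= 0 ==> y + x =/= y * x

-- Same property, checked against 10000 random cases instead of
-- the default 100. This makes a miss much less likely, not impossible.
prop_sumDiffersFromProductMore :: Property
prop_sumDiffersFromProductMore =
    withMaxSuccess 10000 prop_sumDiffersFromProduct
```

In GHCi, `quickCheck prop_sumDiffersFromProductMore` runs the larger campaign.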
C Coq Go Lua PHP Scala
C++ D IO Node.js Python Scheme
C# Elm Java Objective-C R SmallTalk
Chicken Elixir JavaScript OCaml Racket Swift
Clojure Erlang Julia Perl Ruby TypeScript
Lisp F# Logtalk Prolog Rust VisualBasic

sort :: Ord x => [x] -> [x]

length(xs) = length(sort(xs))
sort(sort(xs)) = sort(xs)
sort(xs) = xs

!

assert (sort [] == [])
assert (sort [2,1] == [1,2])
assert (sort [1,2,1] == [1,1,2])
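
The three candidate properties for `sort` can be written down as QuickCheck properties. This is a sketch specialised to `[Int]`; the `prop_` names are chosen here for illustration, and the last property is deliberately the wrong one.

```haskell
import Data.List (sort)
import Test.QuickCheck

-- Sorting never adds or drops elements.
prop_sortPreservesLength :: [Int] -> Property
prop_sortPreservesLength xs =
    length xs === length (sort xs)

-- Sorting an already sorted list changes nothing.
prop_sortIdempotent :: [Int] -> Property
prop_sortIdempotent xs =
    sort (sort xs) === sort xs

-- Deliberately wrong: holds only for inputs that happen to be sorted.
prop_sortIsIdentity :: [Int] -> Property
prop_sortIsIdentity xs =
    sort xs === xs
```

Unlike the three hand-picked assertions, each property is checked against many generated lists per run.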
λ> quickCheck . verbose $ \(xs :: [Int]) -> xs === sort xs
Passed: []        
Passed: [-1,1]    
Failed: [-3,1,-1] 
-------------------- Shrinking [-3,1,-1]
Passed: []        
Failed: [1,-1] 
-------------------- Shrinking [1,-1]
Passed: []        
Passed: [-1]      
Passed: [1]       
Failed: [0,-1] 
-------------------- Shrinking [0,-1]
Passed: []        
Passed: [-1]      
Passed: [0]       
Passed: [0,1]     
Passed: [0,0] 

*** Failed! Falsified (after 3 tests and 2 shrinks):    
[0,-1]
[0,-1] /= [-1,0]

Generating

Labelling

Shrinking

Generating...

(random) data from composable primitives.
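
As an illustration of composing primitives, here is a small sketch of a generator for a record type. The `Account` type and the generator names are hypothetical; `listOf1`, `elements`, `frequency` and `choose` are standard QuickCheck combinators.

```haskell
import Test.QuickCheck

-- Hypothetical domain type, used only for illustration.
data Account = Account
    { owner   :: String
    , balance :: Int
    } deriving (Show, Eq)

-- A non-empty owner name drawn from a small alphabet.
genOwner :: Gen String
genOwner = listOf1 (elements ['a' .. 'z'])

-- A balance that is usually small and occasionally large.
genBalance :: Gen Int
genBalance = frequency
    [ (9, choose (0, 100))
    , (1, choose (100000, 1000000))
    ]

-- Primitives compose applicatively into a generator for the whole record.
genAccount :: Gen Account
genAccount = Account <$> genOwner <*> genBalance
```

In GHCi, `sample genAccount` prints a handful of generated values.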

Shrinking...

complex data-structures down to "minimal" counter-examples.
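
Shrinking is driven by the `shrink` method of the `Arbitrary` class. The sketch below, on a hypothetical `Interval` type, reuses the built-in shrinker for pairs and rebuilds each candidate through a smart constructor so that shrunk values keep the type's invariant.

```haskell
import Test.QuickCheck

-- Hypothetical type: a closed interval with the invariant lo <= hi.
data Interval = Interval Int Int
    deriving (Show, Eq)

-- Smart constructor that orders its arguments.
mkInterval :: Int -> Int -> Interval
mkInterval a b = Interval (min a b) (max a b)

instance Arbitrary Interval where
    arbitrary = mkInterval <$> arbitrary <*> arbitrary
    -- Shrink the bounds as a pair, then rebuild through the smart
    -- constructor so every candidate still satisfies lo <= hi.
    shrink (Interval lo hi) =
        [ mkInterval lo' hi' | (lo', hi') <- shrink (lo, hi) ]
```

Without a `shrink` definition QuickCheck still reports a counter-example, just not a minimised one.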

Labelling...

properties to measure efficiency and reason about them.
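
A sketch of labelling, using QuickCheck's `label` and `classify` on the idempotence property of `sort`. The bucket names and thresholds are arbitrary choices made for this example.

```haskell
import Data.List (sort)
import Test.QuickCheck

-- Labels do not change the verdict; they report which kinds of
-- inputs the generator actually produced.
prop_sortIdempotentLabelled :: [Int] -> Property
prop_sortIdempotentLabelled xs =
    label (sizeClass xs) $
    classify (xs == sort xs) "already sorted" $
    sort (sort xs) === sort xs
  where
    sizeClass ys
        | null ys        = "empty"
        | length ys < 10 = "short (< 10)"
        | otherwise      = "long (>= 10)"
```

Running `quickCheck prop_sortIdempotentLabelled` prints the percentage of test cases that fell into each bucket, which helps spot a generator that rarely reaches the interesting cases.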

GREAT TEST RUNNER

WHAT ABOUT THE REAL WORLD?

Idempotence

Oracles

?

Labyrinth

Roundtrips
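
Three of the patterns named above, sketched on standard-library functions. The choice of `show`/`read`, `nub` and an insertion sort as examples is an assumption made here, not something taken from the talk.

```haskell
import Data.List (insert, nub, sort)
import Test.QuickCheck

-- Roundtrip: decoding an encoded value gives the value back.
prop_roundtripShowRead :: Int -> Property
prop_roundtripShowRead n =
    read (show n) === n

-- Idempotence: applying the function twice equals applying it once.
prop_nubIdempotent :: [Int] -> Property
prop_nubIdempotent xs =
    nub (nub xs) === nub xs

-- Oracle: a slow but obviously correct implementation
-- judges the implementation under test.
insertionSort :: [Int] -> [Int]
insertionSort = foldr insert []

prop_sortMatchesOracle :: [Int] -> Property
prop_sortMatchesOracle xs =
    sort xs === insertionSort xs
```

None of these properties needs to state the expected output explicitly, which is what makes them practical for real-world code.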

Testing The Ugly

[Diagram: Actual State --Actual Operation--> Actual State, and Model State --Model Operation--> Model State; each Actual State maps to its Model State by Semantic Interpretation.]

(initial) → {}
write a   → { "a": ... }
write b   → { "a": ..., "b": ... }
read a    → { "a": ..., "b": ... }
read a    → { "a": ..., "b": ... }
delete b  → { "a": ... }
\forall cs \in Command^*. \forall c \in cs. model(c) = abstraction(actual(c))
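
A minimal sketch of this idea for the write/read/delete store, under simplifying assumptions: both sides are pure, `Data.Map` stands in for the real storage, the model is an association list, and the abstraction is simply the response to each command. All names here are hypothetical; a real setup would run the actual side against a database.

```haskell
import qualified Data.Map.Strict as Map
import Test.QuickCheck

-- Commands on a key-value store.
data Cmd = Write Char Int | Read Char | Delete Char
    deriving (Show)

instance Arbitrary Cmd where
    arbitrary = oneof
        [ Write <$> key <*> arbitrary
        , Read <$> key
        , Delete <$> key
        ]
      where
        -- Few keys, so commands often hit the same entry.
        key = elements "abc"

-- Model: a plain association list, newest binding first.
type Model = [(Char, Int)]

stepModel :: Cmd -> Model -> (Maybe Int, Model)
stepModel (Write k v) m = (Nothing, (k, v) : m)
stepModel (Read k)    m = (lookup k m, m)
stepModel (Delete k)  m = (Nothing, filter ((/= k) . fst) m)

-- "Actual" system: here just Data.Map, standing in for a real database.
type Actual = Map.Map Char Int

stepActual :: Cmd -> Actual -> (Maybe Int, Actual)
stepActual (Write k v) s = (Nothing, Map.insert k v s)
stepActual (Read k)    s = (Map.lookup k s, s)
stepActual (Delete k)  s = (Nothing, Map.delete k s)

-- Run every command against both sides and compare the responses.
prop_modelAgrees :: [Cmd] -> Property
prop_modelAgrees = go [] Map.empty
  where
    go _ _ [] = property True
    go m s (c : cs) =
        let (rm, m') = stepModel c m
            (rs, s') = stepActual c s
        in  counterexample (show c) (rm === rs) .&&. go m' s' cs
```

When the two sides disagree, QuickCheck shrinks the command list, so the report is a short sequence of commands that still reproduces the mismatch.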
data Cmd
    = CreateCheckpoint Checkpoint Metadata TxHistory
    | PutCheckpoint Checkpoint
    | ReadCheckpoint 
    | ListCheckpoints 
    | RollbackTo SlotId
    | PutMetadata Metadata
    | ReadMetadata 
    | PutTxHistory TxHistory
    | ReadTxHistory SortOrder (Range SlotId) (Maybe TxStatus)
    | RemovePendingTx (Hash "Tx")
    | PutPrivateKey XPrv
    | ReadPrivateKey 
    | PutDelegationCertificate DelegationCertificate SlotId
data Success
    = Unit ()
    | Checkpoint (Maybe Checkpoint)
    | Metadata (Maybe Metadata)
    | TxHistory TxHistory
    | PrivateKey (Maybe XPrv)
    | BlockHeaders [BlockHeader]
    | Point SlotId
    
data Err 
    = NoCheckpoint
    | CheckpointAlreadyExists 
    | CannotRemovePendingTx ErrRemovePendingTx

data Resp
    = Resp (Either Err Success)
Sqlite State machine tests
    +++ OK, passed 800 tests:
        54.2% UnsuccessfulReadTxHistory
        53.1% SuccessfulReadCheckpoint
        51.9% TxUnsortedInputs
        51.7% TxUnsortedOutputs
        31.4% SuccessfulReadTxHistory
        28.5% UnsuccessfulReadCheckpoint
        26.9% ReadTxHistoryAfterDelete
        25.2% ReadMetaAfterPutCert
        24.9% PutCheckpointTwice
        20.0% RolledBackOnce
        17.4% RemovePendingTxTwice
        17.0% SuccessfulReadPrivateKey
Sqlite State machine tests
Falsifiable (after 74 tests and 17 shrinks):
   [ Command (CreateCheckpoint (...))
   , Command (PutDelegationCertificate (...))
   , Command (ReadCheckpoint (...))
   , Command (RollbackTo (...))
   , Command (PutDelegationCertificate (...))
   , Command (ReadMetadata (...))
   ]
   
PostconditionFailed ... (Model /= Sqlite)

HTTP (REST)

Node Chain Synchronization
    +++ OK, passed 100000 tests:
        64.697% started from scratch
        56.047% advanced more than k blocks
        53.147% started with an empty chain
        41.140% started with more than k blocks
        10.611% switched to a shorter chain
         7.015% switched to a longer chain
         6.882% rewinded without switch
         0.857% rolled back full k

        96.319% Intersection hit rate GREAT (75% - 100%)
         3.673% Intersection hit rate GOOD  (50% - 75%)
         0.008% Intersection hit rate POOR  (10% - 50%)

The local state is eventually synced with the node.

Node Chain Synchronization
    Assertion failed (after 1 test and 9 shrinks):
    Initial consumer chain: genesis
    Initial node chain:     genesis
    Applied blocks:         genesis 0001 0002 0003 0004 0005 0006 
    Node chain:             genesis 0001 0002 0003 0004 0005 0006 0007 
    Logs:                 
    [CONSUMER] nextBlocks SlotId 0.0
    [PRODUCER] getTipId
        ↳ genesis
    [NETWORK ] NodeAddBlocks [0001]
    [PRODUCER] getBlock "genesis"
        ↳ 0.0:genesis
    [NETWORK ] NodeAddBlocks [0002]
    [CONSUMER] AwaitReply
    [CONSUMER] nextBlocks 0.0
    [PRODUCER] getTipId
        ↳ 0002
    [NETWORK ] NodeAddBlocks [0003]
    [PRODUCER] getBlock "0002"
        ↳ 0.2:0002->0001
    [NETWORK ] NodeAddBlocks [0004]
    [PRODUCER] getBlock "0001"
        ↳ 0.1:0001->genesis
    ...

Thank you.
