Property-Based Testing
Matthias Benkort
github/KtorZ



Cardano
WHY DO WE TEST?
... TO PROVE A PROGRAM IS CORRECT
WRONG
WE TEST TO FIND BUGS

Program testing can be used to show the presence of bugs, but never their absence!
E. Dijkstra
Things humans are good at
Writing bugs
Extrapolating
Things to do when testing
Catching bugs
Finding edge-cases
Evaluating behaviors of complex programs
WHAT IS PROPERTY TESTING?
[Diagram: data -> property (something always true) -> counter-example?]
λ> quickCheck $ \(x :: Integer) -> x === 1
*** Failed! Falsified (after 1 test):
x: 0
0 /= 1

λ> quickCheck $ \(xs :: [Char]) -> length xs === 0
*** Failed! Falsified (after 3 tests and 2 shrinks):
xs: "a"
1 /= 0

λ> quickCheck $ \(x :: Integer) (y :: Integer) -> x /= 0 ==> y+x =/= y*x
+++ OK, passed 100 tests; 12 discarded.

λ> quickCheck $ \(x :: Integer) (y :: Integer) -> x /= 0 ==> y+x =/= y*x
*** Failed! Falsified (after 3 tests):
x: 2
y: 2
4 == 4

\forall x \in \N. x = 1
\forall xs \in \alpha^*. length(xs) = 0
\forall (x,y) \in \N\times\N. x \ne 0 \implies x+y \ne x*y
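The last two runs above check the same property, yet one passes and one fails, because the input data is random. The sketch below names that property and runs it with more tests than the default. It assumes a QuickCheck version that provides `=/=` (2.12 or later); `prop_sumNeProduct` and `checkThoroughly` are illustrative names, not part of the talk.

```haskell
import Test.QuickCheck

-- The third property from the session above, as a named top-level definition.
-- Since x + y == x * y is equivalent to (x - 1) * (y - 1) == 1, the only
-- counter-example with x /= 0 is x = y = 2, which random generation can
-- easily miss within the default 100 tests.
prop_sumNeProduct :: Integer -> Integer -> Property
prop_sumNeProduct x y = x /= 0 ==> y + x =/= y * x

-- Run more tests than the default so that the rare counter-example is more
-- likely (though still not guaranteed) to be generated.
checkThoroughly :: IO Result
checkThoroughly =
  quickCheckWithResult stdArgs { maxSuccess = 10000 } prop_sumNeProduct
```

A passing run is therefore evidence, not proof: it only says that no counter-example was generated this time.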

| C | Coq | Go | Lua | PHP | Scala |
| C++ | D | IO | Node.js | Python | Scheme |
| C# | Elm | Java | Objective-C | R | SmallTalk |
| Chicken | Elixir | JavaScript | OCaml | Racket | Swift |
| Clojure | Erlang | Julia | Perl | Ruby | TypeScript |
| Lisp | F# | Logtalk | Prolog | Rust | VisualBasic |
sort :: Ord x => [x] -> [x]
length(xs) = length(sort(xs))
sort(sort(xs)) = sort(xs)
sort(xs) = xs
!
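The three candidate properties of `sort` listed above can be written as QuickCheck properties roughly as follows. This is a minimal sketch: the `prop_` names are illustrative, and the element type is fixed to `Int` so that QuickCheck knows what to generate.

```haskell
import Data.List (sort)
import Test.QuickCheck

-- Sorting preserves the number of elements.
prop_sortPreservesLength :: [Int] -> Property
prop_sortPreservesLength xs = length xs === length (sort xs)

-- Sorting is idempotent: sorting twice equals sorting once.
prop_sortIdempotent :: [Int] -> Property
prop_sortIdempotent xs = sort (sort xs) === sort xs

-- Deliberately wrong (the one flagged with "!"): it holds only for lists
-- that are already sorted, so QuickCheck should find a counter-example.
prop_sortIsIdentity :: [Int] -> Property
prop_sortIsIdentity xs = xs === sort xs
```

In GHCi, `quickCheck prop_sortPreservesLength` and `quickCheck prop_sortIdempotent` are expected to pass, while `quickCheck prop_sortIsIdentity` should report a small unsorted list.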
assert (sort [] == [])
assert (sort [2,1] == [1,2])
assert (sort [1,2,1] == [1,1,2])

λ> quickCheck . verbose $ \(xs :: [Int]) -> xs === sort xs
Passed: []
Passed: [-1,1]
Failed: [-3,1,-1]
-------------------- Shrinking [-3,1,-1]
Passed: []
Failed: [1,-1]
-------------------- Shrinking [1,-1]
Passed: []
Passed: [-1]
Passed: [1]
Failed: [0,-1]
-------------------- Shrinking [0,-1]
Passed: []
Passed: [-1]
Passed: [0]
Passed: [0,1]
Passed: [0,0]
*** Failed! Falsified (after 3 tests and 2 shrinks):
[0,-1]
[0,-1] /= [-1,0]

Generating
Labelling
Shrinking
Generating... (random) data from composable primitives.
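As a sketch of what "composable primitives" can look like in QuickCheck, the generator below is assembled from `elements`, `choose`, `frequency` and `listOf`. The `Payment` and `Currency` types are hypothetical and exist only for this illustration.

```haskell
import Test.QuickCheck

-- Hypothetical domain types, introduced only for this illustration.
data Currency = EUR | USD | JPY
  deriving (Show, Eq, Enum, Bounded)

data Payment = Payment
  { amount   :: Int
  , currency :: Currency
  , memo     :: String
  } deriving (Show, Eq)

-- Pick uniformly among all constructors.
genCurrency :: Gen Currency
genCurrency = elements [minBound .. maxBound]

-- Amounts between 1 and 1000, with small amounts generated more often.
genAmount :: Gen Int
genAmount = frequency
  [ (3, choose (1, 10))
  , (1, choose (11, 1000))
  ]

-- Compose the smaller generators into a generator for the whole record.
genPayment :: Gen Payment
genPayment = Payment
  <$> genAmount
  <*> genCurrency
  <*> listOf (elements ['a' .. 'z'])

-- Use the custom generator explicitly with forAll.
prop_amountInRange :: Property
prop_amountInRange =
  forAll genPayment $ \p -> amount p >= 1 && amount p <= 1000
```

`sample genPayment` in GHCi prints a few generated values, which is a quick way to see what a generator actually produces.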
Shrinking... complex data-structures down to "minimal" counter-examples.
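For custom types, shrinking is usually defined through the `shrink` method of `Arbitrary`. The sketch below assumes a hypothetical `Interval` type with the invariant `lo <= hi`, and shows one way to shrink it without ever proposing an invalid candidate.

```haskell
import Test.QuickCheck

-- Hypothetical type: a closed interval with the invariant lo <= hi.
data Interval = Interval Int Int
  deriving (Show, Eq)

-- Smart constructor that establishes the invariant.
mkInterval :: Int -> Int -> Interval
mkInterval a b = Interval (min a b) (max a b)

instance Arbitrary Interval where
  arbitrary = mkInterval <$> arbitrary <*> arbitrary
  -- Shrink the underlying pair, then rebuild through the smart constructor
  -- so that every candidate still satisfies the invariant.
  shrink (Interval lo hi) =
    [ i
    | (a, b) <- shrink (lo, hi)
    , let i = mkInterval a b
    , i /= Interval lo hi
    ]

valid :: Interval -> Bool
valid (Interval lo hi) = lo <= hi

-- Every shrink candidate of a generated interval is itself valid.
prop_shrinkPreservesInvariant :: Interval -> Bool
prop_shrinkPreservesInvariant = all valid . shrink
```

Keeping invariants intact while shrinking matters because a counter-example that violates them points at the generator rather than at the code under test.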
Labelling... properties to measure efficiency and reason about them.
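A minimal sketch of labelling with QuickCheck's `classify` and `label` combinators, applied to the idempotence property of `sort`. The label texts and the `lengthBucket` helper are illustrative choices.

```haskell
import Data.List (sort)
import Test.QuickCheck

-- Coarse buckets for the input length, used as labels.
lengthBucket :: [a] -> String
lengthBucket xs
  | n < 5     = "length 0-4"
  | n < 20    = "length 5-19"
  | otherwise = "length 20+"
  where
    n = length xs

-- The property itself is unchanged; the combinators only record which kind
-- of inputs were generated, and QuickCheck reports the percentages.
prop_sortIdempotentLabelled :: [Int] -> Property
prop_sortIdempotentLabelled xs =
  classify (null xs) "empty list" $
  classify (xs == sort xs) "already sorted" $
  label (lengthBucket xs) $
  sort (sort xs) === sort xs
```

The resulting percentages show whether the generator reaches the cases that matter, or whether most of the test budget is spent on trivial inputs.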
GREAT TEST RUNNER
WHAT ABOUT THE REAL WORLD?
Idempotence
Oracles
Labyrinth
Roundtrips
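Three of the patterns named above (roundtrips, idempotence, oracles) could be sketched as follows. The functions under test are stand-ins chosen for brevity: `show`/`read` play the role of a serializer and parser, `normalize` and `insertionSort` are hypothetical implementations.

```haskell
import Data.List (insert, sort)
import Test.QuickCheck

-- Roundtrip: decoding an encoded value gives the original value back.
-- 'show' and 'read' stand in for a real serializer/parser pair.
prop_roundtrip :: [Int] -> Property
prop_roundtrip xs = read (show xs) === xs

-- Idempotence: applying the operation twice is the same as applying it once.
-- 'normalize' collapses runs of whitespace into single spaces.
normalize :: String -> String
normalize = unwords . words

prop_idempotent :: String -> Property
prop_idempotent s = normalize (normalize s) === normalize s

-- Oracle: compare the implementation under test with a trusted reference.
-- 'insertionSort' is the code under test; Data.List.sort is the oracle.
insertionSort :: [Int] -> [Int]
insertionSort = foldr insert []

prop_oracle :: [Int] -> Property
prop_oracle xs = insertionSort xs === sort xs
```

None of these properties needs a full specification of the code under test, which is what makes them practical starting points on real code bases.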
Testing The Ugly
[Diagram: an Actual Operation takes an Actual State to a new Actual State; a Model Operation takes a Model State to a new Model State; a Semantic Interpretation maps each Actual State to the corresponding Model State.]






{}
write a
{ "a": ... }
write b
{ "a": ..., "b": ... }
read a
{ "a": ..., "b": ... }
read a
{ "a": ..., "b": ... }
delete b
{ "a": ... }

\forall cs \in Command^*. \forall c \in cs. model(c) = abstraction(actual(c))
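The property above can be sketched for the small key-value example with plain QuickCheck. Everything here is an illustrative assumption rather than the code behind the talk: the "actual" store is an association list, the model is a `Data.Map`, and `abstraction` converts the former into the latter.

```haskell
import qualified Data.Map.Strict as Map
import Test.QuickCheck

-- Hypothetical commands for a tiny key-value store.
data Cmd = Write Char Int | Read Char | Delete Char
  deriving (Show, Eq)

instance Arbitrary Cmd where
  arbitrary = oneof [Write <$> key <*> arbitrary, Read <$> key, Delete <$> key]
    where
      -- Only two keys, so that commands often touch the same entry.
      key = elements "ab"

type Actual = [(Char, Int)]      -- implementation under test
type Model  = Map.Map Char Int   -- simple, trusted model

stepActual :: Cmd -> Actual -> (Maybe Int, Actual)
stepActual (Write k v) s = (Nothing, (k, v) : filter ((/= k) . fst) s)
stepActual (Read k)    s = (lookup k s, s)
stepActual (Delete k)  s = (Nothing, filter ((/= k) . fst) s)

stepModel :: Cmd -> Model -> (Maybe Int, Model)
stepModel (Write k v) m = (Nothing, Map.insert k v m)
stepModel (Read k)    m = (Map.lookup k m, m)
stepModel (Delete k)  m = (Nothing, Map.delete k m)

-- Interpret an actual state as a model state.
abstraction :: Actual -> Model
abstraction = Map.fromList

-- After every command, responses agree and the abstracted actual state
-- equals the model state.
prop_modelAgrees :: [Cmd] -> Property
prop_modelAgrees = go [] Map.empty
  where
    go _ _ [] = property True
    go s m (c : cs) =
      let (ra, s') = stepActual c s
          (rm, m') = stepModel c m
      in ra === rm .&&. abstraction s' === m' .&&. go s' m' cs
```

When the property fails, QuickCheck shrinks the command list, which tends to leave a short sequence of operations that still exposes the disagreement.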
data Cmd
    = CreateCheckpoint Checkpoint Metadata TxHistory
    | PutCheckpoint Checkpoint
    | ReadCheckpoint
    | ListCheckpoints
    | RollbackTo SlotId
    | PutMetadata Metadata
    | ReadMetadata
    | PutTxHistory TxHistory
    | ReadTxHistory SortOrder (Range SlotId) (Maybe TxStatus)
    | RemovePendingTx (Hash "Tx")
    | PutPrivateKey XPrv
    | ReadPrivateKey
    | PutDelegationCertificate DelegationCertificate SlotId

data Success
    = Unit ()
    | Checkpoint (Maybe Checkpoint)
    | Metadata (Maybe Metadata)
    | TxHistory TxHistory
    | PrivateKey (Maybe XPrv)
    | BlockHeaders [BlockHeader]
    | Point SlotId

data Err
    = NoCheckpoint
    | CheckpointAlreadyExists
    | CannotRemovePendingTx ErrRemovePendingTx

data Resp
    = Resp (Either Err Success)

Sqlite State machine tests
+++ OK, passed 800 tests:
54.2% UnsuccessfulReadTxHistory
53.1% SuccessfulReadCheckpoint
51.9% TxUnsortedInputs
51.7% TxUnsortedOutputs
31.4% SuccessfulReadTxHistory
28.5% UnsuccessfulReadCheckpoint
26.9% ReadTxHistoryAfterDelete
25.2% ReadMetaAfterPutCert
24.9% PutCheckpointTwice
20.0% RolledBackOnce
17.4% RemovePendingTxTwice
17.0% SuccessfulReadPrivateKey

Sqlite State machine tests
Falsifiable (after 74 tests and 17 shrinks):
[ Command (CreateCheckpoint (...))
, Command (PutDelegationCertificate (...))
, Command (ReadCheckpoint (...))
, Command (RollbackTo (...))
, Command (PutDelegationCertificate (...))
, Command (ReadMetadata (...))
]
PostconditionFailed ... (Model /= Sqlite)

HTTP (REST)
Node Chain Synchronization
+++ OK, passed 100000 tests:
64.697% started from scratch
56.047% advanced more than k blocks
53.147% started with an empty chain
41.140% started with more than k blocks
10.611% switched to a shorter chain
7.015% switched to a longer chain
6.882% rewinded without switch
0.857% rolled back full k
96.319% Intersection hit rate GREAT (75% - 100%)
3.673% Intersection hit rate GOOD (50% - 75%)
0.008% Intersection hit rate POOR (10% - 50%)

The local state is eventually synced with the node.
Node Chain Synchronization
Assertion failed (after 1 test and 9 shrinks):
Initial consumer chain: genesis
Initial node chain: genesis
Applied blocks: genesis 0001 0002 0003 0004 0005 0006
Node chain: genesis 0001 0002 0003 0004 0005 0006 0007
Logs:
[CONSUMER] nextBlocks SlotId 0.0
[PRODUCER] getTipId
↳ genesis
[NETWORK ] NodeAddBlocks [0001]
[PRODUCER] getBlock "genesis"
↳ 0.0:genesis
[NETWORK ] NodeAddBlocks [0002]
[CONSUMER] AwaitReply
[CONSUMER] nextBlocks 0.0
[PRODUCER] getTipId
↳ 0002
[NETWORK ] NodeAddBlocks [0003]
[PRODUCER] getBlock "0002"
↳ 0.2:0002->0001
[NETWORK ] NodeAddBlocks [0004]
[PRODUCER] getBlock "0001"
↳ 0.1:0001->genesis
...

Thank you.