Do you write tests?

Who codes?

Do you like writing tests?

What is the most difficult part of writing tests?

Importing the code
Organizing the tests
Think of test cases

Hello I am Cheuk

Open-Source contributor
Organiser of community events
PSF director and fellow
Community manager at OpenSSF

Have you heard of property-based testing?

A new way of writing tests...

Property

the given is obvious
works extra well with typing
edge cases are found automatically

Example

need to think of what is and what is not
takes extra steps to write examples
may overlook edge cases
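To make the contrast concrete, here is a minimal sketch that tests the built-in `sorted` both ways. The test names and the choice of `sorted` are assumptions made for illustration; they are not from the talk.

```python
from hypothesis import given, strategies as st


def test_sorted_example_based():
    # Example-based: inputs and expected outputs are picked by hand.
    assert sorted([3, 1, 2]) == [1, 2, 3]
    assert sorted([]) == []


@given(st.lists(st.integers()))
def test_sorted_property_based(xs):
    # Property-based: state what must hold for any list of integers.
    result = sorted(xs)
    assert len(result) == len(xs)
    assert all(a <= b for a, b in zip(result, result[1:]))
```

The example-based test only covers the cases someone thought of; the property-based one lets the library pick the lists, including empty and very large values.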

Testing by...

Introducing...

Hypothesis

Can be used with pytest or unittest

How does Hypothesis do it?

decorators

entry point to modify the test

strategies

generating test data
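A small sketch of how the two pieces fit together: a strategy describes the data, and the `@given` decorator feeds generated values into the test. The strategy name `non_empty_int_lists` and the property being checked are hypothetical, chosen only to illustrate the mechanics.

```python
from hypothesis import given, strategies as st

# A strategy describes what kind of data to generate.
non_empty_int_lists = st.lists(st.integers(), min_size=1)


# The decorator wraps the test and calls it with generated values.
@given(xs=non_empty_int_lists, n=st.integers(min_value=0, max_value=5))
def test_repeating_a_list_keeps_its_max(xs, n):
    assert max(xs * (n + 1)) == max(xs)
```

Run under pytest, the test looks like any other test function; called directly as `test_repeating_a_list_keeps_its_max()`, it runs the same generated cases.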

def encode(input_string):
    count = 1
    prev = ""
    lst = []
    for character in input_string:
        if character != prev:
            if prev:
                entry = (prev, count)
                lst.append(entry)
            count = 1
            prev = character
        else:
            count += 1
    entry = (character, count)
    lst.append(entry)
    return lst


def decode(lst):
    q = ""
    for character, count in lst:
        q += character * count
    return q

from hypothesis import given
from hypothesis.strategies import text


@given(text())
def test_decode_inverts_encode(s):
    assert decode(encode(s)) == s
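Running this test against the `encode` shown earlier should fail: Hypothesis tries the empty string, the loop body never runs, and `character` is still unbound on the line after the loop. One possible fix, as a sketch rather than the definitive version, is an early return for empty input:

```python
from hypothesis import given
from hypothesis.strategies import text


def encode(input_string):
    # Guard added for illustration: an empty string has no runs to encode.
    if not input_string:
        return []
    count = 1
    prev = ""
    lst = []
    for character in input_string:
        if character != prev:
            if prev:
                lst.append((prev, count))
            count = 1
            prev = character
        else:
            count += 1
    lst.append((character, count))
    return lst


def decode(lst):
    return "".join(character * count for character, count in lst)


@given(text())
def test_decode_inverts_encode(s):
    assert decode(encode(s)) == s
```

With the guard in place the same property test passes, and the empty string stays covered without anyone having to remember it.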

Details: https://hypothesis.readthedocs.io/en/latest/quickstart.html

Who uses numpy and/or pandas?

You are in luck

Hypothesis for the scientific stack

NumPy

Have to install the extra like this:
`pip install hypothesis[numpy]`
located at `hypothesis.extra.numpy`
Provides strategies for scalar and array dtypes
https://hypothesis.readthedocs.io/en/latest/numpy.html
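A minimal sketch of the NumPy extra, assuming `numpy` is installed. The `arrays` strategy comes from `hypothesis.extra.numpy`; the fixed shape and the transpose property are chosen here only for illustration.

```python
import numpy as np
from hypothesis import given, settings
from hypothesis.extra.numpy import arrays


# deadline=None avoids spurious timing failures on a slow first run.
@settings(deadline=None)
@given(arrays(dtype=np.int64, shape=(3, 4)))
def test_transpose_twice_is_identity(a):
    assert a.shape == (3, 4)
    assert np.array_equal(a.T.T, a)
```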

Pandas

Also need to install the extra like this:
`pip install hypothesis[pandas]`
located at `hypothesis.extra.pandas`
Provides strategies for pd.Index, pd.Series and pd.DataFrame
https://hypothesis.readthedocs.io/en/latest/numpy.html
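A similar sketch for the pandas extra, assuming `pandas` is installed. `data_frames` and `column` come from `hypothesis.extra.pandas`; the column names `a` and `b` and the sorting property are hypothetical.

```python
from hypothesis import given, settings
from hypothesis.extra.pandas import column, data_frames


# deadline=None because pandas operations can be slow on the first call.
@settings(deadline=None)
@given(data_frames(columns=[column("a", dtype=int), column("b", dtype=float)]))
def test_sorting_keeps_rows_and_orders_column(df):
    result = df.sort_values("a")
    # Sorting must not add or drop rows, and column "a" must end up ordered.
    assert len(result) == len(df)
    assert result["a"].is_monotonic_increasing
```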

What if I am really lazy?

But I hope you are still using typing 🙃

We have the 👻 ghostwriter for you

Using the typing as a hint
pick the right strategy for input parameters
CLI tool included
Automatically formats the code with `black` for you

We have the 👻 ghostwriter for you

hypothesis write gzip

Just type in the command line...
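The exact output of the command depends on the installed Hypothesis version, so it is not reproduced here. As a rough idea of the kind of test involved, this is a hand-written roundtrip test for `gzip`; the strategies and the test name are assumptions, not the ghostwriter's actual output.

```python
import gzip

from hypothesis import given, strategies as st


@given(
    data=st.binary(),
    compresslevel=st.integers(min_value=0, max_value=9),
)
def test_roundtrip_compress_decompress(data, compresslevel):
    # Compressing and then decompressing should give back the same bytes.
    compressed = gzip.compress(data, compresslevel=compresslevel)
    assert gzip.decompress(compressed) == data
```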

Ghostwriters 👻 available are...

Details: https://hypothesis.readthedocs.io/en/latest/ghostwriter.html

Fuzz

checks that valid input only leads to expected exceptions

from re import compile, error

from hypothesis.extra import ghostwriter

ghostwriter.fuzz(compile, except_=error)
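When called from Python, the ghostwriter functions return the generated test as a string of source code, so wrap the call in `print()` to see it. For comparison, a fuzz test written by hand might look like the sketch below. The restricted alphabet is an assumption made here so that generated patterns are often close to valid regular expressions.

```python
import re

from hypothesis import given, strategies as st

# Assumption for illustration: a small alphabet of letters and regex
# metacharacters, so many generated patterns are almost valid.
patterns = st.text(alphabet="ab.*+?|()[]^$\\")


@given(pattern=patterns)
def test_fuzz_compile(pattern):
    try:
        re.compile(pattern)
    except re.error:
        # Invalid patterns are expected to raise re.error and nothing else.
        pass
```

Any other exception type escaping from `re.compile` would fail the test, which is the point of a fuzz test.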

Idempotent

the result does not change when the function is applied to its own output

from typing import Sequence

from hypothesis.extra import ghostwriter


def timsort(seq: Sequence[int]) -> Sequence[int]:
    return sorted(seq)


ghostwriter.idempotent(timsort)
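Written out by hand, an idempotence test for `timsort` could look like this sketch. The strategy is picked manually from the `Sequence[int]` annotation, and the test name is hypothetical.

```python
from typing import Sequence

from hypothesis import given, strategies as st


def timsort(seq: Sequence[int]) -> Sequence[int]:
    return sorted(seq)


@given(seq=st.lists(st.integers()))
def test_idempotent_timsort(seq):
    once = timsort(seq)
    twice = timsort(once)
    # Sorting an already sorted list must not change it.
    assert once == twice
```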

Roundtrip

calling the 2nd function on the result of the 1st one gets back the original input

import json

from hypothesis.extra import ghostwriter

ghostwriter.roundtrip(json.dumps, json.loads)
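A hand-written version of the same roundtrip idea, as a sketch. The `json_values` strategy is an assumption: it builds JSON-like data recursively and excludes NaN and infinity, since NaN never compares equal to itself.

```python
import json

from hypothesis import given, strategies as st

# Assumption for illustration: JSON-like values built from scalars,
# lists, and dictionaries with string keys.
json_values = st.recursive(
    st.none()
    | st.booleans()
    | st.integers()
    | st.floats(allow_nan=False, allow_infinity=False)
    | st.text(),
    lambda children: st.lists(children) | st.dictionaries(st.text(), children),
    max_leaves=10,
)


@given(value=json_values)
def test_roundtrip_dumps_loads(value):
    assert json.loads(json.dumps(value)) == value
```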

Equivalent

checks that the 1st function gives the same result as the 2nd function

import math

from hypothesis.extra import ghostwriter


def my_pow(x, y):
    result = 1.0
    for _ in range(y):
        result *= x
    return result


ghostwriter.equivalent(my_pow, math.pow)
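Because `my_pow` has no type hints, the input strategies have to be chosen by hand. The sketch below assumes a bounded float base and a small non-negative integer exponent, and compares with a tolerance because repeated multiplication rounds differently from `math.pow`. The bounds and tolerances are illustrative choices.

```python
import math

from hypothesis import given, strategies as st


def my_pow(x, y):
    result = 1.0
    for _ in range(y):
        result *= x
    return result


@given(
    x=st.floats(min_value=-10.0, max_value=10.0),
    y=st.integers(min_value=0, max_value=20),
)
def test_equivalent_my_pow_math_pow(x, y):
    # abs_tol covers results that underflow towards zero.
    assert math.isclose(
        my_pow(x, y), math.pow(x, y), rel_tol=1e-9, abs_tol=1e-300
    )
```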

... and 2 more

binary_operation: for testing binary operations

ufunc: for NumPy array ufuncs

Using the CLI tool

Let's try 😉

... few things to consider though...

tests can run much slower
(generating data from strategies is expensive)
tests can be harder to understand
without type hints, not much can be done
(don't be too lazy 👻)
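If run time becomes a problem, the `settings` decorator can cap the number of generated examples and relax the per-example deadline. A minimal sketch, with values picked arbitrarily for illustration:

```python
from hypothesis import given, settings, strategies as st


# Fewer examples than the default, and no per-example time limit.
@settings(max_examples=20, deadline=None)
@given(st.lists(st.integers(), max_size=50))
def test_reverse_twice_is_identity(xs):
    assert list(reversed(list(reversed(xs)))) == xs
```

Fewer examples means faster runs but less chance of hitting a rare edge case, so this is a trade-off rather than a free speed-up.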

Try using property-based testing
Try using Hypothesis
Learn more about Hypothesis with pandas:
https://github.com/Cheukting/hypothesis-dataframe

Following up

Thank you ❤️

Cheuk Ting Ho

@cheukting_ho@fosstodon

Cheukting

https://cheuk.dev

@cheuktingho

https://slides.com/cheukting_ho/writing-tests-use-hypothesis

Use Hypothesis, whether you like writing tests or not