Fuzz testing in Go

David Chou @ Golang Taipei

CC-BY-SA-3.0-TW

@ Umbo Computer Vision

@ Golang Taipei Co-organizer 🙋‍♂️

Software engineer, DevOps, and Gopher 👨‍💻

david74.chou @ gmail

david74.chou @ facebook

david74.chou @ medium

david7482 @ github

What is fuzz testing?

Wikipedia: an automated software testing technique that provides invalid, unexpected, or random data as inputs to a computer program.

A brief history of fuzzing

1950s: testing programs with random inputs, before the practice had a name

1988: the term "fuzzing" is coined by Barton Miller

We didn't call it fuzzing back in the 1950s, but it was our standard practice to test programs by inputting decks of punch cards taken from the trash. This type of testing was so common that it had no name. - Gerald M. Weinberg

Fuzzing is the process of sending intentionally invalid data to a product in the hopes of triggering an error.
- H.D. Moore

Fuzz testing

Continuously manipulate inputs
Semi-random data from various mutations
Discover new code coverage based on instrumentation
Run many mutations quickly, rather than fewer mutations intelligently

https://blog.code-intelligence.com/the-magic-behind-feedback-based-fuzzing

What can be fuzzed?

deserialization (xml, json, proto, gob)
network protocols (HTTP, SMTP)
media codecs (audio, video, images, pdf)
crypto (boringssl, openssl)
compression (zip, gzip, bzip2, brotli)
etc

Why do we need fuzzing?

you don't know what you don't know

Why do we need fuzzing?

Fuzzing can reach edge cases that humans often miss
It is particularly valuable for finding vulnerabilities
It is also a good choice for regression testing
Lots of real-world trophies
- found 15000+ bugs in Chrome [link]
- found 1500+ bugs in FFmpeg [link]

A simple example

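// CountAverage returns the integer average of num.
// Note: sum is a byte, so it silently wraps around once the total exceeds 255,
// and an empty num makes len(num) zero, which panics with a divide by zero.
func CountAverage(num []byte) int {
	sum := byte(0)
	for _, v := range num {
		sum += v
	}
	return int(sum) / len(num)
}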

func TestCountAverage(t *testing.T) {
	tests := []struct {
		name string
		num []byte
		want int
	}{
		{
			num: []byte{1, 2, 3, 4, 5},
			want: 3,
		},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			got := CountAverage(tt.num)
			assert.EqualValues(t, tt.want, got)
		})
	}
}

$ go test -run TestCountAverage -cover
PASS
coverage: 100.0% of statements
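100% statement coverage does not mean the function is correct. The standalone program below (a hypothetical `safeAverage` wrapper plus a `main`, neither part of the original example) feeds `CountAverage` two inputs the table test never tried and turns any panic into an error so both failures can be printed.

```go
package main

import "fmt"

// CountAverage is copied unchanged from the example above.
func CountAverage(num []byte) int {
	sum := byte(0)
	for _, v := range num {
		sum += v
	}
	return int(sum) / len(num)
}

// safeAverage is a hypothetical helper that converts a panic into an error.
func safeAverage(num []byte) (avg int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic: %v", r)
		}
	}()
	return CountAverage(num), nil
}

func main() {
	// 200 + 100 = 300 wraps around to 44 in a byte, so this prints 22, not 150.
	fmt.Println(safeAverage([]byte{200, 100}))
	// An empty slice makes len(num) zero: integer divide by zero.
	fmt.Println(safeAverage([]byte{}))
}
```

Neither input needs any new statement to be covered, which is why the coverage report above cannot reveal them.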

A real-world example:

OpenSSL Heartbleed

Heartbleed fuzzing

#150255 REDUCE cov: 485 ft: 756 corp: 38/15713b exec/s: 25042 
              rss: 402Mb L: 2891/2891 MS: 1 EraseBytes-
=================================================================
==6098==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x629000009748 at pc 0x0000005133a2 bp 0x7fffe29233c0 sp 0x7fffe2922b70
READ of size 48830 at 0x629000009748 thread T0
   #0 0x5133a1 in __asan_memcpy (/app/handshake-fuzzer+0x5133a1)
   #1 0x5630c8 in tls1_process_heartbeat /app/openssl-1.0.1f/ssl/t1_lib.c:2586:3
   #2 0x5cfa9d in ssl3_read_bytes /app/openssl-1.0.1f/ssl/s3_pkt.c:1092:4
   #3 0x5d42da in ssl3_get_message /app/openssl-1.0.1f/ssl/s3_both.c:457:7
   #4 0x59f537 in ssl3_get_client_hello /app/openssl-1.0.1f/ssl/s3_srvr.c:941:4
   #5 0x59b5a9 in ssl3_accept /app/openssl-1.0.1f/ssl/s3_srvr.c:357:9
   #6 0x551335 in LLVMFuzzerTestOneInput /app/handshake-fuzzer.cc:66:3

...

SUMMARY: AddressSanitizer: heap-buffer-overflow (/app/handshake-fuzzer+0x5133a1) in __asan_memcpy

https://gitlab.com/gitlab-org/security-products/demos/coverage-fuzzing/heartbleed-fuzzing-example

Also works for logical bugs

Sanity checks still work
- e.g. a result must be within the [0, 1) range
- image decoder: 100-byte input -> 100 MB output?
- encryption: decryption with the wrong key must fail
- sorting: every element is still present and the order is as expected
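As an example of such an oracle, the sketch below checks the sorting property (ordered output, same elements) inside a Go 1.18-style fuzz target. `checkSorted` and `FuzzSortBytes` are hypothetical names, and `sort.Slice` merely stands in for whatever sort routine you actually want to test. In a real project the fuzz target would live in a `_test.go` file; here it sits next to a `main` so the checker can run standalone.

```go
package main

import (
	"fmt"
	"sort"
	"testing"
)

// checkSorted is a hypothetical oracle: out must be in non-decreasing
// order and contain exactly the same bytes (with counts) as in.
func checkSorted(in, out []byte) error {
	for i := 1; i < len(out); i++ {
		if out[i-1] > out[i] {
			return fmt.Errorf("out of order at index %d", i)
		}
	}
	var before, after [256]int
	for _, b := range in {
		before[b]++
	}
	for _, b := range out {
		after[b]++
	}
	if before != after {
		return fmt.Errorf("elements were lost or duplicated")
	}
	return nil
}

// FuzzSortBytes sorts a copy of each fuzzed input and checks the result.
func FuzzSortBytes(f *testing.F) {
	f.Add([]byte("hello"))
	f.Fuzz(func(t *testing.T, data []byte) {
		out := append([]byte(nil), data...)
		sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
		if err := checkSorted(data, out); err != nil {
			t.Fatal(err)
		}
	})
}

func main() {
	fmt.Println(checkSorted([]byte("cab"), []byte("abc"))) // <nil>
	fmt.Println(checkSorted([]byte("cab"), []byte("acb"))) // out of order at index 2
	fmt.Println(checkSorted([]byte("cab"), []byte("abb"))) // elements were lost or duplicated
}
```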

Also works for logical bugs

Round-trip test
- deserialize -> serialize -> deserialize
- decompress/compress, decrypt/encrypt
Check that
- serialization does not fail
- the 2nd deserialization does not fail
- both deserialized results are equal
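A minimal round-trip fuzz target for `encoding/json`, assuming JSON decoded into `interface{}` is the representation being compared. Inputs that fail the first deserialization are skipped, since only valid documents are expected to round-trip. `roundTrip` and `FuzzJSONRoundTrip` are hypothetical names chosen for this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"testing"
)

// roundTrip performs deserialize -> serialize -> deserialize.
// ok is false when the input is not valid JSON (nothing to check).
func roundTrip(data []byte) (ok bool, err error) {
	var v1 interface{}
	if json.Unmarshal(data, &v1) != nil {
		return false, nil
	}
	out, err := json.Marshal(v1) // serialize must not fail
	if err != nil {
		return true, fmt.Errorf("serialize failed: %w", err)
	}
	var v2 interface{}
	if err := json.Unmarshal(out, &v2); err != nil { // 2nd deserialize must not fail
		return true, fmt.Errorf("2nd deserialize failed: %w", err)
	}
	if !reflect.DeepEqual(v1, v2) { // both results must be equal
		return true, fmt.Errorf("results differ: %#v vs %#v", v1, v2)
	}
	return true, nil
}

func FuzzJSONRoundTrip(f *testing.F) {
	f.Add([]byte(`{"a":[1,2.5,"x",null,true]}`))
	f.Fuzz(func(t *testing.T, data []byte) {
		if _, err := roundTrip(data); err != nil {
			t.Fatal(err)
		}
	})
}

func main() {
	fmt.Println(roundTrip([]byte(`{"a":[1,2.5,"x",null,true]}`))) // true <nil>
	fmt.Println(roundTrip([]byte(`{not json`)))                   // false <nil>
}
```

The same three checks apply unchanged to compress/decompress or encrypt/decrypt pairs; only the two conversion calls change.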

Fuzz testing in Go

go-fuzz to the rescue

go-fuzz

Dmitry Vyukov, Google
A successful 3rd-party Go fuzzing solution
It found 200+ bugs in the Go standard library, and thousands more elsewhere
Coverage-based fuzzing

Instrument program for code coverage
Collect initial corpus of inputs
for {
    Randomly mutate an input from the corpus
    Execute and collect coverage
    If the input gives new coverage, add it to corpus
}

1. Write a fuzz function

// +build gofuzz

func Fuzz(data []byte) int {
	gob.NewDecoder(bytes.NewReader(data)).Decode(new(interface{}))
	return 0
}

2. Build

go get github.com/dvyukov/go-fuzz/...
go-fuzz-build github.com/dvyukov/go-fuzz-corpus/gob

3. Run

go-fuzz -bin gob-fuzz.zip -workdir ./workdir

workers: 8, corpus: 1525 (6s ago), crashers: 6, execs: 0 (0/sec), cover: 1651, uptime: 6s
workers: 8, corpus: 1525 (9s ago), crashers: 6, execs: 16787 (1860/sec), cover: 1651, uptime: 9s
workers: 8, corpus: 1525 (12s ago), crashers: 6, execs: 29840 (2482/sec), cover: 1651, uptime: 12s

go-fuzz's problems

It can break (and has, multiple times) when Go's internal packages change.
It has to do coverage instrumentation without the compiler's help.
It is harder to use than Go's unit testing:
- custom command-line tools
- separate test files or build tags, etc.

Go's official fuzzing proposal

go test -fuzz

Official proposal [link]
Write fuzz functions just like test functions
- func FuzzFoo(f *testing.F)
Integrated with the go command
- go test -fuzz
Coverage-guided fuzzing
Planned to land in Go 1.18

Already available in beta (via gotip)

func FuzzCountAverage(f *testing.F) {
	f.Add([]byte{1})
	f.Fuzz(func(t *testing.T, num []byte) {
		CountAverage(num)
	})
}

The fuzz target is a FuzzX function
Each fuzz target has its own input corpus
testing.F
- f.Add(): add seed corpus
- f.Fuzz(): run the fuzz function

$ gotip test -fuzz=FuzzCountAverage -parallel=2
fuzzing, elapsed: 3.0s, execs: 40648 (13549/sec), workers: 2, interesting: 3
fuzzing, elapsed: 3.4s, execs: 44291 (13157/sec), workers: 2, interesting: 3
found a crash, minimizing...
--- FAIL: FuzzCountAverage (3.37s)
        panic: runtime error: integer divide by zero
        goroutine 21364 [running]:
        runtime/debug.Stack()
                /home/david74/sdk/gotip/src/runtime/debug/stack.go:24 +0x90
        testing.tRunner.func1.2({0x69e4c0, 0x887760})
                /home/david74/sdk/gotip/src/testing/testing.go:1281 +0x267
        testing.tRunner.func1()
                /home/david74/sdk/gotip/src/testing/testing.go:1288 +0x218
        panic({0x69e4c0, 0x887760})
                /home/david74/sdk/gotip/src/runtime/panic.go:1038 +0x215
        github.com/david7482/go-fuzzing-playground.CountAverage({0xc000246000, 0x0, 0x0})
                /home/david74/projects/go-fuzzing-playground/count_average.go:8 +0xa5
        ...
        --- FAIL: FuzzCountAverage (0.00s)
    
Crash written to 
  testdata/corpus/FuzzCountAverage/d40a98862ed393eb712e47a91bcef18e6f24cf368bb4bd248c7a7101ef8e178d
To re-run:
go test github.com/david7482/go-fuzzing-playground \
  -run=FuzzCountAverage/d40a98862ed393eb712e47a91bcef18e6f24cf368bb4bd248c7a7101ef8e178d
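This fuzz target only reports panics. `CountAverage` also sums into a byte that wraps around past 255, which returns a wrong answer without crashing. A hedged sketch of a stronger target adds a hypothetical oracle, `checkAverage`: the rounded-down average of a non-empty input must lie between its smallest and largest element.

```go
package main

import (
	"fmt"
	"testing"
)

// CountAverage is the function from the earlier example, unchanged.
func CountAverage(num []byte) int {
	sum := byte(0)
	for _, v := range num {
		sum += v
	}
	return int(sum) / len(num)
}

// checkAverage is a hypothetical oracle: for non-empty num, got must lie
// in [min(num), max(num)].
func checkAverage(num []byte, got int) error {
	if len(num) == 0 {
		return nil
	}
	lo, hi := int(num[0]), int(num[0])
	for _, v := range num[1:] {
		if int(v) < lo {
			lo = int(v)
		}
		if int(v) > hi {
			hi = int(v)
		}
	}
	if got < lo || got > hi {
		return fmt.Errorf("average %d outside [%d, %d] for %v", got, lo, hi, num)
	}
	return nil
}

// FuzzCountAverage still panics on empty input (CountAverage runs first)
// and now also fails when the result is out of range.
func FuzzCountAverage(f *testing.F) {
	f.Add([]byte{1})
	f.Fuzz(func(t *testing.T, num []byte) {
		if err := checkAverage(num, CountAverage(num)); err != nil {
			t.Fatal(err)
		}
	})
}

func main() {
	// average 22 outside [100, 200] for [200 100]
	fmt.Println(checkAverage([]byte{200, 100}, CountAverage([]byte{200, 100})))
}
```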

func FuzzUnmarshal(f *testing.F) {
	f.Add([]byte{1})
	f.Fuzz(func(t *testing.T, input []byte) {
		var v interface{}
		// Parse errors are expected; only panics count as failures.
		_ = yaml.Unmarshal(input, &v)
	})
}

go-yaml/yaml

$ gotip test -fuzz=FuzzUnmarshal
fuzzing, elapsed: 3.0s, execs: 62242 (20740/sec), workers: 4, interesting: 41
fuzzing, elapsed: 6.0s, execs: 127025 (21168/sec), workers: 4, interesting: 48
...
fuzzing, elapsed: 1794.0s, execs: 39365685 (21943/sec), workers: 4, interesting: 324
fuzzing, elapsed: 1796.9s, execs: 39427737 (21942/sec), workers: 4, interesting: 324
found a crash, minimizing...
--- FAIL: FuzzUnmarshal (1796.90s)
        panic: runtime error: invalid memory address or nil pointer dereference
        goroutine 9884315 [running]:
        panic({0x72d820, 0x93abe0})
                /home/ubuntu/sdk/gotip/src/runtime/panic.go:1038 +0x215
        gopkg.in/yaml%2ev3.handleErr(0xc00007f6b0)
                /home/ubuntu/go/pkg/mod/gopkg.in/yaml.v3@v3.0.0-20210107192922-496545a6307b/yaml.go:294 +0xc5
        panic({0x72d820, 0x93abe0})
                /home/ubuntu/sdk/gotip/src/runtime/panic.go:1038 +0x215
        gopkg.in/yaml%2ev3.yaml_parser_split_stem_comment(0xc00bf34c00, 0x1)
                /home/ubuntu/go/pkg/mod/gopkg.in/yaml.v3@v3.0.0-20210107192922-496545a6307b/parserc.go:789 +0x6a
        gopkg.in/yaml%2ev3.yaml_parser_parse_block_sequence_entry(0xc00bf34c00, 0xc00bf34eb0, 0x0)
                /home/ubuntu/go/pkg/mod/gopkg.in/yaml.v3@v3.0.0-20210107192922-496545a6307b/parserc.go:703 +0x293
        gopkg.in/yaml%2ev3.yaml_parser_state_machine(0xc00bf34c00, 0x40df54)
        ...
        --- FAIL: FuzzUnmarshal (0.00s)

Crash written to 
  testdata/corpus/FuzzUnmarshal/9c9e78ca4b2c797536d2fbe662c68321c5c3ab6df680664b23c913799fc7f092
To re-run:
go test gopkg.in/yaml.v2 \
  -run=FuzzUnmarshal/9c9e78ca4b2c797536d2fbe662c68321c5c3ab6df680664b23c913799fc7f092

package main

import (
	"fmt"

	"gopkg.in/yaml.v3"
)

func main() {
	in := "#\n-[["

	var n yaml.Node
	if err := yaml.Unmarshal([]byte(in), &n); err != nil {
		fmt.Println(err)
	}
}

Fuzzing runs in multiple worker processes
Seed corpus folder: ${pkg}/testdata/corpus
Seed corpus = seeds in files + seeds added with f.Add() in the test
A good seed corpus can save the mutation engine a lot of work
Regression testing
- go test (without -fuzz) also runs the FuzzX functions, using the seed corpus as input

Current limitations

Only []byte and primitive types are supported
No support for struct, slice, or array types
Cannot run multiple fuzz targets in the same package at once
Cannot keep running after a crash is found
Cannot convert existing files to the corpus format

go test fuzz v1
float(45.241)
int(12345)
[]byte("ABC\xa8\x8c\xb3G\xfc")
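Converting an existing file by hand looks simple if the format is as the sample suggests: a `go test fuzz v1` header, then one Go literal per line. The helper below is a hypothetical sketch under that assumption. It only handles a single `[]byte` argument, and naming the file by the SHA-256 of its contents is an assumption modeled on the 64-hex-digit names in the crash output. The result would go into the `testdata/corpus/<FuzzName>` folder.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"os"
)

// toCorpusFile is a hypothetical helper that wraps one []byte value in the
// "go test fuzz v1" layout. %q quotes the bytes as a Go string literal,
// escaping non-UTF-8 bytes as \xNN.
func toCorpusFile(data []byte) (name, contents string) {
	contents = fmt.Sprintf("go test fuzz v1\n[]byte(%q)\n", data)
	return fmt.Sprintf("%x", sha256.Sum256([]byte(contents))), contents
}

func main() {
	data := []byte("ABC\xa8\x8c\xb3G\xfc")
	if len(os.Args) > 1 {
		// Optionally convert an existing file given on the command line.
		b, err := os.ReadFile(os.Args[1])
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		data = b
	}
	name, contents := toCorpusFile(data)
	fmt.Println("file name:", name)
	fmt.Print(contents)
}
```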

How "go test -fuzz" works

show me the code

Instrument program for code coverage
Collect initial corpus of inputs
for {
    Randomly mutate an input from the corpus
    Execute and collect coverage
    If the input gives new coverage, add it to corpus
}

The architecture of "go test -fuzz"
How it collects code coverage
How it mutates input data

Coordinator
- run & ping workers
- ask workers to fuzz the next input
- write the input to the seed corpus if it crashes
- write the input to the corpus cache if it finds a new edge

Worker
- mutate the input
- run the fuzz function
- collect coverage
- return a crash or a new edge; otherwise continue

RPC between coordinator and workers
- request <-> response
- commands: over a pipe
- input data: in shared memory (shm)

Compiler instrumentation

// edge inserts coverage instrumentation for libfuzzer.
func (o *orderState) edge() {
	// Create a new uint8 counter to be allocated in section
	// __libfuzzer_extra_counters.
	counter := staticinit.StaticName(types.Types[types.TUINT8])
	counter.SetLibfuzzerExtraCounter(true)

	// counter += 1
	incr := ir.NewAssignOpStmt(base.Pos, ir.OADD, counter, ir.NewInt(1))
	o.append(incr)
}

edge() inserts coverage instrumentation

Compiler instrumentation

func (o *orderState) stmt(n ir.Node) {
	switch n.Op() {
	...
	case ir.OFOR:
		o.edge()
	case ir.OIF:
		o.edge()
	case ir.ORANGE:
		o.edge()
	case ir.OSELECT:
		o.edge()
	case ir.OSWITCH:
		o.edge()
	...
	}
}

The compiler inserts an edge() counter at each control-flow edge

Compiler instrumentation

// _counters and _ecounters mark the start and end, respectively, of where
// the 8-bit coverage counters reside in memory. They're known to cmd/link,
// which specially assigns their addresses for this purpose.
var _counters, _ecounters [0]byte

func coverage() []byte {
	addr := unsafe.Pointer(&_counters)
	size := uintptr(unsafe.Pointer(&_ecounters)) - uintptr(addr)

	var res []byte
	*(*unsafeheader.Slice)(unsafe.Pointer(&res)) = unsafeheader.Slice{
		Data: addr,
		Len:  int(size),
		Cap:  int(size),
	}
	return res
}

coverage() returns the coverage counters
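The raw counters still have to be turned into a yes/no "new edge" decision. The sketch below assumes AFL-style hit-count buckets: each 8-bit counter is rounded down to its highest set bit, then compared bitwise against everything seen so far. This is an illustration of the idea, not a copy of the internal/fuzz code, and `bucket` and `addCoverage` are hypothetical names.

```go
package main

import "fmt"

// bucket rounds each 8-bit hit counter down to its highest set bit, so
// 1, 2-3, 4-7, ..., 128-255 hits each land in a distinct bucket.
func bucket(counters []byte) []byte {
	snap := make([]byte, len(counters))
	for i, b := range counters {
		b |= b >> 1
		b |= b >> 2
		b |= b >> 4
		snap[i] = b - b>>1 // keep only the highest set bit
	}
	return snap
}

// addCoverage merges snap into seen and reports whether it set any new bit,
// i.e. whether the input reached a new edge or a new hit-count bucket.
func addCoverage(seen, snap []byte) bool {
	isNew := false
	for i := range snap {
		if snap[i]&^seen[i] != 0 {
			isNew = true
		}
		seen[i] |= snap[i]
	}
	return isNew
}

func main() {
	seen := make([]byte, 3)
	fmt.Println(addCoverage(seen, bucket([]byte{1, 0, 5}))) // true: first hits
	fmt.Println(addCoverage(seen, bucket([]byte{1, 0, 7}))) // false: 7 shares 5's bucket
	fmt.Println(addCoverage(seen, bucket([]byte{3, 0, 4}))) // true: edge 0 now hit 2-3 times
}
```

Bucketing means a loop that runs 5 times instead of 7 is not treated as new behavior, while going from 1 to 3 iterations is.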

The mutators

var byteSliceMutators = []byteSliceMutator{
	byteSliceRemoveBytes,
	byteSliceInsertRandomBytes,
	byteSliceDuplicateBytes,
	byteSliceOverwriteBytes,
	byteSliceBitFlip,
	byteSliceXORByte,
	byteSliceSwapByte,
	byteSliceOverwriteInterestingUint8,
	byteSliceOverwriteInterestingUint16,
	byteSliceOverwriteInterestingUint32,
	byteSliceInsertConstantBytes,
	byteSliceOverwriteConstantBytes,
	byteSliceShuffleBytes,
	byteSliceSwapBytes,
	....
}

func (m *mutator) mutateBytes(ptrB *[]byte)

func (m *mutator) mutateInt(v, maxValue int64) int64

func (m *mutator) mutateUInt(v, maxValue uint64) uint64

func (m *mutator) mutateFloat(v, maxValue float64) float64

fuzz testing
the benefits of fuzzing
the go-fuzz project
Go's official fuzzing solution
continuous fuzzing ????
