Go Dynamic Tools Gophercon 2015, July 9, 2015 Dmitry Vyukov Google dvyukov@ * Video A video of this talk was recorded at GopherCon in Denver. .link https://www.youtube.com/watch?v=a9xrxRsIbSU Watch the talk on YouTube * About me Did a bunch of work on Go: - Scalable goroutine scheduler - Integrated network poller - Parallel GC, concurrent sweeping - Memory allocator speed/space improvements - Sync primitives - Race detector - Blocking profile - 800+ commits, filed 500+ bugs But actually on dynamic testing tools team: - Thread/Address/MemorySanitizer * Agenda - Data race detector - Go-fuzz, randomized testing system - Execution tracer * Data race detector .image dynamic-tools/philosoraptor.png * What is a data race? A data race occurs when two goroutines access the same variable concurrently and at least one of the accesses is a write. *All*bets*are*off*!* Any data race can destroy the memory/type-safety of a Go program. * There are no "benign" data races // goroutine 1 // goroutine 2 m[k1] = v1 m[k2] = v2 Bad! // goroutine 1 // goroutine 2 stat++ stat++ Also bad! Compilers assume race-free programs and do aggressive optimizations based on that assumption (e.g. assume "ownership" over written-to variables). Races are non-deterministic and hard to debug. * Usage $ go test -race mypkg // to test the package $ go run -race mysrc.go // to run the source file $ go build -race mycmd // to build the command $ go install -race mypkg // to install the package That's it! * Example package main func main() { m := make(map[int]int) go func() { m[1] = 1 }() m[2] = 2 } * Example report WARNING: DATA RACE Write by goroutine 5: runtime.mapassign1() runtime/hashmap.go:411 +0x0 main.main.func1() race.go:6 +0x60 Previous write by main goroutine: runtime.mapassign1() runtime/hashmap.go:411 +0x0 main.main() race.go:8 +0xb6 Goroutine 5 (running) created at: main.main() race.go:7 +0x76 * Achievements - 70+ bugs in std lib - 350+??? bugs in Google internal code base - ??? bugs found in the wild * Instrumentation Compiler instrumentation pass enabled by -race. func foo(p *int) { *p = 1 } Becomes: func foo(p *int) { runtime.funcenter(caller_pc) runtime.racewrite(p) *p = 1 runtime.funcexit() } * Run-time module Handles: - memory accesses (to catch racy accesses) - synchronization (to not produce false reports) - function calls (to collect stack traces) - goroutine creation/exit (to keep track of live goroutines) Algorithm is based on dynamic modelling of happens-before relation: - no false positives - false negatives are possible * Usage tips Dynamic tools are only as good as your tests are. - write good *concurrent* tests - have continuous build with race detector - run integration tests - run race-enabled canaries in production * Go-fuzz .image dynamic-tools/go-fuzz.png * Randomized testing A different approach to testing that finds [lots of] bugs that other testing approaches do not. Intended mostly for programs that parse complex inputs. Generate random blob -> feed into program -> see if it crashes -> profit! - cheap to use - does not have any bias Completely random blobs won't uncover lots of bugs. How can we generate diverse but meaningful inputs that will trigger nil derefs, off-by-ones, etc? * Coverage-guided fuzzing Genetic algorithms to the rescue! Instrument program for code coverage Collect initial corpus of inputs for { Randomly mutate an input from the corpus Execute and collect coverage If the input gives new coverage, add it to corpus } * Example The following code wants "ABCD" input: if input[0] == 'A' { if input[1] == 'B' { if input[2] == 'C' { if input[3] == 'D' { panic("input must not be ABCD") } } } } Corpus progression: "" "", "A" "", "A", "AB" "", "A", "AB", "ABC" "", "A", "AB", "ABC", "ABCD" * Game over CRC32 checksum verification in `image/png/reader.go` func (d *decoder) verifyChecksum() error { if binary.BigEndian.Uint32(d.tmp[:4]) != d.crc.Sum32() { return FormatError("invalid checksum") } return nil } Probability that random mutations will alter input in an interesting way and guess CRC32 at the same time is basically ZERO. * Sonar Don't need to guess, program knows it! + v1 := binary.BigEndian.Uint32(d.tmp[:4]) + v2 := d.crc.Sum32() + __go_fuzz.Sonar(v1, v2) if v1 != v2 { return FormatError("invalid checksum") } Then, find v1 in the input and replace it with v2. Done! * Game over 2 Mutations and sonar do low-level changes ("bit-flipping"): Original: `100` Mutated: `100<` Also want high-level changes! * Versifier Versifier reverse-engineers [text] protocol and learns its _structure_. abc -> alphanum token 123, 1e-2 -> number "..." -> quoted [...] -> parenthesized ...,...,... -> list ...\n...\n -> lines Then, applies _structural_ mutations to inputs. * Versifier example Original: `100` Versified (all valid xml): = -026023767521520230564132665e0333302100 510-9e-07036514 prop name="p"/}01e-6 1008 * Algorithm .image dynamic-tools/algo.png * Achievements - 115 bugs in std lib (66 fixed) - 43 bugs in golang.org/x/... (24 fixed) - 134 elsewhere * Achievements fmt.Sprintf("%.[]") panic: runtime error: index out of range regexp.MustCompile("((0(0){0}))").ReplaceAllString("00000", "00$00") panic: runtime error: slice bounds out of range ioutil.ReadAll(flate.NewReader(strings.NewReader("4LJNIMK\a\x00\x00\xff..\xff.....\xff"))) runs forever var x = 1/"."[0] crashes compiler archive/tar: hang archive/zip: cap out of range encoding/gob: stack overflow encoding/asn1: index out of range image/jpeg: Decode hangs image/png: nil deref math/big: incorrect string->Float conversion crypto/x509: division by zero ... * Usage - go get github.com/dvyukov/go-fuzz/... - write test: func Fuzz(data []byte) int { gob.NewDecoder(bytes.NewReader(data)) return 0 } - build $ go-fuzz-build github.com/dvyukov/go-fuzz/examples/gob - collect corpus - run $ go-fuzz -bin=gob-fuzz.zip -workdir=examples/gob * Execution tracer .image dynamic-tools/tracer.png * Execution tracer Gives insight into dynamic execution of a program. Captures with nanosecond precision: - goroutine creation/start/end - goroutine blocking/unblocking - network blocking - system calls - GC events * Execution tracer .image dynamic-tools/trace.png 450 _ * Recap - race detector: always use for testing (-race) - go-fuzz: parsing of complex inputs (github.com/dvyukov/go-fuzz) - execution tracer: deep dive into execution (-trace)