Static analysis tools for Go code comprehension and refactoring golang nyc meetup 13 Nov 2014 Alan Donovan Google, New York adonovan@google.com * Video This talk was presented at the GothamGo Kickoff Meetup in New York City, November 2014. .link http://vimeo.com/114736889 Watch the talk on Vimeo * Introduction Programmers say "writing code", but most of that time is spent _reading_ Actually: reading code, and making logical deductions Machines are good at reading and logic *Machines* *should* *be* *helping* *us* *read* *code.* And write it! Refactoring can be tedious and error-prone. * Outline - Code comprehension tools: `oracle`, `godoc` - Analysis technology - Refactoring tools: `gorename`, `eg` * Demo: the Go oracle Supports interactive, editor-integrated queries: - where is this thing defined? - what are the methods of this type? - what are the free variables of this selection? - what might this expression point to? - what dynamic types might this interface or `reflect.Value` contain? * Demo: godoc analysis features .link /lib/godoc/analysis/help.html godoc -analysis=type,pointer Package view - method set and _implements_ relation for every type - internal call graph of every package Source code view - kind, type, definition of every identifier - method set and _implements_ relation for every type - peers of every channel send/receive - callers of every function - callees of every call site (even dynamic) * How it works * Libraries and tools .image static-analysis/tools.svg Mostly in `golang.org/x/tools` repo * go/types: the Go type checker Computes mappings from: - identifiers to definitions - constant expressions to values - expressions to types Many subtle corners: - method set computation - recursive interfaces - shifts Making it correct, fast, and clean was a substantial project .link https://pkg.go.dev/golang.org/x/tools/go/types golang.org/x/tools/go/types Author: Robert Griesemer * go/ssa: intermediate representation (IR) Typed abstract syntax trees (ASTs) are too varied and subtle to analyze directly Programs are lowered into Static Single-Assignment form (SSA): - simplifies dataflow analyses since _reaching_ _definitions_ are implicit - invented 1991, now mainstream (gcc, llvm) All Go programs can be expressed using only ~30 basic instructions Simple, explicit, high-level, high source fidelity .link https://pkg.go.dev/golang.org/x/tools/go/ssa golang.org/x/tools/go/ssa The llgo project is using go/ssa as a front-end for LLVM * Demo: ssadump # Simple fibonacci function # % ssadump -build=F fib.go basic # % ssadump -build=FD fib.go debug info # % ssadump -test -run unicode toy interpreter * go/pointer: pointer analysis Pointers complicate reasoning about program behaviour - functions may be called dynamically - a variable may be updated and accessed via different names ("aliases") - runtime types are values too: `interface`, `reflect.Type` We use *pointer* *analysis* to answer the question: which variables might this pointer point to? .link https://pkg.go.dev/golang.org/x/tools/go/pointer golang.org/x/tools/go/pointer # comment on go's appropriateness for this analysis: # (closed program---no dlopen, classloading, no generics, typesafe) * Pointer analysis Let `pts(p)` be the _points-to_ _set_ of pointer p Its elements are abstract variables: globals, locals, unnamed (`new(T)`) Overview: - analyze each SSA statement in the whole program - generate appropriate constraints on the points-to set of each variable Statement Constraint y = &x pts(y) ⊇ {x} y = x pts(y) ⊇ pts(x) "inclusion-based" *y = x ∀o ∈ pts(y). pts(o) ⊇ pts(x) y = *x ∀o ∈ pts(x). pts(y) ⊇ pts(o) y = &x->f \ y = x.(T) } not shown reflection / - solve the constraint system * Pointer analysis: constraint generation All Go operations can be expressed in these constraints Function, map, slice and channel ops all simplify to struct ops Treatment of `unsafe.Pointer` conversion is unsound But we don't care! This isn't a compiler The constraint system is: - *field-sensitive*: struct fields x.f and x.g have distinct solutions - *flow-insensitive*: the order of statements is ignored - *context-insensitive*: each function is analyzed only once - *whole-program*: Go source is required for all dependencies - *inclusion-based*: i.e. Andersen's analysis Optimization: don't make constraints for "uninteresting" types such as types not related to a one-shot Oracle query * Pointer analysis: pre-solver optimizations Constraint system for 124KLoC program (oracle) has: 254,000 variables 184,000 equations Solving phase is in O(|v|³), so pre-solver optimisation is crucial We can bring this down to: 88,000 variables 65,000 equations * Pointer Equivalence by Hash-Value Numbering (Hardekopf & Lin '07) p = &x a = &x if ... { q = p b = a p, q = &x, &y r = q c = b } else { s = r if ... { a = c } p, q = &y, &x } .image static-analysis/hvn.svg Hash-Value Numbering * Pointer analysis: solving It's transitive closure of a graph, essentially Lots of optimizations, for example: _sparse_bit_vectors_, a very compact representation for points-to sets .link https://pkg.go.dev/golang.org/x/tools/container/ints golang.org/x/tools/container/ints Solver log is >1GB. Debugging is fun. #* Sparse bit vector #`golang.org/x/tools/container/ints` # #.image sparsebitvector.svg # #Very compact representation of sparse `set` #Doubly-linked list of (offset int, data [256]bit) blocks. * Call graph The *call* *graph* can be read directly from the solution The `golang.org/x/tools/cmd/callgraph` tool prints raw call graphs Demo (time permitting) * Refactoring tools * gorename: precise, type-safe identifier renaming A refactoring tool, usable from... - the command line % gorename -from bytes.Buffer.Len -to Size Renamed 105 occurrences in 42 files in 33 packages. - many editors ([[https://code.google.com/p/go/source/browse/refactor/rename/rename.el?repo=tools&r=511801758bb9a0b83f9bf931342889fbedbc9b48][Emacs]], [[http://quick.as/dgof2p7][Vim]], [[https://github.com/smartystreets/sublime-gorename][Sublime]], [[https://github.com/davidrjenni/agorn][Acme]]) All renamings are reversible Sound: either a conflict is reported, or the refactoring preserves behaviour* *except reflection .link https://pkg.go.dev/golang.org/x/tools/cmd/gorename golang.org/x/tools/cmd/gorename * Demo: gorename * Example-based refactoring with eg A Go port of the Java _Refaster_ tool A template specifies the pattern and replacement as Go expressions: package P import ( "errors"; "fmt" ) func before(s string) error { return fmt.Errorf("%s", s) } func after(s string) error { return errors.New(s) } Parameters are _wildcards_ % eg -t template.go ... .link https://pkg.go.dev/golang.org/x/tools/cmd/eg golang.org/x/tools/cmd/eg * Demo: eg * Conclusion From the outset, Go has been renowned for its tools: `go`, `gofmt` We have many building blocks for sophisticated source analysis tools What should we build next?