Text file talks/2014/static-analysis.slide

     1  Static analysis tools
     2  for Go code comprehension and refactoring
     3  
     4  golang nyc meetup
     5  13 Nov 2014
     6  
     7  Alan Donovan
     8  Google, New York
     9  adonovan@google.com
    10  
    11  * Video
    12  
    13  This talk was presented at the GothamGo Kickoff Meetup in New York City, November 2014.
    14  
    15  .link http://vimeo.com/114736889 Watch the talk on Vimeo
    16  
    17  * Introduction
    18  
    19  Programmers say "writing code", but most of that time is spent _reading_
    20  
    21  Actually: reading code, and making logical deductions
    22  
    23  Machines are good at reading and logic
    24  
    25  *Machines* *should* *be* *helping* *us* *read* *code.*
    26  
    27  And write it!  Refactoring can be tedious and error-prone.
    28  
    29  
    30  * Outline
    31  
    32  - Code comprehension tools: `oracle`, `godoc`
    33  - Analysis technology
    34  - Refactoring tools:  `gorename`, `eg`
    35  
    36  
    37  * Demo: the Go oracle
    38  
    39  Supports interactive, editor-integrated queries:
    40  - where is this thing defined?
    41  - what are the methods of this type?
    42  - what are the free variables of this selection?
    43  - what might this expression point to?
    44  - what dynamic types might this interface or `reflect.Value` contain?
    45  
    46  
    47  * Demo: godoc analysis features
    48  
    49  .link /lib/godoc/analysis/help.html godoc -analysis=type,pointer
    50  
    51  Package view
    52  - method set and _implements_ relation for every type
    53  - internal call graph of every package
    54  
    55  Source code view
    56  - kind, type, definition of every identifier
    57  - method set and _implements_ relation for every type
    58  - peers of every channel send/receive
    59  - callers of every function
    60  - callees of every call site (even dynamic)
    61  
    62  
    63  * How it works
    64  
    65  
    66  
    67  * Libraries and tools
    68  
    69  .image static-analysis/tools.svg
    70  
    71  Mostly in `golang.org/x/tools` repo
    72  
    73  
    74  * go/types: the Go type checker
    75  
    76  Computes mappings from:
    77  - identifiers to definitions
    78  - constant expressions to values
    79  - expressions to types
    80  
    81  Many subtle corners:
    82  - method set computation
    83  - recursive interfaces
    84  - shifts
    85  
    86  Making it correct, fast, and clean was a substantial project
    87  
    88  .link https://pkg.go.dev/golang.org/x/tools/go/types golang.org/x/tools/go/types
    89  
    90  Author: Robert Griesemer
    91  
    92  
    93  
    94  * go/ssa: intermediate representation (IR)
    95  
    96  Typed abstract syntax trees (ASTs) are too varied and subtle to analyze directly
    97  
    98  Programs are lowered into Static Single-Assignment form (SSA):
    99  - simplifies dataflow analyses since _reaching_ _definitions_ are implicit
   100  - invented 1991, now mainstream (gcc, llvm)
   101  
   102  All Go programs can be expressed using only ~30 basic instructions
   103  
   104  Simple, explicit, high-level, high source fidelity
   105  
   106  .link https://pkg.go.dev/golang.org/x/tools/go/ssa golang.org/x/tools/go/ssa
   107  
   108  The llgo project is using go/ssa as a front-end for LLVM
   109  
   110  
   111  
   112  * Demo: ssadump
   113  
   114  # Simple fibonacci function
   115  # % ssadump -build=F  fib.go		basic
   116  # % ssadump -build=FD fib.go		debug info
   117  # % ssadump -test -run unicode 		toy interpreter
   118  
   119  
   120  
   121  * go/pointer: pointer analysis
   122  
   123  Pointers complicate reasoning about program behaviour
   124  - functions may be called dynamically
   125  - a variable may be updated and accessed via different names ("aliases")
   126  - runtime types are values too: `interface`, `reflect.Type`
   127  
   128  We use *pointer* *analysis* to answer the question:
   129  which variables might this pointer point to?
   130  
   131  .link https://pkg.go.dev/golang.org/x/tools/go/pointer golang.org/x/tools/go/pointer
   132  
   133  # comment on go's appropriateness for this analysis:
   134  # (closed program---no dlopen, classloading, no generics, typesafe)
   135  
   136  
   137  * Pointer analysis
   138  
   139  Let `pts(p)` be the _points-to_ _set_ of pointer p
   140  Its elements are abstract variables: globals, locals, unnamed (`new(T)`)
   141  
   142  Overview:
   143  
   144  - analyze each SSA statement in the whole program
   145  
   146  - generate appropriate constraints on the points-to set of each variable
   147  
   148  	Statement      Constraint
   149  	 y = &x         pts(y) ⊇ {x}
   150  	 y = x          pts(y) ⊇ pts(x)		         	"inclusion-based"
   151  	*y = x          ∀o ∈ pts(y). pts(o) ⊇ pts(x)
   152  	 y = *x         ∀o ∈ pts(x). pts(y) ⊇ pts(o)
   153  	 y = &x->f      \
   154  	 y = x.(T)       } not shown
   155  	 reflection     /
   156  
   157  - solve the constraint system
   158  
   159  
   160  * Pointer analysis: constraint generation
   161  
   162  All Go operations can be expressed in these constraints
   163  Function, map, slice and channel ops all simplify to struct ops
   164  
   165  Treatment of `unsafe.Pointer` conversion is unsound
   166  But we don't care!  This isn't a compiler
   167  
   168  The constraint system is:
   169  - *field-sensitive*: struct fields x.f and x.g have distinct solutions
   170  - *flow-insensitive*: the order of statements is ignored
   171  - *context-insensitive*: each function is analyzed only once
   172  - *whole-program*: Go source is required for all dependencies
   173  - *inclusion-based*: i.e. Andersen's analysis
   174  
   175  Optimization: don't make constraints for "uninteresting" types
   176  such as types not related to a one-shot Oracle query
   177  
   178  * Pointer analysis: pre-solver optimizations
   179  
   180  Constraint system for 124KLoC program (oracle) has:
   181  
   182  254,000 variables
   183  184,000 equations
   184  
   185  Solving phase is in O(|v|³), so pre-solver optimisation is crucial
   186  
   187  We can bring this down to:
   188  
   189  88,000 variables
   190  65,000 equations
   191  
   192  * Pointer Equivalence by Hash-Value Numbering (Hardekopf & Lin '07)
   193  
   194          p = &x                  a = &x                         if ... {
   195          q = p                   b = a                            p, q = &x, &y
   196          r = q                   c = b                          } else {
   197          s = r                   if ... { a = c }                 p, q = &y, &x
   198                                                                 }
   199  
   200  .image static-analysis/hvn.svg
   201  
   202  Hash-Value Numbering
   203  
   204  * Pointer analysis: solving
   205  
   206  It's transitive closure of a graph, essentially
   207  
   208  Lots of optimizations, for example:
   209  
   210  _sparse_bit_vectors_, a very compact representation for points-to sets
   211  
   212  .link https://pkg.go.dev/golang.org/x/tools/container/ints golang.org/x/tools/container/ints
   213  
   214  Solver log is >1GB.  Debugging is fun.
   215  
   216  
   217  #* Sparse bit vector
   218  #`golang.org/x/tools/container/ints`
   219  #
   220  #.image sparsebitvector.svg
   221  #
   222  #Very compact representation of sparse `set<int>`
   223  #Doubly-linked list of (offset int, data [256]bit) blocks.
   224  
   225  
   226  * Call graph
   227  
   228  The *call* *graph* can be read directly from the solution
   229  
   230  The `golang.org/x/tools/cmd/callgraph` tool prints raw call graphs
   231  
   232  Demo (time permitting)
   233  
   234  
   235  * Refactoring tools
   236  
   237  * gorename: precise, type-safe identifier renaming
   238  
   239  A refactoring tool, usable from...
   240  
   241  - the command line
   242  
   243  	% gorename -from bytes.Buffer.Len -to Size
   244  	Renamed 105 occurrences in 42 files in 33 packages.
   245  
   246  - many editors ([[https://code.google.com/p/go/source/browse/refactor/rename/rename.el?repo=tools&r=511801758bb9a0b83f9bf931342889fbedbc9b48][Emacs]], [[http://quick.as/dgof2p7][Vim]], [[https://github.com/smartystreets/sublime-gorename][Sublime]], [[https://github.com/davidrjenni/agorn][Acme]])
   247  
   248  All renamings are reversible
   249  
   250  Sound: either a conflict is reported, or the refactoring preserves behaviour*
   251  
   252  *except reflection
   253  
   254  .link https://pkg.go.dev/golang.org/x/tools/cmd/gorename golang.org/x/tools/cmd/gorename
   255  
   256  * Demo: gorename
   257  
   258  
   259  * Example-based refactoring with eg
   260  
   261  A Go port of the Java _Refaster_ tool
   262  
   263  A template specifies the pattern and replacement as Go expressions:
   264  
   265  	package P
   266  
   267  	import ( "errors"; "fmt" )
   268  
   269  	func before(s string) error { return fmt.Errorf("%s", s) }
   270  	func after(s string)  error { return errors.New(s) }
   271  
   272  Parameters are _wildcards_
   273  
   274  	% eg -t template.go <package> ...
   275  
   276  .link https://pkg.go.dev/golang.org/x/tools/cmd/eg golang.org/x/tools/cmd/eg
   277  
   278  * Demo: eg
   279  
   280  * Conclusion
   281  
   282  From the outset, Go has been renowned for its tools: `go`, `gofmt`
   283  
   284  We have many building blocks for sophisticated source analysis tools
   285  
   286  What should we build next?
   287  

View as plain text