Go in Go
Gopherfest
26 May 2015

Rob Pike
Google
r@golang.org
https://go.dev/

* Go in Go

As of the 1.5 release of Go, the entire system is now written in Go.
(And a little assembler.)

C is gone.

Side note: `gccgo` is still going strong.
This talk is about the original compiler, `gc`.

* Why was it in C?

Bootstrapping.

(Also Go was not intended primarily as a compiler implementation language.)

* Why move the compiler to Go?

Not for validation; we have more pragmatic motives:

- Go is easier to write (correctly) than C.
- Go is easier to debug than C (even absent a debugger).
- Go is the only language you'd need to know; encourages contributions.
- Go has better modularity, tooling, testing, profiling, ...
- Go makes parallel execution trivial.

Already seeing benefits, and it's early yet.

Design document: [[/s/go13compiler]]

* Why move the runtime to Go?

We had our own C compiler just to compile the runtime.
It had to follow Go's ABI, including conventions such as segmented stacks.

Switching it to Go means we can get rid of the C compiler.
That's more important than converting the compiler to Go.

(All the reasons for moving the compiler apply to the runtime as well.)

Now only one language in the runtime; easier integration, stack management, etc.

As always, simplicity is the overriding consideration.

* History

Why do we have our own tool chain at all?
Our own ABI?
Our own file formats?

History, familiarity, and ease of moving forward. And speed.

Many of Go's big changes would be much harder with GCC or LLVM.

.link https://news.ycombinator.com/item?id=8817990

* Big changes

All made easier by owning the tools and/or moving to Go:

- linker rearchitecture
- new garbage collector
- stack maps
- contiguous stacks
- write barriers

The last three are all but impossible in C:

- C is not type safe; don't always know what's a pointer
- aliasing of stack slots caused by optimization

(`Gccgo` will have segmented stacks and imprecise (stack) collection for a while yet.)

* Goroutine stacks

- Until 1.2: Stacks were segmented.
- 1.3: Stacks were contiguous unless executing C code (runtime).
- 1.4: Stacks made contiguous by restricting C to system stack.
- 1.5: Stacks made contiguous by eliminating C.

These were each huge steps, made quickly (led by `khr@`).

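Why contiguous stacks need stack maps can be seen in a toy model (not the runtime's implementation; `stack`, `grow`, and the integer addresses are hypothetical). Growing a contiguous stack means copying it to a larger region, and every slot that points into the old stack must be adjusted; only the stack map says which slots those are.

```go
package main

import "fmt"

// stack is a toy model of a contiguous goroutine stack. Addresses are
// plain integers: slot i lives at address base+i. ptr plays the role of
// a stack map, marking which slots hold pointers.
type stack struct {
	base  uintptr
	slots []uintptr
	ptr   []bool
}

// grow copies the stack to a new region at newBase that is twice as large,
// then rewrites every pointer slot that pointed into the old stack.
// Slots not marked as pointers are copied unchanged, even if their value
// happens to look like an old stack address.
func (s *stack) grow(newBase uintptr) {
	oldLo, oldHi := s.base, s.base+uintptr(len(s.slots))
	slots := make([]uintptr, 2*len(s.slots))
	ptr := make([]bool, 2*len(s.ptr))
	copy(slots, s.slots)
	copy(ptr, s.ptr)
	for i, isPtr := range s.ptr {
		if isPtr && slots[i] >= oldLo && slots[i] < oldHi {
			slots[i] = slots[i] - oldLo + newBase
		}
	}
	s.base, s.slots, s.ptr = newBase, slots, ptr
}

func main() {
	s := &stack{
		base:  100,
		slots: []uintptr{42, 102, 7, 103},
		ptr:   []bool{false, true, false, false}, // slot 1 points at slot 2
	}
	s.grow(1000)
	fmt.Println(s.base, s.slots[:4]) // 1000 [42 1002 7 103]
}
```

Slot 3 holds the integer 103, which equals an old stack address; because the map says it is not a pointer, it is left alone. Without precise type information, as in C, the two cases cannot be told apart.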
* Converting the runtime

Mostly done by hand with machine assistance.

Challenge to implement the runtime in a safe language.
Some use of `unsafe` to deal with pointers as raw bits in the GC, for instance.
But less than you might think.

The translator (next sections) helped for some of the translation.

* Converting the compiler

Why translate it, not write it from scratch? Correctness, testing.

Steps:

- Write a custom translator from C to Go.
- Run the translator, iterate until success.
- Measure success by bit-identical output.
- Clean up the code by hand and by machine.
- Turn it from C-in-Go to idiomatic Go (still happening).

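The bit-identical criterion is easy to check mechanically. A minimal sketch, assuming the outputs of the C-built and Go-built compilers for one input have already been read into memory (the harness and the name `firstDiff` are hypothetical): report where the outputs first diverge, which points straight at the translation bug to chase.

```go
package main

import "fmt"

// firstDiff compares two compilers' output for the same input. It
// returns -1 if the outputs are bit-identical, otherwise the offset of
// the first differing byte (or the length of the shorter output if one
// is a prefix of the other).
func firstDiff(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return i
		}
	}
	if len(a) != len(b) {
		return n
	}
	return -1
}

func main() {
	fromC := []byte{0x48, 0x89, 0xe5, 0xc3} // hypothetical object bytes
	fromGo := []byte{0x48, 0x89, 0xe5, 0xc3}
	fmt.Println(firstDiff(fromC, fromGo)) // -1
	fromGo[2] = 0xec
	fmt.Println(firstDiff(fromC, fromGo)) // 2
}
```
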
* Translator

First output was C line-by-line translated to (bad!) Go.
Tool to do this written by `rsc@` (talked about at GopherCon 2014).
Custom written for this job, not a general C-to-Go translator.

Steps:

- Parse C code using new simple C parser (`yacc`)
- Remove or rewrite C-isms such as `*p++` as an expression
- Walk the C parse tree, print the C code in Go syntax
- Compile the output
- Run, compare generated code
- Repeat

The `Yacc` grammar was translated by sam-powered hands.

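As an illustration of rewriting `*p++` (a hypothetical example, not actual output of the translator), here is a classic C copy loop expressed in Go:

```go
package main

import "fmt"

// copyN is a statement-by-statement rendering of the C loop
//
//	while(n-- > 0)
//		*dst++ = *src++;
//
// Go has no pointer arithmetic, and ++ and -- are statements rather
// than expressions, so each side effect becomes its own statement and
// each pointer becomes an index into a slice.
func copyN(dst, src []byte, n int) {
	d, s := 0, 0
	for n > 0 {
		n--
		dst[d] = src[s]
		d++
		s++
	}
}

func main() {
	dst := make([]byte, 3)
	copyN(dst, []byte("gopher"), 3)
	fmt.Println(string(dst)) // gop
}
```

Idiomatic Go would replace the whole loop with `copy(dst, src[:n])`, the kind of cleanup the later hand and machine passes aimed at.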
* Translator configuration

Aided by hand-written rewrite rules, such as:

- this field is a bool
- this function returns a bool

Also diff-like rewrites for things such as using the standard library:

	diff {
	-	g.Rpo = obj.Calloc(g.Num*sizeof(g.Rpo[0]), 1).([]*Flow)
	-	idom = obj.Calloc(g.Num*sizeof(idom[0]), 1).([]int32)
	-	if g.Rpo == nil || idom == nil {
	-		Fatal("out of memory")
	-	}
	+	g.Rpo = make([]*Flow, g.Num)
	+	idom = make([]int32, g.Num)
	}

* Another example

This one is due to a semantic difference between the languages.

	diff {
	-	if nreg == 64 {
	-		mask = ^0 // can't rely on C to shift by 64
	-	} else {
	-		mask = (1 << uint(nreg)) - 1
	-	}
	+	mask = (1 << uint(nreg)) - 1
	}

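The difference is in shift counts. This sketch (the name `lowMask` is hypothetical) shows why the special case can go: in Go, the general formula already yields all ones when `nreg` is 64.

```go
package main

import "fmt"

// lowMask returns a mask with the low nreg bits set, nreg in [0, 64].
// Go defines an unsigned shift by the operand's width or more to give 0,
// so for nreg == 64 the expression is 0 - 1, which wraps to all ones.
// In C, shifting a 64-bit value by 64 is undefined behavior.
func lowMask(nreg uint) uint64 {
	return (uint64(1) << nreg) - 1
}

func main() {
	fmt.Printf("%#x\n", lowMask(3))  // 0x7
	fmt.Printf("%#x\n", lowMask(64)) // 0xffffffffffffffff
}
```
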
* Grind

Once in Go, new tool `grind` deployed (by `rsc@`):

- parses Go, type checks
- records a list of edits to perform: "insert this text at this position"
- at end, applies edits to source (hard to edit AST).

Changes guided by profiling and other analysis:

- removes dead code
- removes gotos
- removes unused labels, needless indirections, etc.
- moves `var` declarations nearer to first use

.link http://rsc.io/grind

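The record-then-apply approach can be sketched as follows, assuming edits are expressed as byte offsets into the original source (the `Edit` type and `applyEdits` function are hypothetical, not `grind`'s actual API). Applying edits from the highest offset down keeps the remaining offsets valid, which is why they can all be collected first and applied at the end.

```go
package main

import (
	"fmt"
	"sort"
)

// Edit replaces src[Start:End] with Text. An insertion has Start == End;
// a deletion has Text == "".
type Edit struct {
	Start, End int
	Text       string
}

// applyEdits applies non-overlapping edits, all recorded against the
// original source, working from the end of the file backward.
func applyEdits(src string, edits []Edit) string {
	sorted := append([]Edit(nil), edits...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Start > sorted[j].Start })
	for _, e := range sorted {
		src = src[:e.Start] + e.Text + src[e.End:]
	}
	return src
}

func main() {
	src := "x := 1\ngoto L\nL:\nreturn x\n"
	edits := []Edit{
		{Start: 7, End: 14},  // delete "goto L\n"
		{Start: 14, End: 17}, // delete the now-unused label "L:\n"
	}
	fmt.Print(applyEdits(src, edits)) // x := 1, then return x
}
```
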
* Performance problems

Output from translator was poor Go, and ran about 10X slower.
Most of that slowdown has been recovered.

Problems with C to Go:

- C patterns can be poor Go; e.g.: complex `for` loops
- C stack variables never escape; Go compiler isn't as sure
- interfaces such as `fmt.Stringer` vs. C's `varargs`
- no `unions` in Go, so use `structs` instead: bloat
- variable declarations in wrong place

C compiler didn't free much memory, but Go has a GC.
Adds CPU and memory overhead.

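The escape problem in miniature, with hypothetical functions: C code often passes the address of a local as an out-parameter, and in C that local is always on the stack. In Go, the same code keeps the variable on the stack only if escape analysis proves the pointer is not retained. Building with `go build -gcflags=-m` prints the compiler's decisions; for this sketch it should report `y`, but not `x`, as moved to the heap.

```go
package main

import "fmt"

var saved *int

// fill writes through p but does not retain it, so escape analysis can
// prove that a caller's &x does not escape.
func fill(p *int) { *p = 42 }

// keep stores p in a global, so anything passed to it must live on the heap.
func keep(p *int) { saved = p }

// useFill mirrors the C out-parameter idiom; x can stay on the stack.
func useFill() int {
	x := 0
	fill(&x)
	return x
}

// useKeep looks the same, but y escapes and is allocated on the heap.
func useKeep() int {
	y := 0
	keep(&y)
	return y
}

func main() {
	fmt.Println(useFill(), useKeep()) // 42 0
}
```
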
* Performance fixes

Profile! (Never done before!)

- move `vars` closer to first use
- split `vars` into multiple
- replace code in the compiler with code in the library: e.g. `math/big`
- use interface or other tricks to combine `struct` fields
- better escape analysis (`drchase@`)
- hand tuning code and data layout

Use tools like `grind`, `gofmt` `-r` and `eg` for much of this.

Removing interface argument from a debugging print library got 15% overall!

More remains to be done.

* Technical benefits

Other benefits of the conversion:

Garbage collection means no more worry about introducing a dangling pointer.

Chance to clean up the back ends.

Unified `386` and `amd64` architectures throughout the tool chain.

New architectures are easier to add.

Unified the tools: now one compiler, one assembler, one linker.

* Compiler

`GOOS=YYY` `GOARCH=XXX` `go` `tool` `compile`

One compiler; no more `6g`, `8g` etc.

About 50K lines of portable code.
Even the registerizer is portable now; architectures well characterized.
Non-portable: Peepholing, details like registers bound to instructions.
Typically around 10% of the portable LOC.

* Assembler

`GOOS=YYY` `GOARCH=XXX` `go` `tool` `asm`

New assembler, all in Go, written from scratch by `r@`.
Clean, idiomatic Go code.

Fewer than 4000 lines, <10% machine-dependent.

Almost completely compatible with previous `yacc` and C assemblers.

How is this possible?

- shared syntax originating in the Plan 9 assemblers
- unified back-end logic (old `liblink`, now `internal/obj`)

* Linker

`GOOS=YYY` `GOARCH=XXX` `go` `tool` `link`

Mostly hand- and machine- translated from C code.

New library, `internal/obj`, part of original linker, captures details about machines, writes object files.

27000 lines summed across 4 architectures, mostly tables (plus some ugliness).

- `arm`: 4000
- `arm64`: 6000
- `ppc64`: 5000
- `x86`: 7500 (`386` and `amd64`)

Example benefit: one print routine to print any instruction for any architecture.

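A table-driven design is what makes a single print routine possible. The sketch below is hypothetical (the `arch` and `prog` types are illustrative, not the API of `internal/obj`): the printer consults per-architecture tables and never branches on the architecture itself.

```go
package main

import "fmt"

// arch is a hypothetical machine description: mostly tables.
type arch struct {
	name     string
	opNames  map[int]string
	regNames map[int]string
}

// prog is a hypothetical architecture-neutral instruction.
type prog struct {
	op       int
	from, to int
}

// Architecture-neutral opcodes.
const (
	opMove = iota
	opAdd
)

// format is the single print routine: it looks names up in the
// architecture's tables instead of switching on the architecture.
func format(a *arch, p prog) string {
	return fmt.Sprintf("%s %s, %s", a.opNames[p.op], a.regNames[p.from], a.regNames[p.to])
}

func main() {
	amd64 := &arch{"amd64", map[int]string{opMove: "MOVQ", opAdd: "ADDQ"}, map[int]string{0: "AX", 1: "BX"}}
	arm64 := &arch{"arm64", map[int]string{opMove: "MOVD", opAdd: "ADD"}, map[int]string{0: "R0", 1: "R1"}}
	for _, a := range []*arch{amd64, arm64} {
		fmt.Println(a.name + ": " + format(a, prog{opMove, 0, 1}))
	}
	// amd64: MOVQ AX, BX
	// arm64: MOVD R0, R1
}
```
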
* Bootstrap

With no C compiler, bootstrapping requires a Go compiler.

Therefore need to build or download a working Go installation to build 1.5 from source.

We use Go 1.4+ as the base to build the 1.5+ tool chain. (Newer is OK too.)

Details: [[/s/go15bootstrap]]

* Future

Much work still to do, but 1.5 is mostly set.

Future work:

Better escape analysis.
New compiler back end using SSA (much easier in Go than C).
Will allow much more optimization.

Generate machine descriptions from PDFs (or maybe XML).
Will have a purely machine-generated instruction definition:
"Read in PDF, write out an assembler configuration".
Already deployed for the disassemblers.

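For readers new to SSA (static single assignment form), the defining property is that every variable is assigned exactly once. A toy sketch of renaming straight-line code into SSA (the `assign` type and `toSSA` are hypothetical; a real back end must also insert phi functions where control flow merges, which this omits):

```go
package main

import (
	"fmt"
	"strings"
)

// assign is a hypothetical straight-line statement: dst = op args...
type assign struct {
	dst  string
	op   string
	args []string
}

// toSSA gives every assignment a fresh version of its destination and
// rewrites each use to the latest version. Names never assigned
// (constants, parameters) are left as they are.
func toSSA(code []assign) []string {
	version := map[string]int{}
	cur := func(v string) string {
		if n, ok := version[v]; ok {
			return fmt.Sprintf("%s%d", v, n)
		}
		return v
	}
	var out []string
	for _, a := range code {
		args := make([]string, len(a.args))
		for i, x := range a.args {
			args[i] = cur(x) // resolve uses before defining the new version
		}
		version[a.dst]++
		out = append(out, fmt.Sprintf("%s = %s %s", cur(a.dst), a.op, strings.Join(args, " ")))
	}
	return out
}

func main() {
	code := []assign{
		{"x", "const", []string{"1"}},
		{"x", "add", []string{"x", "2"}},
		{"y", "mul", []string{"x", "x"}},
	}
	for _, line := range toSSA(code) {
		fmt.Println(line)
	}
	// x1 = const 1
	// x2 = add x1 2
	// y1 = mul x2 x2
}
```
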
* Conclusions

Getting rid of C was a huge advance for the project.
Code is cleaner, testable, profilable, easier to work on.

New unified tool chain reduces code size, increases maintainability.

Flexible tool chain, portability still paramount.