Text file talks/2013/oscon-dl.slide

dl.google.com: Powered by Go
10:00 26 Jul 2013
Tags: download, oscon, port, c++, google, groupcache, caching

Brad Fitzpatrick
Gopher, Google
@bradfitz
bradfitz@golang.org
http://bradfitz.com/
https://go.dev/
https://github.com/golang/groupcache/

* Overview / tl;dw:

- dl.google.com serves Google downloads
- Was written in C++
- Now in Go
- Now much better
- Extensive, idiomatic use of Go's standard library
- ... which is all open source
- composition of interfaces is fun
- _groupcache_, now Open Source, handles group-aware caching and cache-filling

* too long...

* me

- Brad Fitzpatrick
- bradfitz.com
- @bradfitz
- past: LiveJournal, memcached, OpenID, Perl stuff...
- nowadays: Go, Go, Camlistore, Go, anything & everything written in Go ...

* I love Go

- this isn't a talk about Go, sorry.
- but check it out.
- simple, powerful, fast, liberating, refreshing
- great mix of low- and high-level
- light on the page
- static binaries, easy to deploy
- not perfect, but my favorite language yet

* dl.google.com

* dl.google.com

- HTTP download server
- serves Chrome, Android SDK, Earth, much more
- Some huge, some tiny (e.g. WebGL white/blacklist JSON)
- behind an edge cache; still high traffic
- lots of datacenters, lots of bandwidth

* Why port?

* reason 0

$ apt-get update

.image oscon-dl/slow.png

- embarrassing
- Google can't serve a 1,238 byte file?
- Hanging?
- 207 B/s?!

* Yeah, embarrassing, for years...

.image oscon-dl/crbug.png

* ... which led to:

- complaining on corp G+. Me: "We suck. This sucks."
- primary SRE owning it: "Yup, it sucks. And is unmaintained."
- "I'll rewrite it for you!"
- "Hah."
- "No, serious. That's kinda our job. But I get to do it in Go."
- (Go team's loan-out-a-Gopher program...)

* How hard can this be?

* dl.google.com: few tricks

each "payload" (~URL) described by a protobuf:

- paths/patterns for its URL(s)
- go-live reveal date
- ACLs (geo, network, user, user type, ...)
- dynamic zip files
- custom HTTP headers
- custom caching

* dl.google.com: how it was

.image oscon-dl/before.png

* Aside: Why good code goes bad

* Why good code goes bad

- Premise: people don't suck
- Premise: code was once beautiful
- code tends towards complexity (gets worse)
- environment changes
- scale changes

* code complexity

- without regular love, code grows warts over time
- localized fixes and additions are easy & quick, but globally crappy
- features, hacks and workarounds added without docs or tests
- maintainers come & go,
- ... or just go.

* changing environment

- Google's infrastructure (hardware & software), like anybody's, is always changing
- properties of networks, storage
- design assumptions no longer make sense
- scale changes (design for 10x growth, rethink at 100x)
- new internal services (beta or non-existent then, dependable now)
- once-modern home-grown invented wheels might now look archaic

* so why did it suck?

.image oscon-dl/slow.png

- stalling its single-threaded event loop, blocking when it shouldn't
- maxed out at one CPU, but couldn't even use a fraction of a single CPU.

* but why?

- code was too complicated
- future maintainers slowly violated unwritten rules
- or knowingly violated them, assuming it couldn't be too bad?
- C++ single-threaded event-based callback spaghetti
- hard to know when/where code was running, or what "blocking" meant

* Old code

- served from local disk
- single-threaded event loop
- used sendfile(2) "for performance"
- tried to be clever and steal the fd from the "SelectServer" sometimes to manually call sendfile
- while also trying to do HTTP chunking,
- ... and HTTP range requests,
- ... and dynamic zip files,
- lots of duplicated copy/paste code paths
- many wrong/incomplete in different ways

* Mitigation solution?

- more complexity!
- ad hoc addition of more threads
- ... not really defined which threads did what,
- ... or what the ownership or locking rules were,
- no surprise: random crashes

* Summary of 5-year-old code in 2012

- incomplete docs, tests
- stalling event loop
- ad-hoc threads...
- ... stalling event loops
- ... races
- ... crashes
- copy/paste code
- ... incomplete code
- two processes in the container
- ... different languages

* Environment changes

- Remember: on start, we had to copy all payloads to local disk
- in 2007, using local disk wasn't restricted
- in 2007, sum(payload size) was much smaller
- in 2012, containers get tiny % of local disk spindle time
- ... why aren't you using the cluster file systems like everybody else?
- ... cluster file systems own disk time on your machine, not you.
- in 2007, it started up quickly.
- in 2012, it started in 12-24 hours (!!!)
- ... hope we don't crash! (oh, whoops)

* Copying N bytes from A to B in event loop environments (node.js, this C++, etc)

- Can *A* read?
- Read up to _n_ bytes from A.
- What'd we get? _rn_
- _n_ -= _rn_
- Store those.
- Note we want to write to *B* now.
- Can *B* write?
- Try to write _rn_ bytes to *B*. Got _wn_.
- buffered -= _wn_
- while (blah blah blah) { ... blah blah blah ... }

* Thought that sucked? Try to mix in other state / logic, and then write it in C++.

*

.image oscon-dl/cpp-write.png

*

.image oscon-dl/cpp-writeerr.png

*

.image oscon-dl/cpp-toggle.png

* Or in JavaScript...

- [[https://github.com/nodejitsu/node-http-proxy/blob/master/lib/node-http-proxy/http-proxy.js]]
- Or Python gevent, Twisted, ...
- Or Perl AnyEvent, etc.
- Unreadable, discontiguous code.

* Copying N bytes from A to B in Go:

.code oscon-dl/copy.go /START OMIT/,/END OMIT/

- dst is an _io.Writer_ (an interface type)
- src is an _io.Reader_ (an interface type)
- synchronous (blocks)
- Go runtime deals with making blocking efficient
- goroutines, epoll, user-space scheduler, ...
- easier to reason about
- fewer, easier, compatible APIs
- concurrency is a _language_ (not _library_) feature

* Where to start?

- baby steps, not changing everything at once
- only port the `payload_server`, not the `payload_fetcher`
- read lots of old design docs
- read lots of C++ code
- port all command-line flags
- serve from local disk
- try to run integration tests
- while (fail) { debug, port, swear, ... }

* Notable stages

- pass integration tests
- run in a lightly-loaded datacenter
- audit mode
- ... mirror traffic to old & new servers; compare responses.
- drop all SWIG dependencies on C++ libraries
- ... use IP-to-geo lookup service, not static file + library

* Notable stages

- fetch blobs directly from blobstore, falling back to local disk on any errors,
- relying entirely on blobstore, but `payload_fetcher` still running
- disable `payload_fetcher` entirely; fast start-up time.

* Using Go's Standard Library

* Using Go's Standard Library

- dl.google.com mostly just uses the standard library

* Go's Standard Library

- net/http
- io
- [[/pkg/net/http/#ServeContent][http.ServeContent]]

* Hello World

.play oscon-dl/server-hello.go

* File Server

.play oscon-dl/server-fs.go

* http.ServeContent

.image oscon-dl/servecontent.png

* io.Reader, io.Seeker

.image oscon-dl/readseeker.png
.image oscon-dl/reader.png
.image oscon-dl/seeker.png

* http.ServeContent

$ curl -H "Range: bytes=5-" http://localhost:8080

.play oscon-dl/server-content.go

* groupcache

* groupcache

- memcached alternative / replacement
- [[http://github.com/golang/groupcache]]
- _library_ that is both a client & server
- connects to its peers
- coordinated cache filling (no thundering herds on miss)
- replication of hot items

* Using groupcache

Declare who you are and who your peers are.

.code oscon-dl/groupcache.go /STARTINIT/,/ENDINIT/

This peer interface is pluggable. (e.g. inside Google it's automatic.)

* Using groupcache

Declare a group. (group of keys, shared between group of peers)

.code oscon-dl/groupcache.go /STARTGROUP/,/ENDGROUP/

- group name "thumbnail" must be globally unique
- 64 MB max per-node memory usage
- Sink is an interface with SetString, SetBytes, SetProto

* Using groupcache

Request keys:

.code oscon-dl/groupcache.go /STARTUSE/,/ENDUSE/

- might come from local memory cache
- might come from peer's memory cache
- might be computed locally
- might be computed remotely
- of all threads on all machines, only one thumbnail is made, then fanned out in-process and across the network to all waiters

* dl.google.com and groupcache

- Keys are "<blobref>-<chunk_offset>"
- Chunks are 2MB
- Chunks cached in local memory (for self-owned and hot items),
- Chunks cached remotely, or
- Chunks fetched from Google storage systems
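Under that scheme, mapping a byte offset to its cache key is simple arithmetic. A hypothetical helper in the spirit of the slide (chunkKey is not from the talk):

```go
package main

import "fmt"

const chunkSize = 2 << 20 // 2 MB

// chunkKey rounds the byte offset down to a chunk boundary and
// joins it with the blobref, giving "<blobref>-<chunk_offset>".
func chunkKey(blobref string, off int64) string {
	return fmt.Sprintf("%s-%d", blobref, off-off%chunkSize)
}

func main() {
	// An offset inside the third 2MB chunk maps to that chunk's start.
	fmt.Println(chunkKey("sha1-deadbeef", 5<<20)) // sha1-deadbeef-4194304
}
```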

* dl.google.com interface composition

.code oscon-dl/sizereaderat.go /START_1/,/END_1/

* io.SectionReader

.image oscon-dl/sectionreader.png

* chunk-aligned ReaderAt

.code oscon-dl/chunkaligned.go /START_DOC/,/END_DOC/

- Caller can do ReadAt calls of any size and any offset
- `r` only sees ReadAt calls on 2MB offset boundaries, of size 2MB (unless final chunk)

* Composing all this

- http.ServeContent wants a ReadSeeker
- io.SectionReader (ReaderAt + size) -> ReadSeeker
- Download server payloads are a type "content" with Size and ReadAt, implemented with calls to groupcache.
- Wrapped in a chunk-aligned ReaderAt
- ... concatenate parts of it with MultiReaderAt

.play oscon-dl/server-compose.go /START/,/END/

* Things we get for free from net/http

- Last-Modified
- ETag
- Range requests (with all its paranoia)
- HTTP/1.1 chunking, etc.
- ... old server tried to do all this itself
- ... incorrectly
- ... incompletely
- ... in a dozen different copies

* Overall simplification

- deleted C++ payload_server & Python payload_fetcher
- 39 files (14,032 lines) deleted
- one binary now (just Go `payload_server`, no `payload_fetcher`)
- starts immediately, no huge start-up delay
- server is just "business logic" now, not HTTP logic

* From this...

.image oscon-dl/before.png

* ... to this.

.image oscon-dl/after.png

* And from pages and pages of this...

.image oscon-dl/cpp-writeerr.png

* ... to this

.image oscon-dl/after-code.png

* So how does it compare to C++?

- less than half the code
- more testable, tests
- same CPU usage for same bandwidth
- ... but can do much more bandwidth
- ... and use more than one CPU
- less memory (!)
- no disk
- starts up instantly (not in 24 hours)
- doesn't crash
- handles hot download spikes

* Could we have just rewritten it in new C++?

- Sure.
- But why?

* Could I have just fixed the bugs in the C++ version?

- Sure, if I could find them.
- Then I'd have to own it ("You touched it last...")
- And I already maintain an HTTP server library. Don't want to maintain a bad one too.
- The Go version is much more maintainable. (and 3+ other people now help maintain it)

* How much of dl.google.com is closed-source?

- Very little.
- ... ACL policies
- ... RPCs to Google storage services.
- Most is open source:
- ... code.google.com/p/google-api-go-client/storage/v1beta1
- ... net/http and rest of Go standard library
- ... `groupcache`, now open source ([[https://github.com/golang/groupcache][github.com/golang/groupcache]])

