dl.google.com: Powered by Go 10:00 26 Jul 2013 Tags: download, oscon, port, c++, google, groupcache, caching Brad Fitzpatrick Gopher, Google @bradfitz bradfitz@golang.org http://bradfitz.com/ https://go.dev/ https://github.com/golang/groupcache/ * Overview / tl;dw: - dl.google.com serves Google downloads - Was written in C++ - Now in Go - Now much better - Extensive, idiomatic use of Go's standard library - ... which is all open source - composition of interfaces is fun - _groupcache_, now Open Source, handles group-aware caching and cache-filling * too long... * me - Brad Fitzpatrick - bradfitz.com - @bradfitz - past: LiveJournal, memcached, OpenID, Perl stuff... - nowadays: Go, Go, Camlistore, Go, anything & everything written in Go ... * I love Go - this isn't a talk about Go, sorry. - but check it out. - simple, powerful, fast, liberating, refreshing - great mix of low- and high- level - light on the page - static binaries, easy to deploy - not perfect, but my favorite language yet * dl.google.com * dl.google.com - HTTP download server - serves Chrome, Android SDK, Earth, much more - Some huge, some tiny (e.g. WebGL white/blacklist JSON) - behind an edge cache; still high traffic - lots of datacenters, lots of bandwidth * Why port? * reason 0 $ apt-get update .image oscon-dl/slow.png - embarrassing - Google can't serve a 1,238 byte file? - Hanging? - 207 B/s?! * Yeah, embarrassing, for years... .image oscon-dl/crbug.png * ... which led to: - complaining on corp G+. Me: "We suck. This sucks." - primary SRE owning it: "Yup, it sucks. And is unmaintained." - "I'll rewrite it for you!" - "Hah." - "No, serious. That's kinda our job. But I get to do it in Go." - (Go team's loan-out-a-Gopher program...) * How hard can this be? * dl.google.com: few tricks each "payload" (~URL) described by a protobuf: - paths/patterns for its URL(s) - go-live reveal date - ACLs (geo, network, user, user type, ...) - dynamic zip files - custom HTTP headers - custom caching * dl.google.com: how it was .image oscon-dl/before.png * Aside: Why good code goes bad * Why good code goes bad - Premise: people don't suck - Premise: code was once beautiful - code tends towards complexity (gets worse) - environment changes - scale changes * code complexity - without regular love, code grows warts over time - localized fixes and additions are easy & quick, but globally crappy - features, hacks and workarounds added without docs or tests - maintainers come & go, - ... or just go. * changing environment - Google's infrastructure (hardware & software), like anybody's, is always changing - properties of networks, storage - design assumptions no longer make sense - scale changes (design for 10x growth, rethink at 100x) - new internal services (beta or non-existent then, dependable now) - once-modern home-grown invented wheels might now look archaic * so why did it suck? .image oscon-dl/slow.png - stalling its single-threaded event loop, blocking when it shouldn't - maxed out at one CPU, but couldn't even use a fraction of a single CPU. * but why? - code was too complicated - future maintainers slowly violated unwritten rules - or knowingly violated them, assuming it couldn't be too bad? - C++ single-threaded event-based callback spaghetti - hard to know when/where code was running, or what "blocking" meant * Old code - served from local disk - single-threaded event loop - used sendfile(2) "for performance" - tried to be clever and steal the fd from the "SelectServer" sometimes to manually call sendfile - while also trying to do HTTP chunking, - ... and HTTP range requests, - ... and dynamic zip files, - lots of duplicated copy/paste code paths - many wrong/incomplete in different ways * Mitigation solution? - more complexity! - ad hoc addition of more threads - ... not really defined which threads did what, - ... or what the ownership or locking rules were, - no surprise: random crashes * Summary of 5-year old code in 2012 - incomplete docs, tests - stalling event loop - ad-hoc threads... - ... stalling event loops - ... races - ... crashes - copy/paste code - ... incomplete code - two processes in the container - ... different languages * Environment changes - Remember: on start, we had to copy all payloads to local disk - in 2007, using local disk wasn't restricted - in 2007, sum(payload size) was much smaller - in 2012, containers get tiny % of local disk spindle time - ... why aren't you using the cluster file systems like everybody else? - ... cluster file systems own disk time on your machine, not you. - in 2007, it started up quickly. - in 2012, it started in 12-24 hours (!!!) - ... hope we don't crash! (oh, whoops) * Copying N bytes from A to B in event loop environments (node.js, this C++, etc) - Can *A* read? - Read up to _n_ bytes from A. - What'd we get? _rn_ - _n_ -= _rn_ - Store those. - Note we want to want to write to *B* now. - Can *B* write? - Try to write _rn_ bytes to *B*. Got _wn_. - buffered -= _wn_ - while (blah blah blah) { ... blah blah blah ... } * Thought that sucked? Try to mix in other state / logic, and then write it in C++. * .image oscon-dl/cpp-write.png * .image oscon-dl/cpp-writeerr.png * .image oscon-dl/cpp-toggle.png * Or in JavaScript... - [[https://github.com/nodejitsu/node-http-proxy/blob/master/lib/node-http-proxy/http-proxy.js]] - Or Python gevent, Twisted, ... - Or Perl AnyEvent, etc. - Unreadable, discontiguous code. * Copying N bytes from A to B in Go: .code oscon-dl/copy.go /START OMIT/,/END OMIT/ - dst is an _io.Writer_ (an interface type) - src is an _io.Reader_ (an interface type) - synchronous (blocks) - Go runtime deals with making blocking efficient - goroutines, epoll, user-space scheduler, ... - easier to reason about - fewer, easier, compatible APIs - concurrency is a _language_ (not _library_) feature * Where to start? - baby steps, not changing everything at once - only port the `payload_server`, not the `payload_fetcher` - read lots of old design docs - read lots of C++ code - port all command-line flags - serve from local disk - try to run integration tests - while (fail) { debug, port, swear, ...} * Notable stages - pass integration tests - run in a lightly-loaded datacenter - audit mode - ... mirror traffic to old & new servers; compare responses. - drop all SWIG dependencies on C++ libraries - ... use IP-to-geo lookup service, not static file + library * Notable stages - fetch blobs directly from blobstore, falling back to local disk on any errors, - relying entirely on blobstore, but `payload_fetcher` still running - disable `payload_fetcher` entirely; fast start-up time. * Using Go's Standard Library * Using Go's Standard Library - dl.google.com mostly just uses the standard library * Go's Standard Library - net/http - io - [[/pkg/net/http/#ServeContent][http.ServeContent]] * Hello World .play oscon-dl/server-hello.go * File Server .play oscon-dl/server-fs.go * http.ServeContent .image oscon-dl/servecontent.png * io.Reader, io.Seeker .image oscon-dl/readseeker.png .image oscon-dl/reader.png .image oscon-dl/seeker.png * http.ServeContent $ curl -H "Range: bytes=5-" http://localhost:8080 .play oscon-dl/server-content.go * groupcache * groupcache - memcached alternative / replacement - [[http://github.com/golang/groupcache]] - _library_ that is both a client & server - connects to its peers - coordinated cache filling (no thundering herds on miss) - replication of hot items * Using groupcache Declare who you are and who your peers are. .code oscon-dl/groupcache.go /STARTINIT/,/ENDINIT/ This peer interface is pluggable. (e.g. inside Google it's automatic.) * Using groupcache Declare a group. (group of keys, shared between group of peers) .code oscon-dl/groupcache.go /STARTGROUP/,/ENDGROUP/ - group name "thumbnail" must be globally unique - 64 MB max per-node memory usage - Sink is an interface with SetString, SetBytes, SetProto * Using groupcache Request keys .code oscon-dl/groupcache.go /STARTUSE/,/ENDUSE/ - might come from local memory cache - might come from peer's memory cache - might be computed locally - might be computed remotely - of all threads on all machines, only one thumbnail is made, then fanned out in-process and across-network to all waiters * dl.google.com and groupcache - Keys are "-" - Chunks are 2MB - Chunks cached from local memory (for self-owned and hot items), - Chunks cached remotely, or - Chunks fetched from Google storage systems * dl.google.com interface composition .code oscon-dl/sizereaderat.go /START_1/,/END_1/ * io.SectionReader .image oscon-dl/sectionreader.png * chunk-aligned ReaderAt .code oscon-dl/chunkaligned.go /START_DOC/,/END_DOC/ - Caller can do ReadAt calls of any size and any offset - `r` only sees ReadAt calls on 2MB offset boundaries, of size 2MB (unless final chunk) * Composing all this - http.ServeContent wants a ReadSeeker - io.SectionReader(ReaderAt + size) -> ReadSeeker - Download server payloads are a type "content" with Size and ReadAt, implemented with calls to groupcache. - Wrapped in a chunk-aligned ReaderAt - ... concatenate parts of with MultiReaderAt .play oscon-dl/server-compose.go /START/,/END/ * Things we get for free from net/http - Last-Modified - ETag - Range requests (w/ its paranoia) - HTTP/1.1 chunking, etc. - ... old server tried to do all this itself - ... incorrectly - ... incompletely - ... in a dozen different copies * Overall simplification - deleted C++ payload_server & Python payload_fetcher - 39 files (14,032 lines) deleted - one binary now (just Go `payload_server`, no `payload_fetcher`) - starts immediately, no huge start-up delay - server is just "business logic" now, not HTTP logic * From this... .image oscon-dl/before.png * ... to this. .image oscon-dl/after.png * And from page and pages of this... .image oscon-dl/cpp-writeerr.png * ... to this .image oscon-dl/after-code.png * So how does it compare to C++? - less than half the code - more testable, tests - same CPU usage for same bandwidth - ... but can do much more bandwidth - ... and more than one CPU - less memory (!) - no disk - starts up instantly (not 24 hours) - doesn't crash - handles hot download spikes * Could we have just rewritten it in new C++? - Sure. - But why? * Could I have just fixed the bugs in the C++ version? - Sure, if I could find them. - Then have to own it ("You touched it last...") - And I already maintain an HTTP server library. Don't want to maintain a bad one too. - It's much more maintainable. (and 3+ other people now do) * How much of dl.google.com is closed-source? - Very little. - ... ACL policies - ... RPCs to Google storage services. - Most is open source: - ... code.google.com/p/google-api-go-client/storage/v1beta1 - ... net/http and rest of Go standard library - ... `groupcache`, now open source ([[https://github.com/golang/groupcache][github.com/golang/groupcache]])