#_This presentation was the closing keynote of the Heroku Waza conference in January, 2012.
#_It has been slightly modified here for clarity and for use in the "present" format; the original
#_used a precursor to that tool.

Concurrency is not Parallelism
Waza Jan 11, 2012

Rob Pike
r@golang.org

* Video

This talk was presented at Heroku's Waza conference in January 2012.

.link http://vimeo.com/49718712 Watch the talk on Vimeo

* The modern world is parallel

Multicore.
Networks.
Clouds of CPUs.
Loads of users.

Our technology should help.
That's where concurrency comes in.

* Go supports concurrency

Go provides:

- concurrent execution (goroutines)
- synchronization and messaging (channels)
- multi-way concurrent control (select)

* Concurrency is cool! Yay parallelism!!

NO! A fallacy.

When Go was announced, many were confused by the distinction.

"I ran the prime sieve with 4 processors and it got slower!"

* Concurrency

Programming as the composition of independently executing processes.

(Processes in the general sense, not Linux processes. Famously hard to define.)

* Parallelism

Programming as the simultaneous execution of (possibly related) computations.

* Concurrency vs. parallelism

Concurrency is about dealing with lots of things at once.

Parallelism is about doing lots of things at once.

Not the same, but related.

Concurrency is about structure, parallelism is about execution.

Concurrency provides a way to structure a solution to solve a problem that may (but not necessarily) be parallelizable.

* An analogy

Concurrent: Mouse, keyboard, display, and disk drivers.

Parallel: Vector dot product.

* Concurrency plus communication

Concurrency is a way to structure a program by breaking it into pieces that can be executed independently.

Communication is the means to coordinate the independent executions.

This is the Go model and (like Erlang and others) it's based on CSP:

C. A. R. Hoare: Communicating Sequential Processes (CACM 1978)

* Gophers

This is too abstract. Let's get concrete.

* Our problem

Move a pile of obsolete language manuals to the incinerator.

.image waza/gophersimple1.jpg

With only one gopher this will take too long.

* More gophers!

.image waza/gophersimple3.jpg

More gophers are not enough; they need more carts.

* More gophers and more carts

.image waza/gophersimple2.jpg

This will go faster, but there will be bottlenecks at the pile and incinerator.

Also need to synchronize the gophers.

A message (that is, a communication between the gophers) will do.

* Double everything

Remove the bottleneck; make them really independent.

.image waza/gophersimple4.jpg

This will consume input twice as fast.

* Concurrent composition

.image waza/gophersimple4.jpg

The concurrent composition of two gopher procedures.

* Concurrent composition

This design is not automatically parallel!

What if only one gopher is moving at a time?
Then it's still concurrent (that's in the design), just not parallel.

However, it's automatically parallelizable!

Moreover the concurrent composition suggests other models.

* Another design

.image waza/gophercomplex0.jpg

Three gophers in action, but with likely delays.
Each gopher is an independently executing procedure, plus coordination (communication).

* Finer-grained concurrency

Add another gopher procedure to return the empty carts.

.image waza/gophercomplex1.jpg

Four gophers in action for better flow, each doing one simple task.

If we arrange everything right (implausible but not impossible), that's four times faster than our original one-gopher design.
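
In Go terms (syntax we will come to shortly), this four-procedure design is just four independently executing functions joined by channels. The sketch below is only an illustration, with invented names and types; it is not code from the talk's examples.

    // Illustrative sketch only; the names here are invented.
    type Cart struct{ books int }

    // run wires the four gopher procedures together: pile delivers books,
    // incinerator consumes them, and two carts circulate on the channels.
    func run(pile <-chan int, incinerator chan<- int) {
        full := make(chan Cart)       // loaded carts heading out
        arrived := make(chan Cart)    // carts at the incinerator
        unloaded := make(chan Cart)   // emptied carts ready to go back
        empties := make(chan Cart, 2) // the carts in circulation
        empties <- Cart{}
        empties <- Cart{}

        go func() { // load books onto an empty cart
            for {
                c := <-empties
                c.books = <-pile
                full <- c
            }
        }()
        go func() { // move the cart to the incinerator
            for c := range full {
                arrived <- c
            }
        }()
        go func() { // unload the cart into the incinerator
            for c := range arrived {
                incinerator <- c.books
                c.books = 0
                unloaded <- c
            }
        }()
        go func() { // return the empty cart to the pile
            for c := range unloaded {
                empties <- c
            }
        }()
    }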
* Observation

We improved performance by adding a concurrent procedure to the existing design.

More gophers doing more work; it runs better.

This is a deeper insight than mere parallelism.

* Concurrent procedures

Four distinct gopher procedures:

- load books onto cart
- move cart to incinerator
- unload cart into incinerator
- return empty cart

Different concurrent designs enable different ways to parallelize.

* More parallelization!

We can now parallelize on the other axis; the concurrent design makes it easy. Eight gophers, all busy.

.image waza/gophercomplex2.jpg

* Or maybe no parallelization at all

Keep in mind, even if only one gopher is active at a time (zero parallelism), it's still a correct and concurrent solution.

.image waza/gophercomplex2.jpg

* Another design

Here's another way to structure the problem as the concurrent composition of gopher procedures.

Two gopher procedures, plus a staging pile.

.image waza/gophercomplex3.jpg

* Parallelize the usual way

Run more concurrent procedures to get more throughput.

.image waza/gophercomplex4.jpg

* Or a different way

Bring the staging pile to the multi-gopher concurrent model:

.image waza/gophercomplex5.jpg

* Full on optimization

Use all our techniques. Sixteen gophers hard at work!

.image waza/gophercomplex6.jpg

* Lesson

There are many ways to break the processing down.

That's concurrent design.

Once we have the breakdown, parallelization can fall out and correctness is easy.

* Back to Computing

In our book transport problem, substitute:

- book pile => web content
- gopher => CPU
- cart => marshaling, rendering, or networking
- incinerator => proxy, browser, or other consumer

It becomes a concurrent design for a scalable web service.
Gophers serving web content.

* A little background about Go

Not the place for a tutorial, just quick highlights.

* Goroutines

A goroutine is a function running independently in the same address space as other goroutines.

.code waza/snippets /f.runs/
.code waza/snippets /f.starts.running/,/return/

Like launching a function with shell's `&` notation.

* Goroutines are not threads

(They're a bit like threads, but they're much cheaper.)

Goroutines are multiplexed onto OS threads as required.

When a goroutine blocks, that thread blocks but no other goroutine blocks.

* Channels

Channels are typed values that allow goroutines to synchronize and exchange information.

.code waza/snippets /make.*chan/,/completedAt/

* Select

The `select` statement is like a `switch`, but the decision is based on the ability to communicate rather than on equal values.

.code waza/snippets /select/,/}/

* Go really supports concurrency

Really.

It's routine to create thousands of goroutines in one program.
(Once debugged a program after it had created 1.3 million.)

Stacks start small, but grow and shrink as required.

Goroutines aren't free, but they're very cheap.

* Closures are also part of the story

They make some concurrent calculations easier to express, and they are just local functions.

Here's a non-concurrent example:

.code waza/snippets /Compose/,/sin,/

* Some examples

Learn concurrent Go by osmosis.

* Launching daemons

Use a closure to wrap a background operation.

This copies items from the input channel to the output channel:

.code waza/snippets /copy.input/,/^}/

The `for` `range` operation runs until the channel is drained.

* A simple load balancer (1)

A unit of work:

.code waza/load1 /type/,/^}/

* A simple load balancer (2)

A worker task:

.code waza/load1 /worker/,/^}/

Must make sure other workers can run when one blocks.
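
As a rough idea of the shape that worker takes, here is a short sketch (it assumes the time package is imported); the `Work` fields and the multiply-then-sleep body are assumptions made for illustration, not necessarily what waza/load1 contains.

    // Hypothetical sketch; assumes: import "time"
    type Work struct {
        x, y, z int
    }

    // worker receives Work on in, computes a result, simulates effort by
    // sleeping, and sends the finished Work on out. The Sleep blocks only
    // this goroutine; the other workers keep running.
    func worker(in <-chan *Work, out chan<- *Work) {
        for w := range in {
            w.z = w.x * w.y
            time.Sleep(time.Duration(w.z) * time.Millisecond)
            out <- w
        }
    }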
* A simple load balancer (3)

The runner:

.code waza/load1 /Run/,/^}/

An easy problem, but hard to solve concisely without concurrency.

* Concurrency enables parallelism

The load balancer is implicitly parallel and scalable.

`NumWorkers` could be huge.

The tools of concurrency make it almost trivial to build a safe, working, scalable, parallel design.

* Concurrency simplifies synchronization

No explicit synchronization is needed.

The structure of the program is implicitly synchronized.

* That was too easy

Let's do a more realistic load balancer.

* Load balancer

.image waza/gopherchart.jpg

* Request definition

The requester sends Requests to the balancer.

.code waza/load2 /^type.Request/,/^}/

Note the return channel inside the request.
Channels are first-class values.

* Requester function

An artificial but illustrative simulation of a requester, a load generator.

.code waza/load2 /^func.requester/,/^}/

* Worker definition

A channel of requests, plus some load-tracking data.

.code waza/load2 /type.Worker/,/^}/

* Worker

The balancer sends each request to the most lightly loaded worker.

.code waza/load2 /^func.*work.*done/,/^}/

The channel of requests (`w.requests`) delivers requests to each worker.

The balancer tracks the number of pending requests as a measure of load.

Each response goes directly to its requester.

Could run the loop body as a goroutine for parallelism.

* Balancer definition

The load balancer needs a pool of workers and a single channel to which requesters can report task completion.

.code waza/load2 /type.Pool/,/^}/

* Balancer function

Easy!

.code waza/load2 /func.*balance/,/^}/

Just need to implement dispatch and completed.

* A heap of channels

Make Pool an implementation of the `Heap` interface by providing a few methods such as:

.code waza/load2 /func.*Less/,/^}/

Now we balance by making the `Pool` a heap tracked by load.

* Dispatch

All the pieces are in place.

.code waza/load2 /Send.Request/,/^}/

* Completed

.code waza/load2 /Job.is.complete/,/^}/

* Lesson

A complex problem can be broken down into easy-to-understand components.

The pieces can be composed concurrently.

The result is easy to understand, efficient, scalable, and correct.

Maybe even parallel.

* One more example

We have a replicated database and want to minimize latency by querying all the replicas and returning the first response to arrive.

* Query a replicated database

.code waza/snippets /func.Query/,/^}/

Concurrent tools and garbage collection make this an easy solution to a subtle problem.

(Teardown of late finishers is left as an exercise.)

* Conclusion

Concurrency is powerful.

Concurrency is not parallelism.

Concurrency enables parallelism.

Concurrency makes parallelism (and scaling and everything else) easy.

* For more information

Go: golang.org

Some history: swtch.com/~rsc/thread/

A previous talk (video): tinyurl.com/newsqueak1

Parallelism is not concurrency (Harper): tinyurl.com/pincharper

A concurrent window system (Pike): tinyurl.com/pikecws

Concurrent power series (McIlroy): tinyurl.com/powser

And finally, parallel but not concurrent:
research.google.com/archive/sawzall.html
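
* Appendix: a sketch of the replicated query

For reference, here is one way the "first response wins" pattern above might look in full. The `Conn` interface, its `DoQuery` method, and the `Result` type are illustrative assumptions, not necessarily the definitions in waza/snippets; buffering the channel to hold every reply also sidesteps the teardown exercise mentioned above.

    // Hypothetical sketch; the names are invented for this appendix.
    type Result string

    type Conn interface {
        DoQuery(query string) Result
    }

    // Query asks every replica concurrently and returns whichever answer
    // arrives first. The buffer is large enough for every reply, so the
    // slower goroutines can finish their sends and exit; the channel and
    // unused results are then garbage collected.
    func Query(conns []Conn, query string) Result {
        ch := make(chan Result, len(conns))
        for _, conn := range conns {
            go func(c Conn) {
                ch <- c.DoQuery(query)
            }(conn)
        }
        return <-ch
    }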