Source file src/runtime/mgc.go
// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

// Garbage collector (GC).
//
// The GC runs concurrently with mutator threads, is type accurate (aka precise), and allows multiple
// GC threads to run in parallel. It is a concurrent mark and sweep that uses a write barrier. It is
// non-generational and non-compacting. Allocation is done using size segregated per P allocation
// areas to minimize fragmentation while eliminating locks in the common case.
//
// The algorithm decomposes into several steps.
// This is a high level description of the algorithm being used. For an overview of GC a good
// place to start is Richard Jones' gchandbook.org.
//
// The algorithm's intellectual heritage includes Dijkstra's on-the-fly algorithm, see
// Edsger W. Dijkstra, Leslie Lamport, A. J. Martin, C. S. Scholten, and E. F. M. Steffens. 1978.
// On-the-fly garbage collection: an exercise in cooperation. Commun. ACM 21, 11 (November 1978),
// 966-975.
// For journal quality proofs that these steps are complete, correct, and terminate see
// Hudson, R., and Moss, J.E.B. Copying Garbage Collection without stopping the world.
// Concurrency and Computation: Practice and Experience 15(3-5), 2003.
//
// 1. GC performs sweep termination.
//
//    a. Stop the world. This causes all Ps to reach a GC safe-point.
//
//    b. Sweep any unswept spans. There will only be unswept spans if
//    this GC cycle was forced before the expected time.
//
// 2. GC performs the mark phase.
//
//    a. Prepare for the mark phase by setting gcphase to _GCmark
//    (from _GCoff), enabling the write barrier, enabling mutator
//    assists, and enqueueing root mark jobs. No objects may be
//    scanned until all Ps have enabled the write barrier, which is
//    accomplished using STW.
//
//    b. Start the world. From this point, GC work is done by mark
//    workers started by the scheduler and by assists performed as
//    part of allocation. The write barrier shades both the
//    overwritten pointer and the new pointer value for any pointer
//    writes (see mbarrier.go for details). Newly allocated objects
//    are immediately marked black.
//
//    c. GC performs root marking jobs. This includes scanning all
//    stacks, shading all globals, and shading any heap pointers in
//    off-heap runtime data structures. Scanning a stack stops a
//    goroutine, shades any pointers found on its stack, and then
//    resumes the goroutine.
//
//    d. GC drains the work queue of grey objects, scanning each grey
//    object to black and shading all pointers found in the object
//    (which in turn may add those pointers to the work queue).
//
//    e. Because GC work is spread across local caches, GC uses a
//    distributed termination algorithm to detect when there are no
//    more root marking jobs or grey objects (see gcMarkDone). At this
//    point, GC transitions to mark termination.
//
// 3. GC performs mark termination.
//
//    a. Stop the world.
//
//    b. Set gcphase to _GCmarktermination, and disable workers and
//    assists.
//
//    c. Perform housekeeping like flushing mcaches.
//
// 4. GC performs the sweep phase.
//
//    a. Prepare for the sweep phase by setting gcphase to _GCoff,
//    setting up sweep state and disabling the write barrier.
//
//    b. Start the world. From this point on, newly allocated objects
//    are white, and allocating sweeps spans before use if necessary.
//
//    c. GC does concurrent sweeping in the background and in response
//    to allocation. See description below.
//
// 5. When sufficient allocation has taken place, replay the sequence
// starting with 1 above. See discussion of GC rate below.

// Concurrent sweep.
//
// The sweep phase proceeds concurrently with normal program execution.
// The heap is swept span-by-span both lazily (when a goroutine needs another span)
// and concurrently in a background goroutine (this helps programs that are not CPU bound).
// At the end of STW mark termination all spans are marked as "needs sweeping".
//
// The background sweeper goroutine simply sweeps spans one-by-one.
//
// To avoid requesting more OS memory while there are unswept spans, when a
// goroutine needs another span, it first attempts to reclaim that much memory
// by sweeping. When a goroutine needs to allocate a new small-object span, it
// sweeps small-object spans for the same object size until it frees at least
// one object. When a goroutine needs to allocate a large-object span from the heap,
// it sweeps spans until it frees at least that many pages into the heap. There is
// one case where this may not suffice: if a goroutine sweeps and frees two
// nonadjacent one-page spans to the heap, it will allocate a new two-page
// span, but there can still be other one-page unswept spans which could be
// combined into a two-page span.
//
// It's critical to ensure that no operations proceed on unswept spans (that would corrupt
// mark bits in the GC bitmap). During GC all mcaches are flushed into the central cache,
// so they are empty. When a goroutine grabs a new span into mcache, it sweeps it.
// When a goroutine explicitly frees an object or sets a finalizer, it ensures that
// the span is swept (either by sweeping it, or by waiting for the concurrent sweep to finish).
// The finalizer goroutine is kicked off only when all spans are swept.
// When the next GC starts, it sweeps all not-yet-swept spans (if any).

// GC rate.
// Next GC is after we've allocated an extra amount of memory proportional to
// the amount already in use. The proportion is controlled by the GOGC environment variable
// (100 by default). If GOGC=100 and we're using 4M, we'll GC again when we get to 8M
// (this mark is computed by the gcController.heapGoal method). This keeps the GC cost in
// linear proportion to the allocation cost. Adjusting GOGC just changes the linear constant
// (and also the amount of extra memory used).

// Oblets
//
// In order to prevent long pauses while scanning large objects and to
// improve parallelism, the garbage collector breaks up scan jobs for
// objects larger than maxObletBytes into "oblets" of at most
// maxObletBytes. When scanning encounters the beginning of a large
// object, it scans only the first oblet and enqueues the remaining
// oblets as new scan jobs.

package runtime

import (
	"internal/cpu"
	"internal/runtime/atomic"
	"unsafe"
)

const (
	_DebugGC      = 0
	_FinBlockSize = 4 * 1024

	// concurrentSweep is a debug flag. Disabling this flag
	// ensures all spans are swept while the world is stopped.
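	//
	// When concurrentSweep is false, gcSweep sweeps the entire heap
	// synchronously during the mark termination STW instead of handing
	// spans to the background sweeper (see gcSweep below).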
	concurrentSweep = true

	// debugScanConservative enables debug logging for stack
	// frames that are scanned conservatively.
	debugScanConservative = false

	// sweepMinHeapDistance is a lower bound on the heap distance
	// (in bytes) reserved for concurrent sweeping between GC
	// cycles.
	sweepMinHeapDistance = 1024 * 1024
)

// heapObjectsCanMove always returns false in the current garbage collector.
// It exists for go4.org/unsafe/assume-no-moving-gc, which is an
// unfortunate idea that had an even more unfortunate implementation.
// Every time a new Go release happened, the package stopped building,
// and the authors had to add a new file with a new //go:build line, and
// then the entire ecosystem of packages with that as a dependency had to
// explicitly update to the new version. Many packages depend on
// assume-no-moving-gc transitively, through paths like
// inet.af/netaddr -> go4.org/intern -> assume-no-moving-gc.
// This was causing a significant amount of friction around each new
// release, so we added this bool for the package to //go:linkname
// instead. The bool is still unfortunate, but it's not as bad as
// breaking the ecosystem on every new release.
//
// If the Go garbage collector ever does move heap objects, we can set
// this to true to break all the programs using assume-no-moving-gc.
//
//go:linkname heapObjectsCanMove
func heapObjectsCanMove() bool {
	return false
}

func gcinit() {
	if unsafe.Sizeof(workbuf{}) != _WorkbufSize {
		throw("size of Workbuf is suboptimal")
	}
	// No sweep on the first cycle.
	sweep.active.state.Store(sweepDrainedMask)

	// Initialize GC pacer state.
	// Use the environment variable GOGC for the initial gcPercent value.
	// Use the environment variable GOMEMLIMIT for the initial memoryLimit value.
	gcController.init(readGOGC(), readGOMEMLIMIT())

	work.startSema = 1
	work.markDoneSema = 1
	lockInit(&work.sweepWaiters.lock, lockRankSweepWaiters)
	lockInit(&work.assistQueue.lock, lockRankAssistQueue)
	lockInit(&work.wbufSpans.lock, lockRankWbufSpans)
}

// gcenable is called after the bulk of the runtime initialization,
// just before we're about to start letting user code run.
// It kicks off the background sweeper goroutine, the background
// scavenger goroutine, and enables GC.
func gcenable() {
	// Kick off sweeping and scavenging.
	c := make(chan int, 2)
	go bgsweep(c)
	go bgscavenge(c)
	<-c
	<-c
	memstats.enablegc = true // now that runtime is initialized, GC is okay
}

// gcphase is the current garbage collector phase.
// It indicates to the write barrier and to synchronization code
// which tasks to perform.
var gcphase uint32

// The compiler knows about this variable.
// If you change it, you must change builtin/runtime.go, too.
// If you change the first four bytes, you must also change the write
// barrier insertion code.
var writeBarrier struct {
	enabled bool    // compiler emits a check of this before calling write barrier
	pad     [3]byte // compiler uses 32-bit load for "enabled" field
	alignme uint64  // guarantee alignment so that compiler can use a 32 or 64-bit load
}

// gcBlackenEnabled is 1 if mutator assists and background mark
// workers are allowed to blacken objects. This must only be set when
// gcphase == _GCmark.
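// It is set in gcStart and cleared in gcMarkDone, in both cases while
// the world is stopped.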
227 var gcBlackenEnabled uint32 228 229 const ( 230 _GCoff = iota // GC not running; sweeping in background, write barrier disabled 231 _GCmark // GC marking roots and workbufs: allocate black, write barrier ENABLED 232 _GCmarktermination // GC mark termination: allocate black, P's help GC, write barrier ENABLED 233 ) 234 235 //go:nosplit 236 func setGCPhase(x uint32) { 237 atomic.Store(&gcphase, x) 238 writeBarrier.enabled = gcphase == _GCmark || gcphase == _GCmarktermination 239 } 240 241 // gcMarkWorkerMode represents the mode that a concurrent mark worker 242 // should operate in. 243 // 244 // Concurrent marking happens through four different mechanisms. One 245 // is mutator assists, which happen in response to allocations and are 246 // not scheduled. The other three are variations in the per-P mark 247 // workers and are distinguished by gcMarkWorkerMode. 248 type gcMarkWorkerMode int 249 250 const ( 251 // gcMarkWorkerNotWorker indicates that the next scheduled G is not 252 // starting work and the mode should be ignored. 253 gcMarkWorkerNotWorker gcMarkWorkerMode = iota 254 255 // gcMarkWorkerDedicatedMode indicates that the P of a mark 256 // worker is dedicated to running that mark worker. The mark 257 // worker should run without preemption. 258 gcMarkWorkerDedicatedMode 259 260 // gcMarkWorkerFractionalMode indicates that a P is currently 261 // running the "fractional" mark worker. The fractional worker 262 // is necessary when GOMAXPROCS*gcBackgroundUtilization is not 263 // an integer and using only dedicated workers would result in 264 // utilization too far from the target of gcBackgroundUtilization. 265 // The fractional worker should run until it is preempted and 266 // will be scheduled to pick up the fractional part of 267 // GOMAXPROCS*gcBackgroundUtilization. 268 gcMarkWorkerFractionalMode 269 270 // gcMarkWorkerIdleMode indicates that a P is running the mark 271 // worker because it has nothing else to do. The idle worker 272 // should run until it is preempted and account its time 273 // against gcController.idleMarkTime. 274 gcMarkWorkerIdleMode 275 ) 276 277 // gcMarkWorkerModeStrings are the strings labels of gcMarkWorkerModes 278 // to use in execution traces. 279 var gcMarkWorkerModeStrings = [...]string{ 280 "Not worker", 281 "GC (dedicated)", 282 "GC (fractional)", 283 "GC (idle)", 284 } 285 286 // pollFractionalWorkerExit reports whether a fractional mark worker 287 // should self-preempt. It assumes it is called from the fractional 288 // worker. 289 func pollFractionalWorkerExit() bool { 290 // This should be kept in sync with the fractional worker 291 // scheduler logic in findRunnableGCWorker. 292 now := nanotime() 293 delta := now - gcController.markStartTime 294 if delta <= 0 { 295 return true 296 } 297 p := getg().m.p.ptr() 298 selfTime := p.gcFractionalMarkTime + (now - p.gcMarkWorkerStartTime) 299 // Add some slack to the utilization goal so that the 300 // fractional worker isn't behind again the instant it exits. 
301 return float64(selfTime)/float64(delta) > 1.2*gcController.fractionalUtilizationGoal 302 } 303 304 var work workType 305 306 type workType struct { 307 full lfstack // lock-free list of full blocks workbuf 308 _ cpu.CacheLinePad // prevents false-sharing between full and empty 309 empty lfstack // lock-free list of empty blocks workbuf 310 _ cpu.CacheLinePad // prevents false-sharing between empty and nproc/nwait 311 312 wbufSpans struct { 313 lock mutex 314 // free is a list of spans dedicated to workbufs, but 315 // that don't currently contain any workbufs. 316 free mSpanList 317 // busy is a list of all spans containing workbufs on 318 // one of the workbuf lists. 319 busy mSpanList 320 } 321 322 // Restore 64-bit alignment on 32-bit. 323 _ uint32 324 325 // bytesMarked is the number of bytes marked this cycle. This 326 // includes bytes blackened in scanned objects, noscan objects 327 // that go straight to black, and permagrey objects scanned by 328 // markroot during the concurrent scan phase. This is updated 329 // atomically during the cycle. Updates may be batched 330 // arbitrarily, since the value is only read at the end of the 331 // cycle. 332 // 333 // Because of benign races during marking, this number may not 334 // be the exact number of marked bytes, but it should be very 335 // close. 336 // 337 // Put this field here because it needs 64-bit atomic access 338 // (and thus 8-byte alignment even on 32-bit architectures). 339 bytesMarked uint64 340 341 markrootNext uint32 // next markroot job 342 markrootJobs uint32 // number of markroot jobs 343 344 nproc uint32 345 tstart int64 346 nwait uint32 347 348 // Number of roots of various root types. Set by gcMarkRootPrepare. 349 // 350 // nStackRoots == len(stackRoots), but we have nStackRoots for 351 // consistency. 352 nDataRoots, nBSSRoots, nSpanRoots, nStackRoots int 353 354 // Base indexes of each root type. Set by gcMarkRootPrepare. 355 baseData, baseBSS, baseSpans, baseStacks, baseEnd uint32 356 357 // stackRoots is a snapshot of all of the Gs that existed 358 // before the beginning of concurrent marking. The backing 359 // store of this must not be modified because it might be 360 // shared with allgs. 361 stackRoots []*g 362 363 // Each type of GC state transition is protected by a lock. 364 // Since multiple threads can simultaneously detect the state 365 // transition condition, any thread that detects a transition 366 // condition must acquire the appropriate transition lock, 367 // re-check the transition condition and return if it no 368 // longer holds or perform the transition if it does. 369 // Likewise, any transition must invalidate the transition 370 // condition before releasing the lock. This ensures that each 371 // transition is performed by exactly one thread and threads 372 // that need the transition to happen block until it has 373 // happened. 374 // 375 // startSema protects the transition from "off" to mark or 376 // mark termination. 377 startSema uint32 378 // markDoneSema protects transitions from mark to mark termination. 379 markDoneSema uint32 380 381 bgMarkDone uint32 // cas to 1 when at a background mark completion point 382 // Background mark completion signaling 383 384 // mode is the concurrency mode of the current GC cycle. 385 mode gcMode 386 387 // userForced indicates the current GC cycle was forced by an 388 // explicit user call. 389 userForced bool 390 391 // initialHeapLive is the value of gcController.heapLive at the 392 // beginning of this GC cycle. 
393 initialHeapLive uint64 394 395 // assistQueue is a queue of assists that are blocked because 396 // there was neither enough credit to steal or enough work to 397 // do. 398 assistQueue struct { 399 lock mutex 400 q gQueue 401 } 402 403 // sweepWaiters is a list of blocked goroutines to wake when 404 // we transition from mark termination to sweep. 405 sweepWaiters struct { 406 lock mutex 407 list gList 408 } 409 410 // cycles is the number of completed GC cycles, where a GC 411 // cycle is sweep termination, mark, mark termination, and 412 // sweep. This differs from memstats.numgc, which is 413 // incremented at mark termination. 414 cycles atomic.Uint32 415 416 // Timing/utilization stats for this cycle. 417 stwprocs, maxprocs int32 418 tSweepTerm, tMark, tMarkTerm, tEnd int64 // nanotime() of phase start 419 420 // pauseNS is the total STW time this cycle, measured as the time between 421 // when stopping began (just before trying to stop Ps) and just after the 422 // world started again. 423 pauseNS int64 424 425 // debug.gctrace heap sizes for this cycle. 426 heap0, heap1, heap2 uint64 427 428 // Cumulative estimated CPU usage. 429 cpuStats 430 } 431 432 // GC runs a garbage collection and blocks the caller until the 433 // garbage collection is complete. It may also block the entire 434 // program. 435 func GC() { 436 // We consider a cycle to be: sweep termination, mark, mark 437 // termination, and sweep. This function shouldn't return 438 // until a full cycle has been completed, from beginning to 439 // end. Hence, we always want to finish up the current cycle 440 // and start a new one. That means: 441 // 442 // 1. In sweep termination, mark, or mark termination of cycle 443 // N, wait until mark termination N completes and transitions 444 // to sweep N. 445 // 446 // 2. In sweep N, help with sweep N. 447 // 448 // At this point we can begin a full cycle N+1. 449 // 450 // 3. Trigger cycle N+1 by starting sweep termination N+1. 451 // 452 // 4. Wait for mark termination N+1 to complete. 453 // 454 // 5. Help with sweep N+1 until it's done. 455 // 456 // This all has to be written to deal with the fact that the 457 // GC may move ahead on its own. For example, when we block 458 // until mark termination N, we may wake up in cycle N+2. 459 460 // Wait until the current sweep termination, mark, and mark 461 // termination complete. 462 n := work.cycles.Load() 463 gcWaitOnMark(n) 464 465 // We're now in sweep N or later. Trigger GC cycle N+1, which 466 // will first finish sweep N if necessary and then enter sweep 467 // termination N+1. 468 gcStart(gcTrigger{kind: gcTriggerCycle, n: n + 1}) 469 470 // Wait for mark termination N+1 to complete. 471 gcWaitOnMark(n + 1) 472 473 // Finish sweep N+1 before returning. We do this both to 474 // complete the cycle and because runtime.GC() is often used 475 // as part of tests and benchmarks to get the system into a 476 // relatively stable and isolated state. 477 for work.cycles.Load() == n+1 && sweepone() != ^uintptr(0) { 478 Gosched() 479 } 480 481 // Callers may assume that the heap profile reflects the 482 // just-completed cycle when this returns (historically this 483 // happened because this was a STW GC), but right now the 484 // profile still reflects mark termination N, not N+1. 485 // 486 // As soon as all of the sweep frees from cycle N+1 are done, 487 // we can go ahead and publish the heap profile. 488 // 489 // First, wait for sweeping to finish. 
(We know there are no 490 // more spans on the sweep queue, but we may be concurrently 491 // sweeping spans, so we have to wait.) 492 for work.cycles.Load() == n+1 && !isSweepDone() { 493 Gosched() 494 } 495 496 // Now we're really done with sweeping, so we can publish the 497 // stable heap profile. Only do this if we haven't already hit 498 // another mark termination. 499 mp := acquirem() 500 cycle := work.cycles.Load() 501 if cycle == n+1 || (gcphase == _GCmark && cycle == n+2) { 502 mProf_PostSweep() 503 } 504 releasem(mp) 505 } 506 507 // gcWaitOnMark blocks until GC finishes the Nth mark phase. If GC has 508 // already completed this mark phase, it returns immediately. 509 func gcWaitOnMark(n uint32) { 510 for { 511 // Disable phase transitions. 512 lock(&work.sweepWaiters.lock) 513 nMarks := work.cycles.Load() 514 if gcphase != _GCmark { 515 // We've already completed this cycle's mark. 516 nMarks++ 517 } 518 if nMarks > n { 519 // We're done. 520 unlock(&work.sweepWaiters.lock) 521 return 522 } 523 524 // Wait until sweep termination, mark, and mark 525 // termination of cycle N complete. 526 work.sweepWaiters.list.push(getg()) 527 goparkunlock(&work.sweepWaiters.lock, waitReasonWaitForGCCycle, traceBlockUntilGCEnds, 1) 528 } 529 } 530 531 // gcMode indicates how concurrent a GC cycle should be. 532 type gcMode int 533 534 const ( 535 gcBackgroundMode gcMode = iota // concurrent GC and sweep 536 gcForceMode // stop-the-world GC now, concurrent sweep 537 gcForceBlockMode // stop-the-world GC now and STW sweep (forced by user) 538 ) 539 540 // A gcTrigger is a predicate for starting a GC cycle. Specifically, 541 // it is an exit condition for the _GCoff phase. 542 type gcTrigger struct { 543 kind gcTriggerKind 544 now int64 // gcTriggerTime: current time 545 n uint32 // gcTriggerCycle: cycle number to start 546 } 547 548 type gcTriggerKind int 549 550 const ( 551 // gcTriggerHeap indicates that a cycle should be started when 552 // the heap size reaches the trigger heap size computed by the 553 // controller. 554 gcTriggerHeap gcTriggerKind = iota 555 556 // gcTriggerTime indicates that a cycle should be started when 557 // it's been more than forcegcperiod nanoseconds since the 558 // previous GC cycle. 559 gcTriggerTime 560 561 // gcTriggerCycle indicates that a cycle should be started if 562 // we have not yet started cycle number gcTrigger.n (relative 563 // to work.cycles). 564 gcTriggerCycle 565 ) 566 567 // test reports whether the trigger condition is satisfied, meaning 568 // that the exit condition for the _GCoff phase has been met. The exit 569 // condition should be tested when allocating. 570 func (t gcTrigger) test() bool { 571 if !memstats.enablegc || panicking.Load() != 0 || gcphase != _GCoff { 572 return false 573 } 574 switch t.kind { 575 case gcTriggerHeap: 576 trigger, _ := gcController.trigger() 577 return gcController.heapLive.Load() >= trigger 578 case gcTriggerTime: 579 if gcController.gcPercent.Load() < 0 { 580 return false 581 } 582 lastgc := int64(atomic.Load64(&memstats.last_gc_nanotime)) 583 return lastgc != 0 && t.now-lastgc > forcegcperiod 584 case gcTriggerCycle: 585 // t.n > work.cycles, but accounting for wraparound. 586 return int32(t.n-work.cycles.Load()) > 0 587 } 588 return true 589 } 590 591 // gcStart starts the GC. It transitions from _GCoff to _GCmark (if 592 // debug.gcstoptheworld == 0) or performs all of GC (if 593 // debug.gcstoptheworld != 0). 
594 // 595 // This may return without performing this transition in some cases, 596 // such as when called on a system stack or with locks held. 597 func gcStart(trigger gcTrigger) { 598 // Since this is called from malloc and malloc is called in 599 // the guts of a number of libraries that might be holding 600 // locks, don't attempt to start GC in non-preemptible or 601 // potentially unstable situations. 602 mp := acquirem() 603 if gp := getg(); gp == mp.g0 || mp.locks > 1 || mp.preemptoff != "" { 604 releasem(mp) 605 return 606 } 607 releasem(mp) 608 mp = nil 609 610 // Pick up the remaining unswept/not being swept spans concurrently 611 // 612 // This shouldn't happen if we're being invoked in background 613 // mode since proportional sweep should have just finished 614 // sweeping everything, but rounding errors, etc, may leave a 615 // few spans unswept. In forced mode, this is necessary since 616 // GC can be forced at any point in the sweeping cycle. 617 // 618 // We check the transition condition continuously here in case 619 // this G gets delayed in to the next GC cycle. 620 for trigger.test() && sweepone() != ^uintptr(0) { 621 } 622 623 // Perform GC initialization and the sweep termination 624 // transition. 625 semacquire(&work.startSema) 626 // Re-check transition condition under transition lock. 627 if !trigger.test() { 628 semrelease(&work.startSema) 629 return 630 } 631 632 // In gcstoptheworld debug mode, upgrade the mode accordingly. 633 // We do this after re-checking the transition condition so 634 // that multiple goroutines that detect the heap trigger don't 635 // start multiple STW GCs. 636 mode := gcBackgroundMode 637 if debug.gcstoptheworld == 1 { 638 mode = gcForceMode 639 } else if debug.gcstoptheworld == 2 { 640 mode = gcForceBlockMode 641 } 642 643 // Ok, we're doing it! Stop everybody else 644 semacquire(&gcsema) 645 semacquire(&worldsema) 646 647 // For stats, check if this GC was forced by the user. 648 // Update it under gcsema to avoid gctrace getting wrong values. 649 work.userForced = trigger.kind == gcTriggerCycle 650 651 trace := traceAcquire() 652 if trace.ok() { 653 trace.GCStart() 654 traceRelease(trace) 655 } 656 657 // Check that all Ps have finished deferred mcache flushes. 658 for _, p := range allp { 659 if fg := p.mcache.flushGen.Load(); fg != mheap_.sweepgen { 660 println("runtime: p", p.id, "flushGen", fg, "!= sweepgen", mheap_.sweepgen) 661 throw("p mcache not flushed") 662 } 663 } 664 665 gcBgMarkStartWorkers() 666 667 systemstack(gcResetMarkState) 668 669 work.stwprocs, work.maxprocs = gomaxprocs, gomaxprocs 670 if work.stwprocs > ncpu { 671 // This is used to compute CPU time of the STW phases, 672 // so it can't be more than ncpu, even if GOMAXPROCS is. 673 work.stwprocs = ncpu 674 } 675 work.heap0 = gcController.heapLive.Load() 676 work.pauseNS = 0 677 work.mode = mode 678 679 now := nanotime() 680 work.tSweepTerm = now 681 var stw worldStop 682 systemstack(func() { 683 stw = stopTheWorldWithSema(stwGCSweepTerm) 684 }) 685 686 // Accumulate fine-grained stopping time. 687 work.cpuStats.accumulateGCPauseTime(stw.stoppingCPUTime, 1) 688 689 // Finish sweep before we start concurrent scan. 690 systemstack(func() { 691 finishsweep_m() 692 }) 693 694 // clearpools before we start the GC. If we wait the memory will not be 695 // reclaimed until the next GC cycle. 696 clearpools() 697 698 work.cycles.Add(1) 699 700 // Assists and workers can start the moment we start 701 // the world. 
702 gcController.startCycle(now, int(gomaxprocs), trigger) 703 704 // Notify the CPU limiter that assists may begin. 705 gcCPULimiter.startGCTransition(true, now) 706 707 // In STW mode, disable scheduling of user Gs. This may also 708 // disable scheduling of this goroutine, so it may block as 709 // soon as we start the world again. 710 if mode != gcBackgroundMode { 711 schedEnableUser(false) 712 } 713 714 // Enter concurrent mark phase and enable 715 // write barriers. 716 // 717 // Because the world is stopped, all Ps will 718 // observe that write barriers are enabled by 719 // the time we start the world and begin 720 // scanning. 721 // 722 // Write barriers must be enabled before assists are 723 // enabled because they must be enabled before 724 // any non-leaf heap objects are marked. Since 725 // allocations are blocked until assists can 726 // happen, we want to enable assists as early as 727 // possible. 728 setGCPhase(_GCmark) 729 730 gcBgMarkPrepare() // Must happen before assists are enabled. 731 gcMarkRootPrepare() 732 733 // Mark all active tinyalloc blocks. Since we're 734 // allocating from these, they need to be black like 735 // other allocations. The alternative is to blacken 736 // the tiny block on every allocation from it, which 737 // would slow down the tiny allocator. 738 gcMarkTinyAllocs() 739 740 // At this point all Ps have enabled the write 741 // barrier, thus maintaining the no white to 742 // black invariant. Enable mutator assists to 743 // put back-pressure on fast allocating 744 // mutators. 745 atomic.Store(&gcBlackenEnabled, 1) 746 747 // In STW mode, we could block the instant systemstack 748 // returns, so make sure we're not preemptible. 749 mp = acquirem() 750 751 // Update the CPU stats pause time. 752 // 753 // Use maxprocs instead of stwprocs here because the total time 754 // computed in the CPU stats is based on maxprocs, and we want them 755 // to be comparable. 756 work.cpuStats.accumulateGCPauseTime(nanotime()-stw.finishedStopping, work.maxprocs) 757 758 // Concurrent mark. 759 systemstack(func() { 760 now = startTheWorldWithSema(0, stw) 761 work.pauseNS += now - stw.startedStopping 762 work.tMark = now 763 764 // Release the CPU limiter. 765 gcCPULimiter.finishGCTransition(now) 766 }) 767 768 // Release the world sema before Gosched() in STW mode 769 // because we will need to reacquire it later but before 770 // this goroutine becomes runnable again, and we could 771 // self-deadlock otherwise. 772 semrelease(&worldsema) 773 releasem(mp) 774 775 // Make sure we block instead of returning to user code 776 // in STW mode. 777 if mode != gcBackgroundMode { 778 Gosched() 779 } 780 781 semrelease(&work.startSema) 782 } 783 784 // gcMarkDoneFlushed counts the number of P's with flushed work. 785 // 786 // Ideally this would be a captured local in gcMarkDone, but forEachP 787 // escapes its callback closure, so it can't capture anything. 788 // 789 // This is protected by markDoneSema. 790 var gcMarkDoneFlushed uint32 791 792 // gcMarkDone transitions the GC from mark to mark termination if all 793 // reachable objects have been marked (that is, there are no grey 794 // objects and can be no more in the future). Otherwise, it flushes 795 // all local work to the global queues where it can be discovered by 796 // other workers. 797 // 798 // This should be called when all local mark work has been drained and 799 // there are no remaining workers. 
Specifically, when 800 // 801 // work.nwait == work.nproc && !gcMarkWorkAvailable(p) 802 // 803 // The calling context must be preemptible. 804 // 805 // Flushing local work is important because idle Ps may have local 806 // work queued. This is the only way to make that work visible and 807 // drive GC to completion. 808 // 809 // It is explicitly okay to have write barriers in this function. If 810 // it does transition to mark termination, then all reachable objects 811 // have been marked, so the write barrier cannot shade any more 812 // objects. 813 func gcMarkDone() { 814 // Ensure only one thread is running the ragged barrier at a 815 // time. 816 semacquire(&work.markDoneSema) 817 818 top: 819 // Re-check transition condition under transition lock. 820 // 821 // It's critical that this checks the global work queues are 822 // empty before performing the ragged barrier. Otherwise, 823 // there could be global work that a P could take after the P 824 // has passed the ragged barrier. 825 if !(gcphase == _GCmark && work.nwait == work.nproc && !gcMarkWorkAvailable(nil)) { 826 semrelease(&work.markDoneSema) 827 return 828 } 829 830 // forEachP needs worldsema to execute, and we'll need it to 831 // stop the world later, so acquire worldsema now. 832 semacquire(&worldsema) 833 834 // Flush all local buffers and collect flushedWork flags. 835 gcMarkDoneFlushed = 0 836 forEachP(waitReasonGCMarkTermination, func(pp *p) { 837 // Flush the write barrier buffer, since this may add 838 // work to the gcWork. 839 wbBufFlush1(pp) 840 841 // Flush the gcWork, since this may create global work 842 // and set the flushedWork flag. 843 // 844 // TODO(austin): Break up these workbufs to 845 // better distribute work. 846 pp.gcw.dispose() 847 // Collect the flushedWork flag. 848 if pp.gcw.flushedWork { 849 atomic.Xadd(&gcMarkDoneFlushed, 1) 850 pp.gcw.flushedWork = false 851 } 852 }) 853 854 if gcMarkDoneFlushed != 0 { 855 // More grey objects were discovered since the 856 // previous termination check, so there may be more 857 // work to do. Keep going. It's possible the 858 // transition condition became true again during the 859 // ragged barrier, so re-check it. 860 semrelease(&worldsema) 861 goto top 862 } 863 864 // There was no global work, no local work, and no Ps 865 // communicated work since we took markDoneSema. Therefore 866 // there are no grey objects and no more objects can be 867 // shaded. Transition to mark termination. 868 now := nanotime() 869 work.tMarkTerm = now 870 getg().m.preemptoff = "gcing" 871 var stw worldStop 872 systemstack(func() { 873 stw = stopTheWorldWithSema(stwGCMarkTerm) 874 }) 875 // The gcphase is _GCmark, it will transition to _GCmarktermination 876 // below. The important thing is that the wb remains active until 877 // all marking is complete. This includes writes made by the GC. 878 879 // Accumulate fine-grained stopping time. 880 work.cpuStats.accumulateGCPauseTime(stw.stoppingCPUTime, 1) 881 882 // There is sometimes work left over when we enter mark termination due 883 // to write barriers performed after the completion barrier above. 884 // Detect this and resume concurrent mark. This is obviously 885 // unfortunate. 886 // 887 // See issue #27993 for details. 888 // 889 // Switch to the system stack to call wbBufFlush1, though in this case 890 // it doesn't matter because we're non-preemptible anyway. 
891 restart := false 892 systemstack(func() { 893 for _, p := range allp { 894 wbBufFlush1(p) 895 if !p.gcw.empty() { 896 restart = true 897 break 898 } 899 } 900 }) 901 if restart { 902 getg().m.preemptoff = "" 903 systemstack(func() { 904 // Accumulate the time we were stopped before we had to start again. 905 work.cpuStats.accumulateGCPauseTime(nanotime()-stw.finishedStopping, work.maxprocs) 906 907 // Start the world again. 908 now := startTheWorldWithSema(0, stw) 909 work.pauseNS += now - stw.startedStopping 910 }) 911 semrelease(&worldsema) 912 goto top 913 } 914 915 gcComputeStartingStackSize() 916 917 // Disable assists and background workers. We must do 918 // this before waking blocked assists. 919 atomic.Store(&gcBlackenEnabled, 0) 920 921 // Notify the CPU limiter that GC assists will now cease. 922 gcCPULimiter.startGCTransition(false, now) 923 924 // Wake all blocked assists. These will run when we 925 // start the world again. 926 gcWakeAllAssists() 927 928 // Likewise, release the transition lock. Blocked 929 // workers and assists will run when we start the 930 // world again. 931 semrelease(&work.markDoneSema) 932 933 // In STW mode, re-enable user goroutines. These will be 934 // queued to run after we start the world. 935 schedEnableUser(true) 936 937 // endCycle depends on all gcWork cache stats being flushed. 938 // The termination algorithm above ensured that up to 939 // allocations since the ragged barrier. 940 gcController.endCycle(now, int(gomaxprocs), work.userForced) 941 942 // Perform mark termination. This will restart the world. 943 gcMarkTermination(stw) 944 } 945 946 // World must be stopped and mark assists and background workers must be 947 // disabled. 948 func gcMarkTermination(stw worldStop) { 949 // Start marktermination (write barrier remains enabled for now). 950 setGCPhase(_GCmarktermination) 951 952 work.heap1 = gcController.heapLive.Load() 953 startTime := nanotime() 954 955 mp := acquirem() 956 mp.preemptoff = "gcing" 957 mp.traceback = 2 958 curgp := mp.curg 959 // N.B. The execution tracer is not aware of this status 960 // transition and handles it specially based on the 961 // wait reason. 962 casGToWaitingForGC(curgp, _Grunning, waitReasonGarbageCollection) 963 964 // Run gc on the g0 stack. We do this so that the g stack 965 // we're currently running on will no longer change. Cuts 966 // the root set down a bit (g0 stacks are not scanned, and 967 // we don't need to scan gc's internal state). We also 968 // need to switch to g0 so we can shrink the stack. 969 systemstack(func() { 970 gcMark(startTime) 971 // Must return immediately. 972 // The outer function's stack may have moved 973 // during gcMark (it shrinks stacks, including the 974 // outer function's stack), so we must not refer 975 // to any of its variables. Return back to the 976 // non-system stack to pick up the new addresses 977 // before continuing. 978 }) 979 980 var stwSwept bool 981 systemstack(func() { 982 work.heap2 = work.bytesMarked 983 if debug.gccheckmark > 0 { 984 // Run a full non-parallel, stop-the-world 985 // mark using checkmark bits, to check that we 986 // didn't forget to mark anything during the 987 // concurrent mark process. 
988 startCheckmarks() 989 gcResetMarkState() 990 gcw := &getg().m.p.ptr().gcw 991 gcDrain(gcw, 0) 992 wbBufFlush1(getg().m.p.ptr()) 993 gcw.dispose() 994 endCheckmarks() 995 } 996 997 // marking is complete so we can turn the write barrier off 998 setGCPhase(_GCoff) 999 stwSwept = gcSweep(work.mode) 1000 }) 1001 1002 mp.traceback = 0 1003 casgstatus(curgp, _Gwaiting, _Grunning) 1004 1005 trace := traceAcquire() 1006 if trace.ok() { 1007 trace.GCDone() 1008 traceRelease(trace) 1009 } 1010 1011 // all done 1012 mp.preemptoff = "" 1013 1014 if gcphase != _GCoff { 1015 throw("gc done but gcphase != _GCoff") 1016 } 1017 1018 // Record heapInUse for scavenger. 1019 memstats.lastHeapInUse = gcController.heapInUse.load() 1020 1021 // Update GC trigger and pacing, as well as downstream consumers 1022 // of this pacing information, for the next cycle. 1023 systemstack(gcControllerCommit) 1024 1025 // Update timing memstats 1026 now := nanotime() 1027 sec, nsec, _ := time_now() 1028 unixNow := sec*1e9 + int64(nsec) 1029 work.pauseNS += now - stw.startedStopping 1030 work.tEnd = now 1031 atomic.Store64(&memstats.last_gc_unix, uint64(unixNow)) // must be Unix time to make sense to user 1032 atomic.Store64(&memstats.last_gc_nanotime, uint64(now)) // monotonic time for us 1033 memstats.pause_ns[memstats.numgc%uint32(len(memstats.pause_ns))] = uint64(work.pauseNS) 1034 memstats.pause_end[memstats.numgc%uint32(len(memstats.pause_end))] = uint64(unixNow) 1035 memstats.pause_total_ns += uint64(work.pauseNS) 1036 1037 // Accumulate CPU stats. 1038 // 1039 // Use maxprocs instead of stwprocs for GC pause time because the total time 1040 // computed in the CPU stats is based on maxprocs, and we want them to be 1041 // comparable. 1042 // 1043 // Pass gcMarkPhase=true to accumulate so we can get all the latest GC CPU stats 1044 // in there too. 1045 work.cpuStats.accumulateGCPauseTime(now-stw.finishedStopping, work.maxprocs) 1046 work.cpuStats.accumulate(now, true) 1047 1048 // Compute overall GC CPU utilization. 1049 // Omit idle marking time from the overall utilization here since it's "free". 1050 memstats.gc_cpu_fraction = float64(work.cpuStats.GCTotalTime-work.cpuStats.GCIdleTime) / float64(work.cpuStats.TotalTime) 1051 1052 // Reset assist time and background time stats. 1053 // 1054 // Do this now, instead of at the start of the next GC cycle, because 1055 // these two may keep accumulating even if the GC is not active. 1056 scavenge.assistTime.Store(0) 1057 scavenge.backgroundTime.Store(0) 1058 1059 // Reset idle time stat. 1060 sched.idleTime.Store(0) 1061 1062 if work.userForced { 1063 memstats.numforcedgc++ 1064 } 1065 1066 // Bump GC cycle count and wake goroutines waiting on sweep. 1067 lock(&work.sweepWaiters.lock) 1068 memstats.numgc++ 1069 injectglist(&work.sweepWaiters.list) 1070 unlock(&work.sweepWaiters.lock) 1071 1072 // Increment the scavenge generation now. 1073 // 1074 // This moment represents peak heap in use because we're 1075 // about to start sweeping. 1076 mheap_.pages.scav.index.nextGen() 1077 1078 // Release the CPU limiter. 1079 gcCPULimiter.finishGCTransition(now) 1080 1081 // Finish the current heap profiling cycle and start a new 1082 // heap profiling cycle. We do this before starting the world 1083 // so events don't leak into the wrong cycle. 1084 mProf_NextCycle() 1085 1086 // There may be stale spans in mcaches that need to be swept. 
1087 // Those aren't tracked in any sweep lists, so we need to 1088 // count them against sweep completion until we ensure all 1089 // those spans have been forced out. 1090 // 1091 // If gcSweep fully swept the heap (for example if the sweep 1092 // is not concurrent due to a GODEBUG setting), then we expect 1093 // the sweepLocker to be invalid, since sweeping is done. 1094 // 1095 // N.B. Below we might duplicate some work from gcSweep; this is 1096 // fine as all that work is idempotent within a GC cycle, and 1097 // we're still holding worldsema so a new cycle can't start. 1098 sl := sweep.active.begin() 1099 if !stwSwept && !sl.valid { 1100 throw("failed to set sweep barrier") 1101 } else if stwSwept && sl.valid { 1102 throw("non-concurrent sweep failed to drain all sweep queues") 1103 } 1104 1105 systemstack(func() { 1106 // The memstats updated above must be updated with the world 1107 // stopped to ensure consistency of some values, such as 1108 // sched.idleTime and sched.totaltime. memstats also include 1109 // the pause time (work,pauseNS), forcing computation of the 1110 // total pause time before the pause actually ends. 1111 // 1112 // Here we reuse the same now for start the world so that the 1113 // time added to /sched/pauses/total/gc:seconds will be 1114 // consistent with the value in memstats. 1115 startTheWorldWithSema(now, stw) 1116 }) 1117 1118 // Flush the heap profile so we can start a new cycle next GC. 1119 // This is relatively expensive, so we don't do it with the 1120 // world stopped. 1121 mProf_Flush() 1122 1123 // Prepare workbufs for freeing by the sweeper. We do this 1124 // asynchronously because it can take non-trivial time. 1125 prepareFreeWorkbufs() 1126 1127 // Free stack spans. This must be done between GC cycles. 1128 systemstack(freeStackSpans) 1129 1130 // Ensure all mcaches are flushed. Each P will flush its own 1131 // mcache before allocating, but idle Ps may not. Since this 1132 // is necessary to sweep all spans, we need to ensure all 1133 // mcaches are flushed before we start the next GC cycle. 1134 // 1135 // While we're here, flush the page cache for idle Ps to avoid 1136 // having pages get stuck on them. These pages are hidden from 1137 // the scavenger, so in small idle heaps a significant amount 1138 // of additional memory might be held onto. 1139 // 1140 // Also, flush the pinner cache, to avoid leaking that memory 1141 // indefinitely. 1142 forEachP(waitReasonFlushProcCaches, func(pp *p) { 1143 pp.mcache.prepareForSweep() 1144 if pp.status == _Pidle { 1145 systemstack(func() { 1146 lock(&mheap_.lock) 1147 pp.pcache.flush(&mheap_.pages) 1148 unlock(&mheap_.lock) 1149 }) 1150 } 1151 pp.pinnerCache = nil 1152 }) 1153 if sl.valid { 1154 // Now that we've swept stale spans in mcaches, they don't 1155 // count against unswept spans. 1156 // 1157 // Note: this sweepLocker may not be valid if sweeping had 1158 // already completed during the STW. See the corresponding 1159 // begin() call that produced sl. 1160 sweep.active.end(sl) 1161 } 1162 1163 // Print gctrace before dropping worldsema. As soon as we drop 1164 // worldsema another cycle could start and smash the stats 1165 // we're trying to print. 
1166 if debug.gctrace > 0 { 1167 util := int(memstats.gc_cpu_fraction * 100) 1168 1169 var sbuf [24]byte 1170 printlock() 1171 print("gc ", memstats.numgc, 1172 " @", string(itoaDiv(sbuf[:], uint64(work.tSweepTerm-runtimeInitTime)/1e6, 3)), "s ", 1173 util, "%: ") 1174 prev := work.tSweepTerm 1175 for i, ns := range []int64{work.tMark, work.tMarkTerm, work.tEnd} { 1176 if i != 0 { 1177 print("+") 1178 } 1179 print(string(fmtNSAsMS(sbuf[:], uint64(ns-prev)))) 1180 prev = ns 1181 } 1182 print(" ms clock, ") 1183 for i, ns := range []int64{ 1184 int64(work.stwprocs) * (work.tMark - work.tSweepTerm), 1185 gcController.assistTime.Load(), 1186 gcController.dedicatedMarkTime.Load() + gcController.fractionalMarkTime.Load(), 1187 gcController.idleMarkTime.Load(), 1188 int64(work.stwprocs) * (work.tEnd - work.tMarkTerm), 1189 } { 1190 if i == 2 || i == 3 { 1191 // Separate mark time components with /. 1192 print("/") 1193 } else if i != 0 { 1194 print("+") 1195 } 1196 print(string(fmtNSAsMS(sbuf[:], uint64(ns)))) 1197 } 1198 print(" ms cpu, ", 1199 work.heap0>>20, "->", work.heap1>>20, "->", work.heap2>>20, " MB, ", 1200 gcController.lastHeapGoal>>20, " MB goal, ", 1201 gcController.lastStackScan.Load()>>20, " MB stacks, ", 1202 gcController.globalsScan.Load()>>20, " MB globals, ", 1203 work.maxprocs, " P") 1204 if work.userForced { 1205 print(" (forced)") 1206 } 1207 print("\n") 1208 printunlock() 1209 } 1210 1211 // Set any arena chunks that were deferred to fault. 1212 lock(&userArenaState.lock) 1213 faultList := userArenaState.fault 1214 userArenaState.fault = nil 1215 unlock(&userArenaState.lock) 1216 for _, lc := range faultList { 1217 lc.mspan.setUserArenaChunkToFault() 1218 } 1219 1220 // Enable huge pages on some metadata if we cross a heap threshold. 1221 if gcController.heapGoal() > minHeapForMetadataHugePages { 1222 systemstack(func() { 1223 mheap_.enableMetadataHugePages() 1224 }) 1225 } 1226 1227 semrelease(&worldsema) 1228 semrelease(&gcsema) 1229 // Careful: another GC cycle may start now. 1230 1231 releasem(mp) 1232 mp = nil 1233 1234 // now that gc is done, kick off finalizer thread if needed 1235 if !concurrentSweep { 1236 // give the queued finalizers, if any, a chance to run 1237 Gosched() 1238 } 1239 } 1240 1241 // gcBgMarkStartWorkers prepares background mark worker goroutines. These 1242 // goroutines will not run until the mark phase, but they must be started while 1243 // the work is not stopped and from a regular G stack. The caller must hold 1244 // worldsema. 1245 func gcBgMarkStartWorkers() { 1246 // Background marking is performed by per-P G's. Ensure that each P has 1247 // a background GC G. 1248 // 1249 // Worker Gs don't exit if gomaxprocs is reduced. If it is raised 1250 // again, we can reuse the old workers; no need to create new workers. 1251 if gcBgMarkWorkerCount >= gomaxprocs { 1252 return 1253 } 1254 1255 // Increment mp.locks when allocating. We are called within gcStart, 1256 // and thus must not trigger another gcStart via an allocation. gcStart 1257 // bails when allocating with locks held, so simulate that for these 1258 // allocations. 1259 // 1260 // TODO(prattmic): cleanup gcStart to use a more explicit "in gcStart" 1261 // check for bailing. 1262 mp := acquirem() 1263 ready := make(chan struct{}, 1) 1264 releasem(mp) 1265 1266 for gcBgMarkWorkerCount < gomaxprocs { 1267 mp := acquirem() // See above, we allocate a closure here. 1268 go gcBgMarkWorker(ready) 1269 releasem(mp) 1270 1271 // N.B. 
we intentionally wait on each goroutine individually 1272 // rather than starting all in a batch and then waiting once 1273 // afterwards. By running one goroutine at a time, we can take 1274 // advantage of runnext to bounce back and forth between 1275 // workers and this goroutine. In an overloaded application, 1276 // this can reduce GC start latency by prioritizing these 1277 // goroutines rather than waiting on the end of the run queue. 1278 <-ready 1279 // The worker is now guaranteed to be added to the pool before 1280 // its P's next findRunnableGCWorker. 1281 1282 gcBgMarkWorkerCount++ 1283 } 1284 } 1285 1286 // gcBgMarkPrepare sets up state for background marking. 1287 // Mutator assists must not yet be enabled. 1288 func gcBgMarkPrepare() { 1289 // Background marking will stop when the work queues are empty 1290 // and there are no more workers (note that, since this is 1291 // concurrent, this may be a transient state, but mark 1292 // termination will clean it up). Between background workers 1293 // and assists, we don't really know how many workers there 1294 // will be, so we pretend to have an arbitrarily large number 1295 // of workers, almost all of which are "waiting". While a 1296 // worker is working it decrements nwait. If nproc == nwait, 1297 // there are no workers. 1298 work.nproc = ^uint32(0) 1299 work.nwait = ^uint32(0) 1300 } 1301 1302 // gcBgMarkWorkerNode is an entry in the gcBgMarkWorkerPool. It points to a single 1303 // gcBgMarkWorker goroutine. 1304 type gcBgMarkWorkerNode struct { 1305 // Unused workers are managed in a lock-free stack. This field must be first. 1306 node lfnode 1307 1308 // The g of this worker. 1309 gp guintptr 1310 1311 // Release this m on park. This is used to communicate with the unlock 1312 // function, which cannot access the G's stack. It is unused outside of 1313 // gcBgMarkWorker(). 1314 m muintptr 1315 } 1316 1317 func gcBgMarkWorker(ready chan struct{}) { 1318 gp := getg() 1319 1320 // We pass node to a gopark unlock function, so it can't be on 1321 // the stack (see gopark). Prevent deadlock from recursively 1322 // starting GC by disabling preemption. 1323 gp.m.preemptoff = "GC worker init" 1324 node := new(gcBgMarkWorkerNode) 1325 gp.m.preemptoff = "" 1326 1327 node.gp.set(gp) 1328 1329 node.m.set(acquirem()) 1330 1331 ready <- struct{}{} 1332 // After this point, the background mark worker is generally scheduled 1333 // cooperatively by gcController.findRunnableGCWorker. While performing 1334 // work on the P, preemption is disabled because we are working on 1335 // P-local work buffers. When the preempt flag is set, this puts itself 1336 // into _Gwaiting to be woken up by gcController.findRunnableGCWorker 1337 // at the appropriate time. 1338 // 1339 // When preemption is enabled (e.g., while in gcMarkDone), this worker 1340 // may be preempted and schedule as a _Grunnable G from a runq. That is 1341 // fine; it will eventually gopark again for further scheduling via 1342 // findRunnableGCWorker. 1343 // 1344 // Since we disable preemption before notifying ready, we guarantee that 1345 // this G will be in the worker pool for the next findRunnableGCWorker. 1346 // This isn't strictly necessary, but it reduces latency between 1347 // _GCmark starting and the workers starting. 1348 1349 for { 1350 // Go to sleep until woken by 1351 // gcController.findRunnableGCWorker. 
1352 gopark(func(g *g, nodep unsafe.Pointer) bool { 1353 node := (*gcBgMarkWorkerNode)(nodep) 1354 1355 if mp := node.m.ptr(); mp != nil { 1356 // The worker G is no longer running; release 1357 // the M. 1358 // 1359 // N.B. it is _safe_ to release the M as soon 1360 // as we are no longer performing P-local mark 1361 // work. 1362 // 1363 // However, since we cooperatively stop work 1364 // when gp.preempt is set, if we releasem in 1365 // the loop then the following call to gopark 1366 // would immediately preempt the G. This is 1367 // also safe, but inefficient: the G must 1368 // schedule again only to enter gopark and park 1369 // again. Thus, we defer the release until 1370 // after parking the G. 1371 releasem(mp) 1372 } 1373 1374 // Release this G to the pool. 1375 gcBgMarkWorkerPool.push(&node.node) 1376 // Note that at this point, the G may immediately be 1377 // rescheduled and may be running. 1378 return true 1379 }, unsafe.Pointer(node), waitReasonGCWorkerIdle, traceBlockSystemGoroutine, 0) 1380 1381 // Preemption must not occur here, or another G might see 1382 // p.gcMarkWorkerMode. 1383 1384 // Disable preemption so we can use the gcw. If the 1385 // scheduler wants to preempt us, we'll stop draining, 1386 // dispose the gcw, and then preempt. 1387 node.m.set(acquirem()) 1388 pp := gp.m.p.ptr() // P can't change with preemption disabled. 1389 1390 if gcBlackenEnabled == 0 { 1391 println("worker mode", pp.gcMarkWorkerMode) 1392 throw("gcBgMarkWorker: blackening not enabled") 1393 } 1394 1395 if pp.gcMarkWorkerMode == gcMarkWorkerNotWorker { 1396 throw("gcBgMarkWorker: mode not set") 1397 } 1398 1399 startTime := nanotime() 1400 pp.gcMarkWorkerStartTime = startTime 1401 var trackLimiterEvent bool 1402 if pp.gcMarkWorkerMode == gcMarkWorkerIdleMode { 1403 trackLimiterEvent = pp.limiterEvent.start(limiterEventIdleMarkWork, startTime) 1404 } 1405 1406 decnwait := atomic.Xadd(&work.nwait, -1) 1407 if decnwait == work.nproc { 1408 println("runtime: work.nwait=", decnwait, "work.nproc=", work.nproc) 1409 throw("work.nwait was > work.nproc") 1410 } 1411 1412 systemstack(func() { 1413 // Mark our goroutine preemptible so its stack 1414 // can be scanned. This lets two mark workers 1415 // scan each other (otherwise, they would 1416 // deadlock). We must not modify anything on 1417 // the G stack. However, stack shrinking is 1418 // disabled for mark workers, so it is safe to 1419 // read from the G stack. 1420 // 1421 // N.B. The execution tracer is not aware of this status 1422 // transition and handles it specially based on the 1423 // wait reason. 1424 casGToWaitingForGC(gp, _Grunning, waitReasonGCWorkerActive) 1425 switch pp.gcMarkWorkerMode { 1426 default: 1427 throw("gcBgMarkWorker: unexpected gcMarkWorkerMode") 1428 case gcMarkWorkerDedicatedMode: 1429 gcDrainMarkWorkerDedicated(&pp.gcw, true) 1430 if gp.preempt { 1431 // We were preempted. This is 1432 // a useful signal to kick 1433 // everything out of the run 1434 // queue so it can run 1435 // somewhere else. 1436 if drainQ, n := runqdrain(pp); n > 0 { 1437 lock(&sched.lock) 1438 globrunqputbatch(&drainQ, int32(n)) 1439 unlock(&sched.lock) 1440 } 1441 } 1442 // Go back to draining, this time 1443 // without preemption. 1444 gcDrainMarkWorkerDedicated(&pp.gcw, false) 1445 case gcMarkWorkerFractionalMode: 1446 gcDrainMarkWorkerFractional(&pp.gcw) 1447 case gcMarkWorkerIdleMode: 1448 gcDrainMarkWorkerIdle(&pp.gcw) 1449 } 1450 casgstatus(gp, _Gwaiting, _Grunning) 1451 }) 1452 1453 // Account for time and mark us as stopped. 
1454 now := nanotime() 1455 duration := now - startTime 1456 gcController.markWorkerStop(pp.gcMarkWorkerMode, duration) 1457 if trackLimiterEvent { 1458 pp.limiterEvent.stop(limiterEventIdleMarkWork, now) 1459 } 1460 if pp.gcMarkWorkerMode == gcMarkWorkerFractionalMode { 1461 atomic.Xaddint64(&pp.gcFractionalMarkTime, duration) 1462 } 1463 1464 // Was this the last worker and did we run out 1465 // of work? 1466 incnwait := atomic.Xadd(&work.nwait, +1) 1467 if incnwait > work.nproc { 1468 println("runtime: p.gcMarkWorkerMode=", pp.gcMarkWorkerMode, 1469 "work.nwait=", incnwait, "work.nproc=", work.nproc) 1470 throw("work.nwait > work.nproc") 1471 } 1472 1473 // We'll releasem after this point and thus this P may run 1474 // something else. We must clear the worker mode to avoid 1475 // attributing the mode to a different (non-worker) G in 1476 // traceGoStart. 1477 pp.gcMarkWorkerMode = gcMarkWorkerNotWorker 1478 1479 // If this worker reached a background mark completion 1480 // point, signal the main GC goroutine. 1481 if incnwait == work.nproc && !gcMarkWorkAvailable(nil) { 1482 // We don't need the P-local buffers here, allow 1483 // preemption because we may schedule like a regular 1484 // goroutine in gcMarkDone (block on locks, etc). 1485 releasem(node.m.ptr()) 1486 node.m.set(nil) 1487 1488 gcMarkDone() 1489 } 1490 } 1491 } 1492 1493 // gcMarkWorkAvailable reports whether executing a mark worker 1494 // on p is potentially useful. p may be nil, in which case it only 1495 // checks the global sources of work. 1496 func gcMarkWorkAvailable(p *p) bool { 1497 if p != nil && !p.gcw.empty() { 1498 return true 1499 } 1500 if !work.full.empty() { 1501 return true // global work available 1502 } 1503 if work.markrootNext < work.markrootJobs { 1504 return true // root scan work available 1505 } 1506 return false 1507 } 1508 1509 // gcMark runs the mark (or, for concurrent GC, mark termination) 1510 // All gcWork caches must be empty. 1511 // STW is in effect at this point. 1512 func gcMark(startTime int64) { 1513 if debug.allocfreetrace > 0 { 1514 tracegc() 1515 } 1516 1517 if gcphase != _GCmarktermination { 1518 throw("in gcMark expecting to see gcphase as _GCmarktermination") 1519 } 1520 work.tstart = startTime 1521 1522 // Check that there's no marking work remaining. 1523 if work.full != 0 || work.markrootNext < work.markrootJobs { 1524 print("runtime: full=", hex(work.full), " next=", work.markrootNext, " jobs=", work.markrootJobs, " nDataRoots=", work.nDataRoots, " nBSSRoots=", work.nBSSRoots, " nSpanRoots=", work.nSpanRoots, " nStackRoots=", work.nStackRoots, "\n") 1525 panic("non-empty mark queue after concurrent mark") 1526 } 1527 1528 if debug.gccheckmark > 0 { 1529 // This is expensive when there's a large number of 1530 // Gs, so only do it if checkmark is also enabled. 1531 gcMarkRootCheck() 1532 } 1533 1534 // Drop allg snapshot. allgs may have grown, in which case 1535 // this is the only reference to the old backing store and 1536 // there's no need to keep it around. 1537 work.stackRoots = nil 1538 1539 // Clear out buffers and double-check that all gcWork caches 1540 // are empty. This should be ensured by gcMarkDone before we 1541 // enter mark termination. 1542 // 1543 // TODO: We could clear out buffers just before mark if this 1544 // has a non-negligible impact on STW time. 1545 for _, p := range allp { 1546 // The write barrier may have buffered pointers since 1547 // the gcMarkDone barrier. 
		// However, since the barrier
		// ensured all reachable objects were marked, all of
		// these must be pointers to black objects. Hence we
		// can just discard the write barrier buffer.
		if debug.gccheckmark > 0 {
			// For debugging, flush the buffer and make
			// sure it really was all marked.
			wbBufFlush1(p)
		} else {
			p.wbBuf.reset()
		}

		gcw := &p.gcw
		if !gcw.empty() {
			printlock()
			print("runtime: P ", p.id, " flushedWork ", gcw.flushedWork)
			if gcw.wbuf1 == nil {
				print(" wbuf1=<nil>")
			} else {
				print(" wbuf1.n=", gcw.wbuf1.nobj)
			}
			if gcw.wbuf2 == nil {
				print(" wbuf2=<nil>")
			} else {
				print(" wbuf2.n=", gcw.wbuf2.nobj)
			}
			print("\n")
			throw("P has cached GC work at end of mark termination")
		}
		// There may still be cached empty buffers, which we
		// need to flush since we're going to free them. Also,
		// there may be non-zero stats because we allocated
		// black after the gcMarkDone barrier.
		gcw.dispose()
	}

	// Flush scanAlloc from each mcache since we're about to modify
	// heapScan directly. If we were to flush this later, then scanAlloc
	// might have incorrect information.
	//
	// Note that it's not important to retain this information; we know
	// exactly what heapScan is at this point via scanWork.
	for _, p := range allp {
		c := p.mcache
		if c == nil {
			continue
		}
		c.scanAlloc = 0
	}

	// Reset controller state.
	gcController.resetLive(work.bytesMarked)
}

// gcSweep must be called on the system stack because it acquires the heap
// lock. See mheap for details.
//
// Returns true if the heap was fully swept by this function.
//
// The world must be stopped.
//
//go:systemstack
func gcSweep(mode gcMode) bool {
	assertWorldStopped()

	if gcphase != _GCoff {
		throw("gcSweep being done but phase is not GCoff")
	}

	lock(&mheap_.lock)
	mheap_.sweepgen += 2
	sweep.active.reset()
	mheap_.pagesSwept.Store(0)
	mheap_.sweepArenas = mheap_.allArenas
	mheap_.reclaimIndex.Store(0)
	mheap_.reclaimCredit.Store(0)
	unlock(&mheap_.lock)

	sweep.centralIndex.clear()

	if !concurrentSweep || mode == gcForceBlockMode {
		// Special case synchronous sweep.
		// Record that no proportional sweeping has to happen.
		lock(&mheap_.lock)
		mheap_.sweepPagesPerByte = 0
		unlock(&mheap_.lock)
		// Flush all mcaches.
		for _, pp := range allp {
			pp.mcache.prepareForSweep()
		}
		// Sweep all spans eagerly.
		for sweepone() != ^uintptr(0) {
		}
		// Free workbufs eagerly.
		prepareFreeWorkbufs()
		for freeSomeWbufs(false) {
		}
		// All "free" events for this mark/sweep cycle have
		// now happened, so we can make this profile cycle
		// available immediately.
		mProf_NextCycle()
		mProf_Flush()
		return true
	}

	// Background sweep.
	lock(&sweep.lock)
	if sweep.parked {
		sweep.parked = false
		ready(sweep.g, 0, true)
	}
	unlock(&sweep.lock)
	return false
}

// gcResetMarkState resets global state prior to marking (concurrent
// or STW) and resets the stack scan state of all Gs.
//
// This is safe to do without the world stopped because any Gs created
// during or after this will start out in the reset state.
//
// gcResetMarkState must be called on the system stack because it acquires
// the heap lock. See mheap for details.
//
//go:systemstack
func gcResetMarkState() {
	// This may be called during a concurrent phase, so lock to make sure
	// allgs doesn't change.
	forEachG(func(gp *g) {
		gp.gcscandone = false // set to true in gcphasework
		gp.gcAssistBytes = 0
	})

	// Clear page marks. This is just 1MB per 64GB of heap, so the
	// time here is pretty trivial.
	lock(&mheap_.lock)
	arenas := mheap_.allArenas
	unlock(&mheap_.lock)
	for _, ai := range arenas {
		ha := mheap_.arenas[ai.l1()][ai.l2()]
		clear(ha.pageMarks[:])
	}

	work.bytesMarked = 0
	work.initialHeapLive = gcController.heapLive.Load()
}

// Hooks for other packages

var poolcleanup func()
var boringCaches []unsafe.Pointer // for crypto/internal/boring

//go:linkname sync_runtime_registerPoolCleanup sync.runtime_registerPoolCleanup
func sync_runtime_registerPoolCleanup(f func()) {
	poolcleanup = f
}

//go:linkname boring_registerCache crypto/internal/boring/bcache.registerCache
func boring_registerCache(p unsafe.Pointer) {
	boringCaches = append(boringCaches, p)
}

func clearpools() {
	// clear sync.Pools
	if poolcleanup != nil {
		poolcleanup()
	}

	// clear boringcrypto caches
	for _, p := range boringCaches {
		atomicstorep(p, nil)
	}

	// Clear central sudog cache.
	// Leave per-P caches alone, they have strictly bounded size.
	// Disconnect cached list before dropping it on the floor,
	// so that a dangling ref to one entry does not pin all of them.
	lock(&sched.sudoglock)
	var sg, sgnext *sudog
	for sg = sched.sudogcache; sg != nil; sg = sgnext {
		sgnext = sg.next
		sg.next = nil
	}
	sched.sudogcache = nil
	unlock(&sched.sudoglock)

	// Clear central defer pool.
	// Leave per-P pools alone, they have strictly bounded size.
	lock(&sched.deferlock)
	// disconnect cached list before dropping it on the floor,
	// so that a dangling ref to one entry does not pin all of them.
	var d, dlink *_defer
	for d = sched.deferpool; d != nil; d = dlink {
		dlink = d.link
		d.link = nil
	}
	sched.deferpool = nil
	unlock(&sched.deferlock)
}

// Timing

// itoaDiv formats val/(10**dec) into buf.
func itoaDiv(buf []byte, val uint64, dec int) []byte {
	i := len(buf) - 1
	idec := i - dec
	for val >= 10 || i >= idec {
		buf[i] = byte(val%10 + '0')
		i--
		if i == idec {
			buf[i] = '.'
			i--
		}
		val /= 10
	}
	buf[i] = byte(val + '0')
	return buf[i:]
}

// fmtNSAsMS nicely formats ns nanoseconds as milliseconds.
func fmtNSAsMS(buf []byte, ns uint64) []byte {
	if ns >= 10e6 {
		// Format as whole milliseconds.
		return itoaDiv(buf, ns/1e6, 0)
	}
	// Format two digits of precision, with at most three decimal places.
	x := ns / 1e3
	if x == 0 {
		buf[0] = '0'
		return buf[:1]
	}
	dec := 3
	for x >= 100 {
		x /= 10
		dec--
	}
	return itoaDiv(buf, x, dec)
}

// Helpers for testing GC.

// gcTestMoveStackOnNextCall causes the stack to be moved on a call
// immediately following the call to this.
// It may not work correctly
// if any other work appears after this call (such as returning).
// Typically the following call should be marked go:noinline so it
// performs a stack check.
//
// In rare cases this may not cause the stack to move, specifically if
// there's a preemption between this call and the next.
func gcTestMoveStackOnNextCall() {
	gp := getg()
	gp.stackguard0 = stackForceMove
}

// gcTestIsReachable performs a GC and returns a bit set where bit i
// is set if ptrs[i] is reachable.
func gcTestIsReachable(ptrs ...unsafe.Pointer) (mask uint64) {
	// This takes the pointers as unsafe.Pointers in order to keep
	// them live long enough for us to attach specials. After
	// that, we drop our references to them.

	if len(ptrs) > 64 {
		panic("too many pointers for uint64 mask")
	}

	// Block GC while we attach specials and drop our references
	// to ptrs. Otherwise, if a GC is in progress, it could mark
	// them reachable via this function before we have a chance to
	// drop them.
	semacquire(&gcsema)

	// Create reachability specials for ptrs.
	specials := make([]*specialReachable, len(ptrs))
	for i, p := range ptrs {
		lock(&mheap_.speciallock)
		s := (*specialReachable)(mheap_.specialReachableAlloc.alloc())
		unlock(&mheap_.speciallock)
		s.special.kind = _KindSpecialReachable
		if !addspecial(p, &s.special) {
			throw("already have a reachable special (duplicate pointer?)")
		}
		specials[i] = s
		// Make sure we don't retain ptrs.
		ptrs[i] = nil
	}

	semrelease(&gcsema)

	// Force a full GC and sweep.
	GC()

	// Process specials.
	for i, s := range specials {
		if !s.done {
			printlock()
			println("runtime: object", i, "was not swept")
			throw("IsReachable failed")
		}
		if s.reachable {
			mask |= 1 << i
		}
		lock(&mheap_.speciallock)
		mheap_.specialReachableAlloc.free(unsafe.Pointer(s))
		unlock(&mheap_.speciallock)
	}

	return mask
}

// gcTestPointerClass returns the category of what p points to, one of:
// "heap", "stack", "data", "bss", "other". This is useful for checking
// that a test is doing what it's intended to do.
//
// This is nosplit simply to avoid extra pointer shuffling that may
// complicate a test.
//
//go:nosplit
func gcTestPointerClass(p unsafe.Pointer) string {
	p2 := uintptr(noescape(p))
	gp := getg()
	if gp.stack.lo <= p2 && p2 < gp.stack.hi {
		return "stack"
	}
	if base, _, _ := findObject(p2, 0, 0); base != 0 {
		return "heap"
	}
	for _, datap := range activeModules() {
		if datap.data <= p2 && p2 < datap.edata || datap.noptrdata <= p2 && p2 < datap.enoptrdata {
			return "data"
		}
		if datap.bss <= p2 && p2 < datap.ebss || datap.noptrbss <= p2 && p2 <= datap.enoptrbss {
			return "bss"
		}
	}
	KeepAlive(p)
	return "other"
}
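// What follows is an illustrative sketch, not part of the runtime: it walks
// the timing helpers above through concrete values. The exampleFmtNSAsMS name
// is hypothetical. For 1234567ns, fmtNSAsMS is below the 10e6ns cutoff, reduces
// the value to microseconds (1234), scales it to two digits of precision
// (12 with one decimal place), and itoaDiv then renders "1.2".
func exampleFmtNSAsMS() {
	var buf [32]byte

	ms := fmtNSAsMS(buf[:], 1234567) // ~1.23ms prints as "1.2"
	println("1234567ns ->", string(ms), "ms")

	// itoaDiv formats val/(10**dec): 12345 with dec=3 renders "12.345".
	fixed := itoaDiv(buf[:], 12345, 3)
	println(string(fixed))
}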
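// The pair of functions below is an illustrative sketch, not part of the
// runtime; the example* names are hypothetical. It shows the usage pattern the
// gcTestMoveStackOnNextCall comment describes: arm the stack guard, then make
// exactly one call that is kept out-of-line with go:noinline so its prologue
// performs a stack check and (normally) moves the stack.
//
//go:noinline
func exampleStackMoveCallee(x *int) {
	// The call below keeps this function non-leaf so its prologue performs a
	// stack check. x still points at the caller's variable because stack
	// copying adjusts pointers into the old stack.
	println("running on the (possibly) moved stack, x =", *x)
}

func exampleStackMove() {
	x := 42
	gcTestMoveStackOnNextCall()
	exampleStackMoveCallee(&x) // stack check on entry triggers the move
}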
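// The sketch below is illustrative only, not part of the runtime, and the
// exampleReachability name is hypothetical. It shows how the two test helpers
// above are meant to be combined: gcTestPointerClass classifies where a
// pointer points, and gcTestIsReachable forces a GC and reports, as a bitmask,
// which of the given objects survived it.
func exampleReachability() {
	live := new(int)    // stays reachable through the variable below
	garbage := new(int) // dropped before the GC runs

	if gcTestPointerClass(unsafe.Pointer(live)) != "heap" {
		println("expected a heap pointer")
	}

	mask := gcTestIsReachable(unsafe.Pointer(live), unsafe.Pointer(garbage))
	// Bit 0 (live) should be set; bit 1 (garbage) should usually be clear,
	// since gcTestIsReachable drops its own references before forcing the GC.
	println("reachability mask:", mask)
	KeepAlive(live)
}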