GoKer

The following examples are obtained from the publication “GoBench: A Benchmark Suite of Real-World Go Concurrency Bugs” (doi:10.1109/CGO51591.2021.9370317).

Authors:
Ting Yuan (yuanting@ict.ac.cn): State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China;
Guangwei Li (liguangwei@ict.ac.cn): State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China;
Jie Lu† (lujie@ict.ac.an): State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences;
Chen Liu (liuchen17z@ict.ac.cn): State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China;
Lian Li (lianli@ict.ac.cn): State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China;
Jingling Xue (jingling@cse.unsw.edu.au): University of New South Wales, School of Computer Science and Engineering, Sydney, Australia

White paper: https://lujie.ac.cn/files/papers/GoBench.pdf

The examples have been modified in order to run the goroutine leak profiler. Buggy snippets are moved from within a unit test to separate applications. Each is then independently executed, possibly as multiple copies within the same application in order to exercise more interleavings. Concurrently, the main program sets up a waiting period (typically 1ms), followed by a goroutine leak profile request. Other modifications may involve injecting calls to runtime.Gosched(), to more reliably exercise buggy interleavings, or reductions in waiting periods when calling time.Sleep, in order to reduce overall testing time.
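The shape of such a harness can be sketched as follows. This is a minimal illustration, not the actual test driver: `leakySnippet` and `countLeaks` are hypothetical names, and `runtime.NumGoroutine` is used here as a simple stand-in for the goroutine leak profile, which this sketch does not request.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// leakySnippet is a hypothetical buggy example: it starts a goroutine that
// blocks forever sending on a channel that has no receiver.
func leakySnippet() {
	ch := make(chan struct{})
	go func() {
		ch <- struct{}{} // never completes: nobody else holds ch
	}()
}

// countLeaks runs several copies of the snippet, waits 1ms, and returns how
// many goroutines were added and are still alive after the wait.
func countLeaks(copies int) int {
	before := runtime.NumGoroutine()
	for i := 0; i < copies; i++ {
		leakySnippet()
	}
	time.Sleep(time.Millisecond)
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("goroutines still blocked:", countLeaks(10))
}
```

Running several copies in one process, as the text describes, gives the scheduler more opportunities to hit the buggy interleaving within the short waiting period.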

The resulting goroutine leak profile is analyzed to ensure that no unexpected leaks occurred, and that the expected leaks did occur. If the leak is flaky, the only purpose of the expected leak list is to protect against unexpected leaks.

The examples have also been modified to remove data races, since those cause flaky test failures and only leaked goroutines are of interest here.

The entries below document each of the corresponding leaks.

Cockroach/10214

Bug ID Ref Patch Type Sub-type
cockroach#10214 pull request patch Resource AB-BA leak

Description

This goroutine leak is caused by acquiring coalescedMu.Lock() and raftMu.Lock() in inconsistent order. The fix is to refactor sendQueuedHeartbeats() so that cockroachdb unlocks coalescedMu before locking raftMu.

Example execution

G1                                      G2
------------------------------------------------------------------------------------
s.sendQueuedHeartbeats()                .
s.coalescedMu.Lock() [L1]               .
s.sendQueuedHeartbeatsToNode()          .
s.mu.replicas[0].reportUnreachable()    .
s.mu.replicas[0].raftMu.Lock() [L2]     .
.                                       s.mu.replicas[0].tick()
.                                       s.mu.replicas[0].raftMu.Lock() [L2]
.                                       s.mu.replicas[0].tickRaftMuLocked()
.                                       s.mu.replicas[0].mu.Lock() [L3]
.                                       s.mu.replicas[0].maybeQuiesceLocked()
.                                       s.mu.replicas[0].maybeCoalesceHeartbeat()
.                                       s.coalescedMu.Lock() [L1]
--------------------------------G1,G2 leak------------------------------------------
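The fix described above (release the first lock before taking the second) can be sketched as below. The `store` type and its fields are hypothetical, stripped-down stand-ins and do not reproduce the real cockroachdb code; only the lock ordering is the point.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical stand-in for the real type.
type store struct {
	coalescedMu sync.Mutex
	raftMu      sync.Mutex
	queued      []int // guarded by coalescedMu
	sent        int   // guarded by raftMu
}

// sendQueuedHeartbeats copies the queue under coalescedMu, unlocks, and only
// then takes raftMu, so the two locks are never held together on this path.
func (s *store) sendQueuedHeartbeats() {
	s.coalescedMu.Lock()
	batch := s.queued
	s.queued = nil
	s.coalescedMu.Unlock()

	s.raftMu.Lock()
	s.sent += len(batch)
	s.raftMu.Unlock()
}

// tick takes raftMu first and coalescedMu second, as G2 does in the trace.
func (s *store) tick(beat int) {
	s.raftMu.Lock()
	defer s.raftMu.Unlock()
	s.coalescedMu.Lock()
	s.queued = append(s.queued, beat)
	s.coalescedMu.Unlock()
}

// run exercises both paths concurrently and returns the heartbeats sent.
func run(n int) int {
	s := &store{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(2)
		go func(i int) { defer wg.Done(); s.tick(i) }(i)
		go func() { defer wg.Done(); s.sendQueuedHeartbeats() }()
	}
	wg.Wait()
	s.sendQueuedHeartbeats() // flush whatever is still queued
	s.raftMu.Lock()
	defer s.raftMu.Unlock()
	return s.sent
}

func main() {
	fmt.Println("heartbeats sent:", run(100))
}
```

Because one path never holds both locks at once, no cycle can form in the lock order regardless of interleaving.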

Cockroach/1055

Bug ID Ref Patch Type Sub-type
cockroach#1055 pull request patch Mixed Channel & WaitGroup

Description

  1. Stop() is called and blocked at s.stop.Wait() after acquiring the lock.
  2. StartTask() is called and attempts to acquire the lock. It is then blocked.
  3. Stop() never finishes since the task doesn’t call SetStopped.

Example execution

G1                      G2.0                       G2.1                       G2.2                       G3
-------------------------------------------------------------------------------------------------------------------------------
s[0].stop.Add(1) [1]
go func() [G2.0]
s[1].stop.Add(1) [1]    .
go func() [G2.1]        .
s[2].stop.Add(1) [1]    .                          .
go func() [G2.2]        .                          .
go func() [G3]          .                          .                          .
<-done                  .                          .                          .                          .
.                       s[0].StartTask()           .                          .                          .
.                       s[0].draining == 0         .                          .                          .
.                       .                          s[1].StartTask()           .                          .
.                       .                          s[1].draining == 0         .                          .
.                       .                          .                          s[2].StartTask()           .
.                       .                          .                          s[2].draining == 0         .
.                       .                          .                          .                          s[0].Quiesce()
.                       .                          .                          .                          s[0].mu.Lock() [L1[0]]
.                       s[0].mu.Lock() [L1[0]]     .                          .                          .
.                       s[0].drain.Add(1) [1]      .                          .                          .
.                       s[0].mu.Unlock() [L1[0]]   .                          .                          .
.                       <-s[0].ShouldStop()        .                          .                          .
.                       .                          .                          .                          s[0].draining = 1
.                       .                          .                          .                          s[0].drain.Wait()
.                       .                          s[1].mu.Lock() [L1[1]]     .                          .
.                       .                          s[1].drain.Add(1) [1]      .                          .
.                       .                          s[1].mu.Unlock() [L1[1]]   .                          .
.                       .                          <-s[1].ShouldStop()        .                          .
.                       .                          .                          s[2].mu.Lock() [L1[2]]     .
.                       .                          .                          s[2].drain.Add(1) [1]      .
.                       .                          .                          s[2].mu.Unlock() [L1[2]]   .
.                       .                          .                          <-s[2].ShouldStop()        .
----------------------------------------------------G1, G2.[0..2], G3 leak-----------------------------------------------------

Cockroach/10790

Bug ID Ref Patch Type Sub-type
cockroach#10790 pull request patch Communication Channel & Context

Description

It is possible that a message from ctxDone will make beginCmds return without draining the channel ch, so that anonymous function goroutines will leak.

Example execution

G1                  G2              helper goroutine
-----------------------------------------------------
.                   .               r.sendChans()
r.beginCmds()       .               .
.                   .               ch1 <- true
<- ch1              .               .
.                   .               ch2 <- true
...
.                   cancel()
<- ch1
------------------G1 leak----------------------------
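One common way to keep senders from blocking once the receiver has returned on cancellation is to let each send also watch the context. The sketch below assumes this remedy for illustration; it is not taken from the actual patch, and `sendAll` and `run` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// sendAll is a hypothetical sender: each send also watches ctx, so it cannot
// block forever once the receiver has given up.
func sendAll(ctx context.Context, ch chan<- bool, n int) int {
	sent := 0
	for i := 0; i < n; i++ {
		select {
		case ch <- true:
			sent++
		case <-ctx.Done():
			return sent
		}
	}
	return sent
}

// run receives one value, cancels, and waits for the sender to exit.
// It returns how many values the sender managed to deliver.
func run() int {
	ctx, cancel := context.WithCancel(context.Background())
	ch := make(chan bool) // unbuffered: every send needs a receiver
	sent := 0
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		sent = sendAll(ctx, ch, 3)
	}()
	<-ch     // the receiver takes one value ...
	cancel() // ... then stops listening
	wg.Wait()
	return sent
}

func main() {
	fmt.Println("values delivered before cancellation:", run())
}
```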

Cockroach/13197

Bug ID Ref Patch Type Sub-type
cockroach#13197 pull request patch Communication Channel & Context

Description

A goroutine executing (*Tx).awaitDone() blocks waiting for a signal on context.Done().

Example execution

G1              G2
-------------------------------
begin()
.               awaitDone()
return          .
.               <-tx.ctx.Done()
-----------G2 leaks------------

Cockroach/13755

Bug ID Ref Patch Type Sub-type
cockroach#13755 pull request patch Communication Channel & Context

Description

The buggy code does not close the db query result (rows), so that one goroutine running (*Rows).awaitDone is blocked forever. The blocked goroutine is waiting for a cancellation signal from the context.

Example execution

G1                      G2
---------------------------------------
initContextClose()
.                       awaitDone()
return                  .
.                       <-tx.ctx.Done()
---------------G2 leaks----------------
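The relationship between closing the result and releasing the waiting goroutine can be sketched as follows. The `rows` type here is a hypothetical stand-in and not the database/sql implementation; it only illustrates that a watcher parked on `ctx.Done()` exits once `Close` cancels the context.

```go
package main

import (
	"context"
	"fmt"
)

// rows is a hypothetical query result that owns a watcher goroutine.
type rows struct {
	cancel context.CancelFunc
	exited chan struct{} // closed when the watcher goroutine returns
}

// open starts the watcher, which plays the role of awaitDone.
func open(parent context.Context) *rows {
	ctx, cancel := context.WithCancel(parent)
	r := &rows{cancel: cancel, exited: make(chan struct{})}
	go func() {
		defer close(r.exited)
		<-ctx.Done() // parked until the context is cancelled
	}()
	return r
}

// Close cancels the context and waits for the watcher to return.
func (r *rows) Close() {
	r.cancel()
	<-r.exited
}

// demo reports whether the watcher has exited after Close.
func demo() bool {
	r := open(context.Background())
	r.Close()
	select {
	case <-r.exited:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println("watcher exited after Close:", demo())
}
```

Omitting the `Close` call in this sketch leaves the watcher parked on `ctx.Done()` indefinitely, which mirrors the leak in the trace.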

Cockroach/1462

Bug ID Ref Patch Type Sub-type
cockroach#1462 pull request patch Mixed Channel & WaitGroup

Description

Executing <-stopper.ShouldStop() in processEventsUntil may cause goroutines created by lt.RunWorker in lt.start to be stuck sending a message over lt.Events. The main thread is then stuck at s.stop.Wait(), since the sender goroutines cannot call s.stop.Done().

Example execution

G1                                  G2                                G3
-------------------------------------------------------------------------------------------------------
NewLocalInterceptableTransport()
lt.start()
lt.stopper.RunWorker()
s.AddWorker()
s.stop.Add(1) [1]
go func() [G2]
stopper.RunWorker()                 .
s.AddWorker()                       .
s.stop.Add(1) [2]                   .
go func() [G3]                      .
s.Stop()                            .                                 .
s.Quiesce()                         .                                 .
.                                   select [default]                  .
.                                   lt.Events <- interceptMessage(0)  .
close(s.stopper)                    .                                 .
.                                   .                                 select [<-stopper.ShouldStop()]
.                                   .                                 <<<done>>>
s.stop.Wait()                       .
----------------------------------------------G1,G2 leak-----------------------------------------------

Cockroach/16167

Bug ID Ref Patch Type Sub-type
cockroach#16167 pull request patch Resource Double Locking

Description

This is another example of goroutine leaks caused by recursively acquiring an RWMutex. There are two lock variables (systemConfigCond and systemConfigMu) which refer to the same underlying lock. The leak involves two goroutines. The first acquires systemConfigMu.Lock(), then tries to acquire systemConfigMu.RLock(). The second acquires systemConfigMu.Lock(). If the second goroutine interleaves in between the two lock operations of the first goroutine, both goroutines will leak.

Example execution

G1                                G2
---------------------------------------------------------------
.                                 e.Start()
.                                 e.updateSystemConfig()
e.execParsed()                    .
e.systemConfigCond.L.Lock() [L1]  .
.                                 e.systemConfigMu.Lock() [L1]
e.systemConfigMu.RLock() [L1]     .
------------------------G1,G2 leak-----------------------------

Cockroach/18101

Bug ID Ref Patch Type Sub-type
cockroach#18101 pull request patch Resource Double Locking

Description

The context.Done() signal short-circuits the reader goroutine, but not the senders, leading them to leak.

Example execution

G1                      G2                   helper goroutine
--------------------------------------------------------------
restore()
.                       splitAndScatter()
<-readyForImportCh      .
<-readyForImportCh <==> readyForImportCh<-
...
.                       .                    cancel()
<<done>>                .                    <<done>>
                        readyForImportCh<-
-----------------------G2 leaks--------------------------------

Cockroach/2448

Bug ID Ref Patch Type Sub-type
cockroach#2448 pull request patch Communication Channel

Description

This bug is caused by two goroutines waiting for each other to unblock their channels:

  1. MultiRaft sends the commit event for the Membership change
  2. store.processRaft takes it and begins processing
  3. another command commits and triggers another sendEvent, but this blocks since store.processRaft isn’t ready for another select. Consequently the main MultiRaft loop is waiting for that as well.
  4. the Membership change was applied to the range, and the store now tries to execute the callback
  5. the callback tries to write to callbackChan, but that is consumed by the MultiRaft loop, which is currently waiting for store.processRaft to consume from the events channel, which it will only do after the callback has completed.

Example execution

G1                          G2
--------------------------------------------------------------------------
s.processRaft()             st.start()
select                      .
.                           select [default]
.                           s.handleWriteResponse()
.                           s.sendEvent()
.                           select
<-s.multiraft.Events <----> m.Events <- event
.                           select [default]
.                           s.handleWriteResponse()
.                           s.sendEvent()
.                           select [m.Events<-, <-s.stopper.ShouldStop()]
callback()                  .
select [
   m.callbackChan<-,
   <-s.stopper.ShouldStop()
]                           .
------------------------------G1,G2 leak----------------------------------

Cockroach/24808

Bug ID Ref Patch Type Sub-type
cockroach#24808 pull request patch Communication Channel

Description

When we Start the Compactor, it may already have received Suggestions, leaking the previously blocking write to a full channel.

Example execution

G1
------------------------------------------------
...
compactor.ch <-
compactor.Start()
compactor.ch <-
--------------------G1 leaks--------------------
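A blocking write to a full channel can be avoided with a non-blocking send. The sketch below shows that general Go idiom under the assumption that a dropped notification is acceptable when one is already pending; `notify` is a hypothetical helper and is not claimed to be the actual patch.

```go
package main

import "fmt"

// notify performs a non-blocking send. If the buffer is already full, a
// wake-up is pending anyway, so the extra notification is dropped.
func notify(ch chan struct{}) bool {
	select {
	case ch <- struct{}{}:
		return true
	default:
		return false
	}
}

func main() {
	ch := make(chan struct{}, 1)
	fmt.Println("first send delivered:", notify(ch))
	fmt.Println("second send delivered:", notify(ch))
}
```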

Cockroach/25456

Bug ID Ref Patch Type Sub-type
cockroach#25456 pull request patch Communication Channel

Description

When CheckConsistency (in the complete code) returns an error, the queue checks whether the store is draining to decide whether the error is worth logging. This check was incorrect and would block until the store actually started draining.

Example execution

G1
---------------------------------------
...
<-repl.store.Stopper().ShouldQuiesce()
---------------G1 leaks----------------
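Checking whether a signal channel has fired, without waiting for it, is usually written as a `select` with a `default` case. The sketch below illustrates that idiom; `isQuiescing` is a hypothetical helper, and the actual patch may differ.

```go
package main

import "fmt"

// isQuiescing reports whether the channel has been closed, without blocking.
// A bare receive (<-quiesce) would instead wait until the close happens.
func isQuiescing(quiesce <-chan struct{}) bool {
	select {
	case <-quiesce:
		return true
	default:
		return false
	}
}

func main() {
	quiesce := make(chan struct{})
	fmt.Println("before close:", isQuiescing(quiesce))
	close(quiesce)
	fmt.Println("after close:", isQuiescing(quiesce))
}
```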

Cockroach/35073

Bug ID Ref Patch Type Sub-type
cockroach#35073 pull request patch Communication Channel

Description

Previously, the outbox could fail during startup without closing its RowChannel. This could lead to goroutine leaks in rare cases due to channel communication mismatch.

Example execution

Cockroach/35931

Bug ID Ref Patch Type Sub-type
cockroach#35931 pull request patch Communication Channel

Description

Previously, if a processor that reads from multiple inputs was waiting on one input to provide more data, and the other input was full, and both inputs were connected to inbound streams, it was possible to cause goroutine leaks during flow cancellation when trying to propagate the cancellation metadata messages into the flow. The cancellation method wrote metadata messages to each inbound stream one at a time, so if the first one was full, the canceller would block and never send a cancellation message to the second stream, which was the one actually being read from.

Example execution

Cockroach/3710

Bug ID Ref Patch Type Sub-type
cockroach#3710 pull request patch Resource RWR Deadlock

Description

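The goroutine leak is caused by acquiring s.mu.RLock() twice along one call chain: ForceRaftLogScanAndProcess() (acquires s.mu.RLock()) -> MaybeAdd() -> shouldQueue() -> getTruncatableIndexes() -> RaftStatus() (acquires s.mu.RLock() again). If another goroutine calls s.mu.Lock() between the two read locks, the second RLock() queues behind the pending writer, which in turn waits for the first read lock to be released.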

Example execution

G1                                      G2
------------------------------------------------------------
store.ForceRaftLogScanAndProcess()
s.mu.RLock()
s.raftLogQueue.MaybeAdd()
bq.impl.shouldQueue()
getTruncatableIndexes()
r.store.RaftStatus()
.                                        store.processRaft()
.                                        s.mu.Lock()
s.mu.RLock()
----------------------G1,G2 leak-----------------------------
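The reason a nested read lock can block is that sync.RWMutex refuses new readers once a writer is waiting. The sketch below demonstrates this without hanging, by probing with `TryRLock` (available since Go 1.18) instead of calling `RLock` a second time. `secondReadBlocked` is a hypothetical helper written for this illustration.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// secondReadBlocked reports whether a nested read lock would block once a
// writer is queued behind the first read lock.
func secondReadBlocked() bool {
	var mu sync.RWMutex
	mu.RLock() // first read lock, as in ForceRaftLogScanAndProcess

	writerDone := make(chan struct{})
	go func() {
		mu.Lock() // waits for the reader above, as processRaft does
		mu.Unlock()
		close(writerDone)
	}()

	// Spin until the writer is queued: from then on new readers are refused.
	for mu.TryRLock() {
		mu.RUnlock()
		runtime.Gosched()
	}

	// Probe for the nested read lock; a plain RLock() would block here.
	nested := mu.TryRLock()
	if nested {
		mu.RUnlock()
	}

	mu.RUnlock() // release the first read lock so the writer can finish
	<-writerDone
	return !nested
}

func main() {
	fmt.Println("nested RLock would block:", secondReadBlocked())
}
```

In the buggy code the nested acquisition is a real `RLock()`, so the reader waits for the writer and the writer waits for the reader.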

Cockroach/584

Bug ID Ref Patch Type Sub-type
cockroach#584 pull request patch Resource Double Locking

Description

Missing call to mu.Unlock() before the break in the loop.

Example execution

G1
---------------------------
g.bootstrap()
g.mu.Lock() [L1]
if g.closed { ==> break
g.manage()
g.mu.Lock() [L1]
----------G1 leaks---------
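A minimal sketch of the corrected pattern follows: the mutex is released on every path out of the loop body, including the `break`. The `gossip` type and its fields are hypothetical and only loosely modelled on the names in the trace.

```go
package main

import (
	"fmt"
	"sync"
)

// gossip is a hypothetical stand-in for the real type.
type gossip struct {
	mu     sync.Mutex
	closed bool
	rounds int
}

// bootstrap unlocks on every exit from an iteration, including the break.
func (g *gossip) bootstrap(max int) {
	for i := 0; i < max; i++ {
		g.mu.Lock()
		if g.closed {
			g.mu.Unlock() // the step the buggy version skipped
			break
		}
		g.rounds++
		g.mu.Unlock()
	}
}

// manage needs the same lock afterwards; it would block forever if
// bootstrap had left the mutex locked.
func (g *gossip) manage() int {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.rounds
}

func main() {
	g := &gossip{closed: true}
	g.bootstrap(3)
	fmt.Println("rounds completed:", g.manage())
}
```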

Cockroach/6181

Bug ID Ref Patch Type Sub-type
cockroach#6181 pull request patch Resource RWR Deadlock

Description

The same RWMutex may be recursively acquired for both reading and writing.

Example execution

G1                                G2                               G3                     ...
-----------------------------------------------------------------------------------------------
testRangeCacheCoalescedRquests()
initTestDescriptorDB()
pauseLookupResumeAndAssert()
return
.                                 doLookupWithToken()
.                                 .                                doLookupWithToken()
.                                 rc.LookupRangeDescriptor()       .
.                                 .                                rc.LookupRangeDescriptor()
.                                 rdc.rangeCacheMu.RLock()         .
.                                 rdc.String()                     .
.                                 .                                rdc.rangeCacheMu.RLock()
.                                 .                                fmt.Printf()
.                                 .                                rdc.rangeCacheMu.RUnlock()
.                                 .                                rdc.rangeCacheMu.Lock()
.                                 rdc.rangeCacheMu.RLock()         .
-----------------------------------G2,G3,... leak----------------------------------------------

Cockroach/7504

Bug ID Ref Patch Type Sub-type
cockroach#7504 pull request patch Resource AB-BA Deadlock

Description

The locks are acquired as leaseState and tableNameCache in Release(), but as tableNameCache and leaseState in AcquireByName, leading to an AB-BA deadlock.

Example execution

G1                        G2
-----------------------------------------------------
mgr.AcquireByName()       mgr.Release()
m.tableNames.get(id)      .
c.mu.Lock() [L2]          .
.                         t.release(lease)
.                         t.mu.Lock() [L3]
.                         s.mu.Lock() [L1]
lease.mu.Lock() [L1]      .
.                         t.removeLease(s)
.                         t.tableNameCache.remove()
.                         c.mu.Lock() [L2]
---------------------G1, G2 leak---------------------

Cockroach/9935

Bug ID Ref Patch Type Sub-type
cockroach#9935 pull request patch Resource Double Locking

Description

This bug is caused by acquiring l.mu.Lock() twice.

Example execution

Etcd/10492

Bug ID Ref Patch Type Sub-type
etcd#10492 pull request patch Resource Double locking

Description

A simple double locking case for lines 19, 31.

Etcd/5509

Bug ID Ref Patch Type Sub-type
etcd#5509 pull request patch Resource Double locking

Description

r.acquire() returns holding r.client.mu.RLock() on a failure path (line 42). This causes any call to client.Close() to leak goroutines.

Etcd/6708

Bug ID Ref Patch Type Sub-type
etcd#6708 pull request patch Resource Double locking

Description

Double locking at lines 49 and 54.

Etcd/6857

Bug ID Ref Patch Type Sub-type
etcd#6857 pull request patch Communication Channel

Description

Choosing a different case in a select statement (n.stop) will lead to goroutine leaks when sending over n.status.

Example execution

G1              G2              G3
-------------------------------------------
n.run()         .               .
.               .               n.Stop()
.               .               n.stop<-
<-n.stop        .               .
.               .               <-n.done
close(n.done)   .               .
return          .               .
.               .               return
.               n.Status()
.               n.status<-
----------------G2 leaks-------------------
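A sender can avoid blocking on a loop that has already stopped by also selecting on the loop's done channel. The sketch below assumes this remedy for illustration; the `node` type is a hypothetical reduction, and the status value 1 is arbitrary.

```go
package main

import "fmt"

// node is a hypothetical reduction of the real type.
type node struct {
	status chan chan int
	stop   chan struct{}
	done   chan struct{}
}

func newNode() *node {
	return &node{
		status: make(chan chan int),
		stop:   make(chan struct{}),
		done:   make(chan struct{}),
	}
}

// run serves status requests until it is told to stop.
func (n *node) run() {
	defer close(n.done)
	for {
		select {
		case c := <-n.status:
			c <- 1 // arbitrary status value
		case <-n.stop:
			return
		}
	}
}

// Stop signals the loop and waits for it to exit.
func (n *node) Stop() {
	n.stop <- struct{}{}
	<-n.done
}

// Status returns false instead of blocking once the loop has exited.
func (n *node) Status() (int, bool) {
	c := make(chan int, 1)
	select {
	case n.status <- c:
		return <-c, true
	case <-n.done:
		return 0, false
	}
}

func main() {
	n := newNode()
	go n.run()
	fmt.Println(n.Status())
	n.Stop()
	fmt.Println(n.Status())
}
```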

Etcd/6873

Bug ID Ref Patch Type Sub-type
etcd#6873 pull request patch Mixed Channel & Lock

Description

This goroutine leak involves a goroutine acquiring a lock and being blocked over a channel operation with no partner, while another tries to acquire the same lock.

Example execution

G1                      G2                  G3
--------------------------------------------------------------
newWatchBroadcasts()
wbs.update()
wbs.updatec <-
return
.                       <-wbs.updatec       .
.                       wbs.coalesce()      .
.                       .                   wbs.stop()
.                       .                   wbs.mu.Lock()
.                       .                   close(wbs.updatec)
.                       .                   <-wbs.donec
.                       wbs.mu.Lock()       .
---------------------G2,G3 leak--------------------------------

Etcd/7492

Bug ID Ref Patch Type Sub-type
etcd#7492 pull request patch Mixed Channel & Lock

Description

This goroutine leak involves a goroutine acquiring a lock and being blocked over a channel operation with no partner, while another tries to acquire the same lock.

Example execution

G2                                    G1
---------------------------------------------------------------
.                                     stk.run()
ts.assignSimpleTokenToUser()          .
t.simpleTokensMu.Lock()               .
t.simpleTokenKeeper.addSimpleToken()  .
tm.addSimpleTokenCh <- true           .
.                                     <-tm.addSimpleTokenCh
t.simpleTokensMu.Unlock()             .
ts.assignSimpleTokenToUser()          .
...
t.simpleTokensMu.Lock()
.                                     <-tokenTicker.C
tm.addSimpleTokenCh <- true           .
.                                     tm.deleteTokenFunc()
.                                     t.simpleTokensMu.Lock()
---------------------------G1,G2 leak--------------------------

Etcd/7902

Bug ID Ref Patch Type Sub-type
etcd#7902 pull request patch Mixed Channel & Lock

Description

If the follower goroutine acquires mu.Lock() first and calls rc.release(), it will be blocked sending over rcNextc. Only the leader can close(nextc) to unblock the follower. However, in order to invoke rc.release(), the leader needs to acquire mu.Lock(). The fix is to remove the lock and unlock around rc.release().

Example execution

G1                      G2 (leader)              G3 (follower)
---------------------------------------------------------------------
runElectionFunc()
doRounds()
wg.Wait()
.                       ...
.                       mu.Lock()
.                       rc.validate()
.                       rcNextc = nextc
.                       mu.Unlock()              ...
.                       .                        mu.Lock()
.                       .                        rc.validate()
.                       .                        mu.Unlock()
.                       .                        mu.Lock()
.                       .                        rc.release()
.                       .                        <-rcNextc
.                       mu.Lock()
-------------------------G1,G2,G3 leak--------------------------

Grpc/1275

Bug ID Ref Patch Type Sub-type
grpc#1275 pull request patch Communication Channel

Description

Two goroutines are involved in this leak. The main goroutine is blocked at case <-donec, and is waiting for the second goroutine to close the channel. The second goroutine is created by the main goroutine. It is blocked when calling stream.Read(), which invokes recvBufferRead.Read(). The second goroutine is blocked at case i := r.recv.get(), and it is waiting for someone to send a message to this channel. It is the client.CloseStream() method called by the main goroutine that should send the message, but it does not. The patch is to send out this message.

Example execution

G1                            G2
-----------------------------------------------------
testInflightStreamClosing()
.                             stream.Read()
.                             io.ReadFull()
.                             <-r.recv.get()
CloseStream()
<-donec
---------------------G1, G2 leak---------------------

Grpc/1424

Bug ID Ref Patch Type Sub-type
grpc#1424 pull request patch Communication Channel

Description

The goroutine running cc.lbWatcher returns without draining the done channel.

Example execution

G1                      G2                          G3
-----------------------------------------------------------------
DialContext()           .                           .
.                       cc.dopts.balancer.Notify()  .
.                       .                           cc.lbWatcher()
.                       <-doneChan
close()
---------------------------G2 leaks-------------------------------

Grpc/1460

Bug ID Ref Patch Type Sub-type
grpc#1460 pull request patch Mixed Channel & Lock

Description

When gRPC keepalives are enabled (which isn’t the case by default at this time) and PermitWithoutStream is false (the default), the client can leak goroutines when transitioning between having no active stream and having one active stream. The keepalive() goroutine is stuck at “<-t.awakenKeepalive”, while the main goroutine is stuck in NewStream() on t.mu.Lock().

Example execution

G1                      G2
--------------------------------------------
client.keepalive()
.                       client.NewStream()
t.mu.Lock()
<-t.awakenKeepalive
.                       t.mu.Lock()
---------------G1,G2 leak-------------------

Grpc/3017

Bug ID Ref Patch Type Sub-type
grpc#3017 pull request patch Resource Missing unlock

Description

Line 65 is an execution path with a missing unlock.

Example execution

G1                                  G2                                         G3
------------------------------------------------------------------------------------------------
NewSubConn([1])
ccc.mu.Lock() [L1]
sc = 1
ccc.subConnToAddr[1] = 1
go func() [G2]
<-done                              .
.                                   ccc.RemoveSubConn(1)
.                                   ccc.mu.Lock()
.                                   addr = 1
.                                   entry = &subConnCacheEntry_grpc3017{}
.                                   cc.subConnCache[1] = entry
.                                   timer = time.AfterFunc() [G3]
.                                   entry.cancel = func()
.                                   sc = ccc.NewSubConn([1])
.                                   ccc.mu.Lock() [L1]
.                                   entry.cancel()
.                                   !timer.Stop() [true]
.                                   entry.abortDeleting = true
.                                   .                                          ccc.mu.Lock()
.                                   .                                          <<<done>>>
.                                   ccc.RemoveSubConn(1)
.                                   ccc.mu.Lock() [L1]
-------------------------------------------G1, G2 leak-----------------------------------------

Grpc/660

Bug ID Ref Patch Type Sub-type
grpc#660 pull request patch Communication Channel

Description

The parent function could return without draining the done channel.

Example execution

G1                      G2              helper goroutine
-------------------------------------------------------------
doCloseLoopUnary()
.                                       bc.stop <- true
<-bc.stop
return
.                       done <-
----------------------G2 leak--------------------------------
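When a parent may return before reading a worker's result, giving the result channel a buffer of one lets the worker finish without a receiver. The sketch below shows that general idiom under stated assumptions: `runUntilStopped` is a hypothetical function, and this is not claimed to be how the gRPC patch resolves the issue.

```go
package main

import "fmt"

// runUntilStopped starts a worker and returns as soon as either the worker
// reports or stop fires. The returned channel is closed when the worker
// goroutine has exited.
func runUntilStopped(stop <-chan bool) <-chan struct{} {
	done := make(chan error, 1) // capacity 1: the send never needs a receiver
	exited := make(chan struct{})
	go func() {
		defer close(exited)
		done <- nil // completes even if the parent has already returned
	}()
	select {
	case <-stop:
		// The parent leaves without draining done.
	case <-done:
	}
	return exited
}

func main() {
	stop := make(chan bool, 1)
	stop <- true
	<-runUntilStopped(stop) // returns because the worker was able to exit
	fmt.Println("worker exited")
}
```

With an unbuffered `done` channel, the same program could leave the worker blocked on its send whenever the `stop` case is chosen.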

Grpc/795

Bug ID Ref Patch Type Sub-type
grpc#795 pull request patch Resource Double locking

Description

Line 20 is an execution path with a missing unlock.

Grpc/862

Bug ID Ref Patch Type Sub-type
grpc#862 pull request patch Communication Channel & Context

Description

When the return value conn is nil, cc (the ClientConn) is not closed, so the goroutine executing resetAddrConn is leaked. The patch is to close the ClientConn in a deferred function.

Example execution

G1                  G2
---------------------------------------
DialContext()
.                   cc.resetAddrConn()
.                   resetTransport()
.                   <-ac.ctx.Done()
--------------G2 leak------------------

Hugo/3251

Bug ID Ref Patch Type Sub-type
hugo#3251 pull request patch Resource RWR deadlock

Description

A goroutine can hold Lock() at line 20 then acquire RLock() at line 29. RLock() at line 29 will never be acquired because Lock() at line 20 will never be released.

Example execution

G1                        G2                             G3
------------------------------------------------------------------------------------------
wg.Add(1) [W1: 1]
go func() [G2]
go func() [G3]
.                        resGetRemote()
.                        remoteURLLock.URLLock(url)
.                        l.Lock() [L1]
.                        l.m[url] = &sync.Mutex{} [L2]
.                        l.m[url].Lock() [L2]
.                        l.Unlock() [L1]
.                        .                               resGetRemote()
.                        .                               remoteURLLock.URLLock(url)
.                        .                               l.Lock() [L1]
.                        .                               l.m[url].Lock() [L2]
.                        remoteURLLock.URLUnlock(url)
.                        l.RLock() [L1]
...
wg.Wait() [W1]
----------------------------------------G1,G2,G3 leak--------------------------------------

Hugo/5379

Bug ID Ref Patch Type Sub-type
hugo#5379 pull request patch Resource Double locking

Description

A goroutine first acquires contentInitMu at line 99, then acquires the same Mutex again at line 66.

Istio/16224

Bug ID Ref Patch Type Sub-type
istio#16224 pull request patch Mixed Channel & Lock

Description

A goroutine holds a Mutex at line 91 and is then blocked at line 93. Another goroutine attempts to acquire the same Mutex at line 101 in order to drain the same channel at line 103.

Istio/17860

Bug ID Ref Patch Type Sub-type
istio#17860 pull request patch Communication Channel

Description

a.statusCh can’t be drained at line 70.

Istio/18454

Bug ID Ref Patch Type Sub-type
istio#18454 pull request patch Communication Channel & Context

Description

s.timer.Stop() at lines 56 and 61 can be called concurrently (i.e. from their entry points at line 104 and line 66). See Timer.

Kubernetes/10182

Bug ID Ref Patch Type Sub-type
kubernetes#10182 pull request patch Mixed Channel & Lock

Description

Goroutine 1 is blocked on a lock held by goroutine 3, while goroutine 3 is blocked on sending a message to ch, which is read by goroutine 1.

Example execution

G1                         G2                           G3
-------------------------------------------------------------------------------
s.Start()
s.syncBatch()
.                          s.SetPodStatus()
.                          s.podStatusesLock.Lock()
<-s.podStatusChannel <===> s.podStatusChannel <- true
.                          s.podStatusesLock.Unlock()
.                          return
s.DeletePodStatus()        .
.                          .                            s.podStatusesLock.Lock()
.                          .                            s.podStatusChannel <- true
s.podStatusesLock.Lock()
-----------------------------G1,G3 leak-----------------------------------------

Kubernetes/11298

Bug ID Ref Patch Type Sub-type
kubernetes#11298 pull request patch Communication Channel & Condition Variable

Description

n.node uses n.lock as its underlying locker. The service loop initially locks it, and the Notify function tries to lock it before calling n.node.Signal(), leading to a goroutine leak. The calls to n.cond.Signal() at line 59 and line 81 are not guaranteed to unblock the n.cond.Wait() at line 56.

Kubernetes/13135

Bug ID Ref Patch Type Sub-type
kubernetes#13135 pull request patch Resource AB-BA deadlock

Description

G1                              G2                              G3
----------------------------------------------------------------------------------
NewCacher()
watchCache.SetOnReplace()
watchCache.SetOnEvent()
.                               cacher.startCaching()
.                               c.Lock()
.                               c.reflector.ListAndWatch()
.                               r.syncWith()
.                               r.store.Replace()
.                               w.Lock()
.                               w.onReplace()
.                               cacher.initOnce.Do()
.                               cacher.Unlock()
return cacher                   .
.                               .                               c.watchCache.Add()
.                               .                               w.processEvent()
.                               .                               w.Lock()
.                               cacher.startCaching()           .
.                               c.Lock()                        .
...
.                                                               c.Lock()
.                               w.Lock()
--------------------------------G2,G3 leak-----------------------------------------

Kubernetes/1321

Bug ID Ref Patch Type Sub-type
kubernetes#1321 pull request patch Mixed Channel & Lock

Description

This is a lock-channel bug. The first goroutine invokes distribute(), which holds m.lock.Lock() while blocked sending a message to w.result. The second goroutine invokes stopWatching(), which can unblock the first goroutine by closing w.result. However, in order to close w.result, stopWatching() needs to acquire m.lock.Lock().

The fix is to introduce another channel and receive from it in the same select statement as the send to w.result. Closing the second channel unblocks the first goroutine without needing to hold m.lock.Lock().

Example execution

G1                          G2
----------------------------------------------
testMuxWatcherClose()
NewMux()
.                           m.loop()
.                           m.distribute()
.                           m.lock.Lock()
.                           w.result <- true
w := m.Watch()
w.Stop()
mw.m.stopWatching()
m.lock.Lock()
---------------G1,G2 leak---------------------
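
The fix described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the upstream patch: the types mux and watcher, the stopped channel, and all function names are hypothetical stand-ins for the code in the bug report.

```go
package main

import (
	"fmt"
	"sync"
)

// watcher is a hypothetical stand-in for the watcher in the bug report.
type watcher struct {
	result  chan bool     // unbuffered: a send blocks until someone receives
	stopped chan struct{} // the second channel introduced by the fix
	once    sync.Once
}

// stop closes the second channel. It does not need the mux lock.
func (w *watcher) stop() {
	w.once.Do(func() { close(w.stopped) })
}

// mux is a hypothetical stand-in for the multiplexer that owns the lock.
type mux struct {
	lock     sync.Mutex
	watchers []*watcher
}

// distribute holds the lock while sending, but each send is abandoned as
// soon as the watcher is stopped. It returns the number of deliveries.
func (m *mux) distribute() int {
	m.lock.Lock()
	defer m.lock.Unlock()
	delivered := 0
	for _, w := range m.watchers {
		select {
		case w.result <- true:
			delivered++
		case <-w.stopped:
			// The watcher is gone; skip it instead of blocking forever.
		}
	}
	return delivered
}

// stoppedWatcherDeliveries builds a mux with one watcher that nobody reads
// from, stops it, and reports how many sends distribute completed.
func stoppedWatcherDeliveries() int {
	w := &watcher{result: make(chan bool), stopped: make(chan struct{})}
	m := &mux{watchers: []*watcher{w}}
	w.stop()
	return m.distribute()
}

func main() {
	fmt.Println("deliveries:", stoppedWatcherDeliveries())
}
```

Because stop() never touches the lock, it can run while distribute() is blocked inside the select, which is the property the fix relies on.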

Kubernetes/25331

Bug ID Ref Patch Type Sub-type
kubernetes#25331 pull request patch Communication Channel & Context

Description

A goroutine may leak when an error occurs while Stop() is cancelling the context: the goroutine then blocks sending on resultChan.

Example execution

G1                    G2
------------------------------------
wc.run()
.                   wc.Stop()
.                   wc.errChan <-
.                   wc.cancel()
<-wc.errChan
wc.cancel()
wc.resultChan <-
-------------G1 leak----------------

Kubernetes/26980

Bug ID Ref Patch Type Sub-type
kubernetes#26980 pull request patch Mixed Channel & Lock

Description

A goroutine holds a Mutex at line 24 and is blocked at line 35. Another goroutine is blocked at line 58 trying to acquire the same Mutex.

Kubernetes/30872

Bug ID Ref Patch Type Sub-type
kubernetes#30872 pull request patch Resource AB-BA deadlock

Description

The lock is acquired both at lines 92 and 157.

Kubernetes/38669

Bug ID Ref Patch Type Sub-type
kubernetes#38669 pull request patch Communication Channel

Description

No sender for line 33.

Kubernetes/5316

Bug ID Ref Patch Type Sub-type
kubernetes#5316 pull request patch Communication Channel

Description

If the main goroutine selects a case that doesn’t consume the channels, the anonymous goroutine will be blocked sending to the channel.

Example execution

G1                      G2
--------------------------------------
finishRequest()
.                       fn()
time.After()
.                       errCh<-/ch<-
--------------G2 leaks----------------
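
One common remedy for this pattern is to give the result channel a buffer of one, so the worker's send completes even when the caller has already taken the timeout branch. The sketch below assumes that remedy; it is not necessarily the upstream patch. The finishRequest signature is simplified, and errTimeout, workers, and slowRequestError are hypothetical helpers added for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var errTimeout = errors.New("request timed out")

// workers tracks the helper goroutines so the example can show that they
// exit. It is an illustration aid, not part of the original code.
var workers sync.WaitGroup

// finishRequest runs fn in a goroutine and waits for its result or a timeout.
// The channel has capacity 1, so the send below never blocks.
func finishRequest(timeout time.Duration, fn func() string) (string, error) {
	ch := make(chan string, 1)
	workers.Add(1)
	go func() {
		defer workers.Done()
		ch <- fn() // completes even if nobody receives
	}()
	select {
	case res := <-ch:
		return res, nil
	case <-time.After(timeout):
		return "", errTimeout
	}
}

// slowRequestError issues a request whose work outlasts the timeout, then
// waits for the worker goroutine to exit and returns the request error.
func slowRequestError() error {
	release := make(chan struct{})
	_, err := finishRequest(time.Millisecond, func() string {
		<-release // simulate work that finishes after the timeout
		return "late"
	})
	close(release)
	workers.Wait() // would hang here if the worker were leaked
	return err
}

func main() {
	fmt.Println("error:", slowRequestError())
}
```

With an unbuffered channel instead, workers.Wait() would never return, which matches the G2 leak in the diagram.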

Kubernetes/58107

Bug ID Ref Patch Type Sub-type
kubernetes#58107 pull request patch Resource RWR deadlock

Description

Read-write locks follow two rules: multiple read locks may be held concurrently, and a pending write lock has priority over new read locks.

Two queues (queue 1 and queue 2) are involved in this bug, and both are protected by the same read-write lock (rq.workerLock). Before getting an element from queue 1 or queue 2, rq.workerLock.RLock() is acquired. If the queue is empty, cond.Wait() is invoked. Another goroutine (goroutine D) periodically invokes rq.workerLock.Lock(). The deadlock happens in the following situation. Queue 1 is empty, so some goroutines hold rq.workerLock.RLock() and block at cond.Wait(). Goroutine D is blocked acquiring rq.workerLock.Lock(). Other goroutines try to process jobs in queue 2, but they are blocked acquiring rq.workerLock.RLock(), since the pending write lock has higher priority.

The fix is to not hold rq.workerLock.RLock() while pulling data from any queue. Therefore, when a goroutine is blocked at cond.Wait(), rq.workerLock.RLock() is not held.

Example execution

G3                      G4                      G5
--------------------------------------------------------------------
.                       .                       Sync()
rq.workerLock.RLock()   .                       .
q.cond.Wait()           .                       .
.                       .                       rq.workerLock.Lock()
.                       rq.workerLock.RLock()
.                       q.cond.L.Lock()
-----------------------------G3,G4,G5 leak-----------------------------

Kubernetes/62464

Bug ID Ref Patch Type Sub-type
kubernetes#62464 pull request patch Resource RWR deadlock

Description

This is another example of a recursive read lock bug. The Go developers have noted that RLock should not be used recursively in the same goroutine.

Example execution

G1                                  G2
--------------------------------------------------------
m.reconcileState()
m.state.GetCPUSetOrDefault()
s.RLock()
s.GetCPUSet()
.                                   p.RemoveContainer()
.                                   s.GetDefaultCPUSet()
.                                   s.SetDefaultCPUSet()
.                                   s.Lock()
s.RLock()
---------------------G1,G2 leak--------------------------
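
The interleaving above can be reproduced without deadlocking by replacing the second RLock with TryRLock (available since Go 1.18), which reports failure instead of blocking. The following is a minimal sketch; the function name secondRLockSucceeds and the polling loop are assumptions made for illustration and do not appear in the original code.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// secondRLockSucceeds holds a read lock, lets a writer queue up behind it,
// and then reports whether a second read lock could be taken.
func secondRLockSucceeds() bool {
	var mu sync.RWMutex
	mu.RLock() // first read lock, as taken by G1

	writerDone := make(chan struct{})
	go func() { // plays the role of G2
		mu.Lock()
		mu.Unlock()
		close(writerDone)
	}()

	// Wait until the writer is pending. While no writer is queued, TryRLock
	// succeeds and the extra read lock is released straight away.
	for mu.TryRLock() {
		mu.RUnlock()
		runtime.Gosched()
	}

	// A writer is now waiting, so new readers are refused. A plain RLock
	// at this point would block forever, as in the diagram.
	ok := mu.TryRLock()
	if ok {
		mu.RUnlock()
	}

	mu.RUnlock() // release the first read lock so the writer can finish
	<-writerDone
	return ok
}

func main() {
	fmt.Println("second RLock succeeded:", secondRLockSucceeds())
}
```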

Kubernetes/6632

Bug ID Ref Patch Type Sub-type
kubernetes#6632 pull request patch Mixed Channel & Lock

Description

When resetChan is full, WriteFrame holds the lock and blocks on the channel. Then monitor() fails to close the resetChan because the lock is already held by WriteFrame.

Example execution

G1                      G2                  helper goroutine
----------------------------------------------------------------
i.monitor()
<-i.conn.closeChan
.                       i.WriteFrame()
.                       i.writeLock.Lock()
.                       i.resetChan <-
.                       .                   i.conn.closeChan<-
i.writeLock.Lock()
----------------------G1,G2 leak--------------------------------

Kubernetes/70277

Bug ID Ref Patch Type Sub-type
kubernetes#70277 pull request patch Communication Channel

Description

wait.poller() returns a function of type WaitFunc. That function creates a goroutine, and the goroutine exits only when after or done is closed.

The doneCh defined at line 70 is never closed.
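
The lifetime rule described above can be illustrated with a simplified poller. This is a sketch under stated assumptions, not the Kubernetes implementation: waitFunc, poller, and pollerExitsAfterDone are hypothetical names, and the tick-dropping behaviour is chosen only to keep the example short.

```go
package main

import (
	"fmt"
	"time"
)

// waitFunc starts a goroutine that ticks on the returned channel until
// done is closed or the timeout elapses. Hypothetical, simplified type.
type waitFunc func(done <-chan struct{}) <-chan struct{}

func poller(interval, timeout time.Duration) waitFunc {
	return func(done <-chan struct{}) <-chan struct{} {
		ch := make(chan struct{})
		go func() {
			defer close(ch) // signals that the goroutine has exited
			ticker := time.NewTicker(interval)
			defer ticker.Stop()
			var after <-chan time.Time // nil (never ready) when timeout is 0
			if timeout > 0 {
				timer := time.NewTimer(timeout)
				defer timer.Stop()
				after = timer.C
			}
			for {
				select {
				case <-ticker.C:
					select {
					case ch <- struct{}{}:
					default: // drop the tick if nobody is listening
					}
				case <-after:
					return
				case <-done:
					return
				}
			}
		}()
		return ch
	}
}

// pollerExitsAfterDone starts a poller without a timeout, closes done, and
// drains the tick channel until the goroutine closes it on exit.
func pollerExitsAfterDone() bool {
	done := make(chan struct{})
	ticks := poller(time.Millisecond, 0)(done)
	close(done) // without this call the polling goroutine never exits
	for range ticks {
	}
	return true
}

func main() {
	fmt.Println("poller exited:", pollerExitsAfterDone())
}
```

A caller that owns done would typically write defer close(done) right after creating it, so that every return path stops the poller.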

Moby/17176

Bug ID Ref Patch Type Sub-type
moby#17176 pull request patch Resource Double locking

Description

devices.nrDeletedDevices takes devices.Lock() but does not release it (line 36) if there are no deleted devices. This will block other goroutines trying to acquire devices.Lock().

Moby/21233

Bug ID Ref Patch Type Sub-type
moby#21233 pull request patch Communication Channel

Description

This test was checking that it received every progress update that was produced. But delivery of these intermediate progress updates is not guaranteed. A new update can overwrite the previous one if the previous one hasn’t been sent to the channel yet.

The call to t.Fatalf terminated the current goroutine which was consuming the channel, which caused a deadlock and eventual test timeout rather than a proper failure message.

Example execution

G1                      G2                  G3
----------------------------------------------------------
testTransfer()          .                   .
tm.Transfer()           .                   .
t.Watch()               .                   .
.                       WriteProgress()     .
.                       ProgressChan<-      .
.                       .                   <-progressChan
.                       ...                 ...
.                       return              .
.                                           <-progressChan
<-watcher.running
----------------------G1,G3 leak--------------------------

Moby/25384

Bug ID Ref Patch Type Sub-type
moby#25384 pull request patch Mixed Misuse WaitGroup

Description

When n=1 (where n is len(pm.plugins)), the location of group.Wait() doesn’t matter. When n > 1, group.Wait() is invoked in each iteration. Whenever group.Wait() is invoked, it waits for group.Done() to be executed n times. However, group.Done() is only executed once in one iteration.

Misuse of sync.WaitGroup
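
The counting argument above suggests waiting once, after the loop, rather than in each iteration. The following minimal sketch shows that placement; restoreAll and the restored counter are hypothetical stand-ins for the loop over pm.plugins, and the sketch is not the upstream patch.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// restoreAll registers n units of work up front, starts one goroutine per
// item, and waits exactly once after the loop. It returns how many
// goroutines completed.
func restoreAll(n int) int {
	var group sync.WaitGroup
	var restored int64

	group.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer group.Done()
			atomic.AddInt64(&restored, 1)
		}()
		// Calling group.Wait() here would block in the first iteration
		// whenever n > 1: the counter was set to n, but only one
		// goroutine that calls Done() has been started so far.
	}
	group.Wait() // all n goroutines have been started, so this returns

	return int(atomic.LoadInt64(&restored))
}

func main() {
	fmt.Println("restored:", restoreAll(3))
}
```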

Moby/27782

Bug ID Ref Patch Type Sub-type
moby#27782 pull request patch Communication Channel & Condition Variable

Description

Example execution

G1                      G2                         G3
-----------------------------------------------------------------------
InitializeStdio()
startLogging()
l.ReadLogs()
NewLogWatcher()
.                       l.readLogs()
container.Reset()       .
LogDriver.Close()       .
r.Close()               .
close(w.closeNotifier)  .
.                       followLogs(logWatcher)
.                       watchFile()
.                       New()
.                       NewEventWatcher()
.                       NewWatcher()
.                       .                           w.readEvents()
.                       .                           event.ignoreLinux()
.                       .                           return false
.                       <-logWatcher.WatchClose()   .
.                       fileWatcher.Remove()        .
.                       w.cv.Wait()                 .
.                       .                           w.Events <- event
------------------------------G2,G3 leak-------------------------------

Moby/28462

Bug ID Ref Patch Type Sub-type
moby#28462 pull request patch Mixed Channel & Lock

Description

One goroutine may acquire a lock and try to send a message over channel stop, while the other will try to acquire the same lock. With the wrong ordering, both goroutines will leak.

Example execution

G1                          G2
--------------------------------------------------------------
monitor()
handleProbeResult()
.                           d.StateChanged()
.                           c.Lock()
.                           d.updateHealthMonitorElseBranch()
.                           h.CloseMonitorChannel()
.                           s.stop <- struct{}{}
c.Lock()
----------------------G1,G2 leak------------------------------

Moby/30408

Bug ID Ref Patch Type Sub-type
moby#30408 pull request patch Communication Condition Variable

Description

Wait() at line 22 has no corresponding Signal() or Broadcast().

Example execution

G1                 G2
------------------------------------------
testActive()
.                  p.waitActive()
.                  p.activateWait.L.Lock()
.                  p.activateWait.Wait()
<-done
-----------------G1,G2 leak---------------
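
For contrast, a waiter that is paired with a Broadcast() does terminate. The sketch below is a hypothetical illustration of that pairing, not the upstream patch: the plugin type, the activated flag, and the activate and waitThenActivate functions are assumed names.

```go
package main

import (
	"fmt"
	"sync"
)

// plugin is a hypothetical stand-in holding the condition variable.
type plugin struct {
	activateWait *sync.Cond
	activated    bool
}

func newPlugin() *plugin {
	return &plugin{activateWait: sync.NewCond(&sync.Mutex{})}
}

// waitActive blocks until the plugin is marked active. The predicate is
// checked under the lock, so a wake-up cannot be missed.
func (p *plugin) waitActive() {
	p.activateWait.L.Lock()
	for !p.activated {
		p.activateWait.Wait()
	}
	p.activateWait.L.Unlock()
}

// activate sets the flag and wakes every waiter. This is the call that
// has no counterpart in the buggy code.
func (p *plugin) activate() {
	p.activateWait.L.Lock()
	p.activated = true
	p.activateWait.L.Unlock()
	p.activateWait.Broadcast()
}

// waitThenActivate starts a waiter, activates the plugin, and returns
// once the waiter has been released.
func waitThenActivate() bool {
	p := newPlugin()
	done := make(chan struct{})
	go func() {
		p.waitActive()
		close(done)
	}()
	p.activate()
	<-done
	return p.activated
}

func main() {
	fmt.Println("waiter released:", waitThenActivate())
}
```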

Moby/33781

Bug ID Ref Patch Type Sub-type
moby#33781 pull request patch Communication Channel & Context

Description

The goroutine created using an anonymous function is blocked sending a message over an unbuffered channel. However there exists a path in the parent goroutine where the parent function will return without draining the channel.

Example execution

G1              G2             G3
----------------------------------------
monitor()       .
<-time.After()  .
.               .
<-stop          stop<-
.
cancelProbe()
return
.                               result<-
----------------G3 leak------------------
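
One possible way to avoid this kind of leak is to make the send cancellable, so the sender gives up once the parent has cancelled the probe. The sketch below assumes that approach and is not necessarily the upstream patch; sendResult and deliveredAfterCancel are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
)

// sendResult tries to deliver v, but gives up once ctx is cancelled.
// It reports whether the value was delivered.
func sendResult(ctx context.Context, results chan<- string, v string) bool {
	select {
	case results <- v:
		return true
	case <-ctx.Done():
		return false
	}
}

// deliveredAfterCancel models the path where the parent cancels the probe
// and returns without draining the unbuffered channel.
func deliveredAfterCancel() bool {
	ctx, cancel := context.WithCancel(context.Background())
	results := make(chan string) // unbuffered, and nobody will receive
	cancel()                     // the parent cancels and stops listening

	exited := make(chan bool)
	go func() { exited <- sendResult(ctx, results, "healthy") }()
	return <-exited // the sender returns instead of blocking forever
}

func main() {
	fmt.Println("delivered:", deliveredAfterCancel())
}
```

A buffered result channel with capacity one would be an alternative remedy with the same effect for a single send.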

Moby/36114

Bug ID Ref Patch Type Sub-type
moby#36114 pull request patch Resource Double locking

Description

The lock for the struct svm has already been locked when calling svm.hotRemoveVHDsAtStart().

Moby/4951

Bug ID Ref Patch Type Sub-type
moby#4951 pull request patch Resource AB-BA deadlock

Description

The root cause and patch are clearly explained in the commit description. The global lock is devices.Lock(), and the device lock is baseInfo.lock.Lock(). It is very likely that this bug can be reproduced.

Moby/7559

Bug ID Ref Patch Type Sub-type
moby#7559 pull request patch Resource Double locking

Description

Line 25 is missing a call to .Unlock.

Example execution

G1
---------------------------
proxy.connTrackLock.Lock()
if err != nil { continue }
proxy.connTrackLock.Lock()
-----------G1 leaks--------
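
The missing call can be sketched as follows. This is a minimal illustration with hypothetical names (the proxy type here, its tracked counter, handle, and trackedAfterErrors); only connTrackLock comes from the diagram above.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// proxy is a hypothetical stand-in that owns the connection-tracking lock.
type proxy struct {
	connTrackLock sync.Mutex
	tracked       int
}

// handle processes each result under the lock and releases the lock on
// every path through the loop body, including the error path.
func (p *proxy) handle(results []error) int {
	for _, err := range results {
		p.connTrackLock.Lock()
		if err != nil {
			// Without this Unlock, the next iteration's Lock would
			// block forever on the mutex this goroutine still holds.
			p.connTrackLock.Unlock()
			continue
		}
		p.tracked++
		p.connTrackLock.Unlock()
	}
	return p.tracked
}

// trackedAfterErrors feeds the loop a mix of successes and one failure.
func trackedAfterErrors() int {
	p := &proxy{}
	return p.handle([]error{nil, errors.New("read failed"), nil})
}

func main() {
	fmt.Println("tracked:", trackedAfterErrors())
}
```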

Serving/2137

Bug ID Ref Patch Type Sub-type
serving#2137 pull request patch Mixed Channel & Lock

patch: https://github.com/knative/serving/pull/2137/files, pull request: https://github.com/knative/serving/pull/2137

Description

Example execution

G1                              G2                          G3
----------------------------------------------------------------------------------
b.concurrentRequests(2)         .                           .
b.concurrentRequest()           .                           .
r.lock.Lock()                   .                           .
.                               start.Done()                .
start.Wait()                    .                           .
b.concurrentRequest()           .                           .
r.lock.Lock()                   .                           .
.                               .                           start.Done()
start.Wait()                    .                           .
unlockAll(locks)                .                           .
unlock(lc)                      .                           .
req.lock.Unlock()               .                           .
ok := <-req.accepted            .                           .
.                               b.Maybe()                   .
.                               b.activeRequests <- t       .
.                               thunk()                     .
.                               r.lock.Lock()               .
.                               .                           b.Maybe()
.                               .                           b.activeRequests <- t
----------------------------G1,G2,G3 leak-----------------------------------------

Syncthing/4829

Bug ID Ref Patch Type Sub-type
syncthing#4829 pull request patch Resource Double locking

Description

Double locking at line 17 and line 30.

Example execution

G1
---------------------------
mapping.clearAddresses()
m.mut.Lock() [L2]
m.notify(...)
m.mut.RLock() [L2]
----------G1 leaks---------

Syncthing/5795

Bug ID Ref Patch Type Sub-type
syncthing#5795 pull request patch Communication Channel

Description

<-c.dispatcherLoopStopped at line 82 is blocking forever because dispatcherLoop() is blocking at line 72.

Example execution

G1                            G2
--------------------------------------------------------------
c.Start()
go c.dispatcherLoop() [G3]
.                             select [<-c.inbox, <-c.closed]
c.inbox <- <================> [<-c.inbox]
<-c.dispatcherLoopStopped     .
.                             default
.                             c.ccFn()/c.Close()
.                             close(c.closed)
.                             <-c.dispatcherLoopStopped
---------------------G1,G2 leak-------------------------------