并发编程(三) - Sem 信号量

站长

2024年04月01日 10:10 · 阅读数 84

在上篇文档中介绍了 Mutex 实现的源码。细心的朋友会注意到，在实现 goroutine 的阻塞/唤醒控制时使用的是 runtime_SemacquireMutex和 runtime_Semrelease，也就是我们常说的信号量。本文将详细介绍一下什么是信号量，实现原理，解决了什么问题。

信号量的提出

信号量最初由荷兰计算机科学家 Dijkstra 在 1965 年提出，它是一种用于控制并发访问共享资源的机制。信号量的核心是一个计数器，它记录着可以同时访问共享资源的线程数量。当线程需要访问共享资源时，它会尝试获取一个信号量，如果信号量的计数器大于 0，那么线程可以继续访问共享资源，并且信号量的计数器减 1；如果信号量的计数器等于 0，那么线程就必须等待，直到有一个信号量可用。

P/V 操作

计数信号量具备两种操作动作，称为V（signal()）与P（wait()）（即部分参考书常称的“PV操作”）。V操作会增加信号量的数值，P 操作会减少它。

运作方式：

初始化，给与它一个非负数的整数值「S=4」。
执行 P(s, n)，信号量 S 的值将被减少 n。企图进入临界区段的进程，需要先执行 P 操作。当信号量 S < n 时，进程会被挡住，不能继续执行；当信号量 S >= n 时，进程可以获准进入临界区段。
执行 V(s, n)，信号量 S 的值会被增加 n。结束离开临界区段的进程，将会执行 V。

使用伪代码表示：

// 离开临界区，释放资源
function V(semaphore S, integer I):
    [S ← S + I]
    
// 申请访问临界区，资源不足时将被阻塞
function P(semaphore S, integer I):
    repeat:
        [
            if S ≥ I:
                S ← S − I
                break
        ]

互斥锁其实就是特殊的计数信号量「S=1」，又叫二元信号量，其操作如下：

S = 1

lock():
    P(S, 1)
    
unlock():
    V(S,1)

Go 信号量底层实现

信号量本质上就是一个控制并发访问资源的算法。在 go 的 Mutex 第一版本实现中，就是依赖 CAS 和信号量实现的。其中 sema 就是信号量，初始值为 0；semacquire(&m.sema) 可以理解为 P 操作，由于 sema 初始值为 0 进入阻塞等待；semrelease(&m.sema) 可以理解为 V 操作，信号量加一，唤醒等待的 goroutine。

 // 请求锁
func (m *Mutex) Lock() {
    if xadd(&m.key, 1) == 1 { //标识加1，如果等于1，成功获取到锁
return
    }
    semacquire(&m.sema) // P 操作
}

func (m *Mutex) Unlock() {
    if xadd(&m.key, -1) == 0 { // 将标识减去1，如果等于0，则没有其它等待者
return
    }
    semrelease(&m.sema) // V 操作
}

当然，我们完全可以初始化一个信号量为 1 ，通过 P/V 操作实现一个 Mutex。这里通过 go:linkname 直接调用源码中的 sync.runtime_SemacquireMutex 函数。与官方的实现方式相比，这种方式性能会比较差。大多数情况可以直接通过 CAS 操作获取锁，而不需要去执行 P/V 操作。

 //go:linkname p sync.runtime_SemacquireMutex
func p(s *uint32, lifo bool, skipframes int)

//go:linkname v sync.runtime_Semrelease
func v(s *uint32, lifo bool, skipframes int)

type m struct {
    sem uint32
}

func newM() m {
    return m{
       sem: 1,
    }
}

func (m *m) Lock() {
    p(&m.sem, false, 1)
}

func (m *m) Unlock() {
    v(&m.sem, false, 1)
}

func main() {
    var c = 0
    m := newM()
    wg := &sync.WaitGroup{}
    for i := 0; i < 100000; i++ {
       wg.Add(1)
       go func() {
          defer wg.Done()
          m.Lock()
          defer m.Unlock()
          c++
       }()
    }
    wg.Wait()
    fmt.Println(c)
}

Go 中 P/V 操作的实现可以查看 runtime/sema.go 文件。Go 本身实现了自己的一套调度机制「GMP」。因此，它可以在自己的 runtime 中实现 goroutine 粒度的 P/V 操作。这里不再对源码进行分析。需要而外说明的是，在 go 中，P/V 操作都是基于 goroutine 来说的。当一个 goroutine 执行 P 操作，如果信号量大于 0 则直接返回，否则进入阻塞状态；另一个获取到资源的 goroutine 执行 V 操作，信号量 +1，唤醒阻塞的 goroutine。

func semacquire1(addr *uint32, lifo bool, profile semaProfileFlags, skipframes int, reason waitReason) {
    gp := getg()
    if gp != gp.m.curg {
       throw("semacquire not on the G stack")
    }

    // Easy case.
if cansemacquire(addr) {
       return
    }

    // Harder case:
 // increment waiter count
 // try cansemacquire one more time, return if succeeded
 // enqueue itself as a waiter
 // sleep
 // (waiter descriptor is dequeued by signaler)
s := acquireSudog()
    root := semtable.rootFor(addr)
    t0 := int64(0)
    s.releasetime = 0
    s.acquiretime = 0
    s.ticket = 0
    if profile&semaBlockProfile != 0 && blockprofilerate > 0 {
       t0 = cputicks()
       s.releasetime = -1
    }
    if profile&semaMutexProfile != 0 && mutexprofilerate > 0 {
       if t0 == 0 {
          t0 = cputicks()
       }
       s.acquiretime = t0
    }
    for {
       lockWithRank(&root.lock, lockRankRoot)
       // Add ourselves to nwait to disable "easy case" in semrelease.
root.nwait.Add(1)
       // Check cansemacquire to avoid missed wakeup.
if cansemacquire(addr) {
          root.nwait.Add(-1)
          unlock(&root.lock)
          break
       }
       // Any semrelease after the cansemacquire knows we're waiting
 // (we set nwait above), so go to sleep.
root.queue(addr, s, lifo)
       goparkunlock(&root.lock, reason, traceBlockSync, 4+skipframes)
       if s.ticket != 0 || cansemacquire(addr) {
          break
       }
    }
    if s.releasetime > 0 {
       blockevent(s.releasetime-t0, 3+skipframes)
    }
    releaseSudog(s)
}

func semrelease1(addr *uint32, handoff bool, skipframes int) {
    root := semtable.rootFor(addr)
    atomic.Xadd(addr, 1)

    // Easy case: no waiters?
 // This check must happen after the xadd, to avoid a missed wakeup
 // (see loop in semacquire).
if root.nwait.Load() == 0 {
       return
    }

    // Harder case: search for a waiter and wake it.
lockWithRank(&root.lock, lockRankRoot)
    if root.nwait.Load() == 0 {
       // The count is already consumed by another goroutine,
 // so no need to wake up another goroutine.
unlock(&root.lock)
       return
    }
    s, t0 := root.dequeue(addr)
    if s != nil {
       root.nwait.Add(-1)
    }
    unlock(&root.lock)
    if s != nil { // May be slow or even yield, so unlock first
acquiretime := s.acquiretime
       if acquiretime != 0 {
          mutexevent(t0-acquiretime, 3+skipframes)
       }
       if s.ticket != 0 {
          throw("corrupted semaphore ticket")
       }
       if handoff && cansemacquire(addr) {
          s.ticket = 1
       }
       readyWithTime(s, 5+skipframes)
       if s.ticket == 1 && getg().m.locks == 0 {
          // Direct G handoff
         // readyWithTime has added the waiter G as runnext in the
         // current P; we now call the scheduler so that we start running
         // the waiter G immediately.
         // Note that waiter inherits our time slice: this is desirable
         // to avoid having a highly contended semaphore hog the P
         // indefinitely. goyield is like Gosched, but it emits a
         // "preempted" trace event instead and, more importantly, puts
         // the current G on the local runq instead of the global one.
         // We only do this in the starving regime (handoff=true), as in
         // the non-starving case it is possible for a different waiter
         // to acquire the semaphore while we are yielding/scheduling,
         // and this would be wasteful. We wait instead to enter starving
         // regime, and then we start to do direct handoffs of ticket and
         // P.
         // See issue 33747 for discussion.
        goyield()
       }
    }
}

Go 信号量封装

在 Go 的 runtime 中实现了 P/V 操作，然而开发者无法直接调用。不过在 Go 的扩展包中，基于 Mutex 实现了一套信号量相关的 API。在扩展包中定义了一个 Weighted 类型，对外暴露三个方法。

Acquire: 将 Weighted 的信号量减少 n。如果 ctx 没有结束或者 Weighted 的信号量小于 < n 则一直阻塞。

Release: 将 Weighted 的信号量增加 n。

TryAcquire: 尝试将 Weighted 的信号量减少 n。如果 Weighted 的信号量小于 < n 则直接返回，不更新 Weighted 的信号。和 Acquire 的区别在于不会一致阻塞等待获取信号量。

type Weighted
    func NewWeighted(n int64) *Weighted
    func (s *Weighted) Acquire(ctx context.Context, n int64) error
    func (s *Weighted) Release(n int64)
    func (s *Weighted) TryAcquire(n int64) bool

示例

使用 Weighted 实现一个 Worker pool 模式。Worker pool 是一种常见的并发控制设计模式，主要用于控制系统中并发任务的数量。这种设计模式的目的是防止系统中创建的 goroutine 过多而造成过大的资源开销，造成系统卡死，提高系统的稳定性和效率。

写一个程序，计算 0～32 中每个数对应考拉兹猜想的步数。为了最大的发挥硬件性能，可以使并发执行的 goroutine 数量等于 cpu 数量，保证每个 goroutine 分别占用一个 cpu 执行计算。

func main() {
    ctx := context.TODO()

    var (
       // 获取 cpu 数量，默认设置的 P = num(cpu)
maxWorkers = runtime.GOMAXPROCS(0)
       // 初始化一个信号量，用于控制 goroutine 数量
sem = semaphore.NewWeighted(int64(maxWorkers))
       // 存储计算结果
out = make([]int, 32)
    )

    for i := range out {
       // 获取一个信号量，则开启一个 goroutine; 信号量释放完之后进入阻塞状态，等待其他 goroutine 释放信号量唤醒
if err := sem.Acquire(ctx, 1); err != nil {
          log.Printf("Failed to acquire semaphore: %v", err)
          break
       }
       go func(i int) {
          // 计算完成释放信号量
defer sem.Release(1)
          out[i] = collatzSteps(i + 1)
       }(i)
    }

    // 等待所有计算完成。能同时获取到 maxWorkers 个信号量则表示所有的 goroutine 已经执行完成
    if err := sem.Acquire(ctx, int64(maxWorkers)); err != nil {
       log.Printf("Failed to acquire semaphore: %v", err)
    }
    fmt.Println(out)
}

// See https://en.wikipedia.org/wiki/Collatz_conjecture.
func collatzSteps(n int) (steps int) {
    if n <= 0 {
       panic("nonpositive input")
    }

    for ; n > 1; steps++ {
       if steps < 0 {
          panic("too many steps")
       }

       if n%2 == 0 {
          n /= 2
          continue
       }

       const maxInt = int(^uint(0) >> 1)
       if n > (maxInt-1)/3 {
          panic("overflow")
       }
       n = 3*n + 1
    }

    return steps
}

源码解读

看一下 Weighted 的数据结构：

 // Weighted provides a way to bound concurrent access to a resource.
// The callers can request access with a given weight.
type Weighted struct {
    size    int64 // 信号量容量
    cur     int64 // 已使用的信号量
    mu      sync.Mutex // 互斥锁，用于控制对 cur、waiters 的并发读写操作
    waiters list.List // 资源不足时，将阻塞的申请放入队列中
}

最复杂的部分就是获取信号量的逻辑，主要涉及资源检查，ctx 的 Done 是否已经关闭等。具体可以看代码注释。

func (s *Weighted) Acquire(ctx context.Context, n int64) error {
    // 加锁，保证 cur、waiters 变量的并发安全
    s.mu.Lock()
    // 剩余信号量检查，剩余资源充足则直接返回
    if s.size-s.cur >= n && s.waiters.Len() == 0 {
       s.cur += n
       s.mu.Unlock()
       return nil
    }

    // 永远不可能成功
    if n > s.size {
       // Don't make other Acquire calls block on one that's doomed to fail.
s.mu.Unlock()
       // 等待 ctx 执行完成
       <-ctx.Done()
       return ctx.Err()
    }

    // 把调用者放入等待队列，等待其他 goroutine 释放信号量
    ready := make(chan struct{})
    // 创建一个大小为 n 的 waiter，通过 ready 来唤醒
    w := waiter{n: n, ready: ready}
    // 放到队列尾部
    elem := s.waiters.PushBack(w)
    s.mu.Unlock()

    select {
    // ctx 完成
    case <-ctx.Done():
       err := ctx.Err()
       s.mu.Lock()
       select {
       case <-ready:
          // Acquired the semaphore after we were canceled.  Rather than trying to
 // fix up the queue, just pretend we didn't notice the cancelation.
err = nil
       default:
          // ctx 已经被取消，让然拿不到信号量；直接移除 elem 并唤醒其他 waiter，尝试获取信号量
          isFront := s.waiters.Front() == elem
          s.waiters.Remove(elem)
          // If we're at the front and there're extra tokens left, notify other waiters.
if isFront && s.size > s.cur {
             s.notifyWaiters()
          }
       }
       s.mu.Unlock()
       return err
    // release 释放了信号量
    case <-ready:
       return nil
    }
}

Release 则直接释放对应的信号量，并唤醒其他的 waiter。

func (s *Weighted) Release(n int64) {
    s.mu.Lock()
    s.cur -= n
    if s.cur < 0 { // 释放的信号量大于已经申请的信号量
       s.mu.Unlock()
       panic("semaphore: released more than held")
    }
    s.notifyWaiters()
    s.mu.Unlock()
}

这里可以看下 notifyWaiters 方法。在唤醒阻塞的 goroutine 时，是严格的按照队列机制来唤醒的，如果释放的资源不够，则继续阻塞，直到有 goroutine 释放了足够的资源。

func (s *Weighted) notifyWaiters() {
    for {
       next := s.waiters.Front()
       if next == nil {
          break // No more waiters blocked.
        }

       w := next.Value.(waiter)
       if s.size-s.cur < w.n {
          // Not enough tokens for the next waiter.  We could keep going (to try to
         // find a waiter with a smaller request), but under load that could cause
         // starvation for large requests; instead, we leave all remaining waiters
         // blocked.
         //
         // Consider a semaphore used as a read-write lock, with N tokens, N
         // readers, and one writer.  Each reader can Acquire(1) to obtain a read
         // lock.  The writer can Acquire(N) to obtain a write lock, excluding all
         // of the readers.  If we allow the readers to jump ahead in the queue,
         // the writer will starve — there is always one token available for every
         // reader.
            break
       }
       // 使用的信号量增加，移除等待队列，通知等待的 goroutine
       s.cur += w.n
       s.waiters.Remove(next)
       close(w.ready)
    }
}

总结

信号量的提出是为了解决并发访问资源的问题。系统中往往只能提供有限的资源，因此需要控制访问资源的数量。在信号量算法中，需要先初始化一个资源总量，P 操作表示申请资源，如果剩余资源数量够则返回，不够则阻塞等待。V 操作表示释放申请的资源。

在 Go 的 runtime 中实现了信号量机制。Mutex 就是基于原子操作和信号量机制实现的。然而，runtime 中的信号量机制并未对外暴露，开发者无法使用。基于此，在 Go 的扩展库中，提供了一个 Weighted 结构体。不仅支持对应的信号量操作，还支持了对 ctx 的取消判断。

转载自:https://juejin.cn/post/7337302010068467738