The Life and Death of a Go Program


I originally wanted to write an article about GC, but partway through I realized that without knowledge of the GMP scheduling model and the memory model, GC is very hard to explain. So I went back to the GMP model, only to find that its starting point is the key data structures g0 and m0, which are initialized at program startup. GC and the memory model, too, involve a lot of work during initialization. So this article first tries to explain the process from a Go program's startup to its exit, and then works step by step through the related runtime internals.

The developer's perspective

From a user's perspective, these are the parts of a Go program visible to us:

  1. Global constants
  2. Global variables
  3. init functions
  4. The main.main function

Let's first clarify how many of each of these there can be:

  1. Global constants: a single package may contain multiple global constants, and so may a single Go file.
  2. Global variables: likewise, a package and even a single Go file may contain several.
  3. init functions: a package may contain multiple init functions, and even one Go file may define several (this is special: ordinary functions cannot share a name, but there can be many init functions).
  4. main.main: the main package may define exactly one main function. Other packages may also define a function named main, but it cannot be called. Once main returns, the program exits.

With the counts settled, here are the conclusions on execution order:

  1. The overall order is global constants -> global variables -> init functions -> the main function.
  2. With multiple packages, the import relationships form a tree read left-to-right, top-to-bottom: the further left/up, the earlier the import. Packages at a deeper level run first, and among packages at the same level, the leftmost runs first. This is exactly a post-order traversal of the multi-way tree.
  3. If a package contains several Go files, the files execute in lexicographic order of file name, smallest first. One special point: at the package level, the global constants and variables of every file are initialized first, and only after that are the init functions executed.
  4. If one Go file contains several init functions, they run in definition order: first defined, first executed.

Let's walk through a concrete example. Given the following file structure, where every file contains an init function, global variables, and global constants:

.
├── aa
│   └── a1.go
├── bb
│   └── b1.go
├── cc
│   ├── c1.go
│   └── c2.go
├── dd
│   └── d1.go
├── go.mod
└── main.go

The imports are as follows:

a1.go import cc
b1.go import dd
main.go import aa bb

These form the following import/package tree:

  main
 |   |
aa   bb
 |   |
cc   dd

The final execution order is therefore:

1. c1 global constants
2. c1 global variables
3. c2 global constants
4. c2 global variables
5. c1 init
6. c2 init
7. a1 global constants
8. a1 global variables
9. a1 init
10. d1 global constants
11. d1 global variables
12. d1 init
13. b1 global constants
14. b1 global variables
15. b1 init
16. main global constants
17. main global variables
18. main init

The overall order is what the figure below shows: within a package, files run in file-name order, and packages run in post-order traversal order.

(figure: overall initialization order of files and packages)

That said, never rely on the execution order between init functions when programming. First, Go does not officially guarantee this order; second, depending on it makes program logic complicated and forces you to care about details like file naming, which is simply not worth it. The recommended approach is to keep init functions independent of one another. If there are dependencies, don't use init at all: for example, define your own initializer (say, an exported INIT function) and have the program call each package's initializer in an explicit order. In fact, in our engineering practice we strictly avoid init functions altogether: implicit initialization brings all kinds of hidden problems. If something needs initializing, call the initializer explicitly; explicit calls make the execution order obvious and avoid the pitfalls of implicit invocation.
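The explicit-initializer pattern can be sketched as follows. The names (InitConfig, InitDB) and the DSN string are hypothetical, not a library API; the point is that the dependency between the two initializers is visible at the call site instead of being hidden in init ordering:

```go
package main

import "fmt"

// Instead of relying on implicit init ordering across packages, each
// package exposes an exported initializer and the caller fixes the order.
type config struct{ dsn string }

var cfg *config

// InitConfig must run before InitDB; the dependency is explicit.
func InitConfig() { cfg = &config{dsn: "user:pass@/appdb"} }

func InitDB() {
	if cfg == nil {
		panic("InitConfig must be called before InitDB")
	}
	fmt.Println("connecting with DSN:", cfg.dsn)
}

func main() {
	// The initialization order is stated explicitly, not inferred
	// from file names or import order.
	InitConfig()
	InitDB()
}
```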

The runtime's perspective

The code below is based on Go 1.19 on linux/amd64.

Entry point

Above we covered program initialization from the developer's perspective; from the runtime's perspective there are more fine-grained details. First let's set a breakpoint with gdb to find where the Go program's entry point is, and then read the corresponding runtime source. The sample program is trivial: an empty main function.

//m1.go
package main

func main() {

}

Build it, then debug with gdb:

go build -o m1 m1.go

gdb m1

info files

(screenshot: gdb `info files` output)

We can see the program's entry point is at 0x453860. Let's set a breakpoint there to see which file it falls in:

b *0x453860

(screenshot: gdb breakpoint output, pointing into rt0_linux_amd64.s)

With that we've found the Go program's entry file: rt0_linux_amd64.s. This is the entry on Linux; Go has a different entry file for each OS and CPU architecture. On macOS, for example, the entries are rt0_darwin_amd64.s and rt0_darwin_arm64.s.


The entry file is very simple: it just executes a JMP to _rt0_amd64.

// rt0_linux_amd64.s

// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

#include "textflag.h"

TEXT _rt0_amd64_linux(SB),NOSPLIT,$-8
   JMP    _rt0_amd64(SB)

TEXT _rt0_amd64_linux_lib(SB),NOSPLIT,$0
   JMP    _rt0_amd64_lib(SB)

g0 initialization

_rt0_amd64 is also straightforward: it loads argc and argv (the command-line arguments) and then jumps to runtime·rt0_go, which sets up the root goroutine of the Go program: m0's g0.

TEXT _rt0_amd64(SB),NOSPLIT,$-8
   MOVQ   0(SP), DI  // argc, argument count: the number of arguments typed on the command line
   LEAQ   8(SP), SI  // argv, argument vector: the array of argument strings
   JMP    runtime·rt0_go(SB)

Let's continue with the implementation of runtime.rt0_go. rt0_go initializes g0: in Go's GMP model every M has a g0 used to run runtime code, and one M is special, m0 (the Go process's main thread; there is exactly one m0 per process). The assembly below initializes m0's g0 stack; the g0 stacks of other Ms are set up through runtime.newm.

TEXT runtime·rt0_go(SB),NOSPLIT|TOPFRAME,$0

   // copy arguments forward on an even stack
   MOVQ   DI, AX    // argc
   MOVQ   SI, BX    // argv
   SUBQ   $(5*8), SP    // 3args 2auto
   ANDQ   $~15, SP // round SP down to a 16-byte boundary:
   // some SSE instructions require memory addresses that are multiples of 16
   MOVQ   AX, 24(SP)
   MOVQ   BX, 32(SP)

   // create istack out of the given (operating system) stack.
   // _cgo_init may update stackguard.
   // carve g0's stack out of the OS thread stack: 64KB + 104B
   MOVQ   $runtime·g0(SB), DI
   LEAQ   (-64*1024+104)(SP), BX
   MOVQ   BX, g_stackguard0(DI)
   MOVQ   BX, g_stackguard1(DI)
   MOVQ   BX, (g_stack+stack_lo)(DI)
   MOVQ   SP, (g_stack+stack_hi)(DI)

   // find out information about the processor we're on
   // the CPUID instruction places CPU information in AX
   MOVL   $0, AX
   CPUID
   CMPL   AX, $0
   JE nocpuinfo

   ...(handling for the various OSes and CPU models)...

TLS initialization

With the stack set up and the CPU identified, the next step is configuring thread-local storage (TLS).

TEXT runtime·rt0_go(SB),NOSPLIT|TOPFRAME,$0

   ...(the g0 initialization code above)...

#ifndef GOOS_windows
   JMP ok
#endif

// on Windows the TLS initialization below is skipped
   ... (code for other OSes that also skip TLS initialization)...

   // initialize m0's TLS
   LEAQ   runtime·m0+m_tls(SB), DI
   CALL   runtime·settls(SB)

   // store through it, to make sure it works
   // write 0x123 through the TLS slot and read it back;
   // if the value does not match expectations, abort
   get_tls(BX)
   MOVQ   $0x123, g(BX)
   MOVQ   runtime·m0+m_tls(SB), AX
   CMPQ   AX, $0x123
   JEQ 2(PC)
   CALL   runtime·abort(SB)
ok:
   // set the per-goroutine and per-mach "registers"
   // 将g0保存到m0的tls中
   // 将m0保存到AX中
   get_tls(BX) //等价于        MOVQ  TLS, BX
   LEAQ   runtime·g0(SB), CX
   MOVQ   CX, g(BX)
   LEAQ   runtime·m0(SB), AX

   // save m->g0 = g0
   MOVQ   CX, m_g0(AX)
   // save m0 to g0->m
   MOVQ   AX, g_m(CX)

   CLD             // convention is D is always left cleared

   // Check GOAMD64 requirements
   // We need to do this after setting up TLS, so that
   // we can report an error if there is a failure. See issue 49586.
   
   
   CALL    runtime·check(SB)
    
   ...(subsequent code)...

The scheduling flow

Once the stack and TLS are initialized, scheduler initialization and the scheduling loop begin.

TEXT runtime·rt0_go(SB),NOSPLIT|TOPFRAME,$0
   ...(preceding code)...
    MOVL   24(SP), AX    // copy argc
    MOVL   AX, 0(SP)
    MOVQ   32(SP), AX    // copy argv
    MOVQ   AX, 8(SP)
    CALL   runtime·args(SB)      // initialize command-line arguments
    CALL   runtime·osinit(SB)    // initialize the CPU core count
    CALL   runtime·schedinit(SB) // initialize the scheduler
    
    // create a new goroutine to start program
    // create the main goroutine, which will run runtime.main
    MOVQ   $runtime·mainPC(SB), AX       // entry
    PUSHQ  AX  // push mainPC: runtime.main becomes the first argument to runtime.newproc
    CALL   runtime·newproc(SB) // create the main goroutine that runs our runtime.main
    POPQ   AX
    
    // start this M
    // enter the scheduling loop; at this point only the freshly created main goroutine exists
    CALL   runtime·mstart(SB)
    
    // in theory runtime.mstart never returns; as a safety net, abort the process if it does
    CALL   runtime·abort(SB)  // mstart should never return
    RET
    
mainPC is defined as:
// mainPC is a function value for runtime.main, to be passed to newproc.
// The reference to runtime.main is made via ABIInternal, since the
// actual function (not the ABI0 wrapper) is needed by newproc.
DATA   runtime·mainPC+0(SB)/8,$runtime·main<ABIInternal>(SB)
GLOBL  runtime·mainPC(SB),RODATA,$8

// newproc: the first argument is the funcval that the new goroutine will run
func newproc(fn *funcval) {
   gp := getg()
   pc := getcallerpc()
   systemstack(func() {
      newg := newproc1(fn, gp, pc)

      _p_ := getg().m.p.ptr()
      runqput(_p_, newg, true)

      if mainStarted {
         wakep()
      }
   })
}

At this point the assembly part of runtime initialization is done. We haven't yet expanded on runtime.args, runtime.osinit, and runtime.schedinit; let's dig into runtime.args next, and leave the rest of the bootstrap to the following sections.

Command-line arguments

Command-line arguments are handled by runtime.args.

var (
   argc int32
   argv **byte
)

func args(c int32, v **byte) {
   // argc and argv are global variables; they are assigned here
   argc = c
   argv = v
   sysargs(c, v)
}

//go:linkname executablePath os.executablePath
var executablePath string
// initialize executablePath, the path of the binary currently being
// executed, e.g. "./main" when you run `./main -n 10`
func sysargs(argc int32, argv **byte) {
   // skip over argv, envv and the first string will be the path
   n := argc + 1
   for argv_index(argv, n) != nil {
      n++
   }
   executablePath = gostringnocopy(argv_index(argv, n+1))

   // strip "executable_path=" prefix if available, it's added after OS X 10.11.
   const prefix = "executable_path="
   if len(executablePath) > len(prefix) && executablePath[:len(prefix)] == prefix {
      executablePath = executablePath[len(prefix):]
   }
}

bootstrap

After the Go program's command-line arguments are initialized, the remaining steps run in this order:

  1. Call osinit to initialize the CPU core count and the memory page size.
  2. Call schedinit to initialize the scheduler and prepare for scheduling.
  3. Make & queue a new G: create a new G (the main goroutine) that will run runtime.main, and put it on the run queue.
  4. Call mstart to enter the scheduling loop; mstart never returns.

CPU core count

The CPU core count is determined by runtime.osinit.


func osinit() {
   ncpu = getproccount() // get the number of CPU cores
   physHugePageSize = getHugePageSize()
   // the size of one physical huge page in the OS, always a power of two.
   // With huge pages, virtual-memory management needs far fewer page-table
   // entries, reducing the kernel's page-table load and improving performance
   
   // architecture-specific initialization
   osArchInit()
}

// a no-op on linux/amd64; some OS/arch combinations provide a real implementation
func osArchInit() {}

Scheduler initialization

schedinit initializes the scheduler and is one of the most important pieces of Go's scheduling machinery.

func schedinit() {
  // locking details omitted

   // raceinit must be the first call to race detector.
   // In particular, it must be done before mallocinit below calls racemapshadow.
   // get the current g (m0's g0)
   _g_ := getg()
   if raceenabled {
      _g_.racectx, raceprocctx0 = raceinit()
   }

   // allow at most 10,000 worker threads
   sched.maxmcount = 10000

   // The world starts stopped.
   // STW: the world starts out stopped
   worldStopped()

   moduledataverify()
   
   // initialize the stack pools and the memory allocator
   stackinit()
   mallocinit()
   
   // read the GODEBUG environment variable and initialize CPU feature flags
   cpuinit()      // must run before alginit
   // check whether AES-based hashing is usable and initialize it; map, rand, etc. depend on this
   alginit()      // maps, hash, fastrand must not be used before this call
   fastrandinit() // must run before mcommoninit
   
   // initialize m0
   mcommoninit(_g_.m, -1)
   // initialize the linker's module data (activeModules; not Go modules)
   modulesinit()   // provides activeModules
   // typelinks, which relies on the built-in map
   typelinksinit() // uses maps, activeModules
   // itabs, also based on activeModules
   itabsinit()     // uses activeModules
   stkobjinit()    // must run before GC starts

   sigsave(&_g_.m.sigmask)
   initSigmask = _g_.m.sigmask

   if offset := unsafe.Offsetof(sched.timeToRun); offset%8 != 0 {
      println(offset)
      throw("sched.timeToRun not aligned to 8 bytes")
   }

   // copy the command-line arguments into a slice, based on argc and argv
   goargs()
   // copy the environment variables
   goenvs()
   parsedebugvars()
   // GC initialization
   gcinit()

   lock(&sched.lock)
   sched.lastpoll = uint64(nanotime())
   procs := ncpu
   if n, ok := atoi32(gogetenv("GOMAXPROCS")); ok && n > 0 {
      procs = n
   }
   if procresize(procs) != nil {
      throw("unknown runnable goroutine during bootstrap")
   }
   unlock(&sched.lock)

   // World is effectively started now, as P's can run.
   // the STW phase ends; the scheduler can now run scheduling work
   worldStarted()

   // For cgocheck > 1, we turn on the write barrier at all times
   // and check all pointer writes. We can't do this until after
   // procresize because the write barrier needs a P.
   // cgo- and write-barrier-related logic
   if debug.cgocheck > 1 {
      writeBarrier.cgo = true
      writeBarrier.enabled = true
      for _, p := range allp {
         p.wbBuf.reset()
      }
   }

   if buildVersion == "" {
      // Condition should never trigger. This code just serves
      // to ensure runtime·buildVersion is kept in the resulting binary.
      buildVersion = "unknown"
   }
   if len(modinfo) == 1 {
      // Condition should never trigger. This code just serves
      // to ensure runtime·modinfo is kept in the resulting binary.
      modinfo = ""
   }
}

sysmon

One of the first things runtime.main does (see the snippet below) is start the sysmon (system monitor) thread, a system-level daemon thread. On wasm, which has no threads, sysmon is not started. sysmon also runs in an unusual way: it has no P bound to it and runs directly on the system stack, outside the GMP scheduling model, so Go's trace tooling cannot observe it. sysmon's main duties are:

  1. Preemptive scheduling: goroutines that have been running for a long time (over 10ms) are preempted (signal-based, non-cooperative preemption was added in Go 1.14).
  2. Forced GC: if no GC has run for a long time (2 minutes), sysmon forces a new GC cycle.
  3. Netpoller monitoring: when a G blocks on a network call, it is handed to the netpoller, which handles the network I/O asynchronously so the P can keep scheduling; once the netpoller finishes the I/O, sysmon helps move the G back so it can be scheduled again.
  4. Retaking Ps from system calls: if a P is blocked in a system call (_Psyscall), sysmon retakes it so it can be matched with another G and M and keep running (preventing all Ps from being stuck in syscalls and starving large numbers of Gs).

At the same time, sysmon's polling period is not fixed: it adapts dynamically to the activity of the process.

(...)
    // Allow newproc to start new Ms.
    mainStarted = true
    
    if GOARCH != "wasm" { // no threads on wasm yet, so no sysmon
       systemstack(func() {
          newm(sysmon, nil, -1)
       })
    }
(...)

// Always runs without a P, so write barriers are not allowed.
//
//go:nowritebarrierrec
func sysmon() {
   lock(&sched.lock)
   sched.nmsys++
   checkdead()
   unlock(&sched.lock)

   lasttrace := int64(0)
   idle := 0 // how many cycles in succession we had not wokeup somebody
   delay := uint32(0)

   for {
   // dynamic polling period
      if idle == 0 { // start with 20us sleep...
         delay = 20
      } else if idle > 50 { // start doubling the sleep after 1ms...
         delay *= 2
      }
      if delay > 10*1000 { // up to 10ms
         delay = 10 * 1000
      }
      usleep(delay)

      // sysmon should not enter deep sleep if schedtrace is enabled so that
      // it can print that information at the right time.
      //
      // It should also not enter deep sleep if there are any active P's so
      // that it can retake P's from syscalls, preempt long running G's, and
      // poll the network if all P's are busy for long stretches.
      //
      // It should wakeup from deep sleep if any P's become active either due
      // to exiting a syscall or waking up due to a timer expiring so that it
      // can resume performing those duties. If it wakes from a syscall it
      // resets idle and delay as a bet that since it had retaken a P from a
      // syscall before, it may need to do it again shortly after the
      // application starts work again. It does not reset idle when waking
      // from a timer to avoid adding system load to applications that spend
      // most of their time sleeping.
      now := nanotime()
      if debug.schedtrace <= 0 && (sched.gcwaiting != 0 || atomic.Load(&sched.npidle) == uint32(gomaxprocs)) {
         lock(&sched.lock)
         if atomic.Load(&sched.gcwaiting) != 0 || atomic.Load(&sched.npidle) == uint32(gomaxprocs) {
            syscallWake := false
            next := timeSleepUntil()
            if next > now {
               atomic.Store(&sched.sysmonwait, 1)
               unlock(&sched.lock)
               // Make wake-up period small enough
               // for the sampling to be correct.
               sleep := forcegcperiod / 2
               if next-now < sleep {
                  sleep = next - now
               }
               shouldRelax := sleep >= osRelaxMinNS
               if shouldRelax {
                  osRelax(true)
               }
               syscallWake = notetsleep(&sched.sysmonnote, sleep)
               if shouldRelax {
                  osRelax(false)
               }
               lock(&sched.lock)
               atomic.Store(&sched.sysmonwait, 0)
               noteclear(&sched.sysmonnote)
            }
            if syscallWake {
               idle = 0
               delay = 20
            }
         }
         unlock(&sched.lock)
      }

      lock(&sched.sysmonlock)
      // Update now in case we blocked on sysmonnote or spent a long time
      // blocked on schedlock or sysmonlock above.
      now = nanotime()

      // trigger libc interceptors if needed
      if *cgo_yield != nil {
         asmcgocall(*cgo_yield, nil)
      }
      // poll network if not polled for more than 10ms
      lastpoll := int64(atomic.Load64(&sched.lastpoll))
      if netpollinited() && lastpoll != 0 && lastpoll+10*1000*1000 < now {
         atomic.Cas64(&sched.lastpoll, uint64(lastpoll), uint64(now))
         list := netpoll(0) // non-blocking - returns list of goroutines
         if !list.empty() {
            // Need to decrement number of idle locked M's
            // (pretending that one more is running) before injectglist.
            // Otherwise it can lead to the following situation:
            // injectglist grabs all P's but before it starts M's to run the P's,
            // another M returns from syscall, finishes running its G,
            // observes that there is no work to do and no other running M's
            // and reports deadlock.
            incidlelocked(-1)
            injectglist(&list)
            incidlelocked(1)
         }
      }
      if GOOS == "netbsd" && needSysmonWorkaround {
         // netpoll is responsible for waiting for timer
         // expiration, so we typically don't have to worry
         // about starting an M to service timers. (Note that
         // sleep for timeSleepUntil above simply ensures sysmon
         // starts running again when that timer expiration may
         // cause Go code to run again).
         //
         // However, netbsd has a kernel bug that sometimes
         // misses netpollBreak wake-ups, which can lead to
         // unbounded delays servicing timers. If we detect this
         // overrun, then startm to get something to handle the
         // timer.
         //
         // See issue 42515 and
         // https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=50094.
         if next := timeSleepUntil(); next < now {
            startm(nil, false)
         }
      }
      if scavenger.sysmonWake.Load() != 0 {
         // Kick the scavenger awake if someone requested it.
         scavenger.wake()
      }
      // retake P's blocked in syscalls
      // and preempt long running G's
      if retake(now) != 0 {
         idle = 0
      } else {
         idle++
      }
      // check if we need to force a GC
      if t := (gcTrigger{kind: gcTriggerTime, now: now}); t.test() && atomic.Load(&forcegc.idle) != 0 {
         lock(&forcegc.lock)
         forcegc.idle = 0
         var list gList
         list.push(forcegc.g)
         injectglist(&list)
         unlock(&forcegc.lock)
      }
      if debug.schedtrace > 0 && lasttrace+int64(debug.schedtrace)*1000000 <= now {
         lasttrace = now
         schedtrace(debug.scheddetail > 0)
      }
      unlock(&sched.sysmonlock)
   }
}

runtime.main

Once scheduler initialization (schedinit) finishes, the main goroutine is created to run runtime.main. Note the package: this is not the main function we write; ours is main.main.

// The main goroutine.
func main() {
   g := getg()

   // Racectx of m0->g0 is used only as the parent of the main goroutine.
   // It must not be used for anything else.
   g.m.g0.racectx = 0

   // Max stack size is 1 GB on 64-bit, 250 MB on 32-bit.
   // Using decimal instead of binary GB and MB because
   // they look nicer in the stack overflow failure message.
   if goarch.PtrSize == 8 {
      maxstacksize = 1000000000
   } else {
      maxstacksize = 250000000
   }

   // An upper limit for max stack size. Used to avoid random crashes
   // after calling SetMaxStack and trying to allocate a stack that is too big,
   // since stackalloc works with 32-bit sizes.
   maxstackceiling = 2 * maxstacksize

   // Allow newproc to start new Ms.
   mainStarted = true

   if GOARCH != "wasm" { // no threads on wasm yet, so no sysmon
      systemstack(func() {
         newm(sysmon, nil, -1)
      })
   }

   // Lock the main goroutine onto this, the main OS thread,
   // during initialization. Most programs won't care, but a few
   // do require certain calls to be made by the main thread.
   // Those can arrange for main.main to run in the main thread
   // by calling runtime.LockOSThread during initialization
   // to preserve the lock.
   lockOSThread()

   if g.m != &m0 {
      throw("runtime.main not on m0")
   }

   // Record when the world started.
   // Must be before doInit for tracing init.
   runtimeInitTime = nanotime()
   if runtimeInitTime == 0 {
      throw("nanotime returning zero")
   }

   if debug.inittrace != 0 {
      inittrace.id = getg().goid
      inittrace.active = true
   }

   doInit(&runtime_inittask) // Must be before defer.

   // Defer unlock so that runtime.Goexit during init does the unlock too.
   needUnlock := true
   defer func() {
      if needUnlock {
         unlockOSThread()
      }
   }()

   gcenable()

   main_init_done = make(chan bool)
   if iscgo {
      if _cgo_thread_start == nil {
         throw("_cgo_thread_start missing")
      }
      if GOOS != "windows" {
         if _cgo_setenv == nil {
            throw("_cgo_setenv missing")
         }
         if _cgo_unsetenv == nil {
            throw("_cgo_unsetenv missing")
         }
      }
      if _cgo_notify_runtime_init_done == nil {
         throw("_cgo_notify_runtime_init_done missing")
      }
      // Start the template thread in case we enter Go from
      // a C-created thread and need to create a new thread.
      startTemplateThread()
      cgocall(_cgo_notify_runtime_init_done, nil)
   }

   doInit(&main_inittask)

   // Disable init tracing after main init done to avoid overhead
   // of collecting statistics in malloc and newproc
   inittrace.active = false

   close(main_init_done)

   needUnlock = false
   unlockOSThread()

   if isarchive || islibrary {
      // A program compiled with -buildmode=c-archive or c-shared
      // has a main, but it is not executed.
      return
   }
   fn := main_main // make an indirect call, as the linker doesn't know the address of the main package when laying down the runtime
   fn()
   if raceenabled {
      racefini()
   }

   // Make racy client program work: if panicking on
   // another goroutine at the same time as main returns,
   // let the other goroutine finish printing the panic trace.
   // Once it does, it will exit. See issues 3934 and 20018.
   if atomic.Load(&runningPanicDefers) != 0 {
      // Running deferred functions should not take long.
      for c := 0; c < 1000; c++ {
         if atomic.Load(&runningPanicDefers) == 0 {
            break
         }
         Gosched()
      }
   }
   if atomic.Load(&panicking) != 0 {
      gopark(nil, nil, waitReasonPanicWait, traceEvGoStop, 1)
   }

   exit(0)
   for {
      var x *int32
      *x = 0
   }
}

gcenable

Another thing runtime.main runs is gcenable, which starts the garbage collector. Internally it launches two goroutines: bgsweep (the background sweeper) and bgscavenge (the background scavenger, which returns unused memory to the OS). Neither goroutine ever exits.

// gcenable is called after the bulk of the runtime initialization,
// just before we're about to start letting user code run.
// It kicks off the background sweeper goroutine, the background
// scavenger goroutine, and enables GC.
func gcenable() {
   // Kick off sweeping and scavenging.
   c := make(chan int, 2)
   go bgsweep(c)
   go bgscavenge(c)
   <-c
   <-c
   memstats.enablegc = true // now that runtime is initialized, GC is okay
}
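The synchronization in gcenable is a reusable pattern: a channel buffered for the number of background goroutines, each of which sends once it has started, while the caller receives that many times before proceeding. A self-contained sketch (the worker bodies stand in for bgsweep and bgscavenge):

```go
package main

import "fmt"

// startWorkers starts two background goroutines and, like gcenable,
// blocks until both have signalled readiness on the buffered channel.
func startWorkers() bool {
	c := make(chan int, 2)
	go func() { c <- 1 }() // stands in for bgsweep
	go func() { c <- 1 }() // stands in for bgscavenge
	<-c
	<-c
	return true
}

func main() {
	fmt.Println("both background goroutines started:", startWorkers())
	// → both background goroutines started: true
}
```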

doInit

Before doInit(&main_inittask) runs, the main_init_done channel has already been created. doInit executes the user-defined init functions of each package. From the source we can see that the init functions run exactly in the order described earlier, and that main.main starts only after every init has finished.

func doInit(t *initTask) {
   switch t.state {
   case 2: // fully initialized
      return
   case 1: // initialization in progress
      throw("recursive call during initialization - linker skew")
   default: // not initialized yet
      t.state = 1 // initialization in progress

      for i := uintptr(0); i < t.ndeps; i++ {
         p := add(unsafe.Pointer(t), (3+i)*goarch.PtrSize)
         t2 := *(**initTask)(p)
         doInit(t2)
      }

      if t.nfns == 0 {
         t.state = 2 // initialization done
         return
      }

      var (
         start  int64
         before tracestat
      )

      if inittrace.active {
         start = nanotime()
         // Load stats non-atomically since tracinit is updated only by this init goroutine.
         before = inittrace
      }

      firstFunc := add(unsafe.Pointer(t), (3+t.ndeps)*goarch.PtrSize)
      for i := uintptr(0); i < t.nfns; i++ {
         p := add(firstFunc, i*goarch.PtrSize)
         f := *(*func())(unsafe.Pointer(&p))
         f()
      }

      if inittrace.active {
         end := nanotime()
         // Load stats non-atomically since tracinit is updated only by this init goroutine.
         after := inittrace

         f := *(*func())(unsafe.Pointer(&firstFunc))
         pkg := funcpkgpath(findfunc(abi.FuncPCABIInternal(f)))

         var sbuf [24]byte
         print("init ", pkg, " @")
         print(string(fmtNSAsMS(sbuf[:], uint64(start-runtimeInitTime))), " ms, ")
         print(string(fmtNSAsMS(sbuf[:], uint64(end-start))), " ms clock, ")
         print(string(itoa(sbuf[:], after.bytes-before.bytes)), " bytes, ")
         print(string(itoa(sbuf[:], after.allocs-before.allocs)), " allocs")
         print("\n")
      }

      t.state = 2 // initialization done
   }
}

main_main

Back in runtime.main, we can see that once the init work is done, it calls the main_main function.

(...)
    fn := main_main // make an indirect call, as the linker doesn't know the address of the main package when laying down the runtime
    fn()
    if raceenabled {
       racefini()
    }
(...)

From the source we can also see that main_main is precisely the main function we write in package main, bound via go:linkname:

//go:linkname main_main main.main
func main_main()

exit

After main_main returns, the exit function is called with status code 0, analogous to libc's exit. (Note: the snippet below is the darwin implementation, which calls libc's exit through the exit_trampoline function; on Linux, exit issues the exit system call directly.)

// This is exported via linkname to assembly in runtime/cgo.
//
//go:nosplit
//go:cgo_unsafe_args
//go:linkname exit
func exit(code int32) {
   libcCall(unsafe.Pointer(abi.FuncPCABI0(exit_trampoline)), unsafe.Pointer(&code))
}
func exit_trampoline()

As a safety net in case the process somehow fails to exit, there is a classic nil-pointer write as a last resort, which crashes the program with a fault:

exit(0)
for {
   var x *int32
   *x = 0
}

mstart

runtime.mstart starts the scheduling loop.

// mstart is the entry-point for new Ms.
// It is written in assembly, uses ABI0, is marked TOPFRAME, and calls mstart0.
func mstart()

mstart is implemented in assembly for each platform, so we won't dig into its details here. After the assembly setup, it calls mstart0, which initializes the g0 stack bounds for the M.

mstart0


// mstart0 is the Go entry-point for new Ms.
// This must not split the stack because we may not even have stack
// bounds set up yet.
//
// May run during STW (because it doesn't have a P yet), so write
// barriers are not allowed.
//
//go:nosplit
//go:nowritebarrierrec
func mstart0() {
   _g_ := getg()

   osStack := _g_.stack.lo == 0
   if osStack {
      // Initialize stack bounds from system stack.
      // Cgo may have left stack size in stack.hi.
      // minit may update the stack bounds.
      //
      // Note: these bounds may not be very accurate.
      // We set hi to &size, but there are things above
      // it. The 1024 is supposed to compensate this,
      // but is somewhat arbitrary.
      size := _g_.stack.hi
      if size == 0 {
         size = 8192 * sys.StackGuardMultiplier
      }
      _g_.stack.hi = uintptr(noescape(unsafe.Pointer(&size)))
      _g_.stack.lo = _g_.stack.hi - size + 1024
   }
   // Initialize stack guard so that we can start calling regular
   // Go code.
   _g_.stackguard0 = _g_.stack.lo + _StackGuard
   // This is the g0, so we can also call go:systemstack
   // functions, which check stackguard1.
   _g_.stackguard1 = _g_.stackguard0
   mstart1()

   // Exit this thread.
   if mStackIsSystemAllocated() {
      // Windows, Solaris, illumos, Darwin, AIX and Plan 9 always system-allocate
      // the stack, but put it in _g_.stack before mstart,
      // so the logic above hasn't set osStack yet.
      osStack = true
   }
   mexit(osStack)
}

mstart1

Once the stack is initialized, mstart0 calls mstart1 (runtime's function naming is rather free-form). mstart1 binds some scheduler state to g0 and then calls the real scheduling loop: schedule.


// The go:noinline is to guarantee the getcallerpc/getcallersp below are safe,
// so that we can set up g0.sched to return to the call of mstart1 above.
//
//go:noinline
func mstart1() {
   _g_ := getg()

   if _g_ != _g_.m.g0 {
      throw("bad runtime·mstart")
   }

   // Set up m.g0.sched as a label returning to just
   // after the mstart1 call in mstart0 above, for use by goexit0 and mcall.
   // We're never coming back to mstart1 after we call schedule,
   // so other calls can reuse the current frame.
   // And goexit0 does a gogo that needs to return from mstart1
   // and let mstart0 exit the thread.
   _g_.sched.g = guintptr(unsafe.Pointer(_g_))
   _g_.sched.pc = getcallerpc()
   _g_.sched.sp = getcallersp()

   asminit()
   minit()

   // Install signal handlers; after minit so that minit can
   // prepare the thread to be able to handle the signals.
   if _g_.m == &m0 {
      mstartm0()
   }

   if fn := _g_.m.mstartfn; fn != nil {
      fn()
   }

   if _g_.m != &m0 {
      acquirep(_g_.m.nextp.ptr())
      _g_.m.nextp = 0
   }
   schedule()
}

schedule

Let's continue with the implementation of schedule: one round of scheduling, which finds a runnable goroutine and executes it.

// One round of scheduler: find a runnable goroutine and execute it.
// Never returns.
func schedule() {
    // get the current g
   _g_ := getg()

   if _g_.m.locks != 0 {
      throw("schedule: holding locks")
   }

   if _g_.m.lockedg != 0 {
      stoplockedm()
      execute(_g_.m.lockedg.ptr(), false) // Never returns.
   }

   // We should not schedule away from a g that is executing a cgo call,
   // since the cgo call is using the m's g0 stack.
   if _g_.m.incgo {
      throw("schedule: in cgo")
   }

top:
   pp := _g_.m.p.ptr()
   pp.preempt = false

   // Safety check: if we are spinning, the run queue should be empty.
   // Check this before calling checkTimers, as that might call
   // goready to put a ready goroutine on the local run queue.
   if _g_.m.spinning && (pp.runnext != 0 || pp.runqhead != pp.runqtail) {
      throw("schedule: spinning with local work")
   }

    // find a runnable G
   gp, inheritTime, tryWakeP := findRunnable() // blocks until work is available

   // This thread is going to run a goroutine and is not spinning anymore,
   // so if it was marked as spinning we need to reset it now and potentially
   // start a new spinning M.
   if _g_.m.spinning {
      resetspinning()
   }

   if sched.disable.user && !schedEnabled(gp) {
      // Scheduling of this goroutine is disabled. Put it on
      // the list of pending runnable goroutines for when we
      // re-enable user scheduling and look again.
      lock(&sched.lock)
      if schedEnabled(gp) {
         // Something re-enabled scheduling while we
         // were acquiring the lock.
         unlock(&sched.lock)
      } else {
         sched.disable.runnable.pushBack(gp)
         sched.disable.n++
         unlock(&sched.lock)
         goto top
      }
   }

   // If about to schedule a not-normal goroutine (a GCworker or tracereader),
   // wake a P if there is one.
   if tryWakeP {
      wakep()
   }
   if gp.lockedm != 0 {
      // Hands off own p to the locked m,
      // then blocks waiting for a new p.
      startlockedm(gp)
      goto top
   }

   execute(gp, inheritTime)
}
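From user code, runtime.Gosched is the simplest way to enter this path: it puts the current G back on the run queue and invokes the scheduler, giving another runnable G a chance to execute. A small sketch:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// yieldAndWait starts a goroutine, yields the current G via Gosched
// (which calls into one round of schedule), and waits for the result.
func yieldAndWait() bool {
	var wg sync.WaitGroup
	wg.Add(1)
	ran := false
	go func() {
		ran = true
		wg.Done()
	}()

	// Gosched parks the current G on the run queue and runs the
	// scheduler, letting the goroutine above be picked up.
	runtime.Gosched()

	wg.Wait() // wg.Wait happens-after wg.Done, so reading ran is safe
	return ran
}

func main() {
	fmt.Println("other goroutine ran:", yieldAndWait())
	// → other goroutine ran: true
}
```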

Summary

This article first covered, from the developer's perspective, the execution order of global constants, global variables, and init functions, and then walked through the full startup-to-exit flow of a Go program on linux/amd64 from the runtime's perspective.

References

  1. Go 1.19.2 source code
  2. qcrao91.gitbook.io/go/goroutin…
  3. golang.design/under-the-h…
  4. liupzmin.com/2022/04/26/…