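Starting with this article, I will try to dissect Go's approach to concurrency from the source-code side. This round of source reading is considerably harder than my earlier Python source walkthroughs. I am no expert, so corrections are welcome.
One more thing: please read my earlier articles first for the basics of coroutines and the Go concurrency model.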
Go Concurrency Principles and Mechanisms [Part 1]
Go Concurrency Principles and Mechanisms [Part 2]
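1. Go program entry: m0 and g0
Most of the scheduler source for Go's concurrency model lives under the /runtime/ directory. That directory contains many files, including .s assembly files and .go Go source files.
Execution of a Go program begins in rt0_linux_arm64.s, which is where the scheduler gets bootstrapped. The second half of the file name identifies the target operating system and architecture.
Most of these files do initialization work. I chose to study the linux_arm64 version. I know a little assembly, not enough to read industrial-grade assembly comfortably, but fortunately it is commented.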
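(1) A quick look at a few code snippets
The following snippet creates an empty g0; it does not run user code, but is used for scheduling goroutines across m and p.
Here a new thread is created to perform the runtime initialization and return. This is m0, since one thread corresponds to one m; whether or not any goroutine is created, a Go process always has at least one thread.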
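The first assembly instruction there references _cgo_sys_thread_create(SB), R4: it loads the address of the thread-creation function into register R4, and the function is then called through R4 (R4 holds the function's address rather than an argument).
Finally the code calls runtime.rt0_go (which jumps into runtime/asm_arm64.s), where g0 and m0 are initialized and made to reference each other.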
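The assembly above calls many functions in os_linux_arm64.go and proc.go. The Go scheduler source is in proc.go, which is our focus, so I will not dwell on the rest.
(2) So what are the m0 and g0 created above for?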
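To summarize:
g0 and m0 are two global variables in proc.go. m0 is the initial thread after the process starts, and g0 represents that initial thread's stack. The first thread created in the assembly above is m0; because it is a global variable it does not need to be allocated on the heap, so it exists outside Go's own memory allocator. The g0 inside m0 is also a global variable, and runtime.rt0_go mentioned above sets many of its fields.
PS: in fact every m has its own g0
Every m created afterwards also has its own g0, which is responsible for scheduling rather than for running functions from the user program.
Each M can run many goroutines, and the definition of struct m contains one special goroutine called g0. What makes g0 special is that it is the goroutine that carries the scheduling stack, referred to below as "the m's g0 stack". Go always uses the m's g0 stack when executing scheduling code. When a g needs to run scheduling code, it does not run it on its own stack; it first switches to the m's g0 stack and runs the code there.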
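The m's g0 stack is a special stack. g0 is allocated differently from ordinary goroutines: it is created when the m is set up, and it gets a comparatively large stack, so it can be assumed to be big enough that it never has to grow. An ordinary goroutine is created in runtime.newproc (explained later) with a very small initial stack (4K in the older release that the reference below describes; 2 KB since Go 1.4) that grows on demand. In addition, the m's g0 stack is also the stack of the physical thread that the m corresponds to.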
Reference: https://www.w3cschool.cn/go_internals/go_internals-419t283o.html
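To make the point about small, growable stacks concrete, here is a minimal sketch of my own (it is not runtime code). The function name depth, the padding array and the recursion depth are assumptions picked purely for illustration; the only claim is that an ordinary goroutine can recurse far beyond its initial stack size because the runtime grows the stack on demand.

```go
package main

import "fmt"

// depth recurses n levels. Each frame carries a small array so that the
// goroutine needs far more stack than it was given at creation.
func depth(n int) int {
	var pad [64]byte // hypothetical padding, only there to enlarge the frame
	pad[0] = byte(n)
	if n == 0 {
		return int(pad[0]) // pad[0] is 0 here
	}
	return 1 + depth(n-1)
}

func main() {
	done := make(chan int)
	// An ordinary goroutine starts on a small stack; the runtime grows it
	// while depth recurses.
	go func() { done <- depth(100000) }()
	fmt.Println("levels reached:", <-done)
}
```

The g0 stack, by contrast, is sized up front and is not expected to grow like this.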
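Before going further, it helps to get an overall picture of the model's architecture. The image source is given in the figure caption.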
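2. M, P, G
The previous article explained that the Go scheduler has three main data structures.
- M: an operating-system thread, a native thread managed by the operating system.
- G: a goroutine, a lightweight thread managed by the Go runtime itself; the struct holds the instruction and scheduling information for it.
- P: the scheduling context, which can be thought of as the scheduler running on an M.
Their data structures are all defined in runtime2.go: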
/src/runtime/runtime2.go (github.com)
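[Below I show the source of each definition and summarize what it abstracts; the focus, however, is on the state definitions. The line-by-line source analysis will come later, when I walk through the concrete scheduling rules.]
(1) G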
type g struct {
// Stack parameters.
// stack describes the actual stack memory: [stack.lo, stack.hi).
// stackguard0 is the stack pointer compared in the Go stack growth prologue.
// It is stack.lo+StackGuard normally, but can be StackPreempt to trigger a preemption.
// stackguard1 is the stack pointer compared in the C stack growth prologue.
// It is stack.lo+StackGuard on g0 and gsignal stacks.
// It is ~0 on other goroutine stacks, to trigger a call to morestackc (and crash).
stack stack // offset known to runtime/cgo
stackguard0 uintptr // offset known to liblink
stackguard1 uintptr // offset known to liblink
_panic *_panic // innermost panic - offset known to liblink
_defer *_defer // innermost defer
m *m // current m; offset known to arm liblink
sched gobuf
syscallsp uintptr // if status==Gsyscall, syscallsp = sched.sp to use during gc
syscallpc uintptr // if status==Gsyscall, syscallpc = sched.pc to use during gc
stktopsp uintptr // expected sp at top of stack, to check in traceback
param unsafe.Pointer // passed parameter on wakeup
atomicstatus uint32
stackLock uint32 // sigprof/scang lock; TODO: fold in to atomicstatus
goid int64
schedlink guintptr
waitsince int64 // approx time when the g become blocked
waitreason waitReason // if status==Gwaiting
preempt bool // preemption signal, duplicates stackguard0 = stackpreempt
paniconfault bool // panic (instead of crash) on unexpected fault address
preemptscan bool // preempted g does scan for gc
gcscandone bool // g has scanned stack; protected by _Gscan bit in status
gcscanvalid bool // false at start of gc cycle, true if G has not run since last scan; TODO: remove?
throwsplit bool // must not split stack
raceignore int8 // ignore race detection events
sysblocktraced bool // StartTrace has emitted EvGoInSyscall about this goroutine
sysexitticks int64 // cputicks when syscall has returned (for tracing)
traceseq uint64 // trace event sequencer
tracelastp puintptr // last P emitted an event for this goroutine
lockedm muintptr
sig uint32
writebuf []byte
sigcode0 uintptr
sigcode1 uintptr
sigpc uintptr
gopc uintptr // pc of go statement that created this goroutine
ancestors *[]ancestorInfo // ancestor information goroutine(s) that created this goroutine (only used if debug.tracebackancestors)
startpc uintptr // pc of goroutine function
racectx uintptr
waiting *sudog // sudog structures this g is waiting on (that have a valid elem ptr); in lock order
cgoCtxt []uintptr // cgo traceback context
labels unsafe.Pointer // profiler labels
timer *timer // cached timer for time.Sleep
selectDone uint32 // are we participating in a select and did someone win the race?
// Per-G GC state
// gcAssistBytes is this G's GC assist credit in terms of
// bytes allocated. If this is positive, then the G has credit
// to allocate gcAssistBytes bytes without assisting. If this
// is negative, then the G must correct this by performing
// scan work. We track this in bytes to make it fast to update
// and check for debt in the malloc hot path. The assist ratio
// determines how this corresponds to scan work debt.
gcAssistBytes int64
}
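G defines one particularly important field, atomicstatus, which holds the current state of the G:
The main states are _Gidle, _Grunnable, _Grunning, _Gsyscall and _Gwaiting.
_Gidle is defined as iota. iota is declared in builtin.go, for documentation purposes, as an untyped integer ordinal; inside a const block it is the zero-based index of the current constant specification, so _Gidle, being first, is 0: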
// iota is a predeclared identifier representing the untyped integer ordinal
// number of the current const specification in a (usually parenthesized)
// const declaration. It is zero-indexed.
const iota = 0 // Untyped int
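Since iota is easy to misread from that declaration alone, here is a small sketch of how it behaves inside const blocks. The constant names are hypothetical stand-ins that merely mirror the pattern used for the G states.

```go
package main

import "fmt"

// Hypothetical state names following the same pattern as the G states.
const (
	stateIdle     = iota // 0: first specification in the block
	stateRunnable        // 1: the expression "iota" is repeated implicitly
	stateRunning         // 2
)

// A new const block starts counting from zero again.
const (
	other = iota // 0
)

func main() {
	fmt.Println(stateIdle, stateRunnable, stateRunning, other)
}
```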
The other four G states are declared in the source below; I have summarized them in the figure that follows:
const (
// _Gidle means this goroutine was just allocated and has not
// yet been initialized.
_Gidle = iota // 0
// _Grunnable means this goroutine is on a run queue. It is
// not currently executing user code. The stack is not owned.
_Grunnable // 1
// _Grunning means this goroutine may execute user code. The
// stack is owned by this goroutine. It is not on a run queue.
// It is assigned an M and a P.
_Grunning // 2
// _Gsyscall means this goroutine is executing a system call.
// It is not executing user code. The stack is owned by this
// goroutine. It is not on a run queue. It is assigned an M.
_Gsyscall // 3
// _Gwaiting means this goroutine is blocked in the runtime.
// It is not executing user code. It is not on a run queue,
// but should be recorded somewhere (e.g., a channel wait
// queue) so it can be ready()d when necessary. The stack is
// not owned *except* that a channel operation may read or
// write parts of the stack under the appropriate channel
// lock. Otherwise, it is not safe to access the stack after a
// goroutine enters _Gwaiting (e.g., it may get moved).
_Gwaiting // 4
// _Gmoribund_unused is currently unused, but hardcoded in gdb
// scripts.
_Gmoribund_unused // 5
// _Gdead means this goroutine is currently unused. It may be
// just exited, on a free list, or just being initialized. It
// is not executing user code. It may or may not have a stack
// allocated. The G and its stack (if any) are owned by the M
// that is exiting the G or that obtained the G from the free
// list.
_Gdead // 6
// _Genqueue_unused is currently unused.
_Genqueue_unused // 7
// _Gcopystack means this goroutine's stack is being moved. It
// is not executing user code and is not on a run queue. The
// stack is owned by the goroutine that put it in _Gcopystack.
_Gcopystack // 8
// _Gscan combined with one of the above states other than
// _Grunning indicates that GC is scanning the stack. The
// goroutine is not executing user code and the stack is owned
// by the goroutine that set the _Gscan bit.
//
// _Gscanrunning is different: it is used to briefly block
// state transitions while GC signals the G to scan its own
// stack. This is otherwise like _Grunning.
//
// atomicstatus&~Gscan gives the state the goroutine will
// return to when the scan completes.
_Gscan = 0x1000
_Gscanrunnable = _Gscan + _Grunnable // 0x1001
_Gscanrunning = _Gscan + _Grunning // 0x1002
_Gscansyscall = _Gscan + _Gsyscall // 0x1003
_Gscanwaiting = _Gscan + _Gwaiting // 0x1004
)
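_Gscan combined with one of the states above other than _Grunning indicates that the GC is scanning the stack. Because a state transition always involves acquiring or releasing the stack, the _Gscan bit is set before the stack is taken; a _GscanXXX state means a scan is in progress, which makes it work like a mutex.
While the goroutine is not executing user code, its stack is owned by the goroutine that set the _Gscan bit. As noted above, _Gscanrunning is different: it is used to briefly block state transitions while the GC signals the G to scan its own stack. In every other respect it is the same as _Grunning.
atomicstatus&~Gscan (that is, atomicstatus ANDed with the bitwise complement of _Gscan, which clears the 0x1000 bit) gives the state the goroutine will return to when the scan completes.
So besides expressing the state of a G in the usual sense, atomicstatus acts more like a lock controlling the goroutine's stack, and therefore also governs whether user code may be run.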
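The bit arithmetic is easier to see in a few lines of code. This is a sketch with local copies of the values quoted above; the real constants are unexported, and the helper names withScan, baseState and isScanning are my own. Note that the runtime comment writes &~ while the Go operator for "AND NOT" is &^.

```go
package main

import "fmt"

// Local copies of the values quoted above; the real constants live in
// the runtime package and are not exported.
const (
	gRunnable = 1
	gWaiting  = 4
	gScan     = 0x1000
)

// withScan sets the scan bit on top of a base state.
func withScan(status uint32) uint32 { return status | gScan }

// baseState clears the scan bit, giving the state to return to.
func baseState(status uint32) uint32 { return status &^ gScan }

// isScanning reports whether the scan bit is set.
func isScanning(status uint32) bool { return status&gScan != 0 }

func main() {
	s := withScan(gWaiting)
	fmt.Printf("%#x scanning=%v base=%d\n", s, isScanning(s), baseState(s))
}
```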
(2) P
type p struct {
id int32
status uint32 // one of pidle/prunning/... (the status of this P)
link puintptr
schedtick uint32 // incremented on every scheduler call
syscalltick uint32 // incremented on every system call
sysmontick sysmontick // last tick observed by sysmon
m muintptr // back-link to associated m (nil if idle)
mcache *mcache
raceprocctx uintptr
deferpool [5][]*_defer // pool of available defer structs of different sizes (see panic.go)
deferpoolbuf [5][32]*_defer
// Cache of goroutine ids, amortizes accesses to runtime·sched.goidgen.
goidcache uint64
goidcacheend uint64
// Queue of runnable goroutines. Accessed without lock.
runqhead uint32
runqtail uint32
runq [256]guintptr
// runnext, if non-nil, is a runnable G that was ready'd by
// the current G and should be run next instead of what's in
// runq if there's time remaining in the running G's time
// slice. It will inherit the time left in the current time
// slice. If a set of goroutines is locked in a
// communicate-and-wait pattern, this schedules that set as a
// unit and eliminates the (potentially large) scheduling
// latency that otherwise arises from adding the ready'd
// goroutines to the end of the run queue.
runnext guintptr
// Available G's (status == Gdead)
gFree struct {
gList
n int32
}
sudogcache []*sudog
sudogbuf [128]*sudog
tracebuf traceBufPtr
// traceSweep indicates the sweep events should be traced.
// This is used to defer the sweep start event until a span
// has actually been swept.
traceSweep bool
// traceSwept and traceReclaimed track the number of bytes
// swept and reclaimed by sweeping in the current sweep loop.
traceSwept, traceReclaimed uintptr
palloc persistentAlloc // per-P to avoid mutex
_ uint32 // Alignment for atomic fields below
// Per-P GC state
gcAssistTime int64 // Nanoseconds in assistAlloc
gcFractionalMarkTime int64 // Nanoseconds in fractional mark worker (atomic)
gcBgMarkWorker guintptr // (atomic)
gcMarkWorkerMode gcMarkWorkerMode
// gcMarkWorkerStartTime is the nanotime() at which this mark
// worker started.
gcMarkWorkerStartTime int64
// gcw is this P's GC work buffer cache. The work buffer is
// filled by write barriers, drained by mutator assists, and
// disposed on certain GC state transitions.
gcw gcWork
// wbBuf is this P's GC write barrier buffer.
//
// TODO: Consider caching this in the running G.
wbBuf wbBuf
runSafePointFn uint32 // if 1, run sched.safePointFn at next safe point
pad cpu.CacheLinePad
}
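As you can see, P defines fields that describe the "personal" information of the scheduling context (such as id, status, schedtick and syscalltick); the m associated with the P (a muintptr, a pointer to an m); the Gs associated with the P (the run queue); and a number of pointers and cache fields related to stacks and to the underlying thread.
PS: the run queue attached to a p is called the local list; there is also a global list, see the figure at the beginning of section 2.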
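The runqhead, runqtail and runq fields form a fixed-size ring buffer. Below is a deliberately simplified, single-threaded sketch of that idea. The type toyP, its methods and the capacity of 4 are my own assumptions; the real queue has 256 slots and uses atomic operations because other Ms may steal from it, and its overflow handling is more involved than here, where only the new G spills to the global list.

```go
package main

import "fmt"

const localCap = 4 // the real runq holds 256 entries; 4 keeps the demo small

// toyP is a hypothetical stand-in for the run-queue fields of p.
type toyP struct {
	head, tail uint32
	runq       [localCap]int // goroutine ids instead of guintptr
}

// put appends g to the local queue and reports false when it is full.
func (p *toyP) put(g int) bool {
	if p.tail-p.head == localCap {
		return false
	}
	p.runq[p.tail%localCap] = g
	p.tail++
	return true
}

// get removes the oldest g from the local queue.
func (p *toyP) get() (int, bool) {
	if p.head == p.tail {
		return 0, false
	}
	g := p.runq[p.head%localCap]
	p.head++
	return g, true
}

func main() {
	var p toyP
	var global []int
	for g := 1; g <= 6; g++ {
		if !p.put(g) {
			global = append(global, g) // overflow goes to the global list
		}
	}
	for g, ok := p.get(); ok; g, ok = p.get() {
		fmt.Println("run local g", g)
	}
	fmt.Println("left on global list:", global)
}
```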
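The status field changes frequently during scheduling, so it is worth a look. The values are Pidle, Prunning, Psyscall, Pgcstop and Pdead (I omit the leading underscore).
Pidle means:
- An idle P: it is not being used to run user code or the scheduler. Typically it sits on the idle P list and is available to the scheduler; its run queue is empty.
- It is owned by the idle P list, or by whatever is currently transitioning its state.
Prunning means:
- The running state: the P is being used to run user code or the scheduler.
- It is owned by the M it is associated with.
- Only that M may change its state: to Pidle when there is no more G to work on, to Psyscall when entering a system call, or to Pgcstop for garbage collection.
- The M may also hand ownership of the P directly to another M.
Psyscall means:
- The system-call state: the P is not running user code because the code in the G has gone into a system call; in effect the M is associated directly with the G.
- The P still has an affinity to that M but is not owned by it; it floats free, and in this state it may be stolen by another M.
- This is similar to Pidle but not identical: the P is in a lightweight transitional state and keeps its affinity to the M.
- Leaving Psyscall must be done with a CAS operation: when the code in the G returns from the system call, the M either retakes its P this way or obtains a P from somewhere else.
- PS: CAS (compare-and-swap) updates a shared variable by checking that it still holds the expected value and swapping in the new one in a single atomic step, instead of taking a lock; this avoids the cost of acquiring and releasing a lock. See the link below and a paper from the 1980s (a very good paper; I even presented it in my operating systems class):
Go并发编程之美-CAS操作 - 云+社区 - 腾讯云 (The Beauty of Go Concurrent Programming: CAS Operations, Tencent Cloud community)
Paper: 1981-tods-kung-robinson.pdf
- Also beware the "A->B->A" (ABA) trap, where state A changes to state B and then back to A: even if an M successfully CASes its original P back to Prunning after the system call, the P may have been used by another M in the meantime.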
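To illustrate why leaving Psyscall goes through a CAS, here is a small sketch using sync/atomic. It is a model under my own assumptions, not the runtime's code: the names pRunning, pSyscall and race are hypothetical, and the goroutines merely play the role of Ms competing for one P.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Local copies of two P states; the names are illustrative.
const (
	pRunning uint32 = 1
	pSyscall uint32 = 2
)

// race lets n hypothetical Ms try to take one P out of the syscall
// state and returns how many of them succeeded.
func race(n int) int32 {
	status := pSyscall
	var winners int32
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Only one CAS can observe pSyscall and replace it.
			if atomic.CompareAndSwapUint32(&status, pSyscall, pRunning) {
				atomic.AddInt32(&winners, 1)
			}
		}()
	}
	wg.Wait()
	return winners
}

func main() {
	fmt.Println("Ms that acquired the P:", race(8))
}
```

However many contenders there are, exactly one of them wins, and no lock is held at any point. What the CAS cannot tell the winner is whether the P was used by someone else in between, which is the ABA hazard noted above.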
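Pgcstop means:
- The stopped state: the P is halted for stop-the-world (STW) and is owned by the M that stopped the world, typically on behalf of the garbage collector.
- The P keeps its run queue, and when the world is started again (startTheWorld) the scheduler is restarted on the Ps whose run queues are not empty.
Pdead means: the P is no longer in use (GOMAXPROCS shrank).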
const (
// P status
// _Pidle means a P is not being used to run user code or the
// scheduler. Typically, it's on the idle P list and available
// to the scheduler, but it may just be transitioning between
// other states.
//
// The P is owned by the idle list or by whatever is
// transitioning its state. Its run queue is empty.
_Pidle = iota
// _Prunning means a P is owned by an M and is being used to
// run user code or the scheduler. Only the M that owns this P
// is allowed to change the P's status from _Prunning. The M
// may transition the P to _Pidle (if it has no more work to
// do), _Psyscall (when entering a syscall), or _Pgcstop (to
// halt for the GC). The M may also hand ownership of the P
// off directly to another M (e.g., to schedule a locked G).
_Prunning
// _Psyscall means a P is not running user code. It has
// affinity to an M in a syscall but is not owned by it and
// may be stolen by another M. This is similar to _Pidle but
// uses lightweight transitions and maintains M affinity.
//
// Leaving _Psyscall must be done with a CAS, either to steal
// or retake the P. Note that there's an ABA hazard: even if
// an M successfully CASes its original P back to _Prunning
// after a syscall, it must understand the P may have been
// used by another M in the interim.
_Psyscall
// _Pgcstop means a P is halted for STW and owned by the M
// that stopped the world. The M that stopped the world
// continues to use its P, even in _Pgcstop. Transitioning
// from _Prunning to _Pgcstop causes an M to release its P and
// park.
//
// The P retains its run queue and startTheWorld will restart
// the scheduler on Ps with non-empty run queues.
_Pgcstop
// _Pdead means a P is no longer used (GOMAXPROCS shrank). We
// reuse Ps if GOMAXPROCS increases. A dead P is mostly
// stripped of its resources, though a few things remain
// (e.g., trace buffers).
_Pdead
)
(3) M
type m struct {
g0 *g // goroutine with scheduling stack
morebuf gobuf // gobuf arg to morestack
divmod uint32 // div/mod denominator for arm - known to liblink
// Fields not known to debuggers.
procid uint64 // for debuggers, but offset not hard-coded
gsignal *g // signal-handling g
goSigStack gsignalStack // Go-allocated signal handling stack
sigmask sigset // storage for saved signal mask
tls [6]uintptr // thread-local storage (for x86 extern register)
mstartfn func()
curg *g // current running goroutine
caughtsig guintptr // goroutine running during fatal signal
p puintptr // attached p for executing go code (nil if not executing go code)
nextp puintptr
oldp puintptr // the p that was attached before executing a syscall
id int64
mallocing int32
throwing int32
preemptoff string // if != "", keep curg running on this m
locks int32
dying int32
profilehz int32
spinning bool // m is out of work and is actively looking for work
blocked bool // m is blocked on a note
newSigstack bool // minit on C thread called sigaltstack
printlock int8
incgo bool // m is executing a cgo call
freeWait uint32 // if == 0, safe to free g0 and delete m (atomic)
fastrand [2]uint32
needextram bool
traceback uint8
ncgocall uint64 // number of cgo calls in total
ncgo int32 // number of cgo calls currently in progress
cgoCallersUse uint32 // if non-zero, cgoCallers in use temporarily
cgoCallers *cgoCallers // cgo traceback if crashing in cgo call
park note
alllink *m // on allm
schedlink muintptr
mcache *mcache
lockedg guintptr
createstack [32]uintptr // stack that created this thread.
lockedExt uint32 // tracking for external LockOSThread
lockedInt uint32 // tracking for internal lockOSThread
nextwaitm muintptr // next m waiting for lock
waitunlockf func(*g, unsafe.Pointer) bool
waitlock unsafe.Pointer
waittraceev byte
waittraceskip int
startingtrace bool
syscalltick uint32
thread uintptr // thread handle
freelink *m // on sched.freem
// these are here because they are too large to be on the stack
// of low-level NOSPLIT functions.
libcall libcall
libcallpc uintptr // for cpu profiler
libcallsp uintptr
libcallg guintptr
syscall libcall // stores syscall parameters on windows
vdsoSP uintptr // SP for traceback while in VDSO call (0 if not in call)
vdsoPC uintptr // PC for traceback while in VDSO call
dlogPerM
mOS
}
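That is the structure of M, which corresponds to a physical thread. It contains several abstractions of a thread, for example procid (the thread id) and mallocing (memory allocation in progress), plus many more that we will analyze when we meet them later.
One field worth noting here is spinning.
spinning: the m is "spinning" like a spinning wheel, going round and round; at this point the m has no G to work on and is actively looking for one. We will see this scenario later.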
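3. Scheduling: the framework
I am borrowing a figure from another article (the link is in the figure caption). It lays out the details of Go's concurrent scheduling, annotated with the relevant functions and how each works.
PS: it may be hard to read. If so, don't worry; I will crop smaller images when I go through it piece by piece.
In one sentence, the scheduling mechanism is:
The runtime prepares G, P and M; an M then binds to a P, takes a G from one of the queues, switches to that G's stack and runs the G's task function, calls goexit to clean up and return to the M, and repeats.
In order, the scheduler starts up as follows:
- Create m0 and g0 and link them together [main, main.main];
- Initialize the scheduler [schedinit];
- Manage the list of Ps [procresize];
- Create and manage Gs [newproc, runqput];
- Run and exit Gs [execute, goexit0];
- Obtain a G (scheduling) [schedule, findrunnable].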
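The one-sentence description can be sketched as a toy loop. This is a single-threaded model of my own, with one M, one P and plain slices as queues; the types task and toySched are hypothetical, and the method names only echo the runtime functions listed above without reproducing their behavior.

```go
package main

import "fmt"

// task stands in for a G: an id plus the function it runs.
type task struct {
	id int
	fn func() string
}

// toySched models one M with one P: a local queue plus a global fallback.
type toySched struct {
	local, global []task
}

// findRunnable looks at the local queue first, then the global one.
func (s *toySched) findRunnable() (task, bool) {
	if len(s.local) > 0 {
		t := s.local[0]
		s.local = s.local[1:]
		return t, true
	}
	if len(s.global) > 0 {
		t := s.global[0]
		s.global = s.global[1:]
		return t, true
	}
	return task{}, false
}

// run is the loop: pick a task, execute it, clean up, repeat.
func (s *toySched) run() []string {
	var log []string
	for {
		t, ok := s.findRunnable()
		if !ok {
			return log // nothing left; a real M would spin or park instead
		}
		log = append(log, fmt.Sprintf("g%d: %s", t.id, t.fn())) // "execute"
		// The real goexit0 would reset the G and put it on a free list here.
	}
}

func main() {
	s := &toySched{}
	s.local = append(s.local, task{1, func() string { return "hello" }})
	s.global = append(s.global, task{2, func() string { return "world" }})
	for _, line := range s.run() {
		fmt.Println(line)
	}
}
```

The real scheduler differs in many ways (several Ms and Ps, work stealing, preemption), but the shape of the loop is the same: find a runnable G, run it, clean up, look again.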
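4. What comes next?
A winter break this long really should not go to waste.
Next, following the framework in section 3, I plan to walk through the source concentrated in proc.go. We will see exactly which operations it performs, which variables it stores, and precisely what happens when the states of M, P and G change.
Mwah~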
END