$ go run -gcflags '-m -l' escape.go
./escape.go:6: moved to heap: x
./escape.go:7: &x escape to heap
./escape.go:11: bar new(int) does not escape
The output above means that x in foo() ends up allocated on the heap, while x in bar() ends up on the stack. The FAQ on the official site (golang.org) has a question about variable allocation:
How do I know whether a variable is allocated on the heap or the stack?
From a correctness standpoint, you don’t need to know. Each variable in Go exists as long as there are references to it. The storage location chosen by the implementation is irrelevant to the semantics of the language.
The storage location does have an effect on writing efficient programs. When possible, the Go compilers will allocate variables that are local to a function in that function’s stack frame. However, if the compiler cannot prove that the variable is not referenced after the function returns, then the compiler must allocate the variable on the garbage-collected heap to avoid dangling pointer errors. Also, if a local variable is very large, it might make more sense to store it on the heap rather than the stack.
In the current compilers, if a variable has its address taken, that variable is a candidate for allocation on the heap. However, a basic escape analysis recognizes some cases when such variables will not live past the return from the function and can reside on the stack.
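For reference, a minimal escape.go that produces diagnostics like the ones above might look as follows. This is a reconstruction for illustration only; the exact line numbers in the compiler output depend on how the file is laid out.

package main

// foo returns the address of a local variable, so the compiler cannot
// keep x in foo's stack frame: x is moved to the heap.
func foo() *int {
    x := 1
    return &x
}

// bar takes the address of a local allocation but never lets it escape
// the function, so escape analysis can keep it on the stack.
func bar() int {
    x := new(int)
    *x = 2
    return *x
}

func main() {
    _ = foo()
    _ = bar()
}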
// Per-thread (in Go, per-P) cache for small objects.
// No locking needed because it is per-thread (per-P).
type mcache struct {
    // The following members are accessed on every malloc,
    // so they are grouped here for better caching.
    next_sample int32   // trigger heap sample after allocating this many bytes
    local_scan  uintptr // bytes of scannable heap allocated

    // Tiny allocator: small objects under 16 bytes are allocated through tiny.
    tiny             uintptr
    tinyoffset       uintptr
    local_tinyallocs uintptr // number of tiny allocs not counted in other stats

    // The rest is not accessed on every malloc.
    alloc [_NumSizeClasses]*mspan // spans to allocate from

    stackcache [_NumStackOrders]stackfreelist

    // Local allocator stats, flushed during GC.
    local_nlookup    uintptr                  // number of pointer lookups
    local_largefree  uintptr                  // bytes freed for large objects (>maxsmallsize)
    local_nlargefree uintptr                  // number of frees for large objects (>maxsmallsize)
    local_nsmallfree [_NumSizeClasses]uintptr // number of frees for small objects (<=maxsmallsize)
}
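As a rough illustration of how tiny and tinyoffset are used, here is a simplified, standalone sketch of bump allocation into a 16-byte tiny block. This is not the runtime's code: the real mallocgc also picks alignment from the request size and falls back to the size-class allocator when the block is exhausted.

package main

import "fmt"

const maxTinySize = 16 // the runtime's tiny block size

// tinyAlloc is a toy bump allocator over one 16-byte block, mimicking the
// role of mcache.tiny/tinyoffset: pack several very small allocations into
// a single block and only fetch a new block when the current one is full.
type tinyAlloc struct {
    block  [maxTinySize]byte
    offset uintptr
}

// alloc returns an offset into the current block for a size-byte object
// aligned to align, or false if it no longer fits and a new block is needed.
func (t *tinyAlloc) alloc(size, align uintptr) (uintptr, bool) {
    off := (t.offset + align - 1) &^ (align - 1) // round up to alignment
    if off+size > maxTinySize {
        return 0, false // the caller would grab a fresh tiny block here
    }
    t.offset = off + size
    return off, true
}

func main() {
    var t tinyAlloc
    for _, sz := range []uintptr{1, 2, 8, 8} {
        off, ok := t.alloc(sz, sz)
        fmt.Println(sz, off, ok) // the last 8-byte request no longer fits
    }
}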
type mspan struct {
    next *mspan     // next span in list, or nil if none
    prev *mspan     // previous span in list, or nil if none
    list *mSpanList // For debugging. TODO: Remove.

    startAddr     uintptr   // address of first byte of span aka s.base()
    npages        uintptr   // number of pages in span
    stackfreelist gclinkptr // list of free stacks, avoids overloading freelist

    // freeindex is the slot index between 0 and nelems at which to begin scanning
    // for the next free object in this span.
    freeindex uintptr
    // TODO: Look up nelems from sizeclass and remove this field if it
    // helps performance.
    nelems uintptr // number of object in the span.
    ...
    // Bitmap used to track free objects; a 1 bit means the slot is available.
    allocCache uint64
    ...
    sizeclass uint8 // size class
    ...
    elemsize uintptr // computed from sizeclass or from npages
    ...
}
type mcentral struct {
    lock      mutex
    sizeclass int32
    nonempty  mSpanList // list of spans with a free object, ie a nonempty free list
    empty     mSpanList // list of spans with no free objects (or cached in an mcache)
}

type mSpanList struct {
    first *mspan
    last  *mspan
}
type mheap struct {
    lock      mutex
    free      [_MaxMHeapList]mSpanList // free lists of given length
    freelarge mSpanList                // free lists length >= _MaxMHeapList
    busy      [_MaxMHeapList]mSpanList // busy lists of large objects of given length
    busylarge mSpanList                // busy lists of large objects length >= _MaxMHeapList
    sweepgen  uint32                   // sweep generation, see comment in mspan
    sweepdone uint32                   // all spans are swept

    // allspans is a slice of all mspans ever created. Each mspan
    // appears exactly once.
    //
    // The memory for allspans is manually managed and can be
    // reallocated and move as the heap grows.
    //
    // In general, allspans is protected by mheap_.lock, which
    // prevents concurrent access as well as freeing the backing
    // store. Accesses during STW might not hold the lock, but
    // must ensure that allocation cannot happen around the
    // access (since that may free the backing store).
    allspans []*mspan // all spans out there

    // spans is a lookup table to map virtual address page IDs to *mspan.
    // For allocated spans, their pages map to the span itself.
    // For free spans, only the lowest and highest pages map to the span itself.
    // Internal pages map to an arbitrary span.
    // For pages that have never been allocated, spans entries are nil.
    //
    // This is backed by a reserved region of the address space so
    // it can grow without moving. The memory up to len(spans) is
    // mapped. cap(spans) indicates the total reserved memory.
    spans []*mspan

    // sweepSpans contains two mspan stacks: one of swept in-use
    // spans, and one of unswept in-use spans. These two trade
    // roles on each GC cycle. Since the sweepgen increases by 2
    // on each cycle, this means the swept spans are in
    // sweepSpans[sweepgen/2%2] and the unswept spans are in
    // sweepSpans[1-sweepgen/2%2]. Sweeping pops spans from the
    // unswept stack and pushes spans that are still in-use on the
    // swept stack. Likewise, allocating an in-use span pushes it
    // on the swept stack.
    sweepSpans [2]gcSweepBuf

    _ uint32 // align uint64 fields on 32-bit for atomics

    // Proportional sweep
    pagesInUse        uint64  // pages of spans in stats _MSpanInUse; R/W with mheap.lock
    spanBytesAlloc    uint64  // bytes of spans allocated this cycle; updated atomically
    pagesSwept        uint64  // pages swept this cycle; updated atomically
    sweepPagesPerByte float64 // proportional sweep ratio; written with lock, read without
    // TODO(austin): pagesInUse should be a uintptr, but the 386
    // compiler can't 8-byte align fields.

    // Malloc stats.
    largefree  uint64                  // bytes freed for large objects (>maxsmallsize)
    nlargefree uint64                  // number of frees for large objects (>maxsmallsize)
    nsmallfree [_NumSizeClasses]uint64 // number of frees for small objects (<=maxsmallsize)

    // range of addresses we might see in the heap
    bitmap         uintptr // Points to one byte past the end of the bitmap
    bitmap_mapped  uintptr
    arena_start    uintptr
    arena_used     uintptr // always mHeap_Map{Bits,Spans} before updating
    arena_end      uintptr
    arena_reserved bool

    // central free lists for small size classes.
    // the padding makes sure that the MCentrals are
    // spaced CacheLineSize bytes apart, so that each MCentral.lock
    // gets its own cache line.
    central [_NumSizeClasses]struct {
        mcentral mcentral
        pad      [sys.CacheLineSize]byte
    }

    spanalloc             fixalloc // allocator for span*
    cachealloc            fixalloc // allocator for mcache*
    specialfinalizeralloc fixalloc // allocator for specialfinalizer*
    specialprofilealloc   fixalloc // allocator for specialprofile*
    speciallock           mutex    // lock for special record allocators.
}
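The central field above uses a padding trick worth noting: each mcentral is padded out to a full cache line so that two frequently taken locks never share a line and contend via false sharing. A minimal sketch of the same pattern, assuming a 64-byte cache line (the runtime uses sys.CacheLineSize):

package main

import (
    "fmt"
    "sync"
)

const cacheLineSize = 64 // assumed; the runtime uses sys.CacheLineSize

type counter struct {
    mu sync.Mutex
    n  uint64
}

// paddedCounter keeps neighboring counters on different cache lines, the
// same way mheap.central pads each mcentral, so contention on one lock
// does not slow down its neighbors through false sharing.
type paddedCounter struct {
    c   counter
    pad [cacheLineSize]byte
}

var shards [8]paddedCounter

func inc(i int) {
    shards[i].c.mu.Lock()
    shards[i].c.n++
    shards[i].c.mu.Unlock()
}

func main() {
    inc(0)
    inc(3)
    fmt.Println(shards[0].c.n, shards[3].c.n)
}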
// bsd
func sysReserve(v unsafe.Pointer, n uintptr, reserved *bool) unsafe.Pointer {
    if sys.PtrSize == 8 && uint64(n) > 1<<32 || sys.GoosNacl != 0 {
        *reserved = false
        return v
    }

    p := mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
    if uintptr(p) < 4096 {
        return nil
    }
    *reserved = true
    return p
}

// darwin
func sysReserve(v unsafe.Pointer, n uintptr, reserved *bool) unsafe.Pointer {
    *reserved = true
    p := mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
    if uintptr(p) < 4096 {
        return nil
    }
    return p
}

// linux
func sysReserve(v unsafe.Pointer, n uintptr, reserved *bool) unsafe.Pointer {
    ...
    p := mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
    if uintptr(p) < 4096 {
        return nil
    }
    *reserved = true
    return p
}

// windows
func sysReserve(v unsafe.Pointer, n uintptr, reserved *bool) unsafe.Pointer {
    *reserved = true
    // v is just a hint.
    // First try at v.
    v = unsafe.Pointer(stdcall4(_VirtualAlloc, uintptr(v), n, _MEM_RESERVE, _PAGE_READWRITE))
    if v != nil {
        return v
    }

    // Next let the kernel choose the address.
    return unsafe.Pointer(stdcall4(_VirtualAlloc, 0, n, _MEM_RESERVE, _PAGE_READWRITE))
}
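The key point shared by all of these variants is that sysReserve only reserves address space (PROT_NONE on Unix, MEM_RESERVE on Windows); the memory is committed later, when the heap actually needs it, by sysMap. A standalone sketch of that reserve-then-commit pattern on Unix using the syscall package (illustration only, not the runtime's actual code path, which remaps with MAP_FIXED rather than calling mprotect):

package main

import (
    "fmt"
    "syscall"
)

func main() {
    const size = 1 << 20 // reserve 1 MB of address space

    // Reserve: PROT_NONE means the range is mapped but not usable yet,
    // which is essentially what sysReserve does.
    mem, err := syscall.Mmap(-1, 0, size,
        syscall.PROT_NONE, syscall.MAP_ANON|syscall.MAP_PRIVATE)
    if err != nil {
        panic(err)
    }
    defer syscall.Munmap(mem)

    // Commit a prefix of the reservation: make the first 64 KB usable,
    // playing the role of sysMap/sysUsed.
    if err := syscall.Mprotect(mem[:64<<10], syscall.PROT_READ|syscall.PROT_WRITE); err != nil {
        panic(err)
    }
    mem[0] = 42
    fmt.Println("committed first pages, mem[0] =", mem[0])
}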
    spansStart := p1
    mheap_.bitmap = p1 + spansSize + bitmapSize
    if sys.PtrSize == 4 {
        // Set arena_start such that we can accept memory
        // reservations located anywhere in the 4GB virtual space.
        mheap_.arena_start = 0
    } else {
        mheap_.arena_start = p1 + (spansSize + bitmapSize)
    }
    mheap_.arena_end = p + pSize
    mheap_.arena_used = p1 + (spansSize + bitmapSize)
    mheap_.arena_reserved = reserved

    if mheap_.arena_start&(_PageSize-1) != 0 {
        println("bad pagesize", hex(p), hex(p1), hex(spansSize), hex(bitmapSize), hex(_PageSize), "start", hex(mheap_.arena_start))
        throw("misrounded allocation in mallocinit")
    }

    // Initialize the rest of the allocator.
    mheap_.init(spansStart, spansSize)
    // Get the current G.
    _g_ := getg()
    // Allocate the mcache for the M bound to this G.
    _g_.m.mcache = allocmcache()
}
p is the start address of the reserved contiguous virtual address range. It is first aligned, and the arena, bitmap, and spans addresses are then laid out from it. mheap_.init() initializes the fixalloc allocators and related members, as well as the mcentrals.
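On 64-bit platforms in this generation of the runtime (this code appears to be roughly Go 1.8), the reservation is laid out as spans | bitmap | arena. A small sketch of the size arithmetic, using the commonly cited values as assumptions (512 GB maximum arena, 8 KB pages, 2 bits of bitmap per heap word):

package main

import "fmt"

func main() {
    const (
        ptrSize   uint64 = 8
        pageSize  uint64 = 8 << 10   // _PageSize: 8 KB pages
        arenaSize uint64 = 512 << 30 // assumed 512 GB maximum arena
    )
    // spans: one *mspan pointer per arena page.
    spansSize := arenaSize / pageSize * ptrSize
    // bitmap: 2 bits of metadata per pointer-sized word of the arena.
    bitmapSize := arenaSize / (ptrSize * 8 / 2)

    fmt.Printf("spans:  %d MB\n", spansSize>>20)  // 512 MB
    fmt.Printf("bitmap: %d GB\n", bitmapSize>>30) // 16 GB
    fmt.Printf("arena:  %d GB\n", arenaSize>>30)  // 512 GB
}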
// nextFreeFast returns the next free object if one is quickly available.
// Otherwise it returns 0.
func nextFreeFast(s *mspan) gclinkptr {
    theBit := sys.Ctz64(s.allocCache) // Is there a free object in the allocCache?
    if theBit < 64 {
        result := s.freeindex + uintptr(theBit)
        if result < s.nelems {
            freeidx := result + 1
            if freeidx%64 == 0 && freeidx != s.nelems {
                return 0
            }
            s.allocCache >>= (theBit + 1)
            s.freeindex = freeidx
            v := gclinkptr(result*s.elemsize + s.base())
            s.allocCount++
            return v
        }
    }
    return 0
}
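The interesting part here is the allocCache scan: Ctz64 (count trailing zeros) finds the lowest set bit, i.e. the first free slot, in effectively one instruction, and the cached word is then shifted so the next call starts just past it. A standalone sketch of the same trick using math/bits (the bit pattern is made up; the runtime's version also refills the cache from allocBits when the word runs out):

package main

import (
    "fmt"
    "math/bits"
)

func main() {
    // 1 bits mark free slots, mirroring mspan.allocCache.
    var cache uint64 = 0b10110100
    freeindex := 0

    for i := 0; i < 4; i++ {
        tz := bits.TrailingZeros64(cache) // index of the next free slot
        if tz == 64 {
            fmt.Println("no free slot left in the cached word")
            break
        }
        slot := freeindex + tz
        fmt.Println("allocate slot", slot)
        // Consume the bit and remember where the next scan starts.
        cache >>= uint(tz + 1)
        freeindex = slot + 1
    }
}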
// nextFree returns the next free object from the cached span if one is available.
// Otherwise it refills the cache with a span with an available object and
// returns that object along with a flag indicating that this was a heavy
// weight allocation. If it is a heavy weight allocation the caller must
// determine whether a new GC cycle needs to be started or if the GC is active
// whether this goroutine needs to assist the GC.
func (c *mcache) nextFree(sizeclass uint8) (v gclinkptr, s *mspan, shouldhelpgc bool) {
    s = c.alloc[sizeclass]
    shouldhelpgc = false
    freeIndex := s.nextFreeIndex()
    if freeIndex == s.nelems {
        // The span is full.
        if uintptr(s.allocCount) != s.nelems {
            println("runtime: s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
            throw("s.allocCount != s.nelems && freeIndex == s.nelems")
        }
        systemstack(func() {
            // Here the mcache requests a new span from mcentral.
            c.refill(int32(sizeclass))
        })
        shouldhelpgc = true
        s = c.alloc[sizeclass]
        // After refilling from mcentral, try the allocation from the mcache again.
        freeIndex = s.nextFreeIndex()
    }

    ...
}

// nextFreeIndex returns the index of the next free object in s at
// or after s.freeindex.
// There are hardware instructions that can be used to make this
// faster if profiling warrants it.
// This function overlaps somewhat with nextFreeFast.
func (s *mspan) nextFreeIndex() uintptr {
    ...
}
func (h *mheap) alloc(npage uintptr, sizeclass int32, large bool, needzero bool) *mspan {
    ...
    var s *mspan
    systemstack(func() {
        s = h.alloc_m(npage, sizeclass, large)
    })
    ...
}

func (h *mheap) alloc_m(npage uintptr, sizeclass int32, large bool) *mspan {
    ...
    s := h.allocSpanLocked(npage)
    ...
}

func (h *mheap) allocSpanLocked(npage uintptr) *mspan {
    ...
    s = h.allocLarge(npage)
    if s == nil {
        if !h.grow(npage) {
            return nil
        }
        s = h.allocLarge(npage)
        if s == nil {
            return nil
        }
    }
    ...
}

func (h *mheap) grow(npage uintptr) bool {
    // Ask for a big chunk, to reduce the number of mappings
    // the operating system needs to track; also amortizes
    // the overhead of an operating system mapping.
    // Allocate a multiple of 64kB.
    npage = round(npage, (64<<10)/_PageSize)
    ask := npage << _PageShift
    if ask < _HeapAllocChunk {
        ask = _HeapAllocChunk
    }
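To make the rounding in grow concrete, here is the same arithmetic as a standalone sketch, assuming 8 KB pages and a 1 MB _HeapAllocChunk (the values used in this generation of the runtime):

package main

import "fmt"

const (
    pageSize       = 8 << 10 // _PageSize
    pageShift      = 13      // _PageShift
    heapAllocChunk = 1 << 20 // assumed _HeapAllocChunk: 1 MB
)

// round rounds n up to a multiple of a; a must be a power of two,
// matching the runtime helper of the same name.
func round(n, a uintptr) uintptr {
    return (n + a - 1) &^ (a - 1)
}

func main() {
    npage := uintptr(3)                     // caller needs 3 pages (24 KB)
    npage = round(npage, (64<<10)/pageSize) // round up to a 64 KB multiple -> 8 pages
    ask := npage << pageShift               // 64 KB
    if ask < heapAllocChunk {               // but never ask the OS for less than 1 MB
        ask = heapAllocChunk
    }
    fmt.Println(npage, ask) // 8 1048576
}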