The Go Memory Model

Author: 炼狱的吹笛人

Original: https://golang.org/ref/mem (version of May 31, 2014). Translated by the poster; for reference only.

Introduction

The Go memory model specifies the conditions under which reads of a variable in one goroutine can be guaranteed to observe values produced by writes to the same variable in a different goroutine.

Advice

Programs that modify data being simultaneously accessed by multiple goroutines must serialize such access.
To serialize access, protect the data with channel operations or other synchronization primitives such as those in the sync and sync/atomic packages.
If you must read the rest of this document to understand the behavior of your program, you are being too clever.
Don't be clever.
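
As a minimal sketch of the advice above, the program below lets several goroutines update one shared counter while a sync.Mutex serializes every access. The names counter and incrementAll are invented for this illustration; a channel would serve equally well.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical shared value; mu serializes every access to n.
type counter struct {
	mu sync.Mutex
	n  int
}

// add increments the counter while holding the lock.
func (c *counter) add() {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
}

// value reads the counter while holding the lock.
func (c *counter) value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// incrementAll starts the given number of goroutines, each calling add once,
// waits for all of them, and returns the final count.
func incrementAll(workers int) int {
	var c counter
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.add()
		}()
	}
	wg.Wait()
	return c.value()
}

func main() {
	fmt.Println(incrementAll(100)) // no increment is lost
}
```

Without the lock, the increments would race and some could be lost.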

Happens Before

Within a single goroutine, reads and writes must behave as if they executed in the order specified by the program. That is, compilers and processors may reorder the reads and writes executed within a single goroutine only when the reordering does not change the behavior within that goroutine as defined by the language specification. Because of this reordering, the execution order observed by one goroutine may differ from the order perceived by another. For example, if one goroutine executes a = 1; b = 2;, another might observe the updated value of b before the updated value of a.
To specify the requirements of reads and writes, we define happens before, a partial order on the execution of memory operations in a Go program. If event e1 happens before event e2, then we say that e2 happens after e1. Also, if e1 does not happen before e2 and does not happen after e2, then we say that e1 and e2 happen concurrently.

Within a single goroutine, the happens-before order is the order expressed by the program.
A read r of a variable v is allowed to observe a write w to v if both of the following hold:
 1. r does not happen before w.
 2. There is no other write w' to v that happens after w but before r.
To guarantee that a read r of a variable v observes a particular write w to v, ensure that w is the only write r is allowed to observe. That is, r is guaranteed to observe w if both of the following hold:
 1. w happens before r.
 2. Any other write to the shared variable v either happens before w or after r.
This pair of conditions is stronger than the first pair; it requires that there are no other writes happening concurrently with w or r.
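
For readers who prefer symbols, the two pairs of conditions can be restated as follows. The arrow notation (x → y for "x happens before y") and the predicate names are chosen here for brevity and are not part of the original text; w' ranges over writes to v.

```latex
\begin{align*}
\text{allowed}(r, w) &\iff \neg (r \to w) \;\land\; \neg \exists\, w' \neq w : (w \to w') \land (w' \to r) \\
\text{guaranteed}(r, w) &\iff (w \to r) \;\land\; \forall\, w' \neq w : (w' \to w) \lor (r \to w')
\end{align*}
```

The second line leaves no room for a write w' that is concurrent with w or r, which is why it is the stronger condition.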

Within a single goroutine, there is no concurrency, so the two definitions are equivalent: a read r observes the value written by the most recent write w to v. When multiple goroutines access a shared variable v, they must use synchronization events to establish happens-before conditions that ensure reads observe the desired writes.
The initialization of variable v with the zero value for v's type behaves as a write in the memory model.
Reads and writes of values larger than a single machine word behave as multiple machine-word-sized operations in an unspecified order.

Synchronization

Initialization

Program initialization runs in a single goroutine, but that goroutine may create other goroutines, which run concurrently.
If a package p imports package q, the completion of q's init functions happens before the start of any of p's.
The start of the function main.main happens after all init functions have finished.
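
The last rule can be seen in a single-package sketch: a write made in an init function is visible in main without further synchronization. The names ready and status are illustrative only.

```go
package main

import "fmt"

// ready is written during initialization, before main.main starts.
var ready string

func init() {
	ready = "initialized"
}

// status returns the value that code running after initialization observes.
func status() string {
	return ready
}

func main() {
	fmt.Println(status())
}
```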

Goroutine creation

The go statement that starts a new goroutine happens before the goroutine's execution begins.
For example, in this program:

var a string

func f() {
    print(a)
}

func hello() {
    a = "hello, world"
    go f()
}

calling hello will print "hello, world" at some point in the future (perhaps after hello has returned).
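
A runnable variant of this example is sketched below. The seen channel is an addition made here so that the result can be returned and checked; it is not part of the original program.

```go
package main

import "fmt"

var a string

// f reports the value of a that the new goroutine observes.
// The seen channel is an addition for this sketch.
func f(seen chan<- string) {
	seen <- a
}

// hello writes a, starts f, and returns what f observed.
// The go statement happens before f begins, so f sees the write to a.
func hello() string {
	seen := make(chan string)
	a = "hello, world"
	go f(seen)
	return <-seen
}

func main() {
	fmt.Println(hello())
}
```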

Goroutine destruction

The exit of a goroutine is not guaranteed to happen before any event in the program. For example, in this program:

var a string

func hello() {
    go func() { a = "hello" }()
    print(a)
}

the assignment to a is not followed by any synchronization event, so it is not guaranteed to be observed by any other goroutine. In fact, an aggressive compiler might delete the entire go statement.
If the effects of a goroutine must be observed by another goroutine, use a synchronization mechanism such as a lock or channel communication to establish a relative ordering.
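
One possible repair, sketched under the assumption that a channel is acceptable here, is to have the goroutine signal on a channel after the write. The done channel is an addition and does not appear in the original program.

```go
package main

import "fmt"

var a string

// hello starts a goroutine that writes a and then signals on done.
func hello() string {
	done := make(chan bool)
	go func() {
		a = "hello"
		done <- true // this send happens before the receive below completes
	}()
	<-done
	return a
}

func main() {
	fmt.Println(hello())
}
```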

Channel communication

Channel communication is the main method of synchronization between goroutines. Each send on a particular channel is matched to a corresponding receive from that channel, usually in a different goroutine.
A send on a channel happens before the corresponding receive from that channel completes.
This program:

var c = make(chan int, 10)
var a string

func f() {
    a = "hello, world"
    c <- 0
}

func main() {
    go f()
    <-c
    print(a)
}

is guaranteed to print "hello, world". The write to a happens before the send on c, which happens before the corresponding receive on c completes, which happens before the print.
The closing of a channel happens before a receive that returns a zero value because the channel is closed.
In the previous example, replacing c <- 0 with close(c) yields a program with the same guaranteed behavior.
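
The close variant might look like the sketch below. It uses local variables and a helper named run (both choices made for this illustration) so that the function can be called more than once.

```go
package main

import "fmt"

// run returns the value of a seen after the receive, plus the receive's results.
func run() (a string, v int, ok bool) {
	c := make(chan int, 10)
	go func() {
		a = "hello, world"
		close(c) // the close happens before the receive below returns
	}()
	v, ok = <-c // channel is closed and empty: zero value, ok == false
	return a, v, ok
}

func main() {
	a, v, ok := run()
	fmt.Println(a, v, ok)
}
```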

A receive from an unbuffered channel happens before the send on that channel completes.
This program (as above, but with the send and receive statements swapped and using an unbuffered channel):

var c = make(chan int)
var a string

func f() {
    a = "hello, world"
    <-c
}

func main() {
    go f()
    c <- 0
    print(a)
}

is also guaranteed to print "hello, world". The write to a happens before the receive on c, which happens before the corresponding send on c completes, which happens before the print.
If the channel were buffered (e.g., c = make(chan int, 1)) then the program would not be guaranteed to print "hello, world". (It might print the empty string, crash, or do something else.)

The kth receive on a channel with capacity C happens before the k+Cth send on that channel completes.
This rule generalizes the previous rule to buffered channels. It allows a counting semaphore to be modeled by a buffered channel: the number of items in the channel corresponds to the number of active uses, the capacity of the channel corresponds to the maximum number of simultaneous uses, sending an item acquires the semaphore, and receiving an item releases the semaphore. This is a common idiom for limiting concurrency.
This program starts a goroutine for every entry in the work list, but the goroutines coordinate using the limit channel to ensure that at most three are running work functions at a time.

var limit = make(chan int, 3)

func main() {
    for _, w := range work {
        go func(w func()) {
            limit <- 1
            w()
            <-limit
        }(w)
    }
    select{}
}
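
The program above leaves work undefined and blocks forever in select{}. A self-contained sketch of the same idiom follows; it swaps in a sync.WaitGroup so the program can exit, uses a short sleep as a stand-in for real work, and records the peak concurrency with sync/atomic. The name runLimited and its parameters are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runLimited runs n placeholder jobs with at most maxActive running at once
// and returns the highest concurrency it observed.
func runLimited(n, maxActive int) int32 {
	limit := make(chan int, maxActive)
	var running, peak int32
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			limit <- 1 // acquire the semaphore
			cur := atomic.AddInt32(&running, 1)
			// Raise peak to cur if cur is larger.
			for {
				old := atomic.LoadInt32(&peak)
				if cur <= old || atomic.CompareAndSwapInt32(&peak, old, cur) {
					break
				}
			}
			time.Sleep(time.Millisecond) // stand-in for real work
			atomic.AddInt32(&running, -1)
			<-limit // release the semaphore
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	fmt.Println(runLimited(10, 3) <= 3)
}
```

Because the counter is incremented only after a send succeeds and decremented before the matching receive, it can never exceed the channel's capacity.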

Locks

The sync package implements two lock data types, sync.Mutex and sync.RWMutex.
For any sync.Mutex or sync.RWMutex variable l and n < m, call n of l.Unlock() happens before call m of l.Lock() returns.
This program:

var l sync.Mutex
var a string

func f() {
    a = "hello, world"
    l.Unlock()
}

func main() {
    l.Lock()
    go f()
    l.Lock()
    print(a)
}

is guaranteed to print "hello, world". The first call to l.Unlock() (in f) happens before the second call to l.Lock() (in main) returns, which happens before the print.
For any call to l.RLock on a sync.RWMutex variable l, there is an n such that the l.RLock happens (returns) after call n to l.Unlock and the matching l.RUnlock happens before call n+1 to l.Lock.
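
To make the RWMutex rule more concrete, here is a small sketch in which one writer and several readers run concurrently. The type config and the helper readWhileWriting are invented for this example. Each reader sees either the value from before the write or the value after it, never a partial update.

```go
package main

import (
	"fmt"
	"sync"
)

// config is a hypothetical value guarded by a sync.RWMutex.
type config struct {
	mu   sync.RWMutex
	name string
}

// set replaces the name under the write lock.
func (c *config) set(name string) {
	c.mu.Lock()
	c.name = name
	c.mu.Unlock()
}

// get reads the name under the read lock.
func (c *config) get() string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.name
}

// readWhileWriting runs one writer and several readers concurrently
// and returns what each reader saw.
func readWhileWriting(readers int) []string {
	var c config
	seen := make([]string, readers)
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		c.set("hello, world")
	}()
	for i := range seen {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			seen[i] = c.get()
		}(i)
	}
	wg.Wait()
	return seen
}

func main() {
	fmt.Println(readWhileWriting(4))
}
```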

Once

The sync package provides a safe mechanism for initialization in the presence of multiple goroutines through the use of the Once type. Multiple goroutines can execute once.Do(f) for a particular f, but only one will run f(), and the other calls block until f() has returned.
A single call of f() from once.Do(f) happens (returns) before any call of once.Do(f) returns.
In this program:

var a string
var once sync.Once

func setup() {
    a = "hello, world"
}

func doprint() {
    once.Do(setup)
    print(a)
}

func twoprint() {
    go doprint()
    go doprint()
}

calling twoprint will call setup exactly once. The setup function will complete before either call of print. The result will be that "hello, world" will be printed twice.
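
The claim can be checked with a sketch like the following, which counts calls to setup and collects what each goroutine observed instead of printing. The helper runOnce, the call counter, and the sync.WaitGroup are additions for this illustration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runOnce starts n goroutines that all call once.Do(setup). It returns how
// many times setup ran and the value of a each goroutine observed.
func runOnce(n int) (calls int32, seen []string) {
	var a string
	var once sync.Once
	setup := func() {
		atomic.AddInt32(&calls, 1)
		a = "hello, world"
	}
	seen = make([]string, n)
	var wg sync.WaitGroup
	for i := range seen {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			once.Do(setup) // setup's return happens before any Do returns
			seen[i] = a
		}(i)
	}
	wg.Wait()
	return calls, seen
}

func main() {
	calls, seen := runOnce(2)
	fmt.Println(calls, seen)
}
```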

Incorrect synchronization

Note that a read r may observe the value written by a write w that happens concurrently with r. Even if this occurs, it does not imply that reads happening after r will observe writes that happened before w.
In this program:

var a, b int

func f() {
    a = 1
    b = 2
}

func g() {
    print(b)
    print(a)
}

func main() {
    go f()
    g()
}

it can happen that g prints 2 and then 0.
This fact invalidates a few common idioms.

Double-checked locking is an attempt to avoid the overhead of synchronization. For example, the twoprint program might be incorrectly written as:

var a string
var once sync.Once
var done bool

func setup() {
    a = "hello, world"
    done = true
}

func doprint() {
    if !done {
        once.Do(setup)
    }
    print(a)
}

func twoprint() {
    go doprint()
    go doprint()
}

but there is no guarantee that, in doprint, observing the write to done implies observing the write to a. This version can (incorrectly) print an empty string instead of "hello, world".

Another incorrect idiom is busy waiting for a value, as in:

var a string
var done bool

func setup() {
    a = "hello, world"
    done = true
}

func main() {
    go setup()
    for !done {
    }
    print(a)
}

As before, there is no guarantee that, in main, observing the write to done implies observing the write to a, so this program could print an empty string too. Worse, there is no guarantee that the write to done will ever be observed by main, since there are no synchronization events between the two goroutines. The loop in main is not guaranteed to finish.

There are subtler variants on this theme, such as this program:

type T struct {
    msg string
}

var g *T

func setup() {
    t := new(T)
    t.msg = "hello, world"
    g = t
}

func main() {
    go setup()
    for g == nil {
    }
    print(g.msg)
}

Even if main observes g != nil and exits its loop, there is no guarantee that it will observe the initialized value for g.msg.
In all these examples, the solution is the same: use explicit synchronization.
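
For the last program, one way to add explicit synchronization (a sketch, not the only option) is to hand the pointer over a channel instead of publishing it through a global and spinning. The helper load and the channel ch are names introduced here.

```go
package main

import "fmt"

type T struct {
	msg string
}

// setup builds the value and publishes the pointer by sending it on ch.
func setup(ch chan<- *T) {
	t := new(T)
	t.msg = "hello, world"
	ch <- t // the write to t.msg happens before this send
}

// load starts setup and blocks until the pointer arrives.
func load() *T {
	ch := make(chan *T)
	go setup(ch)
	return <-ch // the send happens before this receive completes
}

func main() {
	fmt.Println(load().msg)
}
```

The receive blocks instead of spinning, and the send orders the write to t.msg before the read in main.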

Source: 简书 (Jianshu)