Chapter 2: Debugging Goroutine Leaks

范彬2017 · 2017-06-10 15:06:45

This article was created on 2017-06-10 15:06:45; the information in it may have evolved or changed since then.

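Before we talk about goroutine leaks, let's look at the idea of concurrent programming. Concurrent programming deals with the concurrent execution of a program: several sequential streams of tasks are executed at the same time, so the work as a whole completes sooner. For modern software running on multi-core processors, concurrent programming is essential; it helps make better use of the cores and yields faster concurrent/parallel programs.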

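Goroutines

Goroutines are what make concurrent execution possible in Go. A goroutine is a lightweight thread managed by the Go runtime. There is no one-to-one relationship between goroutines and OS threads: the Go scheduler multiplexes goroutines onto different threads. This design hides much of the complexity of creating and managing threads.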

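On concurrent versus parallel programs: a concurrent program may or may not be parallel. Parallelism is the ability to gain speed by using multiple processors, and a well-designed concurrent program also performs very well when run in parallel. In Go, the number of OS threads that can execute goroutines simultaneously is controlled by the GOMAXPROCS setting. Before Go 1.5 it defaulted to 1, so you had to raise it to use multiple cores; since Go 1.5 it defaults to the number of available CPUs, so goroutines run in parallel without extra configuration. For details see: https://github.com/Unknwon/the-way-to-go_ZH_CN/blob/master/eBook/14.1.md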
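To check how much parallelism a program actually has available, the current setting can be queried at run time. The following is a minimal sketch; the helper name parallelism is made up for this illustration, while runtime.NumCPU and runtime.GOMAXPROCS are standard library calls.

```go
package main

import (
	"fmt"
	"runtime"
)

// parallelism (a hypothetical helper) reports the number of logical CPUs
// and the current GOMAXPROCS setting. Passing 0 to runtime.GOMAXPROCS
// queries the value without changing it.
func parallelism() (cpus, procs int) {
	return runtime.NumCPU(), runtime.GOMAXPROCS(0)
}

func main() {
	cpus, procs := parallelism()
	fmt.Printf("logical CPUs: %d, GOMAXPROCS: %d\n", cpus, procs)
}
```

The setting can also be changed from outside the program through the GOMAXPROCS environment variable.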

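Synchronization

Processes, threads, and goroutines that cooperate all share a common goal: synchronization and communication.

In Go, channels are used to synchronize goroutines. In the traditional threading model, threads communicate by sharing memory. Go instead encourages passing references between goroutines over channels rather than explicitly using locks to coordinate access to shared data. With this approach only the goroutine that currently holds the reference touches the data, so only one goroutine accesses it at any given time.

As the worker example later in this chapter shows (Example 3), when each worker finishes it must coordinate with the main goroutine: it passes its result back over a channel, and only then does the main goroutine exit the program.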
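The worker-to-main handoff described above might look like the following minimal sketch. The function names square and sumOfSquares are invented for this illustration and do not come from the original article.

```go
package main

import "fmt"

// square is a hypothetical worker: it computes n*n and reports the
// result to the caller over the results channel.
func square(n int, results chan<- int) {
	results <- n * n
}

// sumOfSquares starts one goroutine per input and receives exactly one
// result per goroutine, so no worker is left blocked on its send.
func sumOfSquares(inputs []int) int {
	results := make(chan int)
	for _, n := range inputs {
		go square(n, results)
	}
	total := 0
	for range inputs {
		total += <-results
	}
	return total
}

func main() {
	fmt.Println(sumOfSquares([]int{1, 2, 3, 4})) // 1 + 4 + 9 + 16 = 30
}
```

The key point is that main receives as many values as there are workers; receiving fewer would leave some workers blocked forever.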

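When synchronization goes wrong

Every time you use the go keyword, pay attention to how that goroutine will exit. Sometimes synchronization goes wrong and leaves some goroutines waiting forever. In Go, the following situations can cause this:

Channel with no receiver

A send on an unbuffered channel blocks until a receiver is ready to take the value. If no receiver ever arrives, the sending goroutine hangs. In the example below, main receives from ch1 only once, so pump blocks forever on its second send.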

    package main

    import "fmt"

    func main() {
    	ch1 := make(chan int)
    	go pump(ch1)       // pump blocks on its second send
    	fmt.Println(<-ch1) // prints only 0
    }

    // pump sends 0, 1, 2, ... on ch forever.
    func pump(ch chan int) {
    	for i := 0; ; i++ {
    		ch <- i
    	}
    }

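Channel with no writer

A goroutine that receives from a channel on which nobody will ever send again, and which is never closed, also leaks. This happens in the following situations.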

Example 1: for-select

    for {
    	select {
    	case <-c:
    		// process here
    	}
    }

Example 2: ranging over a channel

    go func() {
    	for range ch {
    	}
    }()
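One common way to give such a receive loop a way out is to return when the channel is closed and to listen on a second "done" channel as well. The sketch below assumes hypothetical names (consume, done); it is one possible shape, not the only fix.

```go
package main

import "fmt"

// consume receives from c until either c is closed or done is closed.
// It returns the number of values processed, so the caller can see
// that the loop really terminated.
func consume(c <-chan int, done <-chan struct{}) int {
	n := 0
	for {
		select {
		case _, ok := <-c:
			if !ok {
				return n // c was closed: no writer will ever send again
			}
			n++
		case <-done:
			return n // the caller asked us to stop
		}
	}
}

func main() {
	c := make(chan int)
	done := make(chan struct{})
	finished := make(chan int)

	go func() { finished <- consume(c, done) }()

	c <- 1
	c <- 2
	close(done) // without this (or close(c)) the goroutine would wait forever
	fmt.Println("processed:", <-finished)
}
```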

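Example 3: workers loop over the tasks channel. Once every task has been consumed the channel has no writer left, so the main program must call close(tasks); otherwise the workers block in their range loops forever and leak.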

    package main

    import "fmt"

    func concurrency() {
    	// let's first create a channel with a buffer
    	tasks := make(chan string, 20)
    	// create another one to receive the results
    	results := make(chan string, 20)
    	workers := []int{1, 2, 3, 4}

    	// insert the tasks into the channel
    	for task := 0; task < 10; task++ {
    		tasks <- fmt.Sprintf("Task %d", task)
    	}

    	for _, w := range workers {
    		// start one goroutine for each worker
    		go work(w, tasks, results)
    	}

    	// no more tasks will be sent; this ends the workers' range loops
    	close(tasks)

    	// let's print the results
    	fmt.Println("Will print the results")
    	for res := 0; res < 10; res++ {
    		fmt.Println("Result:", <-results)
    	}
    }

    func work(workerID int, tasks chan string, results chan string) {
    	// the worker blocks until a new task arrives or the channel is closed
    	for t := range tasks {
    		// simple task as example
    		results <- fmt.Sprintf("Worker %d got %v", workerID, t)
    	}
    }

    func main() {
    	concurrency()
    }

Good practices

Use a timeout

    timeout := make(chan bool, 1)
    go func() {
    	time.Sleep(time.Second) // one second
    	timeout <- true
    }()
    select {
    case <-ch:
    	// a read from ch has occurred
    case <-timeout:
    	// the read from ch has timed out
    }

Or, using time.After:

    select {
    case res := <-c1:
    	fmt.Println(res)
    case <-time.After(time.Second * 1):
    	fmt.Println("timeout 1")
    }
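A timeout on the receiving side is only half of the story: if the receiver gives up and the sender uses an unbuffered channel, the sender blocks forever in "chan send". A possible remedy, sketched here under assumed names (fetchWithTimeout, slowWork, fastWork, and runDemo are all hypothetical), is to give the result channel a capacity of 1 so the sender can always finish.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// fetchWithTimeout runs work in its own goroutine and waits at most d
// for the result. The result channel has capacity 1, so the goroutine
// can always complete its send and exit, even after a timeout.
func fetchWithTimeout(work func() string, d time.Duration) (string, error) {
	result := make(chan string, 1)
	go func() { result <- work() }()

	select {
	case res := <-result:
		return res, nil
	case <-time.After(d):
		return "", errors.New("timed out")
	}
}

// slowWork and fastWork are hypothetical stand-ins for real work.
func slowWork() string {
	time.Sleep(200 * time.Millisecond)
	return "slow result"
}

func fastWork() string { return "fast result" }

// runDemo exercises both outcomes: a result in time and a timeout.
func runDemo() (fastRes string, slowErr error) {
	fastRes, _ = fetchWithTimeout(fastWork, time.Second)
	_, slowErr = fetchWithTimeout(slowWork, 50*time.Millisecond)
	return fastRes, slowErr
}

func main() {
	fastRes, slowErr := runDemo()
	fmt.Println("fast:", fastRes)
	fmt.Println("slow:", slowErr)
}
```

With an unbuffered result channel, every timed-out call would leave one goroutine behind.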

Use the Go context package

The Go context package can be used to end goroutines gracefully, including on a timeout.
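As a rough illustration of that idea, the sketch below starts a producer goroutine that stops as soon as its context is cancelled or times out. The names generate and firstN are invented for this example; context.WithTimeout, ctx.Done, and the cancel function are part of the standard context package.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// generate sends increasing integers on the returned channel until ctx
// is cancelled or times out, then closes the channel and returns.
func generate(ctx context.Context) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for i := 0; ; i++ {
			select {
			case out <- i:
			case <-ctx.Done():
				return // stop instead of blocking forever on the send
			}
		}
	}()
	return out
}

// firstN reads n values (n >= 1) and then lets the deferred cancel
// release the generator goroutine.
func firstN(n int) []int {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	var got []int
	for v := range generate(ctx) {
		got = append(got, v)
		if len(got) == n {
			break
		}
	}
	return got
}

func main() {
	fmt.Println(firstN(3))
}
```

Without the ctx.Done case, the generator would block on its next send after firstN stops reading, which is exactly the "channel with no receiver" leak described earlier.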

Leak detection

Instrumentation endpoint

One way to detect leaks in a web server is to add an instrumentation endpoint and use it together with a load test.

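    package main

    import (
    	"log"
    	"net/http"
    	"runtime"
    	"strconv"
    )

    // countGoRoutines returns the number of goroutines that currently exist.
    func countGoRoutines() int {
    	return runtime.NumGoroutine()
    }

    func getGoroutinesCountHandler(w http.ResponseWriter, r *http.Request) {
    	// Report the current number of goroutines.
    	count := countGoRoutines()
    	w.Write([]byte(strconv.Itoa(count)))
    }

    func main() {
    	http.HandleFunc("/_count", getGoroutinesCountHandler)
    	// The listen address is an assumption; the original omits starting the server.
    	log.Fatal(http.ListenAndServe(":8080", nil))
    }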

The instrumentation endpoint reports the number of goroutines that exist in the system before and after the load test. The load test program proceeds as follows:

Step 1: Call the instrumentation endpoint and get the number of goroutines alive in your web server.

Step 2: Perform the load test. Let the load be concurrent.

    for i := 0; i < 100; i++ {
    	go callEndpointUnderInvestigation()
    }

Step 3: Call the instrumentation endpoint again and get the number of goroutines alive in your web server.

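If the number of goroutines in the system has risen abnormally after the load test, that is evidence of a leak. Here is a small example using a web server with a leaky endpoint. A simple test is enough to tell whether the server leaks.

    // First run the leaky server
    $ go run leaky-server.go

    // Run the load test now.
    $ go run load.go
    3 Go routines before the load test in the system.
    54 Go routines after the load test in the system.

You can clearly see that 50 concurrent requests to the leaky endpoint added about 50 goroutines to the system.

Let's run the load test again.

    $ go run load.go
    53 Go routines before the load test in the system.
    104 Go routines after the load test in the system.

Clearly, with every run of the load test the number of goroutines in the server keeps rising and never falls back. This is clear evidence of a leak.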
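The load test procedure described above can be sketched in a single program. This is not the original leaky-server.go or load.go, which the article does not show: leakyHandler is a made-up leaky endpoint, and to stay self-contained the sketch drives the handlers in-process with httptest.NewRecorder instead of sending real network requests.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"runtime"
	"strconv"
	"sync"
)

// leakyHandler is a hypothetical endpoint under investigation: each
// request starts a goroutine that blocks forever on an unbuffered send.
func leakyHandler(w http.ResponseWriter, r *http.Request) {
	ch := make(chan int)
	go func() { ch <- 1 }() // nobody ever receives: this goroutine leaks
	fmt.Fprint(w, "ok")
}

// countHandler plays the role of the instrumentation endpoint.
func countHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, runtime.NumGoroutine())
}

// call invokes a handler in-process and returns the response body.
func call(h http.HandlerFunc) string {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Body.String()
}

// goroutineCount queries the instrumentation handler.
func goroutineCount() int {
	n, _ := strconv.Atoi(call(countHandler)) // the body is always an integer here
	return n
}

// loadTest performs the three steps: count, n concurrent requests, count again.
func loadTest(n int) (before, after int) {
	before = goroutineCount()

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			call(leakyHandler)
		}()
	}
	wg.Wait() // make sure every request has completed before counting again

	after = goroutineCount()
	return before, after
}

func main() {
	before, after := loadTest(100)
	fmt.Printf("%d goroutines before the load test\n", before)
	fmt.Printf("%d goroutines after the load test\n", after)
}
```

A real load test would issue http.Get requests against the running server instead; waiting for all requests to finish before taking the second count matters in either case.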

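Identifying the cause of a leak

Using a stack trace endpoint

Once a leak has been found in the web server, its source must be identified. Adding an endpoint that returns the server's stack traces helps pinpoint where the leak comes from.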

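    import (
    	"net/http"
    	"runtime/debug"
    	"runtime/pprof"
    )

    func getStackTraceHandler(w http.ResponseWriter, r *http.Request) {
    	// Stack trace of the goroutine serving this request.
    	stack := debug.Stack()
    	w.Write(stack)
    	// Stack traces of all goroutines; debug level 2 prints them in
    	// the same format as an unrecovered panic.
    	pprof.Lookup("goroutine").WriteTo(w, 2)
    }

    func main() {
    	http.HandleFunc("/_stack", getStackTraceHandler)
    }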

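After confirming that a leak exists, use the endpoint to capture stack traces before and after the load to identify the source of the leak.

Add the stack trace instrumentation to the leaky server and run the load test again.

The stack traces below point clearly to the epicenter of the leak: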

    // First run the leaky server
    $ go run leaky-server.go

    // Run the load test now.
    $ go run load.go
    3 Go routines before the load test in the system.
    54 Go routines after the load test in the system.

    goroutine 149 [chan send]:
    main.sum(0xc420122e58, 0x3, 0x3, 0xc420112240)
    /home/karthic/gophercon/count-instrument.go:39 +0x6c
    created by main.sumConcurrent
    /home/karthic/gophercon/count-instrument.go:51 +0x12b

    goroutine 243 [chan send]:
    main.sum(0xc42021a0d8, 0x3, 0x3, 0xc4202760c0)
    /home/karthic/gophercon/count-instrument.go:39 +0x6c
    created by main.sumConcurrent
    /home/karthic/gophercon/count-instrument.go:51 +0x12b

    goroutine 259 [chan send]:
    main.sum(0xc4202700d8, 0x3, 0x3, 0xc42029c0c0)
    /home/karthic/gophercon/count-instrument.go:39 +0x6c
    created by main.sumConcurrent
    /home/karthic/gophercon/count-instrument.go:51 +0x12b

    goroutine 135 [chan send]:
    main.sum(0xc420226348, 0x3, 0x3, 0xc4202363c0)
    /home/karthic/gophercon/count-instrument.go:39 +0x6c
    created by main.sumConcurrent
    /home/karthic/gophercon/count-instrument.go:51 +0x12b

    goroutine 166 [chan send]:
    main.sum(0xc4202482b8, 0x3, 0x3, 0xc42006b8c0)
    /home/karthic/gophercon/count-instrument.go:39 +0x6c
    created by main.sumConcurrent
    /home/karthic/gophercon/count-instrument.go:51 +0x12b

    goroutine 199 [chan send]:
    main.sum(0xc420260378, 0x3, 0x3, 0xc420256480)
    /home/karthic/gophercon/count-instrument.go:39 +0x6c
    created by main.sumConcurrent
    /home/karthic/gophercon/count-instrument.go:51 +0x12b

    ........

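Using profiling

Because leaked goroutines are usually blocked trying to read from or write to a channel, or may even be sleeping, profiling helps identify the cause of the leak. See the talk on benchmarks and profiling, or https://github.com/Unknwon/the-way-to-go_ZH_CN/blob/master/eBook/13.10.md.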
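One readily available way to get such a profile is the standard net/http/pprof package, which registers its handlers under /debug/pprof/ on http.DefaultServeMux when imported. The sketch below is only an illustration: goroutineDump and dumpMentions are hypothetical helpers, and the request is served in-process through httptest rather than over the network.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	_ "net/http/pprof" // side effect: registers /debug/pprof/* on http.DefaultServeMux
	"strings"
	"time"
)

// goroutineDump asks the pprof handler for a full goroutine dump, the
// same data a server would return at /debug/pprof/goroutine?debug=2.
func goroutineDump() string {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/debug/pprof/goroutine?debug=2", nil)
	http.DefaultServeMux.ServeHTTP(rec, req)
	return rec.Body.String()
}

// dumpMentions reports whether the current goroutine dump contains s.
func dumpMentions(s string) bool {
	return strings.Contains(goroutineDump(), s)
}

func main() {
	block := make(chan int)
	go func() { block <- 1 }()        // deliberately leaked sender
	time.Sleep(50 * time.Millisecond) // give the goroutine time to block

	fmt.Println("dump shows a blocked sender:", dumpMentions("chan send"))
}
```

In a server that already serves http.DefaultServeMux, the blank import alone is enough; the dump can then be fetched with a browser or with go tool pprof pointed at the server's /debug/pprof/goroutine URL.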

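Avoid leaks: the earlier the better

Using this kind of instrumentation in unit and functional tests helps catch leaks early: count the goroutines before and after the test.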

    func TestMyFunc(t *testing.T) {
    	// get the count of goroutines.
    	// perform the test.
    	// get the count diff.
    	// alert if there's an unexpected rise.
    }
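The outline above could be filled in along the following lines. This is a sketch under stated assumptions: goroutineDelta, leaky, and clean are hypothetical names, and the 500 ms settling period is an arbitrary choice to avoid reporting goroutines that are merely slow to exit.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// goroutineDelta runs f and reports how many more goroutines exist
// afterwards. It polls for up to 500 ms so that goroutines which are
// still shutting down are not counted as leaks.
func goroutineDelta(f func()) int {
	before := runtime.NumGoroutine()
	f()
	deadline := time.Now().Add(500 * time.Millisecond)
	for {
		delta := runtime.NumGoroutine() - before
		if delta <= 0 || time.Now().After(deadline) {
			return delta
		}
		time.Sleep(10 * time.Millisecond)
	}
}

// leaky starts a goroutine that can never finish its send.
func leaky() {
	ch := make(chan int)
	go func() { ch <- 1 }()
}

// clean starts a goroutine and waits for it to signal completion.
func clean() {
	done := make(chan struct{})
	go func() { close(done) }()
	<-done
}

func main() {
	fmt.Println("leaky delta:", goroutineDelta(leaky))
	fmt.Println("clean delta:", goroutineDelta(clean))
}
```

Inside a real test, the same helper could wrap the code under test and call t.Errorf when the returned delta is greater than zero.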

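Stack diffs in tests

A stack diff is a simple program that compares the stack traces taken before and after a test and raises an alert if any unexpected goroutines are left behind in the system. Integrating it with unit and functional tests helps identify leaks during development.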

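    import (
    	"testing"
    	"time"

    	"github.com/fortytw2/leaktest"
    )

    func TestMyFunc(t *testing.T) {
    	// Check snapshots the running goroutines; the returned function,
    	// run on exit, fails the test if new ones are still alive.
    	defer leaktest.Check(t)()

    	// This goroutine never exits, so the test reports a leak.
    	go func() {
    		for {
    			time.Sleep(time.Second)
    		}
    	}()
    }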

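Design for safety

When one endpoint or service is hit by a leak or runs out of resources, a microservice architecture in which each service runs as an independent container or process protects the rest of the system. Container orchestration tools such as Kubernetes, Mesosphere, and Docker Swarm are recommended.

Goroutine leaks are like a slow suicide. Imagine taking stack traces of an entire system and trying to work out which of hundreds of services is leaking. Truly scary! Leaks waste your computing resources over time and accumulate slowly, so you may not even notice them. It really matters to be aware of leaks and to debug them as early as possible.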

Go will make you love programming again. I promise.


References:

The Way to Go, Chinese translation (《Go入门指南》): https://github.com/Unknwon/the-way-to-go_ZH_CN

Debugging go routine leaks: https://youtu.be/hWo0FEVr92A

https://github.com/fortytw2/leaktest

http://www.tuicool.com/articles/2AZf63J


Source: Jianshu (简书)

Author: 范彬2017
