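What do you do when a Go program misbehaves? Profile it with pprof. But what if the problem is a bug in the code itself? Analyzing a bug often means following the execution, so you add logging; yet some problems are hard to reproduce once the service has been restarted. In those cases we can debug with breakpoints, which lets us follow every line of code as it executes and inspect the value of every variable. C programs are usually debugged with GDB; Go has its own dedicated debugger, dlv. This article introduces the basics of using dlv.
## dlv overview
dlv is short for delve. Installation is simple; `go install` is all it takes: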
```
# Download and install from source
$ git clone https://github.com/go-delve/delve
$ cd delve
$ go install github.com/go-delve/delve/cmd/dlv
# Go 1.16 or later:
# Install at a specific version or pseudo-version:
$ go install github.com/go-delve/delve/cmd/dlv@v1.7.3
# On macOS make sure you also install the command line developer tools:
$ xcode-select --install
```
dlv supports several ways of tracing a Go program; the help command lists them:
```
dlv help
// passing arguments to the debugged program
Pass flags to the program you are debugging using `--`, for example:
`dlv exec ./hello -- server --config conf/config.toml`
Usage:
dlv [command]
Available Commands:
// commonly used to debug a misbehaving process that is already running
attach Attach to running process and begin debugging.
// launch and debug a precompiled binary
exec Execute a precompiled binary, and begin a debug session.
debug Compile and begin debugging main package in current directory, or the package specified.
......
```
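dlv is quite similar to GDB: it can print variable values, set breakpoints, single-step, and show the call stack. It can also list all goroutines and threads of the current Go process. The most commonly used commands are: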
```
Running the program:
// run until a breakpoint is hit or the program terminates
continue (alias: c) --------- Run until breakpoint or program termination.
// step over to the next source line
next (alias: n) ------------- Step over to next source line.
// restart the process
restart (alias: r) ---------- Restart process.
// step into a function; plain n treats a function call as one line and steps over it
step (alias: s) ------------- Single step through program.
// step out of the current function
stepout (alias: so) --------- Step out of the current function.
Manipulating breakpoints:
// set a breakpoint
break (alias: b) ------- Sets a breakpoint.
// list all breakpoints
breakpoints (alias: bp) Print out info for active breakpoints.
// delete a breakpoint
clear ------------------ Deletes breakpoint.
// delete all breakpoints
clearall --------------- Deletes multiple breakpoints.
Viewing program variables and memory:
// print function arguments
args ----------------- Print function arguments.
// print local variables
locals --------------- Print local variables.
// print a single variable (or any expression)
print (alias: p) ----- Evaluate an expression.
// print the contents of the CPU registers
regs ----------------- Print contents of CPU registers.
// change the value of a variable
set ------------------ Changes the value of a variable.
Listing and switching between threads and goroutines:
// show the current goroutine or switch to the specified goroutine
goroutine (alias: gr) -- Shows or changes current goroutine
// list all goroutines
goroutines (alias: grs) List program goroutines.
// switch to the specified thread
thread (alias: tr) ----- Switch to the specified thread.
// list all threads
threads ---------------- Print out info for every traced thread.
Viewing the call stack and selecting frames:
// print the call stack
stack (alias: bt) Print stack trace.
Other commands:
// print the program's assembly instructions
disassemble (alias: disass) Disassembler.
// show source code
list (alias: ls | l) ------- Show source code.
```
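dlv has many commands, but only a handful are used regularly. Knowing how to set breakpoints, single-step, and print variables and the call stack is enough for basic debugging.
## dlv in practice
Let's write a small program and debug it with dlv to review the channel reads and writes and the scheduler flow covered earlier. Note that a Go program runs many threads and goroutines, so actual execution can be fairly complex, and the author has also omitted parts of the debugging session. Even if you follow the steps exactly, your results may differ. The program is: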
```
package main

import (
	"fmt"
	"time"
)

func main() {
	queue := make(chan int, 1)
	go func() {
		for {
			data := <-queue
			fmt.Print(data, " ")
		}
	}()

	for i := 0; i < 10; i++ {
		queue <- i
	}
	time.Sleep(time.Second * 1000)
}
```
Build the Go program and start it under dlv:
```
// note the build flags -N -l, which disable compiler optimizations and inlining
go build -gcflags '-N -l' test.go
dlv exec test
Type 'help' for list of commands.
(dlv)
```
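Now we can enter the debugging commands introduced above and begin the dlv session. We have previously covered how channels are implemented and how the Go scheduler works: channel receive and send are implemented by runtime.chanrecv and runtime.chansend, and the scheduler's main logic is runtime.schedule. Readers should also know that the main goroutine, i.e. the main function, corresponds to the function main.main after compilation. Set breakpoints on all of these functions: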
```
// a bare function name is sometimes ambiguous; the breakpoint may then need the package name, e.g. runtime.chansend
(dlv) b chansend
Breakpoint 1 set at 0x1003f0a for runtime.chansend() /go1.18/src/runtime/chan.go:159
(dlv) b chanrecv
Breakpoint 2 set at 0x1004c2f for runtime.chanrecv() /go1.18/src/runtime/chan.go:455
(dlv) b schedule
Breakpoint 3 set at 0x1037aea for runtime.schedule() /go1.18/src/runtime/proc.go:3111
(dlv) b main.main
Breakpoint 4 set at 0x1089a0a for main.main() ./test.go:8
```
The continue command (alias c) runs to the next breakpoint:
```
(dlv) c
> runtime.schedule() /go1.18/src/runtime/proc.go:3111 (hits total:1) (PC: 0x1037aea)
=>3111: func schedule() {
3112: _g_ := getg()
3113:
3114: if _g_.m.locks != 0 {
3115: throw("schedule: holding locks")
3116: }
```
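=> points at the line currently being executed. Surprisingly, the first stop is runtime.schedule rather than the main function. Remember that main is itself scheduled and run as the main goroutine, so it cannot be the first thing to execute: before the main goroutine can be scheduled, threads have to be set up, the main goroutine has to be created, the scheduling logic has to run, and so on. So what is the first line of a Go program? Let's look at the call stack: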
```
(dlv) bt
0 0x0000000001037aea in runtime.schedule
at /go1.18/src/runtime/proc.go:3111
1 0x000000000103444d in runtime.mstart1
at /go1.18/src/runtime/proc.go:1425
2 0x000000000103434c in runtime.mstart0
at /go1.18/src/runtime/proc.go:1376
3 0x00000000010585e5 in runtime.mstart
at /go1.18/src/runtime/asm_amd64.s:368
4 0x0000000001058571 in runtime.rt0_go
at /go1.18/src/runtime/asm_amd64.s:331
```
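In this session (amd64), the first code of the Go program lives in runtime/asm_amd64.s, and the entry function is runtime.rt0_go; it is all assembly, for those who are interested. If you keep entering c, the program will still stop at runtime.schedule, and even at runtime.chanrecv, because a lot of initialization (which uses these functions) happens before the main goroutine is scheduled. So the usual approach is to set a breakpoint at main.main first, continue to it, and only then set the other breakpoints. Here we restart the program, clear all breakpoints, set the breakpoint at main.main again, and continue to it: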
```
(dlv) r
Process restarted with PID 57676
(dlv) clearall
(dlv) b main.main
Breakpoint 5 set at 0x1089a0a for main.main() ./test.go:8
(dlv) c
> main.main() ./test.go:8 (hits goroutine(1):1 total:1) (PC: 0x1089a0a)
=> 8: func main() {
9: queue := make(chan int, 1)
10: go func() {
```
The program has finally reached main.main. Next, set breakpoints on the channel send and receive functions and continue to the next breakpoint:
```
(dlv) b chansend
Breakpoint 1 set at 0x1003f0a for runtime.chansend() /go1.18/src/runtime/chan.go:159
(dlv) b chanrecv
Breakpoint 2 set at 0x1004c2f for runtime.chanrecv() /go1.18/src/runtime/chan.go:455
(dlv) c
> runtime.chansend() /go1.18/src/runtime/chan.go:159 (hits goroutine(1):1 total:1) (PC: 0x1003f0a)
=> 159: func chansend(c *hchan, ep unsafe.Pointer, block bool, callerpc uintptr) bool {
160: if c == nil {
161: if !block {
162: return false
163: }
```
The program stopped in runtime.chansend, which should correspond to the line "queue <- i". Use bt to look at the stack frames and confirm:
```
(dlv) bt
0 0x0000000001003f0a in runtime.chansend
at /go1.18/src/runtime/chan.go:159
1 0x0000000001003edd in runtime.chansend1
at /go1.18/src/runtime/chan.go:144
2 0x0000000001089aa9 in main.main
at ./test.go:18
// inspect the arguments
(dlv) args
c = (*runtime.hchan)(0xc00005a070)
ep = unsafe.Pointer(0xc000070f58)
block = true // the goroutine will block
callerpc = 17341097
~r0 = (unreadable empty OP stack)
// the first value the loop sends to the channel should be 0; the x command examines memory
(dlv) x 0xc000070f58
0xc000070f58: 0x00
```
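Here the args command shows the input arguments. block being true means the current goroutine will block if the channel cannot be written to. ep is an address holding the value to be sent; the x command examines memory, and we can see that it is indeed the value 0.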
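Remember how channels are implemented? A buffered channel maintains a circular queue underneath, and a send goes through these steps: 1) if the channel is nil, block the current goroutine (block=true); 2) if the channel is closed, panic; 3) if a goroutine is waiting to receive, hand the value directly to that goroutine and wake it up; 4) if the buffer still has room, store the value in the buffer; 5) otherwise the buffer is full, so block the current goroutine (block=true).
Next, single-step to watch the send path execute. The process is simple and repetitive, so we won't go through all of it; here is just one intermediate step: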
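These cases can also be observed from ordinary Go code, without a debugger. The sketch below is illustrative only: `trySend` and `sendPanics` are hypothetical helpers, and the non-blocking send is written as a `select` with a `default` case, so a full or nil channel returns false instead of blocking the goroutine.

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether it succeeded.
// (Hypothetical helper for illustration.)
func trySend(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

// sendPanics reports whether a send on ch panics.
// (Hypothetical helper for illustration.)
func sendPanics(ch chan int) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	ch <- 1
	return false
}

func main() {
	queue := make(chan int, 1)
	fmt.Println(trySend(queue, 0)) // true: the buffer has room
	fmt.Println(trySend(queue, 1)) // false: the buffer is full, a blocking send would wait here

	var nilChan chan int
	fmt.Println(trySend(nilChan, 2)) // false: a nil channel is never ready

	closed := make(chan int, 1)
	close(closed)
	fmt.Println(sendPanics(closed)) // true: send on a closed channel panics
}
```

The blocking case is the one the test program exercises: with a buffer of 1 and a slower receiver, the main goroutine's `queue <- i` ends up waiting.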
```
(dlv) n
1 > runtime.chansend() /go1.18/src/runtime/chan.go:208 (PC: 0x10040e0)
Warning: debugging optimized function
203: if c.closed != 0 {
204: unlock(&c.lock)
205: panic(plainError("send on closed channel"))
206: }
207:
=> 208: if sg := c.recvq.dequeue(); sg != nil {
209: // Found a waiting receiver. We pass the value we want to send
210: // directly to the receiver, bypassing the channel buffer (if any).
211: send(c, sg, ep, func() { unlock(&c.lock) }, 3)
212: return true
213: }
```
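While single-stepping you may notice that a goroutine blocks by calling gopark, which switches it out and hands control to the scheduler loop. Let's set breakpoints on runtime.schedule and runtime.gopark again and watch the goroutine switch: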
```
(dlv) b schedule
Breakpoint 8 set at 0x1037aea for runtime.schedule() /go1.18/src/runtime/proc.go:3111
(dlv) b gopark
Breakpoint 9 set at 0x1031aca for runtime.gopark() /go1.18/src/runtime/proc.go:344
(dlv) c
> runtime.gopark() /go1.18/src/runtime/proc.go:344 (hits goroutine(1):2 total:2) (PC: 0x1031aca)
=> 344: func gopark(unlockf func(*g, unsafe.Pointer) bool, lock unsafe.Pointer, reason waitReason, traceEv byte, traceskip int) {
345: if reason != waitReasonSleep {
346: checkTimeouts() // timeouts may expire while two goroutines keep the scheduler busy
347: }
348: mp := acquirem()
349: gp := mp.curg
```
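runtime.gopark essentially switches to the scheduler stack and runs runtime.schedule (which finds a runnable goroutine and schedules it), so another continue stops at the runtime.schedule breakpoint: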
```
(dlv) c
> [b] runtime.schedule() /go1.18/src/runtime/proc.go:3111 (hits total:19) (PC: 0x1037aea)
=>3111: func schedule() {
3112: _g_ := getg()
(dlv) bt
0 0x0000000001037aea in runtime.schedule
at /Users/lile/Documents/go1.18/src/runtime/proc.go:3111
1 0x000000000103826d in runtime.park_m
at /Users/lile/Documents/go1.18/src/runtime/proc.go:3336
2 0x0000000001058663 in runtime.mcall
at /Users/lile/Documents/go1.18/src/runtime/asm_amd64.s:425
```
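bt shows that the outermost frame is runtime.mcall. Why is the call stack so short, and where is runtime.gopark? Because a stack switch happened here: execution moved from the user goroutine's stack to the scheduler (g0) stack, so the call chain is different and the earlier frames on the user stack are no longer visible. runtime.mcall is the function that performs this stack switch.
## Summary
dlv is an excellent tool for debugging Go programs. It helps us learn and understand Go, and it helps us track down bugs quickly, so it is well worth mastering.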