Go程序调试、分析与优化

bigwhite · · 7418 次点击 · · 开始浏览

这是一个创建于的文章，其中的信息可能已经有所发展或是发生改变。

Brad Fitzpatrick在YAPC Asia 2015（Yet Another Perl Conference）上做了一次技术分享，题为："Go Debugging, Profiling, and Optimization"。个人感觉这篇分享中价值最大的是BradFitz现场演示的一个有关如何对Go程序进行调试、分析和优化的 Demo，Brad将demo上传到了他个人在github.com的repo中，但不知为何，repo中的代码似乎与repo里talk.md中的说明不甚一致(btw，我并没有看video)。于是打算在这里按照Brad的思路重新走一遍demo的演示流程(所有演示代码在这里可以下载到)。

一、实验环境

$uname -a
Linux pc-tony 3.13.0-61-generic #100~precise1-Ubuntu SMP Wed Jul 29 12:06:40 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

注意:在Darwin或Windows下，profile的结果可能与这里有很大不同(甚至完全不一样的输出和瓶颈热点)。

$go version
go version go1.5 linux/amd64

$ go env
GOARCH="amd64"
GOBIN="/home1/tonybai/.bin/go15/bin"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home1/tonybai/proj/GoProjects"
GORACE=""
GOROOT="/home1/tonybai/.bin/go15"
GOTOOLDIR="/home1/tonybai/.bin/go15/pkg/tool/linux_amd64"
GO15VENDOREXPERIMENT="1"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0"
CXX="g++"
CGO_ENABLED="1"

代码基于Brad的github.com/bradfitz/talk-yapc-asia-2015。

二、待优化程序(step0)

待优化程序，也就是原始程序，我们放在step0中：

//go-debug-profile-optimization/step0/demo.go

package main

import (
    "fmt"
    "log"
    "net/http"
    "regexp"
)

var visitors int

func handleHi(w http.ResponseWriter, r *http.Request) {
    if match, _ := regexp.MatchString(`^\w*$`, r.FormValue("color")); !match {
        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
        return
    }
    visitors++
    w.Header().Set("Content-Type", "text/html; charset=utf-8")
    w.Write([]byte("

Welcome!

You are visitor number " + fmt.Sprint(visitors) + "!"))
}

func main() {
    log.Printf("Starting on port 8080")
    http.HandleFunc("/hi", handleHi)
    log.Fatal(http.ListenAndServe("127.0.0.1:8080", nil))
}

$go run demo.go
2015/08/25 09:42:35 Starting on port 8080

在浏览器输入：http://localhost:8080/hi

一切顺利的话，页面会显示：

Welcome!

You are visitor number 1!

三、添加测试代码

按照talk.md中的说明，brad repo中demo中根本没有测试代码(commit 2427d0faa12ed1fb05f1e6a1e69307c11259c2b2)。

于是我根据作者的意图，新增了demo_test.go，采用TestHandleHi_Recorder和TestHandleHi_TestServer对HandleHi进行测试：

//go-debug-profile-optimization/step0/demo_test.go
package main

import (
    "bufio"
    "net/http"
    "net/http/httptest"
    "strings"
    "testing"
)

func TestHandleHi_Recorder(t *testing.T) {
    rw := httptest.NewRecorder()
    handleHi(rw, req(t, "GET / HTTP/1.0\r\n\r\n"))
    if !strings.Contains(rw.Body.String(), "visitor number") {
        t.Errorf("Unexpected output: %s", rw.Body)
    }
}

func req(t *testing.T, v string) *http.Request {
    req, err := http.ReadRequest(bufio.NewReader(strings.NewReader(v)))
    if err != nil {
        t.Fatal(err)
    }
    return req
}

func TestHandleHi_TestServer(t *testing.T) {
    ts := httptest.NewServer(http.HandlerFunc(handleHi))
    defer ts.Close()
    res, err := http.Get(ts.URL)
    if err != nil {
        t.Error(err)
        return
    }
    if g, w := res.Header.Get("Content-Type"), "text/html; charset=utf-8"; g != w {
        t.Errorf("Content-Type = %q; want %q", g, w)
    }
    slurp, err := ioutil.ReadAll(res.Body)
    defer res.Body.Close()
    if err != nil {
        t.Error(err)
        return
    }
    t.Logf("Got: %s", slurp)
}

$ go test -v
=== RUN   TestHandleHi_Recorder
— PASS: TestHandleHi_Recorder (0.00s)
=== RUN   TestHandleHi_TestServer
— PASS: TestHandleHi_TestServer (0.00s)
    demo_test.go:45: Got:

Welcome!

You are visitor number 2!
PASS
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step0 0.007s

测试通过！

至此，step0使命结束。

四、Race Detector(竞态分析）

并发设计使得程序可以更好更有效的利用现代处理器的多核心。但并发设计很容易引入竞态，导致严重bug。Go程序中竞态就是当多个goroutine并发访问某共享数据且未使用同步机制时，且至少一个goroutine进行了写操作。不过go工具自带race分析功能。在分析优化step0中demo代码前，我们先要保证demo代码中不存在竞态。

工具的使用方法就是在go test后加上-race标志，在step0目录下：

$ go test -v -race
=== RUN   TestHandleHi_Recorder
— PASS: TestHandleHi_Recorder (0.00s)
=== RUN   TestHandleHi_TestServer
— PASS: TestHandleHi_TestServer (0.00s)
    demo_test.go:45: Got:

Welcome!

You are visitor number 2!
PASS
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step0 1.012s

-race通过做运行时分析做竞态分析，虽然不存在误报，但却存在实际有竞态，但工具没发现的情况。接下来我们改造一下测试代码，让test并发起来：

向step1(copy自step0)中demo_test.go中添加一个test method:

//go-debug-profile-optimization/step1/demo_test.go
… …
func TestHandleHi_TestServer_Parallel(t *testing.T) {
    ts := httptest.NewServer(http.HandlerFunc(handleHi))
    defer ts.Close()
    var wg sync.WaitGroup
    for i := 0; i < 2; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            res, err := http.Get(ts.URL)
            if err != nil {
                t.Error(err)
                return
            }
            if g, w := res.Header.Get("Content-Type"), "text/html; charset=utf-8"; g != w {
                t.Errorf("Content-Type = %q; want %q", g, w)
            }
            slurp, err := ioutil.ReadAll(res.Body)
            defer res.Body.Close()
            if err != nil {
                t.Error(err)
                return
            }
            t.Logf("Got: %s", slurp)
        }()
    }
    wg.Wait()
}
… …

执行竞态test：

$ go test -v -race
=== RUN   TestHandleHi_Recorder
— PASS: TestHandleHi_Recorder (0.00s)
=== RUN   TestHandleHi_TestServer
— PASS: TestHandleHi_TestServer (0.00s)
    demo_test.go:46: Got:

Welcome!

You are visitor number 2!
=== RUN   TestHandleHi_TestServer_Parallel
==================
WARNING: DATA RACE
Read by goroutine 22:
_/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step1.handleHi()
      /home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step1/demo.go:17 +0xf5
net/http.HandlerFunc.ServeHTTP()
      /tmp/workdir/go/src/net/http/server.go:1422 +0×47
net/http/httptest.(*waitGroupHandler).ServeHTTP()
      /tmp/workdir/go/src/net/http/httptest/server.go:200 +0xfe
net/http.serverHandler.ServeHTTP()
      /tmp/workdir/go/src/net/http/server.go:1862 +0×206
net/http.(*conn).serve()
      /tmp/workdir/go/src/net/http/server.go:1361 +0x117c

Previous write by goroutine 25:
_/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step1.handleHi()
      /home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step1/demo.go:17 +0×111
net/http.HandlerFunc.ServeHTTP()
      /tmp/workdir/go/src/net/http/server.go:1422 +0×47
net/http/httptest.(*waitGroupHandler).ServeHTTP()
      /tmp/workdir/go/src/net/http/httptest/server.go:200 +0xfe
net/http.serverHandler.ServeHTTP()
      /tmp/workdir/go/src/net/http/server.go:1862 +0×206
net/http.(*conn).serve()
      /tmp/workdir/go/src/net/http/server.go:1361 +0x117c

Goroutine 22 (running) created at:
net/http.(*Server).Serve()
/tmp/workdir/go/src/net/http/server.go:1910 +0×464

Goroutine 25 (running) created at:
net/http.(*Server).Serve()
/tmp/workdir/go/src/net/http/server.go:1910 +0×464
==================
— PASS: TestHandleHi_TestServer_Parallel (0.00s)
demo_test.go:71: Got:

Welcome!

You are visitor number 3!
demo_test.go:71: Got:

Welcome!

You are visitor number 4!
PASS
Found 1 data race(s)
exit status 66
FAIL _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step1 1.023s

工具发现demo.go第17行：
visitors++
是一处潜在的竞态条件。

visitors被多个goroutine访问但未采用同步机制。

既然发现了竞态条件，我们就需要fix it。有多种fix方法可选：

1、使用channel
2、使用Mutex
3、使用atomic

Brad使用了atomic：

//go-debug-profile-optimization/step1/demo.go
… …
var visitors int64 // must be accessed atomically

func handleHi(w http.ResponseWriter, r *http.Request) {
    if match, _ := regexp.MatchString(`^\w*$`, r.FormValue("color")); !match {
        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
        return
    }
    visitNum := atomic.AddInt64(&visitors, 1)
    w.Header().Set("Content-Type", "text/html; charset=utf-8")
    w.Write([]byte("

Welcome!

You are visitor number " + fmt.Sprint(visitNum) + "!"))
}
… …

再做一次测试：

$ go test -v -race
=== RUN   TestHandleHi_Recorder
— PASS: TestHandleHi_Recorder (0.00s)
=== RUN   TestHandleHi_TestServer
— PASS: TestHandleHi_TestServer (0.00s)
    demo_test.go:46: Got:

Welcome!

You are visitor number 2!
=== RUN TestHandleHi_TestServer_Parallel
— PASS: TestHandleHi_TestServer_Parallel (0.00s)
demo_test.go:71: Got:

Welcome!

You are visitor number 3!
demo_test.go:71: Got:

Welcome!

You are visitor number 4!
PASS
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step1 1.020s

竞态条件被消除了！

至此，step1结束了使命！

五、CPU Profiling

要做CPU Profilling，我们需要benchmark数据，Go test提供benchmark test功能，我们只要写对应的Benchmark测试方法即可：

//go-debug-profile-optimization/step2/demo_test.go
… …
func BenchmarkHi(b *testing.B) {
b.ReportAllocs()

    req, err := http.ReadRequest(bufio.NewReader(strings.NewReader("GET / HTTP/1.0\r\n\r\n")))
    if err != nil {
        b.Fatal(err)
    }

    for i := 0; i < b.N; i++ {
        rw := httptest.NewRecorder()
        handleHi(rw, req)
    }
}
… …

$ go test -v -run=^$ -bench=.
PASS
BenchmarkHi-4 100000 14808 ns/op 4961 B/op 81 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step2 1.648s

开始CPU Profiling：

$ go test -v -run=^$ -bench=^BenchmarkHi$ -benchtime=2s -cpuprofile=prof.cpu
PASS
BenchmarkHi-4 200000 14679 ns/op 4961 B/op 81 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step2 3.096s

执行完benchmark test后，step2目录下出现两个新文件prof.cpu和step2.test，这两个文件将作为后续go tool pprof的输入：
$ls
demo.go demo_test.go prof.cpu step2.test*

使用go profile viewer工具：

$ go tool pprof step2.test prof.cpu
Entering interactive mode (type "help" for commands)
(pprof) top
1830ms of 3560ms total (51.40%)
Dropped 53 nodes (cum <= 17.80ms)
Showing top 10 nodes out of 133 (cum >= 1290ms)
      flat flat%   sum%        cum   cum%
     480ms 13.48% 13.48%      980ms 27.53% runtime.growslice
     360ms 10.11% 23.60%      700ms 19.66% runtime.mallocgc
     170ms 4.78% 28.37%      170ms 4.78% runtime.heapBitsSetType
     170ms 4.78% 33.15%      200ms 5.62% runtime.scanblock
     120ms 3.37% 36.52%     1100ms 30.90% regexp.makeOnePass.func2
     120ms 3.37% 39.89%      550ms 15.45% runtime.newarray
     110ms 3.09% 42.98%      300ms 8.43% runtime.makeslice
     110ms 3.09% 46.07%      220ms 6.18% runtime.mapassign1
     100ms 2.81% 48.88%      100ms 2.81% runtime.futex
      90ms 2.53% 51.40%     1290ms 36.24% regexp.makeOnePass

(pprof) top –cum
0.18s of 3.56s total ( 5.06%)
Dropped 53 nodes (cum <= 0.02s)
Showing top 10 nodes out of 133 (cum >= 1.29s)
      flat flat%   sum%        cum   cum%
         0     0%     0%      3.26s 91.57% runtime.goexit
     0.02s 0.56% 0.56%      2.87s 80.62% BenchmarkHi
         0     0% 0.56%      2.87s 80.62% testing.(*B).launch
         0     0% 0.56%      2.87s 80.62% testing.(*B).runN
     0.03s 0.84% 1.40%      2.80s 78.65% step2.handleHi
     0.01s 0.28% 1.69%      2.46s 69.10% regexp.MatchString
         0     0% 1.69%      2.24s 62.92% regexp.Compile
         0     0% 1.69%      2.24s 62.92% regexp.compile
     0.03s 0.84% 2.53%      1.56s 43.82% regexp.compileOnePass
     0.09s 2.53% 5.06%      1.29s 36.24% regexp.makeOnePass

(pprof) list handleHi
Total: 3.56s
ROUTINE ======================== handleHi in go-debug-profile-optimization/step2/demo.go
      30ms      2.80s (flat, cum) 78.65% of Total
         .          .      9:)
         .          .     10:
         .          .     11:var visitors int64 // must be accessed atomically
         .          .     12:
         .          .     13:func handleHi(w http.ResponseWriter, r *http.Request) {
         .      2.47s     14:    if match, _ := regexp.MatchString(`^\w*$`, r.FormValue("color")); !match {
         .          .     15:        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
         .          .     16:        return
         .          .     17:    }
      10ms       20ms     18:    visitNum := atomic.AddInt64(&visitors, 1)
      10ms       90ms     19:    w.Header().Set("Content-Type", "text/html; charset=utf-8")
      10ms       20ms     20:    w.Write([]byte("

Welcome!

You are visitor number " + fmt.Sprint(visitNum) + "!"))
         .          .     22:}
         .          .     23:
         .          .     24:func main() {
         .          .     25:    log.Printf("Starting on port 8080")
         .          .     26:    http.HandleFunc("/hi", handleHi)
(pprof)

从top –cum来看，handleHi消耗cpu较大，而handleHi中，又是MatchString耗时最长。

六、第一次优化

前面已经发现MatchString较为耗时，优化手段：让正则式仅编译一次(step3)：

// go-debug-profile-optimization/step3/demo.go

… …
var visitors int64 // must be accessed atomically

var rxOptionalID = regexp.MustCompile(`^\d*$`)

func handleHi(w http.ResponseWriter, r *http.Request) {
    if !rxOptionalID.MatchString(r.FormValue("color")) {
        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
        return
    }

    visitNum := atomic.AddInt64(&visitors, 1)
    w.Header().Set("Content-Type", "text/html; charset=utf-8")
    w.Write([]byte("

Welcome!

You are visitor number " + fmt.Sprint(visitNum) + "!"))
}
… …

运行一下bench：

$ go test -bench=.
PASS
BenchmarkHi-4 1000000 1678 ns/op 720 B/op 9 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step3 1.710s

对比之前在step2中运行的bench结果：

$ go test -v -run=^$ -bench=.
PASS
BenchmarkHi-4 100000 14808 ns/op 4961 B/op 81 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step2 1.648s

耗时相同，但优化后的bench运行了100w次，而之前的Bench运行10w次，相当于性能提高10倍。

再看看cpu prof结果：

$ go test -v -run=^$ -bench=^BenchmarkHi$ -benchtime=3s -cpuprofile=prof.cpu
PASS
BenchmarkHi-4 3000000 1640 ns/op 720 B/op 9 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step3 6.540s

$ go tool pprof step3.test prof.cpu
Entering interactive mode (type "help" for commands)
(pprof) top –cum 30
2.74s of 8.07s total (33.95%)
Dropped 72 nodes (cum <= 0.04s)
Showing top 30 nodes out of 103 (cum >= 0.56s)
      flat flat%   sum%        cum   cum%
         0     0%     0%      7.17s 88.85% runtime.goexit
     0.05s 0.62% 0.62%      6.21s 76.95% step3.BenchmarkHi
         0     0% 0.62%      6.21s 76.95% testing.(*B).launch
         0     0% 0.62%      6.21s 76.95% testing.(*B).runN
     0.06s 0.74% 1.36%      4.96s 61.46% step3.handleHi
     1.15s 14.25% 15.61%      2.35s 29.12% runtime.mallocgc
     0.02s 0.25% 15.86%      1.63s 20.20% runtime.systemstack
         0     0% 15.86%      1.53s 18.96% net/http.Header.Set
     0.06s 0.74% 16.60%      1.53s 18.96% net/textproto.MIMEHeader.Set
     0.09s 1.12% 17.72%      1.22s 15.12% runtime.newobject
     0.05s 0.62% 18.34%      1.09s 13.51% fmt.Sprint
     0.20s 2.48% 20.82%         1s 12.39% runtime.mapassign1
         0     0% 20.82%      0.81s 10.04% runtime.mcall
     0.01s 0.12% 20.94%      0.79s 9.79% runtime.schedule
     0.05s 0.62% 21.56%      0.76s 9.42% regexp.(*Regexp).MatchString
     0.09s 1.12% 22.68%      0.71s 8.80% regexp.(*Regexp).doExecute
     0.01s 0.12% 22.80%      0.71s 8.80% runtime.concatstring5
     0.20s 2.48% 25.28%      0.70s 8.67% runtime.concatstrings
         0     0% 25.28%      0.69s 8.55% runtime.gosweepone
     0.05s 0.62% 25.90%      0.69s 8.55% runtime.mSpan_Sweep
         0     0% 25.90%      0.68s 8.43% runtime.bgsweep
     0.04s   0.5% 26.39%      0.68s 8.43% runtime.newarray
     0.01s 0.12% 26.52%      0.67s 8.30% runtime.goschedImpl
     0.01s 0.12% 26.64%      0.65s 8.05% runtime.gosched_m
         0     0% 26.64%      0.65s 8.05% runtime.gosweepone.func1
     0.01s 0.12% 26.77%      0.65s 8.05% runtime.sweepone
     0.28s 3.47% 30.24%      0.62s 7.68% runtime.makemap
     0.17s 2.11% 32.34%      0.59s 7.31% runtime.heapBitsSweepSpan
     0.02s 0.25% 32.59%      0.58s 7.19% fmt.(*pp).doPrint
     0.11s 1.36% 33.95%      0.56s 6.94% fmt.(*pp).printArg

handleHi耗时有一定下降。

七、Mem Profiling

在step3目录下执行bench，获取mem分配数据：

$ go test -v -run=^$ -bench=^BenchmarkHi$ -benchtime=2s -memprofile=prof.mem
PASS
BenchmarkHi-4 2000000 1657 ns/op 720 B/op 9 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step3 5.002s

使用pprof工具分析mem：

$ go tool pprof –alloc_space step3.test prof.mem
Entering interactive mode (type "help" for commands)
(pprof) top
2065.91MB of 2067.41MB total (99.93%)
Dropped 14 nodes (cum <= 10.34MB)
      flat flat%   sum%        cum   cum%
1076.35MB 52.06% 52.06% 1076.35MB 52.06% net/textproto.MIMEHeader.Set
535.54MB 25.90% 77.97% 2066.91MB   100% step3.BenchmarkHi
406.52MB 19.66% 97.63% 1531.37MB 74.07% step3.handleHi
   47.50MB 2.30% 99.93%    48.50MB 2.35% fmt.Sprint
         0     0% 99.93% 1076.35MB 52.06% net/http.Header.Set
         0     0% 99.93% 2066.91MB   100% runtime.goexit
         0     0% 99.93% 2066.91MB   100% testing.(*B).launch
         0     0% 99.93% 2066.91MB   100% testing.(*B).runN

(pprof) top -cum
2065.91MB of 2067.41MB total (99.93%)
Dropped 14 nodes (cum <= 10.34MB)
      flat flat%   sum%        cum   cum%
535.54MB 25.90% 25.90% 2066.91MB   100% step3.BenchmarkHi
         0     0% 25.90% 2066.91MB   100% runtime.goexit
         0     0% 25.90% 2066.91MB   100% testing.(*B).launch
         0     0% 25.90% 2066.91MB   100% testing.(*B).runN
406.52MB 19.66% 45.57% 1531.37MB 74.07% step3.handleHi
         0     0% 45.57% 1076.35MB 52.06% net/http.Header.Set
1076.35MB 52.06% 97.63% 1076.35MB 52.06% net/textproto.MIMEHeader.Set
   47.50MB 2.30% 99.93%    48.50MB 2.35% fmt.Sprint

(pprof) list handleHi
Total: 2.02GB
     ROUTINE =========step3.handleHi in step3/demo.go
406.52MB     1.50GB (flat, cum) 74.07% of Total
         .          .     17:        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
         .          .     18:        return
         .          .     19:    }
         .          .     20:
         .          .     21:    visitNum := atomic.AddInt64(&visitors, 1)
         .     1.05GB     22:    w.Header().Set("Content-Type", "text/html; charset=utf-8")
         .          .     23:    w.Write([]byte("

Welcome!

You are visitor number " + fmt.Sprint(visitNum) + "!"))
         .          .     25:}
         .          .     26:
         .          .     27:func main() {
         .          .     28:    log.Printf("Starting on port 8080")
         .          .     29:    http.HandleFunc("/hi", handleHi)
(pprof)

可以看到handleHi22、23两行占用了较多内存。

八、第二次优化

第二次优化的方法：
1、删除w.Header().Set这行
2、用fmt.Fprintf替代w.Write

第二次优化的代码在step4目录中：

// go-debug-profile-optimization/step4/demo.go
… …
func handleHi(w http.ResponseWriter, r *http.Request) {
    if !rxOptionalID.MatchString(r.FormValue("color")) {
        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
        return
    }

visitNum := atomic.AddInt64(&visitors, 1)
fmt.Fprintf(w, "

Welcome!

You are visitor number %d!", r.FormValue("color"), visitNum)
}
… …

执行一遍pprof:

$ go test -v -run=^$ -bench=^BenchmarkHi$ -benchtime=2s -memprofile=prof.mem
PASS
BenchmarkHi-4 2000000 1428 ns/op 304 B/op 6 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step4 4.343s

$ go tool pprof –alloc_space step4.test prof.mem
Entering interactive mode (type "help" for commands)
(pprof) top
868.06MB of 868.56MB total (99.94%)
Dropped 5 nodes (cum <= 4.34MB)
      flat flat%   sum%        cum   cum%
559.54MB 64.42% 64.42%   868.06MB 99.94% step4.BenchmarkHi
219.52MB 25.27% 89.70%   219.52MB 25.27% bytes.makeSlice
      89MB 10.25% 99.94%   308.52MB 35.52% step4.handleHi
         0     0% 99.94%   219.52MB 25.27% bytes.(*Buffer).Write
         0     0% 99.94%   219.52MB 25.27% bytes.(*Buffer).grow
         0     0% 99.94%   219.52MB 25.27% fmt.Fprintf
         0     0% 99.94%   219.52MB 25.27% net/http/httptest.(*ResponseRecorder).Write
         0     0% 99.94%   868.06MB 99.94% runtime.goexit
         0     0% 99.94%   868.06MB 99.94% testing.(*B).launch
         0     0% 99.94%   868.06MB 99.94% testing.(*B).runN
(pprof) top –cum
868.06MB of 868.56MB total (99.94%)
Dropped 5 nodes (cum <= 4.34MB)
      flat flat%   sum%        cum   cum%
559.54MB 64.42% 64.42%   868.06MB 99.94% step4.BenchmarkHi
         0     0% 64.42%   868.06MB 99.94% runtime.goexit
         0     0% 64.42%   868.06MB 99.94% testing.(*B).launch
         0     0% 64.42%   868.06MB 99.94% testing.(*B).runN
      89MB 10.25% 74.67%   308.52MB 35.52% step4.handleHi
         0     0% 74.67%   219.52MB 25.27% bytes.(*Buffer).Write
         0     0% 74.67%   219.52MB 25.27% bytes.(*Buffer).grow
219.52MB 25.27% 99.94%   219.52MB 25.27% bytes.makeSlice
         0     0% 99.94%   219.52MB 25.27% fmt.Fprintf
         0     0% 99.94%   219.52MB 25.27% net/http/httptest.(*ResponseRecorder).Write
(pprof) list handleHi
Total: 868.56MB
ROUTINE ============ step4.handleHi in step4/demo.go
      89MB   308.52MB (flat, cum) 35.52% of Total
         .          .     17:        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
         .          .     18:        return
         .          .     19:    }
         .          .     20:
         .          .     21:    visitNum := atomic.AddInt64(&visitors, 1)
      89MB   308.52MB     22:    fmt.Fprintf(w, "

Welcome!

You are visitor number %d!", r.FormValue("color"), visitNum)
         .          .     23:}
         .          .     24:
         .          .     25:func main() {
         .          .     26:    log.Printf("Starting on port 8080")
         .          .     27:    http.HandleFunc("/hi", handleHi)
(pprof)

可以看出内存占用大幅减少。

九、Benchcmp

golang.org/x/tools中有一个工具：benchcmp，可以给出两次bench的结果对比。

github.com/golang/tools是golang.org/x/tools的一个镜像。安装benchcmp步骤：

1、go get -u github.com/golang/tools
2、mkdir -p $GOPATH/src/golang.org/x
3、mv $GOPATH/src/github.com/golang/tools $GOPATH/src/golang.org/x
4、go install golang.org/x/tools/cmd/benchcmp

我们分别在step2、step3和step4下执行如下命令：

$ go-debug-profile-optimization/step2$ go test -bench=. -memprofile=prof.mem | tee mem.2
PASS
BenchmarkHi-4 100000 14786 ns/op 4961 B/op 81 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step2 1.644s

go-debug-profile-optimization/step3$ go test -bench=. -memprofile=prof.mem | tee mem.3
PASS
BenchmarkHi-4 1000000 1662 ns/op 720 B/op 9 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step3 1.694s

go-debug-profile-optimization/step4$ go test -bench=. -memprofile=prof.mem | tee mem.4
PASS
BenchmarkHi-4 1000000 1428 ns/op 304 B/op 6 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step4 1.456s

利用benchcmp工具对比结果（benchcmp old new）：

$ benchcmp step3/mem.3 step4/mem.4
benchmark old ns/op new ns/op delta
BenchmarkHi-4 1662 1428 -14.08%

benchmark old allocs new allocs delta
BenchmarkHi-4 9 6 -33.33%

benchmark old bytes new bytes delta
BenchmarkHi-4 720 304 -57.78%

$ benchcmp step2/mem.2 step4/mem.4
benchmark old ns/op new ns/op delta
BenchmarkHi-4 14786 1428 -90.34%

benchmark old allocs new allocs delta
BenchmarkHi-4 81 6 -92.59%

benchmark old bytes new bytes delta
BenchmarkHi-4 4961 304 -93.87%

可以看出优化后，内存分配大幅减少，gc的时间也随之减少。

十、内存来自哪

我们在BenchmarkHi中清理每次handleHi执行后的内存：

//step5/demo_test.go
… …
func BenchmarkHi(b *testing.B) {
b.ReportAllocs()

    req, err := http.ReadRequest(bufio.NewReader(strings.NewReader("GET / HTTP/1.0\r\n\r\n")))
    if err != nil {
        b.Fatal(err)
    }

    for i := 0; i < b.N; i++ {
        rw := httptest.NewRecorder()
        handleHi(rw, req)
        reset(rw)
    }
}

func reset(rw *httptest.ResponseRecorder) {
    m := rw.HeaderMap
    for k := range m {
        delete(m, k)
    }
    body := rw.Body
    body.Reset()
    *rw = httptest.ResponseRecorder{
        Body:      body,
        HeaderMap: m,
    }
}

… …
$ go test -v -run=^$ -bench=^BenchmarkHi$ -benchtime=2s -memprofile=prof.mem
PASS
BenchmarkHi-4 2000000 1518 ns/op 304 B/op 6 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step5 4.577s

$ go tool pprof –alloc_space step5.test prof.mem
Entering interactive mode (type "help" for commands)
(pprof) top –cum 10
290.52MB of 291.52MB total (99.66%)
Dropped 14 nodes (cum <= 1.46MB)
      flat flat%   sum%        cum   cum%
         0     0%     0%   291.02MB 99.83% runtime.goexit
179.01MB 61.41% 61.41%   290.52MB 99.66% step5.BenchmarkHi
         0     0% 61.41%   290.52MB 99.66% testing.(*B).launch
         0     0% 61.41%   290.52MB 99.66% testing.(*B).runN
   26.50MB 9.09% 70.50%   111.51MB 38.25% step5.handleHi
         0     0% 70.50%    85.01MB 29.16% bytes.(*Buffer).Write
         0     0% 70.50%    85.01MB 29.16% bytes.(*Buffer).grow
   85.01MB 29.16% 99.66%    85.01MB 29.16% bytes.makeSlice
         0     0% 99.66%    85.01MB 29.16% fmt.Fprintf
         0     0% 99.66%    85.01MB 29.16% net/http/httptest.(*ResponseRecorder).Write
(pprof) list handleHi
Total: 291.52MB
ROUTINE ======================== _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step5.handleHi in /home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step5/demo.go
   26.50MB   111.51MB (flat, cum) 38.25% of Total
         .          .     17:        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
         .          .     18:        return
         .          .     19:    }
         .          .     20:
         .          .     21:    visitNum := atomic.AddInt64(&visitors, 1)
   26.50MB   111.51MB     22:    fmt.Fprintf(w, "

Welcome!

You are visitor number %d!", r.FormValue("color"), visitNum)
         .          .     23:}
         .          .     24:
         .          .     25:func main() {
         .          .     26:    log.Printf("Starting on port 8080")
         .          .     27:    http.HandleFunc("/hi", handleHi)
(pprof)

内存从300MB降到111MB。内存来自哪？看到list handleHi，fmt.Fprintf分配了111.51MB。

我们来看这一行代码：
fmt.Fprintf(w, "

Welcome!

You are visitor number %d!",
r.FormValue("color"), num)

fmt.Fprintf的manual：

$ go doc fmt.Fprintf
func Fprintf(w io.Writer, format string, a …interface{}) (n int, err error)

Fprintf formats according to a format specifier and writes to w. It returns
the number of bytes written and any write error encountered.

这里回顾一下Go type在runtime中的内存占用：

A Go interface is 2 words of memory: (type, pointer).
A Go string is 2 words of memory: (base pointer, length)
A Go slice is 3 words of memory: (base pointer, length, capacity)

每次调用fmt.Fprintf，参数以value值形式传入函数时，程序就要为每个变参分配一个占用16bytes的empty interface，然后用传入的类型初始化该interface value。这就是这块累计分配内存较多的原因。

十一、消除所有内存分配

下面的优化代码可能在实际中并不需要，但一旦真的成为瓶颈，可以这么做：

//go-debug-profile-optimization/step6/demo.go
… …
var bufPool = sync.Pool{
    New: func() interface{} {
        return new(bytes.Buffer)
    },
}

func handleHi(w http.ResponseWriter, r *http.Request) {
    if !rxOptionalID.MatchString(r.FormValue("color")) {
        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
        return
    }

    visitNum := atomic.AddInt64(&visitors, 1)
    buf := bufPool.Get().(*bytes.Buffer)
    defer bufPool.Put(buf)
    buf.Reset()
    buf.WriteString("

Welcome!

You are visitor number ")
    b := strconv.AppendInt(buf.Bytes(), int64(visitNum), 10)
    b = append(b, '!')
    w.Write(b)
}
… …

$ go test -v -run=^$ -bench=^BenchmarkHi$ -benchtime=2s -memprofile=prof.mem
PASS
BenchmarkHi-4 5000000 780 ns/op 192 B/op 3 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step6 4.709s

go tool pprof –alloc_space step6.test prof.mem
Entering interactive mode (type "help" for commands)
(pprof) top –cum 10
1.07GB of 1.07GB total ( 100%)
Dropped 5 nodes (cum <= 0.01GB)
      flat flat%   sum%        cum   cum%
    1.07GB   100%   100%     1.07GB   100% step6.BenchmarkHi
         0     0%   100%     1.07GB   100% runtime.goexit
         0     0%   100%     1.07GB   100% testing.(*B).launch
         0     0%   100%     1.07GB   100% testing.(*B).runN

$ go test -bench=. -memprofile=prof.mem | tee mem.6
PASS
BenchmarkHi-4 2000000 790 ns/op 192 B/op 3 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step6 2.401s

$ benchcmp step5/mem.5 step6/mem.6
benchmark old ns/op new ns/op delta
BenchmarkHi-4 1513 790 -47.79%

benchmark old allocs new allocs delta
BenchmarkHi-4 6 3 -50.00%

benchmark old bytes new bytes delta
BenchmarkHi-4 304 192 -36.84%

可以看到handleHi已经不在top列表中了。benchcmp结果也显示内存分配又有大幅下降！

十二、竞争(Contention)优化

为handleHi编写一个Parallel benchmark test:

//go-debug-profile-optimization/step7/demo_test.go
… …
func BenchmarkHiParallel(b *testing.B) {
    r, err := http.ReadRequest(bufio.NewReader(strings.NewReader("GET / HTTP/1.0\r\n\r\n")))
    if err != nil {
        b.Fatal(err)
    }

    b.RunParallel(func(pb *testing.PB) {
        rw := httptest.NewRecorder()
        for pb.Next() {
            handleHi(rw, r)
            reset(rw)
        }
    })
}
… …

执行测试，并分析结果:

$ go test -bench=Parallel -blockprofile=prof.block
PASS
BenchmarkHiParallel-4 5000000 305 ns/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step7 1.947s

$ go tool pprof step7.test prof.block
Entering interactive mode (type "help" for commands)
(pprof) top –cum 10
3.68s of 3.72s total (98.82%)
Dropped 29 nodes (cum <= 0.02s)
Showing top 10 nodes out of 20 (cum >= 1.84s)
      flat flat%   sum%        cum   cum%
         0     0%     0%      3.72s   100% runtime.goexit
     1.84s 49.46% 49.46%      1.84s 49.46% runtime.chanrecv1
         0     0% 49.46%      1.84s 49.46% main.main
         0     0% 49.46%      1.84s 49.46% runtime.main
         0     0% 49.46%      1.84s 49.46% testing.(*M).Run
         0     0% 49.46%      1.84s 49.43% testing.(*B).run
         0     0% 49.46%      1.84s 49.43% testing.RunBenchmarks
         0     0% 49.46%      1.84s 49.36% step7.BenchmarkHiParallel
     1.84s 49.36% 98.82%      1.84s 49.36% sync.(*WaitGroup).Wait
         0     0% 98.82%      1.84s 49.36% testing.(*B).RunParallel
(pprof) list BenchmarkHiParallel
Total: 3.72s
ROUTINE ====== step7.BenchmarkHiParallel in step7/demo_test.go
         0      1.84s (flat, cum) 49.36% of Total
         .          .    113:        rw := httptest.NewRecorder()
         .          .    114:        for pb.Next() {
         .          .    115:            handleHi(rw, r)
         .          .    116:            reset(rw)
         .          .    117:        }
         .      1.84s    118:    })
         .          .    119:}
ROUTINE ==== step7.BenchmarkHiParallel.func1 in step7/demo_test.go
         0    43.02ms (flat, cum) 1.16% of Total
         .          .    110:    }
         .          .    111:
         .          .    112:    b.RunParallel(func(pb *testing.PB) {
         .          .    113:        rw := httptest.NewRecorder()
         .          .    114:        for pb.Next() {
         .    43.02ms    115:            handleHi(rw, r)
         .          .    116:            reset(rw)
         .          .    117:        }
         .          .    118:    })
         .          .    119:}
(pprof) list handleHi
Total: 3.72s
ROUTINE =====step7.handleHi in step7/demo.go
         0    43.02ms (flat, cum) 1.16% of Total
         .          .     18:        return new(bytes.Buffer)
         .          .     19:    },
         .          .     20:}
         .          .     21:
         .          .     22:func handleHi(w http.ResponseWriter, r *http.Request) {
         .    43.01ms     23:    if !rxOptionalID.MatchString(r.FormValue("color")) {
         .          .     24:        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
         .          .     25:        return
         .          .     26:    }
         .          .     27:
         .          .     28:    visitNum := atomic.AddInt64(&visitors, 1)
         .     2.50us     29:    buf := bufPool.Get().(*bytes.Buffer)
         .          .     30:    defer bufPool.Put(buf)
         .          .     31:    buf.Reset()
         .          .     32:    buf.WriteString("

Welcome!

You are visitor number ")
(pprof)

handleHi中MatchString这块是一个焦点，这里耗时较多。

优化方法（step8）：

//go-debug-profile-optimization/step8/demo.go
… …
var colorRxPool = sync.Pool{
New: func() interface{} { return regexp.MustCompile(`\w*$`) },
}

func handleHi(w http.ResponseWriter, r *http.Request) {
    if !colorRxPool.Get().(*regexp.Regexp).MatchString(r.FormValue("color")) {
        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
        return
    }

    visitNum := atomic.AddInt64(&visitors, 1)
    buf := bufPool.Get().(*bytes.Buffer)
    defer bufPool.Put(buf)
    buf.Reset()
    buf.WriteString("

Welcome!

You are visitor number ")
    b := strconv.AppendInt(buf.Bytes(), int64(visitNum), 10)
    b = append(b, '!')
    w.Write(b)
}
… …

测试执行与分析：

$ go test -bench=Parallel -blockprofile=prof.block
PASS
BenchmarkHiParallel-4 100000 19190 ns/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step8 2.219s

$ go tool pprof step8.test prof.block
Entering interactive mode (type "help" for commands)
(pprof) top –cum 10
4.22s of 4.23s total (99.69%)
Dropped 28 nodes (cum <= 0.02s)
Showing top 10 nodes out of 12 (cum >= 2.11s)
      flat flat%   sum%        cum   cum%
         0     0%     0%      4.23s   100% runtime.goexit
     2.11s 49.90% 49.90%      2.11s 49.90% runtime.chanrecv1
         0     0% 49.90%      2.11s 49.89% main.main
         0     0% 49.90%      2.11s 49.89% runtime.main
         0     0% 49.90%      2.11s 49.89% testing.(*M).Run
         0     0% 49.90%      2.11s 49.86% testing.(*B).run
         0     0% 49.90%      2.11s 49.86% testing.RunBenchmarks
         0     0% 49.90%      2.11s 49.79% step8.BenchmarkHiParallel
     2.11s 49.79% 99.69%      2.11s 49.79% sync.(*WaitGroup).Wait
         0     0% 99.69%      2.11s 49.79% testing.(*B).RunParallel
(pprof) list BenchmarkHiParallel
Total: 4.23s
ROUTINE ======step8.BenchmarkHiParallel in step8/demo_test.go
         0      2.11s (flat, cum) 49.79% of Total
         .          .    113:        rw := httptest.NewRecorder()
         .          .    114:        for pb.Next() {
         .          .    115:            handleHi(rw, r)
         .          .    116:            reset(rw)
         .          .    117:        }
         .      2.11s    118:    })
         .          .    119:}
ROUTINE ======step8.BenchmarkHiParallel.func1 in step8/demo_test.go
         0    11.68ms (flat, cum) 0.28% of Total
         .          .    110:    }
         .          .    111:
         .          .    112:    b.RunParallel(func(pb *testing.PB) {
         .          .    113:        rw := httptest.NewRecorder()
         .          .    114:        for pb.Next() {
         .    11.68ms    115:            handleHi(rw, r)
         .          .    116:            reset(rw)
         .          .    117:        }
         .          .    118:    })
         .          .    119:}
(pprof) list handleHi
Total: 4.23s
ROUTINE ======step8.handleHi in step8/demo.go
         0    11.68ms (flat, cum) 0.28% of Total
         .          .     21:var colorRxPool = sync.Pool{
         .          .     22:    New: func() interface{} { return regexp.MustCompile(`\w*$`) },
         .          .     23:}
         .          .     24:
         .          .     25:func handleHi(w http.ResponseWriter, r *http.Request) {
         .     5.66ms     26:    if !colorRxPool.Get().(*regexp.Regexp).MatchString(r.FormValue("color")) {
         .          .     27:        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
         .          .     28:        return
         .          .     29:    }
         .          .     30:
         .          .     31:    visitNum := atomic.AddInt64(&visitors, 1)
         .     6.02ms     32:    buf := bufPool.Get().(*bytes.Buffer)
         .          .     33:    defer bufPool.Put(buf)
         .          .     34:    buf.Reset()
         .          .     35:    buf.WriteString("

Welcome!

You are visitor number ")
(pprof)

优化后，MatchString从43ms降到5.66ms。

有疑问加站长微信联系（非本文作者）

本文来自：Tony Bai

感谢作者：bigwhite

查看原文：Go程序调试、分析与优化

入群交流（和以上内容无关）：加入Go大咖交流群，或添加微信：liuxiaoyan-s 备注：入群；或加QQ群：692541889

7418 次点击 ∙ 3 赞

加入收藏微博

收入我的专栏

上一篇：Go语言中Tcp协议粘包问题处理

下一篇：golang捕获http.ResponseWriter被close的两种方式（有无context）

http

runtime

pprof

net

0 回复

添加一条新回复（您需要登录后才能回复没有账号？）

请尽量让自己的回复能够对别人有帮助
支持 Markdown 格式, **粗体**、~~删除线~~、`单行代码`
支持 @ 本站用户；支持表情（输入 : 提示），见 Emoji cheat sheet
图片支持拖拽、截图粘贴等方式上传

Go程序调试、分析与优化

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

用户登录

今日阅读排行

一周阅读排行

关注我

给该专栏投稿 写篇新文章

收入到我管理的专栏 新建专栏

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

Welcome!

给该专栏投稿写篇新文章

收入到我管理的专栏新建专栏