Go 1.8 performance improvements, one month in

Dave Cheney · · 964 次点击 · · 开始浏览    
这是一个创建于 的文章,其中的信息可能已经有所发展或是发生改变。

Sunday September the 18th marks a month since the Go 1.8 cycle opened officially. I’m passionate about the performance of Go programs, and of the compiler itself. This post is a brief look at the state of play, roughly 1/2 way into the development cycle for Go 1.81.

Note: these results are of course preliminary and represent only a point in time, not the performance of the final Go 1.8 release.

Compile times

Nothing much to report here. Using the methodology from my previous Go 1.7 benchmarks, there is a 3.22%–5.11% improvement in full compile time compared to Go 1.7.

Go 1.4.3, Go 1.7, Go tip

Performance improvements

Intel amd64

Better code generation and small improvements to the runtime and standard library show some small improvements for amd642, but really nothing to write home about yet.

name                       old time/op    new time/op  delta
BinaryTree17-4              3.07s ± 2%     3.06s ± 2%    ~      (p=0.661 n=10+9)
Fannkuch11-4                3.23s ± 1%     3.22s ± 0%  -0.43%   (p=0.008 n=9+10)
FmtFprintfEmpty-4          64.4ns ± 0%    61.8ns ± 4%  -4.17%   (p=0.005 n=9+10)
FmtFprintfString-4          162ns ± 0%     162ns ± 0%    ~      (p=0.065 n=10+9)
FmtFprintfInt-4             142ns ± 0%     142ns ± 0%    ~      (p=0.137 n=8+10)
FmtFprintfIntInt-4          220ns ± 0%     217ns ± 0%  -1.18%   (p=0.000 n=9+10)
FmtFprintfPrefixedInt-4     224ns ± 0%     224ns ± 1%    ~       (p=0.206 n=9+9)
FmtFprintfFloat-4           313ns ± 0%     312ns ± 0%  -0.26%   (p=0.001 n=10+9)
FmtManyArgs-4               906ns ± 0%     894ns ± 0%  -1.32%    (p=0.000 n=7+6)
GobDecode-4                8.88ms ± 1%    8.81ms ± 0%  -0.81%  (p=0.003 n=10+10)
GobEncode-4                7.93ms ± 1%    7.88ms ± 0%  -0.66%   (p=0.008 n=9+10)
Gzip-4                      272ms ± 1%     277ms ± 0%  +1.95%   (p=0.000 n=10+9)
Gunzip-4                   47.4ms ± 0%    47.4ms ± 0%    ~      (p=0.720 n=9+10)
HTTPClientServer-4          201µs ± 4%     202µs ± 2%    ~     (p=0.631 n=10+10)
JSONEncode-4               19.3ms ± 0%    19.3ms ± 0%    ~     (p=0.063 n=10+10)
JSONDecode-4               61.0ms ± 0%    61.2ms ± 0%  +0.33%   (p=0.000 n=10+8)
Mandelbrot200-4            5.20ms ± 0%    5.20ms ± 0%    ~      (p=0.475 n=10+7)
GoParse-4                  3.95ms ± 1%    3.97ms ± 1%  +0.65%    (p=0.003 n=9+9)
RegexpMatchEasy0_32-4      88.4ns ± 0%    88.7ns ± 0%  +0.34%   (p=0.001 n=10+9)
RegexpMatchEasy0_1K-4      1.14µs ± 0%    1.14µs ± 0%    ~       (p=0.369 n=9+6)
RegexpMatchEasy1_32-4      82.6ns ± 0%    82.0ns ± 0%  -0.70%   (p=0.000 n=9+10)
RegexpMatchEasy1_1K-4       469ns ± 0%     463ns ± 0%  -1.23%    (p=0.000 n=6+9)
RegexpMatchMedium_32-4      138ns ± 1%     136ns ± 0%  -1.38%   (p=0.000 n=10+9)
RegexpMatchMedium_1K-4     43.6µs ± 1%    42.0µs ± 0%  -3.74%    (p=0.000 n=9+9)
RegexpMatchHard_32-4       2.25µs ± 1%    2.23µs ± 0%  -0.57%    (p=0.000 n=8+8)
RegexpMatchHard_1K-4       68.8µs ± 0%    68.6µs ± 0%  -0.37%    (p=0.000 n=8+8)
Revcomp-4                   477ms ± 1%     472ms ± 0%  -1.03%    (p=0.000 n=8+8)
Template-4                 76.1ms ± 0%    76.4ms ± 0%  +0.35%    (p=0.000 n=9+9)
TimeParse-4                 367ns ± 0%     366ns ± 0%  -0.16%   (p=0.003 n=10+8)
TimeFormat-4                386ns ± 0%     384ns ± 0%  -0.58%    (p=0.000 n=9+9)

name                     old speed      new speed      delta
GobDecode-4              86.4MB/s ± 1%  87.1MB/s ± 0%  +0.81%  (p=0.003 n=10+10)
GobEncode-4              96.7MB/s ± 1%  97.4MB/s ± 0%  +0.66%   (p=0.007 n=9+10)
Gzip-4                   71.4MB/s ± 1%  70.0MB/s ± 0%  -1.91%   (p=0.000 n=10+9)
Gunzip-4                  409MB/s ± 0%   410MB/s ± 0%    ~      (p=0.703 n=9+10)
JSONEncode-4              101MB/s ± 0%   100MB/s ± 0%    ~     (p=0.084 n=10+10)
JSONDecode-4             31.8MB/s ± 0%  31.7MB/s ± 0%  -0.33%   (p=0.000 n=10+8)
GoParse-4                14.7MB/s ± 1%  14.6MB/s ± 1%  -0.67%    (p=0.002 n=9+9)
RegexpMatchEasy0_32-4     362MB/s ± 0%   361MB/s ± 0%  -0.36%   (p=0.000 n=10+9)
RegexpMatchEasy0_1K-4     898MB/s ± 0%   898MB/s ± 0%    ~       (p=0.762 n=9+8)
RegexpMatchEasy1_32-4     387MB/s ± 0%   390MB/s ± 0%  +0.70%   (p=0.000 n=9+10)
RegexpMatchEasy1_1K-4    2.18GB/s ± 0%  2.21GB/s ± 0%  +1.20%    (p=0.000 n=9+9)
RegexpMatchMedium_32-4   7.23MB/s ± 1%  7.32MB/s ± 0%  +1.19%   (p=0.000 n=10+9)
RegexpMatchMedium_1K-4   23.5MB/s ± 1%  24.4MB/s ± 0%  +3.88%    (p=0.000 n=9+9)
RegexpMatchHard_32-4     14.2MB/s ± 1%  14.3MB/s ± 0%  +0.58%    (p=0.000 n=8+8)
RegexpMatchHard_1K-4     14.9MB/s ± 0%  14.9MB/s ± 0%  +0.34%    (p=0.000 n=8+7)
Revcomp-4                 533MB/s ± 1%   539MB/s ± 0%  +1.04%    (p=0.000 n=8+8)
Template-4               25.5MB/s ± 0%  25.4MB/s ± 0%  -0.36%    (p=0.000 n=9+9)

ARM

The major improvement that landed recently in the development branch is the conversion of the remaining architecture backends to use the compiler’s SSA form. This has brought a substantial improvement in generated code for non Intel architectures, like ARM3.

name                       old time/op    new time/op    delta
BinaryTree17-4              33.8s ± 1%      27.7s ± 0%  -18.06%  (p=0.000 n=10+10)
Fannkuch11-4                42.0s ± 0%      19.3s ± 0%  -54.10%  (p=0.000 n=10+10)
FmtFprintfEmpty-4           670ns ± 1%      581ns ± 1%  -13.30%  (p=0.000 n=10+10)
FmtFprintfString-4         2.04µs ± 1%     1.65µs ± 0%  -19.09%  (p=0.000 n=10+10)
FmtFprintfInt-4            1.71µs ± 0%     1.21µs ± 0%  -29.39%   (p=0.000 n=10+9)
FmtFprintfIntInt-4         2.69µs ± 1%     1.94µs ± 0%  -27.77%  (p=0.000 n=10+10)
FmtFprintfPrefixedInt-4    2.70µs ± 0%     1.85µs ± 0%  -31.41%   (p=0.000 n=10+9)
FmtFprintfFloat-4          5.15µs ± 0%     3.65µs ± 0%  -29.01%   (p=0.000 n=9+10)
FmtManyArgs-4              11.3µs ± 0%      8.5µs ± 0%  -24.79%   (p=0.000 n=10+9)
GobDecode-4                 112ms ± 0%       77ms ± 1%  -31.04%    (p=0.000 n=9+9)
GobEncode-4                88.5ms ± 1%     77.2ms ± 1%  -12.78%  (p=0.000 n=10+10)
Gzip-4                      4.79s ± 0%      3.34s ± 0%  -30.18%    (p=0.000 n=9+9)
Gunzip-4                    702ms ± 0%      463ms ± 0%  -34.05%  (p=0.000 n=10+10)
HTTPClientServer-4          645µs ± 3%      571µs ± 3%  -11.45%  (p=0.000 n=10+10)
JSONEncode-4                227ms ± 0%      186ms ± 0%  -18.16%  (p=0.000 n=10+10)
JSONDecode-4                845ms ± 0%      618ms ± 0%  -26.81%  (p=0.000 n=10+10)
Mandelbrot200-4            59.3ms ± 0%     40.0ms ± 0%  -32.47%  (p=0.000 n=10+10)
GoParse-4                  45.0ms ± 0%     37.0ms ± 0%  -17.68%    (p=0.000 n=9+9)
RegexpMatchEasy0_32-4       974ns ± 0%      878ns ± 0%   -9.81%   (p=0.000 n=10+9)
RegexpMatchEasy0_1K-4      4.60µs ± 0%     4.48µs ± 0%   -2.57%  (p=0.000 n=10+10)
RegexpMatchEasy1_32-4      1.02µs ± 0%     0.94µs ± 0%   -8.08%   (p=0.000 n=8+10)
RegexpMatchEasy1_1K-4      6.92µs ± 0%     6.08µs ± 0%  -12.10%  (p=0.000 n=10+10)
RegexpMatchMedium_32-4     1.61µs ± 0%     1.27µs ± 0%  -20.98%    (p=0.000 n=9+6)
RegexpMatchMedium_1K-4      447µs ± 0%      317µs ± 0%  -29.05%   (p=0.000 n=10+9)
RegexpMatchHard_32-4       24.9µs ± 0%     18.4µs ± 0%  -25.89%  (p=0.000 n=10+10)
RegexpMatchHard_1K-4        740µs ± 0%      552µs ± 0%  -25.36%  (p=0.000 n=10+10)
Revcomp-4                  81.0ms ± 1%     65.2ms ± 0%  -19.53%    (p=0.000 n=9+9)
Template-4                  1.17s ± 0%      0.81s ± 0%  -31.28%    (p=0.000 n=9+9)
TimeParse-4                5.52µs ± 0%     3.79µs ± 0%  -31.42%   (p=0.000 n=10+9)
TimeFormat-4               10.6µs ± 0%      8.5µs ± 0%  -19.14%  (p=0.000 n=10+10)

name                     old speed      new speed        delta
GobDecode-4              6.86MB/s ± 0%   9.95MB/s ± 1%  +45.00%    (p=0.000 n=9+9)
GobEncode-4              8.67MB/s ± 1%   9.94MB/s ± 1%  +14.69%  (p=0.000 n=10+10)
Gzip-4                   4.05MB/s ± 0%   5.81MB/s ± 0%  +43.32%   (p=0.000 n=10+9)
Gunzip-4                 27.6MB/s ± 0%   41.9MB/s ± 0%  +51.63%  (p=0.000 n=10+10)
JSONEncode-4             8.53MB/s ± 0%  10.43MB/s ± 0%  +22.20%  (p=0.000 n=10+10)
JSONDecode-4             2.30MB/s ± 0%   3.14MB/s ± 0%  +36.39%   (p=0.000 n=9+10)
GoParse-4                1.29MB/s ± 0%   1.56MB/s ± 0%  +20.93%   (p=0.000 n=9+10)
RegexpMatchEasy0_32-4    32.8MB/s ± 0%   36.4MB/s ± 0%  +10.87%  (p=0.000 n=10+10)
RegexpMatchEasy0_1K-4     222MB/s ± 0%    228MB/s ± 0%   +2.64%  (p=0.000 n=10+10)
RegexpMatchEasy1_32-4    31.3MB/s ± 0%   34.0MB/s ± 0%   +8.75%   (p=0.000 n=9+10)
RegexpMatchEasy1_1K-4     148MB/s ± 0%    168MB/s ± 0%  +13.76%  (p=0.000 n=10+10)
RegexpMatchMedium_32-4    620kB/s ± 0%    790kB/s ± 0%  +27.42%   (p=0.000 n=10+8)
RegexpMatchMedium_1K-4   2.29MB/s ± 0%   3.23MB/s ± 0%  +41.05%  (p=0.000 n=10+10)
RegexpMatchHard_32-4     1.29MB/s ± 0%   1.74MB/s ± 0%  +34.88%   (p=0.000 n=9+10)
RegexpMatchHard_1K-4     1.38MB/s ± 0%   1.85MB/s ± 0%  +34.06%  (p=0.000 n=10+10)
Revcomp-4                31.4MB/s ± 1%   39.0MB/s ± 0%  +24.26%    (p=0.000 n=9+9)
Template-4               1.65MB/s ± 0%   2.41MB/s ± 0%  +45.71%   (p=0.000 n=10+9)

Notes:

  1. Despite the Go 1.8 development cycle opening 18 days late, in order to keep to the 6 month cadence, the feature freeze for this cycle will still occur on the 1st of November.
  2. Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz, 3.13.0-95-generic #142-Ubuntu
  3. Freescale i.MX6, 3.14.77-1-ARCH

有疑问加站长微信联系(非本文作者)

本文来自:Dave Cheney

感谢作者:Dave Cheney

查看原文:Go 1.8 performance improvements, one month in

入群交流(和以上内容无关):加入Go大咖交流群,或添加微信:liuxiaoyan-s 备注:入群;或加QQ群:692541889

964 次点击  
加入收藏 微博
上一篇:SOLID Go Design
下一篇:Introducing Go 2.0
0 回复
暂无回复
添加一条新回复 (您需要 登录 后才能回复 没有账号 ?)
  • 请尽量让自己的回复能够对别人有帮助
  • 支持 Markdown 格式, **粗体**、~~删除线~~、`单行代码`
  • 支持 @ 本站用户;支持表情(输入 : 提示),见 Emoji cheat sheet
  • 图片支持拖拽、截图粘贴等方式上传