Go in Go
As of the Go 1.5 release, the entire system is written in Go (plus a little assembler).
C is gone.
Side note: gccgo is still going strong.
This talk is about the original compiler, gc.
Why was it in C?
Bootstrapping.
(Also, Go was not intended primarily as a compiler implementation language.)
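Why move the compiler to Go?
Not simply validation; we have more pragmatic motives:
Go is easier to write correctly than C.
Go is easier to debug than C (even without a debugger).
Go is the only language you'd need to know; this encourages contributions.
Go has better modularity, tooling, testing, profiling, and so on.
Go makes parallel execution trivial.
The benefits are already visible, and it is still early.
Design document: golang.org/s/go13compiler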
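Why move the runtime to Go?
We had our own C compiler just to compile the runtime.
We needed a compiler with the same ABI as Go, including features such as segmented stacks.
Switching the runtime to Go means we can get rid of the C compiler.
That matters more than converting the compiler to Go.
(All the reasons for moving the compiler to Go apply to the runtime as well.)
With only one language in the runtime, integration, stack management, and so on become easier.
As always, simplicity is the overriding consideration.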
History
Why do we have our own tool chain at all?
Our own ABI?
Our own file formats?
History, familiarity, and ease of moving forward. And speed.
Many of Go's big changes would be much harder with GCC or LLVM.
news.ycombinator.com/item?id=8817990
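Big changes
Simplifications made possible by owning the tools and/or moving to Go:
linker rearchitecture
new garbage collector
stack maps
contiguous stacks
write barriers
The last three are all but impossible in C:
C is not type safe
aliasing of stack slots caused by optimization
(Gccgo will have segmented stacks and imprecise (stack) collection for a while yet.)
Goroutine stacks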
Goroutine 栈
Until 1.2: Stacks were segmented.
1.3: Stacks were contiguous unless executing C code (runtime).
1.4: Stacks made contiguous by restricting C to system stack.
1.5: Stacks made contiguous by eliminating C.
These were each huge steps, made quickly (led by khr@).
Converting the runtime
Mostly done by hand with machine assistance.
Challenge to implement the runtime in a safe language.
Some use of unsafe to deal with pointers as raw bits in the GC, for instance.
But less than you might think.
The translator (next sections) helped for some of the translation.
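As a rough illustration of what "pointers as raw bits" means, the sketch below converts a pointer to an integer with the unsafe package and inspects its low bits. The helper names addrBits and isAligned are hypothetical and are not taken from the runtime.

```go
package main

import (
	"fmt"
	"unsafe"
)

// addrBits returns the address held in p as a plain integer.
// The result is only a number: the garbage collector does not
// treat it as a reference to the object.
func addrBits(p *int64) uintptr {
	return uintptr(unsafe.Pointer(p))
}

// isAligned reports whether the address in p is a multiple of the
// alignment that the compiler requires for an int64.
func isAligned(p *int64) bool {
	return addrBits(p)%unsafe.Alignof(*p) == 0
}

func main() {
	x := new(int64)
	fmt.Println(isAligned(x))
}
```

Ordinary Go code rarely needs this; it is the kind of operation a collector performs when it examines words in memory.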
Converting the compiler
Why translate it, not write it from scratch? Correctness, testing.
Steps:
Write a custom translator from C to Go.
Run the translator, iterate until success.
Measure success by bit-identical output.
Clean up the code by hand and by machine.
Turn it from C-in-Go to idiomatic Go (still happening).
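The "bit-identical output" check in the steps above can be pictured as a byte-for-byte comparison of what the C-built and Go-built compilers emit. This is a minimal sketch with a hypothetical helper, firstDiff, and made-up sample bytes; it is not the project's actual test harness.

```go
package main

import (
	"bytes"
	"fmt"
)

// firstDiff returns the offset of the first differing byte between a and b,
// or -1 if the two are bit-identical.
func firstDiff(a, b []byte) int {
	if bytes.Equal(a, b) {
		return -1
	}
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return i
		}
	}
	return n // one input is a proper prefix of the other
}

func main() {
	fromC := []byte{0x48, 0x89, 0xe5, 0xc3}  // sample bytes only
	fromGo := []byte{0x48, 0x89, 0xe5, 0xc3} // sample bytes only
	fmt.Println(firstDiff(fromC, fromGo))
}
```

Reporting the first differing offset, rather than a plain yes or no, makes it easier to locate the instruction where the two compilers diverge.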
Translator
First output was C line-by-line translated to (bad!) Go.
Tool to do this written by rsc@ (talked about at GopherCon 2014).
Custom written for this job, not a general C-to-Go translator.
Steps:
Parse C code using new simple C parser (yacc)
Remove or rewrite C-isms such as *p++ as an expression
Walk the C parse tree, print the C code in Go syntax
Compile the output
Run, compare generated code
Repeat
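To see why *p++ as an expression has to be rewritten, recall that ++ is a statement in Go and that Go has no pointer arithmetic. The sketch below shows one possible hand translation of a C copy loop; the function copyBytes is a hypothetical example, not output from the translator.

```go
package main

import "fmt"

// copyBytes mirrors the C loop
//
//	while (n-- > 0) *dst++ = *src++;
//
// In Go, ++ is a statement, so the store and the increment are written
// separately and an index replaces the moving pointers.
func copyBytes(dst, src []byte, n int) {
	i := 0
	for ; n > 0; n-- {
		dst[i] = src[i]
		i++
	}
}

func main() {
	src := []byte("gopher")
	dst := make([]byte, len(src))
	copyBytes(dst, src, len(src))
	fmt.Println(string(dst))
}
```

Idiomatic Go would simply call the built-in copy(dst, src), which is the kind of later cleanup the talk describes as turning C-in-Go into idiomatic Go.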
The Yacc grammar was translated by sam-powered hands.
Translator configuration
Aided by hand-written rewrite rules, such as:
this field is a bool
this function returns a bool
Also diff-like rewrites for things such as using the standard library:
diff {
-	g.Rpo = obj.Calloc(g.Num*sizeof(g.Rpo[0]), 1).([]*Flow)
-	idom = obj.Calloc(g.Num*sizeof(idom[0]), 1).([]int32)
-	if g.Rpo == nil || idom == nil {
-		Fatal("out of memory")
-	}
+	g.Rpo = make([]*Flow, g.Num)
+	idom = make([]int32, g.Num)
}
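The rewrite above works because make returns a slice whose elements are already zeroed, and an allocation failure is handled by the runtime rather than reported through a nil result. A minimal sketch follows, where the Flow type and the newTables function are hypothetical stand-ins for the compiler's own declarations:

```go
package main

import "fmt"

// Flow is a hypothetical stand-in for the compiler's flow-graph node.
type Flow struct{ ID int }

// newTables allocates the two slices from the rewrite rule with make.
// Every *Flow element starts as nil and every int32 element starts as 0,
// so no explicit clearing or out-of-memory check is written.
func newTables(num int) ([]*Flow, []int32) {
	rpo := make([]*Flow, num)
	idom := make([]int32, num)
	return rpo, idom
}

func main() {
	rpo, idom := newTables(4)
	fmt.Println(len(rpo), rpo[0] == nil, len(idom), idom[3])
}
```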
Another example
This one due to semantic difference between the languages.
diff {
-	if nreg == 64 {
-		mask = ^0 // can't rely on C to shift by 64
-	} else {
-		mask = (1 << uint(nreg)) - 1
-	}
+	mask = (1 << uint(nreg)) - 1
}
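The special case can go because Go defines shifts by counts at or beyond the operand width: for an unsigned value the result is 0, so subtracting 1 wraps around to all ones. The sketch below assumes the mask is a uint64; the function name maskFor is hypothetical.

```go
package main

import "fmt"

// maskFor returns a mask with the low nreg bits set.
// For nreg == 64 the shift yields 0 and the subtraction wraps,
// which gives a mask with all 64 bits set.
func maskFor(nreg int) uint64 {
	return (uint64(1) << uint(nreg)) - 1
}

func main() {
	fmt.Printf("%#x\n", maskFor(3))
	fmt.Printf("%#x\n", maskFor(64))
}
```

In C, shifting a 64-bit value by 64 is undefined behavior, which is why the original code needed the explicit branch.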
Grind
Once in Go, new tool grind deployed (by rsc@):
parses Go, type checks
records a list of edits to perform: "insert this text at this position"
at end, applies edits to source (hard to edit AST).
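The "record edits, apply at the end" approach can be sketched as a list of byte-range replacements applied from the back of the file forward, so that earlier offsets remain valid. The edit type and apply function below are hypothetical illustrations, not grind's actual code, and the sketch assumes the edits do not overlap.

```go
package main

import (
	"fmt"
	"sort"
)

// edit is a hypothetical record: replace src[start:end] with text.
// An insertion is the case start == end.
type edit struct {
	start, end int
	text       string
}

// apply performs non-overlapping edits on src, working from the end
// backwards so that offsets of earlier edits stay valid.
func apply(src string, edits []edit) string {
	sorted := append([]edit(nil), edits...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].start > sorted[j].start })
	for _, e := range sorted {
		src = src[:e.start] + e.text + src[e.end:]
	}
	return src
}

func main() {
	src := "var x int\nuse()\nx = 1\n"
	edits := []edit{
		{start: 0, end: 10, text: ""},        // delete the early declaration
		{start: 16, end: 21, text: "x := 1"}, // declare at first use instead
	}
	fmt.Print(apply(src, edits))
}
```

Working on source text with recorded positions avoids having to mutate and reprint the syntax tree, which matches the talk's remark that the AST is hard to edit.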
Changes guided by profiling and other analysis:
removes dead code
removes gotos
removes unused labels, needless indirections, etc.
moves var declarations nearer to first use
Performance problems
Output from translator was poor Go, and ran about 10X slower.
Most of that slowdown has been recovered.
Problems with C to Go:
C patterns can be poor Go; e.g.: complex for loops
C stack variables never escape; Go compiler isn't as sure
interfaces such as fmt.Stringer vs. C's varargs
no unions in Go, so use structs instead: bloat
variable declarations in wrong place
C compiler didn't free much memory, but Go has a GC.
Adds CPU and memory overhead.
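The "bloat" from replacing unions with structs can be made concrete by comparing sizes. In the sketch below, unionLike and compact are hypothetical types: the first carries every alternative side by side, the second holds whichever one is in use behind a single interface value.

```go
package main

import (
	"fmt"
	"unsafe"
)

// unionLike carries every alternative side by side, as a translated
// C union might; only one field is meaningful at a time.
type unionLike struct {
	i int64
	f float64
	s string
}

// compact stores whichever alternative is in use behind one interface value.
type compact struct {
	v interface{}
}

// sizes reports the in-memory size of each representation in bytes.
func sizes() (bloated, small uintptr) {
	return unsafe.Sizeof(unionLike{}), unsafe.Sizeof(compact{})
}

func main() {
	b, s := sizes()
	fmt.Println(b, s, b > s)
}
```

The interface form is smaller per value but may cost an allocation when a value is stored in it, so it is a trade-off rather than a free win.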
Performance fixes
Profile! (Never done before!)
move vars closer to first use
split vars into multiple
replace code in the compiler with code in the library: e.g. math/big
use interface or other tricks to combine struct fields
better escape analysis (drchase@).
hand tuning code and data layout
Use tools like grind, gofmt -r and eg for much of this.
Removing interface argument from a debugging print library got 15% overall!
More remains to be done.
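The first fix in the list, profiling, is available from the standard library. The sketch below collects a CPU profile with runtime/pprof around a hypothetical hot spot; the functions work and profileWork are illustrative only and are unrelated to the compiler's own code.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// work is a hypothetical hot spot standing in for a compiler pass.
func work(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

// profileWork runs work under the CPU profiler and returns the result
// together with the size of the collected profile data in bytes.
func profileWork(n int) (int, int, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return 0, 0, err
	}
	result := work(n)
	pprof.StopCPUProfile()
	return result, buf.Len(), nil
}

func main() {
	result, size, err := profileWork(50000000)
	if err != nil {
		fmt.Println("profiling failed:", err)
		return
	}
	fmt.Println("result:", result, "profile bytes:", size)
}
```

In practice the profile would be written to a file and inspected with go tool pprof rather than kept in memory.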
Technical benefits
Other benefits of the conversion:
Garbage collection means no more worry about introducing a dangling pointer.
Chance to clean up the back ends.
Unified 386 and amd64 architectures throughout the tool chain.
New architectures are easier to add.
Unified the tools: now one compiler, one assembler, one linker.
Compiler
GOOS=YYY GOARCH=XXX go tool compile
One compiler; no more 6g, 8g etc.
About 50K lines of portable code.
Even the registerizer is portable now; architectures well characterized.
Non-portable: Peepholing, details like registers bound to instructions.
Typically around 10% of the portable LOC.
Assembler
GOOS=YYY GOARCH=XXX go tool asm
New assembler, all in Go, written from scratch by r@.
Clean, idiomatic Go code.
Less than 4000 lines, <10% machine-dependent.
Almost completely compatible with previous yacc and C assemblers.
How is this possible?
shared syntax originating in the Plan 9 assemblers
unified back-end logic (old liblink, now internal/obj)
Linker
GOOS=YYY GOARCH=XXX go tool link
Mostly hand- and machine- translated from C code.
New library, internal/obj, part of original linker, captures details about machines, writes object files.
27000 lines summed across 4 architectures, mostly tables (plus some ugliness).
arm: 4000
arm64: 6000
ppc64: 5000
x86: 7500 (386 and amd64)
Example benefit: one print routine to print any instruction for any architecture.
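Bootstrapping
With no C compiler, bootstrapping requires a Go compiler.
Building 1.5 from source therefore requires building or downloading a working Go installation first.
We use Go 1.4+ as the base to build the 1.5+ tool chain.
Details: golang.org/s/go15bootstrap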
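Future
Much work remains, but 1.5 is mostly set.
Future work:
Better escape analysis.
New compiler back end using SSA (much easier in Go than in C).
More optimization.
Generate machine descriptions from PDFs (or maybe XML).
Will have a purely machine-generated instruction definition:
"Read in PDF, write out an assembler configuration."
Already deployed for the disassemblers.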
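Conclusions
Getting rid of C was a huge advance for the project: the code is cleaner, more testable, more profilable, and easier to work on.
The new unified tool chain reduces code size and increases maintainability.
A flexible tool chain also matters for portability.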
Thank you
Rob Pike
via talks.golang.org