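[Reposted from OSChina, oschina.net]
Article URL: http://www.oschina.net/translate/go-at-google-language-design-in-the-service-of-software-engineering
All translations in this article are for learning and exchange purposes only. When reposting, be sure to credit the translator, the source, and this article's link.
Our translation work follows the CC license. If our work infringes on your rights, please contact us promptly.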
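Go at Google: Language Design in the Service of Software Engineering
1. Abstract
(This article is a modified version of the keynote talk given by Rob Pike at the SPLASH 2012 conference in Tucson, Arizona, on October 25, 2012.)
The Go programming language was conceived in late 2007 as an answer to some of the problems we were seeing while developing software infrastructure at Google. The computing landscape today is almost unrelated to the environment in which the languages in use (mostly C++, Java, and Python) were created. The problems introduced by multicore processors, networked systems, massive computation clusters, and the web programming model were being worked around rather than addressed head-on. Moreover, the scale has changed: today's server programs comprise tens of millions of lines of code, are worked on by hundreds or even thousands of programmers, and are updated literally every day. To make matters worse, build times, even on large compilation clusters, have stretched to many minutes, even hours.
Go was designed and developed to make working in this environment more productive. Besides its better-known aspects such as built-in concurrency and garbage collection, Go's design considerations include rigorous dependency management, the adaptability of software architecture as systems grow, and robustness across the boundaries between components.
This article explains how these issues were addressed while building an efficient, compiled programming language that feels lightweight and pleasant. The examples and explanations are taken from real-world problems faced at Google.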
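2. Introduction
Go is a compiled, concurrent, garbage-collected, statically typed language developed at Google. It is an open source project: Google imports the public repository rather than the other way around.
Go is efficient, scalable, and productive. Some programmers find it fun to work in; others find it unimaginative, even boring. In this article we will explain why those are not contradictory positions. Go was designed to address the problems faced in software development at Google, which led to a language that is not a breakthrough research language but is nonetheless an excellent tool for engineering large software projects.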
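3. Go at Google
Go is a programming language designed by Google to help solve Google's problems, and Google has big problems.
The hardware is big and the software is big. There are many millions of lines of software, with servers mostly in C++ and lots of Java and Python for the other pieces. Thousands of engineers work on the code, at the "head" of a single tree comprising all the software, so from day to day there are significant changes to all levels of the tree. A large custom-designed distributed build system makes development at this scale feasible, but it's still big.
And of course, all this software runs on zillions of machines, which are treated as a modest number of independent, networked compute clusters.
In short, development at Google is big, can be slow, and is often clumsy. But it is effective.
The goals of the Go project were to eliminate the slowness and clumsiness of software development at Google, and thereby to make the process more productive and scalable. The language was designed by and for people who write, read, debug, and maintain large software systems.
Go's purpose is therefore not to do research into programming language design; it is to improve the working environment for its designers and their coworkers. Go is more about software engineering than programming language research. Or to rephrase, it is about language design in the service of software engineering.
But how can a language help software engineering? The rest of this article is an answer to that question.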
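4. Pain points
When Go launched, some claimed it was missing particular features or methodologies that were regarded as de rigueur for a modern language. How could Go be worthwhile in the absence of these facilities? Our answer is that the properties Go does have address the issues that make large-scale software development difficult. These issues include:
- slow builds
- uncontrolled dependencies
- each programmer using a different subset of the language
- poor program understanding (code hard to read, poorly documented, and so on)
- duplication of effort
- cost of updates
- version skew
- difficulty of writing automatic tools
- cross-language builds
Individual features of a language don't address these issues. A larger view of software engineering is required, and in the design of Go we tried to focus on solutions to these problems.
As a simple, self-contained example, consider the representation of program structure. Some observers objected to Go's C-like block structure with braces, preferring the use of spaces for indentation, in the style of Python or Haskell. However, we have had extensive experience tracking down build and test failures caused by cross-language builds where a Python snippet embedded in another language, for instance through a SWIG invocation, is subtly and invisibly broken by a change in the indentation of the surrounding code. Our position is therefore that, although using spaces for indentation is nice for small programs, it doesn't scale well, and the bigger and more heterogeneous the code base, the more trouble it can cause. It is better to forgo convenience for safety and dependability, so Go has brace-bounded blocks.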
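5. Dependencies in C and C++
A more substantial illustration of scaling and other issues arises in the handling of package dependencies. We begin the discussion with a review of how they work in C and C++.
ANSI C, first standardized in 1989, promoted the idea of #ifndef "guards" in the standard header files. The idea, which is ubiquitous now, is that each header file be bracketed with a conditional compilation clause so that the file may be included multiple times without error. For instance, the Unix header file <sys/stat.h> looks schematically like this: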
/* Large copyright and licensing notice */
#ifndef _SYS_STAT_H_
#define _SYS_STAT_H_
/* Types and other definitions */
#endif
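The intent is that the C preprocessor reads in the file but disregards the contents on the second and subsequent readings of the file. The symbol _SYS_STAT_H_, defined the first time the file is read, "guards" the invocations that follow.
This design has some nice properties, most important that each header file can safely #include all its dependencies, even if other header files will also include them. If that rule is followed, it permits orderly code that, for instance, sorts the #include clauses alphabetically.
But it scales very badly.
In 1984, a compilation of ps.c, the source to the Unix ps command, was observed to #include <sys/stat.h> 37 times by the time all the preprocessing had been done. Even though the contents are discarded 36 times while doing so, most C implementations would open the file, read it, and scan it all 37 times. Without great cleverness, in fact, that behavior is required by the potentially complex macro semantics of the C preprocessor.
The effect on software is the gradual accumulation of #include clauses in C programs. It won't break a program to add them, and it's very hard to know when they are no longer needed. Deleting a #include and compiling the program again isn't even sufficient to test that, since another #include might itself contain a #include that pulls it in anyway.
Technically speaking, it does not have to be like that. Realizing the long-term problems with the use of #ifndef guards, the designers of the Plan 9 libraries took a different, non-ANSI-standard approach. In Plan 9, header files were forbidden from containing further #include clauses; all #includes were required to be in the top-level C file. This required some discipline, of course (the programmer was required to list the necessary dependencies exactly once, in the correct order), but documentation helped and in practice it worked very well. The result was that, no matter how many dependencies a C source file had, each #include file was read exactly once when compiling that file. And, of course, it was also easy to see if an #include was necessary by taking it out: the edited program would compile if and only if the dependency was unnecessary.
The most important result of the Plan 9 approach was much faster compilation: the amount of I/O the compilation requires can be dramatically less than when compiling a program using libraries with #ifndef guards.
Outside of Plan 9, though, the "guarded" approach is accepted practice for C and C++. In fact, C++ exacerbates the problem by using the same approach at finer granularity. By convention, C++ programs are usually structured with one header file per class, or perhaps a small set of related classes, a grouping much smaller than, say, <stdio.h>. The dependency tree is therefore much more intricate, reflecting not library dependencies but the full type hierarchy. Moreover, C++ header files usually contain real code (type, method, and template declarations), not just the simple constants and function signatures typical of a C header file. Thus not only does C++ push more to the compiler, what it pushes is harder to compile, and each invocation of the compiler must reprocess this information. When building a large C++ binary, the compiler might be taught thousands of times how to represent a string by processing the header file <string>. (For the record, around 1984 Tom Cargill observed that the use of the C preprocessor for dependency management would be a long-term liability for C++ and should be addressed.)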
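The construction of a single C++ binary at Google can open and read hundreds of individual header files tens of thousands of times. In 2007, build engineers at Google instrumented the compilation of a major Google C++ binary. It comprised about two thousand files that, if simply concatenated together, totaled 4.2 megabytes. By the time the #includes had been expanded, over 8 gigabytes were being delivered to the input of the compiler, a blow-up of 2000 bytes for every C++ source byte.
As another data point, in 2003 Google's build system was changed to a per-directory Makefile design with better-managed, more explicit dependencies. A typical binary shrank about 40% in file size, just from having more accurate dependencies recorded. Even so, the properties of C++ (or C for that matter) make it impractical to verify those dependencies automatically, and today we still do not have an accurate understanding of the dependency requirements of large Google C++ binaries.
The consequence of these uncontrolled dependencies and massive scale is that it is impractical to build Google server binaries on a single computer, so a large distributed compilation system was created. With this system, involving many machines, much caching, and much complexity (the build system is a large program in its own right), builds at Google are practical, if still cumbersome.
Even with the distributed build system, a large Google build can still take many minutes. That 2007 binary took 45 minutes using a precursor distributed build system; today's version of the same program takes 27 minutes, but of course the program and its dependencies have grown in the interim. The engineering effort required to scale up the build system has barely been able to stay ahead of the growth of the software it is constructing.
6. Enter Go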
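When builds are slow, there is time to think. The origin myth for Go states that it was during one of those 45 minute builds that Go was conceived. It was believed to be worth trying to design a new language suitable for writing large Google programs such as web servers, with software engineering considerations that would improve the quality of life of Google programmers.
Although the discussion so far has focused on dependencies, there are many other issues that need attention. The primary considerations for any language to succeed in this context are:
- It must work at scale, for large programs with large numbers of dependencies, with large teams of programmers working on them.
- It must be familiar, roughly C-like. Programmers working at Google are early in their careers and are most familiar with procedural languages, particularly from the C family. The need to get programmers productive quickly in a new language means that the language cannot be too radical.
- It must be modern. C, C++, and to some extent Java are quite old, designed before the advent of multicore machines, networking, and web application development. There are features of the modern world that are better met by newer approaches, such as built-in concurrency.
With that background, then, let us look at the design of Go from a software engineering perspective.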
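7. Dependencies in Go
Having looked in detail at dependencies in C and C++, let us see how Go handles them. Dependencies are defined, syntactically and semantically, by the language. They are explicit, clear, and "computable", which is to say, easy to write tools to analyze.
After the package clause (the subject of the next section), each source file may have one or more import statements, comprising the import keyword and a string identifying the package to be imported into this (and only this) source file:
import "encoding/json"
The first step to making Go scale, dependency-wise, is that the language defines unused dependencies to be a compile-time error (not a warning, an error). If a source file imports a package it does not use, the program will not compile. This guarantees that the dependency tree of any Go program is precise and has no extraneous edges. That, in turn, guarantees that no extra code is compiled when building the program, which minimizes compilation time.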
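The second step, this time in the implementation of the compilers, goes even further to guarantee efficiency. Consider a Go program with three packages and this dependency graph:
- package A imports package B;
- package B imports package C;
- package A does not import package C
This means that package A uses C only transitively through its use of B; that is, no identifiers from C are mentioned in the source code of A. For instance, A might reference a struct type defined in B that has a field whose type is defined in C, but A does not reference that type itself. As a more practical example, imagine that A imports a formatted I/O package B, that B uses a buffered I/O implementation provided by C, and that A does not itself invoke buffered I/O.
To build this program, first C is compiled; dependent packages must be built before the packages that depend on them. Then B is compiled; finally A is compiled, and then the program can be linked.
When A is compiled, the compiler reads the object file for B, not its source code. That object file contains all the type information the compiler needs to process the clause
import "B"
in the source code of A. That information includes whatever information about C that B needed at compile time. In other words, when B is compiled, the generated object file includes type information for all dependencies of B that affect the public interface of B.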
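This design has the important effect that when the compiler processes an import clause, it opens exactly one file, the object file identified by that clause. This is, of course, reminiscent of the Plan 9 C (as opposed to ANSI C) approach to dependency management, except that, in effect, the compiler writes the header file when the Go source file is compiled. The process is more automatic and even more efficient than in Plan 9 C, because the data read when processing an import is just "exported" data, not general program source code. The effect on overall compilation time can be huge, and it scales well as the code base grows. The time to process the dependency tree, and hence to compile, can be dramatically less than in the "include of include file" model of C and C++.
It's worth mentioning that this general approach to dependency management is not original; the ideas go back to the 1970s and flow through languages like Modula-2 and Ada. In the C family, Java has elements of this approach.
To make compilation even more efficient, the object file is arranged so that the export data is the first thing in the file, so the compiler can stop reading as soon as it reaches the end of that section. This approach to dependency management is the single biggest reason why Go compilations are faster than C or C++ compilations. Another factor is that Go places the export data in the object file; some languages require the programmer to write, or the compiler to generate, a second file with that information. That means twice as many files to open. In Go, importing a package requires opening only one file. Also, the single-file approach means that the export data (or header file, in C/C++) can never go out of date relative to the object file.
To be more exact about these things, we instrumented the compilation of a large Google program written in Go to see how the source code fanout compared to the C++ analysis done earlier. We found it was about 40X, which is fifty times better than C++ (as well as being simpler and hence faster to process), but it's still bigger than we expected. There are two reasons for this. First, we found a bug: the Go compiler was generating a substantial amount of data in the export section that did not need to be there. Second, the export data uses a verbose encoding that could be improved. We plan to address these issues.
Nonetheless, a factor of fifty less to do turns minutes into seconds, coffee breaks into interactive builds.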
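Another feature of the Go dependency graph is that it has no cycles. The language defines that there can be no circular imports in the graph, and the compiler and linker both check that they do not exist. Although they are occasionally useful, circular imports introduce significant problems at scale. They require the compiler to deal with larger sets of source files all at once, which slows down incremental builds. More important, when allowed, in our experience such imports end up entangling huge swaths of the source tree into large subpieces that are difficult to manage independently, bloating binaries and complicating initialization, testing, refactoring, releasing, and other tasks of software development.
The lack of circular imports causes occasional annoyance but keeps the tree clean, forcing a clear demarcation between packages. As with many of the design decisions in Go, it forces the programmer to think earlier about a larger-scale issue (in this case, package boundaries) that if left until later may never be addressed satisfactorily.
Through the design of the standard library, great effort was spent on controlling dependencies. It can be better to copy a little code than to pull in a big library for one function. (A test in the system build complains if new core dependencies arise.) Dependency hygiene trumps code reuse. One example of this in practice is that the low-level net package has its own integer-to-decimal conversion routine to avoid depending on the bigger and dependency-heavy formatted I/O package. Another is that the string conversion package strconv has a private implementation of the definition of "printable" characters rather than pulling in the large Unicode character class tables; that strconv honors the Unicode standard is verified by the package's tests.
8. Packages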
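The design of Go's package system combines some of the properties of libraries, name spaces, and modules.
Every Go source file, for instance "encoding/json/json.go", starts with a package clause, like this:
package json
where json is the "package name", a simple identifier. Package names are usually concise.
To use a package, the importing source file identifies it by its package path in the import clause. The meaning of "path" is not specified by the language, but by convention it is the slash-separated directory path of the source package, as here:
import "encoding/json"
Then the package name (as distinct from the path) is used to qualify items from the package in the importing source file:
var dec = json.NewDecoder(reader)
This design provides clarity. One may always tell from the syntax whether a name belongs to a package: Name vs. pkg.Name. (More on this later.)
For our example, the package path is "encoding/json" while the package name is json. Outside the standard repository, the convention is to place the project or company name at the root of the name space:
import "google/base/go/log"
It's important to recognize that package paths must be unique, but that there is no such requirement for package names. The path must uniquely identify the package to be imported, while the name is just a convention for how clients of the package refer to its contents. The package name need not be unique and can be overridden in each importing source file by providing a local identifier in the import clause. These two imports both reference packages that call themselves package log, so to import them in a single source file one must be (locally) renamed:
import "log" // Standard package
import googlelog "google/base/go/log" // Google-specific package
Every company might have its own log package, but there is no need to make the package name unique. Quite the opposite: Go style suggests keeping package names short and clear in preference to worrying about collisions.
Another example: there are many server packages in Google's code base.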
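As a minimal sketch of the renaming rule, the program below imports two standard-library packages that both call themselves template and gives one of them a local name. The helper renderBoth and the sample input are illustrative choices, not part of the original article.

```go
package main

import (
	"fmt"
	htmltemplate "html/template" // renamed locally: its package name is also "template"
	"strings"
	"text/template"
)

// renderBoth executes the same template text with both packages.
func renderBoth(name string) (plain, escaped string, err error) {
	const text = "Hello, {{.}}!"

	t1, err := template.New("plain").Parse(text)
	if err != nil {
		return "", "", err
	}
	var a strings.Builder
	if err := t1.Execute(&a, name); err != nil {
		return "", "", err
	}

	t2, err := htmltemplate.New("html").Parse(text)
	if err != nil {
		return "", "", err
	}
	var b strings.Builder
	if err := t2.Execute(&b, name); err != nil {
		return "", "", err
	}
	return a.String(), b.String(), nil
}

func main() {
	plain, escaped, err := renderBoth("<Go>")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(plain)   // text/template leaves the input untouched
	fmt.Println(escaped) // html/template escapes it for HTML output
}
```

Only the importing file sees the name htmltemplate; the package itself is unchanged, which is why short package names rarely cause trouble.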
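9. Remote packages
An important property of Go's package system is that the package path, being in general an arbitrary string, can be used to refer to remote repositories by having it identify the URL of the site serving the repository.
Here is how to use a package stored on github. The go get command uses the go build tool to fetch the repository and install it. Once installed, it can be imported and used like any other package.
$ go get github.com/4ad/doozer // Shell command to fetch package
import "github.com/4ad/doozer" // Doozer client's import statement
var client doozer.Conn // Client's use of package
It's worth noting that the go get command downloads dependencies recursively, a property made possible only because the dependencies are explicit. Also, because the name space of import paths is delegated to URLs, the naming of Go packages is decentralized and therefore more scalable than in other languages.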
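10. Syntax
Syntax is the user interface of a programming language. Although it has limited effect on the semantics of the language, which is arguably the more important component, syntax determines the readability and hence clarity of the language. Also, syntax is critical to tooling: if the language is hard to parse, automated tools are hard to write.
Go was therefore designed with clarity and tooling in mind, and has a clean syntax. Compared to other languages in the C family, its grammar is modest in size, with only 25 keywords (C99 has 37; C++11 has 84; the numbers continue to grow). More important, the grammar is regular and therefore easy to parse (mostly; there are a couple of quirks we might have fixed but didn't discover early enough). Unlike C and Java and especially C++, Go can be parsed without type information or a symbol table; there is no type-specific context. The grammar is easy to reason about and therefore tools are easy to write.
One detail of Go's syntax that differs from C is that the declaration syntax is closer to Pascal's than to C's. The declared name appears before the type and there are more keywords:
var fn func([]int) int
type T struct { a, b int }
as compared to C's
int (*fn)(int[]);
struct T { int a, b; }
Declarations introduced by a keyword are easier to parse, both for people and for computers. Having a type syntax that is not the expression syntax, as it is in C, has a significant effect on parsing: it adds grammar but eliminates ambiguity. There is a nice side effect, too: for initializing declarations, one can drop the var keyword and just take the type of the variable from that of the expression. These two declarations are equivalent; the second is shorter and idiomatic:
var buf *bytes.Buffer = bytes.NewBuffer(x) // explicit
buf := bytes.NewBuffer(x) // derived
There is an article at golang.org/s/decl-syntax with more detail about the syntax of declarations in Go and why it is so different from C.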
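Function syntax is straightforward for simple functions. This example declares the function Abs, which accepts a single variable x of type T and returns a single float64 value:
func Abs(x T) float64
A method is just a function with a special parameter, its receiver, which is passed to the function using the standard "dot" notation. Method declaration syntax places the receiver in parentheses before the function name. Here is the same function, now as a method of type T:
func (x T) Abs() float64
And here is a variable holding a function (a closure) that takes an argument of type T; Go has first-class functions and closures:
negAbs := func(x T) float64 { return -Abs(x) }
Finally, in Go functions can return multiple values. A common case is to return the function result and an error value as a pair, like this:
func ReadByte() (c byte, err error)

c, err := ReadByte()
if err != nil { ... }
We'll talk more about errors later.
One feature missing from Go is support for default function arguments. This was a deliberate simplification. Experience tells us that default arguments make it too easy to patch over API design flaws by adding more arguments, resulting in too many arguments with interactions that are difficult to disentangle or even understand. The lack of default arguments requires more functions or methods to be defined, as one function cannot hold the entire interface, but that leads to a clearer API that is easier to understand. Those functions all need separate names, too, which makes it clear which combinations exist, as well as encouraging more thought about naming, a critical aspect of clarity and readability. One mitigating factor for the lack of default arguments is that Go has easy-to-use, type-safe support for variadic functions.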
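To illustrate the last point, here is a small sketch of a variadic function. The name joinWords is hypothetical, chosen for this example only; it shows how a caller may pass zero, several, or a whole slice of arguments with full type checking.

```go
package main

import (
	"fmt"
	"strings"
)

// joinWords is a hypothetical helper: sep is required, words is variadic.
func joinWords(sep string, words ...string) string {
	// Inside the function, words is an ordinary []string.
	return strings.Join(words, sep)
}

func main() {
	fmt.Println(joinWords("-"))                // no variadic arguments at all
	fmt.Println(joinWords("-", "a", "b", "c")) // several arguments
	parts := []string{"x", "y"}
	fmt.Println(joinWords("+", parts...)) // an existing slice, expanded with ...
}
```

A variadic parameter covers the "optional trailing values" use of default arguments without hiding any value from the call site.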
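11. Naming
Go takes an unusual approach to defining the visibility of an identifier, that is, the ability of a client of a package to use the item named by the identifier. Instead of using keywords such as private and public, in Go the name itself carries the information: the case of the initial letter of the identifier determines the visibility. If the initial character is an upper case letter, the identifier is exported (public); otherwise it is not.
- upper case initial letter: Name is visible to clients of the package
- otherwise: name (or _Name) is not visible to clients of the package
This rule applies to variables, types, functions, methods, constants, fields... everything. That's all there is to it.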
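This was not an easy design decision. We spent over a year struggling to decide how to specify an identifier's visibility. Once we settled on using the case of the name, we realized it had become one of the most important properties of the language. The name is, after all, what clients of the package use; putting the visibility in the name rather than its type means that it's always clear when looking at an identifier whether it is part of the public API. After using Go for a while, it feels burdensome to go back to other languages that require looking up the declaration to discover this information.
The result is, again, clarity: the program source text expresses the programmer's meaning simply.
Another simplification is that Go has a very compact scope hierarchy:
- universe (predeclared identifiers such as int and string)
- package (all the source files of a package live at the same scope)
- file (for package import renames only; not very important in practice)
- function (the usual)
- block (the usual)
There is no scope for name space or class or other wrapping construct. Names come from very few places in Go, and all names follow the same scope hierarchy: at any given location in the source, an identifier denotes exactly one language object, independent of how it is used. (The only exception is statement labels, the targets of break statements and the like; they always have function scope.)
This has consequences for clarity. Notice for instance that methods declare an explicit receiver and that it must be used to access fields and methods of the type. There is no implicit receiver. That is, one always writes
rcvr.Field
(where rcvr is whatever name is chosen for the receiver variable), so all the elements of the type always appear lexically bound to a value of the receiver type. Similarly, a package qualifier is always present for imported names; one writes io.Reader, not Reader. Not only is this clear, it frees up the identifier Reader as a useful name to be used in any package. There are in fact multiple exported identifiers in the standard library with the name Reader, or Printf for that matter, yet which one is being referred to is always unambiguous.
Finally, these rules combine to guarantee that, other than the top-level predefined names such as int, every name (or the first component of it, the x in x.y) is always declared in the current package.
In short, names are local. In C, C++, or Java the name y could refer to anything. In Go, y (or even Y) is always defined within the package, while the interpretation of x.Y is clear: find x locally, and Y belongs to it.
These rules provide an important property for scaling because they guarantee that adding an exported name to a package can never break a client of that package. The naming rules decouple packages, providing scaling, clarity, and robustness.
There is one more aspect of naming to be mentioned: method lookup is always by name only, not by the signature (type) of the method. In other words, a single type can never have two methods with the same name. Given a method x.M, there's only ever one M associated with x. Again, this makes it easy to identify which method is referred to given only the name. It also makes the implementation of method invocation simple.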
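12. Semantics
The semantics of Go statements is generally C-like. Go is a compiled, statically typed, procedural language with pointers and so on. By design, it should feel familiar to programmers accustomed to languages in the C family. When launching a new language it is important that the target audience be able to learn it quickly; rooting Go in the C family helps make sure that young programmers who know Java, JavaScript, or C find Go easy to learn.
That said, Go makes many small changes to C semantics, mostly in the service of robustness. These include:
- there is no pointer arithmetic
- there are no implicit numeric conversions
- array bounds are always checked
- there are no type aliases (after type X int, X and int are distinct types, not aliases)
- ++ and -- are statements, not expressions
- assignment is not an expression
- it is legal (encouraged even) to take the address of a stack variable
- and many more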
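A few of the items in this list can be observed in a short program. The sketch below is illustrative only; the names Meters, double, outOfRange, and newCounter are hypothetical.

```go
package main

import "fmt"

// Meters is a distinct type whose underlying type is int.
type Meters int

// double shows that converting between Meters and int must be explicit.
func double(m Meters) int {
	n := int(m) // "var n int = m" would not compile
	n += n
	return n
}

// outOfRange indexes past the end of a slice and recovers from the
// run-time panic raised by the bounds check.
func outOfRange() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = "recovered from out-of-range index"
		}
	}()
	s := []int{1, 2, 3}
	i := 3
	_ = s[i] // bounds are checked at run time
	return "no panic"
}

// newCounter returns the address of a local variable, which is legal.
func newCounter() *int {
	count := 0
	count++ // ++ is a statement; "x := count++" would not compile
	return &count
}

func main() {
	fmt.Println(double(Meters(21)))
	fmt.Println(outOfRange())
	fmt.Println(*newCounter())
}
```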
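There are some much bigger changes too, stepping far from the traditional C, C++, and even Java models. These include linguistic support for:
- concurrency
- garbage collection
- interface types
- reflection
- type switches
The following sections discuss two of these topics, concurrency and garbage collection, mostly from a software engineering perspective. For a full discussion of the language semantics and uses, see the many resources on the golang.org web site.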
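13. Concurrency
Concurrency is important to the modern computing environment, with its multicore machines running web servers with multiple clients, what might be called the typical Google program. This kind of software is not especially well served by C++ or Java, which lack sufficient concurrency support at the language level.
Go embodies a variant of CSP with first-class channels. CSP was chosen partly due to familiarity (one of us had worked on predecessor languages that built on CSP's ideas), but also because CSP has the property that it is easy to add to a procedural programming model without profound changes to that model. That is, given a C-like language, CSP can be added to the language in a mostly orthogonal way, providing extra expressive power without constraining the language's other uses. In short, the rest of the language can remain "ordinary".
The approach is thus the composition of independently executing functions of otherwise regular procedural code.
The resulting language allows us to couple concurrency with computation smoothly. Consider a web server that must verify the security certificate of each of its clients; in Go it is easy to construct such software using CSP to manage the clients as independently executing procedures, while having the full power of an efficient compiled language available for the expensive cryptographic calculations.
In summary, CSP is practical for Go and for Google. When writing a web server, the canonical Go program, the model is a great fit.
There is one important caveat: Go is not purely memory safe in the presence of concurrency. Sharing is legal, and passing a pointer over a channel is idiomatic (and efficient).
Some concurrency and functional programming experts are disappointed that Go does not take a write-once approach to value semantics in the context of concurrent computation, that Go is not more like Erlang for example. Again, the reason is largely about familiarity and suitability for the problem domain. Go's concurrent features work well in a context familiar to most programmers. Go enables simple, safe concurrent programming but does not forbid bad programming. We compensate by convention, training programmers to think about message passing as a version of ownership control. The motto is, "Don't communicate by sharing memory, share memory by communicating."
Our limited experience with programmers new to both Go and concurrent programming shows that this is a practical approach. Programmers enjoy the simplicity that support for concurrency brings to network software, and simplicity engenders robustness.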
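14. Garbage collection
For a systems language, garbage collection can be a controversial feature, yet we spent very little time deciding that Go would be a garbage-collected language. Go has no explicit memory-freeing operation: the only way allocated memory returns to the pool is through the garbage collector.
It was an easy decision to make because memory management has a profound effect on the way a language works in practice. In C and C++, too much programming effort is spent on memory allocation and freeing. The resulting designs tend to expose details of memory management that could well be hidden; conversely, memory considerations limit how they can be used. By contrast, garbage collection makes interfaces easier to specify.
Moreover, in a concurrent object-oriented language it's almost essential to have automatic memory management, because the ownership of a piece of memory can be tricky to manage as it is passed around among concurrent executions. It's important to separate behavior from resource management.
The language is much easier to use because of garbage collection.
Of course, garbage collection brings significant costs: general overhead, latency, and complexity of the implementation. Nonetheless, we believe that the benefits, which are mostly felt by the programmer, outweigh the costs, which are largely borne by the language implementer.
Experience with Java in particular as a server language has made some people nervous about garbage collection in a user-facing system: the overheads are uncontrollable, latencies can be large, and much parameter tuning is required for good performance. Go, however, is different. Properties of the language mitigate some of these concerns. Not all of them, of course, but some.
The key point is that Go gives the programmer tools to limit allocation by controlling the layout of data structures. Consider this simple type definition of a data structure containing a buffer (array) of bytes:
type X struct {
    a, b, c int
    buf [256]byte
}
In Java, the buf field would require a second allocation, and accesses to it a second level of indirection. In Go, however, the buffer is allocated in a single block of memory along with the containing struct and no indirection is required. For systems programming, this design can give better performance as well as reducing the number of items known to the collector. At scale it can make a significant difference.
As a more direct example, in Go it is easy and efficient to provide second-order allocators, for instance an arena allocator that allocates a large array of structs and links them together with a free list. Libraries that repeatedly use many small structures like this can, with modest prearrangement, generate no garbage yet be efficient and responsive.
Although Go is a garbage-collected language, therefore, a knowledgeable programmer can limit the pressure placed on the collector and thereby improve performance. (Also, the Go installation comes with good tools for studying the dynamic memory performance of a running program.)
To give the programmer this flexibility, Go must support what we call interior pointers to objects allocated in the heap. The X.buf field in the example above lives within the struct, but it is legal to capture the address of this inner field, for instance to pass it to an I/O routine. In Java, as in many garbage-collected languages, it is not possible to construct an interior pointer like this, but in Go it is idiomatic. This design point affects which collection algorithms can be used, and may make them more difficult to write, but after careful thought we decided that it was necessary to allow interior pointers because of the benefits to the programmer and the ability to reduce pressure on the (perhaps harder to implement) collector. So far, our experience comparing similar Go and Java programs shows that use of interior pointers can have a significant effect on total arena size, latency, and collection times.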
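In summary, Go is garbage collected but gives the programmer some tools to control collection overhead.
The garbage collector remains an active area of development. The current design is a parallel mark-and-sweep collector, and there remain opportunities to improve its performance or perhaps even its design. (The language specification does not mandate any particular implementation of the collector.) Still, if the programmer takes care to use memory wisely, the current implementation works well for production use.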
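15. Composition not inheritance
Go takes an unusual approach to object-oriented programming, allowing methods on any type, not just classes, but without any form of type-based inheritance like subclassing. This means there is no type hierarchy. This was an intentional design choice. Although type hierarchies have been used to build much successful software, it is our opinion that the model has been overused and that it is worth taking a step back.
Instead, Go has interfaces, an idea that has been discussed at length elsewhere (see research.swtch.com/interfaces for example), but here is a brief summary.
In Go an interface is just a set of methods. For instance, here is the definition of the Hash interface from the standard library.
type Hash interface {
    Write(p []byte) (n int, err error)
    Sum(b []byte) []byte
    Reset()
    Size() int
    BlockSize() int
}
All data types that implement these methods satisfy this interface implicitly; there is no implements declaration. That said, interface satisfaction is statically checked at compile time, so despite this decoupling interfaces are type-safe.
A type will usually satisfy many interfaces, each corresponding to a subset of its methods. For instance, any type that satisfies the Hash interface also satisfies the Writer interface:
type Writer interface {
    Write(p []byte) (n int, err error)
}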
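This fluidity of interface satisfaction encourages a different approach to software construction. But before explaining that, we should explain why Go does not have subclassing.
Object-oriented programming provides a powerful insight: that the behavior of data can be generalized independently of the representation of that data. The model works best when the behavior (method set) is fixed, but once you subclass a type and add a method, the behaviors are no longer identical. If instead the set of behaviors is fixed, such as in Go's statically defined interfaces, the uniformity of behavior enables data and programs to be composed uniformly, orthogonally, and safely.
One extreme example is the Plan 9 kernel, in which all system data items implemented exactly the same interface, a file system API defined by 14 methods. This uniformity permitted a level of object composition seldom achieved in other systems, even today. Examples abound. Here's one: a system could import (in Plan 9 terminology) a TCP stack to a computer that didn't have TCP or even Ethernet, and over that network connect to a machine with a different CPU architecture, import its /proc tree, and run a local debugger to do breakpoint debugging of the remote process. This sort of operation was workaday on Plan 9, nothing special at all. The ability to do such things fell out of the design; it required no special arrangement (and was all done in plain C).
We argue that this compositional style of system construction has been neglected by the languages that push for design by type hierarchy. Type hierarchies result in brittle code. The hierarchy must be designed early, often as the first step of designing the program, and early decisions can be difficult to change once the program is written. As a consequence, the model encourages early overdesign as the programmer tries to predict every possible use the software might require, adding layers of type and abstraction just in case. This is upside down. The way pieces of a system interact should adapt as it grows, not be fixed at the dawn of time.
Go therefore encourages composition over inheritance, using simple, often one-method interfaces to define trivial behaviors that serve as clean, comprehensible boundaries between components.
Consider the Writer interface shown above, which is defined in package io. Any type that has a Write method with this signature works well with the complementary Reader interface:
type Reader interface {
    Read(p []byte) (n int, err error)
}
These two complementary methods allow type-safe chaining with rich behaviors, like generalized Unix pipes. Files, buffers, encryptors, compressors, image encoders, and so on can all be connected together. The Fprintf formatted I/O routine takes an io.Writer rather than, as in C, a FILE*. The formatted printer has no knowledge of what it is writing to; it may be an image encoder that is in turn writing to a compressor that is in turn writing to an encryptor that is in turn writing to a network connection.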
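A shortened version of such a chain can be sketched with the standard library alone: fmt.Fprintf writes to a gzip compressor, which writes to an in-memory buffer standing in for a file or network connection. The helper roundTrip is hypothetical, and it uses io.ReadAll, which assumes Go 1.16 or later.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// roundTrip formats a value into a compressor that writes into a buffer,
// then reads the buffer back through a decompressor.
func roundTrip(name string, n int) (string, error) {
	var buf bytes.Buffer       // stands in for a file or network connection
	zw := gzip.NewWriter(&buf) // an io.Writer wrapping another io.Writer
	if _, err := fmt.Fprintf(zw, "%s=%d", name, n); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil { // flush the compressed stream
		return "", err
	}

	zr, err := gzip.NewReader(&buf) // an io.Reader wrapping another io.Reader
	if err != nil {
		return "", err
	}
	defer zr.Close()
	out, err := io.ReadAll(zr)
	return string(out), err
}

func main() {
	fmt.Println(roundTrip("answer", 42))
}
```

Fprintf is unaware that its output is being compressed; each stage depends only on the one-method interface of its neighbor.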
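Interface composition is a different style of programming, and people accustomed to type hierarchies need to adjust their thinking to do it well, but the result is an adaptability of design that is harder to achieve through type hierarchies.
Note too that the elimination of the type hierarchy also eliminates a form of dependency hierarchy. Interface satisfaction allows the program to grow organically without predetermined contracts. And it is a linear form of growth; a change to an interface affects only the immediate clients of that interface; there is no subtree to update. The lack of implements declarations disturbs some people, but it enables programs to grow naturally, gracefully, and safely.
Go's interfaces have a major effect on program design. One place we see this is in the use of functions that take interface arguments. These are not methods, they are functions. Some examples should illustrate their power. ReadAll returns a byte slice (array) holding all the data that can be read from an io.Reader:
func ReadAll(r io.Reader) ([]byte, error)
Wrappers, functions that take an interface and return an interface, are also widespread. Here are some prototypes. LoggingReader logs every Read call on the incoming Reader r. LimitingReader stops reading after n bytes. ErrorInjector aids testing by simulating I/O errors. And there are many more.
func LoggingReader(r io.Reader) io.Reader
func LimitingReader(r io.Reader, n int64) io.Reader
func ErrorInjector(r io.Reader) io.Reader
These designs are nothing like hierarchical, subtype-inherited methods. They are looser (even ad hoc), organic, decoupled, independent, and therefore scalable.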
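16. Errors
Go does not have an exception facility in the conventional sense, that is, there is no control structure associated with error handling. (Go does provide mechanisms for handling exceptional situations such as division by zero. A pair of built-in functions called panic and recover allow the programmer to deal with such things. However, these functions are intentionally clumsy, rarely used, and not integrated into the library the way, say, Java libraries use exceptions.)
The key language feature for error handling is a predefined interface type called error, which represents a value that has an Error method returning a string:
type error interface {
    Error() string
}
Functions return an error value alongside their result. For example, here is the signature of the ReadByte method of the buffered I/O package's Reader type:
func (b *Reader) ReadByte() (c byte, err error)
This is a clear and simple design, easily understood. Errors are just values, and programs compute with them as they would compute with values of any other type.
It was a deliberate choice not to incorporate exceptions in Go. Although a number of critics disagree with this decision, there are several reasons we believe it makes for better software.
First, there is nothing truly exceptional about errors in computer programs. For instance, the inability to open a file is a common issue that does not deserve special linguistic constructs; if and return are fine.
f, err := os.Open(fileName)
if err != nil {
    return err
}
Also, if errors use special control structures, error handling distorts the control flow of a program that handles errors. The Java-like style of try-catch-finally blocks interlaces multiple overlapping flows of control that interact in complex ways. Although in contrast Go makes it more verbose to check errors, the explicit design keeps the flow of control straightforward, literally.
There is no question the resulting code can be longer, but the clarity and simplicity of such code offsets its verbosity. Explicit error checking forces the programmer to think about errors, and deal with them, when they arise. Exceptions make it too easy to ignore errors rather than handle them, passing the buck up the call stack until it is too late to fix the problem or diagnose it well.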
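17. Tools
Software engineering requires tools. Every language operates in an environment with other languages and needs myriad tools to compile, edit, debug, profile, test, and run programs.
Go's syntax, package system, naming conventions, and other features were designed to make tools easy to write, and the library includes a lexer, parser, and type checker for the language.
Tools to manipulate Go programs are so easy to write that many such tools have been created, some with interesting consequences for software engineering.
The best known of these is gofmt, the Go source code formatter. From the beginning of the project, we intended Go programs to be formatted by machine, eliminating an entire class of argument between programmers: how do I lay out my code? Gofmt is run on all Go programs we write, and most of the open source community uses it too. It is run as a "presubmit" check for the code repositories to make sure that all checked-in Go programs are formatted the same.
Gofmt is often cited by users as one of Go's best features even though it is not part of the language. The existence and use of gofmt means that from the beginning, the community has always seen Go code as gofmt formats it, so Go programs have a single style that is now familiar to everyone. Uniform presentation makes code easier to read and therefore faster to work on. Time not spent on formatting is time saved. Gofmt also affects scalability: since all code looks the same, teams find it easier to work together or with others' code.
Gofmt enabled another class of tools that we did not foresee as clearly. The program works by parsing the source code and reformatting it from the parse tree itself. This makes it possible to edit the parse tree before formatting it, so a suite of automatic refactoring tools sprang up. These are easy to write, can be semantically rich because they work directly on the parse tree, and automatically produce canonically formatted code.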
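The first example was a -r (rewrite) flag on gofmt itself, which uses a simple pattern-matching language to enable expression-level rewrites. For instance, one day we introduced a default value for the right-hand side of a slice expression: the length itself. The entire Go source tree was updated to use this default with the single command:
gofmt -r 'a[b:len(a)] -> a[b:]'
A key point about this transformation is that, because the input and output are both in the canonical format, the only changes made to the source code are semantic ones.
A similar but more intricate process allowed gofmt to be used to update the tree when the language changed so that statements could end at a newline rather than with a semicolon.
Another important tool is gofix, which runs tree-rewriting modules written in Go itself that are therefore capable of more advanced refactorings. The gofix tool allowed us to make sweeping changes to APIs and language features leading up to the release of Go 1, including a change to the syntax for deleting entries from a map, a radically different API for manipulating time values, and many more. As these changes rolled out, users could update all their code by running the simple command
gofix
Note that these tools allow us to update code even if the old code still works. As a result, Go repositories are easy to keep up to date as libraries evolve. Old APIs can be deprecated quickly and automatically, so only one version of the API needs to be maintained. For example, we recently changed Go's protocol buffer implementation to use "getter" functions, which were not in the interface before. We ran gofix on all of Google's Go code to update all programs that use protocol buffers, and now there is only one version of the API in use. Similar sweeping changes to the C++ or Java libraries are almost infeasible at the scale of Google's code base.
The existence of a parsing package in the standard Go library has enabled a number of other tools as well. Examples include the go tool, which manages program construction including acquiring packages from remote repositories; the godoc document extractor; a program to verify that the API compatibility contract is maintained as the library is updated; and many more.
Although tools like these are rarely mentioned in the context of language design, they are an integral part of a language's ecosystem, and the fact that Go was designed with tooling in mind has a huge effect on the development of the language, its libraries, and its community.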
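18. Conclusion
Go's use is growing inside Google.
Several big user-facing services use it, including youtube.com and dl.google.com (the download server that delivers Chrome, Android, and other downloads), as well as our own golang.org. And of course many small ones do, mostly built using Google App Engine's native support for Go.
Many other companies use Go as well; the list is very long, but a few of the better known are:
- BBC Worldwide
- Canonical
- Heroku
- Nokia
- SoundCloud
On a smaller scale, some minor things aren't quite right and might get corrected in a later (Go 2?) version of the language. For instance, there are too many forms of variable declaration syntax, programmers are easily confused by the behavior of nil values inside non-nil interfaces, and there are many library and interface details that could use another round of design.
It's worth noting, though, that gofix and gofmt gave us the opportunity to fix many other problems during the leadup to Go version 1. Go as it is today is therefore much closer to what its designers wanted than it would have been without these tools, which were themselves enabled by the language's design.
Not everything was fixed, though. We're still learning (but the language is frozen for now).
A significant weakness of the language is that the implementation still needs work. The compilers' generated code and the performance of the runtime in particular should be better, and work continues on them. There is progress already; in fact some benchmarks show a doubling of performance with the development version today compared to the first release of Go version 1 early in 2012.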
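19. Summary
Software engineering guided the design of Go. More than most general-purpose programming languages, Go was designed to address a set of software engineering issues that we had been exposed to in the construction of large server software. Offhand, that might make Go sound rather dull and industrial, but in fact the focus on clarity, simplicity, and composability throughout the design instead resulted in a productive, fun language that many programmers find expressive and powerful.
The properties that led to that include:
- Clear dependencies
- Clear syntax
- Clear semantics
- Composition over inheritance
- Simplicity provided by the programming model (garbage collection, concurrency)
- Easy tooling (the go tool, gofmt, godoc, gofix)
If you haven't tried Go already, we suggest you do.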
1. Abstract
(This is a modified version of the keynote talk given by Rob Pike at the SPLASH 2012 conference in Tucson, Arizona, on October 25, 2012.)
The Go programming language was conceived in late 2007 as an answer to some of the problems we were seeing developing software infrastructure at Google. The computing landscape today is almost unrelated to the environment in which the languages being used, mostly C++, Java, and Python, had been created. The problems introduced by multicore processors, networked systems, massive computation clusters, and the web programming model were being worked around rather than addressed head-on. Moreover, the scale has changed: today's server programs comprise tens of millions of lines of code, are worked on by hundreds or even thousands of programmers, and are updated literally every day. To make matters worse, build times, even on large compilation clusters, have stretched to many minutes, even hours.
Go was designed and developed to make working in this environment more productive. Besides its better-known aspects such as built-in concurrency and garbage collection, Go's design considerations include rigorous dependency management, the adaptability of software architecture as systems grow, and robustness across the boundaries between components.
This article explains how these issues were addressed while building an efficient, compiled programming language that feels lightweight and pleasant. Examples and explanations will be taken from the real-world problems faced at Google.
2. Introduction
Go is a compiled, concurrent, garbage-collected, statically typed language developed at Google. It is an open source project: Google imports the public repository rather than the other way around.
Go is efficient, scalable, and productive. Some programmers find it fun to work in; others find it unimaginative, even boring. In this article we will explain why those are not contradictory positions. Go was designed to address the problems faced in software development at Google, which led to a language that is not a breakthrough research language but is nonetheless an excellent tool for engineering large software projects.
3. Go at Google
Go is a programming language designed by Google to help solve Google's problems, and Google has big problems.
The hardware is big and the software is big. There are many millions of lines of software, with servers mostly in C++ and lots of Java and Python for the other pieces. Thousands of engineers work on the code, at the "head" of a single tree comprising all the software, so from day to day there are significant changes to all levels of the tree. A large custom-designed distributed build system makes development at this scale feasible, but it's still big.
And of course, all this software runs on zillions of machines, which are treated as a modest number of independent, networked compute clusters.
In short, development at Google is big, can be slow, and is often clumsy. But it is effective.
The goals of the Go project were to eliminate the slowness and clumsiness of software development at Google, and thereby to make the process more productive and scalable. The language was designed by and for people who write—and read and debug and maintain—large software systems.
Go's purpose is therefore not to do research into programming language design; it is to improve the working environment for its designers and their coworkers. Go is more about software engineering than programming language research. Or to rephrase, it is about language design in the service of software engineering.
But how can a language help software engineering? The rest of this article is an answer to that question.
4. Pain points
When Go launched, some claimed it was missing particular features or methodologies that were regarded as de rigueur for a modern language. How could Go be worthwhile in the absence of these facilities? Our answer to that is that the properties Go does have address the issues that make large-scale software development difficult. These issues include:
- slow builds
- uncontrolled dependencies
- each programmer using a different subset of the language
- poor program understanding (code hard to read, poorly documented, and so on)
- duplication of effort
- cost of updates
- version skew
- difficulty of writing automatic tools
- cross-language builds
Individual features of a language don't address these issues. A larger view of software engineering is required, and in the design of Go we tried to focus on solutions to these problems.
As a simple, self-contained example, consider the representation of program structure. Some observers objected to Go's C-like block structure with braces, preferring the use of spaces for indentation, in the style of Python or Haskell. However, we have had extensive experience tracking down build and test failures caused by cross-language builds where a Python snippet embedded in another language, for instance through a SWIG invocation, is subtly and invisibly broken by a change in the indentation of the surrounding code. Our position is therefore that, although spaces for indentation is nice for small programs, it doesn't scale well, and the bigger and more heterogeneous the code base, the more trouble it can cause. It is better to forgo convenience for safety and dependability, so Go has brace-bounded blocks.
5. Dependencies in C and C++
A more substantial illustration of scaling and other issues arises in the handling of package dependencies. We begin the discussion with a review of how they work in C and C++.
ANSI C, first standardized in 1989, promoted the idea of #ifndef "guards" in the standard header files. The idea, which is ubiquitous now, is that each header file be bracketed with a conditional compilation clause so that the file may be included multiple times without error. For instance, the Unix header file <sys/stat.h> looks schematically like this:
/* Large copyright and licensing notice */
#ifndef _SYS_STAT_H_
#define _SYS_STAT_H_
/* Types and other definitions */
#endif
The intent is that the C preprocessor reads in the file but disregards the contents on the second and subsequent readings of the file. The symbol _SYS_STAT_H_, defined the first time the file is read, "guards" the invocations that follow.
This design has some nice properties, most important that each header file can safely #include all its dependencies, even if other header files will also include them. If that rule is followed, it permits orderly code that, for instance, sorts the #include clauses alphabetically.
But it scales very badly.
In 1984, a compilation of ps.c, the source to the Unix ps command, was observed to #include <sys/stat.h> 37 times by the time all the preprocessing had been done. Even though the contents are discarded 36 times while doing so, most C implementations would open the file, read it, and scan it all 37 times. Without great cleverness, in fact, that behavior is required by the potentially complex macro semantics of the C preprocessor.
The effect on software is the gradual accumulation of #include clauses in C programs. It won't break a program to add them, and it's very hard to know when they are no longer needed. Deleting a #include and compiling the program again isn't even sufficient to test that, since another #include might itself contain a #include that pulls it in anyway.
Technically speaking, it does not have to be like that. Realizing the long-term problems with the use of #ifndef guards, the designers of the Plan 9 libraries took a different, non-ANSI-standard approach. In Plan 9, header files were forbidden from containing further #include clauses; all #includes were required to be in the top-level C file. This required some discipline, of course (the programmer was required to list the necessary dependencies exactly once, in the correct order), but documentation helped and in practice it worked very well. The result was that, no matter how many dependencies a C source file had, each #include file was read exactly once when compiling that file. And, of course, it was also easy to see if an #include was necessary by taking it out: the edited program would compile if and only if the dependency was unnecessary.
The most important result of the Plan 9 approach was much faster compilation: the amount of I/O the compilation requires can be dramatically less than when compiling a program using libraries with #ifndef guards.
Outside of Plan 9, though, the "guarded" approach is accepted practice for C and C++. In fact, C++ exacerbates the problem by using the same approach at finer granularity. By convention, C++ programs are usually structured with one header file per class, or perhaps small set of related classes, a grouping much smaller than, say, <stdio.h>. The dependency tree is therefore much more intricate, reflecting not library dependencies but the full type hierarchy. Moreover, C++ header files usually contain real code (type, method, and template declarations), not just the simple constants and function signatures typical of a C header file. Thus not only does C++ push more to the compiler, what it pushes is harder to compile, and each invocation of the compiler must reprocess this information. When building a large C++ binary, the compiler might be taught thousands of times how to represent a string by processing the header file <string>. (For the record, around 1984 Tom Cargill observed that the use of the C preprocessor for dependency management would be a long-term liability for C++ and should be addressed.)
The construction of a single C++ binary at Google can open and read hundreds of individual header files tens of thousands of times. In 2007, build engineers at Google instrumented the compilation of a major Google binary. The file contained about two thousand files that, if simply concatenated together, totaled 4.2 megabytes. By the time the #includes had been expanded, over 8 gigabytes were being delivered to the input of the compiler, a blow-up of 2000 bytes for every C++ source byte.
As another data point, in 2003 Google's build system was moved from a single Makefile to a per-directory design with better-managed, more explicit dependencies. A typical binary shrank about 40% in file size, just from having more accurate dependencies recorded. Even so, the properties of C++ (or C for that matter) make it impractical to verify those dependencies automatically, and today we still do not have an accurate understanding of the dependency requirements of large Google C++ binaries.
The consequence of these uncontrolled dependencies and massive scale is that it is impractical to build Google server binaries on a single computer, so a large distributed compilation system was created. With this system, involving many machines, much caching, and much complexity (the build system is a large program in its own right), builds at Google are practical, if still cumbersome.
Even with the distributed build system, a large Google build can still take many minutes. That 2007 binary took 45 minutes using a precursor distributed build system; today's version of the same program takes 27 minutes, but of course the program and its dependencies have grown in the interim. The engineering effort required to scale up the build system has barely been able to stay ahead of the growth of the software it is constructing.
6. Enter Go
When builds are slow, there is time to think. The origin myth for Go states that it was during one of those 45 minute builds that Go was conceived. It was believed to be worth trying to design a new language suitable for writing large Google programs such as web servers, with software engineering considerations that would improve the quality of life of Google programmers.
Although the discussion so far has focused on dependencies, there are many other issues that need attention. The primary considerations for any language to succeed in this context are:
- It must work at scale, for large programs with large numbers of dependencies, with large teams of programmers working on them.
- It must be familiar, roughly C-like. Programmers working at Google are early in their careers and are most familiar with procedural languages, particularly from the C family. The need to get programmers productive quickly in a new language means that the language cannot be too radical.
- It must be modern. C, C++, and to some extent Java are quite old, designed before the advent of multicore machines, networking, and web application development. There are features of the modern world that are better met by newer approaches, such as built-in concurrency.
With that background, then, let us look at the design of Go from a software engineering perspective.
7. Dependencies in Go
Since we've taken a detailed look at dependencies in C and C++, a good place to start our tour is to see how Go handles them. Dependencies are defined, syntactically and semantically, by the language. They are explicit, clear, and "computable", which is to say, easy to write tools to analyze.
The syntax is that, after the package clause (the subject of the next section), each source file may have one or more import statements, comprising the import keyword and a string constant identifying the package to be imported into this source file (only):
import "encoding/json"
The first step to making Go scale, dependency-wise, is that the language defines that unused dependencies are a compile-time error (not a warning, an error). If the source file imports a package it does not use, the program will not compile. This guarantees by construction that the dependency tree for any Go program is precise, that it has no extraneous edges. That, in turn, guarantees that no extra code will be compiled when building the program, which minimizes compilation time.
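As a small illustration (a sketch, not taken from the talk), the program below imports two standard packages and uses both. The helper name encode is an assumption chosen for the example. If the call into either package were deleted without also deleting its import, the compiler would reject the file with an "imported and not used" error rather than a warning.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encode is a hypothetical helper; it exists so that the
// encoding/json import is actually used by this file.
func encode(v map[string]string) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	s, err := encode(map[string]string{"lang": "Go"})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(s)
}
```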
There's another step, this time in the implementation of the compilers, that goes even further to guarantee efficiency. Consider a Go program with three packages and this dependency graph:
- package A imports package B;
- package B imports package C;
- package A does not import package C
This means that package A uses C only transitively through its use of B; that is, no identifiers from C are mentioned in the source code to A, even if some of the items A is using from B do mention C. For instance, package A might reference a struct type defined in B that has a field with a type defined in C but that A does not reference itself. As a motivating example, imagine that A imports a formatted I/O package B that uses a buffered I/O implementation provided by C, but that A does not itself invoke buffered I/O.
To build this program, first, C is compiled; dependent packages must be built before the packages that depend on them. Then B is compiled; finally A is compiled, and then the program can be linked.
When A is compiled, the compiler reads the object file for B, not its source code. That object file for B contains all the type information necessary for the compiler to execute the
import "B"
clause in the source code for A. That information includes whatever information about C that clients of B will need at compile time. In other words, when B is compiled, the generated object file includes type information for all dependencies of B that affect the public interface of B.
This design has the important effect that when the compiler executes an import clause, it opens exactly one file, the object file identified by the string in the import clause. This is, of course, reminiscent of the Plan 9 C (as opposed to ANSI C) approach to dependency management, except that, in effect, the compiler writes the header file when the Go source file is compiled. The process is more automatic and even more efficient than in Plan 9 C, though: the data being read when evaluating the import is just "exported" data, not general program source code. The effect on overall compilation time can be huge, and scales well as the code base grows. The time to execute the dependency graph, and hence to compile, can be exponentially less than in the "include of include file" model of C and C++.
It's worth mentioning that this general approach to dependency management is not original; the ideas go back to the 1970s and flow through languages like Modula-2 and Ada. In the C family Java has elements of this approach.
To make compilation even more efficient, the object file is arranged so the export data is the first thing in the file, so the compiler can stop reading as soon as it reaches the end of that section.
This approach to dependency management is the single biggest reason why Go compilations are faster than C or C++ compilations. Another factor is that Go places the export data in the object file; some languages require the author to write or the compiler to generate a second file with that information. That's twice as many files to open. In Go there is only one file to open to import a package. Also, the single file approach means that the export data (or header file, in C/C++) can never go out of date relative to the object file.
For the record, we measured the compilation of a large Google program written in Go to see how the source code fanout compared to the C++ analysis done earlier. We found it was about 40X, which is fifty times better than C++ (as well as being simpler and hence faster to process), but it's still bigger than we expected. There are two reasons for this. First, we found a bug: the Go compiler was generating a substantial amount of data in the export section that did not need to be there. Second, the export data uses a verbose encoding that could be improved. We plan to address these issues.
Nonetheless, a factor of fifty less to do turns minutes into seconds, coffee breaks into interactive builds.
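The arithmetic behind these figures can be checked with a short sketch. It assumes decimal units (8e9 and 4.2e6 bytes) and treats "over 8 gigabytes" as exactly 8, so the results are lower bounds that round to the "2000" and "fifty" quoted in the text. The function name fanout is ours.

```go
package main

import "fmt"

// fanout returns how many bytes reach the compiler for each byte of source.
func fanout(compilerInputBytes, sourceBytes float64) float64 {
	return compilerInputBytes / sourceBytes
}

func main() {
	// Figures quoted in the text; decimal units are an assumption.
	cpp := fanout(8e9, 4.2e6) // 8 gigabytes of input from 4.2 megabytes of source
	goFanout := 40.0          // fanout measured for the large Go program

	fmt.Printf("C++ fanout: about %.0fx\n", cpp)
	fmt.Printf("C++ fanout relative to Go: about %.0fx\n", cpp/goFanout)
}
```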
Another feature of the Go dependency graph is that it has no cycles. The language defines that there can be no circular imports in the graph, and the compiler and linker both check that they do not exist. Although they are occasionally useful, circular imports introduce significant problems at scale. They require the compiler to deal with larger sets of source files all at once, which slows down incremental builds. More important, when allowed, in our experience such imports end up entangling huge swaths of the source tree into large subpieces that are difficult to manage independently, bloating binaries and complicating initialization, testing, refactoring, releasing, and other tasks of software development.
The lack of circular imports causes occasional annoyance but keeps the tree clean, forcing a clear demarcation between packages. As with many of the design decisions in Go, it forces the programmer to think earlier about a larger-scale issue (in this case, package boundaries) that if left until later may never be addressed satisfactorily.
Through the design of the standard library, great effort was spent on controlling dependencies. It can be better to copy a little code than to pull in a big library for one function. (A test in the system build complains if new core dependencies arise.) Dependency hygiene trumps code reuse. One example of this in practice is that the (low-level) net package has its own integer-to-decimal conversion routine to avoid depending on the bigger and dependency-heavy formatted I/O package. Another is that the string conversion package strconv has a private implementation of the definition of 'printable' characters rather than pull in the large Unicode character class tables; that strconv honors the Unicode standard is verified by the package's tests.
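To give a sense of how little code such a private routine needs, here is a minimal sketch of an integer-to-decimal conversion. It is written for this illustration and is not the code in the net package; the name itoa is an assumption. The helper itself imports nothing; fmt appears only in the demonstration in main.

```go
package main

import "fmt" // used only by the demonstration in main

// itoa converts v to its decimal representation without relying on
// any formatting package.
func itoa(v int) string {
	if v == 0 {
		return "0"
	}
	neg := v < 0
	u := uint(v)
	if neg {
		u = -u // unsigned negation; also correct for the most negative int
	}
	var buf [20]byte // room for 19 digits plus a sign
	i := len(buf)
	for u > 0 {
		i--
		buf[i] = byte('0' + u%10)
		u /= 10
	}
	if neg {
		i--
		buf[i] = '-'
	}
	return string(buf[i:])
}

func main() {
	fmt.Println(itoa(0), itoa(-42), itoa(8080))
}
```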
8. Packages
The design of Go's package system combines some of the properties of libraries, name spaces, and modules into a single construct.
Every Go source file, for instance "encoding/json/json.go", starts with a package clause, like this:
package json
where json is the "package name", a simple identifier. Package names are usually concise.
To use a package, the importing source file identifies it by its package path in the import clause. The meaning of "path" is not specified by the language, but in practice and by convention it is the slash-separated directory path of the source package in the repository, here:
import "encoding/json"
Then the package name (as distinct from path) is used to qualify items from the package in the importing source file:
var dec = json.NewDecoder(reader)
This design provides clarity. One may always tell whether a name is local to the package from its syntax: Name vs. pkg.Name. (More on this later.)
For our example, the package path is "encoding/json" while the package name is json. Outside the standard repository, the convention is to place the project or company name at the root of the name space:
import "google/base/go/log"
It's important to recognize that package paths are unique, but there is no such requirement for package names. The path must uniquely identify the package to be imported, while the name is just a convention for how clients of the package can refer to its contents. The package name need not be unique and can be overridden in each importing source file by providing a local identifier in the import clause. These two imports both reference packages that call themselves package log, but to import them in a single source file one must be (locally) renamed:
import "log"                          // Standard package
import googlelog "google/base/go/log" // Google-specific package
Every company might have its own log package but there is no need to make the package name unique. Quite the opposite: Go style suggests keeping package names short and clear and obvious in preference to worrying about collisions.
Another example: there are many server packages in Google's code base.
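The same situation arises inside the standard library, which makes for a runnable sketch: text/template and html/template both call themselves package template, so a file that imports both must rename one locally. The identifier htmltemplate and the helper names are our choices for the illustration.

```go
package main

import (
	"bytes"
	"fmt"
	htmltemplate "html/template" // locally renamed; its package name is also "template"
	"text/template"
)

// renderText expands {{.}} with text/template, which does no escaping.
func renderText(s string) (string, error) {
	t, err := template.New("t").Parse("{{.}}")
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, s); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// renderHTML does the same with html/template, which escapes for HTML.
func renderHTML(s string) (string, error) {
	t, err := htmltemplate.New("t").Parse("{{.}}")
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, s); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	plain, err := renderText("<b>")
	if err != nil {
		fmt.Println(err)
		return
	}
	safe, err := renderHTML("<b>")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(plain) // <b>
	fmt.Println(safe)  // &lt;b&gt;
}
```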
9. Remote packages
An important property of Go's package system is that the package path, being in general an arbitrary string, can be co-opted to refer to remote repositories by having it identify the URL of the site serving the repository.
Here is how to use the doozer package from GitHub. The go get command uses the go build tool to fetch the repository from the site and install it. Once installed, it can be imported and used like any regular package.
$ go get github.com/4ad/doozer // Shell command to fetch package
import "github.com/4ad/doozer" // Doozer client's import statement
var client doozer.Conn         // Client's use of package
It's worth noting that the go get command downloads dependencies recursively, a property made possible only because the dependencies are explicit. Also, the allocation of the space of import paths is delegated to URLs, which makes the naming of packages decentralized and therefore scalable, in contrast to centralized registries used by other languages.
10. Syntax
Syntax is the user interface of a programming language. Although it has limited effect on the semantics of the language, which is arguably the more important component, syntax determines the readability and hence clarity of the language. Also, syntax is critical to tooling: if the language is hard to parse, automated tools are hard to write.
Go was therefore designed with clarity and tooling in mind, and has a clean syntax. Compared to other languages in the C family, its grammar is modest in size, with only 25 keywords (C99 has 37; C++11 has 84; the numbers continue to grow). More important, the grammar is regular and therefore easy to parse (mostly; there are a couple of quirks we might have fixed but didn't discover early enough). Unlike C and Java and especially C++, Go can be parsed without type information or a symbol table; there is no type-specific context. The grammar is easy to reason about and therefore tools are easy to write.
One of the details of Go's syntax that surprises C programmers is that the declaration syntax is closer to Pascal's than to C's. The declared name appears before the type and there are more keywords:
var fn func([]int) int
type T struct { a, b int }
as compared to C's
int (*fn)(int[]);
struct T { int a, b; }
Declarations introduced by keyword are easier to parse both for people and for computers, and having the type syntax not be the expression syntax as it is in C has a significant effect on parsing: it adds grammar but eliminates ambiguity.
But there is a nice side effect, too: for initializing declarations, one can drop the var keyword and just take the type of the variable from that of the expression. These two declarations are equivalent; the second is shorter and idiomatic:
var buf *bytes.Buffer = bytes.NewBuffer(x) // explicit
buf := bytes.NewBuffer(x)                  // derived
There is a blog post at golang.org/s/decl-syntax with more detail about the syntax of declarations in Go and why it is so different from C.
Function syntax is straightforward for simple functions. This example declares the function Abs, which accepts a single variable x of type T and returns a single float64 value:
func Abs(x T) float64
A method is just a function with a special parameter, its receiver, which can be passed to the function using the standard "dot" notation. Method declaration syntax places the receiver in parentheses before the function name.
Here is the same function, now as a method of type T:
func (x T) Abs() float64
And here is a variable (closure) with a type T argument; Go has first-class functions and closures:
negAbs := func(x T) float64 { return -Abs(x) }
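The three forms can be put together in one runnable sketch. The text leaves T unspecified; defining it as a float64-based type is an assumption made only so the example compiles. A function and a method may share the name Abs because methods live in the name space of their type.

```go
package main

import "fmt"

// T is a hypothetical numeric type chosen for this illustration.
type T float64

// Abs as a plain function.
func Abs(x T) float64 {
	if x < 0 {
		return float64(-x)
	}
	return float64(x)
}

// Abs as a method; the receiver appears in parentheses before the name.
func (x T) Abs() float64 {
	return Abs(x)
}

func main() {
	// A function value (closure) stored in a variable.
	negAbs := func(x T) float64 { return -Abs(x) }

	fmt.Println(Abs(-3), T(-3).Abs(), negAbs(-3)) // 3 3 -3
}
```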
Finally, in Go functions can return multiple values. A common case is to return the function result and an error value as a pair, like this:
func ReadByte() (c byte, err error)

c, err := ReadByte()
if err != nil { ... }
We'll talk more about errors later.
One feature missing from Go is that it does not support default function arguments. This was a deliberate simplification. Experience tells us that defaulted arguments make it too easy to patch over API design flaws by adding more arguments, resulting in too many arguments with interactions that are difficult to disentangle or even understand. The lack of default arguments requires more functions or methods to be defined, as one function cannot hold the entire interface, but that leads to a clearer API that is easier to understand. Those functions all need separate names, too, which makes it clear which combinations exist, as well as encouraging more thought about naming, a critical aspect of clarity and readability.
One mitigating factor for the lack of default arguments is that Go has easy-to-use, type-safe support for variadic functions.
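A minimal sketch of a variadic function follows; the name Sum is ours. Every argument must be an int, so a call such as Sum("a") is rejected at compile time, and an existing slice can be passed with the ... suffix.

```go
package main

import "fmt"

// Sum accepts any number of int arguments, including none.
func Sum(nums ...int) int {
	total := 0
	for _, n := range nums {
		total += n
	}
	return total
}

func main() {
	fmt.Println(Sum())               // 0
	fmt.Println(Sum(1, 2, 3))        // 6
	fmt.Println(Sum([]int{4, 5}...)) // 9
}
```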
11. Naming
Go takes an unusual approach to defining the visibility of an identifier, the ability for a client of a package to use the item named by the identifier. Unlike, for instance, private and public keywords, in Go the name itself carries the information: the case of the initial letter of the identifier determines the visibility. If the initial character is an upper case letter, the identifier is exported (public); otherwise it is not:
- upper case initial letter: Name is visible to clients of package
- otherwise: name (or _Name) is not visible to clients of package
This rule applies to variables, types, functions, methods, constants, fields... everything. That's all there is to it.
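A single file cannot show a package boundary directly, but the rule can be observed through a client package. In this sketch (the type Account and its fields are invented for the illustration), encoding/json sees only the exported field of the struct, because the lower-case field is not visible outside package main.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Account is a hypothetical type with one exported and one unexported field.
type Account struct {
	Name   string // upper case: visible to clients such as encoding/json
	secret string // lower case: visible only inside this package
}

// marshal encodes a as JSON.
func marshal(a Account) (string, error) {
	b, err := json.Marshal(a)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	s, err := marshal(Account{Name: "gopher", secret: "hunter2"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(s) // {"Name":"gopher"}
}
```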
This was not an easy design decision. We spent over a year struggling to define the notation to specify an identifier's visibility. Once we settled on using the case of the name, we soon realized it had become one of the most important properties about the language. The name is, after all, what clients of the package use; putting the visibility in the name rather than its type means that it's always clear when looking at an identifier whether it is part of the public API. After using Go for a while, it feels burdensome when going back to other languages that require looking up the declaration to discover this information.
The result is, again, clarity: the program source text expresses the programmer's meaning simply.
Another simplification is that Go has a very compact scope hierarchy:
- universe (predeclared identifiers such as int and string)
- package (all the source files of a package live at the same scope)
- file (for package import renames only; not very important in practice)
- function (the usual)
- block (the usual)
There is no scope for name space or class or other wrapping construct. Names come from very few places in Go, and all names follow the same scope hierarchy: at any given location in the source, an identifier denotes exactly one language object, independent of how it is used. (The only exception is statement labels, the targets of break statements and the like; they always have function scope.)
This has consequences for clarity. Notice for instance that methods declare an explicit receiver and that it must be used to access fields and methods of the type. There is no implicit this. That is, one always writes
rcvr.Field
(where rcvr is whatever name is chosen for the receiver variable) so all the elements of the type always appear lexically bound to a value of the receiver type. Similarly, a package qualifier is always present for imported names; one writes io.Reader not Reader. Not only is this clear, it frees up the identifier Reader as a useful name to be used in any package. There are in fact multiple exported identifiers in the standard library with name Reader, or Printf for that matter, yet which one is being referred to is always unambiguous.
Finally, these rules combine to guarantee that, other than the top-level predefined names such as int, (the first component of) every name is always declared in the current package.
In short, names are local. In C, C++, or Java the name y could refer to anything. In Go, y (or even Y) is always defined within the package, while the interpretation of x.Y is clear: find x locally, Y belongs to it.
These rules provide an important property for scaling because they guarantee that adding an exported name to a package can never break a client of that package. The naming rules decouple packages, providing scaling, clarity, and robustness.
There is one more aspect of naming to be mentioned: method lookup is always by name only, not by signature (type) of the method. In other words, a single type can never have two methods with the same name. Given a method x.M, there's only ever one M associated with x. Again, this makes it easy to identify which method is referred to given only the name. It also makes the implementation of method invocation simple.
12. Semantics
The semantics of Go statements is generally C-like. It is a compiled, statically typed, procedural language with pointers and so on. By design, it should feel familiar to programmers accustomed to languages in the C family. When launching a new language it is important that the target audience be able to learn it quickly; rooting Go in the C family helps make sure that young programmers, most of whom know Java, JavaScript, and maybe C, should find Go easy to learn.
That said, Go makes many small changes to C semantics, mostly in the service of robustness. These include:
- there is no pointer arithmetic
- there are no implicit numeric conversions
- array bounds are always checked
- there are no type aliases (after type X int, X and int are distinct types not aliases)
- ++ and -- are statements not expressions
- assignment is not an expression
- it is legal (encouraged even) to take the address of a stack variable
- and many more
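Several of these points fit in one small sketch. The helper names half, newCounter, and at are ours; at recovers from the run-time panic raised by an out-of-range index only to make the bounds check observable, which is not how bounds errors would normally be handled.

```go
package main

import "fmt"

// X is a new type, distinct from int although it shares its representation.
type X int

// half shows that numeric conversions must be written explicitly.
func half(x X) float64 {
	return float64(x) / 2
}

// newCounter returns the address of a local variable, which is legal.
func newCounter() *int {
	n := 0
	return &n
}

// at indexes a slice and reports ok == false when the index is out of range.
func at(a []int, i int) (v int, ok bool) {
	defer func() {
		if recover() != nil {
			v, ok = 0, false
		}
	}()
	return a[i], true
}

func main() {
	p := newCounter()
	*p++ // ++ is a statement, not an expression
	fmt.Println(half(3), *p) // 1.5 1

	_, ok := at([]int{1, 2, 3}, 5)
	fmt.Println(ok) // false
}
```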
There are some much bigger changes too, stepping far from the traditional C, C++, and even Java models. These include linguistic support for:
- concurrency
- garbage collection
- interface types
- reflection
- type switches
The following sections provide brief discussions of two of these topics in Go, concurrency and garbage collection, mostly from a software engineering perspective. For a full discussion of the language semantics and uses see the many resources on the golang.org web site.
13. Concurrency
Concurrency is important to the modern computing environment with its multicore machines running web servers with multiple clients, what might be called the typical Google program. This kind of software is not especially well served by C++ or Java, which lack sufficient concurrency support at the language level.
Go embodies a variant of CSP with first-class channels. CSP was chosen partly due to familiarity (one of us had worked on predecessor languages that built on CSP's ideas), but also because CSP has the property that it is easy to add to a procedural programming model without profound changes to that model. That is, given a C-like language, CSP can be added to the language in a mostly orthogonal way, providing extra expressive power without constraining the language's other uses. In short, the rest of the language can remain "ordinary".
The approach is thus the composition of independently executing functions of otherwise regular procedural code.
The resulting language allows us to couple concurrency with computation smoothly. Consider a web server that must verify security certificates for each incoming client call; in Go it is easy to construct the software using CSP to manage the clients as independently executing procedures but to have the full power of an efficient compiled language available for the expensive cryptographic calculations.
In summary, CSP is practical for Go and for Google. When writing a web server, the canonical Go program, the model is a great fit.
There is one important caveat: Go is not purely memory safe in the presence of concurrency. Sharing is legal and passing a pointer over a channel is idiomatic (and efficient).
Some concurrency and functional programming experts are disappointed that Go does not take a write-once approach to value semantics in the context of concurrent computation, that Go is not more like Erlang for example. Again, the reason is largely about familiarity and suitability for the problem domain. Go's concurrent features work well in a context familiar to most programmers. Go enables simple, safe concurrent programming but does not forbid bad programming. We compensate by convention, training programmers to think about message passing as a version of ownership control. The motto is, "Don't communicate by sharing memory, share memory by communicating."
Our limited experience with programmers new to both Go and concurrent programming shows that this is a practical approach. Programmers enjoy the simplicity that support for concurrency brings to network software, and simplicity engenders robustness.
14. Garbage collection
For a systems language, garbage collection can be a controversial feature, yet we spent very little time deciding that Go would be a garbage-collected language. Go has no explicit memory-freeing operation: the only way allocated memory returns to the pool is through the garbage collector.
It was an easy decision to make because memory management has a profound effect on the way a language works in practice. In C and C++, too much programming effort is spent on memory allocation and freeing. The resulting designs tend to expose details of memory management that could well be hidden; conversely memory considerations limit how they can be used. By contrast, garbage collection makes interfaces easier to specify.
Moreover, in a concurrent object-oriented language it's almost essential to have automatic memory management because the ownership of a piece of memory can be tricky to manage as it is passed around among concurrent executions. It's important to separate behavior from resource management.
The language is much easier to use because of garbage collection.
Of course, garbage collection brings significant costs: general overhead, latency, and complexity of the implementation. Nonetheless, we believe that the benefits, which are mostly felt by the programmer, outweigh the costs, which are largely borne by the language implementer.
Experience with Java in particular as a server language has made some people nervous about garbage collection in a user-facing system. The overheads are uncontrollable, latencies can be large, and much parameter tuning is required for good performance. Go, however, is different. Properties of the language mitigate some of these concerns. Not all of them of course, but some.
The key point is that Go gives the programmer tools to limit allocation by controlling the layout of data structures. Consider this simple type definition of a data structure containing a buffer (array) of bytes:
type X struct {
    a, b, c int
    buf [256]byte
}
In Java, the buf field would require a second allocation and accesses to it a second level of indirection. In Go, however, the buffer is allocated in a single block of memory along with the containing struct and no indirection is required. For systems programming, this design can yield better performance as well as reducing the number of items known to the collector. At scale it can make a significant difference.
As a more direct example, in Go it is easy and efficient to provide second-order allocators, for instance an arena allocator that allocates a large array of structs and links them together with a free list. Libraries that repeatedly use many small structures like this can, with modest prearrangement, generate no garbage yet be efficient and responsive.
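A minimal sketch of such a second-order allocator is given below. The types node and arena and their methods are invented for the illustration, and the sketch ignores concurrency. All nodes live in one backing array allocated up front, so allocating and releasing them afterwards creates no new garbage.

```go
package main

import "fmt"

// node is a small, frequently reused structure.
type node struct {
	val  int
	next *node // links free nodes together
}

// arena owns one large backing array and hands out its elements.
type arena struct {
	nodes []node
	free  *node
}

// newArena allocates n nodes at once and threads them onto the free list.
func newArena(n int) *arena {
	a := &arena{nodes: make([]node, n)}
	for i := range a.nodes {
		a.release(&a.nodes[i])
	}
	return a
}

// alloc pops a node off the free list, or returns nil when none are left.
func (a *arena) alloc() *node {
	n := a.free
	if n == nil {
		return nil
	}
	a.free = n.next
	*n = node{} // hand back a zeroed node
	return n
}

// release pushes a node back onto the free list for reuse.
func (a *arena) release(n *node) {
	n.next = a.free
	a.free = n
}

func main() {
	a := newArena(2)
	n1, n2 := a.alloc(), a.alloc()
	fmt.Println(n1 != nil, n2 != nil, a.alloc() == nil) // true true true
	a.release(n1)
	fmt.Println(a.alloc() == n1) // true
}
```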
Although Go is a garbage collected language, therefore, a knowledgeable programmer can limit the pressure placed on the collector and thereby improve performance. (Also, the Go installation comes with good tools for studying the dynamic memory performance of a running program.)
To give the programmer this flexibility, Go must support what we call interior pointers to objects allocated in the heap. The X.buf field in the example above lives within the struct but it is legal to capture the address of this inner field, for instance to pass it to an I/O routine. In Java, as in many garbage-collected languages, it is not possible to construct an interior pointer like this, but in Go it is idiomatic.
This design point affects which collection algorithms can be used, and may make them more difficult, but after careful thought we decided that it was necessary to allow interior pointers because of the benefits to the programmer and the ability to reduce pressure on the (perhaps harder to implement) collector. So far, our experience comparing similar Go and Java programs shows that use of interior pointers can have a significant effect on total arena size, latency, and collection times.
In summary, Go is garbage collected but gives the programmer some tools to control collection overhead.
The garbage collector remains an active area of development. The current design is a parallel mark-and-sweep collector and there remain opportunities to improve its performance or perhaps even its design. (The language specification does not mandate any particular implementation of the collector.) Still, if the programmer takes care to use memory wisely, the current implementation works well for production use.
15. Composition not inheritance
Go takes an unusual approach to object-oriented programming, allowing methods on any type, not just classes, but without any form of type-based inheritance like subclassing. This means there is no type hierarchy. This was an intentional design choice. Although type hierarchies have been used to build much successful software, it is our opinion that the model has been overused and that it is worth taking a step back.
Instead, Go has interfaces, an idea that has been discussed at length elsewhere (see research.swtch.com/interfaces for example), but here is a brief summary.
In Go an interface is just a set of methods. For instance, here is the definition of the Hash interface from the standard library.
type Hash interface {
    Write(p []byte) (n int, err error)
    Sum(b []byte) []byte
    Reset()
    Size() int
    BlockSize() int
}
All data types that implement these methods satisfy this interface implicitly; there is no implements declaration. That said, interface satisfaction is statically checked at compile time so despite this decoupling interfaces are type-safe.
A type will usually satisfy many interfaces, each corresponding to a subset of its methods. For example, any type that satisfies the Hash interface also satisfies the Writer interface:
type Writer interface {
    Write(p []byte) (n int, err error)
}
This fluidity of interface satisfaction encourages a different approach to software construction. But before explaining that, we should explain why Go does not have subclassing.
Object-oriented programming provides a powerful insight: that the behavior of data can be generalized independently of the representation of that data. The model works best when the behavior (method set) is fixed, but once you subclass a type and add a method, the behaviors are no longer identical. If instead the set of behaviors is fixed, such as in Go's statically defined interfaces, the uniformity of behavior enables data and programs to be composed uniformly, orthogonally, and safely.
One extreme example is the Plan 9 kernel, in which all system data items implemented exactly the same interface, a file system API defined by 14 methods. This uniformity permitted a level of object composition seldom achieved in other systems, even today. Examples abound. Here's one: A system could import (in Plan 9 terminology) a TCP stack to a computer that didn't have TCP or even Ethernet, and over that network connect to a machine with a different CPU architecture, import its /proc tree, and run a local debugger to do breakpoint debugging of the remote process. This sort of operation was workaday on Plan 9, nothing special at all. The ability to do such things fell out of the design; it required no special arrangement (and was all done in plain C).
We argue that this compositional style of system construction has been neglected by the languages that push for design by type hierarchy. Type hierarchies result in brittle code. The hierarchy must be designed early, often as the first step of designing the program, and early decisions can be difficult to change once the program is written. As a consequence, the model encourages early overdesign as the programmer tries to predict every possible use the software might require, adding layers of type and abstraction just in case. This is upside down. The way pieces of a system interact should adapt as it grows, not be fixed at the dawn of time.
Go therefore encourages composition over inheritance, using simple, often one-method interfaces to define trivial behaviors that serve as clean, comprehensible boundaries between components.
Consider the Writer interface shown above, which is defined in package io: Any item that has a Write method with this signature works well with the complementary Reader interface:
type Reader interface {
	Read(p []byte) (n int, err error)
}
These two complementary methods allow type-safe chaining with rich behaviors, like generalized Unix pipes. Files, buffers, networks, encryptors, compressors, image encoders, and so on can all be connected together. The Fprintf formatted I/O routine takes an io.Writer rather than, as in C, a FILE*. The formatted printer has no knowledge of what it is writing to; it may be an image encoder that is in turn writing to a compressor that is in turn writing to an encryptor that is in turn writing to a network connection.
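As a minimal sketch of this kind of chaining, the program below sends formatted output through a gzip compressor into an in-memory buffer. The helper names writeGreeting and roundTrip are hypothetical and exist only for this illustration; the packages used are from the standard library.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"os"
)

// writeGreeting formats a message to any io.Writer; it has no knowledge
// of what sits behind w.
func writeGreeting(w io.Writer, name string) error {
	_, err := fmt.Fprintf(w, "hello, %s\n", name)
	return err
}

// roundTrip sends the formatted text through a gzip compressor into an
// in-memory buffer, then decompresses it again and returns the result.
func roundTrip(name string) (string, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf) // compressor that writes into the buffer
	if err := writeGreeting(zw, name); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil { // flush the compressed data
		return "", err
	}
	zr, err := gzip.NewReader(&buf)
	if err != nil {
		return "", err
	}
	defer zr.Close()
	out, err := io.ReadAll(zr)
	return string(out), err
}

func main() {
	// The same function writes directly to standard output...
	if err := writeGreeting(os.Stdout, "gopher"); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
	// ...or through a compressor, without any change to writeGreeting.
	s, err := roundTrip("gopher")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Print(s)
}
```

The formatting code is identical in both cases; only the value passed as the io.Writer changes.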
Interface composition is a different style of programming, and people accustomed to type hierarchies need to adjust their thinking to do it well, but the result is an adaptability of design that is harder to achieve through type hierarchies.
Note too that the elimination of the type hierarchy also eliminates a form of dependency hierarchy. Interface satisfaction allows the program to grow organically without predetermined contracts. And it is a linear form of growth; a change to an interface affects only the immediate clients of that interface; there is no subtree to update. The lack of implements declarations disturbs some people but it enables programs to grow naturally, gracefully, and safely.
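To illustrate interface satisfaction without an implements declaration, here is a small sketch. The type countingWriter and the function countBytes are hypothetical names invented for this example; nothing in the type's definition refers to io.Writer.

```go
package main

import (
	"fmt"
	"io"
)

// countingWriter is a hypothetical type. It never mentions io.Writer,
// yet it satisfies that interface because it has the right Write method.
type countingWriter struct {
	n int // total bytes seen so far
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.n += len(p)
	return len(p), nil
}

// countBytes reports how many bytes Fprintf produces for a value.
func countBytes(v interface{}) int {
	cw := &countingWriter{}
	var w io.Writer = cw // compiles only because the method set matches
	fmt.Fprintf(w, "%v", v)
	return cw.n
}

func main() {
	fmt.Println(countBytes("gopher"))
}
```

If the Write method were removed or given a different signature, the assignment to w would fail at compile time, so the check is static even though it is never declared.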
Go's interfaces have a major effect on program design. One place we see this is in the use of functions that take interface arguments. These are not methods, they are functions. Some examples should illustrate their power. ReadAll returns a byte slice (array) holding all the data that can be read from an io.Reader:
func ReadAll(r io.Reader) ([]byte, error)
Wrappers (functions that take an interface and return an interface) are also widespread. Here are some prototypes. LoggingReader logs every Read call on the incoming Reader. LimitingReader stops reading after n bytes. ErrorInjector aids testing by simulating I/O errors. And there are many more.
func LoggingReader(r io.Reader) io.Reader
func LimitingReader(r io.Reader, n int64) io.Reader
func ErrorInjector(r io.Reader) io.Reader
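The text gives only the prototypes, so the bodies below are one possible sketch of two of these wrappers, not the actual implementations. The unexported types loggingReader and limitingReader and the helper firstBytes are assumptions made for illustration. (The standard library's io.LimitReader offers behavior similar to LimitingReader.)

```go
package main

import (
	"fmt"
	"io"
	"log"
	"strings"
)

// loggingReader logs every Read call made on the wrapped reader.
type loggingReader struct{ r io.Reader }

func (l loggingReader) Read(p []byte) (int, error) {
	n, err := l.r.Read(p)
	log.Printf("Read: n=%d err=%v", n, err)
	return n, err
}

// LoggingReader wraps r so that each Read call is logged.
func LoggingReader(r io.Reader) io.Reader { return loggingReader{r} }

// limitingReader reports io.EOF once n bytes have been delivered.
type limitingReader struct {
	r io.Reader
	n int64 // bytes still allowed
}

func (l *limitingReader) Read(p []byte) (int, error) {
	if l.n <= 0 {
		return 0, io.EOF
	}
	if int64(len(p)) > l.n {
		p = p[:l.n] // never ask for more than the remaining budget
	}
	n, err := l.r.Read(p)
	l.n -= int64(n)
	return n, err
}

// LimitingReader wraps r so that reading stops after n bytes.
func LimitingReader(r io.Reader, n int64) io.Reader {
	return &limitingReader{r: r, n: n}
}

// firstBytes composes both wrappers and drains the result with ReadAll.
func firstBytes(s string, n int64) (string, error) {
	r := LimitingReader(LoggingReader(strings.NewReader(s)), n)
	b, err := io.ReadAll(r)
	return string(b), err
}

func main() {
	s, err := firstBytes("composition over inheritance", 11)
	fmt.Println(s, err)
}
```

Because each wrapper both accepts and returns an io.Reader, they can be stacked in any order, and io.ReadAll works on the result without knowing that any wrapping took place.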
The designs are nothing like hierarchical, subtype-inherited methods. They are looser (even ad hoc), organic, decoupled, independent, and therefore scalable.
16. Errors
Go does not have an exception facility in the conventional sense, that is, there is no control structure associated with error handling. (Go does provide mechanisms for handling exceptional situations such as division by zero. A pair of built-in functions called panic and recover allow the programmer to protect against such things. However, these functions are intentionally clumsy, rarely used, and not integrated into the library the way, say, Java libraries use exceptions.)
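For readers who have not seen panic and recover, the following is a minimal sketch of how they can guard against a run-time panic such as integer division by zero. The function safeDivide is a hypothetical helper written for this illustration.

```go
package main

import "fmt"

// safeDivide is a hypothetical helper that converts a run-time panic
// (here, integer division by zero) into an ordinary error value.
func safeDivide(a, b int) (q int, err error) {
	defer func() {
		// recover returns non-nil only while a panic is in progress.
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return a / b, nil
}

func main() {
	fmt.Println(safeDivide(6, 3))
	fmt.Println(safeDivide(1, 0))
}
```

The deferred function and named results needed to make this work show why the mechanism is described as clumsy: it is meant for rare situations, not for routine error reporting.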
The key language feature for error handling is a pre-defined interface type called error that represents a value that has an Error method returning a string:
type error interface {
	Error() string
}
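As a small sketch of what this interface permits, any type with an Error method returning a string can serve as an error value. The type parseError and the function check below are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// parseError is a hypothetical error type: any type with an
// Error() string method satisfies the built-in error interface.
type parseError struct {
	Line int
	Msg  string
}

func (e *parseError) Error() string {
	return fmt.Sprintf("line %d: %s", e.Line, e.Msg)
}

// check returns nil on success or a *parseError describing the problem.
func check(line int, text string) error {
	if text == "" {
		return &parseError{Line: line, Msg: "empty input"}
	}
	return nil
}

func main() {
	err := check(3, "")
	fmt.Println(err)
	// The caller can inspect the concrete value behind the interface.
	var pe *parseError
	if errors.As(err, &pe) {
		fmt.Println("failed at line", pe.Line)
	}
}
```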
Libraries use the error type to return a description of the error. Combined with the ability for functions to return multiple values, it's easy to return the computed result along with an error value, if any. For instance, the equivalent to C's getchar does not return an out-of-band value at EOF, nor does it throw an exception; it just returns an error value alongside the character, with a nil error value signifying success. Here is the signature of the ReadByte method of the buffered I/O package's bufio.Reader type:
func (b *Reader) ReadByte() (c byte, err error)
This is a clear and simple design, easily understood. Errors are just values and programs compute with them as they would compute with values of any other type.
It was a deliberate choice not to incorporate exceptions in Go. Although a number of critics disagree with this decision, there are several reasons we believe it makes for better software.
First, there is nothing truly exceptional about errors in computer programs. For instance, the inability to open a file is a common issue that does not deserve special linguistic constructs; if and return are fine.
f, err := os.Open(fileName)
if err != nil {
	return err
}
Also, if errors use special control structures, error handling distorts the control flow for a program that handles errors. The Java-like style of try-catch-finally blocks interlaces multiple overlapping flows of control that interact in complex ways. Although in contrast Go makes it more verbose to check errors, the explicit design keeps the flow of control straightforward, literally.
There is no question the resulting code can be longer, but the clarity and simplicity of such code offset its verbosity. Explicit error checking forces the programmer to think about errors, and deal with them, when they arise. Exceptions make it too easy to ignore them rather than handle them, passing the buck up the call stack until it is too late to fix the problem or diagnose it well.
17. Tools
Software engineering requires tools. Every language operates in an environment with other languages and myriad tools to compile, edit, debug, profile, test, and run programs.
Go's syntax, package system, naming conventions, and other features were designed to make tools easy to write, and the library includes a lexer, parser, and type checker for the language.
Tools to manipulate Go programs are so easy to write that many such tools have been created, some with interesting consequences for software engineering.
The best known of these is gofmt, the Go source code formatter. From the beginning of the project, we intended Go programs to be formatted by machine, eliminating an entire class of argument between programmers: how do I lay out my code? Gofmt is run on all Go programs we write, and most of the open source community uses it too. It is run as a "presubmit" check for the code repositories to make sure that all checked-in Go programs are formatted the same.
Gofmt is often cited by users as one of Go's best features even though it is not part of the language. The existence and use of gofmt means that from the beginning, the community has always seen Go code as gofmt formats it, so Go programs have a single style that is now familiar to everyone. Uniform presentation makes code easier to read and therefore faster to work on. Time not spent on formatting is time saved. Gofmt also affects scalability: since all code looks the same, teams find it easier to work together or with others' code.
Gofmt enabled another class of tools that we did not foresee as clearly. The program works by parsing the source code and reformatting it from the parse tree itself. This makes it possible to edit the parse tree before formatting it, so a suite of automatic refactoring tools sprang up. These are easy to write, can be semantically rich because they work directly on the parse tree, and automatically produce canonically formatted code.
The first example was a -r (rewrite) flag on gofmt itself, which uses a simple pattern-matching language to enable expression-level rewrites. For instance, one day we introduced a default value for the right-hand side of a slice expression: the length itself. The entire Go source tree was updated to use this default with the single command:
gofmt -r 'a[b:len(a)] -> a[b:]'
A key point about this transformation is that, because the input and output are both in the canonical format, the only changes made to the source code are semantic ones.
A similar but more intricate process allowed gofmt to be used to update the tree when the language no longer required semicolons as statement terminators if the statement ended at a newline.
Another important tool is gofix, which runs tree-rewriting modules written in Go itself that are therefore capable of more advanced refactorings. The gofix tool allowed us to make sweeping changes to APIs and language features leading up to the release of Go 1, including a change to the syntax for deleting entries from a map, a radically different API for manipulating time values, and many more. As these changes rolled out, users could update all their code by running the simple command
gofix
Note that these tools allow us to update code even if the old code still works. As a result, Go repositories are easy to keep up to date as libraries evolve. Old APIs can be deprecated quickly and automatically so only one version
of the API needs to be maintained. For example, we recently changed Go's protocol buffer implementation to use "getter" functions, which were not in the interface before. We ran gofix on all of Google's Go code to update all programs that use protocol buffers, and now there is only one version of the API in use. Similar sweeping changes to the C++ or Java libraries are almost infeasible at the scale of Google's code base.
The existence of a parsing package in the standard Go library has enabled a number of other tools as well. Examples include the go tool, which manages program construction including acquiring packages from remote repositories; the godoc document extractor; a program to verify that the API compatibility contract is maintained as the library is updated; and many more.
Although tools like these are rarely mentioned in the context of language design, they are an integral part of a language's ecosystem and the fact that Go was designed with tooling in mind has a huge effect on the development of the language, its libraries, and its community.
18. Conclusion
Go's use is growing inside Google.
Several big user-facing services use it, including youtube.com and dl.google.com (the download server that delivers Chrome, Android and other downloads), as well as our own golang.org. And of course many small ones do, mostly built using Google App Engine's native support for Go.
Many other companies use Go as well; the list is very long, but a few of the better known are:
- BBC Worldwide
- Canonical
- Heroku
- Nokia
- SoundCloud
It looks like Go is meeting its goals. Still, it's too early to declare it a success. We don't have enough experience yet, especially with big programs (millions of lines of code) to know whether the attempts to build a scalable language have paid off. All the indicators are positive though.
On a smaller scale, some minor things aren't quite right and might get tweaked in a later (Go 2?) version of the language. For instance, there are too many forms of variable declaration syntax, programmers are easily confused by the behavior of nil values inside non-nil interfaces, and there are many library and interface details that could use another round of design.
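The confusion over nil values inside non-nil interfaces can be shown with a short sketch. The type myErr and the functions mayFail and mayFailFixed are hypothetical names used only for this illustration.

```go
package main

import "fmt"

// myErr is a hypothetical error type used only for this illustration.
type myErr struct{}

func (*myErr) Error() string { return "my error" }

// mayFail returns a nil *myErr stored inside an error interface value.
// The interface then holds a type (*myErr) and a nil pointer, so it is
// not equal to nil.
func mayFail() error {
	var p *myErr // nil pointer
	return p
}

// mayFailFixed returns a true nil interface value on success.
func mayFailFixed() error {
	return nil
}

func main() {
	fmt.Println(mayFail() == nil)      // false
	fmt.Println(mayFailFixed() == nil) // true
}
```

An interface value is nil only when it holds neither a type nor a value, which is why returning a typed nil pointer through an error result makes the usual err != nil check report a failure.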
It's worth noting, though, that gofix and gofmt gave us the opportunity to fix many other problems during the leadup to Go version 1. Go as it is today is therefore much closer to what the designers wanted than it would have been without these tools, which were themselves enabled by the language's design.
Not everything was fixed, though. We're still learning (but the language is frozen for now).
A significant weakness of the language is that the implementation still needs work. The compilers' generated code and the performance of the runtime in particular should be better, and work continues on them. There is progress already; in fact some benchmarks show a doubling of performance with the development version today compared to the first release of Go version 1 early in 2012.
19. Summary
Software engineering guided the design of Go. More than most general-purpose programming languages, Go was designed to address a set of software engineering issues that we had been exposed to in the construction of large server software. Offhand, that might make Go sound rather dull and industrial, but in fact the focus on clarity, simplicity and composability throughout the design instead resulted in a productive, fun language that many programmers find expressive and powerful.
The properties that led to that include:
- Clear dependencies
- Clear syntax
- Clear semantics
- Composition over inheritance
- Simplicity provided by the programming model (garbage collection, concurrency)
- Easy tooling (the go tool, gofmt, godoc, gofix)
If you haven't tried Go already, we suggest you do.