<p>I am a system administrator. These days, Go is the language I craft my larger, more complex tools with. I've been using Go seriously for about a year-and-a-half now.</p>
<p>Most of my previous programming experience, however, has been in languages like Perl and Python. When you work with those languages, prevailing wisdom is that it's almost never worth it to scrutinize the fine details of how data moves through programs. I don't mean that no attention is paid to data structure design or dataflow. I mean that casting and converting individual data from one type or format to another -- anytime, all the time, without much thought or concern -- is common.</p>
<p>Part of this is because being loosely typed is a feature of those languages, so it's natural to treat data loosely, reshaping it as needed to make things easy on yourself. Another part is that in those languages, the runtime overhead provides a natural damping on any theoretical performance gains from being careful not to, say, cast integers as strings.</p>
<p>In that milieu, optimization tends to involve things like using standard library code in place of hand-rolled code; finding wins from algorithm or code changes; moving from a pure-Perl/Python module to one with C bindings; or reworking clumsy structures into more easily manipulable ones, akin to refactoring a bad schema in an RDBMS.</p>
<p>So I have been surprised to see the effects that being careful about accessing and structuring comparatively <em>tiny</em> pieces of data can have in Go.</p>
<p>I've been working on a networked application recently. Down at the very heart of it is a utility package I wrote which more-or-less does Unix shell style tokenizing on a slice of bytes. It gets called twice for every socket read; if it's slow, the whole application slows down. It's also one of the first "serious" pieces of Go code that I ever wrote, so it seemed like an excellent candidate to revisit in search of cleanups and optimizations.</p>
<p>The first thing I noticed when revisiting the code was that I seemed to have had an allergy to dealing with things as bytes when I initially wrote it. I was carelessly converting slices of bytes to strings, and then back, all over the place. I now know that's inefficient, so I did away with as many conversions as I could and rewrote the library internals to work entirely with byteslices. Strings are only produced by functions which return them.</p>
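<p>A rough sketch of the difference (the function names and splitting details here are illustrative, not the library's actual API): the first version round-trips through a string and copies the data twice, while the second only re-slices the input.</p>
<pre><code>package tokenize

import (
	"bytes"
	"strings"
)

// firstFieldViaString converts the input to a string and back,
// allocating and copying on both conversions.
func firstFieldViaString(line []byte) []byte {
	s := string(line)           // copy #1: []byte -> string
	fields := strings.Fields(s) // further allocations for the split
	if len(fields) == 0 {
		return nil
	}
	return []byte(fields[0]) // copy #2: string -> []byte
}

// firstFieldViaBytes stays in []byte territory and returns a
// sub-slice of the original input; nothing is copied.
func firstFieldViaBytes(line []byte) []byte {
	line = bytes.TrimLeft(line, " \t")
	if i := bytes.IndexAny(line, " \t"); i >= 0 {
		return line[:i]
	}
	return line
}
</code></pre>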
<p>Then I did away with something which seemed pointlessly inclusive: to detect whitespace I was matching against the Unicode whitespace regexp class. I changed this to a simple equivalence check for the ASCII space or horizontal tab characters. This may need to be revisited one day, but for now the less flexible approach works fine.</p>
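<p>Something along these lines (the names are mine, and the original check went through a regexp, for which <code>unicode.IsSpace</code> stands in below):</p>
<pre><code>package tokenize

import "unicode"

// isSpaceUnicode treats any Unicode whitespace rune as a separator --
// correct in general, but more work than this tokenizer needs.
func isSpaceUnicode(b byte) bool {
	return unicode.IsSpace(rune(b))
}

// isSpaceASCII is the less flexible replacement: only the ASCII space
// and horizontal tab count as separators.
func isSpaceASCII(b byte) bool {
	return b == ' ' || b == '\t'
}
</code></pre>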
<p>I wanted to see how much of a speedup (if any?) these changes would cause, so I had added some simple benchmarking routines to my test suite beforehand. I wasn't expecting more than a 20-30% increase, but what I saw was a 4-5X increase. That didn't seem possible, so I checked out the "before" version and reran the benchmarks several times. Then back to the new version and benchmarked again. The numbers stubbornly refused to change: the code was now around 400% faster than it had been originally, due to a tiny set of changes which I would have thought almost inconsequential.</p>
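<p>For anyone who hasn't used them, the benchmarking routines are just functions in the ordinary test files, run with <code>go test -bench=.</code>. Something like this (the <code>Tokenize</code> entry point and the sample input are placeholders, not the real package's names) is enough to catch a 4-5X swing:</p>
<pre><code>package tokenize

import "testing"

// sampleLine stands in for a typical line read from the socket.
var sampleLine = []byte(`set foo:bar:baz "a quoted value" trailing`)

func BenchmarkTokenize(b *testing.B) {
	for i := 0; i < b.N; i++ {
		Tokenize(sampleLine) // hypothetical entry point of the package
	}
}
</code></pre>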
<p>After that, I realized that I could do a more traditional optimization and rewrite a function to cut out unnecessary work in a common scenario. That didn't have anything to do with Go, but it got me another 1.8X speedup for that use case (on the benchmarked inputs; actual performance gain will scale with input size).</p>
<p>Finally, I realized that this common case was very predictable, and instead of returning a slice, it could always return a three-element array. Would this be worth it? Surely a variable number of slice appends -- <em>one to three</em> of them -- wouldn't be detectably slower than three array assignments. Wrong again: this change netted another 10% speedup, to 2X faster than the original implementation.</p>
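<p>In outline, the change looked something like this (hypothetical names and a simplified splitting rule; the point is the shape of the return value, not the tokenizing logic):</p>
<pre><code>package tokenize

import "bytes"

// splitSlice builds its result with append; each call allocates a new
// backing array and every append has to check capacity as it goes.
func splitSlice(b []byte) [][]byte {
	var out [][]byte
	for len(b) > 0 && len(out) < 3 {
		i := bytes.IndexByte(b, ':')
		if i < 0 {
			out = append(out, b)
			break
		}
		out = append(out, b[:i])
		b = b[i+1:]
	}
	return out
}

// splitArray exploits the predictability of the common case: the result
// is always a three-element array, filled by plain index assignment,
// so nothing has to grow and nothing extra is allocated.
func splitArray(b []byte) [3][]byte {
	var out [3][]byte
	for n := 0; n < 3 && len(b) > 0; n++ {
		i := bytes.IndexByte(b, ':')
		if i < 0 {
			out[n] = b
			break
		}
		out[n] = b[:i]
		b = b[i+1:]
	}
	return out
}
</code></pre>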
<p>The library is now 4 to 5 times faster all around, and 8 to 10 times faster in my most common case. All due to paying close attention to how I accessed and moved around very small pieces of data.</p>
<p>My takeaway is that once you have working and correct code, it's worth reviewing to see if you've been sloppy with data handling -- especially if you have a background in languages where this sort of scrutiny might be considered frivolous. In Go, it could make an unexpected difference.</p>
<hr/>**Comments:**<br/><br/>__crackers__: <pre><p>I'm a Python coder trying to get into Go, and this is something I've definitely noticed in the community.</p>
<p>Optimisation is a much bigger topic amongst Gophers, which, from my perspective, is amusing.</p>
<p>The first program I ported to Go didn't seem to run significantly faster than the original Python, at a bit over 1 second. Then I realised I was using <code>go run script.go</code>, so the wallclock time included compilation, and the penny dropped when I also noticed the units in <code>log.Printf("done in %v", duration)</code> weren't seconds…</p>
<p>The Go version was orders of magnitude faster. And these people want <em>more</em>???</p></pre>jmoiron: <pre><p>I think this might be largely historical. I can relate my own experience and that of many of my peers and attempt to generalise it, though I'm not sure it's appropriate.</p>
<p>3 years ago iron.io wrote one of the <a href="https://www.iron.io/how-we-went-from-30-servers-to-2-go/">earliest Go adoption posts</a> about replacing a ruby backend with Go and seeing 20-30x throughput improvements. About a year before that, <a href="http://commandcenter.blogspot.com/2012/06/less-is-exponentially-more.html">r wrote</a> (in what is <em>still</em> the best piece of writing about the Go philosophy and Go adoption) that he was surprised to see dynamic language users adopting it.</p>
<p>I think what we had was a time period where:</p>
<ul>
<li>Lots of backend programmers were around who had "come of age" with ruby and Python via rails and django (released 2004/2005)</li>
<li>Lots of startups being built around "big data", data science, and internet services that would require higher throughput</li>
<li>Go was first to "market" with a compiled language that had a modern toolchain, painless build system, and no weird baggage</li>
</ul>
<p>All of these are kind of important. Companies that were already dealing with bigger data on the internet had already largely adopted compiled languages, even if they'd started with dynamic ones: scala replacing ruby at twitter ca. 2008/9ish, java & c++ with python at google ages ago, c++ at facebook via php compilation insanity.</p>
<p>In this environment, if Go was to gain any traction, it'd be first as a language to displace some of the brittle and slow infrastructure glue that had been getting written in Python and ruby. It proved to be really good at this, so people wrote about it.</p></pre>__crackers__: <pre><p>I think a lot of the Pythonistas getting into Go were prompted to look around by the extremely slow-motion train wreck that is Python 3.</p>
<p>We were told we'd have to rewrite all our programs for Py3 with very little to gain from it. In certain regards, Py3 doesn't so much solve Py2's issues as replace them with different, harder to solve problems.</p>
<p>Py2 is in bugfix mode and Py3 is broken, so we went looking for an alternative, and Go is a great fit. It's philosophically attractive to the kind of person who likes Python and is vastly superior for many tasks.</p></pre>jmoiron: <pre><p>100% agree. When the announcement was made that 2.7 EOL was extended I tweeted:</p>
<blockquote>
<p>Last 5 years have been spent making the jump from 2.x to 3.x less difficult, not more compelling.</p>
</blockquote>
<p>There's certainly an element of that from a Python perspective. I don't have enough history/experience with Ruby to know if the 1.9 -> 2.x transition was similar; from what I can tell it's not quite so fractured.</p>
<p>I also think that the Python authors underestimate how restrictive the GIL is. I spent years believing that "getting rid of the GIL would hurt single-CPU performance", and then here comes a language with better performance <em>and no</em> GIL hamstring in an 0.x version, and it worked just fine. Obviously it does not have the same semantics as Python and it's a bit apples/oranges, but it was <em>certainly</em> worth looking into.</p></pre>treeder123: <pre><p>The Ruby transition from 1->2 was barely noticeable. </p></pre>__crackers__: <pre><blockquote>
<p>There's certainly an element of that from a Python perspective. I don't have enough history/experience with Ruby to know if the 1.9 -> 2.x transition was similar, from what I can tell it's not quite so fractured.</p>
</blockquote>
<p>Not even close to Py2 vs 3. I'm no Ruby guru, but my understanding is that while much new greatness was added to v2, pre-2 code runs just the same as ever because Ruby 2 adds new features rather than altering existing ones.</p>
<blockquote>
<p>I also think that the Python authors underestimate how restrictive the GIL is. I spent years believing that "getting rid of the GIL would hurt single-CPU performance", and then here comes a language with better performance and no GIL hamstring in an 0.x version, and it worked just fine. Obviously it does not have the same semantics as Python and it's a bit apples/oranges, but it was certainly worth looking into.</p>
</blockquote>
<p>The thing I find most ironic/depressing is that Python had Go's concurrency model a decade ago with Stackless Python. Similarly, Twisted has been around for ~15 years, but still <code>asyncio</code> has somehow managed to steal its thunder, despite being over a decade late to the party and also a toy in comparison to Twisted.</p>
<p>If Py3 had been stackless Python plus the new string model, instead of vanilla Py2 plus the new string model, I'm 99% certain that Py3 would have been a roaring success, and Python developers wouldn't be coveting Go's channels etc.</p>
<p>Sadly, it clings to the old execution model while breaking all manner of other shit that worked just fine in Py2.</p></pre>jasonwatkinspdx: <pre><blockquote>
<p>Ruby to know if the 1.9 -> 2.x transition was similar</p>
</blockquote>
<p>It was mostly painless to update apps. But even though in the end it was painless, it took <em>eons</em> for ruby to get a half decent JIT, despite there being a funded and officially blessed small team working on it since the pre rails days. I think a lot of us lost faith in ruby-dev's ability to keep pace with change over those years. They are very nice and very dedicated folks, but they're a bit isolated by language and culture from the huge rails community, let alone the larger FOSS world.</p></pre>kurin: <pre><p>I'm working on a small library at work that has a function that will be invoked synchronously in every RPC call, which means all the goddamn time. This is the first time I've ever been working on this scale. It's down to ~600ns per call.</p>
<p>(The C++ version of the library, which we wrote first, clocked in at ~5us per call, which the RPC library team flat out told us was a no-go. That's down to <200ns now.)</p></pre>__crackers__: <pre><p>Did you figure out why the C++ version was so much slower?</p>
<p>I was being flippant wrt the speed of Go vs Python.</p>
<p>It's really rather natural that optimisation is a bigger deal with Go than Python, simply because if execution speed is an absolute requirement, you wouldn't be using Python anyway.</p></pre>kurin: <pre><p>The C++ version was the initial implementation. After optimizations (data structures, etc) it became its lean self. The Go version was able to piggyback on the work done for the C++ version.</p></pre>princeandin: <pre><p><a href="http://img.pandawhale.com/post-40322-I-feel-the-need-for-speed-gif-JN84.gif" rel="nofollow"><strong>crackers</strong> we just want to Go FAST</a></p></pre>bestform: <pre><p>Very interesting. Thanks for the info and in depth analysis.
This is yet another reason why it is so important to write good tests, write code, and refactor code.
Good thing you actually measured the performance instead of just observing that it is "somewhat faster" :)</p></pre>sboyette2: <pre><p>Thank you.</p></pre>b4ux1t3: <pre><p>Great write-up, but I do have one issue, which I'll put at the end and has nothing to do with the topic at hand.</p>
<p>Go is the first language that I have used that made me feel comfortable working with raw data. And when I say raw, I mean literal bytes, as in your <del>application</del> library. Part of that comes from the fact that I found Go after I was already fairly competent at programming. I had a firm base of understanding under my feet before I even heard of Go, and as a result I was better able to grasp its concepts. But I don't think that's the only reason. (Also, all of this is coming from my personal experience. I have no references for any of this.)</p>
<p>Go is just <em>friendly</em>. It's a tool that doesn't punish you for being wrong. If you screw up in C, for instance, you can severely damage your system (or, well, you could in the past, when I was starting to learn programming. I believe this has changed a lot, but I don't think that's a result of the language changing, just of operating systems being written better). I distinctly remember when I screwed up my first computer by writing what amounted to a virus completely by accident. (I wonder if I still have a copy of that program. I'd be curious to see how well it works nowadays.)</p>
<p>Again, that was a result of me being young and having no idea what I was doing. But I've never had that problem with Go. I've never gotten to the point where I write code that I don't understand. And part of that is just how well-thought out Go is. When I was trying to learn C, apart from the actual documentation by GNU, there were various and sundry resources out there that all said different things. There were a million and a half different voices out there. Go is different. </p>
<p>It's all-in-one. It is its own compiler, its own documentation, and its own build framework. If you need information on how Go works, you go to one place: golang.org. And sure, there are plenty of great communities out there for all different languages, Go included. But many of those communities have, like I said before, a million and a half different resources that they all hold up as gospel. Not Go. If you ask a question about Go, people will point you to Go's docs directly.</p>
<p>All of that leads to what I took away from your post: Working with raw data is so easy in Go because Go seems <em>designed</em> to work with raw data. From its sane slice syntax to its brilliant</p>
<p>That got a little ramble-y. Sorry about that. Normally when I sing Go's praises, I get yelled at for being a hipster.</p>
<hr/>
<p>Now, my one small issue with your post.</p>
<blockquote>
<p>Part of this is because being loosely typed is a feature of those languages</p>
</blockquote>
<p>Python is very specifically a strongly-typed language. It is also a dynamically-typed language. I believe Perl is both as well, but don't quote me on that.</p>
<p>For instance, here's Javascript, a weakly(loosely)-typed language:</p>
<h1>hello.js</h1>
<pre><code>console.log("Hello" + 1)
</code></pre>
<h1>output</h1>
<pre><code>"Hello1"
</code></pre>
<p>And here is Python:</p>
<h1>hello.py</h1>
<pre><code>print("Hello" + 1)
</code></pre>
<h1>output</h1>
<pre><code>Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: cannot concatenate 'str' and 'int' objects
</code></pre>
<p>That said, you can do this, because it has dynamic typing:</p>
<pre><code>i = 1
i = str(i)
print("hello" + i)
</code></pre>
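<p>And here is Go for comparison, which is both statically and strongly typed: the equivalent line doesn't fail at runtime, it fails to compile, and the conversion has to be spelled out explicitly (a minimal sketch):</p>
<h1>hello.go</h1>
<pre><code>package main

import (
	"fmt"
	"strconv"
)

func main() {
	// fmt.Println("Hello" + 1) // compile error: mismatched string and int types
	fmt.Println("Hello" + strconv.Itoa(1)) // explicit int -> string conversion
}
</code></pre>
<h1>output</h1>
<pre><code>Hello1
</code></pre>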
<p>The performance hits you're seeing with Python have nothing (okay, not nothing, but very little) to do with the typing paradigm in Python. It used to be the case that the overhead of checking each variable's type at runtime cost a lot. However, as far as I know, the sheer power of our hardware coupled with some very clever tricks has offset that cost in all but a few situations.</p>
<hr/>
<p>Sorry, that's just one of those things that really irks me. As a programmer, talking about programming is a lot easier when everyone is using the same dictionary. Static/dynamic and strong/weak typing mean specific things.</p></pre>earthboundkid: <pre><p>Perl is a weakly typed language. It has a different operator for numerical addition and string concatenation, but ultimately strings and numbers are considered to be the same thing by the runtime and implicit numerical conversions happen all the time, which is the definition of weak typing.</p></pre>metamatic: <pre><p>With bloatware endemic, it's easy to forget that computers these days are <em>ridiculously</em> fast. For example, I needed to chop up a malformed XML file and send the chunks to an XML parser, so I wrote a file scanner which uses a buffered stream and a circular byte buffer to scan for arbitrary byte sequences, slice out the sections between them, and parse the results. When I ran it on my test files, it was basically instant.</p></pre>gerbs: <pre><p>Which is fine for an application that parses XML files, since it most likely won't run more than once a second.</p>
<p>Whereas an application that "gets called twice for every socket read" is going to have a bit more potential to bog down resources when running many concurrent operations.</p></pre>metamatic: <pre><blockquote>
<p>Which is fine for an application that parses XML files, since it most likely won't run more than once a second.</p>
</blockquote>
<p>XMPP.</p>
<p>Logging.</p></pre>fatAbb0t: <pre><p>Interesting stuff, thanks for sharing</p></pre>sboyette2: <pre><p>Thanks.</p></pre>ido50: <pre><p>Interesting post. I'm a Perl developer now working with Go for a few months since starting a new job.</p>
<p>Obviously we are in different worlds now, coming from a loosely typed language to a (use) strict one, but I'm not sure the "need for optimizations on smaller levels" is that much different between Go and Perl or any language for that matter.</p>
<p>As a Perl developer, I always revisit certain parts of code, trying to optimize them as much as possible, and over the years I've optimized small and large parts of my own and colleagues' code with significant gains.</p></pre>Partageons: <pre><p>I'm probably going to get downvoted for this, but for your three-element array, would it be possible to get the benefits of both by using</p>
<pre><code>variable := make([]yourType, 0, 3)
</code></pre>
<p>?</p></pre>sboyette2: <pre><p>I considered that later on, and my intuition is to agree with you: the append() is the expensive part, not the use of a slice.</p></pre>cathalgarvey: <pre><p>Thank you! Curious: is the array/slice situation still noticeably different if you specify a cap of 3 when making the slice? i.e. <code>fooslice := make([]foo, 0, 3)</code>?</p>
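<p>A throwaway set of micro-benchmarks along these lines would settle it, both for this and for the follow-up question below (the element type is made up, and the package-level sinks are just there to keep the compiler from optimising the work away):</p>
<pre><code>package tokenize

import "testing"

var (
	tok       = []byte("token")
	sinkSlice [][]byte  // package-level sinks keep the results alive
	sinkArray [3][]byte // so the loop bodies aren't optimised away
)

// Growing from a nil slice: append has to reallocate as capacity grows.
func BenchmarkAppendNilSlice(b *testing.B) {
	for i := 0; i < b.N; i++ {
		var out [][]byte
		out = append(out, tok)
		out = append(out, tok)
		out = append(out, tok)
		sinkSlice = out
	}
}

// Preallocating a capacity of 3: a single allocation, no regrowth.
func BenchmarkAppendCap3(b *testing.B) {
	for i := 0; i < b.N; i++ {
		out := make([][]byte, 0, 3)
		out = append(out, tok)
		out = append(out, tok)
		out = append(out, tok)
		sinkSlice = out
	}
}

// Fixed-size array: plain index assignment, no allocation at all.
func BenchmarkArrayAssign(b *testing.B) {
	for i := 0; i < b.N; i++ {
		var out [3][]byte
		out[0], out[1], out[2] = tok, tok, tok
		sinkArray = out
	}
}
</code></pre>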
<p>Or, for that matter, allocating a <em>length</em> of three and assigning like you are already with the array? I ask because using arrays directly is generally ugly when it interfaces with other code, so keeping things slice-like is nice if it doesn't carry a performance hit.</p></pre>sboyette2: <pre><p>Agreed. See my response to <a href="/u/Partageons" rel="nofollow">u/Partageons</a> :)</p></pre>earthboundkid: <pre><p>I wrote a <a href="https://github.com/carlmjohnson/sudoku" rel="nofollow">Sudoku solver</a> in Go a couple of years ago, and I was tinkering with it again this week. It was based on a Python version written by someone else, so I figured I'd go back and benchmark it for comparison. My Go version processes all the puzzles in the test suite in about 200ms. The Python version? At first, I assumed it was broken because it wasn't giving me an answer… Then I realized, no, it works; it's just much, much slower. It took around 15 minutes to complete the test suite.</p>
<p>I think the big difference is that the Go version is using bit flags for testing set membership and the Python version is using a <code>set</code> filled with strings for <code>1</code>, <code>2</code>, etc. You could probably get the Python version to compete with the Go version if you tried using a <code>list</code> of <code>int</code> to do the calculations. But it's not natural to think about performance in the same ways in Python.</p></pre>schumacherfm: <pre><p>I'm doing test- and benchmark-driven development. Sure, it takes more time, but once you know how to optimize your code you instantly code the "Go way".
I'm coming from PHP ;-)</p></pre>besna: <pre><p>Did you check that you are reusing already allocated memory as much as possible? I found out that this doesn't really improve the performance overall, but it makes the deviation from one benchmark run to another much smaller. You can see how many allocs you do with the go test benchmark runs if you enable the memory flags.</p></pre>sboyette2: <pre><p>I haven't looked at that sort of thing yet. It sounds interesting though. I should play with that soon.</p></pre>
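<p>For reference, the memory flags besna mentions amount to running the benchmarks with <code>go test -bench=. -benchmem</code>, or calling <code>b.ReportAllocs()</code> inside a benchmark, which adds allocs/op and B/op columns to the output. A sketch, reusing the placeholder names from the benchmark example earlier:</p>
<pre><code>func BenchmarkTokenize(b *testing.B) {
	b.ReportAllocs() // report allocs/op and B/op even without -benchmem
	for i := 0; i < b.N; i++ {
		Tokenize(sampleLine) // same hypothetical entry point and input as before
	}
}
</code></pre>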