How will Golang's GC perform with very large memory footprints and fast garbage generation?

Someone I know and respect claimed the following: that for 30 years, people have *incorrectly* claimed to be able to create new GCs that allow for apps with much, much larger memory footprints. I can sort of see why this would be difficult even today -- memory is far away from the CPU, so we need look-ahead caches and have to take advantage of locality and serial scanning. But even if we can scan pages serially, the pointers in these pages probably point to random virtual memory addresses...

So, if multiple cores are generating complex garbage very quickly, and the application has a large memory footprint with many pointers, how can Golang's GC handle the load? Will Golang's GC be a parallel GC?

Mostly, I want to know whether Golang might be able to handle a long-lived application with multiple TBs of memory and many pointers, with fast concurrent garbage generation. Or maybe such an app is not within Golang's design scope.

---

**Comments:**

**f2f:** i could just tell you to search through the group, but instead i'll try to be more helpful. check this:

http://www.infoq.com/presentations/go-gc-performance

see the slide around the 27 minute mark. that leaves a lot, and i mean *a lot*, of room for applications. notice that we're talking 200, 250 gigabytes of heap sizes now!

**tendermint:** Thanks. The slide shows GC pause time at around the 200, 250 GB range. I'm sure that pause time can be mitigated, but bounded pause time != effective GC if the GC can't collect enough at the rate of generation. And the bigger the heap size, the slower it will be to collect objects, right?

**SportingSnow21:** The pause time is when all free memory is released to the stack/OS. The entire stack and heap are scanned between STW pauses, so stale objects aren't ageing through multiple cycles.

**tendermint:** Yes, but if the heap is huge, the scan time between STW pauses must get very long. For a fixed rate of garbage generation, if the heap is large enough, a single-threaded GC routine won't be able to keep up.

I guess my question boils down to: is the Golang GC parallel, or will it become parallel? This thread suggests that it is: https://groups.google.com/forum/#!searchin/golang-nuts/golang$20garbage$20collector$20parallel/golang-nuts/YaRDry8ZCYk/F5T1w5D-AwAJ

**nussjustin:** Go has a concurrent GC that normally uses 25% of your GOMAXPROCS threads for concurrent operations and only pauses for some cleanup work (which gets shorter with each release). If the GC can't keep up with your allocations, it will force your goroutines to assist, thus slowing down allocations and shortening GCs. Also, with Go 1.7 (coming out this fall, I think) the sweeper will be mostly "removed" (see https://github.com/golang/proposal/blob/master/design/12800-sweep-free-alloc.md).
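A minimal way to watch those numbers from inside a program is `runtime.ReadMemStats`. The sketch below is illustrative only: the allocation workload is invented, but the `MemStats` fields it reads (`NumGC`, `PauseTotalNs`, `GCCPUFraction`) are standard. It prints the GC count, total pause time, and the fraction of CPU spent in the collector while several goroutines generate pointer-heavy garbage:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// Illustrative only: a few goroutines build pointer-heavy garbage while
// the main goroutine polls runtime.MemStats to see how the concurrent
// collector is keeping up.
func main() {
	type node struct{ next *node }

	for i := 0; i < 4; i++ {
		go func() {
			for {
				var head *node
				for j := 0; j < 1000; j++ {
					head = &node{next: head} // many small, linked allocations
				}
				_ = head // drop the chain; it is now garbage
			}
		}()
	}

	var ms runtime.MemStats
	for t := 0; t < 5; t++ {
		time.Sleep(time.Second)
		runtime.ReadMemStats(&ms)
		fmt.Printf("GCs=%d heap=%d MB pauses=%v gcCPU=%.1f%%\n",
			ms.NumGC,
			ms.HeapAlloc>>20,
			time.Duration(ms.PauseTotalNs),
			ms.GCCPUFraction*100)
	}
}
```

Running the same program with `GODEBUG=gctrace=1` prints a line per collection with similar detail, which is usually the quickest way to see whether the collector and the mutator assists are keeping up.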
**iends:** The Go team has made no promises yet, but it seems like it's been hinted at. It's certainly something they've seemed to consider.

**kjk:** GC load is not about the size of memory but about the number of pointers that need to be scanned. A 1 TB []byte slice takes no time to GC. Create 1 GB of small structures that all reference each other via pointers, and the GC will spend a lot of time chasing those pointers.

That being said, no GC will keep up with terabytes of heavy allocations. Neither will straightforward manual management with malloc/free.

There is no magic technology that absolves you from perf-tuning your program, especially if you're planning to do something that requires touching terabytes of memory.

If your program does lots of randomly-sized, randomly ordered, small allocations with malloc()/free() (i.e. "efficient" manual memory management), then even with the best memory allocator you'll spend the majority of your CPU time in malloc()/free() calls, you'll fragment your memory, and you'll find yourself with gigabytes of total free memory but unable to allocate a contiguous 1 MB chunk.

Go's GC used to be relatively simple, but as of 1.5 it's very good, will be even better in 1.6, and will probably continue to improve for Go's lifetime because, reading between the lines of the Go commits, the improvements are driven by internal Google needs -- and if there's a company that exercises Go code with 1 TB of memory, that's Google.

Go's GC is probably among the top 5 implementations out there. Maybe Oracle's Java GC is better, and if you really need the absolute best, you can give your money to Azul for their Java implementation.

If Go's GC is not good enough to keep up with your software, then you're unlikely to find a better implementation that will, and it's time to start writing your code in a way that optimizes the thing that is slow. Generating lots of garbage is slow.

I once significantly sped up poppler by adding a simple custom memory allocator, because the way the program was written, it was allocating a lot of (small) strings all over the place.

GC'ed code can also be optimized: use fewer pointers (you can e.g. replace pointers to objects with an index into an array of objects, the array serving as a custom pool allocator).

The good thing about Go is that it makes those kinds of memory optimizations much easier because, unlike Java, where everything is a reference (i.e. a pointer), Go allows embedding structs by value.

For example, I've sped up a binary-trees benchmark by 4x by implementing node references with integers, not pointers, and allocating nodes in bulk. See http://blog.kowalczyk.info/article/u5o7/Speeding-up-Go-and-C-with-custom-allocators.html
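To make the index-instead-of-pointer idea concrete, here is a rough Go sketch in the spirit of the binary-trees example kjk links to. The type names, pool API, and tree depth are invented for illustration and are not taken from his post: nodes live in one slice and refer to their children by index, so the whole tree is a single allocation with no interior pointers for the GC to trace.

```go
package main

import "fmt"

// node children are indices into the pool slice; -1 means "no child".
type node struct {
	left, right int32
	value       int32
}

// pool is a bulk allocator: all nodes live in one contiguous slice.
type pool struct{ nodes []node }

// alloc appends a node and returns its index.
func (p *pool) alloc(value int32) int32 {
	p.nodes = append(p.nodes, node{left: -1, right: -1, value: value})
	return int32(len(p.nodes) - 1)
}

// build creates a complete binary tree of the given depth and returns the
// root's index. Note that we only store indices, never element pointers,
// so it is safe even if append reallocates the backing array mid-build.
func (p *pool) build(depth, value int32) int32 {
	root := p.alloc(value)
	if depth > 0 {
		l := p.build(depth-1, 2*value-1)
		r := p.build(depth-1, 2*value)
		p.nodes[root].left = l
		p.nodes[root].right = r
	}
	return root
}

// check walks the tree, roughly as the binary-trees benchmark does.
func (p *pool) check(i int32) int32 {
	n := p.nodes[i]
	if n.left < 0 {
		return n.value
	}
	return n.value + p.check(n.left) - p.check(n.right)
}

func main() {
	p := &pool{nodes: make([]node, 0, 1<<21)} // reserve ~2M nodes up front
	root := p.build(20, 1)
	fmt.Println("checksum:", p.check(root))
}
```

Freeing or reusing the tree then costs one operation (drop the pool, or truncate `p.nodes` to length zero), instead of making the collector chase a couple of million individually allocated nodes.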

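kjk's point about embedding by value is also easy to picture with a small sketch (the types and field names below are invented for illustration): the same data can be stored as separate heap objects reachable through pointers, or flattened into one pointer-free struct that the collector never has to look inside.

```go
package main

import "fmt"

type point struct{ x, y float64 }

// pointerHeavy spreads one logical record over three heap objects, so a
// slice of N of them gives the GC 2*N pointers to chase.
type pointerHeavy struct {
	a *point
	b *point
}

// flat embeds both points by value: a slice of N of them is one
// contiguous, pointer-free block that the mark phase can skip entirely.
type flat struct {
	a point
	b point
}

func main() {
	const n = 1000000

	heavy := make([]pointerHeavy, n)
	for i := range heavy {
		heavy[i] = pointerHeavy{a: &point{1, 2}, b: &point{3, 4}}
	}

	flats := make([]flat, n) // one allocation, zero pointers inside
	for i := range flats {
		flats[i] = flat{a: point{1, 2}, b: point{3, 4}}
	}

	fmt.Println(len(heavy), len(flats))
}
```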
