Can Go's GC achieve zero latency?

This may sound stupid, but after I read a blog post about Go's concurrent GC I was wondering: given that you have enough cores for the GC to run without interfering with the main program, can it achieve zero latency? What would make it stop the world? A use case is an audio/video transcoding program which needs predictable performance (zero latency) when run on given hardware.

Edit: it seems video decoding was not a good example, but there are plenty of applications that require real-time / predictable performance, so the question stands.

---

**Comments:**

**zippoxer:**

Video codecs usually allocate ahead of time and don't generate much garbage. They generally know how large the buffers they need are for a given input resolution and settings, and they reuse the buffers allocated for the first frame to process the following frames.

**brokenprogram:**

OK, maybe video decoding was not a good example, but there are plenty of applications that require real-time / predictable performance.

**TheMerovius:**

IIUC you are asking about the current implementation of the GC, and in that case the answer is no. You might be able to get the latency very low, but not zero, as the algorithm still depends on a stop-the-world (STW) phase. This *might* change in the future, but AFAIK it isn't currently planned. Go is an engineering project, and as such it makes a tradeoff between latency and throughput; it will likely get to "good enough" for both and not go much further.

I also don't really understand why you'd need zero latency for an audio/video transcoder. Surely bounded latency together with good overall throughput of the transcoder is enough?

**Faffenheim:**

aclemens (who works on the GC with Richard Hudson) answered the following question:

> Is pause-less GC a long-term plan?

in the AMA a few months ago. [Link to the answer](https://www.reddit.com/r/golang/comments/46bd5h/ama_we_are_the_go_contributors_ask_us_anything/d040ow6?context=3). tl;dr: no plans at all, not even long term.

**brokenprogram:**

That answers the question. I think the best of both worlds would be to be able to set the compiler (or runtime) to a specific latency/throughput ratio, so that you get either more throughput with higher latency or the other way around, based on your requirements.

**_ak:**

> A use case is an audio/video transcoding program which needs predictable performance (zero latency)

Predictable performance != zero latency. How many fps do you need to encode, and how many can you practically encode? What is the GC's guarantee in terms of latency? Can you fit your encoding time plus the GC latency into your encoding timeframe? Those are the kinds of questions you need to ask.

FWIW, I'm using Go for a soft-realtime application at work, and it's totally usable if you know your requirements and time constraints.

**Fwippy:**

Exactly: if you're transcoding at 60 fps, that's ~17 ms per frame. Given that Go's GC pause time is often around 3-5 ms, there's still plenty of room in which to do your work without jitter.

Even if that tiny bump puts you outside your frame-time requirements, you can always make up for it by putting a small buffer in front of the output; in reality, video de/encoding is heavily variable in time-per-frame, so you're going to want a bit of a buffer anyway. Unless you've got more than enough headroom, in which case GC isn't going to be relevant anyway.
**FUZxxl:**

Maybe. That would require engineering a way to scan stacks while the corresponding goroutines are running, which is a little difficult.

**WellAdjustedOutlaw:**

Removing the STW phase doesn't make GC zero-latency, because any execution thread that needs to GC is going to pause. Trying to move that work to a non-local core will incur a significant performance penalty.

**DoctrinalAsshole:**

You seem to assume that memory management can be zero-cost. It was never free (for anyone). The cost manifests itself in runtime, abstraction, complexity, developer overhead, debugging, or other bookkeeping. Somebody always pays.

You could attempt a zero-allocation design in Go, but you'd be betting that the compiler, runtime, and memory manager never evolve. That is fragile. What is the cost of going overboard with reading the tea leaves of the escape analyzer?

**Entropy:**

Preallocation and reuse are the ways to not generate garbage. What bet are you making by not generating garbage to begin with?

**jringstad:**

Memory management cost can be zero (or basically zero and bounded) if you only allocate once (which is feasible for something like video decoders), or if you use your own simple allocator (e.g. a stack-based allocator, a linear allocator, ...) where you just increment a number to allocate and decrement it to deallocate. You can find libraries for these, and if they fit your problem, they don't really add any extra developer overhead, I would say.

Not sure what you mean by "never evolve". If you write your Go program such that it never allocates (or only allocates once), you will always have optimal performance, short of some big, strange performance regression in the Go runtime. Writing it the "naive" way might result in code that gets faster over time as the Go runtime improves, but I don't think it will ever beat the allocate-once implementation. If you need a piece of code to really fly, I think it's a legitimate strategy.\* Fitting an algorithm into an "allocate once" strategy doesn't necessarily make its design more complicated or convoluted either, although that can certainly be the case as well.

\* It's a judgement call, of course, but then compiled Go code is already pretty fast, so it shouldn't be a judgement call you need to make often.
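*Editor's note:* the linear (bump) allocator jringstad mentions can be very small in Go. The sketch below is illustrative only; the `Arena` type and its methods are invented for this example, not a standard library API. Allocation is an offset bump into a slab allocated once up front, and everything is freed at once by resetting the offset at a convenient boundary (e.g. per frame).

```go
package main

import "fmt"

// Arena is a toy linear (bump) allocator: one upfront allocation,
// then sub-slices are handed out by advancing an offset.
type Arena struct {
	buf []byte
	off int
}

func NewArena(size int) *Arena {
	return &Arena{buf: make([]byte, size)}
}

// Alloc returns n bytes from the arena, or nil if it is exhausted.
func (a *Arena) Alloc(n int) []byte {
	if a.off+n > len(a.buf) {
		return nil
	}
	p := a.buf[a.off : a.off+n : a.off+n]
	a.off += n
	return p
}

// Reset frees everything at once by rewinding the offset.
func (a *Arena) Reset() { a.off = 0 }

func main() {
	a := NewArena(1 << 20) // 1 MiB slab, allocated once

	for frame := 0; frame < 3; frame++ {
		scratch := a.Alloc(64 << 10) // per-frame scratch space
		// ... do per-frame work with scratch ...
		fmt.Printf("frame %d: got %d bytes, arena offset %d\n",
			frame, len(scratch), a.off)
		a.Reset() // all per-frame allocations released in O(1)
	}
}
```

Because the backing slab stays live, contains no pointers, and is never replaced, it creates no garbage in steady state and adds essentially nothing to GC work, which is the point of the allocate-once approach described above.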
**jringstad:**

It's somewhat meaningless to ask that question without a specific use case. In general, for a complex system written in idiomatic Go: no, absolutely not. In specific cases like video decoding, the answer can surely be yes. You will probably need to bend over backwards in a few ways, like not using queues (presumably you'll have to use mutexes instead), et cetera, but it'll likely be possible to do video decoding with minimal or zero garbage created.

Video *en*coding would be significantly more challenging, and it'll depend on the codec and how you do it. I think it would still be doable, but a Go implementation would likely not be very competitive with a C (or even GPU-based) variant for various other reasons, like the lack of SIMD, array access speeds, and probably slightly reduced performance on TLP primitives like mutexes. Again, some of these can be circumvented with effort, but I haven't seen e.g. a usable solution for SIMD in Go yet, short of writing assembly. That kind of thing hits building blocks like UMHexagonS, Hadamard transforms, and pixel interpolation used in codecs pretty hard in terms of speed. So even if you can get the GC latency to be insignificant here, you'd probably not care to do this over just using a C library (or a hardware interface like OpenMAX or whatever).

**[deleted]:**

[deleted]

**brokenprogram:**

I thought the Go concurrent GC could give you this option. I think "productivity" is the main reason to use a GC language instead of a non-GC language, unless I'm missing something...
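*Editor's note:* on the earlier wish for a configurable latency/throughput ratio: Go does not expose a pause-time target, but the GOGC setting (also adjustable at runtime via `runtime/debug.SetGCPercent`) is the closest existing knob. It controls how much the heap may grow between collections, trading memory and collection frequency for throughput, rather than bounding pause times directly. A minimal sketch, with a purely illustrative allocation loop and `sink` variable:

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// sink is package-level so the buffers below escape to the heap
// and become garbage each iteration.
var sink []byte

func main() {
	// GOGC=100 is the default: collect when the heap has grown 100%
	// over the live set from the previous cycle. A larger value means
	// fewer collections (more throughput, more memory); a smaller one
	// means more frequent collections.
	prev := debug.SetGCPercent(300) // returns the previous setting
	fmt.Printf("GOGC changed from %d to 300\n", prev)

	// Illustrative allocation churn so the setting has something to act on.
	for i := 0; i < 10000; i++ {
		sink = make([]byte, 32<<10) // previous buffer becomes garbage
	}
	_ = sink

	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	fmt.Printf("completed GC cycles: %d, total STW pause: %d ns\n",
		ms.NumGC, ms.PauseTotalNs)
}
```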
