Scaling options for a Go service on a 64-core machine

xuanbao · 503 views
This resource was shared some time ago; the information in it may have since developed or changed.
<p>So we wrote a Go service which does 30k+ QPS using ZMQ (via cgo calls) on a 16-core EC2 instance.</p> <p>We now want to put it on a private cloud with a 56-core VM. We want to scale up, and there are 2 ways we can do it.</p> <ol> <li>Launch 4 instances of the same service as-is, with config changes.</li> <li>Launch 4 instances of the same service with the following option. <ul> <li>GOMAXPROCS=16 ./run_my_unoptimized_code</li> </ul></li> </ol> <p>We can perhaps pin each service to 16 cores via taskset.</p> <p>Which one would be the wiser choice? Does anyone have experience with scenarios like this?</p> <p>FWIW, the service uses 3 sets of channels for a pipeline pattern and around 50 goroutines, if it matters.</p> <p>Sorry, the code is owned by my employer so I cannot share it :(</p> <hr/>**Comments:**<br/><br/>kostix: <pre><p>Two points:</p> <ul> <li>The result of pinning your instances with <code>taskset</code> is <em>almost</em> the same as specifying <code>GOMAXPROCS</code>: the instance would see exactly the number of cores it was pinned to, and that would make it create that many <code>P</code>s ("processes"—thingies used to run goroutines on OS threads). Of course, the remaining difference is that in the case of using <code>taskset</code>, the OS won't schedule the instance's threads on the remaining cores.</li> <li>The scheduler in the Go runtime is itself concurrent: that is, it is not implemented as something monolithic protected by a global lock—quite the contrary: as much as can be done concurrently is done concurrently.
Specifically, each <code>P</code> has its own run queue of goroutines, and different <code>P</code>s are able to steal goroutines from the run queues of other <code>P</code>s without touching the global scheduler state.</li> </ul> <p>That said, another point to consider is that goroutines are not <strong>fully</strong> preemptible (that's actually a good thing, but read on): this means that long runs of Go code which do not call any functions could effectively "pin" a goroutine to its underlying <code>P</code> (and hence to its underlying <code>M</code>—the "machine", an OS thread), preventing fair distribution of CPU quanta across goroutines. In such cases, the more <code>P</code>s the scheduler has, the better, but such cases are pathological anyway. You could try to see whether you have such a case by inspecting a so-called "scheduler trace" captured over a run under a typical workload—see <a href="https://software.intel.com/en-us/blogs/2014/05/10/debugging-performance-issues-in-go-programs" rel="nofollow">this</a>.</p></pre>fakeNAcsgoPlayer: <pre><p>Thanks, we decided to move ahead with the default options and let the OS choose what is best for the 4 processes.</p></pre>robe_and_wizard_hat: <pre><p>Benchmark both scenarios and see which one performs better.</p></pre>tmornini: <pre><p>This.</p></pre>fakeNAcsgoPlayer: <pre><p>Don't have this luxury as we do not own the cloud, hence the question here. :)</p></pre>tuxlinuxien: <pre><p>Since you have more cores, why don't you let your process use all the cores?</p></pre>fakeNAcsgoPlayer: <pre><p>I gladly would; I am curious what the strategy should be. Let the OS handle each instance, or restrict each instance to a fixed number of cores?</p> <p>I am leaning towards not touching anything and seeing how it goes.
I am assuming that even though the runtimes of the instances cannot communicate with each other, the OS would fairly allocate resources for all 4 of them.</p></pre>tuxlinuxien: <pre><p>Well, if you only want to limit the number of cores per process, it will require you to set up CPU scheduling. Let's imagine you have 40 cores and you want to launch 4 processes:</p> <ul> <li>P1 => cores 0-9</li> <li>P2 => cores 10-19</li> <li>P3 => cores 20-29</li> <li>P4 => cores 30-39</li> </ul> <p>As you can see, the processes will not share core resources (the system will still consume some processing power), but it's a bit harder to set up.</p> <p>From my point of view, I would just let one process use all the cores; it's easier to set up. If you still want to go your way, then check out <a href="https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch01.html" rel="nofollow">cgroups</a>.</p></pre>fakeNAcsgoPlayer: <pre><p>Looks like that is what I am going to do.</p></pre>lexpi: <pre><p>This may be a stupid question, but since it's on the same box, is there any reason to have 4 instances? Can't you just have 1 larger one?</p></pre>fakeNAcsgoPlayer: <pre><p>Well, like I mentioned in the OP, the code uses cgo, so it won't scale as you are expecting.</p> <p>Also, redundancy is nice to have. Plus, scaling via the process model is not only simple, it is easy to reason about.</p></pre>lexpi: <pre><p>I understand the redundancy reason, but what is it about cgo calls that makes scaling up a single process difficult? Not arguing, just genuinely curious.</p></pre>tmornini: <pre><p>With that many goroutines, and assuming it maxes out all cores, simply running it on a larger machine may well result in higher throughput.</p> <p>If not, I'd wrap the entire pipeline structure and launch as many pipelines (the current set of goroutines) as you desire...</p></pre>
