<p>I love channels, but I have to avoid them. Am I doing something wrong?</p>
<p>My use case is high-traffic server software. Each request gets a goroutine, does about 3ms of computation/database access, spits out some information, then quits. The part in question is spitting out the information. We have around 3000 requests in the system at any time, each with its own goroutine. If they all spit out an item into a channel with a single consumer, the channel backs up, which is fine. However, it sometimes takes a very long time for that consumer to get scheduled, upwards of several milliseconds. I end up needing a buffer of over 100k, and even then the latency between writing and reading becomes too high for a lot of the time-based information I want to push.</p>
<p>What is the trick here? How do I ensure the consumer goroutine gets scheduled? Is this simply not a good use case for channels? Only about 50 goroutines have work at any given time, the rest are waiting on IO, and we run on 48-core machines, so it surprises me that they don't get any CPU time.</p>
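<p>For concreteness, here is a stripped-down sketch of the shape being described; the names and sizes are illustrative rather than the actual server code:</p>

```go
// Stripped-down sketch of the setup described above: many short-lived
// request goroutines, one large buffered channel, one consumer.
package main

import (
	"sync"
	"time"
)

type metric struct {
	when    time.Time
	elapsed time.Duration
}

func main() {
	out := make(chan metric, 100000) // large buffer, as in the question

	// The single consumer of time-based information.
	go func() {
		for m := range out {
			_ = m // aggregate / forward the information here
		}
	}()

	// Request goroutines: ~3ms of work, one send, then exit.
	var wg sync.WaitGroup
	for i := 0; i < 3000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			start := time.Now()
			time.Sleep(3 * time.Millisecond) // stand-in for computation/DB access
			out <- metric{when: time.Now(), elapsed: time.Since(start)}
		}()
	}
	wg.Wait()
	close(out)
}
```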
<hr/>**Comments:**<br/><br/>drwiggly: <pre><p>For something like this I wrote a consumer-pool-like structure that would scale up and down, creating more consumer goroutines based on how many users were connected. I suppose in your situation you would want to scale based on the number of in-flight requests.</p>
<p>Here is basically the pattern I used with an example dummy app.</p>
<p><a href="http://play.golang.org/p/gEZdu4hDPr">http://play.golang.org/p/gEZdu4hDPr</a></p>
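<p>A rough sketch of that kind of scale-up/scale-down consumer pool follows; it is a guess at the general shape, not the code behind the playground link:</p>

```go
// Guess at the shape of a consumer pool that grows with backlog and
// shrinks when idle; not the code behind the playground link.
package main

import (
	"sync/atomic"
	"time"
)

const idleTimeout = 5 * time.Second

type Pool struct {
	work    chan func()
	workers int64
	max     int64
}

func NewPool(max int) *Pool {
	return &Pool{work: make(chan func(), 1024), max: int64(max)}
}

// Submit queues a job, growing the pool when the backlog catches up
// with the number of workers (the counter checks are approximate).
func (p *Pool) Submit(job func()) {
	if int64(len(p.work)) >= atomic.LoadInt64(&p.workers) &&
		atomic.LoadInt64(&p.workers) < p.max {
		atomic.AddInt64(&p.workers, 1)
		go p.worker()
	}
	p.work <- job
}

// worker drains jobs and exits after sitting idle for a while.
func (p *Pool) worker() {
	defer atomic.AddInt64(&p.workers, -1)
	idle := time.NewTimer(idleTimeout)
	defer idle.Stop()
	for {
		select {
		case job := <-p.work:
			job()
			idle.Reset(idleTimeout)
		case <-idle.C:
			// Only quit if nothing is queued; Submit re-grows the pool later.
			select {
			case job := <-p.work:
				job()
				idle.Reset(idleTimeout)
			default:
				return
			}
		}
	}
}

func main() {
	p := NewPool(48)
	done := make(chan struct{}, 100)
	for i := 0; i < 100; i++ {
		p.Submit(func() { done <- struct{}{} })
	}
	for i := 0; i < 100; i++ {
		<-done
	}
}
```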
<p><em>update</em> - better protection on the counters.</p></pre>jerf: <pre><p>The system in question has 48 cores, a "mere" 3000 requests in flight at a time, and probably at least 16 GB of RAM, conservatively (64 or 128 or more is not out of the question). Screwing around with trying to dynamically scale up and down is probably a complete waste of valuable program complexity vs. just provisioning the system with what it needs to handle its max load, all the time.</p></pre>drwiggly: <pre><p>Yeah, I was using it to avoid running 1:1 into an external queue system.</p>
<p>You'll be at the mercy of the scheduler, which I believe treats channel operations as a good synchronization point. You might try pushing into a list and then signaling another goroutine via a channel; that routine would drain the whole list, avoiding per-item scheduler operations, which might give your drain worker more CPU (a sketch of this batch-drain pattern appears at the end of the thread). You might try go-nuts, but I'm pretty sure there is no way to signal to the scheduler that a particular goroutine needs more time.</p></pre>jerf: <pre><blockquote>
<p>If they all spit out an item into a channel with 1 consumer, the channel backs up, which is fine.. However, it sometimes takes a very long time for that consumer to get scheduled, upwards of several milliseconds.</p>
</blockquote>
<p>So, I mean this politely and informatively, but I don't think there's enough information here to remotely know what's going on; all we can do is spitball some things to check.</p>
<p>So here's another thing to check... are you <em>sure</em> your consumer is dispatching the results as quickly as you think? If your consumer has any variance in the latency of whatever it is performing, it'll back up the whole system while it takes milliseconds to do whatever it is doing. If you can grease that up with debugging you might find the request handling isn't going as smoothly as you think. I don't know what that routine is doing, but if it's sending things on a network or touching disk, it's very easy to get much higher variance than you'd expect. Also recall that if you're using TCP, the <em>remote</em> side can slow you down by slowing the stream down, and writing to a slowed-down TCP stream is a blocking operation. (This is <em>normally</em> a good thing, but might not be if that blocking then blocks a central goroutine in your entire system.)</p>
<p>By Amdahl's Law, if all the parallel routines are draining into one goroutine that is doing "something", you'll be blocked in system performance by the performance of that routine no matter how you parallelize the rest of the system.</p>
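<p>To put rough numbers on that budget, taking the figures in the question at face value: 48 cores doing ~3ms of work per request caps the system at roughly 48 / 0.003s ≈ 16,000 requests per second, so a single consumer that must see every result has a budget of about 1/16,000s ≈ 60µs per item before it becomes the bottleneck; and with only ~50 goroutines runnable at once, the real pressure is well below that.</p>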
<p>Second thing to check, since I don't know what the consumer is doing... pardon if I'm repeating something you know, but you do know that running multiple consumers on a channel is just as easy as running multiple producers? On a 48-core machine it is theoretically possible to blow out what one channel is capable of doing, but probably not with 3ms of processing time behind each request (Amdahl's law again: 3ms of work per item limits the rate at which you could be pushing things through to probably within what a single channel could handle even if all the cores were producing and consuming on it, though I wouldn't promise that's true, as it's a bit close to the line; and this only matters if you really are going <em>full blast</em>, which it sounds like you aren't anywhere near). If there's any way you can spawn a couple dozen consumers, it's worth a try.</p>
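<p>A minimal sketch of that fan-out, assuming the per-item work is self-contained (the worker count and payload type are made up):</p>

```go
// Several consumers draining one channel; fan-out is as easy as fan-in.
package main

import "sync"

func consume(results <-chan int, handle func(int)) {
	const consumers = 24 // illustrative; tune to the machine

	var wg sync.WaitGroup
	for i := 0; i < consumers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range results {
				handle(r) // whatever the single consumer used to do
			}
		}()
	}
	wg.Wait() // returns once results is closed and fully drained
}

func main() {
	results := make(chan int, 1024)
	go func() {
		for i := 0; i < 100; i++ {
			results <- i
		}
		close(results)
	}()
	consume(results, func(int) { /* aggregate or forward here */ })
}
```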
<p>Channels are actually a bit runtime-expensive compared to simple mutexes because they are in fact full-on multi-producer, multi-consumer queues. You're paying for that capability regardless; you might as well take advantage of it.</p></pre>danredux: <pre><p>Some interesting food for thought...</p>
<p>One of the consumers in question did nothing more than keep a running average, max, min, and sum of the times being put into the channel. Everything would put a time into that channel.</p>
<p>Most of the time, the channel would have a few hundred elements in it... But every now and then, possibly due to GC or bad scheduling or something, the consumer would get no time and the channel would back up to thousands. Ideally it would be totally unbuffered, but that seems even worse.</p>
<p>If a consumer did <em>nothing</em> but read from an unbuffered channel, how long might you be blocking trying to push onto that channel? I will have to mock up this test to explain my problem better. Even with thousands of producers you would expect none of them to be blocking very long... </p></pre>jerf: <pre><blockquote>
<p>One of the consumers in question did nothing more than keep a running average, max, min, and sum of the times being put into the channel.</p>
</blockquote>
<p>In that <em>particular</em> case, the channel overhead is almost certainly dominating the overhead of doing the math in question. You might be better off just wrapping a lock around the relevant data. Then you'd pretty much only have cache coherency to worry about. It may be enough to carry the day.</p>
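<p>For that specific consumer, something along these lines (the type and field names are made up) would likely be cheaper than a channel round-trip per sample:</p>

```go
// Running min/max/sum/count guarded by a mutex instead of a channel
// plus a dedicated consumer goroutine.
package main

import (
	"fmt"
	"sync"
	"time"
)

type timingStats struct {
	mu    sync.Mutex
	count int64
	sum   time.Duration
	min   time.Duration
	max   time.Duration
}

// Record is called directly from each request goroutine.
func (s *timingStats) Record(d time.Duration) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.count == 0 || d < s.min {
		s.min = d
	}
	if d > s.max {
		s.max = d
	}
	s.sum += d
	s.count++
}

// Snapshot reports the aggregates; the average is sum/count.
func (s *timingStats) Snapshot() (count int64, sum, min, max, avg time.Duration) {
	s.mu.Lock()
	defer s.mu.Unlock()
	count, sum, min, max = s.count, s.sum, s.min, s.max
	if count > 0 {
		avg = sum / time.Duration(count)
	}
	return
}

func main() {
	var s timingStats
	s.Record(3 * time.Millisecond)
	s.Record(5 * time.Millisecond)
	fmt.Println(s.Snapshot())
}
```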
<p>Go, perhaps frustratingly, does lack some tools that make this level of performance easier to obtain in some other specialized environments, such as the ability to ask "What CPU am I currently on?" and use that to key into a set of per-CPU statistic structs, to be collected and merged together later by some process that only does so when you care about the results.</p>
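<p>The usual workaround, absent a CPU-id primitive, is to shard the hot state and merge on read; a rough sketch (the shard count and the way a shard is picked are arbitrary here, so this spreads contention rather than giving true per-CPU data):</p>

```go
// Sharded stats: spread lock contention across several slots and merge
// only when someone asks for the totals. The shard index is just a
// round-robin counter, not a real CPU id.
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

const numShards = 64 // arbitrary; more shards, less contention

type shard struct {
	mu    sync.Mutex
	count int64
	sum   time.Duration
	_     [40]byte // crude padding to limit false sharing between shards
}

type shardedStats struct {
	next   uint64
	shards [numShards]shard
}

// Record touches one shard, chosen round-robin.
func (s *shardedStats) Record(d time.Duration) {
	sh := &s.shards[atomic.AddUint64(&s.next, 1)%numShards]
	sh.mu.Lock()
	sh.count++
	sh.sum += d
	sh.mu.Unlock()
}

// Merge walks every shard when the aggregate is actually wanted.
func (s *shardedStats) Merge() (count int64, sum time.Duration) {
	for i := range s.shards {
		sh := &s.shards[i]
		sh.mu.Lock()
		count += sh.count
		sum += sh.sum
		sh.mu.Unlock()
	}
	return
}

func main() {
	var s shardedStats
	for i := 0; i < 1000; i++ {
		s.Record(3 * time.Millisecond)
	}
	fmt.Println(s.Merge())
}
```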
<p>You could also potentially see a win by simulating a channel with a buffer and a sync.Cond, and making it so that when the consumer gets the Cond's lock, it entirely empties the buffer (see the sketch at the end of the thread).</p></pre>nuunien: <pre><p>Perhaps you've not set <a href="https://golang.org/pkg/runtime/#GOMAXPROCS">GOMAXPROCS</a>?</p></pre>danredux: <pre><p>I have set GOMAXPROCS to the number of cores on the machine.</p></pre>danredux: <pre><p>One sub-question to this: if I have an unbuffered channel and a producer writes to it and blocks, why doesn't the consumer that's waiting to read get priority scheduling? I would expect that with 1 consumer and multiple producers all going full-speed-ahead, the scheduler would try its hardest to give that consumer dedicated CPU time, and yet it doesn't. With enough producers, the consumer can be waiting several milliseconds to get a chance to read.</p></pre>jmoiron: <pre><p>There's the throughput/latency tradeoff, I suppose. I keep channels off of really hot paths (stuff that needs to be sub-usec), but for batches of those they're worth the price of admission. I haven't had any trouble yet with the scheduler, but I can't say I've actually <em>looked</em> for any trouble there. I suppose there's the old "use the source" nut, but you may have better luck asking this kind of question on the mailing list.</p>
<p>We tend to run a modest number of goroutines, and our messages contain many individual units by which we measure our throughput; basically, things are batched in hugely varying payloads, and work at the payload level is meaningless.</p></pre>jbuberel: <pre><p>I think you'd get much better feedback if you posted a working code sample or link to a repo/gist.</p></pre>
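<p>For reference, here is a minimal sketch of the batch-drain idea mentioned by drwiggly and jerf above: producers append to a slice under a lock, and the single consumer swaps the whole slice out in one operation when woken by a sync.Cond. The names and the lack of shutdown handling are for brevity only.</p>

```go
// Batch drain: producers append under a lock, the consumer swaps out the
// whole slice at once, so each wakeup handles many items.
package main

import (
	"fmt"
	"sync"
	"time"
)

type batchQueue struct {
	mu    sync.Mutex
	cond  *sync.Cond
	items []time.Duration
}

func newBatchQueue() *batchQueue {
	q := &batchQueue{}
	q.cond = sync.NewCond(&q.mu)
	return q
}

// Push is called by the many producer goroutines.
func (q *batchQueue) Push(d time.Duration) {
	q.mu.Lock()
	q.items = append(q.items, d)
	q.mu.Unlock()
	q.cond.Signal()
}

// Drain blocks until there is something queued, then takes all of it.
func (q *batchQueue) Drain() []time.Duration {
	q.mu.Lock()
	for len(q.items) == 0 {
		q.cond.Wait()
	}
	batch := q.items
	q.items = nil // hand the whole backlog to the consumer in one go
	q.mu.Unlock()
	return batch
}

func main() {
	q := newBatchQueue()
	for i := 0; i < 100; i++ {
		go q.Push(time.Duration(i) * time.Millisecond)
	}
	// Single consumer processes whole batches per wakeup.
	seen := 0
	for seen < 100 {
		batch := q.Drain()
		seen += len(batch)
		fmt.Println("drained", len(batch), "items")
	}
}
```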