Ideas needed: Solving thundering heards

xuanbao
Hi

I am writing a service which is suffering from some thundering heard problems.

Here's broadly what it does:

- Receive messages from a queue, and fire up goroutines to deal with each one
- Within the goroutine, read the payload for a collection of IDs
- For each ID, go to a cache layer and see if the corresponding content exists
- For each ID that doesn't exist, go to `client`, which does an HTTP GET to a service to retrieve the content. Assuming all is well, put it in the cache. It's important to note that we sometimes see performance problems where the service takes longer than 5s to respond.
- Do stuff with the content!

This works broadly fine. However, when there is a burst of messages whose IDs are the same, we run into a problem where many goroutines:

- Go to the cache, find nothing
- Fire off the same HTTP request N times

*Ideally* we only want the client to fetch the resource once, which is why we have the cache. It can sometimes be an expensive call and put a lot of load on the downstream service.

It would be *ok* conceptually for things to block waiting for the one in-flight request to finish and get its content.

I figured I need to make the `client` aware of what content it is currently fetching, so that if it is asked for the same content again it doesn't make a second request but just waits for the current one to finish and returns that result.

I hacked something together which held an internal `map[string]bool` of the IDs being fetched and used channels to return the response to the multiple callers, but it felt pretty hacky.

This doesn't seem like an especially unique problem, so I was wondering if there are any tips, design patterns or even libraries to deal with this issue.

Cheers

edit - *Herds* ¬_¬

---

**Comments:**

**Tacticus:** Have you looked at how groupcache prevents thundering herds? IIRC it uses a consistent hash to pick a responsible node for each key, and that is the only host that goes and fetches from the backend. This works well because the hash ring allows hosts to be added and removed, and only a proportional set of keys needs to be rebalanced onto the new host or off the old one.

https://github.com/golang/groupcache

**ligustah:** If you just need it within a single process you can also use https://godoc.org/golang.org/x/sync/singleflight, which is basically what groupcache uses under the hood (AFAIK it was first developed for groupcache and extracted later).

**quiI:** This looks really interesting, thanks
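A minimal sketch of the singleflight approach ligustah mentions, which replaces the hand-rolled `map[string]bool` plus channels from the post; the URL and the `[]byte` payload here are placeholder assumptions, not from the thread:

```go
package content

import (
	"io"
	"net/http"

	"golang.org/x/sync/singleflight"
)

var group singleflight.Group

// getContent returns the content for id, making at most one in-flight
// HTTP request per id; duplicate concurrent callers block and share
// the first caller's result.
func getContent(id string) ([]byte, error) {
	v, err, _ := group.Do(id, func() (interface{}, error) {
		// Placeholder URL: stands in for whatever `client` really calls.
		resp, err := http.Get("https://content-service.example/content/" + id)
		if err != nil {
			return nil, err
		}
		defer resp.Body.Close()
		return io.ReadAll(resp.Body)
	})
	if err != nil {
		return nil, err
	}
	return v.([]byte), nil
}
```

`group.Do` runs the function once per key at a time: the first caller executes it, and any callers that arrive with the same key while it is running block and receive the same result, which is exactly the behaviour the post describes wanting.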
**justinisrael:** I have a service which converts and serves preview-quality images from a cache, after converting them from source. We had the same issue with a thundering herd of requests for the same source image when it was not yet cached. I ended up adding support for addressing this. The first thing I did was update my cache item interface to have something like a Get() or Value() method. Then, when a request comes in for an uncached image, I immediately place a "promise" concrete item into the cache for that key. The promise contains a mutex which is initially locked. It remains locked until the initial goroutine finishes processing the image, updates the promise item with the final value, and unlocks it. Any other goroutines that come in for the same value will look up the cached promise and block when calling Get() until the original goroutine has finished. The refactor was kind of a drop-in replacement for the cache, since consumers don't know the difference between getting the value now or once it is ready.

**cdoxsey:** singleflight is the way to go for something like this.

In general a circuit breaker is also useful for thundering herds: https://martinfowler.com/bliki/CircuitBreaker.html

You can use a leaky bucket on the number of active requests, error rates, or latency, and once it is exceeded, block all new requests for a period of time.

**earthboundkid:** "Herd".

**Creshal:** Slap Varnish in front of the backend; it can handle thundering herds by bundling incoming requests for the same resource, processing all of them with a single request to the backend proper.

**quiI:** I didn't know Varnish could do this! Interesting

**Creshal:** There doesn't seem to be any dedicated documentation for it because it's enabled by default, but it's touched on occasionally:

https://varnish-cache.org/docs/4.0/users-guide/vcl-grace.html

https://info.varnish-software.com/blog/hit-for-pass-varnish-cache
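A rough sketch of the promise-in-the-cache pattern justinisrael describes; the `Cache` type and the `load` callback are hypothetical stand-ins for his image service:

```go
package cache

import "sync"

// promise is a cache entry whose value may not exist yet. Its mutex is
// locked by the goroutine doing the work and only unlocked once the
// value has been filled in.
type promise struct {
	mu    sync.Mutex
	value []byte
	err   error
}

// Get blocks until fulfil has been called, then returns the value.
func (p *promise) Get() ([]byte, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.value, p.err
}

// fulfil stores the result and releases everyone blocked in Get.
func (p *promise) fulfil(value []byte, err error) {
	p.value = value
	p.err = err
	p.mu.Unlock()
}

// Cache maps keys to promises; hypothetical stand-in for the real cache layer.
type Cache struct {
	mu    sync.Mutex
	items map[string]*promise
}

func New() *Cache {
	return &Cache{items: make(map[string]*promise)}
}

// Fetch returns the value for key, running load at most once no matter
// how many goroutines ask for the key concurrently.
func (c *Cache) Fetch(key string, load func() ([]byte, error)) ([]byte, error) {
	c.mu.Lock()
	if p, ok := c.items[key]; ok {
		c.mu.Unlock()
		return p.Get() // blocks until the first caller has finished
	}
	p := &promise{}
	p.mu.Lock() // publish the promise already locked
	c.items[key] = p
	c.mu.Unlock()

	p.fulfil(load()) // slow work happens here, outside the cache lock
	return p.Get()
}
```

One caveat with this sketch: a failed `load` leaves its error cached for the key forever, so real code would likely evict failed promises so a later request can retry.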
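And a bare-bones illustration of the circuit breaker cdoxsey mentions; the consecutive-failure trip rule and cool-down period are invented for the sketch (his leaky bucket over active requests, error rates, or latency is a refinement of the same open/closed state machine):

```go
package breaker

import (
	"errors"
	"sync"
	"time"
)

var ErrOpen = errors.New("circuit breaker open")

// breaker rejects calls outright for a cool-down period after too many
// consecutive failures, shedding load from a struggling downstream.
type breaker struct {
	mu        sync.Mutex
	failures  int       // consecutive failures seen so far
	openUntil time.Time // zero value means the breaker is closed

	maxFailures int           // consecutive failures before opening
	cooldown    time.Duration // how long to stay open
}

// Call runs fn unless the breaker is open, tracking failures.
func (b *breaker) Call(fn func() error) error {
	b.mu.Lock()
	if time.Now().Before(b.openUntil) {
		b.mu.Unlock()
		return ErrOpen // fail fast instead of piling onto the backend
	}
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= b.maxFailures {
			b.openUntil = time.Now().Add(b.cooldown)
			b.failures = 0
		}
		return err
	}
	b.failures = 0
	return nil
}

// usage: b := &breaker{maxFailures: 5, cooldown: 30 * time.Second}
```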
