Is there a good way to tell the optimal number of HTTP requests to make simultaneously?

blov
I've written a couple of web-crawler-type programs in Go, and whenever I do, I run into the question of how many worker goroutines I should spawn. The number has to be limited to keep from exhausting file descriptors, but what is a good limit? I usually just spawn 10 or 20 because that seems to give decent performance, but is there a good way of knowing beyond benchmarking (which can be unreliable, since you're connecting to an external service anyway)? I suppose the same question applies to reading from and writing to disk, but there I feel more confident about my ability to do a reliable benchmark.

---

**Comments:**

**tmornini:** The optimal mix is a calculus of:

- How many can the other end handle?
- How many can you fit in RAM?
- How many consume all CPU capacity?
- How many consume all network bandwidth?

The answer is as many as possible until one of those limits is reached.

**comradeswitch:** I typically have no problem firing off hundreds or thousands of simultaneous requests, but I end up being limited by my network speed or by whatever processing I do after receiving the response.

It's more important to avoid hammering one domain over and over and getting rate-limited or, worse, banned.

I've taken several approaches to spacing requests out. sync.Pool is always a good choice. If you want to be very careful about your rate of requests to one or more particular domains, a token bucket and an HTTP client per domain is the way to go, IMO. There are plenty of implementations out there, but it's easy enough to hack one together yourself in a few minutes.

I haven't used fasthttp yet or looked into its details very closely, but I've heard really good things, and it has a bunch of constructs designed to better manage resources under concurrency.

The short answer is no, unfortunately. I was trying to benchmark the number of concurrent requests a few days ago, and I concluded that, compared to the volatility of network speeds, servers out of your control, I/O, etc., the effect of varying the number of requests was so small that benchmarking it would take longer than just picking something sane and letting it run.

**dewey4iv:** I've done similar work with Go, and my experience is that the network is almost ALWAYS the limiting factor *unless* you aren't reaching across the internet.

Figure out how much bandwidth you realistically have available and limit the number of requests based on that number.

If you make separate HEAD requests and do a little simple math, you can even push things a little harder and try to make sure the in-flight requests don't exceed your available bandwidth. Obviously this isn't super precise, but it has worked out pretty well for me.

**onlywheels:** Just to add to this: is there a recommended way to have a pool of goroutines each connect through a separate proxy to make their own requests? I've experimented with this in the past, but found that when one goroutine changed the proxy IP, the change applied to the others currently running. Ideally I'd have, say, 100 goroutines running, each with its own proxy. Sometimes these proxies go down, so if a goroutine couldn't complete a job, it would pass it back to the pool for another to pick up.

**nhooyr:** Why not just use the number of cores? https://golang.org/pkg/runtime/#GOMAXPROCS

**tmornini:** That number is going to be far less than optimal because of the network latency between the requester and the responder.
