How to use Go channels to solve this problem?

xuanbao · · 459 views
<p>I wrote a scraper that collects all image links from every page of a given subreddit (e.g. wallpapers). A linkcrawler function takes a subreddit URL, uses goquery to find the image links and the next-page link, and then calls itself with the next-page link as its argument. Meanwhile, for every image it finds, it calls <code>go download(img_link)</code>. To limit download throughput, the scrape function needs to know how many images are downloading at the moment, so that it can decide whether to call itself with the next-page link or wait.</p> <p>Source code: <a href="https://github.com/nikhil264/rscraper/blob/master/rscraper.go" rel="nofollow">https://github.com/nikhil264/rscraper/blob/master/rscraper.go</a></p> <hr/>**Comments:**<br/><br/>
epiris: <pre><p>Tons of ways to go about this, but I personally would use <a href="https://godoc.org/golang.org/x/sync/errgroup#ex-Group--Parallel" rel="nofollow">errgroup</a> to create an N-sized worker list and a single goroutine to feed them URLs, pages, or however you divide your work, then close the channel it sends on once it&#39;s done. Have your main collect from that channel via a range loop.</p></pre>
adamtanner: <pre><p>Sounds like you want a semaphore. There are several implementations on GitHub, and a trivial channel-based implementation is given as an example in Effective Go.</p> <p><a href="https://golang.org/doc/effective_go.html#channels" rel="nofollow">https://golang.org/doc/effective_go.html#channels</a></p></pre>
bru7us: <pre><p>You can create a buffered channel to act as a &#34;token bucket&#34;. Create the channel and &#34;fill&#34; it before scraping; pass the channel to each download() goroutine and start the func with a blocking read on the channel. If a token is available, the read will unblock and the download can continue. Defer putting a token back into the channel, so the next download can commence on completion or failure of this one.</p></pre>
fancy_pantser: <pre><p>You can emulate this behavior without a channel or tokens by just using a shared counter.</p></pre>
bru7us: <pre><p>The issue with a shared counter is that you have to manage it, and it&#39;s not efficient to wait for the count to drop: you have to busy-poll the counter in a loop until it drops low enough for your thread to start, chewing up CPU cycles that could be used by the other download workers.</p> <p>With the blocking nature of channels, you turn this:</p> <pre><code>for count &gt;= limit {
}
count++ // risky race condition here
...
// do work
count--
</code></pre> <p>into this:</p> <pre><code>&lt;-tokens
defer func() { tokens &lt;- struct{}{} }()
// do work
</code></pre> <p>No busy polling on the CPU, and no race.</p></pre>
robe_and_wizard_hat: <pre><p>Yeah, agreed. Channel semantics are a great way to avoid busy polling for a contended resource.</p></pre>
fancy_pantser: <pre><p>Just for edification, there are some reasonable alternatives available.</p> <p>While it doesn&#39;t come with semaphores, Go does have atomic counters in <code>sync/atomic</code>, and <a href="https://golang.org/pkg/sync/#Cond" rel="nofollow">Cond</a> can be used with Wait/Broadcast and related functions. You can also use <code>Pool</code> in a similar way to your channel example.</p> <p>Of course, because they&#39;re so handy, Go&#39;s <code>runtime</code> package does come with <a href="https://golang.org/src/runtime/sema.go" rel="nofollow">a semaphore implementation</a>, but you shouldn&#39;t use that directly. There&#39;s also <code>x/sync/semaphore</code> (<a href="https://godoc.gopheracademy.com/golang.org/x/sync/semaphore" rel="nofollow">docs</a>).</p></pre>
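For anyone who wants to try the "token bucket" idea from the comments, here is a minimal, self-contained sketch using only the standard library. It is not the code from the linked repository: `download` only sleeps in place of a real HTTP fetch, and `runDownloads`, `maxConcurrent`, and the example.com links are hypothetical names made up for illustration. The program also tracks the peak number of simultaneous downloads so the limit can be checked.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// maxConcurrent is a hypothetical cap on simultaneous downloads.
const maxConcurrent = 3

// download is a stand-in for the real image download; it only sleeps.
func download(link string) {
	time.Sleep(10 * time.Millisecond)
}

// runDownloads starts one goroutine per link, but at most limit of them
// run download at the same time. It returns the peak concurrency observed.
func runDownloads(links []string, limit int) int32 {
	// Buffered channel used as a token bucket, filled before any work starts.
	tokens := make(chan struct{}, limit)
	for i := 0; i < limit; i++ {
		tokens <- struct{}{}
	}

	var (
		wg     sync.WaitGroup
		active int32 // downloads running right now
		peak   int32 // highest value active ever reached
	)
	for _, link := range links {
		wg.Add(1)
		go func(link string) {
			defer wg.Done()

			<-tokens // blocks (without busy polling) until a token is free
			// Return the token whether the download succeeds or fails.
			defer func() { tokens <- struct{}{} }()

			n := atomic.AddInt32(&active, 1)
			for { // record a new peak if this goroutine raised it
				p := atomic.LoadInt32(&peak)
				if n <= p || atomic.CompareAndSwapInt32(&peak, p, n) {
					break
				}
			}
			download(link)
			atomic.AddInt32(&active, -1)
		}(link)
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	links := make([]string, 10)
	for i := range links {
		links[i] = fmt.Sprintf("https://example.com/img%d.jpg", i)
	}
	fmt.Println("peak concurrent downloads:", runDownloads(links, maxConcurrent))
}
```

In this sketch the token is taken inside the goroutine, so the crawler itself never blocks and goroutines simply queue up on the channel. If the crawler should pause before moving on to the next page, one option is to receive from `tokens` in the crawler just before the `go` statement and keep only the deferred send inside the goroutine.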
