i/o timeout (dial udp) on a crawler

I have been using PuerkitoBio's fetchbot to crawl some pages and receive a bunch of:

```
[ERR] HEAD http://bbc.com - Head http://www.bbc.com/: dial tcp: lookup www.bbc.com on 127.0.1.1:53: read udp 127.0.0.1:53755->127.0.1.1:53: i/o timeout
```

I opened an issue, but I'm not sure how active the project still is: https://github.com/PuerkitoBio/fetchbot/issues/23

I also noticed a closed issue, https://github.com/golang/go/issues/16865, and wanted to know whether this is being fixed, or whether someone smarter than me can enlighten me.

I've tried several different versions of Go (1.7, 1.7.1, 1.7.5 and 1.8 linux/amd64). I am running Ubuntu 16.04 (which I upgraded from 14.04; I got the same errors on both).

EDIT: The answer seems to be that my router was using Google's DNS servers. I removed that setting and now everything seems to be working fine.

---

**Comments:**

**adrian_blx:**

Your DNS (cache) seems to have issues.

**userofmostinterest:**

I have been messing around with some of the HTTP client's settings:

```go
f.HttpClient = &http.Client{
	Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		Dial: (&nett.Dialer{
			Timeout:  30 * time.Second,
			Resolver: &nett.CacheResolver{TTL: 10 * time.Minute},
		}).Dial,
		DisableKeepAlives: true,
	},
	Timeout: 40 * time.Second,
}
```

So I am caching DNS lookups for 10 minutes and setting some timeouts. Is there anything else I can try?

**userofmostinterest:**

The full gist can be found here: https://gist.github.com/kristen1980/9d689b6ae0ab9f8a330c4598060295e4

**userofmostinterest:**

Thanks! The DNS was the issue. It turns out my router was using Google's DNS servers. I removed that and can now crawl in peace. Thanks for pointing me in the right direction!

**Yojihito:**

What do you want to fetch from the BBC site?

**userofmostinterest:**

It isn't just the BBC site that fails. I get multiple failures.

**userofmostinterest:**

I get several thousand of these errors quickly, all for different domains and URLs. I just posted one example and didn't want to paste a repetitive-looking log.
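Since the crawler produces several thousand lookups in a short burst, the local stub resolver on 127.0.1.1 (typically dnsmasq on Ubuntu) can itself become a bottleneck regardless of the upstream server. Below is a minimal sketch of capping in-flight HEAD requests with a buffered-channel semaphore; the concurrency limit of 20, the sample URLs, and the timeouts are assumptions for illustration, not values from the thread.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

func main() {
	// Hypothetical sample URLs; in practice the crawl frontier comes from fetchbot.
	urls := []string{
		"http://www.bbc.com/",
		"http://example.com/",
	}

	client := &http.Client{Timeout: 40 * time.Second}

	// A buffered channel used as a counting semaphore: at most 20 HEAD
	// requests (and therefore roughly 20 DNS lookups) are in flight at once.
	sem := make(chan struct{}, 20)
	var wg sync.WaitGroup

	for _, u := range urls {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot before starting the request
		go func(u string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when done

			resp, err := client.Head(u)
			if err != nil {
				fmt.Println("[ERR]", u, err)
				return
			}
			resp.Body.Close()
			fmt.Println("[OK]", u, resp.Status)
		}(u)
	}
	wg.Wait()
}
```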
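The linked golang/go issue concerns the standard library's resolver. As a point of reference, here is a minimal sketch of an http.Client that forces the pure-Go DNS resolver and sets explicit dial timeouts. It assumes Go 1.8 or later (the net.Dialer.Resolver field does not exist in earlier releases), the timeout values and test URL are illustrative guesses, and it will not fix a misbehaving upstream DNS server; it only takes the system (cgo) resolver out of the picture.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

func main() {
	// Client that resolves names with Go's built-in DNS resolver and uses
	// explicit dial timeouts. Requires Go 1.8+ for net.Dialer.Resolver.
	client := &http.Client{
		Transport: &http.Transport{
			DialContext: (&net.Dialer{
				Timeout:  30 * time.Second,              // covers DNS lookup + TCP connect
				Resolver: &net.Resolver{PreferGo: true}, // skip the cgo/system resolver
			}).DialContext,
		},
		Timeout: 40 * time.Second, // overall per-request deadline
	}

	resp, err := client.Head("http://www.bbc.com/")
	if err != nil {
		fmt.Println("[ERR]", err)
		return
	}
	resp.Body.Close()
	fmt.Println("[OK]", resp.Status)
}
```

A client built this way could be assigned to fetchbot's `f.HttpClient` field in the same way as the snippet quoted in the comments above.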
