<p>Code: <a href="https://github.com/Marmeladenbrot/Crawler/tree/master/src/crawler" rel="nofollow">https://github.com/Marmeladenbrot/Crawler/tree/master/src/crawler</a></p>
<p>Site: <a href="http://www.baldur-garten.de/" rel="nofollow">http://www.baldur-garten.de/</a></p>
<p>I'm running into an odd error on this specific site: the crawler crashes with the error shown in the stack trace below, but it works perfectly on other sites.</p>
<p>I tried to narrow it down to a specific page, but the number of pages crawled before the crash varies on every run, so I have no idea what went wrong or why.</p>
<p>Does anybody know how I could fix this?</p>
<p>Stack traces: <a href="http://www.file-upload.net/download-11018443/StacktraceLogs.tar.gz.html" rel="nofollow">http://www.file-upload.net/download-11018443/StacktraceLogs.tar.gz.html</a></p>
<pre><code>goroutine 9 [running]:
bufio.(*Reader).Read(0xc82298d6e0, 0xc8228c608d, 0x8, 0x200, 0xc820001e00, 0x0, 0x0)
/usr/local/go/src/bufio/bufio.go:214 +0x3c8
io.ReadAtLeast(0x7f76e4a9db20, 0xc82298d6e0, 0xc8228c608d, 0x8, 0x200, 0x8, 0x0, 0x0, 0x0)
/usr/local/go/src/io/io.go:298 +0xe6
io.ReadFull(0x7f76e4a9db20, 0xc82298d6e0, 0xc8228c608d, 0x8, 0x200, 0x0, 0x0, 0x0)
/usr/local/go/src/io/io.go:316 +0x62
compress/gzip.(*Reader).Read(0xc8228c6000, 0xc82263a01f, 0xfe1, 0xfe1, 0x0, 0x7f76e4a98028, 0xc82000a150)
/usr/local/go/src/compress/gzip/gunzip.go:259 +0x2be
net/http.(*gzipReader).Read(0xc822dd84c0, 0xc82263a01f, 0xfe1, 0xfe1, 0xc82263af60, 0x0, 0x0)
/usr/local/go/src/net/http/transport.go:1346 +0x14a
net/http.(*bodyEOFSignal).Read(0xc8220af080, 0xc82263a01f, 0xfe1, 0xfe1, 0x0, 0x0, 0x0)
/usr/local/go/src/net/http/transport.go:1296 +0x26a
golang.org/x/net/html.readAtLeastOneByte(0x7f76e4a9dc00, 0xc8220af080, 0xc82263a01f, 0xfe1, 0xfe1, 0xc82002ad80, 0x0, 0x0)
/home/m/go/src/golang.org/x/net/html/token.go:299 +0x68
golang.org/x/net/html.(*Tokenizer).readByte(0xc8247f0680, 0xc8247f0664)
/home/m/go/src/golang.org/x/net/html/token.go:273 +0x44e
golang.org/x/net/html.(*Tokenizer).readTagAttrVal(0xc8247f0680)
/home/m/go/src/golang.org/x/net/html/token.go:917 +0x108
golang.org/x/net/html.(*Tokenizer).readTag(0xc8247f0680, 0xb01)
/home/m/go/src/golang.org/x/net/html/token.go:831 +0xb0
golang.org/x/net/html.(*Tokenizer).readStartTag(0xc8247f0680, 0x6c)
/home/m/go/src/golang.org/x/net/html/token.go:779 +0x35
golang.org/x/net/html.(*Tokenizer).Next(0xc8247f0680, 0x81262a)
/home/m/go/src/golang.org/x/net/html/token.go:1021 +0x20a
main.collectLinks(0xc822cfa8c0, 0xd1, 0x7f76e4a9dc00, 0xc8220af080, 0x0, 0x0, 0x0)
/home/m/workspace/Crawler/src/crawler/collectLinks.go:21 +0x1e7
main.Crawl(0xc822cfa8c0, 0xd1, 0x2)
/home/m/workspace/Crawler/src/crawler/crawl.go:55 +0xc0e
main.worker(0x2)
/home/m/workspace/Crawler/src/crawler/main.go:139 +0x1c0
created by main.main
/home/m/workspace/Crawler/src/crawler/main.go:106 +0x7a3
</code></pre>
<hr/>**Comments:**<br/><br/>itsmontoya: <pre><p>Why do you have resp as a global var? If you do any kind of concurrency, this is going to create issues (<a href="https://github.com/Marmeladenbrot/Crawler/blob/master/src/crawler/crawl.go#L9" rel="nofollow">https://github.com/Marmeladenbrot/Crawler/blob/master/src/crawler/crawl.go#L9</a>)</p></pre>Yojihito: <pre><p>As far as I remember, I had problems with re-declaring the variable in the loop without the global var.</p>
<p>If I comment that out, "resp" is undeclared?</p></pre>cfsalguero: <pre><p>After maxRetries unsuccessful requests, there is no body to close or parse.
I think that's the reason it is failing.
You are calling <code>links := collectLinks(link, resp.Body)</code>, but resp.Body is probably nil.</p></pre>Yojihito: <pre><p>But I return from the function, which ends it, if that happens? There shouldn't be any function call after the return?</p>
<pre><code>if i == maxRetries {
    Error.Printf("ERROR \t RESP Connection Error for workerID %d : %s : %s \n", workerID, link, err)
    AddErrCount()
    mutex.Lock()
    visited[link] = true
    mutex.Unlock()
    return
}
</code></pre></pre>IntellectualReserve: <pre><p>What version of Go are you using? I've been letting your crawler run on my machine and I haven't been able to reproduce the issue.</p></pre>mc_hammerd: <pre><p>This is a good place to put a deferred recover (in collectLinks). Better to skip one page than crash the app.</p>
<p>As far as I can tell, it's <code>collectLinks(resp.Body)</code> causing the fault at collectLinks.go:21, because <code>readAtLeastOneByte()</code> is failing. So maybe the io.Reader was empty?</p>
<p>Can you check the header when it crashes? Maybe it was only a 302 Moved or a 401 Unauthorized, or some other status code with no actual resp.Body text. Just a guess though.</p></pre>gohacker: <pre><p>It looks like a bug somewhere in <code>net</code> or <code>net/http</code>: one of the readers down the chain returns <code>n</code> (number of bytes read) greater than the length of the passed slice, and that makes <code>bufio.Read</code> panic. What Go version are you using? I wasn't able to reproduce it with Go 1.5.1.</p></pre>