Golang for crawling and parsing

xuanbao · 385 views
<p>What libraries or frameworks do you know for these tasks?</p> <p>For crawling I plan to use a Selenium cluster, but I can't find a convenient way to parse the HTML. In Python I use BeautifulSoup, and I'm looking for something similar for Go.</p> <p>What do you think about this?</p> <hr/>**Comments:**<br/><br/>
ishanjain28: <pre><p>I use GoQuery for parsing HTML in pretty much all of my scraping jobs, and it has worked really well for me.</p></pre>
monoxiphoid: <pre><p>+1 for GoQuery; the API should be very familiar to you if you've ever done any work with jQuery on the frontend.</p></pre>
dgryski: <pre><p><a href="http://go-colly.org/" rel="nofollow">http://go-colly.org/</a> or <a href="https://github.com/PuerkitoBio/fetchbot" rel="nofollow">https://github.com/PuerkitoBio/fetchbot</a></p></pre>
NilhEx: <pre><p>Thanks. Maybe you know of some alternatives to Selenium WebDriver? I would like to use browser control or emulation (PhantomJS) to better mask the spider and its clicks.</p></pre>
tmlbl: <pre><p>I have used <a href="https://github.com/yhat/scrape" rel="nofollow">https://github.com/yhat/scrape</a> for several scraping projects. But it’s only good for HTML parsing; I'm not aware of anything headless-browser-wise in Go. I always go back to Node.js for those.</p></pre>
Ploobers: <pre><p>This is super well written and works better than Puppeteer #nomorenode :)</p> <p><a href="https://github.com/chromedp/chromedp" rel="nofollow">https://github.com/chromedp/chromedp</a></p></pre>
tural-esger: <pre><p><a href="http://surf.readthedocs.io/" rel="nofollow">http://surf.readthedocs.io/</a> Is this what you're looking for?</p></pre>
NilhEx: <pre><p>Very likely. I'll need to study it.</p></pre>
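None of the replies include code, so here is a rough, dependency-free sketch of what pulling links out of a page can look like in Go. It deliberately swaps in the standard library's `encoding/xml` decoder in non-strict mode rather than GoQuery, colly, or any other library recommended above, so it is not their API, and those libraries are likely to cope with messy real-world HTML far better. The function name `extractLinks` and the sample markup are made up for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// extractLinks (hypothetical helper) walks the token stream of an HTML
// document and returns the href of every <a> element in document order.
func extractLinks(doc string) ([]string, error) {
	d := xml.NewDecoder(strings.NewReader(doc))
	// Relax the XML decoder so it tolerates common HTML habits such as
	// unclosed <br> tags and named entities like &nbsp;.
	d.Strict = false
	d.AutoClose = xml.HTMLAutoClose
	d.Entity = xml.HTMLEntity

	var links []string
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return links, nil
		}
		if err != nil {
			// Return what was collected so far along with the error.
			return links, err
		}
		start, ok := tok.(xml.StartElement)
		if !ok || !strings.EqualFold(start.Name.Local, "a") {
			continue
		}
		for _, attr := range start.Attr {
			if strings.EqualFold(attr.Name.Local, "href") {
				links = append(links, attr.Value)
			}
		}
	}
}

func main() {
	// Sample markup is invented for this example; the URLs come from the thread.
	const page = `<html><body><p>Suggestions:<br>` +
		`<a href="http://go-colly.org/">colly</a> ` +
		`<a rel="nofollow" href="https://github.com/chromedp/chromedp">chromedp</a>` +
		`</p></body></html>`

	links, err := extractLinks(page)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, l := range links {
		fmt.Println(l)
	}
}
```

This only covers static markup. For CSS-selector queries in the BeautifulSoup style, or for pages that need a real browser, the libraries named in the replies are the better starting point.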
