Collaborating on a "BeautifulSoup" for Go?

agolangf · 588 views
<p>Hi <a href="/r/golang">/r/golang</a>!</p> <p>I&#39;ve been learning Go for a couple of months, primarily because I built a Python program to scrape whole zone files for specific sites and found its speed under concurrency severely lacking.</p> <p>One thing I&#39;ve noticed is that HTML parsing is a bit tedious and verbose in Go, but it doesn&#39;t seem like it has to be.</p> <p>Would any of you be interested in collaborating to build a Go package like Python&#39;s BeautifulSoup?</p> <p>I&#39;m thinking we name it UglyStew.</p> <hr/>**Comments:**<br/><br/>
kjk: <pre><p>There&#39;s already <a href="https://github.com/PuerkitoBio/goquery">https://github.com/PuerkitoBio/goquery</a></p></pre>
jerf: <pre><p>I&#39;m on the phone so I can&#39;t look it up, but there&#39;s also an implementation of the HTML5 parser, which is really the definitive answer to that problem now.</p> <p>Beautiful Soup is a great library, but it predates the HTML5 algorithm by many years and is therefore its own quirky thing, whereas the HTML5 parser should actually be consistent with other such parsers.</p></pre>
daydreamdrunk: <pre><p>godoc link: <a href="https://godoc.org/golang.org/x/net/html">https://godoc.org/golang.org/x/net/html</a></p> <p>goquery is built atop it.</p></pre>
Dat_Nig_Slim_Shady: <pre><p>This is what I&#39;ve been using and it still seems tedious: lots of iterating through tokens and their data.</p> <p>It just seemed like a lot of that work could be abstracted away.</p></pre>
jerf: <pre><p>Thanks.</p></pre>
__crackers__: <pre><p>BS can use several different parsers. Pretty sure there are a couple of HTML5 options in there.</p></pre>
CaptaincCodeman: <pre><p>Also worth checking out is <a href="https://github.com/microcosm-cc/bluemonday" rel="nofollow">https://github.com/microcosm-cc/bluemonday</a></p></pre>
avinassh: <pre><p>I am interested, OP. Let me know how I can begin.</p></pre>
ofpiyush: <pre><p>Start by reading <a href="https://github.com/PuerkitoBio/goquery" rel="nofollow">goquery</a>.</p></pre>
