<p>Hi <a href="/r/golang">/r/golang</a>!</p>
<p>I've been learning Go for a couple of months, primarily because I built a Python program to scrape whole zone files for specific sites and found its concurrency performance severely lacking.</p>
<p>One thing I've noticed is that HTML parsing is a bit tedious and verbose in Go, but it doesn't seem like it has to be.</p>
<p>Would any of you be interested in collaborating on a Go package like Python's BeautifulSoup?</p>
<p>I'm thinking we name it UglyStew</p>
<hr/>**Comments:**<br/><br/>kjk: <pre><p>There's already <a href="https://github.com/PuerkitoBio/goquery">https://github.com/PuerkitoBio/goquery</a></p></pre>jerf: <pre><p>I'm on my phone so I can't look it up, but there's also an implementation of the HTML5 parser, which is really the definitive answer to that problem now.</p>
<p>Beautiful Soup is a great library, but it predates the HTML5 algorithm by many years and is therefore its own quirky thing, whereas the HTML5 parser should actually be consistent with other such parsers.</p></pre>daydreamdrunk: <pre><p>godoc link: <a href="https://godoc.org/golang.org/x/net/html">https://godoc.org/golang.org/x/net/html</a></p>
<p>goquery is built atop it.</p></pre>Dat_Nig_Slim_Shady: <pre><p>This is what I've been using and it still seems tedious, lots of iterating through tokens and their data.</p>
<p>It just seemed like a lot of that work could be abstracted away.</p></pre>jerf: <pre><p>Thanks.</p></pre>__crackers__: <pre><p>BS can use several different parsers. Pretty sure there are a couple of HTML5 options in there.</p></pre>CaptaincCodeman: <pre><p>Also worth checking out is <a href="https://github.com/microcosm-cc/bluemonday" rel="nofollow">https://github.com/microcosm-cc/bluemonday</a></p></pre>avinassh: <pre><p>I'm interested, OP. Let me know how to begin.</p></pre>ofpiyush: <pre><p>Start by reading <a href="https://github.com/PuerkitoBio/goquery" rel="nofollow">goquery</a></p></pre>