How can I request a webpage and then read the HTML response as XML, and then turn it into an object?

blov · 396 views
For example, in this playground: https://play.golang.org/p/lZIprnYT_b

Is there a simple way to parse these HTML elements into an object? One issue I noticed is that HTML elements with the same tag name and class can be siblings, but JSON can't really represent that:

```
{"html": {"body": {"div": {}, "div": {}}}}
```

The second `div` overwrites the first.

So my question has two parts. First, how do I convert a fetched web page into a data type the XML decoder can read? Second, how do I map HTML elements? Can I just give each one a unique tag?

Thanks

---

**Comments:**

carsncode: You're going to have a bad time. First and foremost, the vast majority of HTML pages are NOT valid XML. Second, as you've noted, HTML/XML is not an object notation; it is a document notation, so it does not translate well into an object format. The typical object representation of an HTML or XML document gives each node an array of children, so siblings don't need to be unique in any way. You might find some useful examples and third-party libraries here: https://encrypted.google.com/search?hl=en&q=golang%20html#hl=en&q=golang+html+parser

patrickdappollonio: Check out this package: https://github.com/PuerkitoBio/goquery
