<p>I'm trying to strip things like <code>\n</code>, <code>&amp;gt;</code>, etc. from comments I retrieved from the Reddit API. What is the standard way to achieve this?</p>
<hr/>**Comments:**<br/><br/>rz2yoj: <pre><p>The standard way to get comments from the Reddit API without the escaping is to use the "raw_json" query string parameter.</p>
<p>For example: <a href="https://www.reddit.com/user/kjnkjnkjnkjnkjnkjn.json?raw_json=1">https://www.reddit.com/user/kjnkjnkjnkjnkjnkjn.json?raw_json=1</a></p></pre>manifold360: <pre><p><a href="https://github.com/microcosm-cc/bluemonday">bluemonday</a></p></pre>nesigma: <pre><p>Is there any reason to use bluemonday when html/template already escapes dangerous characters?</p></pre>1lann: <pre><p>html/template escapes everything, bluemonday only escapes XSS. The differences are detailed in its README.</p></pre>nesigma: <pre><p>Okay, so since html/template also covers XSS, there's no reason to use bluemonday on top.</p></pre>1lann: <pre><p>Yes, it would be pointless to use bluemonday on top of html/template. You would only use bluemonday by itself to escape HTML (and then possibly pass the result to html/template as type <code>template.HTML</code>).</p></pre>broady: <pre><p>Use the html/template package.</p>
<p>If you don't need templating, you can just use the html package (<code>html.EscapeString</code>).</p>
<p>If you mean to strip tags, then you probably want to check out <code>x/net/html</code> or goquery.</p></pre>magpiecub: <pre><p>If you're just removing arbitrary strings, then I think you want <code>strings.Replace</code> or <code>regexp</code>.</p></pre>arp242: <pre><p>It's probably not a good idea to "roll your own" HTML escaping code, especially if the input is untrusted (like Reddit comments).</p></pre>magpiecub: <pre><p>Sanitizing and escaping are two totally different concepts. OP is asking about sanitizing.</p>
<p>If they only need to strip newlines and things that look like HTML character entity references, then <code>regexp</code> should work fine.</p></pre>arp242: <pre><p>OP isn't 100% clear on what the input looks like, but the mention of <code>&amp;gt;</code> makes it sound like there could be embedded HTML in there.</p>
<p>I would <em>expect</em> that the Reddit API removes most of the truly harmful <code>&lt;script&gt;...&lt;/script&gt;</code> and <code>onload=...</code> stuff, but you can never be sure.</p>
<p>Either way, I don't really see a <em>downside</em> to using an established library; so why <em>not</em> use it, just to be on the safe side?</p></pre>relvae: <pre><p>And regexp is super slow</p></pre>
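<p>A minimal Go sketch of the <code>raw_json</code> suggestion, using only the standard <code>net/url</code> package: it sets <code>raw_json=1</code> on an API URL before any request is made. The helper name <code>withRawJSON</code> is hypothetical, and the sketch deliberately leaves out the HTTP request itself.</p>

```go
package main

import (
	"fmt"
	"net/url"
)

// withRawJSON (hypothetical helper) returns rawURL with raw_json=1 set in its
// query string, asking the Reddit API for comment text without HTML escaping.
func withRawJSON(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	q := u.Query()          // parse any existing query parameters
	q.Set("raw_json", "1")  // add raw_json, or overwrite it if present
	u.RawQuery = q.Encode() // Encode re-serializes the query, sorted by key
	return u.String(), nil
}

func main() {
	s, err := withRawJSON("https://www.reddit.com/user/kjnkjnkjnkjnkjnkjn.json")
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // https://www.reddit.com/user/kjnkjnkjnkjnkjnkjn.json?raw_json=1
}
```

<p>Because <code>url.Values.Encode</code> sorts by key, existing parameters are preserved but may be reordered.</p>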
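<p>If the escaped text has already been fetched, the standard library can undo both kinds of escaping the question mentions. The following is a sketch under two assumptions: the comment text sits in a JSON field named <code>body</code>, and <code>decodeBody</code> is a hypothetical helper. <code>encoding/json</code> turns the <code>\n</code> escape into a real newline, and <code>html.UnescapeString</code>, the counterpart of the <code>html.EscapeString</code> mentioned above, turns <code>&amp;gt;</code> back into <code>&gt;</code>.</p>

```go
package main

import (
	"encoding/json"
	"fmt"
	"html"
)

// comment keeps only the field this sketch needs. The "body" key is an
// assumption about the shape of the Reddit comment JSON.
type comment struct {
	Body string `json:"body"`
}

// decodeBody (hypothetical helper) parses one comment object and returns its
// text with HTML character references such as &gt; and &amp; decoded.
func decodeBody(raw []byte) (string, error) {
	var c comment
	// json.Unmarshal already converts the JSON escape \n into a real newline.
	if err := json.Unmarshal(raw, &c); err != nil {
		return "", err
	}
	// html.UnescapeString reverses html.EscapeString-style escaping.
	return html.UnescapeString(c.Body), nil
}

func main() {
	// Made-up input mimicking an escaped comment body.
	raw := []byte(`{"body": "a &gt; b\nsecond line &amp; more"}`)
	text, err := decodeBody(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", text) // "a > b\nsecond line & more"
}
```

<p>Only apply this to responses fetched <em>without</em> <code>raw_json=1</code>. Otherwise a user who literally typed <code>&amp;gt;</code> would have it decoded as well.</p>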
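<p>If the goal really is to <em>delete</em> newlines and entity-like sequences (magpiecub's reading) rather than decode them, here is a sketch of both approaches from the thread. The regular expression is an illustrative approximation of HTML character references, not the full grammar, and the helper names are hypothetical. Deleting <code>&amp;gt;</code> discards the <code>&gt;</code> character entirely, unlike unescaping. For a small fixed set of strings, <code>strings.NewReplacer</code> avoids the regexp engine altogether, which may matter if regexp speed is a concern; measure before relying on that.</p>

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// entityOrNewline matches a newline or something shaped like an HTML character
// reference (&name;, &#39;, &#x27;). The pattern is an illustrative
// assumption, not a complete HTML entity grammar.
var entityOrNewline = regexp.MustCompile(`\n|&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);`)

// stripRegexp (hypothetical helper) deletes every match.
func stripRegexp(s string) string {
	return entityOrNewline.ReplaceAllString(s, "")
}

// fixed deletes a known, fixed list of substrings (old/new pairs).
var fixed = strings.NewReplacer("\n", "", "&gt;", "", "&lt;", "", "&amp;", "")

// stripFixed (hypothetical helper) applies the fixed replacer.
func stripFixed(s string) string {
	return fixed.Replace(s)
}

func main() {
	in := "x&gt;y\nz&#39;"
	fmt.Println(stripRegexp(in)) // xyz
	fmt.Println(stripFixed(in))  // xyz&#39; (&#39; is not in the fixed list)
}
```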