Is there a package in the standard library for sanitizing text?

agolangf · · 472 views
<p>I'm trying to strip things like <code>\n</code>, <code>&amp;gt;</code>, etc. from comments I retrieved from the Reddit API. What is the standard way to achieve this?</p>
<hr/>**Comments:**<br/><br/>
rz2yoj: <pre><p>The standard way to get comments from the Reddit API without the escaping is to use the "raw_json" query string parameter.</p> <p>For example: <a href="https://www.reddit.com/user/kjnkjnkjnkjnkjnkjn.json?raw_json=1">https://www.reddit.com/user/kjnkjnkjnkjnkjnkjn.json?raw_json=1</a></p></pre>
manifold360: <pre><p><a href="https://github.com/microcosm-cc/bluemonday">bluemonday</a></p></pre>
nesigma: <pre><p>Is there any reason to use bluemonday when using html/template, which already escapes dangerous characters?</p></pre>
1lann: <pre><p>html/template escapes everything, bluemonday only escapes XSS. The differences are detailed in its README.</p></pre>
nesigma: <pre><p>Okay, so since html/template also covers XSS, there's no reason to use bluemonday on top.</p></pre>
1lann: <pre><p>Yes, it would be pointless to use bluemonday on top of html/template. You would only use bluemonday by itself to escape HTML (and then possibly pass it to html/template as type template.HTML).</p></pre>
broady: <pre><p>Use the html/template package.</p> <p>If you don't need templating, you can just use the html package (html.EscapeString).</p> <p>If you mean to strip tags, then you probably want to check out x/net/html or goquery.</p></pre>
magpiecub: <pre><p>If you're just removing arbitrary strings, then I think you want <code>strings.Replace</code> or <code>regexp</code>.</p></pre>
arp242: <pre><p>Probably not a good idea to "roll your own" HTML escaping code, especially not if the input is untrusted (like Reddit comments).</p></pre>
magpiecub: <pre><p>Sanitizing and escaping are two totally different concepts. OP is asking about sanitizing.</p> <p>If they only need to strip newlines and things that look like HTML character entity references, then <code>regexp</code> should work fine.</p></pre>
arp242: <pre><p>OP isn't 100% clear on what the input looks like, but the mention of <code>&amp;gt;</code> makes it sound like there could be embedded HTML in there.</p> <p>I would <em>expect</em> that the Reddit API removes most of the truly harmful <code>&lt;script&gt;...&lt;/script&gt;</code> and <code>onload=..</code> stuff, but you can never be sure.</p> <p>Either way, I don't really see a <em>downside</em> to using an established library; so why <em>not</em> use it, just to be on the safe side?</p></pre>
relvae: <pre><p>And regexp is super slow.</p></pre>
