Is there a package in the standard library for sanitizing text?

agolangf · 2017-09-03 11:00:14 · 565 views

This resource was shared on 2017-09-03 11:00:14; the information in it may have since evolved or changed.

I'm trying to strip things like \n, >, etc. from comments I retrieved from the Reddit API. What is the standard way to achieve this?

Comments:

rz2yoj:

The standard way to get comments from the Reddit API without the escaping is to use the "raw_json" query string parameter.

For example: https://www.reddit.com/user/kjnkjnkjnkjnkjnkjn.json?raw_json=1
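A minimal sketch of adding that parameter to a request URL in Go, using only net/url. The helper name rawJSONURL is made up for illustration; the endpoint is the one from the example above.

```go
package main

import (
	"fmt"
	"net/url"
)

// rawJSONURL (hypothetical helper) returns endpoint with raw_json=1 added
// to its query string, keeping any query parameters already present.
func rawJSONURL(endpoint string) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("raw_json", "1")
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	u, err := rawJSONURL("https://www.reddit.com/user/kjnkjnkjnkjnkjnkjn.json")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u) // https://www.reddit.com/user/kjnkjnkjnkjnkjnkjn.json?raw_json=1
}
```

The resulting string can then be passed to whatever HTTP client is already in use.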

manifold360:

bluemonday

nesigma:

Is there any reason using bluemonday when using html/template which already escapes dangerous characters?

1lann:

html/template escapes everything; bluemonday only removes what could be used for XSS and leaves safe markup intact. The differences are detailed in its README.

nesigma:

Okay, so since html/template also covers XSS, there's no reason to use bluemonday on top.

1lann:

Yes, it would be pointless to use bluemonday on top of html/template. You would only use bluemonday by itself to sanitize HTML (and then possibly pass the result to html/template as type template.HTML).

broady:

Use the html/template package.

If you don't need templating, you can just use the html package. (html.EscapeString)

If you mean to strip tags, then you probably want to check out x/net/html or goquery.

magpiecub:

If you're just removing arbitrary strings then I think you want strings.Replace or regexp.
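For a fixed set of literal strings, strings.NewReplacer handles all of them in one pass. This is only a sketch: the list of replacements and the name clean are assumptions for illustration, not something from the thread.

```go
package main

import (
	"fmt"
	"strings"
)

// cleaner maps each literal on the left to its replacement on the right.
// The pairs here are illustrative; adjust them to the actual input.
var cleaner = strings.NewReplacer(
	"\n", " ",
	"&gt;", ">",
	"&lt;", "<",
	"&amp;", "&",
)

// clean (hypothetical helper) applies all replacements in a single pass.
// Replaced text is not scanned again, so "&amp;gt;" becomes "&gt;", not ">".
func clean(s string) string {
	return cleaner.Replace(s)
}

func main() {
	fmt.Println(clean("&gt; quoted\nnext line")) // > quoted next line
}
```

A Replacer is safe for concurrent use, so it can be built once and shared.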

arp242:

Probably not a good idea to "roll your own" HTML escaping code, especially not if the input is untrusted (like Reddit comments).

magpiecub:

Sanitizing and escaping are two totally different concepts. OP is asking about sanitizing.

If they only need to strip newlines and things that look like HTML character entity references, then regexp should work fine.
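A rough sketch of that approach: one regexp removes anything shaped like a character entity reference, and strings.Fields then collapses newlines and other whitespace runs into single spaces. The pattern and the name stripEntities are assumptions for illustration, and the pattern only checks the shape of a reference, not whether the entity name is a real one.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// entityRef matches text shaped like &name; &#123; or &#x1F;
// It does not verify that the name is a defined HTML entity.
var entityRef = regexp.MustCompile(`&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[a-zA-Z][a-zA-Z0-9]*);`)

// stripEntities (hypothetical helper) deletes entity-like references,
// then joins the remaining words with single spaces, which also
// removes newlines.
func stripEntities(s string) string {
	s = entityRef.ReplaceAllString(s, "")
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	fmt.Println(stripEntities("&gt; hello\nworld &amp; more")) // hello world more
}
```

Compiling the pattern once at package level avoids recompiling it on every call.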

arp242:

OP isn't 100% clear on what the input looks like, but the mention of > makes it sound like there could be embedded HTML in there.

I would expect the Reddit API to remove most of the truly harmful stuff, such as onload=.. attributes, but you can never be sure.

Either way, I don't really see a downside to using an established library; so why not use it, just to be on the safe side?

relvae:

And regexp is super slow
