<p>It doesn't seem like there's a regex function to match a slice of runes - just <em>string</em> and <em>[]byte</em>. Is there a way to use a slice of runes instead?</p>
<p>I need this because my parser has to recognise unicode characters, so if there's another way to do this with a regex while still using a string or []byte, that'd be useful too.</p>
<hr/>**Comments:**<br/><br/>porkbonk: <pre><p>In addition to what singron said, keep in mind that Go doesn't use PCRE, so some details of the syntax might be different.</p>
<p><a href="https://github.com/google/re2/wiki/Syntax" rel="nofollow">Ctrl+f "Unicode" just to be sure</a> :)</p></pre>singron: <pre><p>You can just use the normal string methods in the regexp package. They support unicode.</p></pre>zacgarby: <pre><p>Really? I thought I tried that - I guess my regexes just didn't support unicode. Thanks :)</p></pre>TheMerovius: <pre><p>To be clear: They support <em>UTF-8</em>. "unicode" is not well-defined in this context. If your strings are not UTF-8 (for example on Windows, UTF-16 is still very common), you are going to have to convert them first.</p></pre>zacgarby: <pre><p>Oh yeah - what I meant was that my regexes don't match unicode strings</p></pre>TheMerovius: <pre><p><a href="https://play.golang.org/p/nvSGagdufO" rel="nofollow">Seems to be working fine</a></p>
<p>And to be clear: "unicode" is a character set - utf-8 is an encoding. It is important to distinguish the two, because if you are not using utf-8 but a different unicode encoding (either in the regexp or in the searched string), it won't work. So, if "you thought you tried that", that might be explained by an encoding-fubar :) It helps to be specific here about what unicode encoding you tried to use in your regexp, what unicode encoding you were trying to match against and - if it doesn't work - give examples of specific strings/regexps where the results don't match your expectations. "unicode" just isn't the right term to use in this question :)</p></pre>denise-bryson: <pre><p>When you say <code>my parser</code> it sounds like you already have the data as a <code>[]rune</code>, possibly for reasons other than just the regexp matching.</p>
<p>If that's the case then you can also implement an <a href="https://golang.org/pkg/io/#RuneReader" rel="nofollow">io.RuneReader</a> and use <a href="https://golang.org/pkg/regexp/#Regexp.FindReaderSubmatchIndex" rel="nofollow">FindReaderSubmatchIndex</a>, <a href="https://golang.org/pkg/regexp/#Regexp.MatchReader" rel="nofollow">MatchReader</a> or <a href="https://golang.org/pkg/regexp/#Regexp.FindReaderIndex" rel="nofollow">FindReaderIndex</a></p>
<p>play.golang.org doesn't allow me to share so attaching an untested and undocumented snippet below. Feel free to question anything that's not clear.</p>
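<pre><code>package main

import (
	"fmt"
	"io"
	"regexp"
)

// runeReader implements io.RuneReader on top of a []rune.
type runeReader struct {
	src []rune
	pos int
}

// ReadRune returns the next rune. It deliberately reports a size of 1
// instead of the UTF-8 byte width, so the positions reported by the
// regexp package are indexes into the rune slice, not byte offsets.
func (r *runeReader) ReadRune() (rune, int, error) {
	if r.pos >= len(r.src) {
		return -1, 0, io.EOF
	}
	nextRune := r.src[r.pos]
	r.pos++
	return nextRune, 1, nil
}

func main() {
	s := "Hello, 世界! 世 界 World 世界 World!"
	rs := []rune(s)
	re := regexp.MustCompile(`(?i)(\S+界 W\w+)`)

	// Matching gives the same answer for the string and the rune reader.
	fmt.Println("match:")
	fmt.Println(re.MatchString(s))
	fmt.Println(re.MatchReader(&runeReader{src: rs}))

	fmt.Println("findIndex:")
	// String version: m holds byte offsets into s.
	m := re.FindStringSubmatchIndex(s)
	fmt.Println(m, s[m[2]:m[3]])
	// Reader version: m holds offsets into rs (see ReadRune above).
	m = re.FindReaderSubmatchIndex(&runeReader{src: rs})
	fmt.Println(m, string(rs[m[2]:m[3]]))
}
</code></pre></pre>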
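<p>If the text is also available as a string, another option is to match on the string and convert the byte offsets to rune offsets afterwards. The following is only a sketch under that assumption; <code>runeSpan</code> is a hypothetical helper, not part of the <code>regexp</code> package.</p>

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// runeSpan (hypothetical helper) converts a [start, end) pair of byte
// offsets in s into the corresponding pair of rune offsets.
func runeSpan(s string, start, end int) (int, int) {
	runeStart := utf8.RuneCountInString(s[:start])
	runeEnd := runeStart + utf8.RuneCountInString(s[start:end])
	return runeStart, runeEnd
}

func main() {
	s := "Hello, 世界! 世 界 World 世界 World!"
	re := regexp.MustCompile(`(?i)\S+界 W\w+`)

	// FindStringIndex reports byte offsets into s, or nil if nothing matches.
	loc := re.FindStringIndex(s)
	if loc == nil {
		fmt.Println("no match")
		return
	}

	// Translate the byte offsets so they can index a []rune of the same text.
	start, end := runeSpan(s, loc[0], loc[1])
	rs := []rune(s)
	fmt.Println(loc, start, end, string(rs[start:end]))
}
```

<p>This walks the string up to the end of the match for every conversion, so for many matches on a long input the RuneReader approach above may be the better fit.</p>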