Can I use a regex with a slice of runes?

polaris · 2017-11-22 19:00:07 · 1071 views

This resource was shared on 2017-11-22 19:00:07; the information in it may have since evolved or changed.

It doesn't seem like there's a regex function to match a slice of runes - just string and []byte. Is there a way to use a slice of runes instead?

The reason I need this is that my parser needs to recognise Unicode characters, so if there's another way to do this with a regex while still using a string or []byte, that'd be useful too.

Comments:

porkbonk:

In addition to what singron said, keep in mind Go doesn't use PCRE so details/syntax might be different.

Ctrl+f "Unicode" just to be sure :)
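One syntax difference that may explain regexes that "don't match unicode": in Go's RE2-based syntax, `\w` covers only ASCII word characters, while Unicode classes such as `\p{L}` have to be asked for explicitly. The sketch below illustrates that, along with a PCRE-only construct (lookahead) being rejected at compile time. The names `asciiWord`, `unicodeWord` and `compiles` are made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// asciiWord uses \w, which in Go's regexp syntax means [0-9A-Za-z_] only.
var asciiWord = regexp.MustCompile(`^\w+$`)

// unicodeWord spells out Unicode classes: any letter, decimal digit, or underscore.
var unicodeWord = regexp.MustCompile(`^[\p{L}\p{Nd}_]+$`)

// compiles reports whether Go's regexp package accepts the pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	for _, s := range []string{"World", "世界", "Привет"} {
		fmt.Printf("%q ascii=%v unicode=%v\n", s, asciiWord.MatchString(s), unicodeWord.MatchString(s))
	}

	// Lookahead is a PCRE feature that RE2 syntax does not support.
	fmt.Println("lookahead compiles:", compiles(`foo(?=bar)`))
}
```

If a pattern written for another engine relies on `\w`, `\b` or `\d` matching non-ASCII text, swapping in the `\p{...}` classes is usually the first thing to try.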

singron:

You can just use the normal string methods in the regexp package. They support unicode.
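For data that already lives in a []rune, a minimal sketch of this approach is to convert it with `string(rs)`, use the ordinary string methods, and translate the byte offsets they return back into rune indexes. The helper `findRuneSpan` and the pattern `hanRun` are hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// hanRun matches one or more consecutive Han characters.
var hanRun = regexp.MustCompile(`\p{Han}+`)

// findRuneSpan runs re over rs and returns the first match as rune indexes
// into rs. ok is false when there is no match.
func findRuneSpan(re *regexp.Regexp, rs []rune) (start, end int, ok bool) {
	s := string(rs) // encodes the runes as UTF-8
	loc := re.FindStringIndex(s)
	if loc == nil {
		return 0, 0, false
	}
	// loc holds byte offsets; count runes to get back to []rune indexes.
	start = utf8.RuneCountInString(s[:loc[0]])
	end = start + utf8.RuneCountInString(s[loc[0]:loc[1]])
	return start, end, true
}

func main() {
	rs := []rune("Hello, 世界! World")
	if start, end, ok := findRuneSpan(hanRun, rs); ok {
		fmt.Println(start, end, string(rs[start:end]))
	}
}
```

The conversion copies the data, so for very large inputs matched repeatedly it may be worth keeping the string form around instead of converting on every call.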

zacgarby:

Really? I thought I tried that - I guess my regexes just didn't support unicode. Thanks :)

TheMerovius:

To be clear: They support UTF-8. "unicode" is not well-defined in this context. If your strings are not UTF-8 (for example on Windows, UTF-16 is still very common), you are going to have to convert them first.
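As a rough illustration of the conversion step, UTF-16 code units can be decoded with the standard `unicode/utf16` package before matching. The helper name `utf16ToString` is an assumption for this sketch, and `utf16.Encode` merely stands in for input that arrives as UTF-16 from somewhere else.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf16"
)

// hanRun matches one or more consecutive Han characters.
var hanRun = regexp.MustCompile(`\p{Han}+`)

// utf16ToString decodes UTF-16 code units into a Go string, which is UTF-8.
func utf16ToString(units []uint16) string {
	return string(utf16.Decode(units))
}

func main() {
	// Stand-in for UTF-16 data received from a file or an OS API.
	units := utf16.Encode([]rune("Hello, 世界"))

	s := utf16ToString(units)
	fmt.Println(hanRun.FindString(s))
}
```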

zacgarby:

Oh yeah - what I meant was that my regexes don't match unicode strings

TheMerovius:

Seems to be working fine

And to be clear: "unicode" is a character set - utf-8 is an encoding. It is important to distinguish the two, because if you are not using utf-8, but a different unicode-encoding (either in the regexp or in the searched string), it won't work. So, if "you thought you tried that", that might be explained by an encoding-fubar :) It helps to be specific here about what unicode-encoding you tried to use in your regexp, what unicode-encoding you were trying to match against and - if it doesn't work - to give examples of specific strings/regexps where the results don't match your expectations. "unicode" just isn't the right term to use in this question :)

denise-bryson:

When you say "my parser", it sounds like you already have the data as a []rune, possibly for reasons other than just the regexp matching.

If that's the case, then you can also implement an io.RuneReader and use FindReaderSubmatchIndex, MatchReader or FindReaderIndex.

play.golang.org doesn't allow me to share so attaching an untested and undocumented snippet below. Feel free to question anything that's not clear.

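package main

import (
    "fmt"
    "io"
    "regexp"
)

// runeReader implements io.RuneReader over a []rune.
type runeReader struct {
    src []rune
    pos int
}

// ReadRune returns the next rune. It reports a size of 1 rather than the
// UTF-8 byte length, so the offsets regexp returns index into the []rune.
func (r *runeReader) ReadRune() (rune, int, error) {
    if r.pos >= len(r.src) {
        return 0, 0, io.EOF
    }
    nextRune := r.src[r.pos]
    r.pos++
    return nextRune, 1, nil
}

func main() {
    s := "Hello, 世界! 世 界 World 世界 World!"
    rs := []rune(s)
    re := regexp.MustCompile(`(?i)(\S+界 W\w+)`)

    fmt.Println("match:")
    fmt.Println(re.MatchString(s))
    fmt.Println(re.MatchReader(&runeReader{src: rs}))

    fmt.Println("findIndex:")
    // The string methods return byte offsets into s.
    m := re.FindStringSubmatchIndex(s)
    fmt.Println(m, s[m[2]:m[3]])
    // The reader methods return offsets in the reader's own units: runes here.
    m = re.FindReaderSubmatchIndex(&runeReader{src: rs})
    fmt.Println(m, string(rs[m[2]:m[3]]))
}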
