It doesn't seem like there's a regex function to match a slice of runes - just string and []byte. Is there a way to use a slice of runes instead?
I need this because my parser has to recognise unicode characters, so if there's another way to do this with a regex while still using a string or []byte, that'd be useful too.
Comments:
porkbonk:
singron:In addition to what singron said, keep in mind Go doesn't use PCRE so details/syntax might be different.
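To make the PCRE point concrete, here is a minimal sketch that checks which patterns Go's regexp package (RE2 syntax) accepts. The helper name `compiles` is made up for illustration; the patterns are just examples of constructs that differ from PCRE.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles is a hypothetical helper: it reports whether Go's regexp
// package accepts the given pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	fmt.Println(compiles(`foo(?=bar)`)) // false: lookahead is not supported
	fmt.Println(compiles(`(a)\1`))      // false: backreferences are not supported
	fmt.Println(compiles(`\p{Greek}+`)) // true: Unicode script classes are supported
}
```

If a pattern copied from another language fails to compile, the error returned by `regexp.Compile` usually names the unsupported construct.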
zacgarby:You can just use the normal string methods in the regexp package. They support unicode.
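A small sketch of what that can look like, assuming the input is a UTF-8 string. One likely cause of "my regexes didn't match unicode" is that `\w` in Go's regexp is ASCII-only, while `\p{L}` matches any Unicode letter. The helper names `letterWords` and `asciiWords` are invented for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	letterRun = regexp.MustCompile(`\p{L}+`) // runs of Unicode letters
	asciiWord = regexp.MustCompile(`\w+`)    // \w is [0-9A-Za-z_] only
)

// letterWords returns every run of Unicode letters in s.
func letterWords(s string) []string {
	return letterRun.FindAllString(s, -1)
}

// asciiWords returns every run of ASCII word characters in s.
func asciiWords(s string) []string {
	return asciiWord.FindAllString(s, -1)
}

func main() {
	s := "Hello, 世界! Grüße"
	fmt.Println(letterWords(s)) // [Hello 世界 Grüße]
	fmt.Println(asciiWords(s))  // [Hello Gr e]
}
```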
TheMerovius:Really? I thought I tried that - I guess my regexes just didn't support unicode. Thanks :)
zacgarby:To be clear: They support UTF-8. "unicode" is not well-defined in this context. If your strings are not UTF-8 (for example on Windows, UTF-16 is still very common), you are going to have to convert them first.
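For the UTF-16 case, a rough sketch of the conversion using the standard `unicode/utf16` package. It assumes the input has already been read into UTF-16 code units ([]uint16) with the right byte order; `hanRun` is a made-up helper name, and the pattern is only an example.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf16"
)

var hanRe = regexp.MustCompile(`\p{Han}+`)

// hanRun is a hypothetical helper: it decodes UTF-16 code units into a
// Go (UTF-8) string and returns the first run of Han characters in it.
func hanRun(units []uint16) string {
	s := string(utf16.Decode(units)) // []uint16 -> []rune -> UTF-8 string
	return hanRe.FindString(s)
}

func main() {
	units := utf16.Encode([]rune("Hello, 世界!"))
	fmt.Println(hanRun(units)) // 世界
}
```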
TheMerovius:Oh yeah - what I meant was that my regexes don't match unicode strings
denise-bryson:And to be clear: "unicode" is a character set - utf-8 is an encoding. It is important to distinguish the two, because if you are not using utf-8, but a different unicode-encoding (either in the regexp or in the searched string) it won't work. So, if "you thought you tried that", that might be explained by an encoding-fubar :) It helps to be specific here about what unicode-encoding you tried to use in your regexp, what unicode-encoding you were trying to match against and - if it doesn't work - to give examples of specific strings/regexps where the results don't match your expectations. "unicode" just isn't the right term to use in this question :)
When you say "my parser", it sounds like you already have the data as a []rune, possibly for reasons other than just the regexp matching. If that's the case, then you can also implement an io.RuneReader and use FindReaderSubmatchIndex, MatchReader or FindReaderIndex.
play.golang.org doesn't allow me to share so attaching an untested and undocumented snippet below. Feel free to question anything that's not clear.
    package main

    import (
        "fmt"
        "io"
        "regexp"
    )

    // runeReader implements io.RuneReader over a []rune.
    type runeReader struct {
        src []rune
        pos int
    }

    // ReadRune reports a size of 1 for every rune, so the indices the
    // regexp package returns are rune offsets into src, not byte offsets.
    func (r *runeReader) ReadRune() (rune, int, error) {
        if r.pos >= len(r.src) {
            return -1, 0, io.EOF
        }
        nextRune := r.src[r.pos]
        r.pos++
        return nextRune, 1, nil
    }

    func main() {
        s := "Hello, 世界! 世 界 World 世界 World!"
        rs := []rune(s)
        re := regexp.MustCompile(`(?i)(\S+界 W\w+)`)

        fmt.Println("match:")
        fmt.Println(re.MatchString(s))
        fmt.Println(re.MatchReader(&runeReader{src: rs}))

        fmt.Println("findIndex:")
        // String version: indices are byte offsets into s.
        m := re.FindStringSubmatchIndex(s)
        fmt.Println(m, s[m[2]:m[3]])
        // Reader version: indices are rune offsets into rs.
        m = re.FindReaderSubmatchIndex(&runeReader{src: rs})
        fmt.Println(m, string(rs[m[2]:m[3]]))
    }
