Regular Expression Handling in Go

hanzkering


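## 1 Overview

A regular expression (usually abbreviated regex, regexp, or RE in code) is a pattern language for strings. Predefined special characters, and combinations of them, form a "pattern string" that expresses a filtering rule to apply to text.

Perl's regular expressions are very powerful, and many languages modeled their own regex support on Perl's. As a result, the syntax most people know is the Perl-style one.

Go supports regular expressions through the `regexp` package. It accepts the RE2 syntax, which is largely the familiar Perl-style syntax but omits features such as backreferences inside a pattern and lookaround assertions. This article covers the commonly used methods of `regexp`.

## 2 Obtaining a Regexp object

Compiling a regular expression yields a `*Regexp` value, which is then used for all further regex operations.

Functions:

* `regexp.Compile(expr string) (*Regexp, error)` compiles a regular expression and returns a `*Regexp` on success.
* `regexp.MustCompile(str string) *Regexp` behaves like `Compile`, except that it panics if the expression cannot be compiled.

```go
// Either: Compile returns an error for an invalid pattern.
reg, err := regexp.Compile(`\d+`)
// Or: MustCompile panics for an invalid pattern.
reg := regexp.MustCompile(`\d+`)
```

## 3 Match testing

Functions:

* `func (re *Regexp) MatchString(s string) bool` reports whether the string `s` contains any match of the pattern.
* `func (re *Regexp) Match(b []byte) bool` reports whether the byte slice `b` contains any match of the pattern.

Matching against a string:

```go
text := "Hello Gopher，Hello 韩忠康"
reg := regexp.MustCompile(`\w+`)
fmt.Println(reg.MatchString(text)) // true
```

## 4 Finding

Functions:

* `func (re *Regexp) FindString(s string) string` returns the leftmost match of the pattern in `s`, or an empty string if there is none.
* `func (re *Regexp) FindAllString(s string, n int) []string` returns successive matches of the pattern in `s`. `n` limits the number of matches returned; -1 means no limit.
* `func (re *Regexp) FindAll(b []byte, n int) [][]byte` does the same for a `[]byte` and returns a `[][]byte`.

Finding all matches:

```go
text := "Hello Gopher，Hello 韩忠康"
reg := regexp.MustCompile(`\w+`)
fmt.Println(reg.FindAllString(text, -1)) // [Hello Gopher Hello]
```

The Chinese name is not in the result because `\w` in Go matches only ASCII word characters (`[0-9A-Za-z_]`).

## 5 Finding match positions

The following functions return the positions of the matched substrings, as byte offsets:

* `func (re *Regexp) FindStringIndex(s string) (loc []int)` returns a two-element slice holding the start and end of the leftmost match.
* `func (re *Regexp) FindIndex(b []byte) (loc []int)` does the same for a `[]byte`.
* `func (re *Regexp) FindAllStringIndex(s string, n int) [][]int` returns a slice of such start/end slices, one per match.

Finding the position of the leftmost match in a string:

```go
text := "Hello Gopher，Hello 韩忠康"
reg := regexp.MustCompile("llo")
fmt.Println(reg.FindStringIndex(text)) // [2 5]
```

## 6 Finding submatches

The following functions find submatches (capture groups) or their positions:

* `func (re *Regexp) FindStringSubmatch(s string) []string` returns the leftmost match followed by the text of each of its submatches.
* `func (re *Regexp) FindAllStringSubmatch(s string, n int) [][]string` returns every match together with its submatches.
* `func (re *Regexp) FindStringSubmatchIndex(s string) []int` returns the start and end positions of the leftmost match and of each of its submatches.

Finding all matches with their submatches. The pattern is written as a raw string (backticks) because `\w` is not a valid escape in a double-quoted Go string:

```go
re := regexp.MustCompile(`Go(\w+)`)
fmt.Println(re.FindAllStringSubmatch("Hello Gopher，Hello GoLang", -1)) // [[Gopher pher] [GoLang Lang]]
```

## 7 Replacing

Functions:

* `func (re *Regexp) ReplaceAllString(src, repl string) string` returns a copy of `src` in which every match of `re` is replaced by `repl`.
* `func (re *Regexp) ReplaceAll(src, repl []byte) []byte` does the same for `[]byte`.

Inside `repl`, `$1`, `$2`, and so on refer to the text of the corresponding submatch.

```go
re := regexp.MustCompile(`Go(\w+)`)
fmt.Println(re.ReplaceAllString("Hello Gopher，Hello GoLang", "Hank$1")) // Hello Hankpher，Hello HankLang
```

## 8 Splitting

Functions:

* `func (re *Regexp) Split(s string, n int) []string` splits the string `s` at each match of the pattern and returns the pieces as a string slice. `n` limits the number of pieces; -1 means no limit.

```go
reg := regexp.MustCompile(`[\s,]`)
fmt.Println(reg.Split("Hello Gopher,Hello GoLang", -1)) // [Hello Gopher Hello GoLang]
```

The methods listed above are only the most commonly used ones. For the complete reference, run `godoc -http=:8088` and browse the documentation of the `regexp` package.

Originally published at: [小韩说课](http://www.hellokang.net/go/go-regexp/)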
