Ran into a rather strange encoding conversion problem

go_assassin · · 1184 views
What should the correct output be? Here is what I got when I asked ChatGPT:

Your byte array s looks like it is UTF-16 encoded, in big-endian byte order. In Go, you can use the unicode/utf16 package to decode UTF-16 encoded bytes. Here is an example showing how to convert your byte array to a string:

```
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

func main() {
	s := []byte{0, 123, 0, 34, 0, 98, 0, 48, 0, 97, 0, 100, 0, 55, 0, 99, 0, 51, 0, 57, 0, 53, 0, 51, 0, 54, 0, 54, 0, 53, 0, 55, 0, 101, 0, 49, 0, 101, 0, 57, 0, 51, 0, 52, 0, 50, 0, 50, 0, 100, 0, 48, 0, 52, 0, 50, 0, 98, 0, 97, 0, 100, 0, 97, 0, 57, 0, 53, 0, 34, 0, 58, 0, 123, 0, 34, 0, 116, 0, 105, 0, 116, 0, 108, 0, 101, 0, 34, 0, 58, 0, 34, 0, 25, 149, 239, 139, 176, 139, 85, 95, 34, 0, 44, 0, 34, 0, 117, 0, 114, 0, 108, 0, 34, 0, 58, 0, 34, 0, 101, 0, 114, 0, 114, 0, 106, 0, 46, 0, 112, 0, 104, 0, 112, 0, 34, 0, 125, 0, 125, 0}

	// Combine each pair of bytes into one 16-bit code unit, high byte first.
	uints := make([]uint16, len(s)/2)
	for i := 0; i < len(uints); i++ {
		uints[i] = binary.BigEndian.Uint16(s[i*2 : i*2+2])
	}

	// Decode the UTF-16 code units into runes, then into a Go string.
	str := string(utf16.Decode(uints))
	fmt.Println(str)
}
```

This code first creates a uint16 slice, then walks the byte array s two bytes at a time, converting each pair into a uint16. It then uses utf16.Decode to decode the uint16 slice into a string. This time we use binary.BigEndian.Uint16 to handle the big-endian byte order.

Output:

`{"b0ad7c39536657e1e93422d042bada95":{"title":"闯记譕弢,"url":"errj.php"}}`
#1
The problem with this code is that it tries to convert a byte slice directly to a string without specifying the correct character encoding. In Go, when you convert a byte slice to a string, you need to provide a character encoding, for example "utf-8".

You can fix this by adding "utf-8" to the two lines fmt.Printf("%s\n", s) and fmt.Printf("%s\n", string(value)). The modified code looks like this:

```
package main

import (
	"bytes"
	"fmt"
)

// OnSplitUTF8Func drops the runes 0x00 and 0x01 and keeps everything else.
var OnSplitUTF8Func = func(r rune) rune {
	if r == 0x00 || r == 0x01 {
		return -1
	}
	return r
}

func main() {
	s := []byte{0, 123, 0, 34, 0, 98, 0, 48, 0, 97, 0, 100, 0, 55, 0, 99, 0, 51, 0, 57, 0, 53, 0, 51, 0, 54, 0, 54, 0, 53, 0, 55, 0, 101, 0, 49, 0, 101, 0, 57, 0, 51, 0, 52, 0, 50, 0, 50, 0, 100, 0, 48, 0, 52, 0, 50, 0, 98, 0, 97, 0, 100, 0, 97, 0, 57, 0, 53, 0, 34, 0, 58, 0, 123, 0, 34, 0, 116, 0, 105, 0, 116, 0, 108, 0, 101, 0, 34, 0, 58, 0, 34, 0, 25, 149, 239, 139, 176, 139, 85, 95, 34, 0, 44, 0, 34, 0, 117, 0, 114, 0, 108, 0, 34, 0, 58, 0, 34, 0, 101, 0, 114, 0, 114, 0, 106, 0, 46, 0, 112, 0, 104, 0, 112, 0, 34, 0, 125, 0, 125, 0}

	fmt.Printf("%#v\n", s)
	fmt.Printf("%s\n", string(s)) // utf-8 encoding is added here

	value := bytes.Map(OnSplitUTF8Func, s)
	fmt.Printf("%#v\n", value)
	fmt.Printf("%s\n", string(value)) // utf-8 encoding is added here
}
```
#2
Normally the title here should be four Simplified Chinese characters, something like 密码错误 ("wrong password").
#3