strings, runes, & bytes: oh my

polaris · 422 views

This is a shared resource; the information in it may have since evolved or changed.

I'm pretty new to Go, and I'm finding the relationships between strings, runes, and bytes messy. For example: if I want to know whether a string contains only letters, I iterate over the string, but then have to cast the byte (?) into a rune first. What a mess. (http://pastebin.com/kRiGarkS)

Second example: to really manipulate a string, say, to remove every nth character, I have to iterate over it, build the result in a byte buffer, and then extract the string back out.

The libraries are also a mess: strings, unicode, and bytes have lots of overlapping features and functions. unicode can tell you whether you have upper-case letters, but strings can make them upper case, and so can bytes.

Is there a rule of thumb or best practice for when to use each one, or a good blog post that explains it?

**Comments:**

itsmontoya: If you range over the string, you get runes instead of bytes.

Redundancy_: https://blog.golang.org/strings is a good place to start.

Redundancy_: First example: https://play.golang.org/p/ERSXgKmAcX
Second example: https://play.golang.org/p/yH-FgwP-ei
Frankly, the TL;DR is probably that you always want to work with strings and runes when manipulating "characters". Bytes don't map to either in a clear way, except in the degenerate case of ASCII, and whenever you go to or from bytes you have to consider the encoding.

PaluMacil: I was about to post that when I decided to double-check which links you had posted. :) That article made everything pretty clear to me, and it's what I'd recommend to anyone!
If you come from a language like JavaScript, where you never deal with bytes, it makes little sense to ever drop to a lower-level concept than a Unicode string (at least on the surface), but there are a lot of good reasons to use different types once you're doing anything that needs performance.
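The question's first example (checking that a string contains only letters) can be sketched as below. This is a minimal sketch, not the code behind the pastebin or playground links, which isn't reproduced here; `isAllLetters` is a hypothetical helper name. It relies on itsmontoya's point: ranging over a string decodes UTF-8 and yields runes, so there is no byte-to-rune conversion to do, and `unicode.IsLetter` takes the rune directly.

```go
package main

import (
	"fmt"
	"unicode"
)

// isAllLetters reports whether every code point in s is a Unicode letter.
// Ranging over a string decodes UTF-8, so r is already a rune and no
// byte-to-rune conversion is needed. An empty string passes vacuously
// (a choice made for this sketch).
func isAllLetters(s string) bool {
	for _, r := range s {
		if !unicode.IsLetter(r) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isAllLetters("h\u00e9llo")) // true: 'é' (U+00E9) is one rune
	fmt.Println(isAllLetters("abc1"))       // false: '1' is not a letter
}
```

Note that "letter" here means a single code point in a Unicode letter category; a decomposed accent (a base letter followed by a combining mark) would fail the check, because the combining mark is not itself a letter.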
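For the question's second example (removing every nth character), one way to avoid the byte-buffer round trip is to range over runes and write the survivors into a `strings.Builder` (available since Go 1.10), which builds the string without a separate final copy step. This is a hedged sketch rather than Redundancy_'s playground code; `removeEveryNth` is a hypothetical name, and counting positions from 1 is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// removeEveryNth drops the nth, 2nth, 3nth, ... rune of s (counting from 1).
// It works on runes rather than bytes, so multi-byte characters are removed
// whole. Treating n <= 0 as "remove nothing" is an assumption of this sketch.
func removeEveryNth(s string, n int) string {
	if n <= 0 {
		return s
	}
	var b strings.Builder
	b.Grow(len(s)) // capacity hint: for valid UTF-8 the result is never longer than s
	pos := 0
	for _, r := range s {
		pos++
		if pos%n == 0 {
			continue // skip every nth rune
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	fmt.Println(removeEveryNth("abcdefg", 3))    // abdeg
	fmt.Println(removeEveryNth("日本語です", 2)) // 日語す
}
```

Because the loop counts runes, not bytes, the Japanese example removes whole characters; indexing `s[i]` byte by byte would instead cut multi-byte characters in half.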
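On the question of which package to use, a rough rule of thumb consistent with the thread: `unicode` works on one rune at a time (classification and case mapping), `strings` works on string values, and `bytes` mirrors much of `strings` for `[]byte` data you already hold (for example, from a file or socket). The sketch below shows the three side by side and then the encoding concern Redundancy_ raises. `decodeUTF8` is a hypothetical helper, and rejecting invalid input (rather than accepting U+FFFD replacements) is an assumed policy, not the only reasonable one.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// decodeUTF8 converts raw bytes to runes, but returns an error for invalid
// UTF-8 instead of silently producing utf8.RuneError (U+FFFD) for bad bytes.
// Rejecting rather than replacing is an assumption of this sketch.
func decodeUTF8(b []byte) ([]rune, error) {
	if !utf8.Valid(b) {
		return nil, errors.New("input is not valid UTF-8")
	}
	return []rune(string(b)), nil
}

func main() {
	// unicode: one rune at a time.
	fmt.Println(unicode.IsUpper('G'), string(unicode.ToUpper('g'))) // true G

	// strings: operations on immutable string values.
	fmt.Println(strings.ToUpper("gopher")) // GOPHER

	// bytes: the same kind of operation for []byte data.
	fmt.Println(string(bytes.ToUpper([]byte("gopher")))) // GOPHER

	// At the byte boundary, check the encoding before treating data as text.
	raw := []byte{0x67, 0xff, 0x6f} // 'g', an invalid byte, 'o'
	if _, err := decodeUTF8(raw); err != nil {
		fmt.Println("rejected:", err)
	}

	// Without the check, the bad byte quietly decodes as U+FFFD.
	for _, r := range string(raw) {
		fmt.Printf("%U ", r) // prints U+0067 U+FFFD U+006F
	}
	fmt.Println()
}
```

Picking `strings` or `bytes` mostly follows the type you already have: converting between `string` and `[]byte` copies the data, so staying in one representation avoids extra allocations, which is one of the performance reasons PaluMacil alludes to.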