<p>I use Python to read and write text files, comparing data in some text files to data in other text files. They are all in ISO-8859-1. I pondered using Go for this to increase the speed and also to have a single compiled binary to distribute. Can you do that in a smooth, simple way, or do you always have to convert the data to UTF-8 or handle it as bytes?</p>
<p>..sorry for the typo, meant "converting" in the headline.</p>
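<p>For reference, a minimal sketch of what the conversion could look like using only the standard library. It relies on the fact that every ISO-8859-1 byte value equals the Unicode code point of the character it encodes. The helper names <code>latin1ToUTF8</code> and <code>utf8ToLatin1</code> are made up for this illustration; the <code>golang.org/x/text/encoding/charmap</code> package offers a ready-made <code>charmap.ISO8859_1</code> encoding if an external dependency is acceptable.</p>

```go
package main

import "fmt"

// latin1ToUTF8 (hypothetical helper) converts ISO-8859-1 bytes to a UTF-8
// encoded Go string. Each Latin-1 byte value is also its Unicode code point.
func latin1ToUTF8(b []byte) string {
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c)
	}
	return string(runes)
}

// utf8ToLatin1 (hypothetical helper) converts a UTF-8 string back to
// ISO-8859-1 bytes. It fails on any rune above U+00FF, which also covers
// the U+FFFD that range yields for invalid UTF-8 input.
func utf8ToLatin1(s string) ([]byte, error) {
	out := make([]byte, 0, len(s))
	for _, r := range s {
		if r > 0xFF {
			return nil, fmt.Errorf("rune %q has no ISO-8859-1 encoding", r)
		}
		out = append(out, byte(r))
	}
	return out, nil
}

func main() {
	raw := []byte{'c', 'a', 'f', 0xE9} // "café" as ISO-8859-1 bytes

	s := latin1ToUTF8(raw)
	fmt.Println(s, len(raw), len(s)) // café 4 5

	back, err := utf8ToLatin1(s)
	fmt.Println(back, err) // [99 97 102 233] <nil>
}
```

<p>If the program only compares, splits on ASCII delimiters, and writes the data back out, the conversion can probably be skipped altogether and the bytes passed through unchanged.</p>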
<hr/>**Comments:**<br/><br/>lapingvino: <pre><p>UTF-8 is mostly by convention, not strictly forced. As long as you don't do character-fiddling like using range etc, you should be fine afaik.</p></pre>FUZxxl: <pre><p>ISO8859-1 should be unproblematic as it's an 8-bit encoding with no multi-byte sequences.</p></pre>defererror: <pre><p><a href="https://blog.golang.org/strings" rel="nofollow">https://blog.golang.org/strings</a></p>
<blockquote>
<p>It's important to state right up front that a string holds <em>arbitrary</em> bytes. It is not required to hold Unicode text, UTF-8 text, or any other predefined format. As far as the content of a string is concerned, it is exactly equivalent to a slice of bytes.</p>
</blockquote></pre>hobbified: <pre><p>Huh, I never read that one. That's... an extremely wrong bit of design. If a string is only a sequence of bytes then there's no real point in having <code>string</code> and <code>[]byte</code> be distinct in the first place, and you're really not any better off than any of the 80s/90s languages, with no typesafe way of knowing what the hell a string <em>is</em>.</p></pre>BurningFox: <pre><p><code>string</code>, unlike <code>[]byte</code>, is immutable and <em>usually</em> implies UTF-8.</p></pre>hobbified: <pre><p>"Usually" is the worst. "Usually" is what leads to assumptions and grave bugs. Either make <code>string</code> appear as an immutable slice of <em>runes</em> regardless of its internal representation (maybe even going as far as Perl 6's Normal Form Grapheme to make true characterwise work easy), and relegate all non-Unicode work to <code>[]byte</code>, or else make it abundantly clear that UTF-8 isn't <em>actually</em> special.</p>
<p>But anyway, it's a moot point, it's obviously years too late to be having this argument :)</p></pre>barsonme: <pre><blockquote>
<p>Either make string appear as an immutable slice of runes regardless of its internal representation </p>
</blockquote>
<p>It essentially is. <a href="https://play.golang.org/p/V7if8MR-_A" rel="nofollow">https://play.golang.org/p/V7if8MR-_A</a></p></pre>hobbified: <pre><p>Until you put non-UTF-8 data into the string, and you start getting out garbage and U+FFFD runes from that range loop. You've got one interface that expects UTF-8, one that doesn't care, and no kind of warning when you bridge the gap.</p></pre>slrz: <pre><p>Still, it's the right thing to do. A programming language whose string type can't hold file names is useless for systems programming.</p>
<p>Think about it: you'd have to change basically the whole os package (not just file ops, also os.Args or Getenv) to take byte slices instead of strings, just so you can write a simple cat program that doesn't fall apart when encountering weird file names. It'd make Go much more cumbersome to use.</p>
<p>How would you even signal errors that'd occur when creating such then-malformed strings? Panic? Yuck.</p></pre>
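<p>The gap between the two interfaces discussed in this thread can be seen in a small sketch. It stores ISO-8859-1 bytes in a <code>string</code> and shows that length, indexing, and comparison work on raw bytes, while a <code>range</code> loop decodes UTF-8 and yields U+FFFD for the byte that is not valid UTF-8. The helper <code>countReplacement</code> is a hypothetical name used only for this illustration.</p>

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countReplacement (hypothetical helper) counts how many runes a range loop
// over s decodes as U+FFFD. Note that it cannot tell an invalid byte apart
// from a correctly encoded U+FFFD character.
func countReplacement(s string) int {
	n := 0
	for _, r := range s {
		if r == utf8.RuneError {
			n++
		}
	}
	return n
}

func main() {
	// "café" as ISO-8859-1 bytes, stored in a string without conversion.
	latin1 := string([]byte{'c', 'a', 'f', 0xE9})

	// Byte-oriented operations do not care about the encoding.
	fmt.Println(len(latin1))        // 4
	fmt.Println(latin1[3] == 0xE9)  // true
	fmt.Println(latin1 == "caf\xe9") // true

	// UTF-8-oriented operations do.
	fmt.Println(utf8.ValidString(latin1)) // false
	fmt.Println(countReplacement(latin1)) // 1
}
```

<p>Checking <code>utf8.ValidString</code> at the boundary is one way to get the warning that the language itself does not give when non-UTF-8 data is about to reach a rune-oriented loop.</p>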