Working with iso8859-1 without convertine to utf-8

xuanbao · 922 views

I use Python to read and write text files, comparing data in some files against data in others. They are all in ISO 8859-1. I'm considering Go for this, both for the speed and to get a single compiled binary to distribute. Can that be done in a smooth, simple way, or do you always have to convert to UTF-8 or handle the data as bytes? (Sorry for the typo: I meant "converting" in the headline.)

**Comments:**

lapingvino: UTF-8 is mostly a convention, not strictly enforced. As long as you don't do character-level fiddling, such as ranging over a string, you should be fine, as far as I know.

FUZxxl: ISO 8859-1 should be unproblematic, since it's an 8-bit encoding with no multi-byte sequences.

defererror: https://blog.golang.org/strings

> It's important to state right up front that a string holds arbitrary bytes. It is not required to hold Unicode text, UTF-8 text, or any other predefined format. As far as the content of a string is concerned, it is exactly equivalent to a slice of bytes.

hobbified: Huh, I never read that one. That's... an extremely wrong bit of design. If a string is only a sequence of bytes, there's no real point in having `string` and `[]byte` be distinct in the first place, and you're really no better off than with any of the '80s/'90s languages, with no type-safe way of knowing what the hell a string is.

BurningFox: `string`, unlike `[]byte`, is immutable and usually implies UTF-8.

hobbified: "Usually" is the worst. "Usually" is what leads to assumptions and grave bugs. Either make `string` appear as an immutable slice of runes regardless of its internal representation (maybe even going as far as Perl 6's Normal Form Grapheme to make true character-wise work easy) and relegate all non-Unicode work to `[]byte`, or else make it abundantly clear that UTF-8 isn't actually special.

But anyway, it's a moot point; it's obviously years too late to be having this argument :)

barsonme:

> Either make string appear as an immutable slice of runes regardless of its internal representation

It essentially is: https://play.golang.org/p/V7if8MR-_A

hobbified: Until you put non-UTF-8 data into the string and start getting garbage and U+FFFD runes out of that range loop. You've got one interface that expects UTF-8, one that doesn't care, and no warning of any kind when you bridge the gap.

slrz: Still, it's the right thing to do. A programming language whose string type can't hold file names is useless for systems programming. Think about it: you'd have to change basically the whole os package (not just file operations, but also `os.Args` and `Getenv`) to take byte slices instead of strings, just so you can write a simple cat program that doesn't fall apart when it encounters weird file names. It would make Go much more cumbersome to use. And how would you even signal errors when creating such then-malformed strings? Panic? Yuck.
