Working with ISO 8859-1 without converting to UTF-8

xuanbao · 881 views
<p>I use Python to read and write text files, comparing data in some text files to data in other text files. They are all in ISO 8859-1. I am considering Go for this, both to increase the speed and to have a single compiled binary to distribute. Can you do that in a smooth, simple way, or do you always have to convert to UTF-8 or handle the data as bytes?</p>
<hr/>**Comments:**<br/><br/>
lapingvino: <pre><p>UTF-8 is mostly a convention, not strictly enforced. As long as you don&#39;t do character fiddling like using <code>range</code> etc., you should be fine, afaik.</p></pre>
FUZxxl: <pre><p>ISO 8859-1 should be unproblematic, as it&#39;s an 8-bit encoding with no multi-byte sequences.</p></pre>
defererror: <pre><p><a href="https://blog.golang.org/strings" rel="nofollow">https://blog.golang.org/strings</a></p> <blockquote> <p>It&#39;s important to state right up front that a string holds <em>arbitrary</em> bytes. It is not required to hold Unicode text, UTF-8 text, or any other predefined format. As far as the content of a string is concerned, it is exactly equivalent to a slice of bytes.</p> </blockquote></pre>
hobbified: <pre><p>Huh, I never read that one. That&#39;s... an extremely wrong bit of design. If a string is only a sequence of bytes then there&#39;s no real point in having <code>string</code> and <code>[]byte</code> be distinct in the first place, and you&#39;re really not any better off than any of the 80s/90s languages, with no typesafe way of knowing what the hell a string <em>is</em>.</p></pre>
BurningFox: <pre><p><code>string</code>, unlike <code>[]byte</code>, is immutable and <em>usually</em> implies UTF-8.</p></pre>
hobbified: <pre><p>&#34;Usually&#34; is the worst. &#34;Usually&#34; is what leads to assumptions and grave bugs. Either make <code>string</code> appear as an immutable slice of <em>runes</em> regardless of its internal representation (maybe even going as far as Perl 6&#39;s Normal Form Grapheme to make true characterwise work easy), and relegate all non-Unicode work to <code>[]byte</code>, or else make it abundantly clear that UTF-8 isn&#39;t <em>actually</em> special.</p> <p>But anyway, it&#39;s a moot point, it&#39;s obviously years too late to be having this argument :)</p></pre>
barsonme: <pre><blockquote> <p>Either make string appear as an immutable slice of runes regardless of its internal representation </p> </blockquote> <p>It essentially is. <a href="https://play.golang.org/p/V7if8MR-_A" rel="nofollow">https://play.golang.org/p/V7if8MR-_A</a></p></pre>
hobbified: <pre><p>Until you put non-UTF-8 data into the string, and you start getting garbage and U+FFFD runes out of that <code>range</code> loop. You&#39;ve got one interface that expects UTF-8, one that doesn&#39;t care, and no kind of warning when you bridge the gap.</p></pre>
slrz: <pre><p>Still, it&#39;s the right thing to do. A programming language whose string type can&#39;t hold file names is useless for systems programming.</p> <p>Think about it: you&#39;d have to change basically the whole <code>os</code> package (not just file ops, but also <code>os.Args</code> and <code>Getenv</code>) to take byte slices instead of strings, just so you can write a simple cat program that doesn&#39;t fall apart when encountering weird file names. It&#39;d make Go much more cumbersome to use.</p> <p>How would you even signal errors that&#39;d occur when creating such then-malformed strings? Panic? Yuck.</p></pre>
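
The behavior discussed in the thread can be tried out with a small Go sketch. It is only an illustration, not something posted by the commenters: the helper names `latin1ToUTF8` and `utf8ToLatin1` and the sample bytes are hypothetical. It relies on the fact that every ISO 8859-1 byte value equals the Unicode code point of the character it encodes, so converting is a plain byte-to-rune mapping.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// latin1ToUTF8 (hypothetical helper) converts ISO 8859-1 bytes to a
// UTF-8 string by treating each byte value as a Unicode code point.
func latin1ToUTF8(b []byte) string {
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c)
	}
	return string(runes)
}

// utf8ToLatin1 (hypothetical helper) converts a UTF-8 string back to
// ISO 8859-1 bytes. It fails on any rune above U+00FF, which also
// covers the U+FFFD runes produced by invalid UTF-8 input.
func utf8ToLatin1(s string) ([]byte, error) {
	out := make([]byte, 0, len(s))
	for _, r := range s {
		if r > 0xFF {
			return nil, fmt.Errorf("rune %q is not representable in ISO 8859-1", r)
		}
		out = append(out, byte(r))
	}
	return out, nil
}

func main() {
	// "café" as stored in an ISO 8859-1 file: é is the single byte 0xE9.
	raw := []byte{'c', 'a', 'f', 0xE9}
	other := []byte{'c', 'a', 'f', 0xE9}

	// Byte-wise comparison works without any conversion.
	fmt.Println(string(raw) == string(other)) // true

	// The string holds the bytes fine, but they are not valid UTF-8.
	fmt.Println(utf8.Valid(raw)) // false

	// Ranging over the unconverted string decodes as UTF-8, so the
	// 0xE9 byte comes out as the replacement rune U+FFFD.
	for _, r := range string(raw) {
		fmt.Printf("%U ", r) // U+0063 U+0061 U+0066 U+FFFD
	}
	fmt.Println()

	// After conversion, é is the two-byte UTF-8 sequence for U+00E9.
	s := latin1ToUTF8(raw)
	fmt.Println(s, len(s)) // café 5

	// Converting back restores the original four bytes.
	back, err := utf8ToLatin1(s)
	fmt.Println(back, err) // [99 97 102 233] <nil>
}
```

For the original use case (comparing data between files that are all ISO 8859-1), this suggests no conversion is needed as long as the work stays byte-wise. Converting only matters once the program iterates by rune or passes the text to code that expects UTF-8. For real projects, the `golang.org/x/text/encoding/charmap` package provides an `ISO8859_1` encoding that does the same job as the hand-written helpers here.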
