Should beginners be taught with ASCII or go straight to UTF-8?

agolangf · 424 views


I'm working through the <a href="http://exercism.io/languages/go/exercises" rel="nofollow">Go exercises on exercism.io</a> and am surprised they're so ASCII-centric. Many of the exercise tests include no non-ASCII inputs, meaning that most people's submitted solutions will fail on a string that contains multi-byte characters. I'm going through and <a href="http://exercism.io/submissions/f7c67b0f88954619995cc2d284d3af2a" rel="nofollow">adding multi-byte characters to the tests myself</a>. Does anyone have any opinions about this? Is it OK for beginners to learn to handle strings in a Unicode-unsafe way at first? Or are they just picking up bad habits? Is Unicode-safe Go code just as easy to write as Unicode-unsafe code?

<hr/>**Comments:**

neoasterisk: <pre>If you write your Go code correctly then it should also be able to handle UTF-8. If it can only handle ASCII then it is simply wrong code. I don't think code that works for UTF-8 involves anything difficult for beginners, which is good. Nevertheless, understanding how strings work in Go can be tricky. I've read <a href="https://blog.golang.org/strings" rel="nofollow">https://blog.golang.org/strings</a> many times and there's still stuff I miss.</pre>

sin2pifx: <pre><blockquote> If you write your Go code correctly </blockquote> I think that's OP's point. Personally, I'd vote for teaching them both, but only if they're up to understanding the difference between ASCII, bytes, UTF-8 and Unicode. If they're not there yet, stick with UTF-8.</pre>

stuartcarnie: <pre>As per usual, it depends. I disagree with the statement that <blockquote> If it can only handle ASCII then it is simply wrong code </blockquote> What if the requirements of the program are to decode hex-encoded strings, parse a list of IPv4 addresses or sum Hindu-Arabic numerals? The input for these examples is required to be ASCII, and therefore the solutions can legitimately index the strings using byte offsets.
The program can still deal with unexpected data, as no valid multi-byte UTF-8 sequence includes bytes in the range 0x00-0x7f (ASCII). UTF-8 should be an entirely separate, lengthy discussion, including the APIs necessary for dealing with UTF-8 data.</pre>

ChristophBerger: <pre>I definitely vote for including UTF-8 in learning material right from the start. There are a couple of things that beginners might stumble over if they don't know about Unicode and the way Go handles it. For example: <code>for i := 0; i < len(str); i++</code> (byte-oriented) versus <code>for i, v := range str</code> (rune-oriented). Or <code>char := str[i]</code> where <code>i</code> might happen to be the index of one byte within a 3-byte UTF-8 character. The student would see garbage in <code>char</code> and would have no idea why.</pre>

Guitarbum722: <pre>It would be silly not to cover UTF-8. It is the most widely used encoding now. Plus, all of the ASCII characters are completely handled in UTF-8 with no problem whatsoever. It isn't a difficult concept; it just needs to be explained well to the person learning. A programmer will encounter it in their first week on the job, so why not prepare them?</pre>

Guitarbum722: <pre>Also string literals in Golang are UTF-8. </pre>

rozzlapede: <pre>I'm working on a technology-independent curriculum for web developer training and I'm putting the Unicode / UTF-8 material up front after HTML, alongside JavaScript and CSS. Not because they need to be experts in it, but because the class of bugs related to character encoding can appear in just about any context and can be especially difficult for beginners to troubleshoot on their own.
Here are some tools I've found to help with exercises:
That great technophile video: <a href="https://youtu.be/MijmeoH9LT4" rel="nofollow">https://youtu.be/MijmeoH9LT4</a>
UniView for copy-paste inspection of Unicode text: <a href="https://r12a.github.io/uniview/" rel="nofollow">https://r12a.github.io/uniview/</a>
Mimic for generating Unicode-bug-ridden syntax: <a href="https://github.com/reinderien/mimic" rel="nofollow">https://github.com/reinderien/mimic</a></pre>

sairamk: <pre>I think the material should mention that UTF-8 exists, and why and how it's useful for non-ASCII scenarios, and provide examples or questions where it can be applied, such as regular expressions. <del>The definition of <code>rune</code> actually covers UTF-8.</del> Edit: strike through. See comment thread below.</pre>

4ad: <pre><blockquote> The definition of <code>rune</code> actually covers UTF-8. </blockquote> No, rune represents Unicode code points, which have nothing to do with UTF-8.</pre>

driusan: <pre>Unicode code points have something to do with UTF-8...</pre>

4ad: <pre>No, they don't. UTF-8 is one possible encoding for Unicode.</pre>

driusan: <pre>Your second sentence contradicts your first one.</pre>

4ad: <pre>No, it does not. I recommend you try to understand the relation between things before speaking up. In particular relations between cause and effect, the relation between multiple levels of abstractions, and the relation between concepts and the representation of concepts.</pre>

peterbourgon: <pre>Yo dude, chill ✌️</pre>
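
For readers following the thread, here is a minimal sketch of the byte-oriented versus rune-oriented distinction that ChristophBerger describes. The sample string `a€b` and the helper names `byteView` and `runeView` are made up for illustration and do not come from the thread or from exercism.io. It relies only on the standard `fmt` and `unicode/utf8` packages.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteView (hypothetical helper) collects what an index loop sees:
// one byte per iteration, regardless of character boundaries.
func byteView(s string) []byte {
	out := make([]byte, 0, len(s))
	for i := 0; i < len(s); i++ {
		out = append(out, s[i]) // s[i] is a single byte, not a character
	}
	return out
}

// runeView (hypothetical helper) collects what a range loop sees:
// one decoded Unicode code point per iteration.
func runeView(s string) []rune {
	var out []rune
	for _, r := range s { // range decodes one UTF-8 sequence at a time
		out = append(out, r)
	}
	return out
}

func main() {
	s := "a€b" // '€' (U+20AC) is encoded as three bytes in UTF-8: e2 82 ac

	// Byte count versus code point count.
	fmt.Println(len(s), utf8.RuneCountInString(s)) // 5 3

	fmt.Printf("% x\n", byteView(s)) // 61 e2 82 ac 62
	fmt.Printf("%q\n", runeView(s))  // ['a' '€' 'b']

	// Index 2 falls in the middle of '€'; that byte alone is not valid UTF-8.
	fmt.Println(utf8.ValidString(s[2:3])) // false
}
```

This also lines up with stuartcarnie's point: the bytes for `a` and `b` (0x61, 0x62) never appear inside the multi-byte sequence for `€`, so byte indexing remains safe when the input is known to be ASCII.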
