Can I retrieve the name and other properties of a rune?

agolangf · 746 views
<p>Is there a package that exposes functions returning the name of a rune and other properties such as its category, e.g. <code>func Name(r rune) string</code>?</p> <p>I access these properties in Python via <code>import unicodedata</code>. In C, I can <code>#include &lt;uniname.h&gt;</code> and link with <code>libunistring</code>.</p> <p>I would prefer to get the name, category, etc. for a rune instead of using predicates like <code>unicode.IsLetter(r)</code>.</p> <hr/>**Comments:**<br/><br/>
restmile: <pre><p>I&#39;m not aware of any existing libraries for querying the UCD which contain the names of the codepoints. But the XML version is pretty easy to parse if you need the names and other data that isn&#39;t present in package <code>unicode</code>:</p> <p><a href="http://www.unicode.org/Public/7.0.0/ucdxml/" rel="nofollow">http://www.unicode.org/Public/7.0.0/ucdxml/</a></p></pre>
yaxriifgyn: <pre><p>I&#39;ve parsed the unicode.org files in the past, for something related to a project that needed English, Arabic and Chinese languages in the GUI. The details are mostly lost now.</p></pre>
_blob: <pre><p>You could write a script which extracts the hex values and names from <a href="http://unicode.org/Public/UNIDATA/NamesList.txt" rel="nofollow">http://unicode.org/Public/UNIDATA/NamesList.txt</a> and generates a map like:</p> <pre><code>var m = map[rune]string{
    0x0030: &#34;DIGIT ZERO&#34;,
    0x0031: &#34;DIGIT ONE&#34;,
    ...
}

func Name(r rune) string {
    ... // some error handling, if the rune doesn&#39;t exist
    return m[r]
}
</code></pre></pre>
jeffrallen: <pre><p>Package unicode in the standard library gives you the tools to identify the range a rune falls in. The actual Unicode names are not available, however. You would need to use the tools in golang.org/x/text to parse the Unicode database to find them.</p></pre>
FUZxxl: <pre><p>You chose a bad example; as far as I know, the Mongolian writing system doesn&#39;t distinguish between upper case and lower case.</p></pre>
hobbified: <pre><p><a href="https://github.com/cooperhewitt/go-ucd" rel="nofollow">Here&#39;s the only thing I can find</a>: it gives you names, but that&#39;s it. What would be really nice is a library that supported all the good stuff, and that could go:generate itself from the UCD files. I might give it a try if I get the time this week.</p></pre>
yaxriifgyn: <pre><p>This project is interesting. The voluminous Unicode data resides in a server process that you can start up as required. I would have thought to use an on-disk database rather than place the data in process memory. Once you choose a multi-process solution, a server in Python could serve up the data without the need to parse and load the source data files.</p></pre>
hobbified: <pre><p>Well, it also just has a library.</p></pre>
yaxriifgyn: <pre><p>I&#39;ve had a chance to play around with it now. It&#39;s the</p></pre>

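For the category side of the question, the range tables in the standard `unicode` package mentioned in the thread can be searched directly. The following is a sketch only: the function names `Category` and `Script` are hypothetical, and the lookup is a linear scan over `unicode.Categories` and `unicode.Scripts`, which is simple but not fast.

```go
package main

import (
	"fmt"
	"unicode"
)

// Category returns the two-letter general category of r (for example "Lu"),
// or "" if no table in unicode.Categories contains it.
func Category(r rune) string {
	for name, table := range unicode.Categories {
		// Skip one-letter groups such as "L", and the "LC" (cased letter)
		// union that newer Go releases include, so the answer is unambiguous.
		if len(name) != 2 || name == "LC" {
			continue
		}
		if unicode.Is(table, r) {
			return name
		}
	}
	return ""
}

// Script returns the name of the script table containing r, or "".
func Script(r rune) string {
	for name, table := range unicode.Scripts {
		if unicode.Is(table, r) {
			return name
		}
	}
	return ""
}

func main() {
	// U+1820 is MONGOLIAN LETTER A, a letter with no case distinction.
	for _, r := range []rune{'A', '0', 0x1820} {
		fmt.Printf("%U category=%s script=%s\n", r, Category(r), Script(r))
	}
}
```

This gives a value back instead of a predicate, which is closer to what the question asks for, but it still does not provide character names; those have to come from the UCD files.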