ASCII to different Unicode fonts

xuanbao · 978 views
This is a shared resource; the information in it may have since evolved or changed.
I've created a toy program which takes an ASCII string and maps it onto the Unicode Mathematical Alphanumeric Symbols block to produce the same string in different "fonts".

https://play.golang.org/p/vyzUGXV8gel

I'd be interested to hear suggestions for improvement, e.g. how to make it more efficient.

---

**Comments:**

LukeShu:

**main**

Not that this is a performance-critical program, but calling `fmt.Printf()` in the tight loop is bad. That's a syscall for every character!

I suspect your fastest way is going to be allocating a `utf8.RuneCountInString(str)`-length `[]rune`, filling it, then converting it with `string()`. Or, if you can assume the input is all ASCII, and if you assume the output runes are of uniform width (consult UTF-8 to see if this is true), you can maybe beat that by doing the UTF-8 encoding yourself and allocating a `[]byte` of the correct length up front.

Or maybe use `strings.Builder`.

Measure it!

**GetRune**

Why use `'\u'` notation instead of just using ASCII?

Why loop, instead of just using arithmetic? That's making an O(n) operation out of something that should be O(1).

LukeShu:

Here's a benchmark comparing your `GetRune` against an arithmetic-based version:

```go
package main

import (
	"testing"
)

// First code point (capital A) of each alphabet in the
// Mathematical Alphanumeric Symbols block.
const (
	SwirlStart     rune = '\U0001D4D0' // bold script
	DoubleBarStart rune = '\U0001D538' // double-struck
	GothStart      rune = '\U0001D504' // Fraktur
	SansSerifStart rune = '\U0001D5A0' // sans-serif
	MonospaceStart rune = '\U0001D670' // monospace
)

var charsets = []rune{
	SwirlStart,
	DoubleBarStart,
	GothStart,
	SansSerifStart,
	MonospaceStart,
}

// Result is a package-level sink so the compiler cannot discard the calls being measured.
var Result rune

func BenchmarkGetRune(b *testing.B) {
	b.Run("Loop", func(b *testing.B) { benchmarkGetRune(b, GetRuneLoop) })
	b.Run("Arith", func(b *testing.B) { benchmarkGetRune(b, GetRuneArith) })
}

// Each iteration maps every printable ASCII character (' ' through '~')
// with one of the charsets, cycling through them.
func benchmarkGetRune(b *testing.B, getRune func(c, s rune) rune) {
	var r rune
	for i := 0; i < b.N; i++ {
		charset := charsets[i%len(charsets)]
		for c := ' '; c <= '~'; c++ {
			r = getRune(c, charset)
		}
	}
	Result = r
}

// GetRuneLoop is the original approach: walk A-Z, then a-z, counting until c is found.
func GetRuneLoop(c, start rune) rune {
	var offset rune
	// ascii A-Z
	for i := '\u0041'; i <= '\u005A'; i++ {
		if c == i {
			return start + offset
		}
		offset++
	}
	// ascii a-z
	for i := '\u0061'; i <= '\u007A'; i++ {
		if c == i {
			return start + offset
		}
		offset++
	}
	return c
}

// GetRuneArith computes the offset directly: capitals take the first
// 26 slots, lower-case letters the next 26; anything else is unchanged.
func GetRuneArith(c, start rune) rune {
	switch {
	case 'A' <= c && c <= 'Z':
		return c - 'A' + start
	case 'a' <= c && c <= 'z':
		return c - 'a' + start + 26
	default:
		return c
	}
}
```

Here's what my laptop gave:

```
$ go test -bench=. -benchtime 20s convert_test.go
goos: linux
goarch: 386
BenchmarkGetRune/Loop-2          3000000      8376 ns/op
BenchmarkGetRune/Arith-2        30000000       916 ns/op
PASS
ok      command-line-arguments  126.872s
```

Almost a 10x improvement! And clearer code too!

LukeShu:

And my intuition on the main part was right too:

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"testing"
	"unicode/utf8"
)

const (
	SwirlStart     rune = '\U0001D4D0'
	DoubleBarStart rune = '\U0001D538'
	GothStart      rune = '\U0001D504'
	SansSerifStart rune = '\U0001D5A0'
	MonospaceStart rune = '\U0001D670'
)

var charsets = []rune{
	SwirlStart,
	DoubleBarStart,
	GothStart,
	SansSerifStart,
	MonospaceStart,
}

// Output goes to file descriptor 3, which the go test command lines
// below redirect (3>/dev/null or 3>/tmp/out.txt).
var file = os.NewFile(3, "file")

func BenchmarkConvert(b *testing.B) {
	b.Run("Printf", func(b *testing.B) { benchmarkConvert(b, ConvertPrintf) })
	b.Run("Runes", func(b *testing.B) { benchmarkConvert(b, ConvertRunes) })
	b.Run("Builder", func(b *testing.B) { benchmarkConvert(b, ConvertBuilder) })
}

func benchmarkConvert(b *testing.B, convert func(string, rune)) {
	str := "The quick brown fox jumped over the lazy dog"
	for i := 0; i < b.N; i++ {
		convert(str, charsets[i%len(charsets)])
	}
}

// One Fprintf, and so one write system call, per character.
func ConvertPrintf(asciiString string, charset rune) {
	for _, c := range asciiString {
		fmt.Fprintf(file, "%c", GetRune(c, charset))
	}
	fmt.Fprintln(file)
}

// Fill a []rune sized by rune count, then write the whole line once.
func ConvertRunes(asciiString string, charset rune) {
	out := make([]rune, utf8.RuneCountInString(asciiString))
	i := 0
	for _, c := range asciiString {
		out[i] = GetRune(c, charset)
		i++
	}
	fmt.Fprintln(file, string(out))
}

// Append runes to a strings.Builder, then write the whole line once.
func ConvertBuilder(asciiString string, charset rune) {
	var b strings.Builder
	for _, c := range asciiString {
		b.WriteRune(GetRune(c, charset))
	}
	fmt.Fprintln(file, b.String())
}

func GetRune(c, start rune) rune {
	switch {
	case 'A' <= c && c <= 'Z':
		return c - 'A' + start
	case 'a' <= c && c <= 'z':
		return c - 'a' + start + 26
	default:
		return c
	}
}
```

which gave me:

```
$ go test -bench=. convert_test.go 3>/dev/null
goos: linux
goarch: 386
BenchmarkConvert/Printf-2          20000     67233 ns/op
BenchmarkConvert/Runes-2          300000      4831 ns/op
BenchmarkConvert/Builder-2        300000      5550 ns/op
PASS
ok      command-line-arguments  5.262s
```

and the improvement is even more dramatic if we use something that doesn't just discard the result:

```
$ go test -bench=. convert_test.go 3>/tmp/out.txt
goos: linux
goarch: 386
BenchmarkConvert/Printf-2          10000    105420 ns/op
BenchmarkConvert/Runes-2          200000      6149 ns/op
BenchmarkConvert/Builder-2        200000      6899 ns/op
PASS
ok      command-line-arguments  3.836s
```

porjo38:

Thanks, that's great!

> Why use '\u' notation instead of just using ASCII?

Because I wasn't fully grasping that runes are just `int32`! I actually started out using hex (e.g. `c == 0x41`) but was getting an error about mismatched types (rune vs int), so I went back to `'\u0000'` notation.

> Why loop, instead of just using arithmetic?

Given the above misunderstanding, this seemed necessary at the time.

nevyn:

It's Python, not Go, but you might appreciate: https://github.com/reinderien/mimic
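LukeShu's third option, encoding UTF-8 by hand into a `[]byte` allocated at exactly the right length, wasn't benchmarked in the thread. Below is a minimal sketch of what it might look like, assuming all-ASCII input. It relies on one UTF-8 fact that answers the "uniform width" question. Every code point above U+FFFF, which includes all of the alphabets used here, encodes to exactly 4 bytes, while characters passed through unchanged stay at 1 byte. That means `utf8.RuneLen` gives the exact size in a first pass. The name `convertBytes` is made up for illustration, and `getRune` is the arithmetic version from the thread.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// MonospaceStart is the same constant the thread uses (MATHEMATICAL MONOSPACE CAPITAL A).
const MonospaceStart rune = '\U0001D670'

// getRune is the arithmetic mapping from the thread: A-Z become start..start+25,
// a-z become start+26..start+51, and anything else is returned unchanged.
func getRune(c, start rune) rune {
	switch {
	case 'A' <= c && c <= 'Z':
		return c - 'A' + start
	case 'a' <= c && c <= 'z':
		return c - 'a' + start + 26
	default:
		return c
	}
}

// convertBytes (hypothetical name) assumes ASCII input, so it indexes bytes
// instead of decoding runes. Pass 1 computes the exact output size: 4 bytes
// for each mapped letter (all are above U+FFFF), 1 for pass-through ASCII.
// Pass 2 encodes every rune directly into the pre-sized buffer.
func convertBytes(ascii string, start rune) string {
	n := 0
	for i := 0; i < len(ascii); i++ {
		n += utf8.RuneLen(getRune(rune(ascii[i]), start))
	}
	out := make([]byte, n)
	pos := 0
	for i := 0; i < len(ascii); i++ {
		pos += utf8.EncodeRune(out[pos:], getRune(rune(ascii[i]), start))
	}
	return string(out)
}

func main() {
	fmt.Println(convertBytes("The quick brown fox", MonospaceStart))
}
```

If the caller only needs to write the result, returning `out` itself and passing it straight to `file.Write` would avoid the final copy made by `string(out)`. Whether any of this beats the `[]rune` version would have to be measured the same way as above.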
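In the thread's numbers, `ConvertBuilder` came out slightly slower than `ConvertRunes`. One plausible factor, not measured here, is that a zero-value `strings.Builder` grows its buffer several times as runes are appended. The sketch below is the same function with a single up-front `Grow` call. `convertBuilderGrow` is a hypothetical name, and the capacity bound relies only on the fact that no rune takes more than 4 bytes in UTF-8.

```go
package main

import (
	"fmt"
	"strings"
)

// SansSerifStart is the same constant the thread uses (MATHEMATICAL SANS-SERIF CAPITAL A).
const SansSerifStart rune = '\U0001D5A0'

// getRune is the arithmetic mapping from the thread.
func getRune(c, start rune) rune {
	switch {
	case 'A' <= c && c <= 'Z':
		return c - 'A' + start
	case 'a' <= c && c <= 'z':
		return c - 'a' + start + 26
	default:
		return c
	}
}

// convertBuilderGrow (hypothetical name) is ConvertBuilder plus one Grow call.
// The input has at most len(s) runes and each output rune needs at most 4 bytes,
// so after Grow(4*len(s)) the appends below never reallocate, and String()
// returns the builder's buffer without copying it.
func convertBuilderGrow(s string, start rune) string {
	var b strings.Builder
	b.Grow(4 * len(s))
	for _, c := range s {
		b.WriteRune(getRune(c, start))
	}
	return b.String()
}

func main() {
	fmt.Println(convertBuilderGrow("The quick brown fox jumped over the lazy dog", SansSerifStart))
}
```

It could be dropped into LukeShu's `BenchmarkConvert` as a fourth `b.Run` case (via a small wrapper that prints the result to `file`) to see whether the single allocation closes the gap.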
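Neither benchmark checks whether every computed code point is actually a character. The arithmetic assumes each alphabet occupies 52 consecutive code points. That holds for the bold script (`SwirlStart`), sans-serif and monospace ranges. The Fraktur (`GothStart`) and double-struck (`DoubleBarStart`) capitals, however, have reserved gaps:

- Fraktur: C, H, I, R and Z
- Double-struck: C, H, N, P, Q, R and Z

These letters were already encoded in the Letterlike Symbols block (ℂ is U+2102, ℝ is U+211D, and so on), so their slots in the math block are left empty. The sketch below patches the holes with a lookup table and assumes nothing else about the original program. `letterlike` and `getRuneFixed` are hypothetical names.

```go
package main

import "fmt"

const (
	GothStart      rune = '\U0001D504' // MATHEMATICAL FRAKTUR CAPITAL A
	DoubleBarStart rune = '\U0001D538' // MATHEMATICAL DOUBLE-STRUCK CAPITAL A
)

// getRune is the arithmetic mapping from the thread.
func getRune(c, start rune) rune {
	switch {
	case 'A' <= c && c <= 'Z':
		return c - 'A' + start
	case 'a' <= c && c <= 'z':
		return c - 'a' + start + 26
	default:
		return c
	}
}

// letterlike (hypothetical name) maps each reserved code point the arithmetic
// can land on to the letter Unicode encodes instead in the Letterlike Symbols
// block (U+2100-U+214F).
var letterlike = map[rune]rune{
	GothStart + 'C' - 'A': '\u212D', // Fraktur C
	GothStart + 'H' - 'A': '\u210C', // Fraktur H
	GothStart + 'I' - 'A': '\u2111', // Fraktur I
	GothStart + 'R' - 'A': '\u211C', // Fraktur R
	GothStart + 'Z' - 'A': '\u2128', // Fraktur Z

	DoubleBarStart + 'C' - 'A': '\u2102', // double-struck C
	DoubleBarStart + 'H' - 'A': '\u210D', // double-struck H
	DoubleBarStart + 'N' - 'A': '\u2115', // double-struck N
	DoubleBarStart + 'P' - 'A': '\u2119', // double-struck P
	DoubleBarStart + 'Q' - 'A': '\u211A', // double-struck Q
	DoubleBarStart + 'R' - 'A': '\u211D', // double-struck R
	DoubleBarStart + 'Z' - 'A': '\u2124', // double-struck Z
}

// getRuneFixed (hypothetical name) applies the arithmetic mapping, then swaps
// in the Letterlike Symbols character if the result is one of the known holes.
func getRuneFixed(c, start rune) rune {
	r := getRune(c, start)
	if fixed, ok := letterlike[r]; ok {
		return fixed
	}
	return r
}

func main() {
	var out []rune
	for _, c := range "Complex Reals" {
		out = append(out, getRuneFixed(c, DoubleBarStart))
	}
	fmt.Println(string(out))
}
```

The map is consulted only after the O(1) arithmetic, so lookups stay cheap. A `switch` on the 12 reserved values would work just as well.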
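porjo38's type error is worth unpacking, because it is why the loop existed at all. `rune` is an alias for `int32`, and literals such as `'A'`, `'\u0041'` and `0x41` are all untyped constants, so `c == 0x41` compiles when `c` is a rune. The mismatch most likely came from a variable whose type was inferred as `int`, such as a loop counter declared with `i := 0x41`. That is a guess, since the original code isn't shown. The sketch below keeps the loop from the original `GetRune` but gives the counter an explicit `rune` type. `isUpperASCII` and `offsetInAlphabet` are illustrative names only.

```go
package main

import "fmt"

// isUpperASCII compares a rune against untyped constants. 0x41 and 'Z' are
// both untyped, so they take on c's type (rune, i.e. int32) and the mix of
// notations compiles fine.
func isUpperASCII(c rune) bool {
	return c >= 0x41 && c <= 'Z'
}

// offsetInAlphabet (hypothetical name) is the original GetRune loop written
// with hex bounds. The counter must be a rune: with `for i := 0x41; ...` the
// compiler infers i as int, and `c == i` is rejected as a comparison of
// mismatched types (rune and int).
func offsetInAlphabet(c rune) int {
	for i := rune(0x41); i <= 0x5A; i++ {
		if c == i {
			return int(i - 0x41)
		}
	}
	return -1
}

func main() {
	c := 'H'
	fmt.Println(isUpperASCII(c), offsetInAlphabet(c), c-'A', c == 72)
}
```

Once it's clear that `c - 'A'` is plain integer subtraction on `int32` values, the loop in `offsetInAlphabet` reduces to that one expression, which is what `GetRuneArith` does.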
