ASCII to different Unicode fonts

xuanbao · 978 views

This resource was shared some time ago; the information in it may have since evolved or changed.

I've created a toy program which takes an ASCII string and maps it onto the Unicode Mathematical Alphanumeric Symbols block to produce the same string in different fonts.

<a href="https://play.golang.org/p/vyzUGXV8gel" rel="nofollow">https://play.golang.org/p/vyzUGXV8gel</a>

I'd be interested to hear suggestions for improvement, e.g. how to make it more efficient.

<hr/>**Comments:**

LukeShu: <pre>
<code>main</code>:

Not that this is a performance-critical program, but calling <code>fmt.Printf()</code> in the tight loop is bad. That's a syscall for every character!

I suspect your fastest way is going to be allocating a <code>utf8.RuneCountInString(str)</code>-length <code>[]rune</code>, filling it, then converting it with <code>string()</code>. Or, if you can assume input is all-ASCII, and if you assume the output runes are of uniform width (consult UTF-8 to see if this is true), you can maybe beat that by doing UTF-8 directly and allocating a <code>[]byte</code> of the correct length up-front. Or maybe using <code>strings.Builder</code>. Measure it!

<code>GetRune</code>:

Why use '\u' notation instead of just using ASCII?

Why loop, instead of just using arithmetic? The loop turns what should be an O(1) lookup into an O(n) scan over the alphabet.</pre>

LukeShu: <pre>Here's a benchmark comparing your GetRune vs an arithmetic-based version:

<pre><code>package main

import (
	"testing"
)

// First code point (capital A) of each Unicode math alphabet.
const (
	SwirlStart     rune = '\U0001D4D0'
	DoubleBarStart rune = '\U0001D538'
	GothStart      rune = '\U0001D504'
	SansSerifStart rune = '\U0001D5A0'
	MonospaceStart rune = '\U0001D670'
)

var charsets = []rune{
	SwirlStart,
	DoubleBarStart,
	GothStart,
	SansSerifStart,
	MonospaceStart,
}

// Result keeps the compiler from optimizing the benchmarked calls away.
var Result rune

func BenchmarkGetRune(b *testing.B) {
	b.Run("Loop", func(b *testing.B) { benchmarkGetRune(b, GetRuneLoop) })
	b.Run("Arith", func(b *testing.B) { benchmarkGetRune(b, GetRuneArith) })
}

func benchmarkGetRune(b *testing.B, getRune func(c, s rune) rune) {
	var r rune
	for i := 0; i < b.N; i++ {
		charset := charsets[i%len(charsets)]
		// Each iteration converts every printable ASCII character once.
		for c := ' '; c <= '~'; c++ {
			r = getRune(c, charset)
		}
	}
	Result = r
}

func GetRuneLoop(c, start rune) rune {
	var offset rune
	// ascii A-Z
	for i := '\u0041'; i <= '\u005A'; i++ {
		if c == i {
			return start + offset
		}
		offset++
	}
	// ascii a-z
	for i := '\u0061'; i <= '\u007A'; i++ {
		if c == i {
			return start + offset
		}
		offset++
	}
	return c
}

func GetRuneArith(c, start rune) rune {
	switch {
	case 'A' <= c && c <= 'Z':
		return c - 'A' + start
	case 'a' <= c && c <= 'z':
		return c - 'a' + start + 26
	default:
		return c
	}
}
</code></pre>

Here's what my laptop gave:

<pre><code>$ go test -bench=. -benchtime 20s convert_test.go
goos: linux
goarch: 386
BenchmarkGetRune/Loop-2       3000000      8376 ns/op
BenchmarkGetRune/Arith-2     30000000       916 ns/op
PASS
ok      command-line-arguments  126.872s
</code></pre>

Almost a 10x improvement! And clearer code too!</pre>

LukeShu: <pre>And my intuition on the main part was right too:

<pre><code>package main

import (
	"fmt"
	"os"
	"strings"
	"testing"
	"unicode/utf8"
)

const (
	SwirlStart     rune = '\U0001D4D0'
	DoubleBarStart rune = '\U0001D538'
	GothStart      rune = '\U0001D504'
	SansSerifStart rune = '\U0001D5A0'
	MonospaceStart rune = '\U0001D670'
)

var charsets = []rune{
	SwirlStart,
	DoubleBarStart,
	GothStart,
	SansSerifStart,
	MonospaceStart,
}

// Output goes to file descriptor 3, which the shell redirects (3>...).
var file = os.NewFile(3, "file")

func BenchmarkConvert(b *testing.B) {
	b.Run("Printf", func(b *testing.B) { benchmarkConvert(b, ConvertPrintf) })
	b.Run("Runes", func(b *testing.B) { benchmarkConvert(b, ConvertRunes) })
	b.Run("Builder", func(b *testing.B) { benchmarkConvert(b, ConvertBuilder) })
}

func benchmarkConvert(b *testing.B, convert func(string, rune)) {
	str := "The quick brown fox jumped over the lazy dog"
	for i := 0; i < b.N; i++ {
		convert(str, charsets[i%len(charsets)])
	}
}

// ConvertPrintf writes one rune per Fprintf call, like the original program.
func ConvertPrintf(asciiString string, charset rune) {
	for _, c := range asciiString {
		fmt.Fprintf(file, "%c", GetRune(c, charset))
	}
	fmt.Fprintln(file)
}

// ConvertRunes fills a pre-sized []rune and writes it out in one call.
func ConvertRunes(asciiString string, charset rune) {
	out := make([]rune, utf8.RuneCountInString(asciiString))
	i := 0
	for _, c := range asciiString {
		out[i] = GetRune(c, charset)
		i++
	}
	fmt.Fprintln(file, string(out))
}

// ConvertBuilder accumulates the output in a strings.Builder.
func ConvertBuilder(asciiString string, charset rune) {
	var b strings.Builder
	for _, c := range asciiString {
		b.WriteRune(GetRune(c, charset))
	}
	fmt.Fprintln(file, b.String())
}

func GetRune(c, start rune) rune {
	switch {
	case 'A' <= c && c <= 'Z':
		return c - 'A' + start
	case 'a' <= c && c <= 'z':
		return c - 'a' + start + 26
	default:
		return c
	}
}
</code></pre>

which gave me:

<pre><code>$ go test -bench=. convert_test.go 3>/dev/null
goos: linux
goarch: 386
BenchmarkConvert/Printf-2        20000     67233 ns/op
BenchmarkConvert/Runes-2        300000      4831 ns/op
BenchmarkConvert/Builder-2      300000      5550 ns/op
PASS
ok      command-line-arguments  5.262s
</code></pre>

and the improvement is even more dramatic if we use something that doesn't just discard the result:

<pre><code>$ go test -bench=. convert_test.go 3>/tmp/out.txt
goos: linux
goarch: 386
BenchmarkConvert/Printf-2        10000    105420 ns/op
BenchmarkConvert/Runes-2        200000      6149 ns/op
BenchmarkConvert/Builder-2      200000      6899 ns/op
PASS
ok      command-line-arguments  3.836s
</code></pre></pre>

porjo38: <pre>Thanks, that's great!

<blockquote> Why use '\u' notation instead of just using ASCII? </blockquote>

Because I wasn't fully grasping that runes are just int32! I actually started out using hex (e.g. <code>c == 0x41</code>) but was getting an error about mismatched types (rune vs int), so I went back to '\u0000' notation.

<blockquote> Why loop, instead of just using arithmetic? </blockquote>

Given the above misunderstanding, this seemed necessary at the time.</pre>

nevyn: <pre>It's Python, not Go, but you might appreciate: <a href="https://github.com/reinderien/mimic" rel="nofollow">https://github.com/reinderien/mimic</a></pre>
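The thread benchmarks the `[]rune` and `strings.Builder` variants but not the `[]byte` idea LukeShu mentions, so here is a rough sketch of what that could look like. One thing to note: the output is not of uniform width, because every code point at or above U+10000 takes 4 bytes in UTF-8 while untouched ASCII stays at 1 byte. The sketch therefore sizes the buffer for the worst case instead of the exact length. `ConvertBytes` is a made-up name, and the all-ASCII input is an assumption; no timing is claimed for it, so measure before relying on it.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// SansSerifStart is MATHEMATICAL SANS-SERIF CAPITAL A, as in the thread.
const SansSerifStart rune = '\U0001D5A0'

// GetRune is the arithmetic mapping from the thread: ASCII letters are
// shifted into the alphabet starting at start, everything else is kept.
func GetRune(c, start rune) rune {
	switch {
	case 'A' <= c && c <= 'Z':
		return c - 'A' + start
	case 'a' <= c && c <= 'z':
		return c - 'a' + start + 26
	default:
		return c
	}
}

// ConvertBytes is a hypothetical helper, not part of the original program.
// Assumption: the input is all ASCII, so each input byte is one character
// and produces at most utf8.UTFMax (4) bytes of output.
func ConvertBytes(ascii string, start rune) string {
	// Worst-case buffer, allocated once up front.
	buf := make([]byte, utf8.UTFMax*len(ascii))
	n := 0
	for i := 0; i < len(ascii); i++ {
		// EncodeRune writes the UTF-8 form and returns the bytes written.
		n += utf8.EncodeRune(buf[n:], GetRune(rune(ascii[i]), start))
	}
	// The string conversion copies only the bytes actually used.
	return string(buf[:n])
}

func main() {
	fmt.Println(ConvertBytes("Hello, World!", SansSerifStart))
}
```

Returning the string instead of printing inside the function keeps the single write at the call site, which is the same property that made the `Runes` and `Builder` variants fast in the numbers above.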

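On the rune vs int error porjo38 mentions: `rune` is an alias for `int32`, and an untyped constant such as `0x41` converts to it implicitly, so `c == 0x41` compiles when `c` is a rune. The code that produced the error is not shown in the thread, so the following is only a guess at the cause: a value that had already been given the type `int`. The helper names are made up for illustration.

```go
package main

import "fmt"

// IsUpperA reports whether c is ASCII 'A'. The untyped constant 0x41
// takes on the type rune here, so no conversion is needed.
func IsUpperA(c rune) bool {
	return c == 0x41
}

// Shift is a hypothetical helper that adds a typed int offset to a rune.
// Writing c + offset directly would not compile (mismatched types rune
// and int), so the offset is converted explicitly.
func Shift(c rune, offset int) rune {
	return c + rune(offset)
}

func main() {
	var c rune = 'A'
	// Hex, \u escape, and decimal all denote the same int32 value.
	fmt.Println(c == 0x41, c == '\u0041', c == 65)
	fmt.Println(IsUpperA(c))
	fmt.Printf("%c\n", Shift(c, 25))
}
```

This is also why the arithmetic version of `GetRune` works without any conversions: `c - 'A' + start` mixes runes with an untyped rune constant only.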