<p>I've created a toy program that takes an ASCII string and maps it onto the Unicode mathematical alphanumeric symbols to produce the same string in different fonts.</p>
<p><a href="https://play.golang.org/p/vyzUGXV8gel" rel="nofollow">https://play.golang.org/p/vyzUGXV8gel</a></p>
<p>I'd be interested to hear suggestions for improvement, e.g. how to make it more efficient.</p>
<hr/>**Comments:**<br/><br/>LukeShu: <pre><p><strong>main</strong></p>
<p>Not that this is a performance-critical program, but calling <code>fmt.Printf()</code> in the tight loop is bad. That's a syscall for every character!</p>
<p>I suspect your fastest way is going to be allocating a <code>utf8.RuneCountInString(str)</code>-length <code>[]rune</code>, filling it, then converting it with <code>string()</code>. Or, if you can assume input is all-ASCII, and if you assume the output runes are of uniform width (consult UTF-8 to see if this is true), you can maybe beat that by doing UTF-8 directly and allocating a <code>[]byte</code> of the correct length up-front.</p>
<p>Or maybe using <code>strings.Builder</code>.</p>
<p>Measure it!</p>
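<p>For reference, here is a minimal sketch of the <code>[]byte</code> idea, not a measured result. It assumes Go 1.18 or later (for <code>utf8.AppendRune</code>) and all-ASCII input. The charset start points used in this thread are all above U+FFFF, so every mapped letter encodes to exactly 4 bytes in UTF-8, but spaces, digits and punctuation pass through as 1 byte, so the output is not strictly uniform width. The sketch therefore reserves the worst case up front instead of computing an exact length. The names <code>convertBytes</code> and <code>getRune</code> are illustrative only.</p>

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// MonospaceStart is the code point of the mathematical monospace capital A.
const MonospaceStart rune = '\U0001D670'

// getRune maps ASCII letters onto the charset beginning at start and
// returns every other character unchanged.
func getRune(c, start rune) rune {
	switch {
	case 'A' <= c && c <= 'Z':
		return c - 'A' + start
	case 'a' <= c && c <= 'z':
		return c - 'a' + start + 26
	default:
		return c
	}
}

// convertBytes assumes all-ASCII input (one byte per character) and encodes
// the result as UTF-8 directly into a byte slice. The capacity is reserved
// up front for the worst case of 4 bytes per input byte, so the appends in
// the loop never reallocate.
func convertBytes(ascii string, start rune) string {
	out := make([]byte, 0, len(ascii)*utf8.UTFMax)
	for i := 0; i < len(ascii); i++ {
		out = utf8.AppendRune(out, getRune(rune(ascii[i]), start))
	}
	return string(out)
}

func main() {
	s := convertBytes("Go 1", MonospaceStart)
	// Two mapped letters (4 bytes each) plus a space and a digit (1 byte each).
	fmt.Println(s, len(s))
}
```

<p>Whether this actually beats the <code>[]rune</code> or <code>strings.Builder</code> versions would need its own benchmark; the final <code>string(out)</code> conversion still copies the bytes.</p>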
<p><strong>GetRune</strong></p>
<p>Why use '\u' notation instead of just using ASCII?</p>
<p>Why loop, instead of just using arithmetic? That's making an O(n) operation of something that should be O(1)</p></pre>LukeShu: <pre><p>Here's a benchmark comparing your GetRune vs an arithmetic-based version</p>
<pre><code>package main

import (
	"testing"
)

const (
	SwirlStart      rune = '\U0001D4D0'
	DoubleBarStart  rune = '\U0001D538'
	GothStart       rune = '\U0001D504'
	SansSerifStart  rune = '\U0001D5A0'
	MonospaceStart  rune = '\U0001D670'
)

var charsets = []rune{
	SwirlStart,
	DoubleBarStart,
	GothStart,
	SansSerifStart,
	MonospaceStart,
}

var Result rune

func BenchmarkGetRune(b *testing.B) {
	b.Run("Loop", func(b *testing.B) { benchmarkGetRune(b, GetRuneLoop) })
	b.Run("Arith", func(b *testing.B) { benchmarkGetRune(b, GetRuneArith) })
}

func benchmarkGetRune(b *testing.B, getRune func(c, s rune) rune) {
	var r rune
	for i := 0; i < b.N; i++ {
		charset := charsets[i%len(charsets)]
		for c := ' '; c <= '~'; c++ {
			r = getRune(c, charset)
		}
	}
	Result = r
}

func GetRuneLoop(c, start rune) rune {
	var offset rune
	// ascii A-Z
	for i := '\u0041'; i <= '\u005A'; i++ {
		if c == i {
			return start + offset
		}
		offset++
	}
	// ascii a-z
	for i := '\u0061'; i <= '\u007A'; i++ {
		if c == i {
			return start + offset
		}
		offset++
	}
	return c
}

func GetRuneArith(c, start rune) rune {
	switch {
	case 'A' <= c && c <= 'Z':
		return c - 'A' + start
	case 'a' <= c && c <= 'z':
		return c - 'a' + start + 26
	default:
		return c
	}
}
</code></pre>
<p>Here's what my laptop gave:</p>
<pre><code>$ go test -bench=. -benchtime 20s convert_test.go
goos: linux
goarch: 386
BenchmarkGetRune/Loop-2 3000000 8376 ns/op
BenchmarkGetRune/Arith-2 30000000 916 ns/op
PASS
ok command-line-arguments 126.872s
</code></pre>
<p>Almost a 10x improvement! And clearer code too!</p></pre>LukeShu: <pre><p>And my intuition on the main part was right too:</p>
<pre><code>package main

import (
	"fmt"
	"os"
	"strings"
	"testing"
	"unicode/utf8"
)

const (
	SwirlStart      rune = '\U0001D4D0'
	DoubleBarStart  rune = '\U0001D538'
	GothStart       rune = '\U0001D504'
	SansSerifStart  rune = '\U0001D5A0'
	MonospaceStart  rune = '\U0001D670'
)

var charsets = []rune{
	SwirlStart,
	DoubleBarStart,
	GothStart,
	SansSerifStart,
	MonospaceStart,
}

var file = os.NewFile(3, "file")

func BenchmarkConvert(b *testing.B) {
	b.Run("Printf", func(b *testing.B) { benchmarkConvert(b, ConvertPrintf) })
	b.Run("Runes", func(b *testing.B) { benchmarkConvert(b, ConvertRunes) })
	b.Run("Builder", func(b *testing.B) { benchmarkConvert(b, ConvertBuilder) })
}

func benchmarkConvert(b *testing.B, convert func(string, rune)) {
	str := "The quick brown fox jumped over the lazy dog"
	for i := 0; i < b.N; i++ {
		convert(str, charsets[i%len(charsets)])
	}
}

func ConvertPrintf(asciiString string, charset rune) {
	for _, c := range asciiString {
		fmt.Fprintf(file, "%c", GetRune(c, charset))
	}
	fmt.Fprintln(file)
}

func ConvertRunes(asciiString string, charset rune) {
	out := make([]rune, utf8.RuneCountInString(asciiString))
	i := 0
	for _, c := range asciiString {
		out[i] = GetRune(c, charset)
		i++
	}
	fmt.Fprintln(file, string(out))
}

func ConvertBuilder(asciiString string, charset rune) {
	var b strings.Builder
	for _, c := range asciiString {
		b.WriteRune(GetRune(c, charset))
	}
	fmt.Fprintln(file, b.String())
}

func GetRune(c, start rune) rune {
	switch {
	case 'A' <= c && c <= 'Z':
		return c - 'A' + start
	case 'a' <= c && c <= 'z':
		return c - 'a' + start + 26
	default:
		return c
	}
}
</code></pre>
<p>which gave me:</p>
<pre><code>$ go test -bench=. convert_test.go 3>/dev/null
goos: linux
goarch: 386
BenchmarkConvert/Printf-2 20000 67233 ns/op
BenchmarkConvert/Runes-2 300000 4831 ns/op
BenchmarkConvert/Builder-2 300000 5550 ns/op
PASS
ok command-line-arguments 5.262s
</code></pre>
<p>and the improvement is even more dramatic if we use something that doesn't just discard the result:</p>
<pre><code>$ go test -bench=. convert_test.go 3>/tmp/out.txt
goos: linux
goarch: 386
BenchmarkConvert/Printf-2 10000 105420 ns/op
BenchmarkConvert/Runes-2 200000 6149 ns/op
BenchmarkConvert/Builder-2 200000 6899 ns/op
PASS
ok command-line-arguments 3.836s
</code></pre></pre>porjo38: <pre><p>Thanks, that's great!</p>
<blockquote>
<p>Why use '\u' notation instead of just using ASCII?</p>
</blockquote>
<p>Because I wasn't fully grasping that runes are just int32! I actually started out using hex (e.g. c == 0x41 ) but was getting an error about mismatched types (rune vs int), so went back to '\u0000' notation.</p>
<blockquote>
<p>Why loop, instead of just using arithmetic?</p>
</blockquote>
<p>Given the above misunderstanding, this seemed necessary at the time.</p></pre>nevyn: <pre><p>It's Python, not Go, but you might appreciate:
<a href="https://github.com/reinderien/mimic" rel="nofollow">https://github.com/reinderien/mimic</a></p></pre>