`utf8.RuneCountInString` is 1-3x faster than the `len([]rune(s))` conversion

blov · 2018-04-18 19:30:13 · 771 views

I hope that the latter will be optimized in the future. []rune(s) is perfectly optimizable (contrary to what some people have argued in the comments). The Go compiler already knows how to handle this; there is specific code for it already. When it sees []rune(s), it emits a call to a function that runs at runtime; that function just needs to be made more performant. The slowness is probably because []rune(s) makes two passes over the string when converting.

const s = "日本語"
fmt.Println(len([]rune(s)), utf8.RuneCountInString(s))
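
For context, a minimal sketch of what such a two-pass conversion looks like (the helper name stringToRunes is hypothetical; this illustrates the structure rather than reproducing the actual runtime routine):

func stringToRunes(s string) []rune {
    // Pass 1: count the runes so the backing array can be sized exactly.
    n := 0
    for range s {
        n++
    }
    // Pass 2: decode each rune into the freshly allocated slice.
    a := make([]rune, n)
    i := 0
    for _, r := range s {
        a[i] = r
        i++
    }
    return a
}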

goos: darwin
goarch: amd64
BenchmarkRunCountInString16ascii-8      100000000               11.9 ns/op             0 B/op          0 allocs/op
BenchmarkRunCountInString16multi-8      20000000                63.5 ns/op             0 B/op          0 allocs/op
BenchmarkRunCountInString32ascii-8      100000000               18.2 ns/op             0 B/op          0 allocs/op
BenchmarkRunCountInString32multi-8      10000000               120 ns/op               0 B/op          0 allocs/op
BenchmarkCastToRuneArray16ascii-8       50000000                26.2 ns/op             0 B/op          0 allocs/op
BenchmarkCastToRuneArray16multi-8       10000000               171 ns/op               0 B/op          0 allocs/op
BenchmarkCastToRuneArray32ascii-8       30000000                46.1 ns/op             0 B/op          0 allocs/op
BenchmarkCastToRuneArray32multi-8        5000000               322 ns/op               0 B/op          0 allocs/op

Benchmark Code
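
The original benchmark source was linked rather than inlined. As a rough sketch, equivalent benchmarks look like the following (hypothetical inputs and names, not the author's original code), runnable with go test -bench . -benchmem:

package bench

import (
    "testing"
    "unicode/utf8"
)

// Hypothetical stand-ins for the post's 16-byte ASCII and multi-byte inputs.
var (
    ascii = "abcdefghijklmnop"
    multi = "日本語の文字列です"
    sink  int // keeps results alive so the calls aren't optimized away
)

func BenchmarkRuneCountInStringASCII(b *testing.B) {
    for i := 0; i < b.N; i++ {
        sink = utf8.RuneCountInString(ascii)
    }
}

func BenchmarkRuneCountInStringMulti(b *testing.B) {
    for i := 0; i < b.N; i++ {
        sink = utf8.RuneCountInString(multi)
    }
}

func BenchmarkCastToRuneArrayASCII(b *testing.B) {
    for i := 0; i < b.N; i++ {
        sink = len([]rune(ascii))
    }
}

func BenchmarkCastToRuneArrayMulti(b *testing.B) {
    for i := 0; i < b.N; i++ {
        sink = len([]rune(multi))
    }
}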


Btw, note that, for the currently most-used Go compiler, nothing escapes to the heap; everything stays on the stack. And, as @martisch added, there is work in progress to change the implementation of utf8.RuneCountInString to the direct language construct (range over the string) rather than a special algorithm inside it.

BenchmarkCastToRuneArray16ascii ([]rune)(str) does not escape
BenchmarkCastToRuneArray16multi ([]rune)(str) does not escape
BenchmarkCastToRuneArray32ascii ([]rune)(str) does not escape
BenchmarkCastToRuneArray32multi ([]rune)(str) does not escape

A lot of people are using len([]rune(s)) in their repos on GitHub. That's probably just the tip of the iceberg.

Also, "How to write Go code example" uses this pattern (len([]rune(s))) as an example:

// Reverse returns its argument string reversed rune-wise left to right.
func Reverse(s string) string {
    r := []rune(s)
    for i, j := 0, len(r)-1; i < len(r)/2; i, j = i+1, j-1 {
        r[i], r[j] = r[j], r[i]
    }
    return string(r)
}

Not just that: the syscall package uses this pattern too. Read its source code.


Comments:

0xjnml:

That's completely expected, and I don't think any optimization can help. len([]rune(s)) unnecessarily converts s to []rune, possibly allocating the backing array of the new slice on the heap. utf8.RuneCountInString does none of that and is the right tool to use in this case.

blackflicker:

Depending on the situation, when the compiler (during the SSA passes, maybe) or the runtime sees len([]rune(s)), couldn't it do what utf8.RuneCountInString does and optimize it?

Actually, just optimizing []rune(s) itself would make things better.

Creshal:

But why, when you could just use the right function to begin with?

titpetric:

Compilers optimize away a lot of things that you wouldn't expect them to: inlining functions, for one example, or even folding a simple expression like 3+5+13 directly into 21. The compiled output tells a story which isn't exactly the same as the AST your source code parses to.

It's reasonable to catch cases like the one suggested and optimize them away, but possibly not all cases should be. Can somebody make the case that len([]rune(s)) is such an often-used shorthand that optimizing it further would make sense? We know it's possible, but if it's not used often (and I guess it's not), then there's little incentive for these optimizations, as you won't really get any gains.

Why do I guess about its usage? For one, you could ask yourself:

  • Is it used in a loop? (frequency)
  • Is it common to need the length of some string? (occurrence)

In fact, the answer to both of these seems likely to be no, even in a large-scale service. Of course, it might very well be that the compiler already optimizes some things away here, so the question becomes whether the compiler can detect where s is coming from and, if it's a static declaration, produce the []rune from s at compile time instead of at runtime. In that case, len() might well beat utf8.RuneCountInString several-fold.

This is also the commonly suggested wisdom: keep your syntax close to the stdlib. A notable example is the usage of defer, which was significantly optimized over the last few Go versions. It's quite likely that the compiler will eventually optimize the performance of len([]rune(s)) relative to utf8.RuneCountInString.

Edit: added more thoughts

blackflicker:

Yes, this is what I mean, fully agreed.

blackflicker:

Because I think it's more concise and easier to use: it doesn't import a package and it doesn't make a function call (although the function would probably be inlined anyway).

kostix:

With all due respect, I find your reasoning to be flawed, purely from a programmer's standpoint.

The statement len([]rune(s)) is read by a programmer like this:

  • Convert string s to a slice of runes—one rune for each Unicode code point in the string. At this point, the programmer has fully assessed what the possible runtime costs are.
  • Take the length of that slice.
  • (Throw the slice away).

This line of reasoning clearly raises a red flag for a conscientious programmer.

On the other hand, utf8.RuneCountInString, like its several cousins in the utf8 package, is specifically designed not to require fully decoding the string to know its length in Unicode code points; it merely decodes one rune at a time as it goes.

The runtime cost, again, can be clearly assessed.
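
As an illustration of that rune-at-a-time decoding, here is a minimal sketch using utf8.DecodeRuneInString (assuming unicode/utf8 is imported; the real utf8.RuneCountInString is more optimized internally, so treat this as an illustration, not its actual source):

func countRunes(s string) int {
    n := 0
    for i := 0; i < len(s); n++ {
        // Decode only enough to learn the current rune's byte width,
        // then skip ahead; no slice is ever materialized.
        _, size := utf8.DecodeRuneInString(s[i:])
        i += size
    }
    return n
}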

Note that programming is not about writing concise code; it's about writing sensible code.

blackflicker:

In my experience, clarity and ease of coding lead to more sensible code. And more code = more bugs = more bad things, etc.

Always try to keep it simple.

0xjnml:

Always try to keep it simple.

Well, foo(expr) is simpler than baz(T(expr)).

blackflicker:

Yeah, it's simpler than: import foo; foo.bar(expr).

For example: let's also remove index expressions and access them through import array; array.getByIndex(arr, 10) instead of arr[10]. Simpler, yeah?

AngusMcBurger:

Well then, the clearly simpler thing would be to use the function specifically named for this task, rather than the inefficient expression that happens to do it.

nevyn:

The statement len([]rune(s)) is read by a programmer like this:

And what does the programmer read this as?

mymap := make(map[string]int)
data := []byte{'a', 'b', 'c', 'd'}
if mymap[string(data)] >= 4 { ... }

(The gc compiler already special-cases this pattern so that the temporary string for the map lookup is never allocated; in other words, programmers already rely on a pattern-based optimization here.)

joushou:

So you expect the compiler to import the package for you because you didn't bother?

This is one of those cases that just doesn't make sense to optimize. It would be a very narrow peephole optimization that only applies when you convert a string to a rune slice solely to take its length, which is frankly just not a smart thing to do (worst case, you make the string consume four times more space just to count it).

The compiler tries to make good code better, not bad code good. If you tell it to do something stupid, it will do something stupid. The compiler is not magic, and every optimization incurs a cost to develop and maintain. Compiler developers should put their effort into actually useful optimizations.

For reference, normal kinds of optimizations are things like dead-code elimination, constant elimination, vectorization, and intermediate-value elimination. Replacing one large algorithm with another is not a normal kind of optimization.

bbatha:

So you expect the compiler to import the package for you because you didn't bother

The []rune cast already needs to count all of the runes in the string to allocate the array and fill it, and thus is already "importing the package for you". The optimization would just cut out the temporary array allocation and copy.

joushou:

The compiler does not import any packages. Language functionality comes from built-ins.

You could add another built-in as a peephole optimization, but this is borderline silly, and a waste of compiler dev time. They have more important things to work on, such as actually useful optimizations.

weberc2:

I agree about the optimization being specific, but it's perfectly reasonable for the compiler to see that the allocation is never used and to eschew it as part of some more general "optimize away unused allocs" rule.

[deleted]:

[deleted]

joushou:

It's not syntactic sugar. Syntactic sugar is syntax that simplifies an otherwise longer construct.

Examples of syntactic sugar:

// Rust "?" operator:
let v = func_call()?;
... vs ...
let v = match func_call() { Err(e) => { return Err(e); }, Ok(v) => v };
// Go shorthand declare-assign syntax
x := "hello"
... vs ...
var x = "hello"
// Go reuse-function-return
return func_call()
... vs. ...
v, err := func_call()
return v, err

What you're asking for is to take existing syntax that is a terrible way to achieve your end goal, and to implement a somewhat absurd peephole optimization that replaces the functionality you are actively asking for with something better suited to the goal you were aiming for.

This isn't the job of an optimizing compiler. The compiler will not replace a hashmap with a key-value list, or a set with a key list, just because there are only a few elements in it. The compiler will not give you an int where you accidentally used a float, just because you only use it for integral values. The compiler will not replace a bubble-sort implementation with quicksort/mergesort/..., just because you picked a poor algorithm for your corpus.

What it will do is ensure that what you ask for is done fast. It's going to expand that UTF-8 string to a rune slice and count it as fast as it can, just like it will make a bubble sort or a hashmap blazingly fast. But it does what you ask it to: ask it to run the wrong algorithm, and you get a slow program.

That goes for any optimizing compiler.

blackflicker:

Yes, the compiler could make it as fast as utf8.RuneCountInString. It doesn't have to be syntactic sugar or anything. What I mean is making it simpler, without a call to a stdlib function, using only language constructs.

joushou:

Adding syntactic sugar implies adding syntax with new meaning. This is not syntactic sugar, as it is existing syntax with an existing meaning that produces exactly the end result you intended. What you want is for the optimizer to recognize a pattern that does something in a very bad way and replace it with a better algorithm tailored to this exact use case.

What you are asking for is called a peephole optimization (note that the Wikipedia article uses assembly examples; the Go compiler would do this pass on the SSA). That is, an optimization pass that would first detect this exact pattern in the SSA:

t0 = your_string
t1 = stringToRuneSlice(t0)
t2 = lengthOfSlice(t1)
// t1 not referenced any further

Then it would throw all of that away and instead do the equivalent of the following (the optimizer should not import modules, as doing so has side effects; thus, the functionality must be duplicated as a compiler built-in):

t0 = your_string
t1 = countRunesInString(t0)

This is equivalent to a compiler detecting that you implemented a bubble sort and internally replacing it with a quicksort. Writing good code is your job, not the compiler's. It would take an infinite amount of work to cover all these arbitrary cases of stupid code.

However, that is not to say that this Reddit post is not useful. Some may not realize that this is a terrible construct, so it can be a nice heads-up. But it is exactly equivalent to telling people that bubble sort is probably a bad idea.

blackflicker:

This is equivalent to a compiler detecting that you implemented a bubble sort

Comparing this with bubble sort is an exaggeration. It could just optimize len([]rune(s)).

the optimizer should not import modules, as doing so has side-effects. Thus, the functionality must be duplicated as a compiler built-in

The Go compiler (maybe in the backend) could just emit a call node for the stdlib function in place of the len([]rune(s)) pattern, so that when the code is executed, that function is called instead. This won't lead to duplication. As I remember, it already does this in several places.

blackflicker:

No, it doesn't escape to the heap; everything stays on the stack. And, as @martisch added, there is work in progress to change the implementation of utf8.RuneCountInString to the direct language construct (range over the string) rather than a special algorithm inside it.

BenchmarkCastToRuneArray16ascii ([]rune)(str) does not escape
BenchmarkCastToRuneArray16multi ([]rune)(str) does not escape
BenchmarkCastToRuneArray32ascii ([]rune)(str) does not escape
BenchmarkCastToRuneArray32multi ([]rune)(str) does not escape

martisch:

Note that the referenced work is about range s (the string itself), not range []rune(s). No rune slice is constructed when ranging over a string.

blackflicker:

Sorry, you're right, I mistyped it; updating.

0xjnml:

In the post you're replying to, I wrote:

while possibly allocating the backing array of the new slice in heap

The language specification neither mandates nor guarantees that optimization for any particular Go compiler.

blackflicker:

The spec doesn't say it would escape to the heap either. You said that it may escape to the heap, and I showed that it doesn't for the current, most-used compiler.

0xjnml:

It seems that your definition of the word "may" differs from the one I'm using.

SSoreil:

That's an interesting way to write it, len([]rune(s)). I don't dislike it, although it makes sense that the version actually making use of the way UTF-8 is implemented is faster.

blackflicker:

Yes, read its source if you're interested; it explains why it's so much faster. I had also checked the compiler and runtime code for []rune(s), so I was expecting this result.

martisch:

Could try this in comparison:

func RuneCountInString(s string) int {
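    // ranging over a string yields one decoded rune per iteration, without building a slice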
    n := 0
    for range s {
        n++
    }
    return n
}

Doesn't import utf8 either, and does not need to allocate a slice.

It was/is being considered as the implementation of RuneCountInString in https://go-review.googlesource.com/c/go/+/33637

blackflicker:

https://go-review.googlesource.com/c/go/+/33637

Thanks for the pointer. I was looking for an issue discussing this.

egonelbre:

One question: why are you looking at the rune count in the first place? I know very few uses for it.

blackflicker:

Why

Actually, the issue is not directly the rune count; it only looks that way. The problem is caused by the conversion. Anything of the form foo([]rune(s)) will perform better when the conversion routine becomes more performant.

0xjnml:

Btw, note that, nothing escapes to heap.

That's not correct in general; it holds for the particular compiler used, and no generalization follows from that. The spec does not even mention things like heap/escape/stack.

A lot of people are using len([]rune(s)) in their repos on GitHub. That's probably just the tip of the iceberg.

That's comparable to the famous "100 billion flies can't be wrong". I, for one, consider that poor coding practice, and I'd not approve it in a code review. Search the standard library and see if you can find the same approach. I guess and hope the search will show zero cases.

That said, of course it's a possible hard-coded, one-of-a-kind optimization; otherwise we'd need to put AI in the compiler. The next question is how often this optimization would get used: IMO, zero times in good code and possibly very often in real code. Considering the pragmatism of the Go team, it may actually happen that such an optimization materializes. No harm can be done by filing a proposal at the issue tracker.

blackflicker:

Can you tell me for which compiler it escapes to the heap?

Btw, I'm not recommending this approach (len([]rune(s))); I just think it's simpler. It may look like I'm defending it, but actually I'm not.

0xjnml:

Can you tell me for which compiler it escapes to the heap?

The point is not about a concrete example, even though older versions of the gc compiler are such examples. Any conforming implementation is completely free not to implement every possible optimization. Your code should never rely on a particular optimization of a particular compiler; that's poor coding practice.

blackflicker:

Search the standard library and see if you can find the same approach.

Don't go too far. Check out the library example in "How to Write Go Code": https://golang.org/doc/code.html#Library

// Reverse returns its argument string reversed rune-wise left to right.
func Reverse(s string) string {
    r := []rune(s)
    for i, j := 0, len(r)-1; i < len(r)/2; i, j = i+1, j-1 {
        r[i], r[j] = r[j], r[i]
    }
    return string(r)
}

guncha:

The code is not doing len([]rune(s)), is it?

blackflicker:

The Go compiler emits a call for []rune(s), and that call makes two passes. If it weren't doing so, other operations of the form foo([]rune(s)) would perform better too. It doesn't have to be exactly len([]rune(s)).

0xjnml:

This example has nothing to do with the topic.

markuspeloquin:

"1-3x faster" doesn't mean anything; or rather, it could mean multiple things. Use speedup, which has a very specific meaning.

