utf8.RuneCountInString` is 1-3x faster than `len([]rune(s))` conversion

blov · · 649 次点击    
这是一个分享于 的资源,其中的信息可能已经有所发展或是发生改变。
<p><code>utf8.RuneCountInString</code> is <strong>1-3x faster</strong> than <code>len([]rune(s))</code> conversion.</p> <p>I hope that the latter will be optimized in the future. <code>[]rune(s)</code> is perfectly optimizable (<em>against what some people have discussed in comments</em>). Go compiler already knows how to handle this, it has a specific code in it already. When it sees <code>[]rune(s)</code>, it puts a call to a function to be executed at runtime. It just needs to make it more performant. This is probably because, <code>[]rune(s)</code> makes two passes when converting from a string.</p> <pre><code>const s = &#34;日本語&#34; fmt.Println(len([]rune(s)), utf8.RuneCountInString(s)) goos: darwin goarch: amd64 BenchmarkRunCountInString16ascii-8 100000000 11.9 ns/op 0 B/op 0 allocs/op BenchmarkRunCountInString16multi-8 20000000 63.5 ns/op 0 B/op 0 allocs/op BenchmarkRunCountInString32ascii-8 100000000 18.2 ns/op 0 B/op 0 allocs/op BenchmarkRunCountInString32multi-8 10000000 120 ns/op 0 B/op 0 allocs/op BenchmarkCastToRuneArray16ascii-8 50000000 26.2 ns/op 0 B/op 0 allocs/op BenchmarkCastToRuneArray16multi-8 10000000 171 ns/op 0 B/op 0 allocs/op BenchmarkCastToRuneArray32ascii-8 30000000 46.1 ns/op 0 B/op 0 allocs/op BenchmarkCastToRuneArray32multi-8 5000000 322 ns/op 0 B/op 0 allocs/op </code></pre> <p><a href="https://play.golang.org/p/M2HHTtuHMI-">Benchmark Code</a></p> <hr/> <p>Btw, note that, for the current most used Go compiler, <strong>nothing escapes to heap</strong>. They&#39;re all in stack. And, as @martisch has added, there&#39;s a <a href="https://go-review.googlesource.com/c/go/+/33637">work</a> going on for changing the implementation of <code>utf8.RuneCountInString</code> to the direct language construct using <code>range string</code> rather than using a special algorithm inside of it.</p> <pre><code>BenchmarkCastToRuneArray16ascii ([]rune)(str) does not escape BenchmarkCastToRuneArray16multi ([]rune)(str) does not escape BenchmarkCastToRuneArray32ascii ([]rune)(str) does not escape BenchmarkCastToRuneArray32multi ([]rune)(str) does not escape </code></pre> <hr/> <p><strong>A lot of people are <a href="https://github.com/search?l=Go&amp;q=%22len%28%5B%5Drune%28%22&amp;type=Code">using</a> <code>len([]rune(s))</code> in their repo on Github.</strong> Probably, the tip of the iceberg.</p> <p>Also, &#34;<a href="https://golang.org/doc/code.html#Library">How to write Go code example</a>&#34; uses this pattern (<code>len([]rune(s))</code>) as an example:</p> <pre><code>// Reverse returns its argument string reversed rune-wise left to right. func Reverse(s string) string { r := []rune(s) for i, j := 0, len(r)-1; i &lt; len(r)/2; i, j = i+1, j-1 { r[i], r[j] = r[j], r[i] } return string(r) } </code></pre> <p>Not just that, <a href="https://golang.org/pkg/syscall/">syscall</a> package is using this pattern too. Read its source code.</p> <hr/>**评论:**<br/><br/>0xjnml: <pre><p>That&#39;s completely expected and I don&#39;t think any optimization can help. <code>len([]rune(s))</code> unnecessarily converts <code>s</code> to <code>[]rune</code> while possibly allocating the backing array of the new slice in heap. <code>utf8.RuneCountInString</code> does none of that and is the right tool to use in this case.</p></pre>blackflicker: <pre><p>Depending on the situation, when the compiler (<em>in ssaing maybe</em>) or runtime sees <code>len([]rune(s))</code> maybe it can do what <code>utf8.RuneCountInString</code> does to optimize it?</p> <p>Actually, just optimizing <code>[]rune(s)</code> will make things better.</p></pre>Creshal: <pre><p>But why, when you could just use right function to begin with?</p></pre>titpetric: <pre><p>Compilers optimize away a lot of things that you wouldn&#39;t expect them to. Inlining functions for one example, or even skipping simple evaluations like <code>3+5+13</code> directly into <code>21</code>. The compiled output tells a story which isn&#39;t exactly the same as the AST tokens which your source code parses to.</p> <p>It&#39;s reasonable to catch such cases as the one suggested and optimize away, but, possibly not all cases should be optimized away. Can somebody make the case that <code>len([]rune(s))</code> is an often used shorthand that it would make sense to optimize it further? We know it&#39;s possible, but if it&#39;s not used often (and I <em>guess</em> it&#39;s not), then there&#39;s little incentive for these optimizations, as you&#39;ll not really get any gains.</p> <p>Why do I <em>guess</em> it&#39;s usage? For one, you could ask yourself:</p> <ul> <li>Is it used in a loop? (frequency)</li> <li>Is it common to get the result of some string length? (occurence)</li> </ul> <p>In fact, the answer to both of these seems likely to be no, even on a large scale service. Of course, it might very well be the case that the compiler does optimize some things here away already, so it would be a case of can the compiler detect where <code>s</code> is coming from, and if it&#39;s a static declaration produce <code>[]rune{}</code> from <code>s</code> at compile time instead of runtime. In that case, it&#39;s likely that len() might beat <code>utf8.RuneCountInString</code> several fold.</p> <p>This is also the wisdom suggested by people, to keep your syntax close to the stdlib. A notable example is the usage of <code>defer</code>, which was significantly optimized for performance over the last few Go versions. It&#39;s more likely that the compiler will eventually optimize the performance with <code>len([]rune(s))</code> relative to <code>utf8.RuneCountInString</code>.</p> <p>Edit: added more thoughts</p></pre>blackflicker: <pre><p>Yes, this is what I mean, fully agreed.</p></pre>blackflicker: <pre><p>Because, I think it&#39;s more concise and easy to use and it doesn&#39;t import a package and it doesn&#39;t make a function call (<em>although the function probably would be inlined</em>).</p></pre>kostix: <pre><p>With all due respect, I find your reasoning to be flawed—purely from a programmer&#39;s view standpoint.</p> <p>The statement <code>len([]rune(s))</code> is read by a programmer like this:</p> <ul> <li>Convert string <code>s</code> to a slice of <code>rune</code>s—one rune for each Unicode code point in the string. At this point, the programmer has fully assessed what the possible runtime costs are.</li> <li>Take the length of that slice.</li> <li>(Throw the slice away).</li> </ul> <p>This line of reasoning clearly raises a red flag for a conscious programmer.</p> <p>On the other hand, <code>utf8.RuneCountInString</code>—as its several cousins in the <code>utf8</code> package—is specifically designed to not require <em>fully decoding</em> the string to know its length in Unicode code points—it merely decodes one rune at a time as it goes.</p> <p>The runtime cost, again, can be clearly assessed.</p> <p>Note that programming is not about writing <em>concise</em> code, it&#39;s about writing sensible code.</p></pre>blackflicker: <pre><p>In my experience clarity and ease of coding leads to more sensible code. And more code = more bugs = more something bad etc. </p> <p>Always try to keep it simple.</p></pre>0xjnml: <pre><blockquote> <p>Always try to keep it simple.</p> </blockquote> <p>Well, <code>foo(expr)</code> is <em>simpler</em> than <code>baz(T(expr))</code>.</p></pre>blackflicker: <pre><p>Yeah, it&#39;s simpler than: <code>import foo; foo.bar(expr)</code>.</p> <p>For example: Let&#39;s also remove index expressions etc as well and use them through <code>import array; array.getByIndex(arr, 10)</code> instead of: <code>arr[10]</code>. Simpler, yeah?</p></pre>AngusMcBurger: <pre><p>Well then the clear simpler thing would be to use the function <em>specifically named to perform this task</em>, rather than the inefficient expression that happens to do it.</p></pre>nevyn: <pre><blockquote> <p>The statement len([]rune(s)) is read by a programmer like this:</p> </blockquote> <p>And what does the programmer read this as:</p> <pre><code>mymap := make(map[string]int) data := []byte{&#39;a&#39;, &#39;b&#39;, &#39;c&#39;, &#39;d&#39;} if mymap[string(data)] &gt;= 4 { ... } </code></pre></pre>joushou: <pre><p>So you expect the compiler to import the package for you because you didn&#39;t bother?</p> <p>This is one of those cases that just doesn&#39;t make sense to optimize. The optimization would be a very narrow peephole optimization that would only make sense if you convert a string to a rune array <em>only</em> to take the length of it, which is frankly just not a smart thing to do (worst case, you&#39;re making the string consume 4 times more space in order to count it).</p> <p>The compilers tries to make <em>good</em> code <em>better</em>, not <em>bad</em> code <em>good</em>. If you tell the compiler to do something stupid, it will do something stupid. The compiler is not magic, and every optimization incurs a cost to develop and maintain. The compiler developers should put their effort into actually useful optimizations.</p> <p>For reference, <em>normal</em> types of optimizations would be something like dead code elimination, constant elimination, vectorization, intermediate value elimination. It is <em>not</em> a normal type of optimization to replace one large algorithm with another.</p></pre>bbatha: <pre><blockquote> <p>So you expect the compiler to import the package for you because you didn&#39;t bother</p> </blockquote> <p>The <code>[]rune</code> cast already needs to count all of the runes in the string to allocate or append them to the array, and thus is &#34;importing the package for you&#34;. The optimization would just cut out the temporary array allocation and copy.</p></pre>joushou: <pre><p>The compiler does not import any packages. Language functionality is &#34;built-ins&#34;.</p> <p>You could add another built-in as a peephole optimization, but this is borderline silly, and a waste of compiler dev time. They have more important things to work on, such as actually useful optimizations.</p></pre>weberc2: <pre><p>I agree about the optimization being specific, but it&#39;s perfectly reasonable for the compiler to see that the allocation is never used and to eschew it as part of some more general &#34;optimize away unused allocs&#34; rule.</p></pre>: <pre><p>[deleted]</p></pre>joushou: <pre><p>It&#39;s not syntactic sugar. Syntactic sugar is a syntax which simplifies an otherwise longer construct.</p> <p>Examples of syntactic sugar:</p> <pre><code>// Rust &#34;?&#34; operator: let v = func_call()?; ... vs ... let v = match func_call() { Err(e) =&gt; { return Err(e); }, Ok(v) =&gt; v }; // Go shorthand declare-assign syntax x := &#34;hello&#34; ... vs ... var x = &#34;hello&#34; // Go reuse-function-return return func_call() ... vs. ... v, err := func_call() return v, err </code></pre> <p>What <em>you&#39;re</em> asking for is to take existing syntax that is a terrible way to achieve the endgoal you want, and implement a somewhat absurd peephole optimization to replace the functionality you are <em>actively asking for</em> with something better suited to achieve the goal you were aiming for for.</p> <p>This isn&#39;t the job of an optimizing compiler. The compiler will not replace hashmap with a key-value list, or a set with a key list, just because there&#39;s only a few elements in it. The compiler will not give you an int where you accidentally used a float, just because you only use it for integral values. The compiler will not replace a bubble-sort implementation with quicksort/mergesort/..., just because you picked a poor algorithm for your corpus.</p> <p>What it <em>will</em> do is to ensure that <em>what you ask for</em> is done <em>fast</em>. It&#39;s going to expand that UTF-8 string to a rune slice and count it as fast as it can, just ilke it will make a bubblesort or hashmap blazingly fast. However, it does what you ask it to. Ask it to run a wrong algorithm, and you get a slow program.</p> <p>That goes for <em>any</em> optimizing compiler.</p></pre>blackflicker: <pre><p>Yes, the compiler could make it as faster as it can like <code>utf8.RuneCountInString</code>. It doesn&#39;t have to be a syntactic sugar etc. What I mean is to making it simpler without having call to a stdlib function, only using language constructs.</p></pre>joushou: <pre><p>Adding syntactic sugar implies <em>adding syntax</em> with <em>new meaning</em>. This is not syntactic sugar, as it is <em>existing syntax</em> with an <em>existing meaning</em> that gives <em>the exact end-goal that you intended</em>. What you want is for the optimizer to recognize a <em>pattern</em> that does something in a <em>very bad way</em>, and replace it with a better algorithm tailored to this <em>exact</em> use-case.</p> <p>What you are asking for is called a <a href="https://en.wikipedia.org/wiki/Peephole_optimization" rel="nofollow"><em>peephole optimization</em></a> (note that the wikipedia article uses assembly examples—the Go compiler would do this pass the SSA). That is, an optimization pass that would first detect this <em>exact</em> pattern in the SSA:</p> <pre><code>t0 = your_string t1 = stringToRuneSlice(t0) t2 = lengthOfSlice(t1) // t1 not referenced any further </code></pre> <p>Then it would throw it all over the shoulder, and instead do the <em>equivalent</em> of the following (the optimizer should not import modules, as doing so has side-effects. Thus, the functionality must be duplicated as a compiler built-in):</p> <pre><code>t0 = your_string t1 = countRunesInString(t0) </code></pre> <p>This is equivalent to a compiler detecting that you implemented a bubblesort, and then internally replacing it with a quicksort. Writing good code is your job, not the compiler. It would take an infinite amount of work to cover all these arbitrary cases of stupid code.</p> <p>However, that is not to say that this reddit post is not useful. Some may not <em>realize</em> that this is a terrible construct, so it can be a nice heads-up. However, it is exactly equivalent to telling people that bubblesort is probably a bad idea.</p></pre>blackflicker: <pre><blockquote> <p>This is equivalent to a compiler detecting that you implemented a bubblesort</p> </blockquote> <p>Comparing this with bubble-sort is exaggeration. It can just optimize <code>len([]rune(s))</code>.</p> <blockquote> <p>the optimizer should not import modules, as doing so has side-effects. Thus, the functionality must be duplicated as a compiler built-in</p> </blockquote> <p>Go compiler (<em>maybe in the backend</em>) can just add a call instruction node for the stdlib function by instead of <code>len([]rune(s))</code> pattern, so, when the code is executed, it can be called instead. This won&#39;t lead to duplication. It does this in several places already as I remember.</p></pre>blackflicker: <pre><p>No, it doesn&#39;t escape to heap. They&#39;re all in stack. And, as @martisch has added, there&#39;s a <a href="https://go-review.googlesource.com/c/go/+/33637" rel="nofollow">work</a> going on for changing the implementation of <code>utf8.RuneCountInString</code> to the direct language construct using <code>range string</code> rather than using a special algorithm inside of it.</p> <pre><code>BenchmarkCastToRuneArray16ascii ([]rune)(str) does not escape BenchmarkCastToRuneArray16multi ([]rune)(str) does not escape BenchmarkCastToRuneArray32ascii ([]rune)(str) does not escape BenchmarkCastToRuneArray32multi ([]rune)(str) does not escape </code></pre></pre>martisch: <pre><p>Note that the work referenced is about range s (the string itself) not range []rune{}. No rune slice is constructed when ranging over a string.</p></pre>blackflicker: <pre><p>Sorry, right, I mistyped it, updating.</p></pre>0xjnml: <pre><p>In the post you&#39;re replying to, I wrote</p> <blockquote> <p>while possibly allocating the backing array of the new slice in heap</p> </blockquote> <p>The language specification does not mandate nor guarantee the optimization for any particular Go compiler.</p></pre>blackflicker: <pre><p>The spec doesn&#39;t say about it&#39;d would escape to heap either. You said that it may escape to heap and I showed that it&#39;s not for the current and most used compiler.</p></pre>0xjnml: <pre><p>It seems that your definition of the word <code>may</code> maybe differs from the one I&#39;m using.</p></pre>SSoreil: <pre><p>That&#39;s an interesting way to write it, len([]rune(s)). I don&#39;t dislike it, although it makes sense the version actually making use of the way UTF8 is implemented is faster.</p></pre>blackflicker: <pre><p>Yes, read its source if you&#39;re interested, it explains why is that so much faster. I also had checked the compiler and runtime code of <code>[]rune(s)</code>. I was expecting this result.</p></pre>martisch: <pre><p>could try in comparison:</p> <pre><code>func RuneCountInString(s string) int { n := 0 for range s { n++ } return n } </code></pre> <p>doesnt import utf8 either and does not need to allocate a slice.</p> <p>Was/is considered as implementation of RuneCountInString in <a href="https://go-review.googlesource.com/c/go/+/33637" rel="nofollow">https://go-review.googlesource.com/c/go/+/33637</a></p></pre>blackflicker: <pre><blockquote> <p><a href="https://go-review.googlesource.com/c/go/+/33637" rel="nofollow">https://go-review.googlesource.com/c/go/+/33637</a></p> </blockquote> <p>Thanks for that redirection. I was looking for an issue talking about this.</p></pre>egonelbre: <pre><p>One question. Why are you looking at the rune count in the first place? I know very few uses for it.</p></pre>blackflicker: <pre><blockquote> <p>Why</p> </blockquote> <p>Actually the thing is not directly rune count. It looks like so. The problem is caused because of the conversion. <code>foo([]rune(s))</code> will all perform better when the conversion routine becomes more performant.</p></pre>0xjnml: <pre><blockquote> <p>Btw, note that, nothing escapes to heap. </p> </blockquote> <p>That&#39;s not correct. It works for the particular compiler used. No generalization follows from that, specs do not even mention things like heap/escape/stack etc.</p> <blockquote> <p>A lot of people are using len([]rune(s)) in their repo on Github. Probably, the tip of the iceberg.</p> </blockquote> <p>That&#39;s comparable to the famous &#34;100 billions of flies cannot be wrong&#34;. I for one, consider that a poor coding practice and I&#39;d not approve it in a code review. Search the standard library and try if you can find the same approach. I guess and hope the search will show zero cases.</p> <p>That said, of course it&#39;s a possible hard-coded, one-of-a-kind optimization, otherwise we need to put AI in the compiler. The next question is how often this optimization will get used. IMO, zero times in good code and possibly very often in real code. Considering the pragmatism of the Go team, it may actually happen that such optimization materializes. No harm can be done by filling a proposal at the issue tracker.</p></pre>blackflicker: <pre><p>Can you tell me for which compiler does it escape to heap?</p> <p>Btw, I&#39;m not recommending this approach (<code>len([]rune(s))</code>), I just think that it&#39;s simpler. It looks like I&#39;m defending it but actually I&#39;m not.</p></pre>0xjnml: <pre><blockquote> <p>Can you tell me for which compiler does it escape to heap?</p> </blockquote> <p>The point is not about a concrete example, even though older versions of the <code>gc</code> compiler are such examples. Any conforming implementation is completely free to not to implement every possible optimization. Your code should never ever rely on a particular optimization of a particular compiler. That&#39;s poor coding practice.</p></pre>blackflicker: <pre><blockquote> <p>Search the standard library and try if you can find the same approach.</p> </blockquote> <p>Don&#39;t go too far. Check out: &#34;How to write Go code example&#34;: <a href="https://golang.org/doc/code.html#Library" rel="nofollow">https://golang.org/doc/code.html#Library</a></p> <pre><code>// Reverse returns its argument string reversed rune-wise left to right. func Reverse(s string) string { r := []rune(s) for i, j := 0, len(r)-1; i &lt; len(r)/2; i, j = i+1, j-1 { r[i], r[j] = r[j], r[i] } return string(r) } </code></pre></pre>guncha: <pre><p>The code is not doing <code>len([]rune(s))</code> is it?</p></pre>blackflicker: <pre><p>Go compiler puts a call for <code>[]rune(s)</code> and it does two passes. If it wasn&#39;t doing so, other operations would perform better for <code>foo([]rune(s))</code>. It doesn&#39;t have to be like: <code>len([]rune(s))</code>.</p></pre>0xjnml: <pre><p>This example has nothing to do with the topic.</p></pre>markuspeloquin: <pre><p><em>1-3x faster</em> doesn&#39;t mean anything. I mean, it could mean multiple things. Use <em>speedup</em>, which has a very specific meaning.</p></pre>

入群交流(和以上内容无关):加入Go大咖交流群,或添加微信:liuxiaoyan-s 备注:入群;或加QQ群:692541889

649 次点击  
加入收藏 微博
暂无回复
添加一条新回复 (您需要 登录 后才能回复 没有账号 ?)
  • 请尽量让自己的回复能够对别人有帮助
  • 支持 Markdown 格式, **粗体**、~~删除线~~、`单行代码`
  • 支持 @ 本站用户;支持表情(输入 : 提示),见 Emoji cheat sheet
  • 图片支持拖拽、截图粘贴等方式上传