Not counting initialization in benchmarks

<p>I&#39;m going through the Cormen et al. algorithms book, and I&#39;ve run into a snag trying to write benchmarks. Here&#39;s the code:</p> <p><a href="http://play.golang.org/p/Dsnqy2n7_B" rel="nofollow">http://play.golang.org/p/Dsnqy2n7_B</a></p> <p>Currently, the creation of that random slice (rand.Perm(10)) is being timed by the benchmark, but I&#39;d like it not to count. I&#39;ve tried using various combinations of b.ResetTimer(), b.StopTimer(), and b.StartTimer() to split out the initialization of that random slice, but I get this weird behavior where the benchmarks run for a very long time when I do that.</p> <p>About the code: the benchmark takes too long for the playground, so you&#39;ll have to run it on your local machine. Also, in my actual code I wrote the benchmark to use go test; I only put it in the main function here for convenience.</p> <hr/>**Comments:**<br/><br/>cs-guy: <pre><p>Have you tried doing all the setup in a separate loop and then resetting the timer? E.g. <a href="http://play.golang.org/p/4hhsXDBBHB" rel="nofollow">http://play.golang.org/p/4hhsXDBBHB</a> With this approach the benchmark goes from:</p> <pre><code>benchmarkInsertionSort 1000000 1198 ns/op </code></pre> <p>to</p> <pre><code>benchmarkInsertionSort 10000000 182 ns/op </code></pre> <p>on my machine.</p></pre>wirther: <pre><p>That seems to work, even for larger slice sizes. Thanks!</p></pre>XANi_: <pre><p>Do you really need completely random data each iteration? That solution will take <strong>obscene</strong> amounts of memory (running that test on my machine takes &gt;2GB of RAM).</p> <p>You are better off pregenerating a smaller set and copying it into a pre-initialized slice, <a href="http://play.golang.org/p/3G_oTb5YQm" rel="nofollow">like this</a>.</p></pre>wirther: <pre><p><del>The need to call rand.Perm isn&#39;t necessarily about random data, but about the fact that the sorting algorithm is in-place. 
So I can&#39;t use the same slice more than once, or else I&#39;d be sorting an already sorted slice. For example, after 10,000 iterations of your code, I&#39;d be sorting already sorted slices, which would skew the benchmarks towards the lower-bound run times of the algorithm.</del></p> <p>Also, another problem with your code is that now the copy function is being measured by the benchmark. I&#39;m sure copy is faster than rand.Perm, but it&#39;s still undesirable.</p> <p>I agree, though, that the memory usage in these benchmarks kind of sucks. I think this whole situation could be fixed if I just didn&#39;t use an in-place sorting algorithm, because then I could use a single slice over and over again, but I&#39;m just following along with the book I have, for now.</p> <p>Edit: Sorry, I just realized that first paragraph is completely wrong. I forgot about the copy.</p></pre>XANi_: <pre><blockquote> <p>Also, another problem with your code is that now the copy function is being measured by the benchmark. I&#39;m sure copy is faster than rand.Perm, but it&#39;s still undesirable.</p> </blockquote> <p>copy isn&#39;t so bad; it will take roughly the same time no matter which algorithm you use (though of course it depends on the data size), so it is fine for comparing algorithms. Although it might be slightly faster (if you have cores to spare), especially for bigger datasets, to run a few goroutines that just feed pointers to newly created data via a channel.</p> <blockquote> <p>I think this whole situation could be fixed if I just didn&#39;t use an in-place sorting algorithm</p> </blockquote> <p>You would just move the copy somewhere else.</p></pre>wirther: <pre><blockquote> <p>copy isn&#39;t so bad </p> </blockquote> <p>That&#39;s not really the point, though. If I were settling for having initialization in my benchmarks, then I&#39;d stick with what I wrote in my original post, where I called rand.Perm on the fly, inside the call to insertionSort. 
Then I&#39;d have about the same overhead as the copy, but with minimal memory usage.</p></pre>XANi_: <pre><p>In the case of copy, it is about 10% to 40% of the benchmark time for such short functions.</p> <p>In the case of generating it each time, it is 1000% overhead or more (try running it with 1000 elements instead of 10).</p> <p>So I&#39;d say it&#39;s an improvement.</p> <p>But benchmarking short functions will always be tricky.</p></pre>wirther: <pre><p>Yes, I just ran benchmarks comparing copy and rand.Perm, and copy was much faster. I&#39;ve also changed my benchmarks to this:</p> <pre><code>func benchmark(b *testing.B, size int) {
	var s [][]int
	for i := 0; i &lt; 100; i++ {
		s = append(s, rand.Perm(size))
	}
	cp := make([]int, size)
	b.ResetTimer()
	for i := 0; i &lt; b.N; i++ {
		copy(cp, s[i%100])
		InsertionSort(cp)
	}
}
</code></pre> <p>And here are the comparisons between cs-guy&#39;s benchmarks and this one:</p> <pre><code>PASS
Benchmark10InsertionSort-2	10000000	     233 ns/op
Benchmark100InsertionSort-2	  200000	   11866 ns/op
Benchmark1000InsertionSort-2	    2000	 1085418 ns/op
ok  	github.com/***/algos	35.614s
PASS
Benchmark10InsertionSort-2	 5000000	     251 ns/op
Benchmark100InsertionSort-2	  200000	   11900 ns/op
Benchmark1000InsertionSort-2	    2000	 1083444 ns/op
ok  	github.com/***/algos	6.346s
</code></pre> <p>I&#39;m satisfied with those results. The benchmark numbers are nearly identical, but the one using copy finishes in about 1/6th of the time of the one that completely skips initialization in the benchmark. The time saving is pretty nice.</p> <p>Thanks for the help!</p></pre>XANi_: <pre><p>You can also consider just doing</p> <pre><code>insertionSort([]int{1, 2, 3, 6, 5, 3, 9, 3, 1, 30})
</code></pre> <p>plus possibly a separate test for the best- and worst-case scenarios.</p></pre>wirther: <pre><p>True.</p> <p>By the way, I don&#39;t know which version of my above reply you saw because I edited it, but I ended up using your smaller-sample-of-reusable-permutations trick. 
Those benchmarks came out very close to cs-guy&#39;s, so good job with that trick. Thanks again!</p></pre>XANi_: <pre><p>Your solution eats over 2GB of memory on my PC and will eat even more on faster machines.</p></pre>tv64738: <pre><p>Initialize first, then b.ResetTimer.</p></pre>wirther: <pre><p>That doesn&#39;t work. It&#39;s an in-place sorting algorithm, so I have to initialize a new slice for each call to insertionSort, but calling b.ResetTimer() a thousand times causes the benchmark to slow down a lot.</p> <p>cs-guy up top found a good solution.</p></pre>tv64738: <pre><p>That&#39;s just the comprehensive version of initialize first ;)</p></pre>jmank88: <pre><p>I&#39;m seeing the same behavior. Looking at <a href="https://golang.org/src/testing/benchmark.go?s=2163:2186#L61" rel="nofollow">benchmark.go</a>, I suspect that runtime.ReadMemStats() is the culprit (it&#39;s called from both {Stop,Start}Timer).</p> <p>FWIW, removing the Perm call from the bench timing results in a larger N (5000000), but it takes the same number of iterations to get to that N (100, 10000, 1000000, 2/5000000). However, this obviously doesn&#39;t account for a 2s-&gt;200s change.</p> <p>Edit: Bumping the array size up from 10 to 100 or 1000 seems to help (likely from the smaller N).</p></pre>wirther: <pre><p>Thanks for confirming the weird behavior. I guess the timer functions weren&#39;t intended to be run many times during a benchmark.</p> <p>Edit: I suppose I can just leave the initialization in the benchmarks, since it&#39;s kind of like a constant factor, anyway.</p></pre>
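Pulling the thread's conclusion together: a minimal, self-contained sketch of the approach the discussion settles on (pregenerate a small pool of random permutations, copy one into a reusable slice inside the timed loop, and call b.ResetTimer() once after setup). The insertionSort implementation and the pool size of 100 are illustrative, not taken verbatim from the posts; testing.Benchmark is used so the sketch can run outside go test:

```go
package main

import (
	"fmt"
	"math/rand"
	"testing"
)

// insertionSort sorts s in place, as in CLRS chapter 2.
func insertionSort(s []int) {
	for j := 1; j < len(s); j++ {
		key := s[j]
		i := j - 1
		for i >= 0 && s[i] > key {
			s[i+1] = s[i]
			i--
		}
		s[i+1] = key
	}
}

func main() {
	const size = 10
	const nPerms = 100 // small pool of pregenerated inputs (illustrative choice)

	res := testing.Benchmark(func(b *testing.B) {
		// Setup: pregenerate a bounded pool of permutations so that
		// rand.Perm is never timed and memory use stays small.
		perms := make([][]int, nPerms)
		for i := range perms {
			perms[i] = rand.Perm(size)
		}
		cp := make([]int, size)
		b.ResetTimer() // exclude all of the setup above from the timing
		for i := 0; i < b.N; i++ {
			// The copy is timed, but it costs the same for every
			// algorithm, so cross-algorithm comparisons stay fair.
			copy(cp, perms[i%nPerms])
			insertionSort(cp)
		}
	})
	fmt.Println(res)
}
```

Cycling through the pool with i%nPerms keeps each iteration's input unsorted while capping memory at nPerms slices, which is the trade-off XANi_ argued for against allocating a fresh slice per iteration.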
