Optimizations - what am I doing wrong?

blov · 626 views


Hello, I've been trying to improve the performance of some code I wrote a while back. A write-up of the problem and the optimization ideas I tried to work with can be found here: <a href="http://www.geeksforgeeks.org/find-the-smallest-positive-number-missing-from-an-unsorted-array/">http://www.geeksforgeeks.org/find-the-smallest-positive-number-missing-from-an-unsorted-array/</a>

First: I am not educated in math and barely understand what O(n^2) means. I have implemented three of the suggested methods:
<ol>
<li>SolutionBinarySearch() uses a naive binary search and should have a time complexity of O(n^2)</li>
<li>SolutionLinear() uses a linear search and should be O(nLogn + n)</li>
<li>SolutionHashing() uses hashing and should be O(n)</li>
</ol>
According to the article, they should perform like this:
<ul>
<li>SolutionBinarySearch - slowest</li>
<li>SolutionLinear - faster than binary search</li>
<li>SolutionHashing - faster than linear search</li>
</ul>
However, benchmarking my implementation shows:
<pre><code>$ go test -bench=. -benchtime=10s
BenchmarkSolutionLinear-8            5000    2781682 ns/op
BenchmarkSolutionHashing-8         200000      86691 ns/op
BenchmarkSolutionBinarySearch-8   1000000      18583 ns/op
</code></pre>
This confuses me: the binary search seems to be the fastest (it should be the slowest?), and linear is super slow when it should be fast. I am obviously doing something wrong, so I am curious whether anyone has ideas. Here is my application: <a href="https://play.golang.org/p/SsmgzB3EVz">https://play.golang.org/p/SsmgzB3EVz</a>
<hr/>
**Comments:**

ImLopshire: <pre>Algorithmic complexity begins to matter when n is very large. In many cases, especially when n is small, a simpler solution with less overhead can be much faster. I notice your benchmarks are only using a slice of 50k items. This makes your n 50k. Try increasing n substantially. 
There will likely be a value of n beyond which the solution with lower algorithmic complexity is faster.</pre>
birkbork: <pre><blockquote> I notice your benchmarks are only using a slice of 50k items. This makes your n 50k. Try increasing n substantially. There will likely be a value of n beyond which the solution with lower algorithmic complexity is faster. </blockquote>
I increased them to 5 million, still seeing the same pattern:
<pre><code>$ go test -bench=. -benchtime=30s
BenchmarkSolutionLinear-8            100   411649700 ns/op
BenchmarkSolutionHashing-8         10000     3728741 ns/op
BenchmarkSolutionBinarySearch-8    20000     3070626 ns/op
</code></pre></pre>
tsdtsdtsd: <pre>In your linear solution you have to sort your input first. Could it be that this makes the benchmark slower? Just a guess after a peek, as I can't try it out right now at work.</pre>
birkbork: <pre>Possibly, but the link in OP suggests it should be faster: <blockquote> We can use sorting to solve it in lesser time complexity. We can sort the array in O(nLogn) time. Once the array is sorted, then all we need to do is a linear scan of the array. So this approach takes O(nLogn + n) time which is O(nLogn). </blockquote> Maybe I am just using arrays that are too small to notice, as <a href="/u/ImLopshire" rel="nofollow">/u/ImLopshire</a> pointed out.</pre>
vhodges: <pre>I didn't look too long (5 minutes or so) but it looks like you've swapped the Linear and Binary Search implementations.</pre>
birkbork: <pre>You're probably right, I am fumbling here :-) That said, I tried following the descriptions from [1] for binary search: <blockquote> search all positive integers, starting from 1 in the given array. We may have to search at most n+1 numbers in the given array. So this solution takes O(n^2) in worst case. </blockquote> which I tried to implement in SolutionBinarySearch(), and linear: <blockquote> We can sort the array in O(nLogn) time. Once the array is sorted, then all we need to do is a linear scan of the array. 
So this approach takes O(nLogn + n) time which is O(nLogn). </blockquote> which I tried to implement in SolutionLinear()

[1]: <a href="http://www.geeksforgeeks.org/find-the-smallest-positive-number-missing-from-an-unsorted-array/" rel="nofollow">http://www.geeksforgeeks.org/find-the-smallest-positive-number-missing-from-an-unsorted-array/</a></pre>
CHAOSFISCH: <pre>First, your implementation contains many errors. I fixed the most obvious ones. Your "BinarySearch" is essentially the "naive" algorithm? For n = 25000000
<pre><code>BenchmarkSolutionLinear-4    1   19230408500 ns/op
BenchmarkSolutionHashing-4   1   14998439500 ns/op
BenchmarkSolutionNaive-4     1   11039920000 ns/op
</code></pre>
Results might be a bit off because of potential thermal throttling. I'd assume that for some larger n Hashing might be faster. However, it seems that the Naive algorithm works best for most cases here, unless there's another unspotted bug or performance improvement possible within the respective algorithms. <del><a href="https://play.golang.org/p/P9g58_VNYt" rel="nofollow">https://play.golang.org/p/P9g58_VNYt</a></del> <a href="https://play.golang.org/p/Yym9hqQLW7" rel="nofollow">https://play.golang.org/p/Yym9hqQLW7</a>

EDIT 1: I've played a bit further with the code and performed a bit of profiling.
<ul>
<li>Naive: 2/3 of time is spent on the for loop, 1/3 of time on the comparison.</li>
<li>Linear: 99% of time is spent on sorting the numbers.</li>
<li>Hashing: 99% of time is spent on assigning entries to the map.</li>
</ul>
Thus, the naive version could only be slower for very large n.</pre>
birkbork: <pre><blockquote> Your "BinarySearch" is essentially the "naive" algorithm? </blockquote> Yes. I guess this means I don't know what a binary search is! Very interesting profiling results. PS: you can increase the time the benchmark runs with <code>-benchtime=30s</code> or similar to get more accurate results for long-running benchmarks. The default is 1 second. 
I'll have a look at your version, I am very curious about all the errors I made :-) Thank you so much for your detailed feedback!</pre>
CHAOSFISCH: <pre>An updated version (fixed/improved the way benchmarks are done): <a href="https://play.golang.org/p/Yym9hqQLW7" rel="nofollow">https://play.golang.org/p/Yym9hqQLW7</a></pre>
JHunz: <pre>Your test set generation, while improved from the way OP was doing it, doesn't insert any negative numbers. This artificially boosts the naive solution.</pre>
CHAOSFISCH: <pre>Negative numbers have no effect and do not favor the naive solution (I just tested it to verify). More important is the density of the slice, which is highest when finding the answer k takes k iterations (assuming positive numbers). Thus, it highly depends on the input.
Is it sparse? -> naive
Is it dense? -> hashing (according to O(n)). However, linear is faster because creating the hashmap is too expensive.
Using random numbers as done by OP will most likely favor naive.

EDIT: I can benchmark only up to n=100000000. After that my memory is insufficient.</pre>
ChristophBerger: <pre>About the O() notation, the math may become clearer when looking at the related function plots for n^2, log(n), 2n, etc. <a href="https://appliedgo.net/big-o/" rel="nofollow">Here</a> is a blog post I wrote about this a while ago.</pre>
birkbork: <pre>Thank you for the link, the article looks really interesting!</pre>
JHunz: <pre>There are a number of problems I see:

First, you're generating a test set that doesn't contain any negative numbers because rand.Int returns non-negative values. Having them in would very drastically affect the performance of the naive solution, so that benchmark is artificially sped up with your current benchmarking code.

Second, you're not actually handling the problem description correctly in some of your solutions. 
It specifies that negative numbers can be present in the array, but for example SolutionLinear will always return 1 on any array containing a negative number, whether or not 1 is present. This will make it artificially slightly faster once your test set contains negative numbers, although it would still be nLogn due to the sorting happening first.

Third, you are handling the problem description incorrectly in another way by capping the values of i you search for at 1 million in SolutionBinarySearch and SolutionHashing. This makes SolutionBinarySearch artificially fast at n>1000000 because its complexity caps at 1000000*n rather than n squared.

So I think most of the discrepancy you're seeing is a discrepancy between what you thought you were testing and what you were actually testing.

Edit: Also, Go has a map type which is basically a hash table already. You'd probably see better performance out of the hashing solution, and avoid having to allocate a specific size up front, by switching to that rather than using a slice.</pre>
birkbork: <pre>Thanks for all the pointers, I will have another go at this. <blockquote> Edit: Also, Go has a map type which is basically a hash table already. You'd probably see better performance out of the hashing solution, and avoid having to allocate a specific size up front, by switching to that rather than using a slice. </blockquote> I did try this initially, using map[int]bool, but it was considerably slower than the solution used in the OP. This might be because I only tested it with arrays of 50k elements. Should re-try.</pre>
birkbork: <pre>Did another test regarding using map. 
It seems to be much slower:
<pre><code>BenchmarkSolution/Hashing-8                  20      72253585 ns/op
BenchmarkSolution/HashingMap-8                1   22666093134 ns/op
BenchmarkSolution/HashingMapNoInterface-8     1   21006878796 ns/op
</code></pre>
Where SolutionHashingMap() is by <a href="/u/CHAOSFISCH" rel="nofollow">/u/CHAOSFISCH</a> and SolutionHashingMapNoInterface() is slightly modified to use map[int]bool instead.
<pre><code>// uses hashing, O(n)
func SolutionHashing(A []int) int {
	// Note: the slice has 1,000,000 slots, but only values below 100,000 are recorded.
	hash := make([]bool, 1000000)
	for _, val := range A {
		if val > 0 && val < 100000 {
			hash[val] = true
		}
	}
	for i := 1; i < len(A)+1; i++ {
		if !hash[i] {
			return i
		}
	}
	return len(A) + 1
}

// uses hashing, O(n), with map[int]struct{}
func SolutionHashingMap(A []int) int {
	var a struct{}
	hash := make(map[int]struct{}, len(A))
	for _, val := range A {
		hash[val] = a
	}
	for i := 1; i < len(A)+1; i++ {
		if _, ok := hash[i]; !ok {
			return i
		}
	}
	return len(A) + 1
}

// uses hashing, O(n), with map[int]bool
func SolutionHashingMapNoInterface(A []int) int {
	hash := make(map[int]bool, len(A))
	for _, val := range A {
		hash[val] = true
	}
	for i := 1; i < len(A)+1; i++ {
		if !hash[i] {
			return i
		}
	}
	return len(A) + 1
}
</code></pre>
Updated code here, based on changes from <a href="/u/CHAOSFISCH" rel="nofollow">/u/CHAOSFISCH</a>: <a href="https://play.golang.org/p/7Ha98L8eep" rel="nofollow">https://play.golang.org/p/7Ha98L8eep</a></pre>
CHAOSFISCH: <pre>Well, sure, the solution you propose using a fixed-size []bool is faster. To some extent you're cheating. My solutions work for all numbers MinInt64 to MaxInt64. You're limiting it to 100000.</pre>
birkbork: <pre>True. The 100k limit is mentioned in <a href="https://codility.com/demo/take-sample-test/" rel="nofollow">https://codility.com/demo/take-sample-test/</a>, not the description posted in OP.</pre>
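
The correctness points raised in the thread (negative numbers in the input, and the fixed cap on recorded values) can be illustrated with a minimal sketch. This is not the code from any of the playground links; the names MissingSorted and MissingSeen are hypothetical and chosen only for this example. It relies on one fact: for a slice of length n, the answer always lies in 1..n+1, so a []bool of n+1 entries is enough and no hard-coded limit such as 100000 is required.

```go
package main

import (
	"fmt"
	"sort"
)

// MissingSorted (hypothetical name) is the sort-then-scan approach, O(n log n).
// It sorts a copy so the caller's slice is left untouched.
func MissingSorted(a []int) int {
	s := append([]int(nil), a...)
	sort.Ints(s)
	want := 1 // smallest positive integer not yet seen
	for _, v := range s {
		if v == want {
			want++
		} else if v > want {
			break // gap found
		}
		// v < want: negative, zero or duplicate, so skip it
	}
	return want
}

// MissingSeen (hypothetical name) is the slice-based "hashing" approach, O(n).
// Values outside 1..len(a) cannot change the answer, so they are ignored.
func MissingSeen(a []int) int {
	seen := make([]bool, len(a)+1)
	for _, v := range a {
		if v >= 1 && v <= len(a) {
			seen[v] = true
		}
	}
	for i := 1; i <= len(a); i++ {
		if !seen[i] {
			return i
		}
	}
	return len(a) + 1
}

func main() {
	in := []int{3, 4, -1, 1}
	fmt.Println(MissingSorted(in), MissingSeen(in)) // prints "2 2"
}
```

Sizing the slice from len(a) keeps the speed of the fixed-size []bool version while still accepting any int values, which may be a reasonable middle ground between the two hashing variants benchmarked above. Whether it is actually faster for a given input would still need to be measured.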
