Optimizations - what am I doing wrong?

Hello, I've been trying to improve the performance of some code I wrote a while back. A write-up of the problem and some optimization ideas I tried to work with can be found here: http://www.geeksforgeeks.org/find-the-smallest-positive-number-missing-from-an-unsorted-array/

First - I am not educated in math and barely understand what O(n^2) means.

I have implemented three of the methods suggested:

1. SolutionBinarySearch(), uses a naive binary search and should have a time complexity of O(n^2)
2. SolutionLinear(), uses a linear search and should be O(nLogn + n)
3. SolutionHashing(), uses hashing, should be O(n)

According to the article, they should perform like this:

- SolutionBinarySearch - slowest
- SolutionLinear - faster than binary search
- SolutionHashing - faster than linear search

However, benchmarking my implementation shows:

```
$ go test -bench=. -benchtime=10s
BenchmarkSolutionLinear-8            5000    2781682 ns/op
BenchmarkSolutionHashing-8         200000      86691 ns/op
BenchmarkSolutionBinarySearch-8   1000000      18583 ns/op
```

This confuses me: the binary search is fastest (it should be slowest?), and linear is very slow (it should be fast).

I am obviously doing something wrong, so I am curious whether anyone has ideas.

Here is my application: https://play.golang.org/p/SsmgzB3EVz
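The full code is in the playground link above; for readers skimming the thread, the benchmarks are presumably wired up along these lines (a sketch only - `makeTestSlice` and the 50k input size are assumptions, not the OP's actual code):

```go
package main

import (
	"math/rand"
	"testing"
)

// makeTestSlice builds a slice of n random non-negative ints
// (hypothetical helper; the playground code may build its input differently).
func makeTestSlice(n int) []int {
	a := make([]int, n)
	for i := range a {
		a[i] = rand.Int()
	}
	return a
}

// BenchmarkSolutionHashing shows the general shape of the benchmarks;
// the other two would look the same with the function swapped out.
func BenchmarkSolutionHashing(b *testing.B) {
	a := makeTestSlice(50000) // the thread mentions a 50k-element input
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		SolutionHashing(a)
	}
}
```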
---

**Comments:**

**ImLopshire:**

Algorithmic complexity begins to matter when n is very large. In many cases, especially when n is small, a simpler solution with less overhead can be much faster.

I notice your benchmarks only use a slice of 50k items, so your n is 50k. Try increasing n substantially. There will likely be a value of n at which the lower-complexity solution becomes faster.

**birkbork:**

> I notice your benchmarks only use a slice of 50k items, so your n is 50k. Try increasing n substantially. There will likely be a value of n at which the lower-complexity solution becomes faster.

I increased them to 5 million, still seeing the same pattern:

```
$ go test -bench=. -benchtime=30s
BenchmarkSolutionLinear-8          100   411649700 ns/op
BenchmarkSolutionHashing-8       10000     3728741 ns/op
BenchmarkSolutionBinarySearch-8  20000     3070626 ns/op
```

**tsdtsdtsd:**

In your linear solution you have to sort the input first. Could it be that this is what makes the benchmark slower? Just a guess after a quick peek, as I can't try it out right now at work.

**birkbork:**

Possibly, but the link in the OP suggests it should be faster:

> We can use sorting to solve it in lesser time complexity. We can sort the array in O(nLogn) time. Once the array is sorted, then all we need to do is a linear scan of the array. So this approach takes O(nLogn + n) time which is O(nLogn).

Maybe I am just using arrays that are too small to notice, as /u/ImLopshire pointed out.

**vhodges:**

I didn't look too long (5 minutes or so), but it looks like you've swapped the Linear and Binary Search implementations.

**birkbork:**

You're probably right, I am fumbling here :-)

That said, I tried following the descriptions from [1].

For binary search:

> search all positive integers, starting from 1 in the given array. We may have to search at most n+1 numbers in the given array. So this solution takes O(n^2) in worst case.

which I tried to implement in SolutionBinarySearch(), and for linear:

> We can sort the array in O(nLogn) time. Once the array is sorted, then all we need to do is a linear scan of the array. So this approach takes O(nLogn + n) time which is O(nLogn).

which I tried to implement in SolutionLinear().

1: http://www.geeksforgeeks.org/find-the-smallest-positive-number-missing-from-an-unsorted-array/
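For illustration, the two quoted descriptions translate roughly into Go as follows (a sketch, not the OP's playground code; `solutionNaiveScan` corresponds to what the OP named SolutionBinarySearch, and `solutionSortScan` to SolutionLinear):

```go
package main

import "sort"

// solutionNaiveScan follows the first quoted description: for each candidate
// starting at 1, scan the whole array for it; worst case O(n^2).
func solutionNaiveScan(a []int) int {
	for candidate := 1; ; candidate++ {
		found := false
		for _, v := range a {
			if v == candidate {
				found = true
				break
			}
		}
		if !found {
			return candidate
		}
	}
}

// solutionSortScan follows the second quoted description: sort in O(n log n),
// then one linear pass over the sorted values.
func solutionSortScan(a []int) int {
	b := append([]int(nil), a...) // copy so the caller's slice isn't reordered
	sort.Ints(b)
	want := 1
	for _, v := range b {
		if v == want {
			want++
		}
	}
	return want
}
```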
**CHAOSFISCH:**

First, your implementation contains many errors. I fixed the most obvious ones. Your "BinarySearch" is essentially the "naive" algorithm?

For n = 25000000:

```
BenchmarkSolutionLinear-4    1   19230408500 ns/op
BenchmarkSolutionHashing-4   1   14998439500 ns/op
BenchmarkSolutionNaive-4     1   11039920000 ns/op
```

Results might be a bit off because of potential thermal throttling. I'd assume that for some larger n, Hashing might be faster. However, it seems that the Naive algorithm works best for most cases here, unless there is another unspotted bug or a performance improvement possible within the respective algorithms.

~~https://play.golang.org/p/P9g58_VNYt~~ https://play.golang.org/p/Yym9hqQLW7

**EDIT 1**: I've played a bit further with the code and did some profiling:

- Naive: 2/3 of the time is spent in the for loop, 1/3 on the comparison.
- Linear: 99% of the time is spent sorting the numbers.
- Hashing: 99% of the time is spent assigning entries to the map.

Thus, only for very large n could the naive version be slower.

**birkbork:**

> Your "BinarySearch" is essentially the "naive" algorithm?

Yes. I guess this means I don't know what a binary search is!

Very interesting profiling results. PS: you can increase the time each benchmark runs with `-benchtime=30s` or similar, to get more accurate results from long-running benchmarks. The default is 1 second.

I'll have a look at your version, I am very curious about all the errors I made :-) Thank you so much for your detailed feedback!

**CHAOSFISCH:**

A more updated version (fixed/improved the way the benchmarks are done): https://play.golang.org/p/Yym9hqQLW7

**JHunz:**

Your test set generation, while improved from the way OP was doing it, doesn't insert any negative numbers. This artificially boosts the naive solution.

**CHAOSFISCH:**

Negative numbers have no effect and do not favor the *naive* solution (I just tested this to verify it). More important is the density of the slice, which is highest when we need *k* iterations to find the solution *k* (assuming positive numbers).

Thus, it highly depends on the input.

Is it sparse? -> *naive*

Is it dense? -> *hashing* (according to O(n)). However, *linear* is faster because the creation of the hashmap is too expensive.

Using random numbers, as OP does, will most likely favor *naive*.

**EDIT**: I can only benchmark up to *n=100000000*. After that my memory is insufficient.

**ChristophBerger:**

About the O() notation: the math may become clearer when looking at the related function plots for n^2, log(n), 2^n, etc. [Here](https://appliedgo.net/big-o/) is a blog post I wrote about this a while ago.

**birkbork:**

Thank you for the link, the article looks really interesting!

**JHunz:**

There are a number of problems I see:

First, you're generating a test set that doesn't contain any negative numbers, because rand.Int returns non-negative values. Having them in would drastically affect the performance of the naive solution, so that benchmark is artificially sped up by your current benchmarking code (a sketch of one way to mix in negatives follows this comment).

Second, you're not actually handling the problem description correctly in some of your solutions. It specifies that negative numbers can be present in the array, but SolutionLinear, for example, will always return 1 on any array containing a negative number, whether or not 1 is present. This will make it artificially slightly faster once your test set contains negative numbers, although it would still be nLogn due to the sorting happening first.

Third, you are handling the problem description incorrectly in another way by capping the values you search for at 1 million in SolutionBinarySearch and SolutionHashing. This makes SolutionBinarySearch artificially fast at n > 1000000, because its complexity caps at 1000000*n rather than n squared.

So I think most of the discrepancy you're seeing is a discrepancy between what you thought you were testing and what you were actually testing.

Edit: Also, Go has a map type, which is basically a hash table already. You'd probably see better performance out of the hashing solution, as well as avoid having to allocate a specific size up front, by switching to that rather than using a slice.
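As a sketch of the first point above, one way to build a test slice that actually contains negative values (`makeSignedTestSlice` is a hypothetical helper, not code from the thread):

```go
package main

import "math/rand"

// makeSignedTestSlice builds a slice of n random ints where roughly half
// the values are negative, so solutions that mishandle negatives get exercised.
func makeSignedTestSlice(n int) []int {
	a := make([]int, n)
	for i := range a {
		v := rand.Int()
		if rand.Intn(2) == 0 {
			v = -v // flip about half the values negative
		}
		a[i] = v
	}
	return a
}
```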
**birkbork:**

Thanks for all the pointers, I will have another go at this.

> Edit: Also, Go has a map type, which is basically a hash table already. You'd probably see better performance out of the hashing solution, as well as avoid having to allocate a specific size up front, by switching to that rather than using a slice.

I did try this initially, using map[int]bool, but it was considerably slower than the solution used in the OP. This might be because I only tested it with arrays of 50k elements. I should re-try.

**birkbork:**

Did another test regarding the map. It seems to be much slower:

```
BenchmarkSolution/Hashing-8                   20     72253585 ns/op
BenchmarkSolution/HashingMap-8                 1  22666093134 ns/op
BenchmarkSolution/HashingMapNoInterface-8      1  21006878796 ns/op
```

where SolutionHashingMap() is by /u/CHAOSFISCH, and SolutionHashingMapNoInterface() is slightly modified to use map[int]bool instead.

```go
// uses hashing, O(n)
func SolutionHashing(A []int) int {
	hash := make([]bool, 1000000)
	for _, val := range A {
		if val > 0 && val < 100000 {
			hash[val] = true
		}
	}
	for i := 1; i < len(A)+1; i++ {
		if !hash[i] {
			return i
		}
	}
	return len(A) + 1
}

// uses hashing, O(n), with map[int]struct{}
func SolutionHashingMap(A []int) int {
	var a struct{}
	hash := make(map[int]struct{}, len(A))
	for _, val := range A {
		hash[val] = a
	}
	for i := 1; i < len(A)+1; i++ {
		if _, ok := hash[i]; !ok {
			return i
		}
	}
	return len(A) + 1
}

// uses hashing, O(n), with map[int]bool
func SolutionHashingMapNoInterface(A []int) int {
	hash := make(map[int]bool, len(A))
	for _, val := range A {
		hash[val] = true
	}
	for i := 1; i < len(A)+1; i++ {
		if !hash[i] {
			return i
		}
	}
	return len(A) + 1
}
```

Updated code here, based on changes from /u/CHAOSFISCH: https://play.golang.org/p/7Ha98L8eep

**CHAOSFISCH:**

Well, sure, the solution you propose using a fixed-size *[]bool* is faster. To some extent you're cheating: my solutions work for all numbers from *MinInt64* to *MaxInt64*, while you're limiting it to 100000.

**birkbork:**

True. The 100k limit is mentioned in https://codility.com/demo/take-sample-test/, not in the description posted in the OP.
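An aside not raised in the thread: the slice-based approach does not actually need a fixed value cap to stay O(n), because the smallest missing positive of a slice of length n can never exceed n+1, so any value outside 1..len(A)+1 can simply be ignored. A sketch:

```go
package main

// smallestMissingPositive returns the smallest positive integer not present
// in A. The answer is always in 1..len(A)+1, so values outside that range
// (including negatives) are ignored, and a slice of len(A)+2 booleans is
// enough - no fixed 100000/1000000 cap needed. (Illustrative sketch only.)
func smallestMissingPositive(A []int) int {
	seen := make([]bool, len(A)+2) // index 0 unused; indices 1..len(A)+1
	for _, v := range A {
		if v > 0 && v <= len(A)+1 {
			seen[v] = true
		}
	}
	for i := 1; i <= len(A)+1; i++ {
		if !seen[i] {
			return i
		}
	}
	return len(A) + 1 // unreachable: at most len(A) of the len(A)+1 candidates can be present
}
```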
