a better merge-sort alGOrithm?

xuanbao · 2016-12-13 04:00:22 · 699 次点击

这是一个分享于 2016-12-13 04:00:22 的资源，其中的信息可能已经有所发展或是发生改变。

I'm a new gopher. Following my first book on Go - writing the sample code doing all the exercises - I decided to jump in to a study of data-structures and algorithms.

I started easy. Bubble sort first then progressed to insertion sort, followed by merge sort. I decided to tinker with the different algorithms - insertion and merge - mostly. I coded an app that produces a random number of elements in a slice with a random number (int) in each element. Compare the sort times and display on screen. I noticed that insertion sort really shined on any array less than 1000 elements, sorting quicker than merge sort. However merge sort quickly out paced insertion sort after the array grew to element sizes greater than 2500. I experimented most of the day trying different size numbers and element sizes to 1 million with numbers ranging from zero to 10,000.

So after a while an idea formed. Since insertion sort works so well for arrays of 1000 elements or less and merge sort is a divide and conquer algorithm what if I have insertion sort take over after merge sort has divided the array down to 1000 elements. I coded this to allow an argument - threshold - in the merge sort function so I could make both calls no-threshold / threshold and compare the sort times of the algorithms. The threshold code appears to be 3 times faster on average for any size array up to 1 million elements (integers).

OK, I know I haven't invented a better mouse trap. I know it probably seems a bit trivial to the more experienced coders. But, am I missing something? Is this sound logic or is OZ behind the curtains having a laugh at my expense. What Algorithm can I use to test my sorted arrays. I visually inspected the small arrays but 100's of thousands? Is it possible the threshold hasn't actually sorted the array completely but just large chunks and therein lies the reason for the speed efficiency?

Here's the code ;

https://gist.github.com/dyanceyjr/2736c1718329f0d6e182ee776d225e31

评论：

joncalhoun:

Congrats on figuring this out yourself :)

This is actually a fairly well known thing (for lack of a better word) and iirc even the standard library's sort package does something like this. See https://golang.org/src/sort/sort.go

In the standard library I believe it uses a version of quick sort and heapsort but I only glanced at the code. 

Basically, some algorithms are better with smaller datasets and some are better at chunking together large datasets into their corresponding sections, so if we merge them we can write algorithms that are faster in practice. 

Sorting algorithms are interesting because there isn't always one optimal solution. A bubble sort is better than most sorting algorithms with mostly sorted data. Some sorts do better with large datasets. Others perform better on smaller datasets. I always found them fascinating even if you rarely need to actually implement one in your day to day work.

ops-man:

thanks. actually I'm following your blog series - Let's learn Algorithms - https://www.calhoun.io which in turn led to this. I'll digress for a moment to say Thanks again for your series, so far I've learned so much directly and indirectly just following along and asking questions.

UniverseCity:

Python's standard sorting algorithm works exactly as you described (with some other fancy stuff thrown in).

ops-man:

So once I learn more of the algorithms heap, quick, radix etc... then you just code procedure for selecting the most efficient combination of algorithm to use with the selected data structure??

UniverseCity:

The most efficient procedure can often be mathematically derived and then implemented as a proof.

ops-man:

indeed. I don't have those skills. yet. honestly the math eludes me. I'm trying to remedy that problem.

Your link is a very interesting read. I'll bookmark and read a couple times more. thanks.

joncalhoun:

I feel like this math eludes like 50% (if not more) of people who even go to college and get a computer science degree. Sure, all of those people have to learn how to create proofs and do this, but a vast majority of them only learn enough to get by and quickly forget it after college. 

I'm saying this not to say that you shouldn't give it a go, but to let you know that it isn't easy for everyone and that you shouldn't get disappointed if it is challenging.

ops-man:

thanks. I'm trying to keep focused on data structures and algorithms. But, I get distracted. I learn new things, sometimes something that was known to me and that l lacked understanding of has light on it with this new knowledge and I'm drawn to it like a moth.

drvd:

Just supplementing what u/joncalhoun already explained very well.

Take a look at the test in package sort. It uses a wide range of different types of input data: From random, to presorted, jigsaw patterns, data with lots of repetitions, unique values and even "malicious" input (crafted to trigger quicksorts worst case behaviour). Also look at the benchmarks.

The threshold in sort.go to switch from quicksort to shellshort is much lower than your 1000, actually 2 orders of magnitude. Maybe you could try your algorithm with the (very clever) benchmarks of package sort?

入群交流（和以上内容无关）：加入Go大咖交流群，或添加微信：liuxiaoyan-s 备注：入群；或加QQ群：692541889

699 次点击

加入收藏微博

github

slice

python

0 回复

暂无回复

添加一条新回复（您需要登录后才能回复没有账号？）

请尽量让自己的回复能够对别人有帮助
支持 Markdown 格式, **粗体**、~~删除线~~、`单行代码`
支持 @ 本站用户；支持表情（输入 : 提示），见 Emoji cheat sheet
图片支持拖拽、截图粘贴等方式上传

a better merge-sort alGOrithm?

用户登录

今日阅读排行

一周阅读排行

最新主题