SortPerf.jl

Julia module to test the performance of sorting algorithms.
Popularity
3 Stars
Updated Last
1 Year Ago
Started In
December 2012

SortPerf.jl: Module to test the performance of sorting algorithms

The purpose of this module is to test the performance of the different sort (and related) algorithms in Julia. See https://github.com/kmsquire/SortPerf.jl/raw/master/sortperf.pdf for an example output from Version 0.3.0-prerelease+125.

Run with:

``````std_sort_tests(;sort_algs=SortPerf.sort_algs,   # [InsertionSort, HeapSort, MergeSort,
types=SortPerf.std_types,       # [Int32, Int64, Int128, Float32, Float64, String]
range=6:20,                     # Array size 2^6 through 2^20, by powers of 2
replicates=3,                   #
lt::Function=isless,            # \
by::Function=identity,          #  | sort(...) options
rev::Bool=false,                #  |
order::Ordering=Forward,        # /
save::Bool=false,               # create and save timing tsv and pdf plot
prefix="sortperf")              # prefix for saved files
``````

You can also test individual algorithms with

``````sortperf(Algorithm(s), data, [size,] [replicates=xxx])
``````

Some examples:

``````sortperf(QuickSort, Int, 10_000)               # Test QuickSort on 10,000 random ints
sortperf(MergeSort, [Float32, String], 6:2:10) # Test MergeSort on 2^6, 2^8, and 2^10 float 32s and strings
sortperf([QuickSort, MergeSort, TimSort],      # Test QuickSort, MergeSort, and TimSort on
[Int, Float32, Float64, String],      # Arrays of Int, Float32, Float64, and String
6:20;                                 # ranging from 2^6 elements to 2^20 elements, by
replicates=5)                         # powers of 2, and run each test 5 times
``````

Ordering parameters accepted by sort!(...) will be passed through.

Sorting Tests

The actual tests run include sorting arrays with the following characteristics:

• random
• sorted
• reversed
• sorted, but with 3 random exchanges
• sorted, with 10 random values appended
• 4 unique values
• all equal
• quicksort median killer: first half descending, second half ascending

The tests were inspired by similar tests used by sortperf in Python. See http://svn.python.org/projects/python/trunk/Objects/listsort.txt for more details.

Suggestions based on basic tests

Here is a table and some notes on the Julia implementations of the various algorithms. The table indicates the recommended sort algorithm for the given size (`small < ~2^12 (=8,192) items < large`) and type (string, floating point, or integer) of data.

• Random means that the data is permuted randomly.
• Structured here means that the data contains partially sorted runs (such as when adding random data to an already sorted array).
• Few unique indicates that the data only contains a few unique values.
(Un)stable (small) Stable (small) (Un)stable (large) Stable (large) In-place (large)
Strings
- Random M M M M Q
- Structured M M T T Q
- Few Unique Q M Q M Q
Float64
- Random Q M R R Q
- Structured M M T T Q
- Few Unique Q M Q R Q
Int64
- Random Q M R R Q
- Structured Q M uT R/T Q
- Few Unique Q M R R Q

Key:

Symbol Algorithm
H `HeapSort`
I `InsertionSort`
M `MergeSort`
Q `QuickSort`
T `TimSort`
uT `TimSortUnstable`
R `RadixSort`

Current Recommendations

• Except for pathological cases, small arrays are sorted best with `QuickSort` (unstable) or `MergeSort`` (stable)

• When sorting large arrays with sections of already-sorted data, use `TimSort`. The only structured case it does not handle well is reverse-sorted data with large numbers of repeat elements. An unstable version of `TimSort` (to be contributed to Julia soon) will handle this case

• For numerical data (Ints or Floats) without structure, `RadixSort` is the best choice, except for 1) 128-bit values, or 2) 64-bit integers which span the full range of values.

• When memory is tight, `QuickSort` is the best in-place algorithm. If there is concern about pathological cases, use `HeapSort`. All stable algorithms use additional memory, but `TimSort` is (probably) the most frugal.

• Composite types may behave differently. If sorting is important to your application, you should test the different algorithms on your own data. This package facilitates that.