
Dumb questions from a programmer who is weak at math.

1) Is there a theoretical minimum asymptotic speed for sorting?

2) Some of these newer sorts seem to have arbitrary parameters in them (e.g. do X if size of set is smaller than Y). Are we likely to see more and more complex rule sets like that lead to increased sorting speeds?




1) All comparison sorting algorithms, i.e. those where sorting works by comparing elements, have a hard worst-case lower bound of Ω(log n!) = Ω(n log n). For non-comparison sorting, the limit depends on the algorithm.

2) Hybrid algorithms, i.e. those that change the underlying strategy based on some parameters, are already quite common in real-world implementations. Examples include timsort (stable, a mix of merge sort and insertion sort), used in Python, Java, and Rust, and pdqsort (unstable, a mix of quicksort and heapsort), used in Rust and Go.
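
Rough sketch of what a hybrid looks like (not timsort or pdqsort themselves, just the general shape, in Python; the cutoff of 32 is an arbitrary illustrative value):

    # Merge sort that falls back to insertion sort below a cutoff.
    # CUTOFF is the tunable "do X if the slice is smaller than Y" knob.
    CUTOFF = 32

    def insertion_sort(a, lo, hi):
        # Sort a[lo:hi] in place; cheap for tiny, cache-resident slices.
        for i in range(lo + 1, hi):
            x = a[i]
            j = i - 1
            while j >= lo and a[j] > x:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = x

    def merge(left, right):
        # Merge two sorted lists into one sorted list.
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        out.extend(left[i:])
        out.extend(right[j:])
        return out

    def hybrid_sort(a, lo=0, hi=None):
        if hi is None:
            hi = len(a)
        if hi - lo <= CUTOFF:
            insertion_sort(a, lo, hi)
            return
        mid = (lo + hi) // 2
        hybrid_sort(a, lo, mid)
        hybrid_sort(a, mid, hi)
        a[lo:hi] = merge(a[lo:mid], a[mid:hi])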


1) IIRC the best case is O(n) (you just verify that it is already sorted, can't get much faster than that) while the worst case is O(n log n). But some of the best algorithms in real-world usage have a worse worst-case asymptotic speed.

2) I think things like cache and machine word size have a huge impact on real-world speed, so it makes sense to have knobs to tweak to fit within those limits, even though a theoretical analysis does away with constants like that.
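
That O(n) best case is literally just one pass over the data, something like this (hypothetical helper, not any particular library's implementation):

    def sort_adaptive(a):
        # One linear scan: if the input is already sorted, we are done in O(n).
        if all(a[i] <= a[i + 1] for i in range(len(a) - 1)):
            return a
        # Otherwise fall back to a general O(n log n) sort.
        return sorted(a)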


> the worst case is O(n log n)

I think it's worth clarifying that this is the best worst case possible, i.e. for every sorting algorithm you could construct an input on which it won't be able to beat O(n log n). In other words, O(n log n) is a hard lower limit for worst-case speed: no algorithm can do better than O(n log n) on all possible inputs (but it can do better on some inputs, O(n) being the hard limit there).

I don't really remember the theory behind this, but hopefully someone here can answer: is it theoretically possible for a sorting algorithm to achieve sub-O(n log n) speeds on 99.99% (or some other %) of randomly selected inputs? Or even O(n)?


> is it theoretically possible for a sorting algorithm to achieve sub-O(n log n) speeds on 99.99% (or some other %) of randomly selected inputs? Or even O(n)?

No, you can get better than n log n for specific cases that may happen a lot in practice, but not in the general case of randomly selected continuous inputs.

The explanation comes from information theory, and it is the same idea as for why you can't compress random data.

In essence, a sorting algorithm has to pick the correct reordering of the sequence. There are n! possible reorderings, so the algorithm needs log2(n!) bits of information about the sequence, and each comparison yields at most one bit; since log(n!) is asymptotically equivalent to n log n, you need on the order of n log n comparisons to cover every case. In practice, some orderings are more common than others, in particular the "already sorted" case, so it is worth making your algorithm check for these cases first, but on randomly selected inputs these special cases represent a negligible fraction of all the possible cases.
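
You can check the log(n!) vs. n log(n) relationship numerically (quick, illustrative snippet):

    import math

    # Compare the information-theoretic lower bound log2(n!) with n*log2(n).
    for n in (10, 100, 1000, 10000):
        bits = math.log2(math.factorial(n))
        print(n, round(bits), round(n * math.log2(n)))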


> you can get better than n log n for specific cases that may happen a lot in practice, but not in the general case of randomly selected continuous inputs.

I guess what I'm asking is how "specific" do these cases have to be, or how "general" is the general case? Can specific cases be 0.0001%? How about 1%?


People forget! The O(n log n) limit for the best worst case is for comparison sorts. I don't know if you consider it a "special case", but more often than not you can get guaranteed time linear in the number of elements: radix sort and bucket sort. This is O(n) for things like ints because the number of bits is constant, but for strings the length plays a factor: O(k*n). This performance isn't dependent on the distribution of the items being sorted, so I'd consider that pretty general.
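
A minimal LSD radix sort sketch for non-negative integers, to make the O(k*n) point concrete (bits_per_pass is an arbitrary knob):

    def radix_sort(a, bits_per_pass=8):
        # LSD radix sort: one stable bucketing pass per digit group,
        # so roughly O(k * n) where k = key width / bits_per_pass.
        if not a:
            return a
        mask = (1 << bits_per_pass) - 1
        max_val = max(a)
        shift = 0
        while max_val >> shift:
            buckets = [[] for _ in range(mask + 1)]
            for x in a:
                buckets[(x >> shift) & mask].append(x)
            a = [x for b in buckets for x in b]
            shift += bits_per_pass
        return a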

You could also consider things like sleep sort or spaghetti sort: googling them I'll leave to the reader. Oh, and sorting networks are a good read too.
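
For a taste of sorting networks, here is the classic 5-comparator network for 4 inputs (just a sketch):

    # A fixed 4-input sorting network: the compare-exchange positions are
    # data-independent, which is why these map well to hardware and SIMD.
    NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

    def sort4(a):
        a = list(a)
        for i, j in NETWORK_4:
            if a[i] > a[j]:
                a[i], a[j] = a[j], a[i]
        return a

    print(sort4([3, 1, 4, 2]))  # [1, 2, 3, 4]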


This is only for algorithms whose output is solely dependent on the outcomes of comparisons of input values.

Otherwise you can for instance sort an input made of zeroes and ones in O(n) time and O(log(n)) space by counting the ones.
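
Sketch of that zeroes-and-ones case:

    def sort_binary(a):
        # O(n) time; the only working state is one counter (O(log n) bits).
        ones = sum(a)
        return [0] * (len(a) - ones) + [1] * ones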


Regarding your last question, Quantum Bogosort comes to mind. Don't know about classical algorithms though.


For 1) The asymptotic best for sorting is O(n log n), if you're only allowed to compare the size of elements. If you allow operations like array indexing with elements, which is possible for integers, it gets as good as O(n) with radix sort.


Assuming all values are unequal, there are n factorial (abbreviated n!) possible cases that we need to distinguish.

If we are sorting by comparison, then each comparison will eliminate at most half of the possible cases. So we need at least log_2(n!) comparisons in the worst case.


Hard to answer. I think the main thing is that there are a lot of stumbling blocks, things that most people overlook, yet which seem logical once they have been explained.

Occasionally one is discovered and solved, and it can lead to a brief domino effect, but I have no idea how many are left.



