
1) IIRC the best case is O(n) (you just verify that it is already sorted, and you can't get much faster than that, since any sort has to at least look at every element), while the worst case is O(nlog(n)). But some of the best algorithms in real-world usage have a worse asymptotic worst case (quicksort's O(n^2), for example). A quick sketch of that fast path follows after point 2.

2) I think things like cache and machine word size have a huge impact on real-world speed, so it makes sense to have knobs to tweak to fit within those limits, even though a theoretical analysis does away with constants like that.
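
On point 1, here's a minimal Python sketch of that O(n) already-sorted fast path. It's just a toy illustration of the idea, not how any production sort actually implements it (real adaptive sorts like Timsort detect sorted runs as part of the main algorithm):

    def sort_with_sorted_check(xs):
        # Best case O(n): a single pass confirms the input is already sorted.
        if all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1)):
            return list(xs)
        # Otherwise fall back to a general O(n log n) comparison sort.
        return sorted(xs)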




> the worst case is O(nlog(n))

I think it's worth clarifying that this is the best possible worst case, i.e. for every sorting algorithm you can construct an input on which it won't beat O(nlog(n)). In other words, O(nlog(n)) is a hard lower limit on worst-case speed: no algorithm can do better than O(nlog(n)) across all possible inputs (though it can do better on some inputs, O(n) being the hard limit there, since you at least have to read the input).

I don't really remember the theory behind this, but hopefully someone here can answer: is it theoretically possible for a sorting algorithm to achieve sub-O(nlog(n)) speeds on 99.99% (or some other %) of randomly selected inputs? Or even O(n)?


> is it theoretically possible for a sorting algorithm to achieve sub-O(nlog(n)) speeds on 99.99% (or some other %) of randomly selected inputs? Or even O(n)?

No, you can get better than nlog(n) for specific cases that may happen a lot in practice, but not in the general case of randomly selected continuous inputs.

The explanation comes from information theory, and it is the same idea as for why you can't compress random data.

In essence, a sorting algorithm has to pick the correct reordering of the sequence. There are n! possible reorderings, so the algorithm needs log(n!) bits of information to identify the right one, and each comparison yields at most one bit. Since log(n!) is asymptotically equivalent to nlog(n) (Stirling's approximation), you need on the order of nlog(n) comparisons to cover every case. In practice, some reorderings are more common than others, in particular the "already sorted" case, so it is worth making your algorithm check for those first; but on randomly selected inputs, these special cases represent a negligible fraction of all possible cases.
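
A quick back-of-the-envelope check of that claim (my own throwaway script, nothing from the thread): log2(n!) is the number of bits needed to single out one of the n! orderings, and its ratio to n*log2(n) creeps toward 1 as n grows.

    import math

    # log2(n!) = bits needed to identify one of n! permutations;
    # each comparison yields at most one bit, so this lower-bounds
    # the number of comparisons any comparison sort needs in the worst case.
    for n in (10, 100, 1000, 10_000):
        bits_needed = math.lgamma(n + 1) / math.log(2)   # log2(n!)
        n_log_n = n * math.log2(n)
        print(n, round(bits_needed), round(n_log_n), round(bits_needed / n_log_n, 3))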


> you can get better than nlog(n) for specific cases that may happen a lot in practice, but not in the general case of randomly selected continuous inputs.

I guess what I'm asking is how "specific" do these cases have to be, or how "general" is the general case? Can specific cases be 0.0001%? How about 1%?


People forget! The O(nlogn) limit for the best worst case is for comparison sorts. I don't know if you consider it a "special case", but for more cases than not you can get guaranteed time linear in the number of elements: Radix and Bucket Sort. This is O(n) for things like ints because the number of bits is constant, but for strings the key length plays a factor: O(k*n). This performance doesn't depend on the distribution of the items being sorted, so I'd consider it pretty general (a small radix sort sketch follows below).

You could also consider things like sleep sort or spaghetti sort; I'll leave googling them to the reader. Oh, and sorting networks are a good read too.
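
On the radix point above, a minimal LSD radix sort for non-negative integers in Python. This is my own illustration of the technique, so treat it as a sketch rather than a reference implementation:

    def radix_sort(nums, base=256):
        # LSD radix sort: O(k * n), where k is the number of base-`base`
        # digits in the largest value (constant for fixed-width integers).
        # This form handles non-negative integers only.
        if not nums:
            return []
        out = list(nums)
        max_val = max(out)
        exp = 1
        while max_val // exp > 0:
            buckets = [[] for _ in range(base)]
            for x in out:  # stable distribution by the current digit
                buckets[(x // exp) % base].append(x)
            out = [x for b in buckets for x in b]
            exp *= base
        return out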


This is only for algorithms whose output is solely dependent on the outcomes of comparisons of input values.

Otherwise you can for instance sort an input made of zeroes and ones in O(n) time and O(log(n)) space by counting the ones.
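
For concreteness, a sketch of that counting idea (my own illustration): no element-to-element comparisons at all, one O(n) pass, and the only working state beyond the output is a counter, i.e. O(log n) bits.

    def sort_bits(bits):
        # Count the ones in a single O(n) pass; the counter itself takes
        # O(log n) bits of space. Then emit the zeroes followed by the ones.
        ones = sum(bits)
        return [0] * (len(bits) - ones) + [1] * ones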


Regarding your last question, Quantum Bogosort comes to mind. Don't know about classical algorithms though.



