Couple years ago I had a fairly simple program which collected some data into a linked list and displayed it at the end. It took about 2 seconds to load and process the data from disk. I realized it might be neat to optionally sort the output. Since I didn't feel like doing serious work, I just gave my linked list a bubble sort that took a comparison function and sorted by pointer swapping.
Guess how much time it added? Less than a tenth of a second. Completely negligible. It saved me a few hours (writing bubble sort for a linked list takes maybe 5 minutes and has no chance of complicated bugs) and I doubt anyone will ever notice any slowdown.
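For the curious, this is roughly what that looks like in C (the node layout and names here are illustrative, not the original code): compare adjacent nodes and swap their data pointers, repeating passes until nothing moves.

    struct node {
        void *data;
        struct node *next;
    };

    /* Bubble sort a singly linked list by swapping the data pointers of
     * adjacent nodes (one way to read "pointer swapping" above); cmp
     * follows the qsort() convention. */
    static void bubble_sort_list(struct node *head,
                                 int (*cmp)(const void *, const void *))
    {
        int swapped;

        if (!head)
            return;
        do {
            swapped = 0;
            for (struct node *p = head; p->next; p = p->next) {
                if (cmp(p->data, p->next->data) > 0) {
                    void *tmp = p->data;      /* swap payloads, not links */
                    p->data = p->next->data;
                    p->next->data = tmp;
                    swapped = 1;
                }
            }
        } while (swapped);                    /* stop after a clean pass */
    }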
So what's the lesson? Little performance tweaks hardly matter in the average program nowadays because everything's disk or network bound (IO bound). For the average program it's not worth writing more complex code that will be quicker (once you fix the hard-to-see bugs) until you actually know the slowness is an issue.
Well sure, but who writes their own sorting algorithms these days anyways? It's important to understand the characteristics of various sorting algorithms so you can choose the best one, but that "best one" is almost always going to be implemented by some standard library.
In these cases it makes sense to use the optimal version right out of the gate.
> It's important to understand the characteristics of various sorting algorithms so you can choose the best one
I disagree - most languages offer a sort function which is nearly optimal in the average case. You very rarely choose unless you are doing something where sort performance is extremely important.
However, understanding various sort algorithms and their intricacies is a great way to demonstrate fundamental algorithm design as well as some data structures. This is why it's important - so when you do write code, you come to it with a huge amount of insight into how to go about it.
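To make the first point concrete: in C, for example, the standard library's qsort() is all most programs ever need.

    #include <stdio.h>
    #include <stdlib.h>

    /* qsort() comparator; (x > y) - (x < y) avoids the signed overflow
     * that the tempting "return x - y" can hit. */
    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    int main(void)
    {
        int values[] = { 3, 1, 4, 1, 5 };

        qsort(values, 5, sizeof values[0], cmp_int);
        for (int i = 0; i < 5; i++)
            printf("%d ", values[i]);
        putchar('\n');
        return 0;
    }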
> Well sure, but who writes their own sorting algorithms these days anyways?
This was (is still?) in the GNU flex program source:
    /* We sort the states in sns so we
     * can compare it to oldsns quickly.
     * We use bubble because there probably
     * aren't very many states.
     */
    bubble (sns, numstates);
Whoever wrote this should know better. Their mistake of not using the language's built-in sorting functionality would at least be forgivable if they had used insertion sort, but maybe we should blame academic institutions for exposing programmers to bubble sort in the first place. Want to show newbie programmers the difference between an O(N^2) and an O(N*logN) algorithm? Fine. But insertion sort is O(N^2) too, just like bubble sort, there is no practical case where bubble sort outperforms insertion sort, and it's insertion sort, not bubble sort, that gets used as the small-partition cutoff to speed up quicksort.
Understanding of the characteristics comes from understanding the implementation. I cannot explain why it takes longer to insert in the middle of a vector than a list without sketching at least a rudimentary cartoon of the implementation.
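That cartoon can be very small. A sketch in C (the names are mine, for illustration): the array insert has to shift everything after the insertion point, while the list insert just relinks two pointers.

    #include <string.h>

    /* Insert v at index i of an array currently holding n elements
     * (capacity must exceed n): everything after i shifts right -- O(n). */
    static void array_insert(int *a, size_t n, size_t i, int v)
    {
        memmove(&a[i + 1], &a[i], (n - i) * sizeof a[0]);
        a[i] = v;
    }

    struct lnode { int value; struct lnode *next; };

    /* Insert n after pos in a singly linked list: two pointer writes,
     * O(1) once you are already standing at pos. */
    static void list_insert_after(struct lnode *pos, struct lnode *n)
    {
        n->next = pos->next;
        pos->next = n;
    }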
Insertion sort is significantly faster (especially for branch prediction; the two have similar cache locality, but insertion sort does fewer comparisons, fewer moves, and fewer instructions overall), is almost the same amount of work to implement, and is actually the fastest way to sort arrays of size 4 (and probably remains the fastest up to about 30 elements). So it's actually useful to know insertion sort, whereas bubble sort is actually useless.
I don't know about insertion sort - you need a nested loop through the already-sorted part of the array to find where the new element should go (which requires knowing how much is already sorted), and you have to move multiple values around when you insert it (easy in a linked list, but in an array that's another nested loop!).
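For reference, the array version is still only a few lines: the "find where it goes" and "move values around" steps collapse into a single inner loop, and "how much is already sorted" is simply everything before index i (a sketch, not any particular library's code).

    #include <stddef.h>

    static void insertion_sort(int *a, size_t n)
    {
        for (size_t i = 1; i < n; i++) {    /* a[0..i-1] is already sorted */
            int key = a[i];                 /* next element to insert */
            size_t j = i;

            while (j > 0 && a[j - 1] > key) {
                a[j] = a[j - 1];            /* shift larger elements right */
                j--;
            }
            a[j] = key;                     /* key lands in its slot */
        }
    }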
Several years ago, I had an exam question asking what sort algorithm one should use in different situations, one of which was "automatically sorting a bibliography list when printing a research paper". I answered that while bubble sort is normally terrible, it's a great choice here if you're hand-rolling a sort instead of using a library. The data set couldn't possibly be large enough for the sort run-time to matter, particularly in comparison to printing time, and it's the simplest to correctly program.
There are circumstances in which bubble sort really is that bad, but on small data sets that are already disk/network/peripheral IO bound, it's just fine.
Without telling us the size of the data set, that anecdote is meaningless. Other algorithms are _asymptotically_ better than bubble sort, i.e., past a certain amount of data they perform significantly better. So the choice matters whenever the input can be arbitrarily large.
And it's false to say everything's disk/network bound these days. There will always be important data structures and other code that needs to scale, where the right algorithm makes the difference.
I'm not sure I buy this as a lesson. Choosing between two asymptotically (very) different algorithms (probably linear vs quadratic in your case) isn't really a performance tweak in my book, even though as you point out it wasn't dominant in your case.
Even the world's slowest disk will be faster than a CPU executing a sufficiently silly algorithm.
Why not use insertion sort? As an on-line algorithm, it can sort the list immediately as it's built.
The implementation is straightforward: instead of appending new elements to the end of the list, insert them in the correct place.
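A sketch of that idea in C, reusing the hypothetical node and comparator types from the bubble-sort sketch above; each new node is spliced into its sorted position rather than appended.

    /* Insert node n into an already-sorted list and return the new head;
     * cmp follows the qsort() convention. */
    static struct node *sorted_insert(struct node *head, struct node *n,
                                      int (*cmp)(const void *, const void *))
    {
        struct node **pp = &head;

        while (*pp && cmp((*pp)->data, n->data) <= 0)
            pp = &(*pp)->next;      /* walk past elements that sort first */
        n->next = *pp;              /* splice n in ahead of the larger tail */
        *pp = n;
        return head;
    }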
My personal go-to algorithm for linked lists is an unstable merge-sort variant with an explicit stack[1] - while not as simple as insertion sort, its performance is far superior.
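Since [1] isn't reproduced here, here is a sketch of the general technique (the textbook bottom-up form, which happens to be stable, so not exactly the poster's unstable variant): an array of pending runs serves as the explicit stack, slot i holding either nothing or a sorted run of 2^i nodes, and merges carry upward like binary addition.

    static struct node *merge(struct node *a, struct node *b,
                              int (*cmp)(const void *, const void *))
    {
        struct node head, *tail = &head;

        while (a && b) {
            if (cmp(a->data, b->data) <= 0) { tail->next = a; a = a->next; }
            else                            { tail->next = b; b = b->next; }
            tail = tail->next;
        }
        tail->next = a ? a : b;
        return head.next;
    }

    #define MAX_RUNS 32     /* plenty for any list that fits in memory */

    static struct node *merge_sort_list(struct node *head,
                                        int (*cmp)(const void *, const void *))
    {
        struct node *run[MAX_RUNS] = { 0 };

        while (head) {                      /* feed nodes in one at a time */
            struct node *n = head;
            int i = 0;

            head = head->next;
            n->next = NULL;
            while (i < MAX_RUNS - 1 && run[i]) {  /* carry occupied slots up */
                n = merge(run[i], n, cmp);
                run[i] = NULL;
                i++;
            }
            if (run[i])                     /* top slot full: merge in place */
                n = merge(run[i], n, cmp);
            run[i] = n;
        }

        struct node *sorted = NULL;         /* fold the leftover runs together */
        for (int i = 0; i < MAX_RUNS; i++)
            if (run[i])
                sorted = merge(run[i], sorted, cmp);
        return sorted;
    }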
I agree. Sometimes I just need to sort a very small data set, say 4-10 items. When built-in functions aren't available, the performance loss from using bubble sort is negligible, and the time I save by not worrying about the implementation is not insignificant.
Something similar happened to me, with a linked list as well (on a linked list, bubble sort is much easier to implement than other common sorting methods). Everything worked fine, until data volume increased by 2 orders of magnitude.