Isn't the difference between O(mn + nlog(n)) vs O(nlog(n)) running time going to...

spenuke · on July 12, 2013

I'd never heard of Big O until this thread, but from what I can tell it'll only be when a solution is in the O(logn) that "running time will get less significant as n gets larger". This is simply the way you describe logarithmic growth, so maybe that's where you got confused. Also, wouldn't it be fair to say that a logarithm (nlogn) is a lower order term than mn? In which case your definition stands.

At any rate, I wanted to test this out, so I made a naive benchmark for running these functions. The dict solution was ten times faster (0.0011s vs 0.015s) than the list version with ~1350 words. The dict solution ran in 0.13s at ~162,000 words, while I waited a couple minutes before killing the list version on that input.

kyllo · on July 12, 2013

Oh, you're right, m is not a constant but also varies somewhat independently of n, so you can't exclude it from the big O notation. And whether mn or nlogn is the lower order of the two terms depends highly on the value of m, which is the number of unique words in the text. A really long text that's just the same word over and over will have a big n but a small m.

matchu · on July 12, 2013

Worst-case big-O runtime is therefore O(n^2), and best case is O(n log(n)) :)