Isn't the difference between O(mn + nlog(n)) vs O(nlog(n)) running time going to get less significant as the value of n gets larger?
I thought the whole point of Big O / asymptotic analysis is that you can ignore lower-order terms and constant factors because they are insignificant for any appreciably large input size. And also because the lower order terms and constant factors vary too much depending on the programming language, the compiler or VM, the hardware, etc.
I'd never heard of Big O until this thread, but from what I can tell it'll only be when a solution is in the O(logn) that "running time will get less significant as n gets larger". This is simply the way you describe logarithmic growth, so maybe that's where you got confused. Also, wouldn't it be fair to say that a logarithm (nlogn) is a lower order term than mn? In which case your definition stands.
At any rate, I wanted to test this out, so I made a naive benchmark for running these functions. The dict solution was ten times faster (0.0011s vs 0.015s) than the list version with ~1350 words. The dict solution ran in 0.13s at ~162,000 words, while I waited a couple minutes before killing the list version on that input.
Oh, you're right, m is not a constant but also varies somewhat independently of n, so you can't exclude it from the big O notation. And whether mn or nlogn is the lower order of the two terms depends highly on the value of m, which is the number of unique words in the text. A really long text that's just the same word over and over will have a big n but a small m.
I thought the whole point of Big O / asymptotic analysis is that you can ignore lower-order terms and constant factors because they are insignificant for any appreciably large input size. And also because the lower order terms and constant factors vary too much depending on the programming language, the compiler or VM, the hardware, etc.