The tagger's speed was probably enough for what the professor intended at the time. They likely knew how to optimize it but didn't, because they had no reason to.
I say this as a professor myself, because we do that all the time in my team: we create some software system (also in NLP, by the way), we do experiments to show that it improves whatever metric we are interested in or that it teaches us something interesting, we publish a paper with it, make the code available, and then move on to the next paper. That's what we are typically incentivized to do. Time spent optimizing a system that we are not going to take to production anyway is time not spent doing more research. Which doesn't mean that we don't know how to optimize.
I hear what you are saying, and this story is more tongue-in-cheek than an attempt to put down computer science or professors.
But, FYI, this tagger was not just a professor's demonstration; it was kind of ground-breaking and served as a foundation for other taggers. The professor went on to have a pretty awesome career at a few different big tech companies, far surpassing my own success. And yes, I agree, I am sure he could have made it faster himself had he dedicated the time to it.
It's surprising this would need to be said on, presumably, a forum of engineers. Or perhaps the temptation to laugh at someone is greater than the boring admission that OP is using the program on inputs much larger than ever intended.
It sounds like you do know how to optimize. Your important metric is just different. You're optimizing for your time rather than the computer's because that's by far the more valuable resource in your set of constraints.
Mostly unrelated: When I write heavily optimized code I prefer to write the stupidest, simplest thing that could possibly work first, even if I know it's too slow for the intended purpose. I'll leave the unoptimized version in the code base.
- It serves as a form of documentation for what the optimized stuff is supposed to do. I find this most beneficial when the primitives being used in the optimized code don't map well to the overarching flow of ideas (like how _mm256_maddubs_epi16 only behaves like a vectorized 8-bit unsigned multiply when certain preconditions on the inputs hold). The unoptimized code will follow the broader brush strokes of the fast implementation.
- More importantly, you can drop it into your test suite as an oracle to check that the optimized code actually behaves like it's supposed to on a wide variety of test cases. Being able to test any (small enough) input opens a lot of doors in terms of robustness. A small sketch of that pattern follows below.
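To make the oracle idea concrete, here is a minimal sketch in C. The functions and the "optimized" variant are hypothetical stand-ins (not from the tagger or any real project); in practice the fast path would be SIMD or otherwise hard to read, while the reference stays dead simple.

```c
/* Reference-as-oracle pattern (hypothetical example).
 * sum_ref is the "stupidest thing that could possibly work";
 * sum_fast stands in for a genuinely hairy optimized version. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simple, obviously-correct version: kept in the code base as documentation. */
static uint32_t sum_ref(const uint8_t *p, size_t n) {
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++) s += p[i];
    return s;
}

/* "Optimized" version: four bytes per iteration plus a scalar tail. */
static uint32_t sum_fast(const uint8_t *p, size_t n) {
    uint32_t s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += (uint32_t)p[i] + p[i + 1] + p[i + 2] + p[i + 3];
    for (; i < n; i++) s += p[i];
    return s;
}

/* Oracle test: throw many random small inputs at both and demand agreement. */
int main(void) {
    srand(12345);
    for (int trial = 0; trial < 10000; trial++) {
        uint8_t buf[64];
        size_t n = (size_t)(rand() % 65);   /* include the empty input */
        for (size_t i = 0; i < n; i++) buf[i] = (uint8_t)(rand() & 0xff);
        assert(sum_ref(buf, n) == sum_fast(buf, n));
    }
    return 0;
}
```

The same shape works with a property-based testing library instead of hand-rolled random inputs; the key point is that the slow version never gets deleted, because it is what the tests measure the fast version against.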