
It's true that POS tagging works fairly well. But consider that a sentence involves more than one word. Even at 97 % accuracy for one word, the probability of correctly tagging every word in a short sentence of only ten words is still as low as 0.97^10 ≈ 0.74 (assuming the errors are independent). And sentences are generally longer than ten words.
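To make the arithmetic concrete, here is a minimal sketch of how whole-sentence accuracy falls off with length. It assumes per-word errors are independent, which is a simplification, and the function name `sentence_accuracy` is just made up for illustration:

```python
def sentence_accuracy(per_word_accuracy: float, sentence_length: int) -> float:
    """Probability that every word in a sentence is tagged correctly.

    Assumption: tagging errors on different words are independent,
    so the per-word probabilities simply multiply.
    """
    return per_word_accuracy ** sentence_length


if __name__ == "__main__":
    # Compare a few sentence lengths at 97 % per-word accuracy.
    for n in (10, 20, 30):
        print(f"{n} words: {sentence_accuracy(0.97, n):.2f}")
```

Under that assumption, a twenty-word sentence comes out fully correct only a little more than half the time.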

And as POS tagging is usually only done as preprocessing for some other task like syntactically parsing a text (which itself is usually preprocessing for yet another task), 97 % accuracy per word is not as good as it sounds. Parsers need to work with wrong data for every second or third sentence.




Indeed: the first paragraph of the linked paper says "Current good taggers have sentence accuracies around 55–57%".

(This surprises me. I would expect errors on different words in a sentence to be correlated: you either make no errors or several.)
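As a rough sanity check, one can ask what sentence length would produce the quoted 55–57% if errors were independent at 97 % per word. This is only a sketch under that independence assumption; the helper name `implied_length` is hypothetical:

```python
import math


def implied_length(per_word_accuracy: float, sentence_accuracy: float) -> float:
    """Sentence length n that solves per_word_accuracy ** n == sentence_accuracy.

    Assumption: per-word tagging errors are independent.
    """
    return math.log(sentence_accuracy) / math.log(per_word_accuracy)


if __name__ == "__main__":
    for target in (0.55, 0.57):
        print(f"{target:.2f} -> about {implied_length(0.97, target):.1f} words")
```

That works out to somewhere between 18 and 20 words. If real sentences are longer than that on average, strongly correlated errors would be one possible explanation for why the sentence accuracy is not even lower, but the numbers here can't settle that.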


Whoops, I didn't even look into the paper. Kind of makes my comment superfluous …



