>What I see my data science friends doing: it's hard to justify the cost (money and latency) of using LLMs directly for all tasks, and even if you want to, you'll need a baseline model to compare against. So why not use LLMs for dataset creation or augmentation in order to train a classic supervised model?
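The quoted idea (an LLM produces the labels, then a cheap classic model learns from them) might look roughly like this. It is a minimal sketch under stated assumptions: `llm_label` is a hypothetical stand-in for a real LLM call (a keyword rule here, so it runs offline), the sample reviews are made up, and the "classic supervised model" is a hand-rolled multinomial Naive Bayes rather than any particular library's.

```python
import math
from collections import Counter, defaultdict


def llm_label(text: str) -> str:
    """HYPOTHETICAL stand-in for prompting an LLM to label a review as
    'pos' or 'neg'. A keyword rule keeps the example offline and
    deterministic; a real version would call a model API."""
    negative_cues = ("awful", "boring", "wooden", "terrible")
    return "neg" if any(cue in text.lower() for cue in negative_cues) else "pos"


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return [w for w in words if w]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing: the cheap
    'classic supervised model' trained on the LLM-produced labels."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            n_words = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Unlabeled corpus -> LLM labels -> classic model (made-up sample text).
unlabeled = [
    "The lead's performance was wooden and the plot was boring.",
    "A terrible script wasted a talented cast.",
    "Awful pacing, I nearly walked out.",
    "A moving story with a brilliant cast.",
    "The acting was superb and the score was lovely.",
    "Brilliant direction and a lovely ending.",
]
labels = [llm_label(t) for t in unlabeled]
model = NaiveBayes().fit(unlabeled, labels)
print(model.predict("what a brilliant, lovely film"))  # pos
```

The LLM is only paid for once, at labeling time; afterwards every prediction comes from the small model, which also doubles as the baseline the quote mentions.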
The NLP infrastructure and pipelines we have today aren't there because they are necessarily the best way to handle the tasks you want. They're in place because computers simply could not understand text the way we would like, so shortcuts and approximations were necessary.
To borrow from the blog: since you could not simply ask the computer, "How many paragraphs in this review say something bad about the acting? Which actors do they frequently mention?", you needed separate processes for tagging names, linking them to a knowledge base, scoring paragraph-level sentiment toward actors, and so on.
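For contrast, the decomposed pipeline the blog describes can be sketched as a few separate stages stitched together. This is a minimal sketch, assuming hypothetical lookup tables: `ACTOR_KB` (with made-up actor names) stands in for a real named-entity tagger plus knowledge-base linker, and tiny word lists stand in for a trained paragraph-level sentiment model.

```python
import re
from collections import Counter

# HYPOTHETICAL lookup tables standing in for real pipeline components:
# ACTOR_KB plays the role of a named-entity tagger plus knowledge-base
# linker (surface form -> canonical entity); the word lists play the role
# of a trained paragraph-level sentiment model.
ACTOR_KB = {"reyes": "Ada Reyes", "okafor": "Ben Okafor"}
ACTING_WORDS = {"acting", "performance", "delivery", "cast"}
POSITIVE_WORDS = {"superb", "brilliant", "great", "convincing"}
NEGATIVE_WORDS = {"wooden", "flat", "awful", "unconvincing"}


def tokens(paragraph: str) -> list[str]:
    return re.findall(r"[a-z']+", paragraph.lower())


def link_actors(toks: list[str]) -> set[str]:
    """Stages 1-2: tag actor mentions and link them to KB entries."""
    return {ACTOR_KB[t] for t in toks if t in ACTOR_KB}


def sentiment(toks: list[str]) -> int:
    """Stage 3: crude paragraph-level sentiment (positive minus negative hits)."""
    return sum(t in POSITIVE_WORDS for t in toks) - sum(t in NEGATIVE_WORDS for t in toks)


def negative_acting_report(review: str) -> tuple[int, Counter]:
    """Stage 4: count negative paragraphs about acting, tally actors in them."""
    count, mentions = 0, Counter()
    for paragraph in filter(str.strip, review.split("\n\n")):
        toks = tokens(paragraph)
        actors = link_actors(toks)
        about_acting = bool(actors) or any(t in ACTING_WORDS for t in toks)
        if about_acting and sentiment(toks) < 0:
            count += 1
            mentions.update(actors)
    return count, mentions


review = """Reyes gives a wooden performance and her delivery is flat.

The cinematography is brilliant and the score is great.

Okafor is superb, but Reyes again feels unconvincing and flat."""

count, mentions = negative_acting_report(review)
print(count, mentions.most_common())  # 2 [('Ada Reyes', 2), ('Ben Okafor', 1)]
```

Note the third paragraph: Okafor is praised, yet he still gets tallied, because the pipeline only sees paragraph-level sentiment and has no notion of who the negativity is aimed at.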
The approximations are cool, and they do work rather well for some use cases, but they fall apart in many others.
This is why automated resume filtering, moderation, etc. are still awful with the old techniques. You simply can't do what is suggested above and get the same utility.