I personally still think most people (not necessarily the author) miss out on the biggest improvement LLMs have to offer: powerful embeddings as text representations for classification.
All of the prompting stuff is, of course, incredible, but the use of these models to create text embeddings of virtually any text document (from a sentence to a newspaper article) allows for incredibly fast iteration on many traditional ML text classification problems.
Multiple times I've taken cases where I have ~1,000 labeled text documents, run them through ada-002, stuck the embeddings in a logistic regression, and gotten wildly better performance than anything I've tried in the past.
If you have an old NLP classification problem you couldn't solve satisfactorily a few years ago, it's worth mindlessly running it through the OpenAI embeddings API and feeding the embeddings to your favorite off-the-shelf classifier.
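For anyone who wants to try it, the whole workflow is something like this. A minimal sketch using the openai v1 Python client and scikit-learn; the batching helper and the toy bullish/bearish labels are mine for illustration, not anyone's actual pipeline:

```python
import numpy as np
from openai import OpenAI
from sklearn.linear_model import LogisticRegression

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(texts, model="text-embedding-ada-002", batch_size=100):
    """Embed a list of strings, batching requests to stay within API limits."""
    vectors = []
    for i in range(0, len(texts), batch_size):
        resp = client.embeddings.create(model=model, input=texts[i:i + batch_size])
        vectors.extend(d.embedding for d in resp.data)
    return np.array(vectors)

# Replace with your ~1,000 labeled documents.
texts = ["TSLA is going to the moon.", "Selling everything before this crashes."]
labels = [1, 0]  # hypothetical labels: 1 = bullish, 0 = bearish

clf = LogisticRegression(max_iter=1000).fit(embed(texts), labels)
print(clf.predict(embed(["Dumping all my shares tomorrow."])))
```

That's really the whole thing: the embeddings do the feature engineering, and any linear model on top tends to be enough.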
Having done NLP work for many years, it's wild to think how many hours I spent on tricky feature engineering, trying to squeeze every last bit of information out of limited text data, only for all of it to be replaced by about 10 minutes of programming time and less than a dollar.
An even bigger improvement is how trivially this scales to full-length documents. It wasn't long ago that the best document representations were just sums or averages of word embeddings.
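For contrast, that old baseline amounted to something like this. A sketch; the tiny word_vectors dict stands in for real pretrained vectors like GloVe or word2vec loaded from disk:

```python
import numpy as np

# Stand-in for pretrained vectors; real ones would be loaded from a file.
word_vectors = {"tsla": np.random.rand(300), "moon": np.random.rand(300)}

def doc_vector(text, dim=300):
    """Old-school document representation: the average of its word vectors."""
    vecs = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

print(doc_vector("TSLA is going to the moon.").shape)  # (300,)
```

Averaging throws away word order entirely, which is exactly what the new embedding models fixed.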
Ask ChatGPT for example use cases and you'll get a ton. A few I've run into personally:
TSLA is going to the moon.
^ Is this tweet bearish or bullish regarding the asset it mentions?
(Acute) hepatitis C
Hepatitis B; Acute
^ Do the above refer to the same disease?
The Federal Reserve decides to abolish interest rates on leap years.
Is it a leap year? New policy from the Fed says no interest if so.
^ Do these refer to the same news story, or different ones?
So you can see that text classification is useful for consolidating streams of textual information and extracting actionable meaning from them.
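To make that concrete, here's roughly what one of those pair classifications looks like as a GPT-3.5 call. A sketch with the openai v1 client; the prompt wording and the same_story helper are my own:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def same_story(headline_a: str, headline_b: str) -> bool:
    """Ask GPT-3.5 whether two headlines describe the same news story."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,   # deterministic output for a classification task
        max_tokens=1,    # we only need "yes" or "no"
        messages=[{
            "role": "user",
            "content": (
                "Do these two headlines refer to the same news story? "
                "Answer yes or no.\n"
                f"1. {headline_a}\n2. {headline_b}"
            ),
        }],
    )
    return resp.choices[0].message.content.strip().lower().startswith("yes")

print(same_story(
    "The Federal Reserve decides to abolish interest rates on leap years.",
    "Is it a leap year? New policy from the Fed says no interest if so.",
))
```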
I'm curious about benchmarks for those tasks compared with spaCy, for example. I've used it before, and I wonder whether GPT-3.5's performance justifies the pricing.
That's a great question. ChatGPT would probably do better. It's just a matter of cost and speed.
Let’s calculate the price of using GPT-3.5 to classify 10 million tweets. A very typical job.
The price is $0.002 per 1k tokens on GPT-3.5 Turbo. (Really it's $0.0015 for input, $0.002 for output.)
That’s $1 for 500k tokens, or $2 for 1M tokens.
Now let's classify 10M tweets. A tweet is ~140 characters, which is only about 40 tokens, but call it 100 to be conservative. Let's also say the instructions are the size of a tweet, and the output is just 1 token (yes or no). That gives us roughly 200 tokens per tweet classification, for a total of 2B tokens to process.
That costs 2B / 500k = $4,000 to run this job. Not so bad if it's mission critical, but starting to get pretty expensive. If I can get comparable performance from a homemade classifier, it makes much more sense to use that instead.
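The same arithmetic as a quick sanity check (the 100-token figures are the rough estimates from above):

```python
# Back-of-envelope cost of the 10M-tweet job.
price_per_1k = 0.002              # $/1k tokens, flat GPT-3.5 Turbo estimate
tweets = 10_000_000
tokens_per_call = 100 + 100 + 1   # tweet + instructions + one-token answer

total_tokens = tweets * tokens_per_call        # ~2B tokens
cost = total_tokens / 1000 * price_per_1k
print(f"${cost:,.0f}")                         # -> $4,020, call it $4k
```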
Fundamentally, it's overkill to use a 175B-parameter model for many of these tasks. And the number of classifications can grow very quickly if you're classifying pairs of data points: n items means n(n-1)/2 comparisons, so 10k items is already ~50M calls.