I personally still think most people (not necessarily the author) miss out on the biggest improvement LLMs have to offer: powerful embeddings as text representations for classification.
All of the prompting stuff is, of course, incredible, but the use of these models to create text embeddings of virtually any text document (from a sentence to a newspaper article) allows for incredibly fast iteration on many traditional ML text classification problems.
Multiple times I've taken cases where I have ~1,000 labeled text documents, run them through ada-002, stuck the embeddings in a logistic regression, and gotten wildly better performance than anything I've tried in the past.
If you have an old NLP classification problem you couldn't solve satisfactorily a few years ago, it's worth mindlessly running it through the OpenAI embeddings API and feeding the embeddings to your favorite off-the-shelf classifier.
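For anyone who wants to try it, the whole workflow is something like this. A minimal sketch using the openai v1 Python client and scikit-learn; the batching helper and the toy bullish/bearish labels are mine for illustration, not anyone's actual pipeline:

```python
import numpy as np
from openai import OpenAI
from sklearn.linear_model import LogisticRegression

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(texts, model="text-embedding-ada-002", batch_size=100):
    """Embed a list of strings, batching requests to stay within API limits."""
    vectors = []
    for i in range(0, len(texts), batch_size):
        resp = client.embeddings.create(model=model, input=texts[i:i + batch_size])
        vectors.extend(d.embedding for d in resp.data)
    return np.array(vectors)

# Replace with your ~1,000 labeled documents.
texts = ["TSLA is going to the moon.", "Selling everything before this crashes."]
labels = [1, 0]  # hypothetical labels: 1 = bullish, 0 = bearish

clf = LogisticRegression(max_iter=1000).fit(embed(texts), labels)
print(clf.predict(embed(["Dumping all my shares tomorrow."])))
```

That's really the whole thing: the embeddings do the feature engineering, and any linear model on top tends to be enough.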
Having done NLP work for many years, it's wild to think how many hours I spent on tricky feature engineering, trying to squeeze every last bit of information out of limited text data, only for all of it to be replaced by about 10 minutes of programming time and less than a dollar.
An even bigger improvement is how trivially this scales to full-length documents. It wasn't long ago that the best document representations were just sums or averages of word embeddings.
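For contrast, that old baseline amounted to something like this. A sketch; the tiny word_vectors dict stands in for real pretrained vectors like GloVe or word2vec loaded from disk:

```python
import numpy as np

# Stand-in for pretrained vectors; real ones would be loaded from a file.
word_vectors = {"tsla": np.random.rand(300), "moon": np.random.rand(300)}

def doc_vector(text, dim=300):
    """Old-school document representation: the average of its word vectors."""
    vecs = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

print(doc_vector("TSLA is going to the moon.").shape)  # (300,)
```

Averaging throws away word order entirely, which is exactly what the new embedding models fixed.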
Ask ChatGPT for example use cases and you'll get a ton. A few I've run into personally:
TSLA is going to the moon.
^ Is this tweet bearish or bullish regarding the asset it mentions?
(Acute) hepatitis C
Hepatitis B; Acute
^ Do the above refer to the same disease?
The Federal Reserve decides to abolish interest rates on leap years.
Is it a leap year? New policy from the Fed says no interest if so.
^ Do these refer to the same news story, or different ones?
So you can see that text classification is useful for consolidating streams of textual information and extracting actionable meaning from them.
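To make that concrete, here's roughly what one of those pair classifications looks like as a GPT-3.5 call. A sketch with the openai v1 client; the prompt wording and the same_story helper are my own:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def same_story(headline_a: str, headline_b: str) -> bool:
    """Ask GPT-3.5 whether two headlines describe the same news story."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,   # deterministic output for a classification task
        max_tokens=1,    # we only need "yes" or "no"
        messages=[{
            "role": "user",
            "content": (
                "Do these two headlines refer to the same news story? "
                "Answer yes or no.\n"
                f"1. {headline_a}\n2. {headline_b}"
            ),
        }],
    )
    return resp.choices[0].message.content.strip().lower().startswith("yes")

print(same_story(
    "The Federal Reserve decides to abolish interest rates on leap years.",
    "Is it a leap year? New policy from the Fed says no interest if so.",
))
```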
I'm curious about benchmarks for those tasks compared with spaCy, for example. I've used it before, and I wonder whether GPT-3.5's performance justifies the pricing.
That's a great question. ChatGPT would probably do better. It's just a matter of cost and speed.
Let’s calculate the price of using GPT-3.5 to classify 10 million tweets. A very typical job.
The price is $0.002 per 1k tokens on GPT-3.5 Turbo. (Really it's $0.0015 for input, $0.002 for output.)
That’s $1 for 500k tokens, or $2 for 1M tokens.
Now let's classify 10M tweets. A tweet is ~140 characters, which is only about 40 tokens, but call it 100 to be conservative. Let's also say the instructions are the size of a tweet, and the output is just 1 token (yes or no). That gives us roughly 200 tokens per tweet classification, for a total of 2B tokens to process.
That costs 2B / 500k = $4,000 to run this job. Not so bad if it's mission critical, but starting to get pretty expensive. If I can get comparable performance from a homemade classifier, it makes much more sense to use that instead.
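The same arithmetic as a quick sanity check (the 100-token figures are the rough estimates from above):

```python
# Back-of-envelope cost of the 10M-tweet job.
price_per_1k = 0.002              # $/1k tokens, flat GPT-3.5 Turbo estimate
tweets = 10_000_000
tokens_per_call = 100 + 100 + 1   # tweet + instructions + one-token answer

total_tokens = tweets * tokens_per_call        # ~2B tokens
cost = total_tokens / 1000 * price_per_1k
print(f"${cost:,.0f}")                         # -> $4,020, call it $4k
```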
Fundamentally, it's overkill to use a 175B-parameter model for many of these tasks. And the number of classifications can grow very quickly if you're classifying pairs of data points: n items means n(n-1)/2 comparisons, so 10k items is already ~50M calls.