
The first one also compares GPT-4 to the researchers themselves. Smaller specialized models don't beat humans at these tasks. That's why Mechanical Turk is used here in the first place (it's certainly not cheaper), and why GPT-4 beating them is worthy of a paper on its own.



Well, it really depends on the task. If it can be done with a regex, use a regex. We can’t make categorical statements about LLMs being better.
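To make that concrete (a minimal sketch with a made-up extraction task; nothing here is from the paper): something like pulling ISO dates out of text needs no model at all.

    import re

    # Hypothetical task: extract ISO-8601 dates from free text.
    # A regex handles this deterministically, instantly, and for free --
    # no LLM call or annotation pipeline needed.
    ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

    text = "The study ran from 2021-03-01 to 2021-09-30."
    print(ISO_DATE.findall(text))  # ['2021-03-01', '2021-09-30']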

You can also probably distill a large model into a smaller one while keeping most of the performance. DistilBERT is almost as good as BERT at a fraction of the inference cost.
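Rough sketch of what that looks like in practice with Hugging Face pipelines (the model names are stock public checkpoints, not anything task-specific; per the DistilBERT paper the distilled model is ~40% smaller and ~60% faster while keeping ~97% of BERT's GLUE performance):

    from transformers import pipeline

    # Same sentiment task, full-size vs distilled checkpoint.
    bert = pipeline("sentiment-analysis",
                    model="textattack/bert-base-uncased-SST-2")
    distil = pipeline("sentiment-analysis",
                      model="distilbert-base-uncased-finetuned-sst-2-english")

    sample = "The annotation guidelines were surprisingly clear."
    print(bert(sample))    # label + confidence score from the full model
    print(distil(sample))  # usually the same label, at a fraction of the cost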

GPT-3.5 and 4 also currently aren’t deterministic even with temperature zero, which is a nightmare for debugging.
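Easy to check for yourself (a sketch using the current openai Python client; assumes an API key in the environment and a hypothetical prompt): run the same request a few times at temperature zero and see how many distinct completions come back.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        return resp.choices[0].message.content

    prompt = "List three failure modes of crowd-sourced annotation."
    outputs = {ask(prompt) for _ in range(5)}
    # A deterministic model would leave exactly one element in this set;
    # in practice you often get several slightly different completions.
    print(len(outputs))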


The gold standard they're comparing against was done by humans though. And a task-specific model trained on that data will be better at that task than GPT-4.
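In the usual workflow that means fine-tuning something small on the gold labels, roughly like this (a generic sketch, not the paper's setup; the texts and labels are placeholders and the Trainer defaults are glossed over):

    from datasets import Dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    # Hypothetical gold-standard annotations (text + label pairs).
    gold = Dataset.from_dict({
        "text": ["example one ...", "example two ..."],
        "label": [0, 1],
    })

    tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    gold = gold.map(lambda x: tok(x["text"], truncation=True,
                                  padding="max_length", max_length=128),
                    batched=True)

    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="task_model", num_train_epochs=3),
        train_dataset=gold,
    )
    trainer.train()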

What's definitely true is that getting decent data often takes some care, especially in how you define the task. And Mechanical Turk is often especially tricky to use well.
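One concrete piece of that care: before trusting Turk labels, check how much the annotators even agree with each other (sketch using sklearn's Cohen's kappa; the labels are made up):

    from sklearn.metrics import cohen_kappa_score

    # Hypothetical labels from two Turk annotators on the same 10 items.
    annotator_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
    annotator_b = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

    # Kappa corrects raw agreement for chance; values much below ~0.6
    # usually mean the task definition or guidelines need work.
    print(cohen_kappa_score(annotator_a, annotator_b))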



