Yes, I'm aware of the various rankings. Try any of those models on a task that isn't commonly used in benchmarks, and you'll notice that a lot of the proprietary models have trouble actually producing statistically relevant results.
The only one I've come across that makes me think LLMs might be useful someday is DeepSeek R1 and the redistillations based on it.
I've seen HN's fascination with OpenAI's products, and I can't understand why. Even o1 and o3 are always too little, too late: somebody else is already doing something better and throwing it into a HF repo. Must be the Silicon Valley RDF at work.