Vibes, based on what I can remember using that model for. There's still a gpt-4 ...

MichaelZuo · on Dec 9, 2024

Why not ask it to solve math questions?

The bar for GPT-4 was so low that unambiguously clearing that threshold should be pretty easy.

simonw · on Dec 9, 2024

I am not particularly interested in those benchmarks that deliberately expose weaknesses in models: I know that models have weaknesses already!

What I care about is the things they've been proven to be good at: can I do those kinds of things (RAG, summarization, code generation, language translation) directly on my laptop?