It’s great that this can run on a laptop, but FWIW, the Llama 70B model is nowhere near “GPT-4 class” in my own use cases. 405B might be, though I haven’t tested it.
When I say GPT-4 class I'm talking about being comparable to the GPT-4 that was released in March 2023.
The Llama 3.3 70B model is clearly nowhere near as good as today's GPT-4o family of models, or the other top-ranking models today like Gemini 1.5 Pro and Claude 3.5 Sonnet.
To my surprise, Llama 3.3 70B is ranking higher than Claude 3 Opus on https://livebench.ai/ - I'm suspicious of that result, personally. I think Opus was the best available model for a few months earlier this year.
I guess it's because it has the highest instruction-following score of all models, 20 points higher than Opus, which compensates for shortcomings elsewhere (e.g. in language) and wouldn't necessarily translate to human evaluation of usefulness.
The model you are running isn't the one used in the benchmarks you link.
The default llama3.3 model in ollama is heavily quantized (~4 bit). Running the full fp16 model, or even an 8-bit quant, wouldn't be possible on your laptop with 64 GB of RAM.
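The arithmetic behind that claim, as a rough sketch (weights only; the KV cache and runtime overhead add more, and real quant formats like the ~4-bit ones carry slightly more than 4 bits per weight due to scaling metadata):

```python
# Back-of-envelope weight memory for a 70B-parameter model at
# different precisions. 1 GB taken as 1e9 bytes for simplicity.
PARAMS = 70e9

def weights_gb(bits_per_param: float) -> float:
    """Approximate memory needed for the weights alone, in GB."""
    return PARAMS * bits_per_param / 8 / 1e9

for name, bits in [("fp16", 16), ("8-bit", 8), ("~4-bit quant", 4.5)]:
    print(f"{name:>13}: ~{weights_gb(bits):.0f} GB")
# fp16 needs ~140 GB and an 8-bit quant ~70 GB - both over 64 GB -
# while a ~4-bit quant at ~39 GB fits in RAM with room for the OS.
```

So only the ~4-bit quant is the variant that can actually run in 64 GB, and that's what the default ollama tag serves.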
Vibes, based on what I can remember using that model for.
There's still a gpt-4 model available via the OpenAI API, but it's gpt-4-0613 from June 2023 - the March 2023 snapshot gpt-4-0314 is no longer available.
I'm not going to try for an extensive evaluation comparing it with Llama 3.3 though - life's too short, and that's already been done better than I could by https://livebench.ai/
I am not particularly interested in those benchmarks that deliberately expose weaknesses in models: I know that models have weaknesses already!
What I care about is the things that they're proven to be good at - can I do those kinds of things (RAG, summarization, code generation, language translation) directly on my laptop?