Looks like a cool paper. It's really puzzling to me why Llama turned out so underwhelming even though they're releasing great research. Especially considering the number of GPUs they have, Llama really seems inexcusable compared to models from much smaller teams with far fewer resources.
Llama will advance further just like the rest. LLM leaderboards are a constantly shifting thing. They'll all reach a maturity point and be roughly the same, probably within the next 1-3 years at most. Then it'll just be incremental drops in the cost to train and run them, while quality stays roughly comparable across the board. Not to mention we're already running out of training data.
It was always about more than GPUs: even when the original Llama came out, the community released fine-tunes that benched higher than the base model. And the DeepSeek distilled models showed you could fine-tune some reasoning into a base model and make it perform better (rough sketch below).
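For anyone curious what that looks like in practice, here's a minimal sketch of that distillation-style fine-tuning: train a small base model with a standard causal-LM loss on reasoning traces generated by a stronger teacher. The model name, the toy trace, and the hyperparameters are all placeholders, not what DeepSeek actually used.

```python
# Sketch of "distill reasoning into a base model": supervised fine-tuning
# of a small base model on teacher-generated chain-of-thought traces.
# Model name, data, and hyperparameters are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.2-1B"  # placeholder; any small base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.train()

# Hypothetical distilled data: (prompt, teacher's reasoning trace + answer)
traces = [
    ("What is 17 * 24?",
     "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408</think> 408"),
]

def encode(prompt, completion):
    # Standard causal-LM fine-tuning: mask the prompt tokens so the loss
    # is only computed on the teacher's reasoning and answer.
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + " " + completion, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # ignore prompt positions in the loss
    return full_ids, labels

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
for epoch in range(1):
    for prompt, completion in traces:
        input_ids, labels = encode(prompt, completion)
        loss = model(input_ids=input_ids, labels=labels).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

Nothing exotic: it's the same next-token objective as pretraining, just on data that happens to contain another model's reasoning, which is why small teams can pull it off cheaply.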