No, but getting better benchmark scores tends to require more shenanigans (e.g. mixture-of-experts).

Qwen2 72B doesn't score that high on the leaderboard relative to brute-forced finetunes: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_...