Is there a popular benchmark site people use? Because I had to test all these by... | Hacker News


poorman 10 months ago | on: Magistral — the first reasoning model by Mistral A...

Is there a popular benchmark site people use? Because I had to test all these by hand, and `Qwen3-30B-A3B` still seems like the best model I can run in that relative parameter space (and memory requirements).

arnaudsm 10 months ago

- https://livebench.ai/#/ + AIME + LiveCodeBench for reasoning

- MMLU-Pro for knowledge

- https://lmarena.ai/leaderboard for user preference

So far we only have Magistral's GPQA, AIME, and LiveCodeBench scores.
