
On LiveCodeBench it's scoring 26%

phi4 is 23%

deepseek r1 qwen distilled 32b is 27%

llama 3.3 70b is 29%, same as llama 4 scout

gpt 4o is 31%

gpt 4.1 is 45%

qwen3 32b reasoning 55%!! Expecting qwen3 coder 30b to be around here?

kimi k2 55%

claude 4 around 60%

qwen3 coder 480b 58%

nemotron 49b 74%!!

glm 4.5 358b 74%

exaone 4 32b reasoning 74%!!

deepseek r1 685b 75%

grok 4, o4-mini, gemini 2.5 pro: 80%


