
++, having played with all three, agreed


I'm curious: is there a standard benchmark anyone knows of that compares the "practical usefulness" of LLMs, instead of making them take some kind of useless IQ test?

E.g., how useful is this LLM for 1) code debugging, 2) (accurate) fact retrieval, 3) daily task planning?


Kagi did an evaluation a while back: https://blog.kagi.com/kagi-ai-search


Thanks! I love Kagi's ethos!



