I'm curious, is there a standard benchmark anyone knows of that compares the "practical usefulness" of LLMs instead of trying to make them take some kind of useless IQ test?
E.g. how useful is a given LLM for 1) code debugging, 2) (accurate) fact retrieval, 3) daily task planning?