I don't think you and he are in disagreement. I read it as him saying "evaluating LLMs is extremely difficult, and a big problem right now is that many people are treating them as basically human in capability."
It's the opposite of the problem with how computers were perceived in the 70s. Early computers were seen by some as too alien to be as useful as a person across most tasks; LLMs are seen by some as too human to not be as useful as a person across most tasks. Both views are wrong in surprisingly complex ways.
Not sure that holds when it's said in such generic terms. You need to define what "smarter" means. E.g. ChatGPT probably outperforms most people at math. Does that make it smarter than most people?