To test the intelligence of humans or AI, one needs a question whose answer hasn't been memorized (or, for an AI, already answered by someone in its training set).

Indeed, you can watch something like ChatGPT fail simply by asking it a modified version of a real IQ test question.

For example, ChatGPT correctly answers a sample Stanford-Binet question, "Counting from 1 to 100, how many 6s will you encounter?", but if you slightly modify it to ask about 7s instead, it counts only 19. The correct answer is 20 for either digit.
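A brute-force check makes the expected answer easy to verify. This is a minimal sketch; the helper name `count_digit` is hypothetical and only for illustration. It writes out every number from 1 to 100 and counts the digit character, so 66 and 77 each contribute two.

```python
def count_digit(digit: str, start: int = 1, stop: int = 100) -> int:
    """Count how many times a single digit character appears when
    writing out every integer from start to stop (inclusive)."""
    return sum(str(n).count(digit) for n in range(start, stop + 1))


print(count_digit("6"))  # 20
print(count_digit("7"))  # 20
```

Both digits appear ten times in the units place and ten times in the tens place, which is where the 20 comes from.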

Having written this out, however, I've now invalidated the question, since these models are trained on web crawls.