> show humans are capable of what you just defined as generalizable reasoning.
I would also add "and plot those capabilities on a curve". My intuition is that SotA models already exceed median human ability in many areas.
In the context of this paper, I think "generalizable reasoning" means finding a method to solve the puzzle and then being able to execute that method on puzzle instances of arbitrary complexity.
> Are these models capable of generalizable reasoning, or are they leveraging different forms of pattern matching?
Define reasoning, define generalizable, define pattern matching.
For extra credit, once you have done so, show humans are capable of what you just defined as generalizable reasoning.