You could easily achieve perfect accuracy on the test set by hardcoding the entire test set into your "model", so that the whole model amounts to: "1. See image, 2. Look up image in test set, 3. Read off answer".
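To make the lookup trick concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the `LookupModel` class, the `accuracy` helper, and the toy byte strings standing in for images are not from any real benchmark or library.

```python
import hashlib


class LookupModel:
    """A 'model' that memorizes test-set answers (hypothetical illustration)."""

    def __init__(self, test_set, default_label=None):
        # Hardcode every (image, label) pair, keyed by a hash of the image bytes.
        self.table = {self._key(x): y for x, y in test_set}
        self.default_label = default_label

    @staticmethod
    def _key(image_bytes):
        return hashlib.sha256(image_bytes).hexdigest()

    def predict(self, image_bytes):
        # 1. See image, 2. look it up in the test set, 3. read off the answer.
        return self.table.get(self._key(image_bytes), self.default_label)


def accuracy(model, dataset):
    """Fraction of examples in dataset that the model labels correctly."""
    correct = sum(model.predict(x) == y for x, y in dataset)
    return correct / len(dataset)


# Toy stand-ins for images; real data would be pixel arrays or file contents.
test_set = [(b"image-0", "cat"), (b"image-1", "dog"), (b"image-2", "bird")]
unseen_set = [(b"image-3", "cat"), (b"image-4", "dog")]

model = LookupModel(test_set)
print(accuracy(model, test_set))    # 1.0: perfect on the memorized test set
print(accuracy(model, unseen_set))  # 0.0: nothing generalizes to new images
```

The gap between the two numbers is the point: the score on the memorized set says nothing about how the "model" behaves on data it has not seen.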
It would be interesting to see whether someone could sneakily (Sokal-style) publish a paper like the following: "We took (popular model X) and augmented it with an additional lexicon of specific lookup data, and the result blows away all the competition. This is deeply profound and implies that built-in lexicons could be the key to true general intelligence!" (When in fact all they did was hard-code the test set, or part of it, into their model.) Then see how many popular press outlets churn out sensational articles.