There's a concept in machine learning called overfitting.
Trying to do well on a test dataset by cheating and seeing it first...
I think the benchmark should choose its tests randomly from a large pool of tests and report the expected performance over a number of such random runs, so that no one can cheat...
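Something along these lines (a rough TypeScript sketch; the Test type and its run() scores are made up for illustration):

```typescript
// Rough sketch: draw a random subset of tests from a large pool each run and
// average the per-run scores, so engines can't tune for a fixed test list.
type Test = { name: string; run: () => number };

function sampleWithoutReplacement<T>(pool: T[], k: number): T[] {
  const copy = [...pool];
  const picked: T[] = [];
  for (let i = 0; i < k && copy.length > 0; i++) {
    const j = Math.floor(Math.random() * copy.length);
    picked.push(copy.splice(j, 1)[0]);
  }
  return picked;
}

function expectedScore(pool: Test[], testsPerRun: number, runs: number): number {
  let total = 0;
  for (let r = 0; r < runs; r++) {
    const chosen = sampleWithoutReplacement(pool, testsPerRun);
    // Mean score of the randomly chosen tests for this run.
    total += chosen.reduce((sum, t) => sum + t.run(), 0) / chosen.length;
  }
  return total / runs; // expected performance over many random selections
}
```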
Over-fitting and peeking at the test set are completely different things. Over-fitting may in fact degrade performance on a test set, because it means you are giving too much weight to idiosyncratic patterns in the training data. Peeking at the test data, however, is right out, and should invalidate any results you try to report.
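To make the distinction concrete, here is a toy sketch (the memorizing "model" and the tiny datasets are invented for illustration): a model that just memorizes its training data scores perfectly on that data but poorly on a held-out test set; that is over-fitting. Peeking is the separate sin of tuning against the test set itself.

```typescript
// Toy illustration: a "model" that memorizes its training examples.
type Example = { input: string; label: number };

// Memorizing classifier: exact-match lookup on training inputs, otherwise guess 0.
function makeMemorizer(train: Example[]) {
  const table = new Map(train.map(e => [e.input, e.label] as const));
  return (input: string): number => table.get(input) ?? 0;
}

function accuracy(model: (s: string) => number, data: Example[]): number {
  return data.filter(e => model(e.input) === e.label).length / data.length;
}

const train: Example[] = [
  { input: "aa", label: 1 },
  { input: "ab", label: 0 },
  { input: "ba", label: 1 },
];
const test: Example[] = [
  { input: "bb", label: 0 },
  { input: "cc", label: 1 },
];

const model = makeMemorizer(train);
console.log(accuracy(model, train)); // 1.0: perfect on the data it memorized
console.log(accuracy(model, test));  // 0.5: over-fitting hurts held-out performance
// Peeking would mean adjusting the model while looking at `test` itself,
// which makes any score reported on `test` meaningless.
```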
If I understand you correctly, what you are suggesting is that one way to improve dead-code analysis would be to start with known dead code and compare the results of the dead-code analysis algorithm to the results achieved by "cheating."
Given that SunSpider contains known dead code and that using it is easier than writing a new dead-code benchmark, your explanation seems somewhat plausible (assuming I am understanding you correctly).
Edit: As a general case, there would seem to be a legitimate rationale for recognizing standard JavaScript snippets and loading pre-compiled routines to improve execution.
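A minimal sketch of what that could look like (hypothetical; not how any real engine does it): hash the source text of an incoming snippet and, if it matches a known standard snippet, dispatch to a pre-compiled routine instead of compiling it again.

```typescript
import { createHash } from "node:crypto";

// Hypothetical lookup table of pre-compiled routines keyed by a hash of the
// snippet's source text. None of these names come from a real engine.
type NativeRoutine = (...args: unknown[]) => unknown;
const precompiled = new Map<string, NativeRoutine>();

function sourceHash(src: string): string {
  return createHash("sha256").update(src).digest("hex");
}

// Register a known standard snippet together with its pre-compiled routine.
function registerSnippet(src: string, routine: NativeRoutine): void {
  precompiled.set(sourceHash(src), routine);
}

// On compile, check the table first and fall back to normal compilation.
function compile(src: string, compileNormally: (src: string) => NativeRoutine): NativeRoutine {
  return precompiled.get(sourceHash(src)) ?? compileNormally(src);
}
```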