Generally speaking, no. An important goal of most benchmarks in ML research is to measure generalization: it's often much easier to get a machine learning model to memorize a benchmark's test cases than to train it to actually perform the general capability the benchmark is trying to test for. For that reason, the training dataset matters; if it includes the benchmark's test cases in some form (usually called test-set contamination), it invalidates the test.
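To make the contamination point concrete, here's a minimal sketch of the kind of n-gram overlap check labs have used to scan training data for benchmark test cases. The function names, the 13-gram threshold, and the naive whitespace tokenization are all illustrative assumptions, not any particular lab's actual pipeline:

```python
from typing import Set, Tuple

def ngrams(text: str, n: int = 13) -> Set[Tuple[str, ...]]:
    """Return the set of word-level n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_contaminated(train_doc: str, test_case: str, n: int = 13) -> bool:
    """Flag a training document that shares any n-gram with a test case."""
    return bool(ngrams(train_doc, n) & ngrams(test_case, n))

# Usage: scan a training corpus against a benchmark's test set
# (placeholder data; a real scan would stream documents from disk).
train_corpus = ["...training documents..."]
test_set = ["...benchmark test cases..."]
flagged = [
    (i, j)
    for i, doc in enumerate(train_corpus)
    for j, case in enumerate(test_set)
    if is_contaminated(doc, case)
]
```

Any hit means the benchmark score can no longer distinguish memorization from capability, which is exactly the invalidation described above. But a check like this only works if you can see the training data in the first place.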
When AI research was still mostly academic, I'm sure plenty of people still cheated, but there was somewhat less incentive to, and norms like publishing datasets made it easier to verify the claims in research papers. In a world where people don't publish their datasets, and there's a significant financial incentive to lie, I just kind of assume they're lying.