Not to mention the usual problems with Phoronix benchmarks: it doesn't say how many benchmark runs were done, where the error bars are, whether the software was actually compiled properly, etc. The Phoronix folks don't understand what they are measuring, and they don't really care either: I remember one of their benchmarks measured the execution time of a command that was erroring out.
Phoronix Test Suite isn't a benchmark, it's a marketing tool.
Every result graph there has the error indicated. If there are any significant errors then bars are shown. You can see this on the very first result page for LeelaChessZero.
It also shows you the number of runs. It also shows you the compile options used. All this info is included in every graph.
The complete system setups are described. The test suite is also open source.
I remember a recent "gaming on linux" article from them where they were computing "summary" geomeans including benchmarks across different resolutions... from the same game. So you might have:
* SOTTR 1080p
* SOTTR 1440p
* SOTTR 4K
* F1 1080p
...
And this wasn't like they had a 1080p geomean and then a 1440p geomean and a 4K geomean... they just had one geomean with a bunch of different resolutions thrown into it, including duplicates of the same game at different resolutions. And sometimes different combinations of resolutions for different games (they might skip 4K for a particular game, etc).
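To make the complaint concrete, here's a rough sketch of the difference between one pooled geomean and per-resolution geomeans. The FPS numbers are made up purely for illustration (they are not from any Phoronix article), and the names `geomean`, `results`, `pooled`, `per_resolution` and `weights` are just ones I picked for the example.

```python
import math
from collections import defaultdict


def geomean(values):
    """Geometric mean of positive numbers, computed via logs."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical FPS results, invented for illustration only.
results = {
    ("SOTTR", "1080p"): 120.0,
    ("SOTTR", "1440p"): 90.0,
    ("SOTTR", "4K"): 50.0,
    ("F1", "1080p"): 200.0,
}

# One pooled geomean over every (game, resolution) entry.
pooled = geomean(results.values())

# Separate geomean per resolution, so each summary compares like with like.
by_resolution = defaultdict(list)
for (game, resolution), fps in results.items():
    by_resolution[resolution].append(fps)
per_resolution = {res: geomean(v) for res, v in by_resolution.items()}

# Share of the pooled geomean's exponent that each game gets:
# a game tested at three resolutions counts three times.
counts = defaultdict(int)
for (game, _resolution) in results:
    counts[game] += 1
weights = {game: n / len(results) for game, n in counts.items()}

print("pooled:", round(pooled, 2))
print("per resolution:", {r: round(v, 2) for r, v in per_resolution.items()})
print("implicit game weights:", weights)
```

In the pooled version the game tested at three resolutions gets three quarters of the weight and the other game gets one quarter, simply because of how many resolutions happened to be run for each.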
That's pleb-tier benchmarking. Pick a random redditor and they'll know not to make that kind of mistake; it's obviously and facially incorrect.
It just goes to show the power of community goodwill... UserBenchmark's actual sub-scores are reasonably accurate, but because the owner is a massive fucking twat the site is persona non grata in the internet community (I'm sure I'm going to be regaled with NO THEIR BENCHMARKS ARE TRASH AND HE CHANGES THINGS TO MAKE INTEL but nope, the subscores are accurate, topline "summary" score weights are what he fucks with). Michael Larabel is a very nice guy and frankly doesn't seem to understand the first thing about benchmarking, or score weighting, or mathematics, and constantly puts out trash-tier results with obvious defects, and he's revered in the community, basically a saint.
I know, nobody else is really benchmarking Linux and he's what we've got, if you don't like it then be the change, etc. But his results are given weight that's incredibly disproportionate to their quality; he's no AnandTech. And sadly AnandTech is no AnandTech anymore.
Games are benchmarked at different resolutions for the same game because that shifts the CPU/GPU burden. It's a great thing to do, many benchmarks do it too, and yes, they put the results in the same average.
If you don't include a game at those resolutions for no good reason that's one thing, but varying the resolution in the same mean is a good idea.