Geometric mean of (time + gzipped source code size in bytes) seems statistically wrong.
What if you expressed time in nanoseconds? Or source code size in megabytes? The rankings could change. The culprit is the '+'.
I would think the geometric mean of (time x gzipped source code size) is the correct way to compare languages. It would not matter what units the time or size are in, in that case.
[Here the geometric mean is the geometric mean of (time x gzipped size) of all benchmark programs of a particular language.]
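A quick way to see the difference (a Python sketch with toy numbers, not real benchmark data): with '+' a change of units can flip the ranking, while with 'x' it cannot, because rescaling time multiplies every language's product score by the same constant.

    import math

    # (time in seconds, gzipped size in bytes) per benchmark -- made-up values
    lang_a = [(2.0, 500), (1.0, 900)]
    lang_b = [(1.5, 800), (1.4, 600)]

    def geomean(xs):
        return math.exp(sum(math.log(x) for x in xs) / len(xs))

    def score_sum(benchmarks, time_scale=1.0):
        return geomean([t * time_scale + s for t, s in benchmarks])

    def score_product(benchmarks, time_scale=1.0):
        return geomean([t * time_scale * s for t, s in benchmarks])

    # seconds vs. milliseconds: the '+' ranking flips (True, then False)
    print(score_sum(lang_a) < score_sum(lang_b),
          score_sum(lang_a, 1000) < score_sum(lang_b, 1000))
    # the 'x' ranking is unchanged (True, then True)
    print(score_product(lang_a) < score_product(lang_b),
          score_product(lang_a, 1000) < score_product(lang_b, 1000))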
I think the summed numbers might be unitless. At least all the other numbers are relative to the fastest/smallest entry. That is, what would make sense is score(x) = time(x) / time(fastest) + size(x) / size(smallest) instead of score(x) = (time(x) + size(x)) / score(best)
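A minimal sketch of that normalised score (made-up numbers): because each term is a ratio of like quantities, the result is unitless, and converting every time to nanoseconds or every size to megabytes leaves the scores unchanged.

    # hypothetical entries: name -> (time in seconds, gzipped size in bytes)
    entries = {"x": (2.0, 900), "y": (1.5, 1200), "z": (3.0, 750)}

    fastest = min(t for t, _ in entries.values())
    smallest = min(s for _, s in entries.values())

    # score(x) = time(x) / time(fastest) + size(x) / size(smallest)
    scores = {name: t / fastest + s / smallest for name, (t, s) in entries.items()}
    print(scores)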
It's not necessarily wrong to add disparate units like this. It's implicitly weighting one unit relative to the other. Changing to nanoseconds just gives more weight to the time metric in the unified benchmark. You could instead explicitly weight them without changing units; if you cared more about size, you could add a multiplier to it.
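For example (illustrative numbers only), switching units is just an implicit re-weighting, and you can make the weight explicit instead:

    time_s, size_b = 2.0, 500          # 2 seconds, 500 gzipped bytes
    print(time_s + size_b)             # seconds weighted 1:1 against bytes
    print(time_s * 1e9 + size_b)       # nanoseconds: time now dominates the sum
    print(time_s + 0.01 * size_b)      # explicit choice: 1 byte "costs" 0.01 s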
You really don't know what the right weight is to balance time and gzipped size. Multiplying them together sidesteps the whole issue and puts time and size on a par with each other, regardless of the individual unit scaling.
The whole point of benchmarks is to protect against accidental bias in your calculations. Adding them seems totally against my intuition. If you did want to give time more weight, I would raise it to some power. Example: the geometric mean of (time x time x source size) would give time much more importance, in an arguably more principled way.
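A rough sketch of that suggestion (Python, made-up numbers): repeating time inside the product, i.e. raising it to a power, gives it more weight while the ranking stays unit-independent, since rescaling time still multiplies every language's score by the same constant.

    import math

    def geomean(xs):
        return math.exp(sum(math.log(x) for x in xs) / len(xs))

    def score(benchmarks, time_exponent=2):
        # geometric mean of time**k * size over all benchmark programs
        return geomean([t ** time_exponent * s for t, s in benchmarks])

    lang = [(2.0, 500), (1.0, 900)]    # (seconds, gzipped bytes) per benchmark
    print(score(lang))                 # time counts "twice" relative to size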
Multiplying them is another way of expressing them as a unified value. It's not a question of accidental bias; you're explicitly choosing how important one second is compared to one byte.
You could imagine there's a 1 sec/byte multiplier on the bytes value, saying in effect "for every byte of gzipped source, penalise the benchmark by one second".
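Worked through with toy numbers, that dimensional reading looks like this; switching the time to nanoseconds silently changes the implicit conversion factor.

    time_s, size_b = 2.0, 500
    print(time_s + 1.0 * size_b)        # 1 s per byte: 2 s + 500 s = 502 "seconds"
    print(time_s * 1e9 + 1.0 * size_b)  # in ns, the same data implies 1 ns per byte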
> You could imagine there's a 1 sec/byte multiplier on the bytes value, saying in effect "for every byte of gzipped source, penalise the benchmark by one second".
Your explanation makes sense. However, the main issue is that we don't know whether this "penalty" is fair or correct or has some justifiable basis. In the absence of any explanation, it would make more sense to multiply them together as a "sane default". Later, having done some research, we could attach some weighting, perhaps appealing to physical laws or information theory. Even then, I doubt that '+' would be the operator I would use to combine them.
That paper is only about the reasoning behind taking the geometric mean; it doesn't have anything to say about the "time + gzipped source code size in bytes" part.