Geometric mean of (time + gzipped source code size in bytes) seems statistically wrong.
What if you expressed time in nanoseconds? Or source code size in megabytes? The rankings could change. The culprit is the '+'.
I would think the geometric mean of (time x gzipped source code size) is the correct way to compare languages. It would not matter what units the time or size are in, in that case.
[Here the geometric mean is the geometric mean of (time x gzipped size) of all benchmark programs of a particular language.]
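A quick way to see the difference (a Python sketch with toy numbers, not real benchmark data): with '+' a change of units can flip the ranking, while with 'x' it cannot, because rescaling time multiplies every language's product score by the same constant.

    import math

    # (time in seconds, gzipped size in bytes) per benchmark -- made-up values
    lang_a = [(2.0, 500), (1.0, 900)]
    lang_b = [(1.5, 800), (1.4, 600)]

    def geomean(xs):
        return math.exp(sum(math.log(x) for x in xs) / len(xs))

    def score_sum(benchmarks, time_scale=1.0):
        return geomean([t * time_scale + s for t, s in benchmarks])

    def score_product(benchmarks, time_scale=1.0):
        return geomean([t * time_scale * s for t, s in benchmarks])

    # seconds vs. milliseconds: the '+' ranking flips (True, then False)
    print(score_sum(lang_a) < score_sum(lang_b),
          score_sum(lang_a, 1000) < score_sum(lang_b, 1000))
    # the 'x' ranking is unchanged (True, then True)
    print(score_product(lang_a) < score_product(lang_b),
          score_product(lang_a, 1000) < score_product(lang_b, 1000))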
I think the summed numbers might be unitless. At least all the other numbers are relative to the fastest/smallest entry. That is, what would make sense is score(x) = time(x) / time(fastest) + size(x) / size(smallest) instead of score(x) = (time(x) + size(x)) / score(best)
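A minimal sketch of that normalised score (made-up numbers): because each term is a ratio of like quantities, the result is unitless, and converting every time to nanoseconds or every size to megabytes leaves the scores unchanged.

    # hypothetical entries: name -> (time in seconds, gzipped size in bytes)
    entries = {"x": (2.0, 900), "y": (1.5, 1200), "z": (3.0, 750)}

    fastest = min(t for t, _ in entries.values())
    smallest = min(s for _, s in entries.values())

    # score(x) = time(x) / time(fastest) + size(x) / size(smallest)
    scores = {name: t / fastest + s / smallest for name, (t, s) in entries.items()}
    print(scores)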
It's not necessarily wrong to add disparate units like this. It's implicitly weighting one unit relative to the other. Changing to nanoseconds just gives more weight to the time metric in the unified benchmark. You could instead explicitly weight them without changing units; if you cared more about size, you could add a multiplier to it.
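For example (illustrative numbers only), switching units is just an implicit re-weighting, and you can make the weight explicit instead:

    time_s, size_b = 2.0, 500          # 2 seconds, 500 gzipped bytes
    print(time_s + size_b)             # seconds weighted 1:1 against bytes
    print(time_s * 1e9 + size_b)       # nanoseconds: time now dominates the sum
    print(time_s + 0.01 * size_b)      # explicit choice: 1 byte "costs" 0.01 s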
You really don't know what the right weight is to balance time and gzipped size. Multiplying them together sidesteps the whole issue and puts time and size on a par with each other, regardless of the individual unit scaling.
The whole point of benchmarks is to protect against accidental bias in your calculations. Adding them seems totally against my intuition. If you did want to give time more weight, I would raise it to some power. Example: the geometric mean of (time x time x source size) would give time much more importance, in an arguably more principled way.
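A rough sketch of that suggestion (Python, made-up numbers): repeating time inside the product, i.e. raising it to a power, gives it more weight while the ranking stays unit-independent, since rescaling time still multiplies every language's score by the same constant.

    import math

    def geomean(xs):
        return math.exp(sum(math.log(x) for x in xs) / len(xs))

    def score(benchmarks, time_exponent=2):
        # geometric mean of time**k * size over all benchmark programs
        return geomean([t ** time_exponent * s for t, s in benchmarks])

    lang = [(2.0, 500), (1.0, 900)]    # (seconds, gzipped bytes) per benchmark
    print(score(lang))                 # time counts "twice" relative to size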
Multiplying them is another way of expressing them as a unified value. It's not a question of accidental bias; you're explicitly choosing how important one second is compared to one byte.
You could imagine there's a 1 sec/byte multiplier on the bytes value, saying in effect "for every byte of gzipped source, penalise the benchmark by one second".
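Worked through with toy numbers, that dimensional reading looks like this; switching the time to nanoseconds silently changes the implicit conversion factor.

    time_s, size_b = 2.0, 500
    print(time_s + 1.0 * size_b)        # 1 s per byte: 2 s + 500 s = 502 "seconds"
    print(time_s * 1e9 + 1.0 * size_b)  # in ns, the same data implies 1 ns per byte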
> You could imagine there's a 1 sec/byte multiplier on the bytes value, saying in effect "for every byte of gzipped source, penalise the benchmark by one second".
Your explanation makes sense. However, the main issue is that we don't know whether this "penalty" is fair or correct or has some justifiable basis. In the absence of any explanation, it would make more sense to multiply them together as a "sane default". Later, having done some research, we could attach some weighting, perhaps appealing to physical laws or information theory. Even then, I doubt that '+' would be the operator I would use to combine them.
That paper is only about the reasoning behind taking the geometric mean; it doesn't have anything to say about the "time + gzipped source code size in bytes" part.