Hacker News
Computer scientists need to learn about significant digits (lemire.me)
14 points by joeyespo on April 21, 2012 | 5 comments



I thought that this article was about having a number type that supports significant figures and propagates them through all calculations appropriately. That would be pretty useful. Instead it seems to be a complaint about benchmarks, which is valid, but not obviously aimed at all "computer scientists" so much as at people who run micro-benchmarks and report results without care.
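For concreteness, here is a minimal sketch of the kind of type I had in mind (the SigFig class and the propagate-the-fewer-figures rule for multiplication are my own illustration, not anything from the article):

    # Hypothetical SigFig type: carries a value plus how many of its digits
    # are significant, and propagates the smaller count through multiplication.
    class SigFig
      attr_reader :value, :figs

      def initialize(value, figs)
        @value = value
        @figs  = figs
      end

      def *(other)
        SigFig.new(value * other.value, [figs, other.figs].min)
      end

      def to_s
        format("%.#{figs}g", value)   # round to the tracked significant figures
      end
    end

    puts SigFig.new(300.14, 5) * SigFig.new(2.14, 3)   # prints 642, not 642.2996

Multiplication and division keep the fewer significant figures; addition and subtraction would need a rule based on decimal places instead, which is part of why a real implementation gets fiddly.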


Not sure what the point here is; this is all relative.

Sure, .14 MB isn't a big deal, but if you're uploading 2.14 TB of data, I need to know about that .14 TB.


I think he's trying to explain that the .14 on 300.14 is negligible, whereas on 2.14 it might mean a lot.

However, I do not fully agree with his statement either. If something takes 300.14 ms, the .14 is a pretty useless detail on its own. But if I run that operation in a loop of 1 million, it suddenly adds up: 0.14 ms per iteration is 140 seconds over a million iterations. Knowing it was 300.14 rather than 300 was definitely worthwhile.

I guess it's all about context.


It's not a question of whether that .14 is negligible. His point is that it's literally meaningless.

Say you're benchmarking some routine. One particular measurement may be 300.14 ms. The next may be, say, 302.56 ms. The one after that, 297.12 ms. You have no way of knowing whether that .14 ms is due to the implementation of the routine, or whether it's simply noise in the measurement.
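In other words, the honest move is to repeat the measurement and look at the spread before trusting any trailing digits. A rough sketch (routine_under_test here is just a stand-in, not any particular code from the article):

    require 'benchmark'

    def routine_under_test
      100_000.times { Math.sqrt(rand) }   # stand-in for whatever you're measuring
    end

    times_ms = 10.times.map { Benchmark.realtime { routine_under_test } * 1000 }

    mean   = times_ms.sum / times_ms.size
    spread = times_ms.max - times_ms.min

    puts times_ms.map { |t| t.round(2) }.join(', ')
    puts "mean #{mean.round(2)} ms, spread #{spread.round(2)} ms"

If the run-to-run spread comes out bigger than 0.14 ms, that trailing .14 tells you nothing about the routine itself.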


It's a shame you were voted down. The brevity of the blog post causes it to fly right over the head of the target audience. There is a difference between significant digits in measurement, and significant digits in analysis.

First, a quick definition from Wikipedia [1]: the significant figures of a number are the digits that carry meaning contributing to its precision, excluding leading and trailing zeros which are merely placeholders to indicate the scale of the number.

In measurement, significant digits indicate the precision of the measurement. This eliminates ambiguity. For example, if I were to write:

> The prongs should be spaced 1.000" on their centers.

This means that the level of precision should be to the thousandths of an inch. If I were to write:

> The prongs should be spaced 1.0" on their centers.

This means the level of precision should be to the tenth of an inch.

The former might be used in engineering something like a rocket booster control, while the latter might be used in something like the woodworking of rudimentary farm implements (like an ox yoke).

The author's lament is that computer scientists -- although I'm not sure who exactly he's talking about here -- don't pay attention to the meaning of the significant digits in use.

Let's say I (not a computer scientist) were to publish benchmarks that said:

> Under Ruby 1.9.3-p125, the method DateTime#parse takes 3.098193 seconds to parse the string "2011-05-25 00:00:03 -0400" at 100,000 iterations.

Because I used six digits to the right of the decimal point, I have implied that my level of precision is six digits. Put another way, I'm claiming that my benchmark is accurate to six decimal places. This is obviously not true. The value varies each time I run it, and my system isn't nearly controlled enough (I have lots of other processes running) to express this level of precision.
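To make that concrete, here is one way to report it more honestly (a sketch using only the stdlib benchmark and date libraries; the rounding rule is my own shorthand for "quote no digits beyond the first uncertain one"):

    require 'date'
    require 'benchmark'

    # Repeat the whole 100,000-iteration run several times.
    runs = 5.times.map do
      Benchmark.realtime do
        100_000.times { DateTime.parse("2011-05-25 00:00:03 -0400") }
      end
    end

    mean = runs.sum / runs.size
    std  = Math.sqrt(runs.map { |r| (r - mean)**2 }.sum / (runs.size - 1))

    # Quote only down to the first uncertain digit: if std is about 0.05 s,
    # report "3.10 +/- 0.05 seconds" rather than "3.098193 seconds".
    digits = [-Math.log10(std).floor, 0].max
    puts format("%.#{digits}f +/- %.#{digits}f seconds", mean, std)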

Most hackers just make this assumption implicitly: because of our familiarity with the subject matter, we know there is variation in benchmark runtime from execution to execution. Many hackers also lack formal engineering training, so we don't see the precision implied by significant digits. It is a useful tool for communication, however. I plan to pay more attention to it.



