
An interesting article describing a useful framework.

In addition to issues with the largest numbers that floating point can handle, a related issue can come up with numbers that are nowhere near that edge.

A good student of floating point numbers will notice that summing a list of floating point numbers in a different order has a good chance of giving a different result. This is due to the rounding error that can occur when you add a large number to a much smaller one. If you add up all the small numbers first, the intermediate sum can grow large enough that it is not simply lost to rounding when added to the next, larger number on the list. Thus, you should sort an array of floating point numbers and begin the summation with the smallest.
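A minimal sketch of the effect in Julia (Float64 throughout; foldl(+, ...) forces a strict left-to-right sum, since Julia's built-in sum uses pairwise summation):

  julia> big = 2.0^53;    # adjacent Float64 values are 2.0 apart at this magnitude

  julia> xs = [big; fill(1.0, 10_000)];

  julia> foldl(+, xs) == big                  # largest first: each 1.0 is absorbed by rounding
  true

  julia> foldl(+, sort(xs)) == big + 10_000   # smallest first: the ones accumulate before meeting big
  true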

Some careful thought is needed if numerical accuracy is important to your computation. While the author of this article makes important points about one aspect of this, they seem to shrug off further study of the referenced 30-page paper at https://hal.archives-ouvertes.fr/file/index/docid/576641/fil....

The authors of that paper, however, seem to misstate one vital point. They note that the mathematical formulas are "unstable", but when it comes to working with floating point numbers, it isn't the mathematics that is unstable; it is the failure to appreciate how thoroughly modern floating point numbers are an approximation. Methods that fail to take this into account will be unstable.



> Thus, you should sort an array of floating point numbers and begin the summation with the smallest

Is this method more accurate than Kahan summation? And if so, is it worth the extra cost of sorting?


No, in fact it's slower than Kahan summation and less accurate. In particular, sorting doesn't help if you sum up a large number of similarly small values.


How is it less accurate?


Using avian's example of a list of numbers of similar magnitude, sorting the numbers will not really be any better than just summing them directly, but Kahan summation will do better.


>> In particular, sorting doesn't help if you sum up a large number of similarly small values.

Elaborating based entirely on my reading of the wikipedia article:

Kahan summation manually allots some extra bits to calculate with. You get extra precision because you compute with more bits of precision. Conceptually, you can do the same thing with integers:

Imagine I'm adding up a bunch of unsigned bytes. Ordinary addition will give me the lowest 8 bits of the sum, which is pretty much worthless. But in the analogue of Kahan summation, I can do better: I allocate 16 bits to hold my running result, and compute an accurate sum of, say, 0x02B7. Then I zero the low-order bits until the high-order bits fit in a byte: the sum after precision loss is 0x02B4. Then I report the sum: it's 0xAD (= 0x02B4 >> 2) with a left shift of 2. This is much better than the ordinary result of 0xB7.

(In that example, I needed to return an extra value, the left shift of 2. But floating-point numbers already include an indication of their order of magnitude, which integers don't, so in Kahan summation the extra return value is not necessary.)
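Here is that integer analogue as a few lines of Julia, reproducing the numbers above (the variable names are just for illustration):

  total = 0x02B7                                   # exact 16-bit running sum from the example above
  shift = max(0, 16 - leading_zeros(total) - 8)    # count of low-order bits that will not fit in a byte
  significand = UInt8(total >> shift)              # 0xad, reported together with shift == 2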

So the quick answer to "why is Kahan summation more accurate" is that it's more accurate because you used more accuracy. Sort-and-add doesn't give you any more precision than usual, but it will lower your accumulated error.

The root issue that both these strategies attempt to mitigate is that when you add a large value to a small value, you end up losing precision from the small value. Sort-and-add will address that problem as to any two elements of the input -- if you have a list of values of which some are very large and some are very small, sort-and-add will sum up all of the small numbers into one large number before adding that sum to another large number. This reduces the number of times you would add a large number to a small number, compared to summing the input in a random order.

Kahan summation means you do all your addition at a higher level of precision. As such, it also addresses the (large+small) problem that occurs when an element of the input is small in comparison to the sum of all the input values, as opposed to being small in comparison to the maximum individual input value.
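For concreteness, a minimal sketch of the compensated (Kahan) loop in Julia, along the lines of the pseudocode in the wikipedia article (the function name is just for illustration):

  function kahan_sum(xs)
      s = 0.0              # running sum
      c = 0.0              # compensation: the low-order bits lost from s so far
      for x in xs
          y = x - c        # fold the previously lost bits back into the next term
          t = s + y
          c = (t - s) - y  # what rounding just discarded
          s = t
      end
      return s
  end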

If your input consists of a moderate number of small values and a few large numbers, such that the sum of the input is well approximated by each of the large numbers within the input, sort-and-add should work fine.

If your input contains a large number of small values, sort-and-add will fail: by the time your running sum has become large compared to each element of the input, every further addition will suffer from loss of precision -- you are adding a small number (the next input element) to a large one (the running sum).

If your input contains many large numbers, sort-and-add will fail for exactly the same reason it fails on many small numbers: each element individually is small compared to the total.
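To make the many-similarly-small-values case concrete (a sketch using the kahan_sum above; exact error magnitudes will vary):

  xs  = fill(0.1, 10^7)          # ten million values of the same small magnitude
  ref = 1e7 * 0.1                # a single correctly rounded product, within half an ulp of the true sum
  abs(foldl(+, xs)       - ref)  # naive left-to-right: error grows as the running sum grows
  abs(foldl(+, sort(xs)) - ref)  # sorting changes nothing here: every term is the same size
  abs(kahan_sum(xs)      - ref)  # compensated: typically within a few ulps of ref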


Thanks for the thoughtful reply.

But the Kahan method aside, the idea would be to start with the smallest numbers, not the largest.


Yes, I understand that.

>> if you have a list of values of which some are very large and some are very small, sort-and-add will sum up all of the small numbers into one large number before adding that sum to another large number

I don't quite follow the point you're making -- can you elaborate?


Intuitively:

1e100 + 1e-100 = 1e100

Because of rounding error.

(This is true for some sufficiently large and small exponent.)


You don't need to go that far:

  julia> 1e16 + 1 == 1e16
  true

  julia> 1e19 + 1000 == 1e19
  true


So what? That's true regardless of whether you add small numbers in before or after the big one. If you have a lot of small numbers that add to 1e-2, and a couple of big numbers that add to 1e+200, it is totally irrelevant what order you add the numbers in, because all of the small numbers together have exactly zero influence on the sum.

But we're talking about cases where the sum of the small numbers is large enough to be detectable when measured against the large numbers, even if no individual small number is that large.


A solution more accurate than sorting before adding is to place your numbers into a priority queue and repeatedly add the smallest two numbers and re-insert the result into the queue. This helps handle the case where you have many similarly valued numbers and your running sum becomes large enough relative to your numbers to cause the same rounding errors.
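A sketch of that approach, assuming the third-party DataStructures.jl package for the min-heap (the function name is made up):

  using DataStructures                 # assumed dependency: provides BinaryMinHeap

  function smallest_pairs_sum(xs)
      isempty(xs) && return 0.0
      h = BinaryMinHeap(collect(Float64, xs))
      while length(h) > 1
          a = pop!(h)                  # remove the two smallest remaining values...
          b = pop!(h)
          push!(h, a + b)              # ...and put their sum back into the queue
      end
      return pop!(h)
  end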


Isn't a priority queue implicitly sorted?


Yes. I don't think your parent post meant to imply otherwise.

The priority queue approach boils down to "sort after every addition, instead of just once at the beginning".


Ah right, I read 'more accurate than sorting before adding' as 'without sorting before adding' and missed that it was more about rounding errors than the sorting.


It's amazing how many floating point edge cases emerge from mid-range values. Ordering can also help minimize the accumulation of rounding errors for similar reasons to those you pointed out.



