Hacker News
Mean of two floating point numbers can be dangerous (honzabrabec.cz)
79 points by Scea91 on June 5, 2016 | 45 comments



This behavior can actually be turned around and exploited to do some calculations to full precision, using algorithms that look like they would infinite loop: http://www.shapeoperator.com/2014/02/22/bisecting-floats/
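
Roughly, the trick (my reading of the linked post, sketched below rather than its exact code) is that bisection over doubles terminates on its own, because eventually the computed midpoint equals one of the endpoints:

  #include <cstdio>

  // Sketch of the idea, assuming f(lo) < 0 < f(hi): once lo and hi are
  // adjacent doubles, the midpoint rounds to one of them and we stop.
  double bisect(double lo, double hi, double (*f)(double)) {
      while (true) {
          double mid = lo + (hi - lo) / 2;
          if (mid == lo || mid == hi) return mid;   // no doubles left in between
          if (f(mid) < 0) lo = mid; else hi = mid;
      }
  }

  int main() {
      double root = bisect(1.0, 2.0, [](double x) { return x * x - 2; });
      std::printf("%.17g\n", root);   // sqrt(2) to full double precision (give or take one ulp)
  }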


Thanks, it is interesting. I have encountered something related too: http://people.eecs.berkeley.edu/~jrs/papers/robust-predicate...


Related: How do you compute the midpoint of an interval? [1]. PDF [2]

[1] http://dl.acm.org/citation.cfm?id=2493882 [2] https://hal.archives-ouvertes.fr/hal-00576641v1/document


This really should be the top comment of the thread. I've been reading the paper slowly since yesterday.

Tl;dr - the best way to compute mid-points is:

    round_to_nearest_even((a - a/2) + b/2)
Why?

It can't be (a+b)/2 because (a+b) can overflow.

It can't be (a/2 + b/2) because a/2 or b/2 can underflow. However, this works if you explicitly handle the case when a == b.

It can't be (a + (b/2 - a/2)) for similar but more complicated reasons.

However, if you switch things around, ((a - a/2) + b/2) works great without the special case. (Though all three options need to handle a second special case: when a == -b.)

Regarding the rounding method, you want it to be symmetric, and moreover you want nearby numbers to not always be biased in the same direction (as might happen if you always round towards zero). Hence rounding to the nearest 'even' floating point number (i.e. the nearest bit pattern with a zero in the LSB).
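
A minimal sketch of that recipe (my own code, assuming the default round-to-nearest-even mode and finite inputs; the a == -b and infinity cases still need the handling discussed in the paper):

  // The parent's formula; the default rounding mode is already
  // round-to-nearest-even, so no explicit rounding call is needed.
  double midpoint(double a, double b) {
      return (a - a / 2) + b / 2;
  }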

---

In the case of OP, I think the paper would argue that the mid-point was working just fine, and that the code surrounding it isn't considering all possibilities (symmetrically). But then the discussion of infinities in the paper suggests that no matter what you do, there are issues for surrounding code to consider. Which in turn suggests that floating point is absolutely busted as an abstraction.


"and on my hardware it caused the splitValue to be equal to point2"

The fix is the right one, but in C/C++ it isn't the hardware that dictates the rounding mode; the language implementation does, and it can be changed at runtime (http://www.cplusplus.com/reference/cfenv/fesetround/)

I think C/C++ default to 'round to nearest', with ties doing 'round to even'. GNU libc certainly does (http://www.gnu.org/software/libc/manual/html_node/Rounding.h...)
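
A small illustration (my own, not from the linked pages) of poking at the rounding mode via <cfenv>:

  #include <cfenv>
  #include <cstdio>

  // The rounding mode lives in the floating-point environment and can be
  // changed at run time. (Strictly, ISO C wants "#pragma STDC FENV_ACCESS ON"
  // in effect before doing this.)
  int main() {
      volatile double x = 1.0, y = 3.0;     // volatile: avoid constant folding
      std::printf("%d\n", std::fegetround() == FE_TONEAREST);  // 1 by default
      std::fesetround(FE_UPWARD);           // results now round toward +infinity
      std::printf("%.17g\n", x / y);
      std::fesetround(FE_TONEAREST);        // restore the default
      std::printf("%.17g\n", x / y);
  }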


I had a fun floating-point related bug the other day: I wanted to sort a list of doubles in C, and my comparison function was in essence:

    static int compare (void *dpa, void *dpb)
    {
      return *(double*)dpa - *(double*)dpb;
    }
The doubles represented times in nanoseconds, and the program failed to sort properly when the total runtime was larger than (IIRC) 2 seconds. I wonder if you can work out why.

(Spoiler alert) the fix was: https://github.com/libguestfs/libguestfs/commit/1f4a0bd90df3...


Your fix requires a lot of mental CPU cycles on the part of whoever reads the code. Are you sure it wouldn't have been better to do it the simple, obvious way?

   double a = *(double *) dpa;
   double b = *(double *) dpb;

   if (a > b) return 1;
   if (a < b) return -1;
   return 0;


Oh my gosh ... the Fortran computed goto is back ... See http://www.lahey.com/docs/lfprohelp/F95ARComputed_GOTOStmt.h...


Yup, I think you are right -- the (fixed) code is very obscure now.


I think it's actually quite elegant. The issue is that it always does two comparisons, even when one would suffice.


Depending on how the compiler computes a>b, it may be branchless, though, and I expect that to be more important on modern CPUs.


That code returns 0 for all NaN inputs.


True enough, but if you're passing NaNs to qsort(), something has already gone wrong in your professional life.


How come the argument types are void* instead of double*?



That's C's idea of generics.


I saw this exact failure mode in the wild - a C++ Newton-Raphson implementation that relied on the order coming out of the mean calculation. In this case, the ordering put the N-R algorithm out and instability resulted. I seem to remember the fix was to put some simple bisection around the N-R guess, and if it went outside that range, truncate at the bisection range. It worked! (If anyone really wants the details, I can probably find the exact fix in our source control history.)
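
Purely as a hypothetical sketch of that kind of safeguard (not the actual fix from our source control):

  #include <cstdio>
  #include <functional>

  // Keep a bisection bracket [lo, hi] with f(lo) and f(hi) of opposite sign.
  // Take the Newton-Raphson guess; if it leaves the bracket, truncate it back
  // by falling back to the bisection midpoint, then shrink the bracket.
  double safeguarded_newton(const std::function<double(double)>& f,
                            const std::function<double(double)>& df,
                            double lo, double hi, int iters = 60) {
      double x = lo + (hi - lo) / 2;
      for (int i = 0; i < iters; ++i) {
          double d = df(x);
          double next = (d != 0.0) ? x - f(x) / d : lo;  // Newton-Raphson step
          if (!(next > lo && next < hi))                 // outside the bracket?
              next = lo + (hi - lo) / 2;                 // use bisection instead
          if (f(next) * f(lo) <= 0.0) hi = next;         // keep the sign change
          else                        lo = next;         // inside [lo, hi]
          x = next;
      }
      return x;
  }

  int main() {
      auto f  = [](double x) { return x * x - 2.0; };
      auto df = [](double x) { return 2.0 * x; };
      std::printf("%.17g\n", safeguarded_newton(f, df, 1.0, 2.0));  // ~sqrt(2)
  }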


I'm sufficiently interested to ask for further details :)


I don't understand this article. If you know the index, or other indicator of location, of the second point in the sequence, then just split the sequence before that point. No arithmetic involved; the type of the object is not even relevant.


You are right that in this isolated problem it would make sense. However, this is actually a classifier so the idea is to maximize the margin in the split.


That's (one reason) why standard containers use < instead of <=


Even though this is not what this article is about, quite a few comments suggest better ways to compute the midpoint of two doubles, where "better" is supposed to mean "such that unnecessary overflow is avoided".

What I haven't seen is the suggestion

  0.5*x + 0.5*y
Yes, we now have two multiplications, but floating point multiplication is considerably faster than division.


To be robust you need to deal with overflow and underflow, and also with pesky differences in magnitude.

Overflow breaks (x + y) / 2 where x and y are large; underflow breaks (x / 2) + (y / 2) where x and y are small.

And of course there's the problem where x is outside the exponent range of y: one of the values gets treated as though it were insignificant and effectively 0.


I think it is necessary to compromise between absolute robustness and readability, maintainability, etc.

I hope that the input to the program is sensible enough that floating point overflow won't be an issue, since there are more complicated formulas in the program than this simple one.


Some numerical packages have an Almost class to deal with numerical imprecision. For example, you test Almost.equal with a tiny tolerance.


How would that help here?


Obligatory: "What Every Programmer Should Know About Floating-Point Arithmetic"

http://floating-point-gui.de/


I think the more important lesson here is not to write algorithms in a way that recurses forever because data types aren't infinitely precise.


The original didn't require infinite precision; rather, it required that the average of two different numbers be between the two or equal to the smaller one. This is true in integer arithmetic due to rounding down, but apparently not true for floating point.
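
A small demonstration (my own values, not the article's) that under round-to-nearest-even the computed mean of two adjacent doubles can come out equal to the larger one:

  #include <cmath>
  #include <cstdio>

  int main() {
      double a = 1.0 + std::ldexp(1.0, -52);   // 1 + 2^-52
      double b = std::nextafter(a, 2.0);       // 1 + 2^-51, the next double up
      double mid = (a + b) / 2;                // a + b rounds up to the even side
      std::printf("mid == b: %d\n", mid == b); // prints 1
  }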


The same happens with integers. Try computing the mean of 3 and 4 using integer math.

The only difference, really, is that floating point can lull you into believing it has unlimited precision. With integer math, the problem would have been more obvious in the first place.


It's not the same. With integer division you know the division rounds down, so using <= always splits between the two points.

That said, it's always safer to compute the mean of two integers as

min + (max - min) / 2

to avoid integer overflow.
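
For example (my own numbers), with 32-bit int:

  #include <cstdio>

  // Here a plain (min + max) / 2 would exceed INT_MAX (signed overflow,
  // undefined behaviour in C/C++), while the rearranged formula stays in
  // range as long as both values have the same sign.
  int main() {
      int min = 2000000000, max = 2000000002;
      int mid = min + (max - min) / 2;   // max - min is small, so no overflow
      std::printf("%d\n", mid);          // 2000000001
  }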


I like this one better: (min&max) + ((min^max) >> 1)

  #include <iostream>
  using namespace std;

  int avg(int a, int b) {
    return (a&b) + ((a^b)>>1);
  }

  int main() {
    cout << avg(int(2e9), int(2e9 + 10)) << endl;
    cout << avg(int(2e9), int(-2e9)) << endl;
    cout << avg(int(-2e9), int(-2e9 - 10)) << endl;
    
    return 0;
  }
Gives:

  2000000005
  0
  -2000000005


Reminds me of the way Java's binary search computes the average:

  int mid = (low + high) >>> 1;
This was a fix because the original code had a bug:

  int mid = (low + high) / 2;
Which was part of the Java code base for many years. Joshua Bloch based the code on Programming Pearls, which also contained the bug.

More info: https://research.googleblog.com/2006/06/extra-extra-read-all...
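
My own C++ analogue of that fix, assuming 32-bit int and non-negative indices as in a binary search:

  #include <cassert>

  // low + high may exceed INT_MAX, but the unsigned sum cannot wrap, and
  // shifting right by one recovers the true midpoint, mirroring Java's >>>.
  int midpoint_index(int low, int high) {
      assert(low >= 0 && high >= 0);
      return (int)(((unsigned)low + (unsigned)high) >> 1);
  }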


Is there an explanation of how that actually works somewhere?


Yes there is! Right here:

It's actually really simple. We'll write a and b in binary notation, for example:

  a = 1001101
  b = 0100111
Now what happens when you add two numbers in binary? We essentially add the numbers in each column together, and if it overflows, we carry to the next column (this is how you carry out addition in general).

So what are the columns where we need to carry, the ones that overflow? These are given by (a&b) - the columns where both a and b contain a one. To actually carry we just move everything one position to the left: ((a&b)<<1). And the columns where we don't need to carry, the ones that don't overflow? There at most one of a and b has a one, so their contribution to the sum is simply a^b.

In other words, a + b = ((a&b)<<1) + (a^b). To compute the average, we divide by two, or in other words, we bitshift to the right by one place: (a + b)/2 = (a&b) + ((a^b)>>1).
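
Using the example values above (a = 1001101 = 77, b = 0100111 = 39):

  a & b              = 0000101  (carry columns,    value 5)
  a ^ b              = 1101010  (no-carry columns, value 106)
  ((a&b)<<1) + (a^b) = 10 + 106 = 116 = a + b
  (a&b) + ((a^b)>>1) = 5  + 53  = 58  = (a + b)/2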

If anything is unclear, feel free to ask :)


Great explanation, thanks.


There's this well-known identity [1] a + b = (a ^ b) + ((a & b) << 1). This is essentially how a parallel ripple carry adder works. So (a + b) / 2 = (a + b) >> 1 = ((a ^ b) >> 1) + (a & b).

[1] http://www.inwap.com/pdp10/hbaker/hakmem/boolean.html#item23


It's pretty obvious visually: you take the exclusive disjunction (a^b -> yellow), divide it by two (>>1 -> bright yellow), and then add the intersection (a&b -> green). You use the full intersection because in the basic method ((a+b)/2) each of a and b would add its own half of the intersection.

https://imgur.com/PUT4G9Z


... unless max is a huge positive integer, and min is a huge negative integer!


Just use signed ints. Signed integer underflow is undefined behavior so it can't happen /s


Lol'd. To the contrary, it _can_ happen, and when it does the behaviour is undefined in C/C++. However, rest assured - you are probably on an x86_64 machine programming in something like Ruby or JavaScript, so your apps should be alright :P


That is the safe technique for common cases such as locating a pivot in an array - where min and max are guaranteed to be positive (or you're using unsigned ints in the first place). That approach doesn't solve the overflow problem in general for signed ints.


Integer division doesn't round down, it rounds towards zero.


Yep! The addition here can overflow, too, just like in the integer case (though you'll get an infinite value instead of wraparound).


I would have upvoted if the title had been "Mean of two floating point numbers can be mean".



