Hacker News
Mean of two floating point numbers can be dangerous (honzabrabec.cz)
79 points by Scea91 on June 5, 2016 | 45 comments



This behavior can actually be turned around and exploited to do some calculations to full precision, using algorithms that look like they would infinite loop: http://www.shapeoperator.com/2014/02/22/bisecting-floats/
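
Roughly, the trick (my reading of the linked post, sketched below rather than its exact code) is that bisection over doubles terminates on its own, because eventually the computed midpoint equals one of the endpoints:

  #include <cstdio>

  // Sketch of the idea, assuming f(lo) < 0 < f(hi): once lo and hi are
  // adjacent doubles, the midpoint rounds to one of them and we stop.
  double bisect(double lo, double hi, double (*f)(double)) {
      while (true) {
          double mid = lo + (hi - lo) / 2;
          if (mid == lo || mid == hi) return mid;   // no doubles left in between
          if (f(mid) < 0) lo = mid; else hi = mid;
      }
  }

  int main() {
      double root = bisect(1.0, 2.0, [](double x) { return x * x - 2; });
      std::printf("%.17g\n", root);   // sqrt(2) to full double precision (give or take one ulp)
  }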


Thanks, it is interesting. I have encountered something related too: http://people.eecs.berkeley.edu/~jrs/papers/robust-predicate...


Related: How do you compute the midpoint of an interval? [1]. PDF [2]

[1] http://dl.acm.org/citation.cfm?id=2493882 [2] https://hal.archives-ouvertes.fr/hal-00576641v1/document


This really should be the top comment of the thread. I've been reading the paper slowly since yesterday.

Tl;dr - the best way to compute mid-points is:

    round_to_nearest_even((a - a/2) + b/2)
Why?

It can't be (a+b)/2 because (a+b) can overflow.

It can't be (a/2 + b/2) because a/2 or b/2 can underflow. However, this works if you explicitly handle the case when a == b.

It can't be (a + (b/2 - a/2)) for similar but more complicated reasons.

However, if you switch things around, ((a - a/2) + b/2) works great without the special case. (Though all three options need to handle a second special case: when a == -b.)

Regarding the rounding method, you want it to be symmetric, and moreover you want nearby numbers to not always be biased in the same direction (as might happen if you always round towards zero). Hence rounding to the nearest 'even' floating point number (i.e. the nearest bit pattern with a zero in the LSB).
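
A minimal sketch of that recipe (my own code, assuming the default round-to-nearest-even mode and finite inputs; the a == -b and infinity cases still need the handling discussed in the paper):

  // The parent's formula; the default rounding mode is already
  // round-to-nearest-even, so no explicit rounding call is needed.
  double midpoint(double a, double b) {
      return (a - a / 2) + b / 2;
  }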

---

In the case of OP, I think the paper would argue that the mid-point was working just fine, and that the code surrounding it isn't considering all possibilities (symmetrically). But then the discussion of infinities in the paper suggests that no matter what you do, there are issues for surrounding code to consider. Which in turn suggests that floating point is absolutely busted as an abstraction.


"and on my hardware it caused the splitValue to be equal to point2"

The fix is the right one, but in C/C++ it isn't the hardware that dictates the rounding mode; the language implementation does, and it can be changed at runtime (http://www.cplusplus.com/reference/cfenv/fesetround/)

I think C/C++ default to 'round to nearest', with ties doing 'round to even'. GNU libc certainly does (http://www.gnu.org/software/libc/manual/html_node/Rounding.h...)
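
A small illustration (my own, not from the linked pages) of poking at the rounding mode via <cfenv>:

  #include <cfenv>
  #include <cstdio>

  // The rounding mode lives in the floating-point environment and can be
  // changed at run time. (Strictly, ISO C wants "#pragma STDC FENV_ACCESS ON"
  // in effect before doing this.)
  int main() {
      volatile double x = 1.0, y = 3.0;     // volatile: avoid constant folding
      std::printf("%d\n", std::fegetround() == FE_TONEAREST);  // 1 by default
      std::fesetround(FE_UPWARD);           // results now round toward +infinity
      std::printf("%.17g\n", x / y);
      std::fesetround(FE_TONEAREST);        // restore the default
      std::printf("%.17g\n", x / y);
  }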


I had a fun floating-point related bug the other day: I wanted to sort a list of doubles in C, and my comparison function was in essence:

    static int compare (void *dpa, void *dpb)
    {
      return *(double*)dpa - *(double*)dpb;
    }
The doubles represented times in nanoseconds, and the program failed to sort properly when the total runtime was larger than (IIRC) 2 seconds. I wonder if you can work out why.

(Spoiler alert) the fix was: https://github.com/libguestfs/libguestfs/commit/1f4a0bd90df3...


Your fix requires a lot of mental CPU cycles on the part of whoever reads the code. Are you sure it wouldn't have been better to do it the simple, obvious way?

   double a = *(double *) dpa;
   double b = *(double *) dpb;

   if (a > b) return 1;
   if (a < b) return -1;
   return 0;


Oh my gosh ... the Fortran computed goto is back ... See http://www.lahey.com/docs/lfprohelp/F95ARComputed_GOTOStmt.h...


Yup, I think you are right -- the (fixed) code is very obscure now.


I think it's actually quite elegant. The issue is that it always does two comparisons, even when one would suffice.


Depending on how the compiler computes a>b, it may be branchless, though, and I expect that to be more important on modern CPUs.


That code returns 0 for all NaN inputs.


True enough, but if you're passing NaNs to qsort(), something has already gone wrong in your professional life.


How come the argument types are void* instead of double*?



That's C's idea of generics.


I saw this exact failure mode in the wild - a C++ Newton-Raphson implementation that relied on the order coming out of the mean calculation. In this case, the ordering put the N-R algorithm out and instability resulted. I seem to remember the fix was to put some simple bisection around the N-R guess, and if it went outside that range, truncate at the bisection range. It worked! (If anyone really wants the details, I can probably find the exact fix in our source control history.)
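
Purely as a hypothetical sketch of that kind of safeguard (not the actual fix from our source control):

  #include <cstdio>
  #include <functional>

  // Keep a bisection bracket [lo, hi] with f(lo) and f(hi) of opposite sign.
  // Take the Newton-Raphson guess; if it leaves the bracket, truncate it back
  // by falling back to the bisection midpoint, then shrink the bracket.
  double safeguarded_newton(const std::function<double(double)>& f,
                            const std::function<double(double)>& df,
                            double lo, double hi, int iters = 60) {
      double x = lo + (hi - lo) / 2;
      for (int i = 0; i < iters; ++i) {
          double d = df(x);
          double next = (d != 0.0) ? x - f(x) / d : lo;  // Newton-Raphson step
          if (!(next > lo && next < hi))                 // outside the bracket?
              next = lo + (hi - lo) / 2;                 // use bisection instead
          if (f(next) * f(lo) <= 0.0) hi = next;         // keep the sign change
          else                        lo = next;         // inside [lo, hi]
          x = next;
      }
      return x;
  }

  int main() {
      auto f  = [](double x) { return x * x - 2.0; };
      auto df = [](double x) { return 2.0 * x; };
      std::printf("%.17g\n", safeguarded_newton(f, df, 1.0, 2.0));  // ~sqrt(2)
  }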


I'm sufficiently interested to ask for further details :)


I don't understand this article. If you know the index, or other indicator of location, of the second point in the sequence, then just split the sequence before that point. No arithmetic involved; the type of the object is not even relevant.


You are right that in this isolated problem it would make sense. However, this is actually a classifier so the idea is to maximize the margin in the split.


That's (one reason) why standard containers use < instead of <=


Even though this is not what this article is about, quite a few comments suggest better ways to compute the midpoint of two doubles, where "better" is supposed to mean "such that unnecessary overflow is avoided".

What I haven't seen is the suggestion

  0.5*x + 0.5*y
Yes, we now have two multiplications, but floating point multiplication is considerably faster than division.


To be robust you need to deal with overflow and underflow, and also with pesky differences in magnitude.

Overflow breaks (x + y) / 2 where x and y are large; underflow breaks (x / 2) + (y / 2) where x and y are small.

And of course there's the problem where x is outside the exponent range of y: one of the values gets treated as though it were insignificant and effectively 0.


I think it is necessary to compromise between absolute robustness and readability, maintainability, etc.

I hope that the input to the program is sensible enough that floating point overflow won't be an issue, since there are more complicated formulas in the program than this simple one.


Some numerical packages have an Almost class to deal with numerical imprecision. For example, you test Almost.equal with a tiny tolerance.


How would that help here?


Obligatory: "What Every Programmer Should Know About Floating-Point Arithmetic"

http://floating-point-gui.de/


I think the more important lesson here is not to write algorithms in a way that recurses forever because data types aren't infinitely precise.


The original didn't require infinite precision; rather, it required that the average of two different numbers be between the two or equal to the smaller one. This is true in integer arithmetic due to rounding down, but apparently not true for floating point.
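
A small demonstration (my own values, not the article's) that under round-to-nearest-even the computed mean of two adjacent doubles can come out equal to the larger one:

  #include <cmath>
  #include <cstdio>

  int main() {
      double a = 1.0 + std::ldexp(1.0, -52);   // 1 + 2^-52
      double b = std::nextafter(a, 2.0);       // 1 + 2^-51, the next double up
      double mid = (a + b) / 2;                // a + b rounds up to the even side
      std::printf("mid == b: %d\n", mid == b); // prints 1
  }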


The same happens with integers. Try computing the mean of 3 and 4 using integer math.

The only difference, really, is that floating point can lull you into believing it has unlimited precision. With integer math, the problem would have been more obvious in the first place.


It's not the same. With integer division you know the division rounds down, so using <= always splits between the two points.

That said, it's always safer to compute the mean of two integers as

min + (max - min) / 2

to avoid integer overflow.
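
For example (my own numbers), with 32-bit int:

  #include <cstdio>

  // Here a plain (min + max) / 2 would exceed INT_MAX (signed overflow,
  // undefined behaviour in C/C++), while the rearranged formula stays in
  // range as long as both values have the same sign.
  int main() {
      int min = 2000000000, max = 2000000002;
      int mid = min + (max - min) / 2;   // max - min is small, so no overflow
      std::printf("%d\n", mid);          // 2000000001
  }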


I like this one better: (min&max) + ((min^max) >> 1)

  #include <iostream>
  using namespace std;

  int avg(int a, int b) {
    return (a&b) + ((a^b)>>1);
  }

  int main() {
    cout << avg(int(2e9), int(2e9 + 10)) << endl;
    cout << avg(int(2e9), int(-2e9)) << endl;
    cout << avg(int(-2e9), int(-2e9 - 10)) << endl;
    
    return 0;
  }
Gives:

  2000000005
  0
  -2000000005


Reminds me of the way Java's binary search computes the average:

  int mid = (low + high) >>> 1;
This was a fix because the original code had a bug:

  int mid = (low + high) / 2;
Which was part of the Java code base for many years. Joshua Bloch based the code on Programming Pearls, which also contained the bug.

More info: https://research.googleblog.com/2006/06/extra-extra-read-all...
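
My own C++ analogue of that fix, assuming 32-bit int and non-negative indices as in a binary search:

  #include <cassert>

  // low + high may exceed INT_MAX, but the unsigned sum cannot wrap, and
  // shifting right by one recovers the true midpoint, mirroring Java's >>>.
  int midpoint_index(int low, int high) {
      assert(low >= 0 && high >= 0);
      return (int)(((unsigned)low + (unsigned)high) >> 1);
  }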


Is there an explanation of how that actually works somewhere?


Yes there is! Right here:

It's actually really simple. We'll write a and b in binary notation, for example:

  a = 1001101
  b = 0100111
Now what happens when you add two numbers in binary? We essentially add the numbers in each column together, and if it overflows, we carry to the next column (this is how you carry out addition in general).

So what are the columns where we need to carry, the ones that overflow? These are given by (a&b) - the columns where both a and b contain a one. To actually carry we just move everything one position to the left: ((a&b)<<1). And the columns where we don't need to carry, the ones that don't overflow? There at most one of a and b has a one, so their contribution to the sum is simply a^b.

In other words, a + b = ((a&b)<<1) + (a^b). To compute the average, we divide by two, or in other words, we bitshift to the right by one place: (a + b)/2 = (a&b) + ((a^b)>>1).
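
Using the example values above (a = 1001101 = 77, b = 0100111 = 39):

  a & b              = 0000101  (carry columns,    value 5)
  a ^ b              = 1101010  (no-carry columns, value 106)
  ((a&b)<<1) + (a^b) = 10 + 106 = 116 = a + b
  (a&b) + ((a^b)>>1) = 5  + 53  = 58  = (a + b)/2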

If anything is unclear, feel free to ask :)


Great explanation, thanks.


There's this well-known identity [1] a + b = (a ^ b) + ((a & b) << 1). This is essentially how a parallel ripple carry adder works. So (a + b) / 2 = (a + b) >> 1 = ((a ^ b) >> 1) + (a & b).

[1] http://www.inwap.com/pdp10/hbaker/hakmem/boolean.html#item23


It's pretty obvious visually: you take the exclusive disjunction (a^b -> yellow), divide it by two (>>1 -> bright yellow), and then add the intersection (a&b -> green). You use the full intersection because in the basic method ((a+b)/2) each of a and b would add its own half of the intersection.

https://imgur.com/PUT4G9Z


... unless max is a huge positive integer, and min is a huge negative integer!


Just use signed ints. Signed integer underflow is undefined behavior so it can't happen /s


Lol'd. To the contrary, it _can_ happen, and when it does the behaviour is undefined in C/C++. However, rest assured - you are probably on an x86_64 machine programming in something like Ruby or JavaScript, so your apps should be alright :P


That is the safe technique for common cases such as locating a pivot in an array - where min and max are guaranteed to be positive (or you're using unsigned ints in the first place). That approach doesn't solve the overflow problem in general for signed ints.


Integer division doesn't round down, it rounds towards zero.


Yep! The addition here can overflow, too, just like in the integer case (though you'll get an infinite value instead of wraparound).


I would have upvoted if the title had been "Mean of two floating point numbers can be mean".



