This really should be the top comment of the thread. I've been reading the paper slowly since yesterday.
Tl;dr - the best way to compute mid-points is:
round_to_nearest_even((a - a/2) + b/2)
Why?
It can't be (a+b)/2 because (a+b) can overflow.
It can't be (a/2 + b/2) because a/2 or b/2 can underflow. However, this works if you explicitly handle the case when a == b.
It can't be (a + (b/2 - a/2)) for similar but more complicated reasons.
However, if you switch things around, ((a - a/2) + b/2) works great without the special case. (Though all three options need to handle a second special case: when a == -b.)
Regarding the rounding method, you want it to be symmetric, and moreover you want nearby numbers not to be biased in the same direction every time (as might happen if you always round towards zero). Hence rounding to the nearest 'even' floating point number (i.e. the nearest bit pattern with a zero in the LSB).
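For anyone who wants it spelled out, here is a minimal sketch in C ("midpoint" is just my name for it; it assumes the FP environment is in the default round-to-nearest-even mode, and leaves the a == -b special case mentioned above to the caller):

    /* Sketch only, not from the paper: assumes the default IEEE
     * round-to-nearest-even mode; the caller handles a == -b. */
    double midpoint(double a, double b)
    {
        return (a - a / 2) + b / 2;
    }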
---
In the case of OP, I think the paper would argue that the mid-point was working just fine, and that the code surrounding it isn't considering all possibilities (symmetrically). But then the discussion of infinities in the paper suggests that no matter what you do, there are issues for surrounding code to consider. Which in turn suggests that floating point is absolutely busted as an abstraction.
"and on my hardware it caused the splitValue to be equal to point2"
The fix is the right one, but in C/C++ the rounding mode is not dictated by the hardware; the language implementation controls it, and it can even be changed at runtime (http://www.cplusplus.com/reference/cfenv/fesetround/)
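For example, something along these lines (a minimal sketch of the standard <fenv.h> interface that link documents; the volatile is only there to stop the compiler from folding the division at compile time):

    #include <fenv.h>
    #include <stdio.h>

    /* Tell the compiler we touch the FP environment (not all compilers honour this). */
    #pragma STDC FENV_ACCESS ON

    int main(void)
    {
        volatile double a = 1.0, b = 3.0;

        fesetround(FE_DOWNWARD);   /* round toward -infinity */
        printf("%.20f\n", a / b);

        fesetround(FE_TONEAREST); /* back to the usual default */
        printf("%.20f\n", a / b);
        return 0;
    }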
The doubles represented times in nanoseconds, and the program failed to sort properly when the total runtime was larger than (IIRC) 2 seconds. I wonder if you can work out why.
Your fix requires a lot of mental CPU cycles on the part of whoever reads the code. Are you sure it wouldn't have been better to do it the simple, obvious way?
double a = *(double *) dpa;
double b = *(double *) dpb;
if (a > b) return 1;
if (a < b) return -1;
return 0;
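For completeness, that is the shape of a qsort comparator; wrapped up it would look roughly like this (cmp_double, times and n are hypothetical names of mine):

    #include <stdlib.h>

    static int cmp_double(const void *dpa, const void *dpb)
    {
        double a = *(const double *) dpa;
        double b = *(const double *) dpb;
        if (a > b) return 1;
        if (a < b) return -1;
        return 0;
    }

    /* usage: qsort(times, n, sizeof(double), cmp_double); */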
I saw this exact failure mode in the wild - a C++ Newton-Raphson implementation that relied on the order coming out of the mean calculation. In this case, the ordering put the N-R algorithm out and instability resulted. I seem to remember the fix was to put some simple bisection around the N-R guess, and if it went outside that range, truncate at the bisection range. It worked! (If anyone really wants the details, I can probably find the exact fix in our source control history.)
I don't understand this article. If you know the index, or other indicator of location, of the second point in the sequence, then just split the sequence before that point. No arithmetic involved; the type of the object is not even relevant.
You are right that in this isolated problem it would make sense. However, this is actually a classifier so the idea is to maximize the margin in the split.
Even though this is not what this article is about, quite a few comments suggest better ways to compute the midpoint of two doubles, where "better" is supposed to mean "such that unnecessary overflow is avoided".
What I haven't seen is the suggestion
0.5*x + 0.5*y
Yes, we now have two multiplications, but floating point multiplication is considerably faster than division.
To be robust you need to deal with overflow and underflow, and also with pesky differences in magnitude.
Overflow breaks (x + y) / 2. where x and y are large.
Underflow breaks (x / 2.) + (y / 2.) where x and y are small.
And of course there is the problem where x's exponent is far from y's: the smaller of the two gets treated as though it were insignificant and effectively 0.
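A quick sketch of the first two failure modes, with values I picked purely to trigger them (assuming ordinary IEEE doubles, e.g. x86_64/SSE2):

    #include <float.h>
    #include <stdio.h>

    int main(void)
    {
        /* Overflow: the sum hits infinity even though the true midpoint
         * (DBL_MAX) is perfectly representable. */
        double x = DBL_MAX, y = DBL_MAX;
        printf("%g\n", (x + y) / 2.);                  /* inf */

        /* Underflow: halving a subnormal first rounds away its low bit,
         * so the two formulas disagree. */
        double s = 3 * DBL_MIN * DBL_EPSILON;          /* 3 units in the last place */
        printf("%g vs %g\n", (s + s) / 2., s / 2. + s / 2.);
        return 0;
    }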
I think that it is necessary to make a compromise between absolute robustness and readability, maintainability etc.
I hope that the input to the program is sensible enough that floating point overflow won't be an issue, since there are more complicated formulas in the program than this simple one.
The original didn't require infinite precision; rather, it required that the average of two different numbers be between the two or equal to the smaller one. This is true in integer arithmetic due to rounding down, but apparently not true for floating point.
Same happens with integers. Try computing the mean of 3 and 4 using integer math.
The only difference, really, is that floating-point numbers can lull you into believing they have unlimited precision. With integer math, the problem would have been more obvious in the first place.
It's actually really simple. We'll write a and b in binary notation, for example:
a = 1001101
b = 0100111
Now what happens when you add two numbers in binary? We essentially add the numbers in each column together, and if it overflows, we carry to the next column (this is how you carry out addition in general).
So what are the columns where we need to carry, the ones that overflow? These are given by (a&b) - the columns where both a and b contain a one. To actually carry we just move everything one position to the left: ((a&b)<<1). And what is left over, the part that doesn't overflow? That is the columns where exactly one of a and b contains a one (columns where both are zero contribute nothing), i.e. a^b.
In other words, a + b = ((a&b)<<1) + (a^b). To compute the average, we divide by two, or in other words, we bitshift to the right by one place: (a + b)/2 = (a&b) + ((a^b)>>1)
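In code, for unsigned ints (my choice of type; the shift would need more care with negative signed values):

    #include <stdio.h>

    /* Overflow-free floor average: shared bits (a & b) pass through whole,
     * differing bits (a ^ b) contribute half each. */
    unsigned avg(unsigned a, unsigned b)
    {
        return (a & b) + ((a ^ b) >> 1);
    }

    int main(void)
    {
        printf("%u\n", avg(3u, 4u));                     /* 3 */
        printf("%u\n", avg(4000000000u, 4000000000u));   /* no overflow (32-bit unsigned assumed) */
        return 0;
    }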
There's this well-known identity [1] a + b = (a ^ b) + ((a & b) << 1). This is essentially how a ripple carry adder works. So (a + b) / 2 = (a + b) >> 1 = ((a ^ b) >> 1) + (a & b).
It's pretty obvious visually: you take the exclusive disjunction (a^b -> yellow), divide it by two (>>1 -> bright yellow), and then add the intersection (a&b -> green). You use the full intersection because in the basic method ((a + b) / 2) each of a and b would add its own half of the intersection.
Lol'd. To the contrary, it _can_ happen, and when it does the behaviour is undefined in C/C++. However, rest assured - you are probably on an x86_64 machine programming in something like Ruby or JavaScript, so your apps should be alright :P
That is the safe technique for common cases such as locating a pivot in an array - where min and max are guaranteed to be positive (or you're using unsigned ints in the first place). That approach doesn't solve the overflow problem in general for signed ints.
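(For reference, I'm assuming the technique being replied to is the usual low + (high - low) / 2 idiom, roughly:)

    #include <stddef.h>

    /* Safe for array pivots: with 0 <= lo <= hi, hi - lo cannot overflow. */
    size_t pivot_index(size_t lo, size_t hi)
    {
        return lo + (hi - lo) / 2;
    }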