One example is computing sin (or any trig function) of a large number: first you perform range reduction, i.e. bring it into the range 0..2pi. For that, you're (potentially) subtracting two large numbers that are close together, so you lose a lot of accuracy.
This implementation of mod2pi in Julia, for example, uses double doubles ("TwicePrecision") here and there:
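This is not the Julia code referred to above, just a rough Python sketch of the effect. It compares reducing against the double-precision value of 2*pi with reducing against a 50-digit pi using the standard decimal module. The names reduce_naive and reduce_precise and the test value 1e10 are made up for illustration.

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 60  # working precision in decimal digits
# pi to 50 decimal places
PI = Decimal("3.14159265358979323846264338327950288419716939937510")

def reduce_naive(x: float) -> float:
    # Remainder against the double nearest to 2*pi. That constant is
    # already off by about 2.4e-16, and the error gets multiplied by
    # the number of periods removed.
    return math.fmod(x, 2 * math.pi)

def reduce_precise(x: float) -> float:
    # Same reduction carried out with a far more accurate 2*pi.
    return float(Decimal(x) % (2 * PI))

x = 1e10
naive = reduce_naive(x)
precise = reduce_precise(x)
print(naive, precise, naive - precise)
```

With about 1.6e9 periods removed, the two results should differ by a few times 1e-7, which is far more than the rounding error of a single double operation.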
I have a book somewhere in the basement on organising calculations from the early twentieth century, back when computer was a job title and job runtime was measured in weeks...
===Update===
Couldn't find the one I was looking for, but I did dig up Eckert, Punched Card Methods in Scientific Computation (1940).
From Chapter VII "The Multiplication of Series":
p.68 "By a continuation of this process we obtain about 100 groups of cards each with its rate card. Since the capacity of the multiplier is eight digits, those groups which have 9 and 10 digit multipliers and eight digit multiplicand are done in two groups. The few terms with large multiplier and multiplicand are done by hand. In a series with reasonable convergence there may be about 20,000 cards altogether. The reproducer is of course used to check the punching."
10 decimal digits is what, 33 bits?
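A throwaway Python check of that arithmetic, nothing more: log2(10^10) is about 33.2, so holding every 10-digit value takes 34 bits.

```python
import math

largest = 10**10 - 1                  # 9,999,999,999
bits_needed = largest.bit_length()    # whole bits needed for the largest value
print(math.log2(10**10))              # about 33.2
print(bits_needed)
```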
p.74 "The machine time for 100 series is as follows:
Original punching and verifying . . . . 2 days
Listing and summary punching checksums. 1/4 day
Sorting and gang punching duplicates. . 1 day
Multiplying . . . . . . . . . . . . . . 1 day
Sorting, tabulating, and summary punch. 1 day
In this example the checks would have been exact had we included a card for each product regardless of size, but this would have required about forty thousand cards instead of four or five thousand, and the multiplying time would have been over a week instead of one day."
I guess in principle a punched card could have contained 80 decimal places, but it seemed like the ALU equivalents were much narrower.
(this was published by the Thomas J. Watson Astronomical Computing Bureau)
I actually ran into this problem in grad school. I was trying to simulate an optical phased array with a target at the moon. It turns out that a single bit of a double is about 1 wavelength of light at the distance of the moon. I ended up doing most of the calculations symbolically and only using floats for the differences in the distances between each beam, which was enough to get out of the solar system but not out of the galaxy...
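To put rough numbers on this, here is a sketch in Python (3.9+ for math.ulp). It assumes a mean Earth-Moon distance of about 3.844e8 m, green light at about 550 nm, and distances stored in metres. Those values and units are assumptions for illustration, not the original setup.

```python
import math

moon_distance_m = 3.844e8      # assumed mean Earth-Moon distance
wavelength_m = 550e-9          # assumed green light

# Spacing between adjacent doubles near the Moon's distance.
spacing = math.ulp(moon_distance_m)
print(spacing)                 # 2**-24 m, roughly 60 nm

# A 10 nm path difference vanishes when added to the absolute distance...
delta = 10e-9
absorbed = (moon_distance_m + delta) == moon_distance_m

# ...but survives if path differences are tracked as their own values.
phase_cycles = delta / wavelength_m
print(absorbed, phase_cycles)
```

Under these assumptions the grid of representable distances is within about an order of magnitude of an optical wavelength, which is why keeping only the small differences in floats helps.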
You need multiprecision floating point for very deep mandelbrot zooms - it's really easy to hit the limits of a 53 bit mantissa on today's computers.
Interestingly, you don't need multiprecision floating point for every pixel: you can do precise calculations for a few tentpole pixels and calculate most of the rest at double precision based on the differences between their location and a tentpole. But you also need some significant cleverness to accurately fill in the gap between "most" and "all":
Algorithm stability. Since absolute precision decreases as magnitude rises, many tricks are used in scientific computing to scale everything into ranges where computations won't diverge due to a lack of precision.
Why do you need 128-bit floats for storing solar system coordinates? 64-bit floats will give you millimeter precision on Pluto, if your coordinate system has an origin in the sun.
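A quick sanity check of that figure in Python (3.9+), assuming Pluto at roughly 5.9e12 m from the Sun (about 39.5 AU) and coordinates in metres:

```python
import math

pluto_distance_m = 5.9e12                 # assumed, about 39.5 AU
spacing_m = math.ulp(pluto_distance_m)    # gap to the next representable double
print(spacing_m * 1000, "mm")             # just under 1 mm
```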
* For JPL's highest accuracy calculations, which are for interplanetary navigation, we use 3.141592653589793.
* How many digits of pi would we need to calculate the circumference of a circle with a radius of 46 billion light years to an accuracy equal to the diameter of a hydrogen atom (the simplest atom)? The answer is that you would need 39 or 40 decimal places.
So if he'd only had interplanetary probes, غیاث الدین جمشید کاشانی Ghiyās-ud-dīn Jamshīd Kāshānī could in principle have navigated them to the proper accuracy in 1424.
Well, 2 dimensions of 64 bits is 128 bits, while 3 dimensions of 64-bit precision is 192 bits... so mm precision to Pluto is along 1 dimension, I take it. I just wasn't sure of the magnitude we were talking about.
The precision doesn’t change when you change the number of dimensions.
If you are using a 64-bit float, then you have 53 bits of precision in 1D, 53 bits of precision in 2D, and 53 bits of precision in 3D. The precision doesn’t change as you add or remove dimensions.
Yes, 64 bits is not really comfortable for a long-term timestamp format.
The NTP nanokernel uses a 64.64 integer representation https://papers.freebsd.org/2000/phk-nanokernel/ because 32 bits isn't enough above the binary point (it runs out in 2038), and below the binary point you need 64 bits to accurately represent the nanosecond-scale frequency of the CPU's cycle counter to translate it to wall-clock time.
You can use double-doubles for near-128-bit floating point [1]. It's much more powerful than 53.53 fixed point you mentioned, but more complicated as well.
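For anyone who hasn't seen the trick, the basic building block of double-double arithmetic is an error-free addition (Knuth's TwoSum). This is only a minimal sketch of that one step in Python 3.9+, not a full double-double library, and the name two_sum is just a label used here.

```python
def two_sum(a: float, b: float) -> tuple[float, float]:
    """Return (s, e) where s is the rounded sum and a + b == s + e exactly."""
    s = a + b
    bb = s - a                     # the part of b that made it into s
    e = (a - (s - bb)) + (b - bb)  # rounding error of the addition
    return s, e

hi, lo = two_sum(1e16, 1.0)
print(hi, lo)   # the 1.0 dropped by the plain sum is recovered in lo
```

A double-double value is essentially such a (hi, lo) pair carried through every operation, which is where the extra complexity comes from.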
I know it is a joke, but a 64-bit fixed-point representation with nanosecond resolution has just ~585 years of range. With 1e100 years of range needed to measure out to the heat-death of the universe, you end up with a resolution of ~5e80 years, which is about 70 orders of magnitude longer than the current age of the universe.
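The back-of-envelope numbers can be reproduced in a few lines of Python, assuming an unsigned 64-bit counter and a Julian year of 365.25 days:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600     # Julian year (an assumption)

# 64-bit unsigned counter of nanoseconds
range_years = 2**64 / 1e9 / SECONDS_PER_YEAR
print(range_years)                         # about 584.5

# The same 64 bits stretched over 1e100 years
tick_years = 1e100 / 2**64
print(tick_years)                          # about 5.4e80
```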
It is easy to run out of bits when trying to use a single number as both an absolute date/time, and to compute relative durations between timestamps with reasonable precision. I've lost track of the number of bugs I've fixed where someone assumed a double would be big enough without doing the math.
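"Doing the math" can be as short as this Python (3.9+) sketch, assuming Unix time in seconds stored in a double and a timestamp of about 1.7e9 (late 2023):

```python
import math

unix_seconds = 1.7e9                 # assumed absolute timestamp, seconds
step = math.ulp(unix_seconds)        # smallest representable increment
print(step * 1e9, "ns")              # about 238 ns

# A 100 ns interval added to the absolute timestamp is rounded away.
later = unix_seconds + 100e-9
print(later - unix_seconds)          # 0.0
```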
Intermediates in computations. There's lots of quantities of the form z=x*y or z=x+y where while z may well be in the realm of "reasonably sized numbers that fit comfortably in 64 bits", x and y are (beyond) astronomical.
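A small illustration in Python, using a ratio of factorials as a stand-in for this kind of quantity (my example, not necessarily what the commenter had in mind): 200!/198! is only 39800, but 200! itself does not fit in a double. Working in log space is one common way around it.

```python
import math

# 200! is about 7.9e374, far past the double limit of about 1.8e308.
try:
    float(math.factorial(200))
    overflowed = False
except OverflowError:
    overflowed = True

# Work with logarithms so the huge intermediates never materialise.
log_ratio = math.lgamma(201) - math.lgamma(199)   # log(200!) - log(198!)
ratio = math.exp(log_ratio)
print(overflowed, ratio)   # ratio is close to 200 * 199 = 39800
```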
Well you may still have an x larger than the largest value representable in your floating point system, and y=1/x. It is not insane to want z=x*y=1 to be approximately true.
At that point you're not multiplying x and y to begin with.
So of course you want to avoid error states, but a double already has an exponent range well beyond "beyond astronomical". The width of the universe is around 10^27 m, a Planck length is around 10^-35 m, and the limit of the format is ±10^308.
While I know there are better and more numerically stable ways to implement it, think of the softmax function. It is perfectly possible to have a list of non-astronomical/non-Plancklength numbers, naively softmax it, and die because of precision.
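For concreteness, a minimal sketch in plain Python of the naive version next to the usual max-subtraction fix. The function names and the input scores are made up for illustration.

```python
import math

def softmax_naive(xs):
    exps = [math.exp(x) for x in xs]      # math.exp raises on overflow
    total = sum(exps)
    return [e / total for e in exps]

def softmax_stable(xs):
    m = max(xs)                            # shift so the largest exponent is 0
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

scores = [1000.0, 1001.0, 1002.0]          # nothing astronomical here
try:
    softmax_naive(scores)
    naive_failed = False
except OverflowError:
    naive_failed = True

probs = softmax_stable(scores)
print(naive_failed, probs)
```

The shifted version gives the same mathematical result because the common factor exp(-m) cancels between numerator and denominator.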
Softmax goes out the other side. Naively tossing around exponentials is so bad that it can easily explode any float, even 128 bit. It only adds another 4 bits to the exponent, after all. Even a 256 bit float has less than 20 bits of exponent! So that's an example where floats in general can go wrong, but it's not an example of where using larger floats is a meaningful help.
IMHO there's no reason for hardware support for floats beyond 64 bit.
If 64 bit IEEE-754 floating point numbers do not support your needs, adding more bits will almost certainly not fix the problem. The problem isn't lack of bits, it's going to be something fundamental about the nature of IEEE-754. You'll probably need something like a rational data type, or a symbolic algebra system that can abstractly represent everything that shows up in your domain without losing any precision. Generally you'll be required to do this in software.
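As a tiny example of the "rational data type in software" route, Python's standard fractions module keeps exact numerators and denominators, at the cost of speed:

```python
from fractions import Fraction

# Binary floats cannot represent 0.1, 0.2 or 0.3 exactly.
float_ok = (0.1 + 0.2 == 0.3)

# Exact rationals have no rounding at all.
exact_ok = (Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))
print(float_ok, exact_ok)
```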
32 bit is the sweet spot between precision and performance. There are only a handful of things that require 64 bits (notably latitude and longitude), but these things are common enough that hardware support is (IMO) valuable.
The first example won't be solved by adding a higher precision floating point type.
Basically, they're summing up a bunch of intermediate calculations. The intermediate calculations are 64 bit floats, and are precise to 17 decimal digits or whatever. They sum all the intermediate components, and get a result that's only precise to 13 digits or whatever. They use double doubles to perform the summation, and they get a result that's back to being precise to 17 decimal digits. Great.
Now imagine doing this with 128 bit floats, which are precise to 33 digits. So you sum the intermediate results, now you're precise to 29 digits. So you've lost 4 digits of precision again. So you add a 192 bit floating point type...
IEEE-754 floating points are always going to have that problem. If you add two numbers of differing magnitude, you're going to lose precision. Finding the sum of a large sequence is generally going to result in the addition of a very large running total with small individual members.
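A minimal Python illustration of that running-total effect, with made-up values: a thousand 1.0s added to 1e16 one at a time never register, while math.fsum (which tracks the lost low-order parts in software) returns the exact total.

```python
import math

values = [1e16] + [1.0] * 1000

naive = 0.0
for v in values:
    # Doubles near 1e16 are spaced 2 apart, so each 1.0 is rounded away.
    naive += v

exact = math.fsum(values)   # correctly rounded sum of the whole list
print(naive, exact)
```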
(I perused the other examples, but they seem to be a variation on the same theme)
I think you are missing the point here; sure you still lose significant digits regardless of how many you started with. But the point was that for many practical real world applications quadruple floats have so much headroom that you are able to perform the needed calculations and still end up with sufficient number of significant digits, whereas with doubles you don't. Choice quote:
> This permitted benchmark results to be accurately reproduced for a significantly longer time, with virtually no change in total run time
They clearly understand that bigger floats do not magically change the fundamental behavior, but the additional headroom makes a significant difference
> A larger example of this sort arose in an atmospheric model (a component of large climate model). While such computations are by their fundamental nature “chaotic,” so that computations will eventually depart from any benchmark standard case, nonetheless it is essential to distinguish avoidable numerical error from fundamental chaos.
> Researchers working with this atmospheric model were perplexed by the difficulty of reproducing benchmark results. Even when their code was ported from one system to another, or when the number of processors used was changed, the computed data diverged from a benchmark run after just a few days of simulated time. As a result, they could never be sure that in the process of porting their code or changing the number of processors that they did not introduce a bug into their code.
> After an in-depth analysis of this code, He and Ding found that merely by employing double-double arithmetic in two critical global summations, almost all of this numerical variability was eliminated. This permitted benchmark results to be accurately reproduced for a significantly longer time, with virtually no change in total run time [3].
> In climate model simulations, for example, the initial conditions and boundary forcings can seldomly be measured more accurately than a few percent. Thus in most situations, we only require 2 decimal digits accuracy in final results. But this does not imply that 2 decimal digits accuracy arithmetic (or 6-7 bits mantissa plus exponents) can be employed during the internal intermediate calculations. In fact, double precision arithmetic is usually required.
The problem isn't lack of precision. The problem is numerical instability when adding up a bunch of numbers with high absolute values but since they were roughly evenly positive/negative, their sum was approximately 1. IEEE-754 floats, as useful as they are, are just bad at this, and adding more bits isn't a solution, it's a punt. They used Kahan summation or Bailey summation and the problem went away. No 128 bit hardware floats required. (Kahan summation is very well known, Bailey summation is new to me)
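For reference, a minimal sketch of plain Kahan (compensated) summation in Python. This is the textbook algorithm, not a reconstruction of what the climate code did, and the test data is made up.

```python
import math

def kahan_sum(values):
    total = 0.0
    comp = 0.0                      # running compensation for lost low bits
    for x in values:
        y = x - comp                # apply the correction to the next term
        t = total + y
        comp = (t - total) - y      # rounding error of total + y
        total = t
    return total

values = [1.0] + [1e-16] * 1_000_000

naive = 0.0
for v in values:
    naive += v                      # every 1e-16 is rounded away

compensated = kahan_sum(values)
reference = math.fsum(values)       # correctly rounded sum
print(naive, compensated, reference)
```

Plain Kahan can still struggle when large terms nearly cancel; variants such as Neumaier's, or double-double accumulation, are the usual next step in that case.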
Here's my point: if double precision floating point doesn't satisfy your needs, you should dig into the problem and understand why. Understand first, write code second. 999/1000 the solution isn't "we need 128 bit floats", and for that .1%, we're waaaay better off telling those people "Sorry, do it in software and take the performance hit."