What Every Computer Scientist Should Know About Floating-Point Arithmetic (1991) (oracle.com)
123 points by brudgers on June 29, 2020 | 85 comments



Posted 30 times, but most of the comments are about how many times it has been posted. Here are the on-topic threads:

2017 (just a bit): https://news.ycombinator.com/item?id=13431299

2012: https://news.ycombinator.com/item?id=4815399

2010 (not a good year): https://news.ycombinator.com/item?id=1982332

2009 (also pretty bad): https://news.ycombinator.com/item?id=687604

https://hn.algolia.com/?query=What%20Every%20Computer%20Scie...


That's quite a lot. By far the most important thing for me: floats guarantee at least 7 significant decimal digits, and doubles at least 15. Many devs are regularly surprised by how low that is.
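
A quick illustration of roughly what that limit means in practice (a minimal Python sketch; the f32 helper just round-trips a value through a 32-bit float via the struct module):

    >>> import struct
    >>> f32 = lambda x: struct.unpack('f', struct.pack('f', x))[0]
    >>> f32(0.1)            # only the first ~7 digits match what was typed
    0.10000000149011612
    >>> f32(12345678.9)     # at 9 digits, the .9 can no longer be represented
    12345679.0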


The real problems I see mostly happen when arithmetic gets involved. Two simple and precise floats added or subtracted can get you something like 5.00000000001 if you aren't careful.

A fairly innocuous perl snippet shows this:

  DB<1> $a+=0.1 for 1..10; print $a - 1
  -1.11022302462516e-16
I ran into a bug like this on a website. I don't know what they were doing to the inputs, but a simple 60.9 or similar turned into 60.9000000001 and tripped up the back-end service.


The problem is that they aren't precise floats. 0.1 has no finite representation in binary. It's an infinite repeating string of digits, like 1/3rd is in decimal. Thus, when you add a bunch of them together, the inherent error in rounding them off at some finite number of digits slowly adds up.

The same is true of all multiples of 0.1, unless they're also a multiple of 0.5. That's why your 60.9 is not actually 60.9. It was never exact to begin with.
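
A minimal Python sketch of the same effect (CPython doubles): the decimal module can display the binary fraction that 0.1 is actually stored as, and summing ten of them exposes the accumulated rounding error, just like the Perl snippet above.

    >>> from decimal import Decimal
    >>> Decimal(0.1)                       # the value actually stored for 0.1
    Decimal('0.1000000000000000055511151231257827021181583404541015625')
    >>> sum(0.1 for _ in range(10)) - 1
    -1.1102230246251565e-16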


Yes, that is true. But usually symptoms don't surface until errors accumulate, which is why I found that web form interesting.

'Precise' was the wrong word. What I meant was the system reports back the exact entered value 0.1 (rather than what it has in memory, which is 0.100000001490116119384765625 according to https://www.h-schmidt.net/FloatConverter/IEEE754.html). Most programs will do this fairly well.

I confirmed javascript is entirely capable of storing and retrieving the number I entered, so I'm not entirely sure what they were doing to confuse it :) Must have been some secret maths.


> 0.1 has no finite representation in binary.

b1 / b1010

I don't see the problem. Did you mean to say it has no finite representation in the IEEE float format?

> It was never exact to begin with.

But the format sadly gets used in situations where people expect it to be reasonably accurate and it isn't, because developers either underestimate the "reasonable" part or overestimate the "accurate" part.

It's just sad that most mainstream programming languages have no native accurate data type for rational numbers.


I meant in positional notation, but most people know that as "decimal notation". Unfortunately, that implies base 10, so I avoided the term.

I gave both an example in binary and an example in decimal to clarify my imprecise terminology. I worry that by objecting only to the binary example, you may confuse people into thinking the two are different.


So are you saying that ⅓ has a finite representation in decimal? Finite binary rationals ⊋ finite binary numbers.


> So are you saying that ⅓ has a finite representation in decimal?

Yes: ⅓


> Did you mean to say it has no finite representation in the IEEE float format?

Of course that's implied.


It was a rhetorical question; I just wanted to point out that people blame these inaccuracies on "binary representations" when in reality it's just this one binary representation that's broken.


It may seem low, but it's really not. At 6 significant digits, you'd have a margin of error of a thousandth of an inch when measuring a football field.


Sure, that's low, depending on context. At 6 significant digits, a million-dollar bank account would have a margin of error of up to +/- 5 dollars.

And if you add up a million of any item with 6 significant digits, that's enough for inaccuracy to propagate all the way into your first significant digit. (It would require systematic inaccuracy in one direction, but there are certainly data sets and contexts where that would happen.)



Counterpoint: At 6 significant digits you have a margin of error of thousandths of a millimeter when measuring degrees longitude near the prime meridian, but the error balloons to whole meters when measuring longitudes in Australia.

I have a piece of software that I wrote ages ago where I looked at the max error in the US and concluded that 32-bit floats were enough. This made sense since at the time I was reasonably confident that the tool would only be used with locations in the continental US, and it simplified my architecture. Fast forward a few years and we ended up open sourcing the tool and people started using it all over the world. I'm still dealing with the fallout to this day.

I'm dreading the day that the tool gets used in some inter-planetary context and I have to repeat all of the work to add support for 128-bit/arbitrary-precision location data.
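
For a rough sense of scale, a small Python sketch (the f32 helper and the Sydney-ish longitude are just illustrative, not from the tool above): storing a longitude of about 151 degrees in a 32-bit float already costs on the order of half a metre, while values near 0 degrees keep far more absolute precision.

    import struct

    def f32(x):
        # round a double to the nearest 32-bit float and back
        return struct.unpack('f', struct.pack('f', x))[0]

    lon = 151.2093                  # roughly Sydney, in degrees
    err_deg = abs(f32(lon) - lon)   # representation error, about 5e-6 degrees here
    print(err_deg * 111_000)        # ~111 km per degree at the equator: about half a metre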


You are also not able to represent your yearly salary to the cent. At least if you're American haha


Some historical context... IEEE 754 (the current floating point standard in H/W and S/W) arose in 1985, only 6 years before this paper. Before that, floating point representations were often proprietary (e.g. Cray or IBM). Back then, most microprocessors did not have floating point hardware, so most non-scientific apps avoided floating point operations and code that used them. Few coders had much experience with floating point unless they were engineers or scientists.

This paper broke new-ish ground in introducing 'the rest of us' to the nonlinear representation space of digital scientific numbers and subtleties of roundoff & inaccuracy inherent in math-intensive libraries and scientific math functions. It still usefully covers more ground in that space than most of today's BSCS degree programs.


Oddly enough, I ran into an issue with floats this week: two systems are communicating by transmitting floats (rather needlessly, I would add), and the conversion from float seems to be implemented differently on the two systems, resulting in occasional discrepancies.

This is kind of a textbook example of an obvious bug, but it's still surprising when you find it.


Even without conversions floats cause problems. Several years back I was working on a SQL query transpiler, and was getting driven up the wall trying to explain why compiling query 6 from the TPC-H benchmark would agree with Postgres on some of our backends, and disagree on others. Worse, it worked perfectly on the OCaml backend we used for debugging, but broke on our performance-focused C++ backend.

I must have torn my hair out for nearly a week, but the culprit turned out to be TPC-H's defaults including a subtle and very hidden test for floating point numbers. Q6, with default values plugged in, includes the bit:

  WHERE l_discount < 0.06+0.01  
In the C++ backend (with a standard `float` iirc), 0.06+0.01 != 0.07, and the query result ended up wrong.

Fun times. (I now occasionally toss Q6 in as a test query for my database class. Definitely teaches proper respect for floating point arithmetic :D)
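
The effect is easy to reproduce even at double precision; a minimal Python illustration (not the actual TPC-H setup, just the constant from the predicate):

    >>> 0.06 + 0.01
    0.06999999999999999
    >>> 0.06 + 0.01 == 0.07
    False

At single precision the constant comes out as yet another nearby value, so backends that evaluate the predicate at different precisions can disagree about which borderline rows fall under the threshold.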


l_discount is not a monetary (fixed precision) type? It appears it ought to be, but I see now that is another can of worms ...


I do not understand what you mean. If you "convert" a float to a float it shouldn't change anything? As long as you keep to floats there should be no problem.


Round-tripping floats through text is much, much harder to do than it sounds - you need to make sure you get all the decimal digits, and round in the correct way. e.g. https://randomascii.wordpress.com/2012/02/11/they-sure-look-...
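
A small Python illustration of the pitfall (doubles): print with too few significant digits and a different value comes back; 17 digits, or a shortest-round-trip formatter like repr, is what makes the trip lossless.

    >>> x = 0.1 + 0.2
    >>> float('%.15g' % x) == x   # "0.3" comes back as a different double
    False
    >>> float('%.17g' % x) == x   # 17 significant digits always round-trip a double
    True
    >>> float(repr(x)) == x       # repr emits the shortest round-tripping decimal
    True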


I do not understand why you would ever round-trip floats through text. That seems useless? In any case, if you really want to do that then you use a hexadecimal float representation (C's printf has had the "%a" conversion since forever), or simply print out the bytes. Representing a float in decimal is absolutely idiotic and the people who commit this atrocity and get into trouble for it have it coming.


> I do not understand why you would ever round-trip floats through text.

There are many, many occasions in the real world where this is necessary ... I've dealt with it multiple times. When done carefully and with appropriate knowledge of the underlying systems it's not a problem.

But just because you've never done it, that doesn't mean it's the wrong thing to be doing.

> if you really want to do that then you use a hexadecimal float representation

There are combinations of systems where that won't work.


JSON? Human-readable representations in general?

> Representing a float in decimal is absolutely idiotic

Other way round: humans like decimal arithmetic, and will accept binary floating point as an intermediate representation.


We are talking about two machines talking to each other. Correctness is more important than human readability for bytes on a wire.


Sometimes when two machines talk to each other there is no reliable way to do so in binary. Different endianness, different internal representation of floats, and so forth.

So converting floats can sometimes be necessary, and that's what we're talking about. Going via human readable text is only one specific example.


> Different endianness, different internal representation of floats, and so forth.

But this has nothing at all to do with floats. If you transmit 16- or 32-bit ints, or any image/video/sound datatype, or even raw bytes, there can be endianness shenanigans. Introducing endianness when talking about floating point numbers is completely off-topic.


It has everything to do with internal representation of data in general, and floats are one specific example that turns up quite regularly. Sometimes one machine has some data, and you need to transfer that to another machine. Sometimes the internal representation of that data will be different. Sometimes you can't simply "send the bits" and have it work.

So in those cases you need to go through an intermediate representation, and then the question of conversion arises.

So sometimes, when sending a float (as a single example) from one machine to another, you have to go through conversions, and one way to do that is to generate a string representation. The point here and in other comments is that many people think this is easy and obvious, but in truth it's hard and full of pitfalls.

You said:

> I do not understand why you would ever round-trip floats through text.

I'm trying to explain that. I have some experience of the issues involved, and was trying to share my experience. You are dismissing it, which is entirely your choice, and in that case, my comments are here for others.


That is an extreme edge case. The vast majority of computing happens on x86 and ARM. Unless you're working on an IBM mainframe, you're using a little endian, IEEE float machine.


I have extensive experience of other systems, and for me, with my experience, it's not an extreme edge case.

If for you it is an extreme edge case, then I suggest you put it to one side and hope it never becomes relevant. But if one day you find a bug you can't explain, maybe it will help you.


Unfortunately, JSON doesn't have a mechanism for 100% accuracy in transmitting binary floating point numbers. Everything gets marshaled to decimal float, and then unmarshaled to binary float - even in machine-to-machine communication. That's simply a limitation of the format.

It's part of the reason I developed Concise Encoding: https://github.com/kstenerud/concise-encoding/#example which allows hex float notation when you need to be exact.


> Unfortunately, JSON doesn't have a mechanism for 100% accuracy in transmitting binary floating point numbers.

If only it were just inaccurate! It cannot even deal with straightforward floats like inf, nan, or minus zero!


> We are talking about two machines talking to each other.

There are all kinds of reasons machines talk to each other in human-readable formats. APIs are for machine-to-machine communication, and yet APIs that speak JSON or XML are not uncommon.


Not to flog an expired equine unnecessarily, but you say:

> C's printf has had the "%a" conversion since forever

That turns out not to be the case. Would you care to guess:

* Which year it was first introduced?

* Which year it became wide-spread?

* When I started programming?

* How old some of the systems are in which I still work?


I don't understand your confusion.

Consider, system A has a float in memory, and wants to send it to system B. System A has no knowledge of how system B stores floats, whether it's big-endian or little-endian, how many bits of precision it uses, etc.

It might also be that the only method of transmitting information is via a text encoding ... perhaps JSON, perhaps something else.

So system A "prints" the float, converts it to a string, and sends that. It's even useful because it can be inspected "by eye" on the way through.

Now system B receives the string and converts it to a float.

All this sort of thing happens in the real world, and is sometimes dictated by the original designs of the system, and sometimes it's specified by customers that this is how it has to work.

Does that help you to understand the original comment?


You could run into problems with little-endian and big-endian machines if you do a binary transfer or, more likely, have a lossy float->string->float conversion.


Unless you are using an IBM mainframe, Big Endian is all but extinct. Besides, it is about ~10 lines of C to detect endianness and reorder bytes if necessary.


> Besides, it is about ~10 lines of C to detect endianness and reorder bytes if necessary.

Proper code to do this doesn't do any tests. Just grab the bytes from the order of the transfer format (in this case network order) into the order of the abstract machine in C. The compiler will turn that into whatever is needed depending on whether it's compiling for a big-endian or little-endian machine, and may use byteswap instructions if needed. Having tests in the code is a code smell that creates unchecked paths in the code that are much more likely to be wrong, as they are not used most of the time.


> and may use byteswap instructions if needed

Or it may not because often compilers fail to make simple optimizations:

https://godbolt.org/z/NAt3uX


It does fine if you actually write the correct code :)

https://godbolt.org/z/qpLeWP


There is nothing incorrect about the code I posted - clang will compile it to a single read + bswap just fine. And you don't need to go that far back for your code to not be recognized - GCC 4.9 will produce individual loads and shifts for that too.

The point is that you can't rely on compilers consistently to recognize complex patterns.


"Incorrect" was too strong, but it's a weird pattern to do this with sums. The OR pattern is what is used pretty much everywhere and conveys the intention of the code much more clearly.

And even if the compiler doesn't always optimize ideally my original point still stands. Delegating this to the compiler instead of trying to manually call swap instructions is a much better way of going about it.


That seems to be more a problem of strings than of floats, then.


Strings have nothing to do with the issue that there is no widespread lossless text exchange format for FP. Instead we round FP numbers while encoding (like "0.3") and then "deround" them while decoding.

(A lossless base-10 encoding of single-precision 0.3 would be something like "10066330*2^-25"... not very readable.)


> the issue that there is no widespread lossless text exchange format for FP.

If you have to use strings as the intermediate representation (why?), then printf("%a") and scanf("%a") are perfectly lossless and widespread.

The only thing that's lossy is converting floats to decimal point numbers, which is absolutely silly if it's just two machines communicating.
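
For what it's worth, the same lossless hexadecimal idea is exposed in Python as float.hex()/float.fromhex(), the counterpart of C's "%a":

    >>> (0.1).hex()
    '0x1.999999999999ap-4'
    >>> float.fromhex('0x1.999999999999ap-4') == 0.1
    True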


It's not clear what 'conversion' is referring to here. Do you mean the communication is using a decimal representation (like JSON)?



I think what every programmer should know is probably only 4 words:

They're approximations. Use accordingly.


I'm still hoping posits make it one of these days.


This is gold. I've been forwarding this link to everyone who says they are using floats for storing currency values (money, $$$) since forever.


I've often used floating point for currency. What can go wrong? Actually, a number of things can go wrong when using floating point as the article pointed out. The main concern nowadays is that floating point doesn't represent values with infinite precision and consequently is not capable of exactly representing every possible value.

In the past, number formats were often a concern. The first assembly language program I ever wrote was for an IBM mainframe that required converting some numbers in zoned decimal format to packed decimal format. There were actually native assembly instructions for doing calculations in these different decimal formats. The IEEE 754 standard for floating point that we use today came about in 1985. Before that, floating point might involve 60-bit values (as on CDC mainframes) or hexadecimal-based floating point (as on the IBM 360/370 mainframes). Double precision was not widely used because of memory limitations. Floating point was much slower. Programming languages didn't provide good or consistent facilities for non-integer values (it was virtually impossible to predict which implicit conversions were being done by PL/1 when doing mixed calculations; COBOL and FORTRAN handled calculations wildly differently). I believe that some of the current general advice about handling financial calculations stems from considerations that made sense during these Middle Ages of programming.

Now, with double precision available everywhere and standard, specified rounding modes, and the other benefits of IEEE 754, I think it's safe to consider using floating point for currency calculations. The most widely used software for financial calculations uses floating point (Microsoft Excel).

If Excel uses floating point, why is there a widely promulgated admonition to avoid it for currency? I believe that it made sense in the Middle Ages of computing, but now is not as relevant.

While it is true that some quantities cannot be represented exactly in floating point, for example 0.01, the same is true about decimal fixed point where 1/3 cannot be represented exactly. Common financial calculations can fail to be exact no matter the number format.

Consider calculating the payments for a fixed-rate 30-year mortgage. At a 5% interest rate, a $200,000 loan will have a monthly interest rate of 0.05/12, and there will be 12 * 30 payments; the amount of each of these payments will be:

    200000 * (0.05/12) / (1 - (1 + (0.05/12))^(-(12*30)))
This formula cannot be calculated exactly in decimal floating point for two reasons. First, the fraction (0.05/12) is not exact in decimal, and second, there is unlikely to be a direct way to do exponentiation of decimal values.
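
As a concrete illustration, here is the formula above in plain double-precision Python (just a sketch; the point is only that 0.05/12 is already inexact, in binary or in decimal, before the rest of the arithmetic happens):

    >>> monthly_rate = 0.05 / 12    # 1/240 has no finite binary or decimal expansion
    >>> payment = 200000 * monthly_rate / (1 - (1 + monthly_rate) ** (-12 * 30))
    >>> round(payment, 2)           # the conventionally quoted figure
    1073.64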

Some languages (like Common Lisp) support exact rational numbers. This allows exact calculations with any rational fraction, but this still doesn't allow calculations involving irrational numbers, like sqrt(2) to be represented exactly. Consider calculating the cost of a circular plate when priced in dollars per gram. This involves using pi.

Care must always be exercised when doing financial calculations if they need to match how financial institutions are doing their calculations. Reports must round fractional values using the same rounding methods (is 1.005 rounded to 1.00 or 1.01? i.e. round-up vs round-to-even). Values should usually be stored after rounding to the same units used in the currency. These problems are not caused by the inaccuracy of the 17th digit of a double precision floating point being off by one.

For further information on the kinds of considerations that need to be made take a look at the design discussions that have been documented for the beancount project [1].

[1] https://beancount.github.io/docs/31_rounding_precision_in_be...


> I've often used floating point for currency

The question is WHY? Because decimal types are barely second-class citizens across the stack. It's similar with dates.

Bad defaults are bad, and require stupid amounts of workarounds.

And just because Excel does it?

https://en.wikipedia.org/wiki/Numeric_precision_in_Microsoft...

Excel is an amalgamation of surprises that are not fixed. And when you produce the CORRECT results, the users demand that we give the same wrong results as Excel!


I use floating point for currency (decimal(19, 10) in the db, double precision and rounding functions in memory) and there is nothing wrong with it. If I used integers for everything, I would have to take special care of every mathematical operation instead of rounding the final result to the desired precision.


"Decimal" is not floating point... On the contrary, it's fixed point, as you set fixed amount of precision both before and after the decimal point. More importantly, "decimal" does not suffer from the arithmetic issues you get when using IEEE 754 floating point.


> More importantly, "decimal" does not suffer from the arithmetic issues you get when using IEEE 754 floating point.

It can represent 1/10 correctly, and it can represent 1/2 correctly (so can float), but how about 1/3? Outside of a few happy coincidences you are back to the same old issues.
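
For instance, with Python's decimal module (28 significant digits by default) the classic 1/3 problem comes right back:

    >>> from decimal import Decimal
    >>> Decimal(1) / Decimal(3)
    Decimal('0.3333333333333333333333333333')
    >>> Decimal(1) / Decimal(3) * 3
    Decimal('0.9999999999999999999999999999')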


yes, what is your point?


I don't know about databases but if that decimal is the decimal I know of, it is not a floating point number.

In decimal you specify how many bits will be used for the decimal & non-decimal parts. If your unsigned decimal has 1 bit for the non-decimal part and 1 bit for the decimal part, then it can only store: 0.0 0.5 1.0 1.5. 1 bit for the non-decimal part gives you either 0 or 1, and 1 bit for the decimal part gives you .0 or .5. In many ways a decimal is just an integer, except that printing & some math operations are different.

In floating point, you are also storing an "exponent" which represents where the point goes. So the point actually "floats" depending on the value of this exponent, while in fixed point the point is fixed (and in the previous example, the point is always in between those two bits). The implementation is a bit harder to explain; you can check it here: https://www.doc.ic.ac.uk/~eedwards/compsys/float/


That you aren't using floating point.


Double precision isn't floating point? I'm not sure that's his point.


Double precision is FP, Decimal is not FP.

Your software might confuse these two, but that notion isn't supported by computer science.


I didn't say decimal is floating point, I said I used decimal for the db, not to be confused with what I use for the rest of the software. And the persons replying assumed that I had no idea about what I'm doing. Which isn't really nice.

Anyway I will break off the reply train here. Hope someone gets educated reading this thread so all those "nice" replies don't go to waste.


> And the persons replying assumed that I had no idea about what I'm doing. Which isn't really nice.

This is simply not true. Your original response strongly implied that you were confusing decimal with floating point. If you reread your original reply strictly in the context of the original comment, you may notice what I'm talking about. The responses were just pointing out the difference, and doing so quite nicely I might add.


This is one of those cases where your reality was not successfully communicated in your message. Your original message, in the context in which it was presented, fairly clearly implied that you thought floating point and decimal are the same thing. That might not be the intent, or the reality, but it's what you managed to convey. Hence the thread.


Hello imposter syndrome my old friend


Computer science is hard and even Knuth is humbled by it. And he's been writing several programs a week for about sixty years.


A follow up to this classic article could be titled "What every language designer should know about humans." :) Floats are among the least user-friendly ways to do decimal arithmetic. They aren't even that fast anymore given the resurgence of fixed point arithmetic in hpc.

Imo, in 2020 we should have had mainstream languages in which (2^(1/2))^2 == 2. It's not rocket science. :) Float truncation is a little like an OS silently converting a user's flac collection to mp3 because it was running low on disk space.


> Imo, in 2020 we should have had mainstream languages in which (2^(1/2))^2 == 2

Such things are either incredibly inefficient or mathematically unsound. And there are plenty of provably impossible computational things involved.

Start at the tablemaker's dilemma, then learn the difficulty of deciding if some basic expression is actually zero, and so on. There's a massive literature on doing such computations, and many, many impossibility theorems.

There is no getting around the fact that some seemingly simple problems require arbitrarily large computational resources to resolve - floating point picks the very useful path of doing approximations in predictable time over attempting exact answers in unbounded time and space.


"Such things" are the foundation of symbolic math libraries, deep learning libraries, most optimizing compilers and Wolfram Alpha. I.e it is not true that they would be incredibly inefficient or mathematically unsound.

I don't understand your point about the Tablemaker's dilemma because it is a dilemma only if precision is fixed. With the correct numeric type (rationals) precision is not fixed.


Let's unpack:

> With the correct numeric type (rationals) precision is not fixed.

Almost no problems can be represented exactly by rationals. Even the sqrt(2) you started with is not rational.

And rationals are unusable for almost any type of work, because they grow exponentially (in time and space) for most problems.

For example, if you tried to compute a simple mandelbrot set with rationals, you'd need more bytes than there are subatomic particles in the universe (~10^80).

Proof: Mandelbrot takes a complex number c = x + i y, which you want to represent as a rational, and iterates z <- z^2 + c. Squaring doubles the number of digits for each of the numerator and denominator, so each iteration requires ~2 times the storage of the previous iteration. Thus you'll need over 2^300 bytes for 300 iterations, which is > 10^90. A common mandelbrot uses over 1000 iterations.

So you see rationals are computationally unusable for even simple tasks.
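
The growth is easy to see with Python's fractions module (the constant c = 1/3 here is arbitrary, purely for illustration): the denominator's bit length roughly doubles every iteration.

    >>> from fractions import Fraction
    >>> z, c = Fraction(0), Fraction(1, 3)
    >>> for _ in range(5):
    ...     z = z * z + c
    ...     print(z.denominator.bit_length())
    ...
    2
    4
    7
    13
    26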

> symbolic math libraries

These can be arbitrarily slow and use arbitrarily large memory for simple problems - which I stated above when I said "Such things are either incredibly inefficient or mathematically unsound." For example simple converging infinite sums that don't have closed forms known to your tools would never evaluate - the only solution is some fixed, finite representation if you want to use the value in later computations. The same thing for integrals, root finding, pdes, and thousands of other problems.

> deep learning libraries

Deep learning libraries have led to a reduction in precision specifically for the reasons I mentioned: lower memory requirements and higher speed. Hardware vendors from Intel to NVidia to AMD to ARM have introduced lower precision math, including the new bfloat16 (and some have a bfloat8) for these reasons. I think this is the opposite of what you claimed, but supports my claim.

Care to show me where deep learning libraries do perfect symbolic computation in the process of deep learning? I'm aware people try to train them to do symbolic computation, but the libraries themselves aren't using the types of reductions you posted.

> most optimizing compilers

Most compilers would reduce sqrt(2) * sqrt(2) to 2? List them. (Heck - list one!) It's simply not true (as can be checked trivially for all common C++ compilers on godbolt).

I've seen none, despite following optimizing compiler theory and practice for decades.

>Wolfram Alpha

Redundant with the symbolic stuff above - and I've used Mathematica since v1.0 in 1987 quite extensively. It most certainly is slower doing anything symbolically that can be done numerically, and there are lots of problems it cannot do symbolically without puking, but those same problems run instantly when approximated with floating point.

There's ample pages listing things symbolic engines get wrong or cannot do, but that can be done elsewhere.

>I don't understand your point about the Tablemaker's dilemma because it is a dilemma only if precision is fixed

????

The dilemma is not about fixed precision.

The problem is correctly rounding a function as if it had an infinite expansion first. For some functions this means arbitrarily large computations. Ahead of time mankind does not know how many digits are needed for many common uses. So the problem is not about fixed precision - it's about possibly unbounded computation, which is vastly different.

For example, no one has a bound on the number of digits needed to compute correctly rounded a^b for any two floating point values a and b.

And there exist computable numbers for which the rounded value can never be computed with any amount of digits [3].

So it's not about fixed precision. It's far more subtle.

If you're willing to run computations unbounded in time and space, then you can get more and more precise answers, but at every point you still don't know if you've escaped the Tablemaker's dilemma. This is the point.

Here's a guy I've followed for decades, one of many researchers on the topic [1]. Note that his "hard to round cases" in 2013 took 1576 years of computer time to determine how many digits are needed for double precision values for some elementary functions, precisely because this is not the trivial problem you think it is.

Here's a 2016 paper doing some work on GPUs [2]: " For example, the Nesterenko and Waldschmidt [1996] bound for the exponential in double precision states that 7,290,678 bits of intermediate precision suffice to provide a correctly rounded result."

To get 64 bits of accuracy, correctly rounded, you need 1 megabyte and an incredible amount of computation. My point exactly. And these bounds tend to increase exponentially, not linearly, soon pushing such issues beyond currently computable (which is also why there are still research papers being published on getting results for 64 bit doubles in various ways and for various elementary functions).

Again you have the choice I presented: you either accept unbounded in time and space behavior, or you fix time and space to make things usable with approximations. Floating-point is an incredibly good solution to making numerics as good as possible under fixed space and time requirements.

[1] https://www.vinc17.net/research/index.en.html [2] https://dl.acm.org/doi/pdf/10.1145/2935746 [3] https://en.wikipedia.org/wiki/Rounding#Table-maker's_dilemma


> Almost no problems can be represented exactly by rationals. Even the sqrt(2) you started with is not rational.

Almost all numerical problems software developers deal with can be represented exactly using rationals. Only a small fraction of all problems involve transcendentals. And those can't be represented exactly by floats either so I don't know what point you're making?

> For example, if you tried to compute a simple mandelbrot set with rationals, you'd need more bytes than there are subatomic particles in the universe (~10^80).

Well, yes. You'd run into the same problem if you tried to compute all the decimals of pi too. Using rationals does not mean that you can't round results.

> So you see rationals are computationally unusable for even simple tasks.

I see that rationals cannot represent all real numbers.

> These can be arbitrarily slow and use arbitrarily large memory for simple problems - which I stated above when I said "Such things are either incredibly inefficient or mathematically unsound." For example simple converging infinite sums that don't have closed forms known to your tools would never evaluate - the only solution is some fixed, finite representation if you want to use the value in later computations.

I wrote that mainstream languages should be able to evaluate (2^(1/2))^2 == 2. How do you go from there to requiring them to be able to evaluate infinite sums?!

> Care to show me where deep learning libraries do perfect symbolic computation in the process of deep learning?

I didn't claim that.

> Most compilers would reduce sqrt(2) * sqrt(2) to 2? List them. (Heck - list one!) It's simply not true (as can be checked trivially for all common C++ compilers on godbolt).

I didn't claim that. Compilers are stuck with IEEE 754 fp semantics and therefore won't simplify (2^(1/2))^2. But there is no technical reason why they couldn't.

To paraphrase your reply: "Rational arithmetic and symbolic computation can't solve every problem. Therefore they are useless/no better than floating point arithmetic." From a theoretical perspective that is perhaps true but in practice both are very useful tools.


>Almost all numerical problems software developers deal with can be represented exactly using rationals. Only a small fraction of all problems involve transcendentals.

There are more classes of real numbers than rationals and transcendentals. Hint: your example of sqrt(2) is neither.

As to doing things exactly with rationals, anytime you take a sin, cos, exp, log of any non-zero rational number the result is not rational. Any time you take a root of a non-perfect power rational you don't get a rational number (you get an algebraic number, which is neither rational nor transcendental).

So all you can do is simple arithmetic: + * / - (actually, that's from the definition of the field Q of rational numbers).

And you still run into the problem that rational number arithmetic gets arbitrarily slow and runs out of memory. Each multiply on average results in requiring the sum of the storage sizes of the multiplicands, making more than about 40 multiplications infeasible before you're out of RAM.

So your view of "almost all numerical problems software developers deal with" must be limited to only those using simple arithmetic and no more than around 40 operations. That's amazingly constraining.

What problems do you call numerical problems that fit your constraints?

Things that cannot be done exactly with rational numbers includes pretty much all of gaming (3D or otherwise), deep learning (probably every model has an exp in it), scientific computing, audio and video processing (cos and sin all over the place), and on and on.

>Using rationals does not mean that you can't round results.

Wait, you just claimed "...can be represented exactly using rationals." Now you want to round, throwing out the exactness? Which is it?

And congrats - you just re-invented floating-point math which is simply approximating values using rational numbers in a clever manner to handle larger ranges than fixed point. But they're still always rational approximations to values.

>I wrote that mainstream languages should be able to evaluate (2^(1/2))^2 == 2. How do you go from there to requiring them to be able to evaluate infinite sums?!

Ah, so you mean languages should be able to simply solve this one specific problem, not all relatively simple symbolic problems? That seems like an even bigger mess than using floating point, which is well-defined and covers a large class of problems.

Does your mythical language know (A^(1/B))^B==A for all rationals A and B also representable in your language? Or are you limited to the value A=B=2 only?

(Hint: if you claim it should hold, you're gonna hit problems again :)


> As to doing things exactly with rationals, anytime you take a sin, cos, exp, log of any non-zero rational number the result is not rational.

I fail to see your point. I never claimed that you can do real arithmetic with rationals.

> So all you can do is simple arithmetic: + * / - (actually, that's from the definition of the field Q of rational numbers).

Same as with floats. Except with floats you don't even get division.

> And you still run into the problem that rational number arithmetic gets arbitrarily slow and runs out of memory. Each multiply on average results in requiring the sum of the storage sizes of the multiplicands, making more than about 40 multiplications infeasible before you're out of RAM.

What?

    >>> from fractions import Fraction
    >>> Fraction(7, 8)**40
    Fraction(6366805760909027985741435139224001, 1329227995784915872903807060280344576)
In other words, no, you don't run out of memory after 40 multiplications...

> Things that cannot be done exactly with rational numbers includes pretty much all of gaming (3D or otherwise), deep learning (probably every model has an exp in it), scientific computing, audio and video processing (cos and sin all over the place), and on and on.

You sure can do all of those things with rational arithmetic. What you can't do is do them exactly but I never claimed you could.

> > Using rationals does not mean that you can't round results.

> Wait, you just claimed "...can be represented exactly using rationals." Now you want to round, throwing out the exactness? Which is it?

I wrote "ALMOST ALL numerical problems software developers DEAL WITH can be represented exactly using rationals." The key word in that sentence is "ALMOST."

> And congrats - you just re-invented floating-point math which is simply approximating values using rational numbers in a clever manner to handle larger ranges than fixed point. But they're still always rational approximations to values.

Yes, but I have also solved the freakishly annoying issues detailed in the article. Which one do you prefer:

    >>> print(sum(0.1 for _ in range(10)))
    0.9999999999999999
or

    >>> print(sum(Fraction(1, 10) for _ in range(10)))
    1
? I know which one users prefer.

> >I wrote that mainstream languages should be able to evaluate (2^(1/2))^2 == 2. How do you go from there to requiring them to be able to evaluate infinite sums?!

> Ah, so you mean languages should be able to simply solve this one specific problem, not all relatively simple symbolic problems? That seems like an even bigger mess than using floating point, which is well-defined and covers a large class of problems.

I don't think Mathematica or the Python symbolic math packages I've worked with are particularly messy.

> Does your mythical language know (A^(1/B))^B==A for all rationals A and B also representable in your language? Or are you limited to the value A=B=2 only?

Yes, assuming A and B are positive.


> What? ... > In other words, no, you don't run out of memory after 40 multiplications...

So you think one case proves there is no set of 40 multiplications that overflows?

Since we're using a small starting value and only one operation per iteration, let's try 50 mults:

    from fractions import Fraction
    x = Fraction(7, 8)
    for i in range(50):
        x = x * x
    print(x)  # as an exact fraction
Tell me how much ram and time this took you to evaluate with exact rational values. I'll wait :)

(Hint: you'll need approx 500 terabytes of ram to store these integers. I guess I didn't really need all 50 iterations! )

If you want only 40 mults, then using different starting values and adding a few additions will cause the same overflow. In fact, all sorts of common algorithms will overflow in very few operations using rationals. This is why numerical algorithms textbooks don't even cover it as a possibility - it's a terrible idea thrown out decades ago for precisely these reasons.

Your exact rational number idea has replaced predictable behavior on reasonable inputs with highly finicky unstable behavior, and a programmer will have no idea how to ensure things don't go off the rails for common uses.

In each reply you make fundamental math errors, provably incorrect claims, and don't stop to learn. I already gave a proof that such simple things will overflow - you ignored it, then tried to prove the converse with one toy example. It's not worth showing you more errors when you ignore evidence and continue disproven claims.

We're done. You're too stubborn to absorb relevant material.


> So you think one case proves there is no set of 40 multiplications that overflows?

> Since we're using a small starting value and only one operation per iteration, let's try 50 mults:

That is repeated exponentiation, not repeated multiplication, and you are grasping at straws. Try the same code using an integer or a float greater than 1. Since the same thing will happen as when using rationals (you get an error), are you going to tell me that those number types are useless too?

> In each reply you make fundamental math errors, provably incorrect claims, and don't stop to learn.

Now I realize that you are most likely trolling me. Fine. Have a nice day! Bye!


What primitive numerical type should languages have instead? Can you show us some sample code illustrating how this type would be used for tasks where floats are typically deployed today?


The natural type for numbers is expression trees. That's roughly what DL frameworks like TensorFlow and PyTorch do, and it allows for things like automatic differentiation and distributed computation.

For any expression, the runtime is allowed to reduce it only if it can prove that the reduction is value preserving. For example 3/1 => 3, (1/3)*3 => 1, (e^(ln(x)/2))^2 => x and 10/14 => 5/7. But 5/7 cannot be reduced further.
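
For what it's worth, SymPy (one of the Python symbolic packages mentioned elsewhere in the thread) already performs exactly these kinds of value-preserving reductions:

    >>> import sympy
    >>> sympy.sqrt(2)**2
    2
    >>> sympy.Rational(1, 3) * 3
    1
    >>> sympy.Rational(10, 14)
    5/7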

The primitive numerical type that should be used instead of floating point is the rational. Rationals have their own problems (no numeric type is perfect) but their problems are much easier to manage than float's.


> The primitive numerical type that should be used instead of floating point is the rational. Rationals have their own problems (no numeric type is perfect) but their problems are much easier to manage than float's.

Rationals are not algebraic types; they don't support exponentials, radicals, or logarithms. A lot of numerical algorithms require algebraic operations on real numbers to compute their results--for example, computing eigenvalues, or numerical approaches to root finding. If you're going to argue for using symbolic notation, well, closed-form solutions cannot exist for several of the kinds of problems we want to solve.

Another issue is that rationals are fundamentally more expensive to compute than floating point; normalization of rationals requires computing a gcd (not really parallelizable on a bit level, and so can't be done in 1 cycle), while a floating point requires count-leading-0 (which is).

As a case in point, the easiest way to find a solution to a rational linear programming problem is to... solve it in floating-point to find a basis, and then adjust that basis using rational arithmetic (usually finding that the floating-point basis was indeed optimal!). Trying to start with rational arithmetic makes it slower by a factor of ~10000×.


What do you mean by algebraic types? If you mean algebraic numbers then yes, it is true that rationals cannot express all algebraic numbers, such as the square roots of prime numbers. But neither can floats! In fact, floats are limited to either 2^32 or 2^64 numbers (give or take a few) but rationals can express as many numbers as you have bits of memory in your system. I.e., the number of numbers rationals can express is in practice infinite.

Neither do floats support transcendental functions. Those functions are implemented using Taylor series approximations, and the same technique could be used for implementing rational transcendental functions. To boot, you'd get much higher precision.

Yes, rationals with arbitrary precision are more expensive to compute with than precision-limited floats. But who cares? It is not the 80s anymore.
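
A sketch of that idea with Python's fractions (a truncated Taylor series for exp, purely illustrative): every intermediate value stays an exact rational, and the number of terms sets the precision.

    >>> from fractions import Fraction
    >>> x, term, total = Fraction(1, 2), Fraction(1), Fraction(0)
    >>> for n in range(1, 20):
    ...     total += term        # accumulates 1 + x + x^2/2! + ...
    ...     term *= x / n
    ...
    >>> round(float(total), 10)  # close to exp(0.5) ~ 1.6487212707
    1.6487212707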


> Yes, rationals with arbitrary precision are more expensive to compute with than precision-limited floats. But who cares? It is not the 80s anymore.

The people who care are the people who now either have to buy 10000× the computing power or solve only 1/10000× the largest problem size to solve their needs, usually without meaningfully improving their results. Put another way, using rationals would turn your 2020-era computer into a 1980-era computer in terms of performance on your linear algebra benchmarks.


Many "serious" numerical computations using rationals will quickly have the numerator and/or denominator explode to the point where BigInts are needed, at which point you could just be using a BigFloat.


Thanks! Let me follow up by exploring a specific use case: scoring in a Lucene-style search engine, where arbitrarily complex queries repeatedly scale a floating point "score" value which measures document relevancy against a query.

Sometimes documents which ought to produce equal scores against a given query (if scoring were evaluated using real number math) do not produce equal scores in practice because of floating point limitations such as addition not being associative. Let's assume we are willing to sacrifice performance for correctness to address this issue.

1) If scores were tracked as expression trees, could this problem be eliminated entirely?

If so, I'd hazard a guess that while you'd eventually run into edge case issues such as needing arbitrarily large integers, in practice the CPU cost to track scores as expression trees rather than floats would be increased by a roughly constant factor.

2) If scores were tracked using a rational instead of a float, could the problem of tied scores be eliminated?

I suspect that the answer to that is no, because you will quickly run up against integer precision limitations in the numerator/denominator of the rational type.


1) Hard for me to say since I don't know the formula of the scoring system you're talking about. But in general, if the problem goes away when doing the calculation on paper it would also go away when using symbolic math since the methods are identical.

An expression is a tree. A tree can grow arbitrarily large and therefore an expression can grow arbitrarily large. It's a problem in theory but not in practice. You can try it for yourself in SymPy. It is very hard to come up with expressions so complicated that they become difficult for the system to handle.

2) No. Assuming you are using tf-idf for scoring, your calculation involves numbers with infinite decimal expansions that cannot be expressed as rationals. However, you can get arbitrarily precise by using very large integers in the numerator and denominator.

That is in contrast with floats whose precision is fixed at either 32 or 64 bit.


> resurgence of fixed point arithmetic in hpc

Where is fixed point arithmetic resurging in HPC? Serious question; I haven't heard about this, but if it's happening in places I haven't heard about, I would be interested in looking at them.


Google is using fixed point arithmetic in their TPU clusters. See https://link.springer.com/chapter/10.1007/978-3-319-67952-5_...



