Unums have all the failure modes of interval arithmetic on top of their own brand of failure modes.
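To make the interval-arithmetic part concrete, here's the classic dependency problem in a throwaway interval class. This is plain interval arithmetic, nothing unum-specific, but it's the behavior the interval-flavored unum types inherit: every occurrence of a variable is treated as independent, so even x - x doesn't collapse to zero, and bounds balloon under iteration.

    class Interval:
        def __init__(self, lo, hi):
            self.lo, self.hi = lo, hi
        def __add__(self, o):
            return Interval(self.lo + o.lo, self.hi + o.hi)
        def __sub__(self, o):
            return Interval(self.lo - o.hi, self.hi - o.lo)
        def __mul__(self, o):
            ps = [self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi]
            return Interval(min(ps), max(ps))
        def __repr__(self):
            return f"[{self.lo:.4g}, {self.hi:.4g}]"

    x = Interval(0.9, 1.1)
    print(x - x)              # [-0.2, 0.2], not [0, 0]

    # Iterate x <- x*(2 - x): the true values converge to 1.0, but the interval
    # widens every step because the overestimate compounds.
    x = Interval(0.9, 1.1)
    two = Interval(2.0, 2.0)
    for _ in range(5):
        x = x * (two - x)
        print(x)

Five iterations is enough to go from a width of 0.2 to bounds that are meaningless.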
There are loads of engineers with stiff numerical problems that would welcome a better representation than standard floating point if it worked. Somehow, Gustafson never manages to demonstrate unums on any of these problems.
Until Gustafson grabs some real numerical code, implements it with unums and demonstrates how much better they are, he is not worth paying attention to.
Type I and II unums are likely dead ends. Type III (posits) look promising. Facebook AI has developed their own type of "posit" with the terribly undescriptive name of "(8, 1, alpha, beta, gamma) log". Much less catchy than "posit" or "unum", but it has actually been demonstrated to have numerical benefits.
"Against 32-bit IEEE 754 single-precision FMA, ELMA will not be effective, though, as the Kulisch accumulator is massive (increasing adder/shifter sizes and flip-flop power), and the log-to-linear lookup table is prohibitive."
So, these things may be effective if you can dial your precision down. Okay. We use domain-specific number representations all the time. Fixed-point binary for DSP. Decimal representations for currency.
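For instance, off the top of my head (toy snippets, Python just for illustration):

    from decimal import Decimal

    # Binary floating point can't represent 0.1 exactly, which is why currency
    # code usually reaches for decimal (or integer-cents) representations.
    print(0.1 + 0.2)                          # 0.30000000000000004
    print(Decimal("0.10") + Decimal("0.20"))  # 0.30

    # Fixed point for DSP is the same move: pick a scale, do integer math.
    gain = round(0.7 * 2**15)                 # Q1.15 coefficient
    sample = 12345
    print((sample * gain) >> 15)              # 8641, i.e. roughly 12345 * 0.7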
The Facebook takeaway is that transistors are now so cheap that we can do sub-16-bit floating-point arithmetic with mostly ROM lookup tables if we accept some restrictions.
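A toy sketch of the general idea, heavily simplified by me (positive values only, no rounding care, and definitely not Facebook's actual (8, 1, alpha, beta, gamma) format): store values as fixed-point base-2 logs, so a multiply is an integer add of log codes; each product then goes through a small lookup table to become a linear fixed-point value, and products are summed exactly in a wide Kulisch-style integer accumulator.

    import math

    FRAC = 4                  # fractional bits of the log code (LUT has 16 entries)
    ACC_FRAC = 24             # fractional bits of the wide linear accumulator

    # LUT[f] = 2**(f / 2**FRAC), scaled onto the accumulator's fixed-point grid.
    POW2_LUT = [round((2.0 ** (f / 2 ** FRAC)) * 2 ** ACC_FRAC) for f in range(2 ** FRAC)]

    def encode(x):
        """Quantize a positive float to a fixed-point log2 code."""
        return round(math.log2(x) * 2 ** FRAC)

    def log_mul(a_code, b_code):
        """In the log domain, multiplication is just integer addition."""
        return a_code + b_code

    def log_to_linear(code):
        """Convert a log code to the accumulator's grid via the lookup table."""
        int_part, frac_part = code >> FRAC, code & (2 ** FRAC - 1)
        linear = POW2_LUT[frac_part]
        return linear << int_part if int_part >= 0 else linear >> -int_part

    # Dot product of two small positive vectors, accumulated exactly as integers.
    xs, ws = [1.5, 2.0, 3.25], [0.75, 1.25, 0.5]
    acc = 0
    for x, w in zip(xs, ws):
        acc += log_to_linear(log_mul(encode(x), encode(w)))

    print(acc / 2 ** ACC_FRAC)                 # about 5.18 with this crude 4-bit log code
    print(sum(x * w for x, w in zip(xs, ws)))  # exact: 5.25

The restriction that makes this cheap is exactly the one in the quote above: the lookup table and the accumulator only stay small if the log code is short.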
That is, however, NOT what Gustafson is proposing. He proposes these for general usage, as a panacea for the complications of floating point.
Numerics isn't some secret cabal. A new floating point system that allowed the folks doing partial differential equation solvers, computational fluid dynamics, or discrete time simulation to gain even 25% in time or to open up a new simulation field because of extra stability would get a serious look.
And while William Kahan (who drove a lot of IEEE-754) generally comes off as an insufferable jerk, he knows his stuff, and he wasn't alone. The numerics folks at IBM and DEC (and others) were mostly converging on the same things, with differences at the margins (signed zero, denorms, NaNs, etc.), largely because some of the things IEEE-754 demanded were a huge pain to implement in the hardware of the day.
As for politics, IEEE-754 was basically a reaction to allowing DEC or IBM to create a de facto definition of the floating point standard.
I read the book, The End of Error by John Gustafson. He works through a lot of numerical problems right there in the book, providing code and results.
He also goes over the history of floats and numerical computation, highlighting how often the best method was never adopted, for reasons as simple as patents or it not being better by enough.
Unums are better by enough, and they're being used at LLNL in experiments. It's just a hard ask for people to switch the number system they use; even switching from decimal to binary was a huge slog in the early computer age.
Also, note that since the publication of The End of Error, Gustafson has given up on the original variable-length unums described in the book and switched to "posits" (or "type III unums").
But even if you have extra precision around 1.0, assuming that weight is used for multiplication (of an output from the previous layer), surely you'll lose it as soon as you perform that multiplication? E.g. if the previous layer output is 3.5 and you multiply it by 1.00...0063, you'll get 3.500...02205, but you'll lose those trailing places on the output unless you have lots of precision around 3.5 too.
I tried to think about ways around that with traditional floating point formats and nonlinear functions (e.g. using exp to change the multiplication into addition), but it seems to come back to the same problem.
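FWIW, here's a rough way to see the problem in code. The rounding helper and the taper below are made up for illustration (the taper is far steeper than anything a real posit does), but they show the mechanism: detail that a tapered format can represent near 1.0 gets rounded away once the product lands at a magnitude where the same format keeps fewer significand bits.

    import math

    def round_sig_bits(x, bits):
        """Round x to `bits` significand bits (crude stand-in for a number format)."""
        if x == 0.0:
            return 0.0
        e = math.floor(math.log2(abs(x)))
        scale = 2.0 ** (bits - 1 - e)
        return round(x * scale) / scale

    def taper_bits(x):
        """Made-up taper: most bits near 1.0, fewer per binade away from it."""
        binades_from_one = abs(math.floor(math.log2(abs(x))))
        return max(6, 24 - 8 * binades_from_one)

    w = 1.0000063                 # weight, sits where the format is most precise
    a = 3.5                       # activation from the previous layer
    exact = w * a                 # 3.50002205

    w_q = round_sig_bits(w, taper_bits(w))               # the 6.3e-6 detail survives
    p_q = round_sig_bits(w_q * a, taper_bits(w_q * a))   # product lands near 3.5: coarser

    print(w_q)      # ~1.0000063 (kept)
    print(exact)    # 3.50002205
    print(p_q)      # 3.5: the weight's deviation from 1.0 is gone after one multiply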
Companies with lots of deep learning applications, like Google and Facebook, have a better chance of experimenting with alternative hardware formats, whereas NVIDIA has to stay compatible with older hardware. Still, I'm hopeful that they will come, as they can give further scaling in AI.
There's also the whole world of fixed-point inference, which isn't discussed here but is quite important. All of the hardware supports fast integer operations, with fewer platform-specific caveats, so you get a better guarantee of consistent behavior across deployments.
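A minimal sketch of what that usually looks like (symmetric per-tensor int8, a toy version rather than any particular framework's scheme): quantize weights and activations against a scale, do the matmul in int8 with int32 accumulation, and rescale to float once at the end.

    import numpy as np

    def quantize(x, scale):
        """Symmetric int8 quantization: real value is approximately q * scale."""
        return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

    rng = np.random.default_rng(0)
    w = rng.standard_normal((16, 16)).astype(np.float32)   # "weights"
    a = rng.standard_normal(16).astype(np.float32)         # "activations"

    w_scale = float(np.abs(w).max()) / 127.0
    a_scale = float(np.abs(a).max()) / 127.0
    w_q, a_q = quantize(w, w_scale), quantize(a, a_scale)

    # The matmul itself is pure integer arithmetic, accumulated in int32;
    # the result is rescaled back to float exactly once at the end.
    y_int32 = w_q.astype(np.int32) @ a_q.astype(np.int32)
    y = y_int32 * (w_scale * a_scale)

    print(float(np.max(np.abs(y - w @ a))))   # quantization error: small but nonzero

Because the inner loop is exact integer math, you don't get the cross-platform rounding differences that floating-point kernels can show, which is the consistency point above.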
Correct me if I'm wrong, but most machine learning does happen around 1.0. Unums [1] should give more precision for the same number of bits, or the same precision with fewer bits, around 0 and 1. And some other interesting features.
But would require new hardware and software.
[1] https://en.wikipedia.org/wiki/Unum_(number_format)
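If anyone wants to see where the tapered precision in [1] comes from, here's a toy decoder for positive posit bit patterns. The n=8, es=1 configuration is just an illustrative choice on my part, not a claim about any standard's parameters, and it skips zero, NaR, and negatives entirely.

    def decode_posit(bits, n=8, es=1):
        """Decode a positive, nonzero n-bit posit pattern (sign bit 0)."""
        body = bits & ((1 << (n - 1)) - 1)     # drop the sign bit
        # Regime: run of identical bits at the top of the body, plus a terminator.
        first = (body >> (n - 2)) & 1
        i, run = n - 2, 0
        while i >= 0 and ((body >> i) & 1) == first:
            run += 1
            i -= 1
        k = run - 1 if first == 1 else -run
        i -= 1                                 # skip the regime's terminating bit
        rem_len = max(i + 1, 0)                # bits left for exponent + fraction
        rem = body & ((1 << rem_len) - 1)
        e_len = min(es, rem_len)
        e = (rem >> (rem_len - e_len)) << (es - e_len) if e_len else 0
        f_len = rem_len - e_len
        f = rem & ((1 << f_len) - 1)
        frac = 1 + f / (1 << f_len) if f_len else 1.0
        return frac * 2.0 ** (k * (1 << es) + e)

    # Spacing between adjacent representable values near 1.0 vs near 16.0:
    print(decode_posit(0b01000000), decode_posit(0b01000001))   # 1.0, 1.0625
    print(decode_posit(0b01110000), decode_posit(0b01110001))   # 16.0, 20.0

Near 1.0 the regime field is only two bits, so four fraction bits are left over; by 16.0 the regime has eaten two of them and the relative spacing is four times coarser. That's the "more precision around 1.0" trade, and also why that precision has to come from somewhere else.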