Unums have all the failure modes of interval arithmetic on top of their own brand of failure modes.
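To make the interval-arithmetic part concrete, here's the classic dependency problem in a throwaway interval class. This is plain interval arithmetic, nothing unum-specific, but it's the behavior the interval-flavored unum types inherit: every occurrence of a variable is treated as independent, so even x - x doesn't collapse to zero, and bounds balloon under iteration.

    class Interval:
        def __init__(self, lo, hi):
            self.lo, self.hi = lo, hi
        def __add__(self, o):
            return Interval(self.lo + o.lo, self.hi + o.hi)
        def __sub__(self, o):
            return Interval(self.lo - o.hi, self.hi - o.lo)
        def __mul__(self, o):
            ps = [self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi]
            return Interval(min(ps), max(ps))
        def __repr__(self):
            return f"[{self.lo:.4g}, {self.hi:.4g}]"

    x = Interval(0.9, 1.1)
    print(x - x)              # [-0.2, 0.2], not [0, 0]

    # Iterate x <- x*(2 - x): the true values converge to 1.0, but the interval
    # widens every step because the overestimate compounds.
    x = Interval(0.9, 1.1)
    two = Interval(2.0, 2.0)
    for _ in range(5):
        x = x * (two - x)
        print(x)

Five iterations is enough to go from a width of 0.2 to bounds that are meaningless.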
There are loads of engineers with stiff numerical problems that would welcome a better representation than standard floating point if it worked. Somehow, Gustafson never manages to demonstrate unums on any of these problems.
Until Gustafson grabs some real numerical code, implements it with unums and demonstrates how much better they are, he is not worth paying attention to.
Type I and II unums are likely dead ends. Type III (posits) look promising. Facebook AI has developed their own type of "posit" with the terribly undescriptive name of "(8, 1, alpha, beta, gamma) log". Much less catchy than "posit" or "unum", but it has actually been demonstrated to have numerical benefits.
"Against 32-bit IEEE 754 single-precision FMA, ELMA will not be effective, though, as the Kulisch accumulator is massive (increasing adder/shifter sizes and flip-flop power), and the log-to-linear lookup table is prohibitive."
So, these things may be effective if you can dial your precision down. Okay. We use domain-specific number representations all the time. Fixed-point binary for DSP. Decimal representations for currency.
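For instance, off the top of my head (toy snippets, Python just for illustration):

    from decimal import Decimal

    # Binary floating point can't represent 0.1 exactly, which is why currency
    # code usually reaches for decimal (or integer-cents) representations.
    print(0.1 + 0.2)                          # 0.30000000000000004
    print(Decimal("0.10") + Decimal("0.20"))  # 0.30

    # Fixed point for DSP is the same move: pick a scale, do integer math.
    gain = round(0.7 * 2**15)                 # Q1.15 coefficient
    sample = 12345
    print((sample * gain) >> 15)              # 8641, i.e. roughly 12345 * 0.7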
The Facebook takeaway is that transistors are now so cheap that we can do sub-16-bit floating-point arithmetic with mostly ROM lookup tables if we accept some restrictions.
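A toy sketch of the general idea, heavily simplified by me (positive values only, no rounding care, and definitely not Facebook's actual (8, 1, alpha, beta, gamma) format): store values as fixed-point base-2 logs, so a multiply is an integer add of log codes; each product then goes through a small lookup table to become a linear fixed-point value, and products are summed exactly in a wide Kulisch-style integer accumulator.

    import math

    FRAC = 4                  # fractional bits of the log code (LUT has 16 entries)
    ACC_FRAC = 24             # fractional bits of the wide linear accumulator

    # LUT[f] = 2**(f / 2**FRAC), scaled onto the accumulator's fixed-point grid.
    POW2_LUT = [round((2.0 ** (f / 2 ** FRAC)) * 2 ** ACC_FRAC) for f in range(2 ** FRAC)]

    def encode(x):
        """Quantize a positive float to a fixed-point log2 code."""
        return round(math.log2(x) * 2 ** FRAC)

    def log_mul(a_code, b_code):
        """In the log domain, multiplication is just integer addition."""
        return a_code + b_code

    def log_to_linear(code):
        """Convert a log code to the accumulator's grid via the lookup table."""
        int_part, frac_part = code >> FRAC, code & (2 ** FRAC - 1)
        linear = POW2_LUT[frac_part]
        return linear << int_part if int_part >= 0 else linear >> -int_part

    # Dot product of two small positive vectors, accumulated exactly as integers.
    xs, ws = [1.5, 2.0, 3.25], [0.75, 1.25, 0.5]
    acc = 0
    for x, w in zip(xs, ws):
        acc += log_to_linear(log_mul(encode(x), encode(w)))

    print(acc / 2 ** ACC_FRAC)                 # about 5.18 with this crude 4-bit log code
    print(sum(x * w for x, w in zip(xs, ws)))  # exact: 5.25

The restriction that makes this cheap is exactly the one in the quote above: the lookup table and the accumulator only stay small if the log code is short.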
That is, however, NOT what Gustafson is proposing. He proposes these for general usage, as a panacea for the complications of floating point.
Numerics isn't some secret cabal. A new floating point system that allowed the folks doing partial differential equation solvers, computational fluid dynamics, or discrete time simulation to gain even 25% in time or to open up a new simulation field because of extra stability would get a serious look.
And while William Kahan (who drove a lot of IEEE-754) generally comes off as an insufferable jerk, he knows his stuff, and he wasn't alone. The numerics folks at IBM and DEC (and others) were mostly converging on the same things, with differences at the margins (signed zero, denorms, NaNs, etc.), largely because some of the things IEEE-754 demanded were a huge pain to implement in the hardware of the day.
As for politics, IEEE-754 was basically a reaction to allowing DEC or IBM to create a de facto definition of the floating point standard.
I read the book, The End of Error by John Gustafson. He works through a lot of numerical problems right there in the book, providing code and results.
He also goes over the history of floats and numerical computation, highlighting how often the best method was never adopted, for reasons as simple as patents or it not being better by enough.
Unums are better by enough, and they're being used at LLNL in experiments. It's just a hard ask for people to switch the number system they use; even switching from decimal to binary was a huge slog in the early computer age.
Also, note that since the publication of The End of Error, Gustafson has given up on the original variable-length unums described in the book and switched to "posits" (or "type III unums").
But even if you have extra precision around 1.0, assuming that weight is used for multiplication (of an output from the previous layer), surely you'll lose it as soon as you perform that multiplication? E.g. if the previous layer output is 3.5 and you multiply it by 1.00...0063, you'll get 3.500...02205, but you'll lose those trailing places on the output unless you have lots of precision around 3.5 too.
I tried to think about ways around that with traditional floating point formats and nonlinear functions (e.g. using exp to change the multiplication into addition), but it seems to come back to the same problem.
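FWIW, here's a rough way to see the problem in code. The rounding helper and the taper below are made up for illustration (the taper is far steeper than anything a real posit does), but they show the mechanism: detail that a tapered format can represent near 1.0 gets rounded away once the product lands at a magnitude where the same format keeps fewer significand bits.

    import math

    def round_sig_bits(x, bits):
        """Round x to `bits` significand bits (crude stand-in for a number format)."""
        if x == 0.0:
            return 0.0
        e = math.floor(math.log2(abs(x)))
        scale = 2.0 ** (bits - 1 - e)
        return round(x * scale) / scale

    def taper_bits(x):
        """Made-up taper: most bits near 1.0, fewer per binade away from it."""
        binades_from_one = abs(math.floor(math.log2(abs(x))))
        return max(6, 24 - 8 * binades_from_one)

    w = 1.0000063                 # weight, sits where the format is most precise
    a = 3.5                       # activation from the previous layer
    exact = w * a                 # 3.50002205

    w_q = round_sig_bits(w, taper_bits(w))               # the 6.3e-6 detail survives
    p_q = round_sig_bits(w_q * a, taper_bits(w_q * a))   # product lands near 3.5: coarser

    print(w_q)      # ~1.0000063 (kept)
    print(exact)    # 3.50002205
    print(p_q)      # 3.5: the weight's deviation from 1.0 is gone after one multiply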
Companies with lots of deep learning applications, like Google and Facebook, have a better chance of experimenting with alternative hardware formats, whereas NVIDIA has to stay compatible with older hardware. Still, I'm hopeful that they will come, as they can give further scaling in AI.
There's also the whole world of fixed-point inference, which isn't discussed here but is quite important. All of the hardware supports fast integer operations, with fewer platform-specific caveats, so you get a better guarantee of consistent behavior across deployments.
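A minimal sketch of what that usually looks like (symmetric per-tensor int8, a toy version rather than any particular framework's scheme): quantize weights and activations against a scale, do the matmul in int8 with int32 accumulation, and rescale to float once at the end.

    import numpy as np

    def quantize(x, scale):
        """Symmetric int8 quantization: real value is approximately q * scale."""
        return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

    rng = np.random.default_rng(0)
    w = rng.standard_normal((16, 16)).astype(np.float32)   # "weights"
    a = rng.standard_normal(16).astype(np.float32)         # "activations"

    w_scale = float(np.abs(w).max()) / 127.0
    a_scale = float(np.abs(a).max()) / 127.0
    w_q, a_q = quantize(w, w_scale), quantize(a, a_scale)

    # The matmul itself is pure integer arithmetic, accumulated in int32;
    # the result is rescaled back to float exactly once at the end.
    y_int32 = w_q.astype(np.int32) @ a_q.astype(np.int32)
    y = y_int32 * (w_scale * a_scale)

    print(float(np.max(np.abs(y - w @ a))))   # quantization error: small but nonzero

Because the inner loop is exact integer math, you don't get the cross-platform rounding differences that floating-point kernels can show, which is the consistency point above.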
Correct me if I'm wrong, but most machine learning does happen around 1.0. Unums [1] should give more precision for the same number of bits, or the same precision with fewer bits, around 0 and 1. And some other interesting features.
But would require new hardware and software.
[1] https://en.wikipedia.org/wiki/Unum_(number_format)
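If anyone wants to see where the tapered precision in [1] comes from, here's a toy decoder for positive posit bit patterns. The n=8, es=1 configuration is just an illustrative choice on my part, not a claim about any standard's parameters, and it skips zero, NaR, and negatives entirely.

    def decode_posit(bits, n=8, es=1):
        """Decode a positive, nonzero n-bit posit pattern (sign bit 0)."""
        body = bits & ((1 << (n - 1)) - 1)     # drop the sign bit
        # Regime: run of identical bits at the top of the body, plus a terminator.
        first = (body >> (n - 2)) & 1
        i, run = n - 2, 0
        while i >= 0 and ((body >> i) & 1) == first:
            run += 1
            i -= 1
        k = run - 1 if first == 1 else -run
        i -= 1                                 # skip the regime's terminating bit
        rem_len = max(i + 1, 0)                # bits left for exponent + fraction
        rem = body & ((1 << rem_len) - 1)
        e_len = min(es, rem_len)
        e = (rem >> (rem_len - e_len)) << (es - e_len) if e_len else 0
        f_len = rem_len - e_len
        f = rem & ((1 << f_len) - 1)
        frac = 1 + f / (1 << f_len) if f_len else 1.0
        return frac * 2.0 ** (k * (1 << es) + e)

    # Spacing between adjacent representable values near 1.0 vs near 16.0:
    print(decode_posit(0b01000000), decode_posit(0b01000001))   # 1.0, 1.0625
    print(decode_posit(0b01110000), decode_posit(0b01110001))   # 16.0, 20.0

Near 1.0 the regime field is only two bits, so four fraction bits are left over; by 16.0 the regime has eaten two of them and the relative spacing is four times coarser. That's the "more precision around 1.0" trade, and also why that precision has to come from somewhere else.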