The natural type for numbers is expression trees. That's roughly what DL frameworks like TensorFlow and PyTorch do, and it enables things like automatic differentiation and distributed computation.
For any expression, the runtime is allowed to reduce it only if it can prove that the reduction is value preserving. For example: 3/1 => 3, (1/3)*3 => 1, (e^(ln(x)/2))^2 => x, and 10/14 => 5/7. But 5/7 cannot be reduced further.
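A minimal sketch of what those reductions look like in practice, using Python's `fractions.Fraction` for the exact rational cases and SymPy for the symbolic one (the positivity assumption on x is mine; it's what makes the last reduction provable):

```python
from fractions import Fraction
import sympy as sp

# Exact rational reductions: normalization is value preserving by construction.
print(Fraction(3, 1))      # 3
print(Fraction(1, 3) * 3)  # 1
print(Fraction(10, 14))    # 5/7, which reduces no further

# Symbolic reduction: (e^(ln(x)/2))^2 => x is only provable for positive x.
x = sp.symbols("x", positive=True)
print(sp.simplify(sp.exp(sp.log(x) / 2) ** 2))  # x
```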
The primitive numerical type that should be used instead of floating point is the rational. Rationals have their own problems (no numeric type is perfect), but their problems are much easier to manage than floats'.
> The primitive numerical type that should be used instead of floating point is the rational. Rationals have their own problems (no numeric type is perfect), but their problems are much easier to manage than floats'.
Rationals are not algebraic types; they don't support exponentials, radicals, or logarithms. A lot of numerical algorithms require algebraic operations on real numbers to compute their results; for example, computing eigenvalues, or numerical approaches to root finding. If you're going to argue for using symbolic notation, well, closed-form solutions cannot exist for several of the kinds of problems we want to solve.
Another issue is that rationals are fundamentally more expensive to compute with than floating point: normalizing a rational requires computing a gcd (not really parallelizable at the bit level, so it can't be done in one cycle), while normalizing a float requires only a count-leading-zeros (which can).
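A sketch of that normalization step, assuming a bare (numerator, denominator) representation; the data-dependent gcd loop is the part that resists single-cycle hardware:

```python
from math import gcd

def normalize(num: int, den: int) -> tuple[int, int]:
    # Every rational add/sub/mul/div ends with this reduction to keep the
    # representation canonical. gcd is an iterative, data-dependent loop,
    # unlike the single count-leading-zeros a float renormalization needs.
    g = gcd(num, den)
    return num // g, den // g

print(normalize(10, 14))  # (5, 7)
```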
As a case in point, the easiest way to find a solution to a rational linear programming problem is to... solve it in floating point to find a basis, and then adjust that basis using rational arithmetic (usually finding that the floating-point basis was indeed optimal!). Starting with rational arithmetic instead makes it slower by a factor of ~10000.
What do you mean by algebraic types? If you mean algebraic numbers, then yes, it is true that rationals cannot express all algebraic numbers, such as the square roots of primes. But neither can floats! In fact, floats are limited to either 2^32 or 2^64 distinct values (give or take a few), but rationals can express as many numbers as you have bits of memory in your system. That is, the number of numbers rationals can express is in practice infinite.
Nor do floats support transcendental functions. Those functions are implemented using Taylor series approximations, and the same technique could be used to implement transcendental functions over rationals. To boot, you'd get much higher precision.
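As an illustration of that claim (a sketch, not a production implementation), exp can be computed over exact rationals by truncating its Taylor series, so the error budget is set by the term count rather than by a fixed significand:

```python
from fractions import Fraction

def exp_rational(x: Fraction, terms: int = 30) -> Fraction:
    # Partial sums of e^x = sum over k of x^k / k! are exact rationals,
    # so all error comes from truncating the series, never from rounding.
    total = Fraction(0)
    term = Fraction(1)
    for k in range(1, terms + 1):
        total += term
        term = term * x / k
    return total

print(float(exp_rational(Fraction(1))))  # 2.718281828459045, i.e. e
```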
Yes, rationals with arbitrary precision are more expensive to compute with than precision-limited floats. But who cares? It is not the 80s anymore.
> Yes, rationals with arbitrary precision are more expensive to compute with than precision-limited floats. But who cares? It is not the 80s anymore.
The people who care are the people who now either have to buy 10000× the computing power or settle for 1/10000th the problem size to meet their needs, usually without meaningfully improving their results. Put another way, using rationals would turn your 2020-era computer into a 1980-era computer in terms of performance on your linear algebra benchmarks.
Many "serious" numerical computations using rationals will quickly have the numerator and/or denominator explode to the point where BigInts are needed, at which point you could just be using a BigFloat.
Thanks! Let me follow up by exploring a specific use case: scoring in a Lucene-style search engine, where arbitrarily complex queries repeatedly scale a floating point "score" value which measures document relevancy against a query.
Sometimes documents which ought to produce equal scores against a given query (if scoring were evaluated using real-number math) do not produce equal scores in practice, because of floating point limitations such as the non-associativity of addition and multiplication. Let's assume we are willing to sacrifice performance for correctness to address this issue.
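For concreteness, here is a tiny repro of that kind of tie-breaking failure (the weights are made up, not Lucene's actual scoring): the same three term weights summed in different orders disagree as floats but agree as exact rationals.

```python
from fractions import Fraction

# Float addition is not associative, so evaluation order leaks into the score.
print((0.1 + 0.2) + 0.3)  # 0.6000000000000001
print(0.1 + (0.2 + 0.3))  # 0.6

# The same sums over exact rationals are order independent.
w = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10)]
print((w[0] + w[1]) + w[2] == w[0] + (w[1] + w[2]))  # True
```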
1) If scores were tracked as expression trees, could this problem be eliminated entirely?
If so, I'd hazard a guess that while you'd eventually run into edge cases such as needing arbitrarily large integers, in practice the CPU cost of tracking scores as expression trees rather than floats would increase by a roughly constant factor.
2) If scores were tracked using a rational instead of a float, could the problem of tied scores be eliminated?
I suspect that the answer is no, because you will quickly run up against integer precision limitations in the numerator/denominator of the rational type.
1) Hard for me to say, since I don't know the formula of the scoring system you're talking about. But in general, if the problem goes away when doing the calculation on paper, it would also go away when using symbolic math, since the methods are identical.
An expression is a tree. A tree can grow arbitrarily large, and therefore an expression can grow arbitrarily large. It's a problem in theory but not in practice. You can try for yourself in SymPy: it is very hard to come up with expressions so complicated that they become difficult for the system to handle.
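For instance, a quick sanity check in SymPy (the sizes here are arbitrary): a sum of 200 squared binomials is already a sizeable tree, and SymPy collapses it instantly.

```python
import sympy as sp

x = sp.symbols("x")
# Build a tree with hundreds of nodes, then ask SymPy to collapse it.
expr = sum((x + k) ** 2 for k in range(200))
print(sp.expand(expr))  # 200*x**2 + 39800*x + 2646700
```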
2) No. Assuming you are using tf-idf for scoring, your calculation involves numbers with infinite decimal expansions that cannot be expressed as rationals. However, you can get arbitrarily close by using very large integers in the numerator and denominator.
That is in contrast with floats, whose precision is fixed at either 32 or 64 bits.
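To make that concrete, here's a sketch (the helper and its term count are my own choices) that approximates the natural log used by idf as an exact rational, where adding terms keeps buying precision past the 53 significand bits of a 64-bit float:

```python
from fractions import Fraction

def ln_rational(q: Fraction, terms: int = 40) -> Fraction:
    # Uses ln(q) = 2*atanh((q - 1)/(q + 1)). Every partial sum is an exact
    # rational, so truncation is the only source of error, unlike a float,
    # whose precision stops at its fixed significand width.
    t = (q - 1) / (q + 1)
    return 2 * sum(t ** (2 * j + 1) / (2 * j + 1) for j in range(terms))

print(float(ln_rational(Fraction(2))))  # 0.6931471805599453, i.e. ln 2
```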