> NaN is a sentinel, and it exists outside the set of floats, despite being a float itself for type purposes, which is why it needs to be tested for using specific operators (isNaN) and not using equality operators. Part of the utility of NaN is that it can not be tested for using equality.
That seems like pretty bad type design - if you want a sum type, define an actual sum type. If a given type doesn't have a well-behaved concept of equality or comparison (and to be clear, that means well behaved for any values of that type), it shouldn't support the standard equality or comparison operators. I agree that you probably don't want to put NaN in a binary tree, but the language should support you in not doing that rather than silently breaking when you do.
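To make the "silently breaking" part concrete, here is a minimal sketch in Python, assuming a naive unbalanced binary search tree. The `Node`, `insert`, `contains` and `size` names are made up for illustration and are not from any library.

```python
import math


class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into an unbalanced BST, skipping keys that compare equal."""
    if root is None:
        return Node(key)
    if key == root.key:
        return root  # treated as already present
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def contains(root, key):
    """Walk the tree using the same comparisons that insert uses."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False


def size(root):
    return 0 if root is None else 1 + size(root.left) + size(root.right)


root = None
for key in [2.0, 1.0, math.nan, 3.0, math.nan]:
    root = insert(root, key)

# NaN never compares equal, so both NaNs were stored as separate nodes,
# and neither of them can be found again.
nan_found = contains(root, math.nan)
node_count = size(root)
```

Nothing raises here: the tree ends up with duplicate NaN nodes that no lookup can ever reach, which is the quiet failure mode being complained about.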
> if you want a sum type, define an actual sum type
> the language should support you in not doing that rather than silently breaking when you do.
You do realize that we're not talking about a programming language and its type system, right? Nothing is stopping anyone from creating such a sum type in languages that support them. Places that don't support rich type systems, such as C and even assembly, still need to be able to support floating point semantics. That doesn't change the fact that in such a system, the NaN type will, by definition, define a comparison operator that always returns false and expose some other means to determine if a value is a NaN.
> That doesn't change the fact that in such a system, the NaN type will, by definition, define a comparison operator that always returns false and expose some other means to determine if a value is a NaN.
That's a language design decision. You don't have to define that kind of comparison operator. You certainly don't have to make the standard == behave that way.
> You certainly don't have to make the standard == behave that way.
Is equality an attribute of the operator or an attribute of the type and value? The fact that one defines operators on types—and that things such as how values compare to each other is based on types and their coercion rules when compared (str==num→false, unless you define how a num becomes a str or a str becomes a number, which is type based)—says to me that it's the latter.
Using the same symbol for two arbitrary, unrelated things is confusing for the reader - especially if you're using a symbol and a term that already have a standard mathematical meaning. str + str for concatenation might seem unrelated to num + num, but it's a valid "+" in the standard mathematical sense of being a valid monoid operation (i.e. it's associative). And this means you can write code that uses that + operator in a generic way and be able to rely on it making sense - e.g. if you write a "sum of a list" function, and then refactor it to sum the lists in parallel, if your refactoring was correct for lists of numbers it will be correct for lists of strings as well.
(If your language doesn't allow polymorphism, you might have to write the same code twice but you'd be able to use the same symbol, which helps a reader understand how it's conceptually the same thing)
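A rough sketch of that refactoring argument in Python, assuming the "parallel" version is modelled by folding fixed-size chunks independently and then combining the partial results in order. `fold` and `chunked_fold` are hypothetical names for this illustration.

```python
from functools import reduce
from operator import add


def fold(items, identity):
    """Sequential 'sum of a list' using whatever + means for the element type."""
    return reduce(add, items, identity)


def chunked_fold(items, identity, chunk_size=2):
    """Fold each chunk on its own (as separate workers could), then combine.

    This regrouping is only valid because + is associative for the type.
    """
    partials = [
        fold(items[i:i + chunk_size], identity)
        for i in range(0, len(items), chunk_size)
    ]
    return fold(partials, identity)


numbers = [1, 2, 3, 4, 5]
words = ["a", "b", "c", "d", "e"]

numbers_match = fold(numbers, 0) == chunked_fold(numbers, 0)
words_match = fold(words, "") == chunked_fold(words, "")
```

The same regrouped code is correct for integers and for strings because both `+` operations are associative and each has an identity element (`0` and `""`).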
"==" for two different types won't do exactly the same thing, but it should always be a well-behaved "equals" operation that satisfies the usual expectations that a reader would have. That usually includes that any value is equal to itself.
> "==" for two different types won't do exactly the same thing, but it should always be a well-behaved "equals" operation that satisfies the usual expectations that a reader would have. That usually includes that any value is equal to itself.
NaN is not the same kind of value as a number like 3.14. I would expect that "the reader" with "usual expectations" would be familiar with floating point operations. How NaN behaves under equality is well defined, as well defined as equality over integers.
I'm not sure what point you're trying to make. Is it that "equality" is embodied in the operator "=="? Further up the thread I talked about how equality, like other operators, is something defined on and by the types. You admit as much when you say:
> str + str for concatenation might seem unrelated to num + num, but it's a valid "+" in the standard mathematical sense of being a valid monoid operation (i.e. it's associative).
How operators work is an attribute of the types the operators work on. str defines the + operator to mean concatenation, which is entirely different from the meaning of the + operator when applied to integers, or what should be expected to happen when the LHS and the RHS of the binary + operator are of different types (some choose to fail without explicit conversions, others choose to implicitly convert or promote).
isNaN is used to determine if the bitpattern in a floating point value is NaN because equality on floating point bitpatterns is defined very specifically for those bitpatterns. The same could be said for a hypothetical isPi function over the floating point domain, which could be defined to return true for any value that approximates π. That makes sense because the exact value of π cannot be represented in binary or in decimal. Since π is irrational, equality wouldn't work on a floating point representation/approximation of it either, so a separate operator (operator is just another name for function) would be necessary to determine if a given value was Pi-approximate.
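Roughly what those two predicates could look like in Python, as a sketch only. `is_nan_bits` assumes IEEE 754 binary64 layout (11 exponent bits, 52 fraction bits), and `is_pi_approx` with its tolerance is a made-up stand-in for the hypothetical isPi.

```python
import math
import struct


def is_nan_bits(x: float) -> bool:
    """Detect NaN from the bit pattern: exponent all ones, fraction nonzero."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return exponent == 0x7FF and fraction != 0


def is_pi_approx(x: float, rel_tol: float = 1e-9) -> bool:
    """Hypothetical isPi: true for values close to pi within a chosen tolerance."""
    return math.isclose(x, math.pi, rel_tol=rel_tol)


nan_detected = is_nan_bits(math.nan)      # exponent 0x7FF, fraction nonzero
inf_detected = is_nan_bits(math.inf)      # exponent 0x7FF, fraction zero
close_to_pi = is_pi_approx(3.1415926536)  # within the assumed tolerance
```

The tolerance is a design choice rather than anything standard: a tighter `rel_tol` accepts fewer approximations, a looser one accepts more.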
> I would expect that "the reader" with "usual expectations" would be familiar with floating point operations
Why? Why should every single programmer be expected to be an expert in this particular obscure datatype, to the point that it's ok for it to break all the normal rules that normal datatypes follow?
> How operators work is an attribute of the types the operators work on. str defines the + operator to mean concatenation, which is entirely different from the meaning of the + operator when applied to integers
It's not "entirely different". It conforms to the normal mathematical definition of + and has the properties that a reader would expect + to have (e.g. associativity). Defining + to e.g. search strings would be bad and confusing.
The whole point of NaN working this way is that it is not well behaved and raises an alarm when a bad computation is made. Otherwise you run the risk of wiping out a NaN and never knowing that a divide by zero has corrupted your results.
Having it not compare equal to itself is a pretty crappy way of making it raise an alarm - you're relying on that accidentally causing some kind of visible corruption to your program's datastructures, which is not at all guaranteed. Anyone who cares about not silently getting corrupt results should be turning on floating-point exceptions rather than using NaNs, unless they know exactly what they're doing.
You seem to be equating having a NaN in a result as being undetectable corruption. The whole point of NaN is that you don't end up with silent data corruption, you end up with an unusable value: unusable in the sense that it infects further calculations instead of covering it up with a value that looks valid.
r = (x / y) + 1
If y was 0, I wouldn't want r to equal 1; if r was 1 (or anything other than NaN, really) then that would be silent data corruption.
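A small Python sketch of that propagation. Python's own `/` raises ZeroDivisionError for floats, so this uses a hypothetical helper, `ieee_div`, that mimics IEEE 754 default (non-trapping) division. Note that under those defaults only 0/0 produces NaN; a nonzero value divided by zero produces a signed infinity, which also propagates through the `+ 1`.

```python
import math


def ieee_div(x: float, y: float) -> float:
    """Hypothetical helper: divide like IEEE 754 default arithmetic would."""
    try:
        return x / y
    except ZeroDivisionError:
        if x == 0.0 or math.isnan(x):
            return math.nan  # 0/0 (or NaN/0) is an invalid operation
        # nonzero / 0 gives infinity, signed by the signs of both operands
        sign = math.copysign(1.0, x) * math.copysign(1.0, y)
        return math.copysign(math.inf, sign)


r_invalid = ieee_div(0.0, 0.0) + 1       # NaN survives the addition
r_div_by_zero = ieee_div(1.0, 0.0) + 1   # infinity survives the addition
r_normal = ieee_div(6.0, 3.0) + 1        # ordinary case
```

In neither bad case does `r` come out as 1 or any other ordinary-looking number.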
That some languages have a literal form of NaN that can be typed into code, with the same type as floating point numbers, leads one to believe that an equality operator can or should be used to test for NaN. Really, a literal form of NaN is kind of a blight on such a language: the only way to obtain a NaN value should be via a calculation that results in it. There is very little need for a literal form, except where you want to store the result of a calculation that might be NaN (to record that the calculation was corrupted). In the vast majority of cases, someone typing a literal NaN into their code would be a code smell. NaN is a legit result of a calculation but not a legit input to one, because it infects any calculation it touches. Since there's no legit reason to have it as an input, there's no reason to have a literal form that can be typed into the code.
As for floating point exceptions vs testing for NaN:
x = performcalculation()
if isNan(x) { abort }
y = performFurtherCalc(x)
if isNan(y) { abort }
This can be done at every point an additional value is introduced to the calculation, to determine if the result is sane or not. Floating point exceptions do these checks for you.
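The two styles can be put side by side in Python. Binary float traps aren't exposed in Python, so this sketch uses the standard `decimal` module as a stand-in, since its contexts have real, switchable traps. The `divide_quietly` and `divide_trapping` names are made up for the example.

```python
from decimal import Decimal, InvalidOperation, localcontext


def divide_quietly(x: Decimal, y: Decimal) -> Decimal:
    """Divide with the invalid-operation trap off, so 0/0 yields NaN."""
    with localcontext() as ctx:
        ctx.traps[InvalidOperation] = False
        return x / y


def divide_trapping(x: Decimal, y: Decimal) -> Decimal:
    """Divide with the invalid-operation trap on, so 0/0 raises."""
    with localcontext() as ctx:
        ctx.traps[InvalidOperation] = True
        return x / y


# Style 1: let NaN through and check for it explicitly after each step.
x = divide_quietly(Decimal(0), Decimal(0))
needs_abort = x.is_nan()

# Style 2: let the trap do the check and interrupt the calculation.
try:
    divide_trapping(Decimal(0), Decimal(0))
    trapped = False
except InvalidOperation:
    trapped = True
```

With the trap on, no explicit check is needed after each step; the exception does the aborting.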
> The whole point of NaN is that you don't end up with silent data corruption, you end up with an unusable value: unusable in the sense that it infects further calculations instead of covering it up with a value that looks valid.
NaN values are slightly better than silently returning a valid value like 0, agreed. But having NaN silently corrupt a datastructure when you insert it into it is not what anyone asked for. Either make NaN a well-behaved value with well-behaved equality, or make it immediately explode like an exception.
> As for floating point exceptions vs testing for NaN:
> x = performcalculation()
> if isNan(x) { abort }
> y = performFurtherCalc(x)
> if isNan(y) { abort }
That's horrible ergonomics, that's like something from the '70s. I know there are still languages that lack proper sum types and high-level composition, but any serious language that can't solve that properly at least has some kind of language-level hack like exceptions.
NaN as an approach to handling arithmetic errors only makes sense (weighed against checked arithmetic) if it poisons all computations involving it.
I would argue firstly that floating-point numbers should not implement float == float → boolean comparison, because downgrading NaN to a boolean loses its infectious nature, undermining the purpose of NaN as an error-handling (or at least -detection) scheme.
But if you’re going to implement float == float → boolean comparison, I would argue that NaN not equalling itself is more reasonable on this axis of poisoning all computations than the alternative, though I will admit that it’s rendered subjective by the fact that there is no inherent virtue in equality (your “normal” path could just as easily be conditional on inequality).
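One way to picture a comparison that keeps the poison instead of downgrading it to a boolean, as a sketch: the hypothetical `poisoning_eq` below returns `None` (standing in for "no answer") whenever either side is NaN. This is not how any mainstream language defines `==`; it only illustrates the idea.

```python
import math
from typing import Optional


def poisoning_eq(a: float, b: float) -> Optional[bool]:
    """Hypothetical three-valued equality: None if either operand is NaN."""
    if math.isnan(a) or math.isnan(b):
        return None  # the comparison itself is poisoned
    return a == b


plain_equal = poisoning_eq(1.0, 1.0)
plain_unequal = poisoning_eq(1.0, 2.0)
poisoned = poisoning_eq(math.nan, 1.0)

# The built-in operator collapses the poisoned case to an ordinary False.
builtin_result = math.nan == 1.0
```

A caller then has to handle the `None` case explicitly rather than falling into whichever branch happens to be taken when `==` yields `False`.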
I’d really like to see more experimentation into languages that only support checked arithmetic (not just for floating-point numbers, but also for integer types).