In Java it's not confusing when you realize that == always compares values (i.e. things that can be stored in variables); it's just that for reference types, the values are references. It'd be more obvious if Java required explicit dereferencing like C++ does, but it's still consistent. Just not convenient.
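A quick sketch of that on the JVM, written in Scala since eq there is the explicit reference comparison that Java's == performs on reference types:

val a = new String("foo")
val b = new String("foo")
a eq b       // false: two distinct objects, so the stored references differ
a == b       // true: Scala's == delegates to equals, which compares the characters
45L == 45L   // true: for primitives the value itself is compared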
> In Java it's not confusing when you realize that == always compares values (i.e. things that can be stored in variables); it's just that for reference types, the values are references.
It's still confusing. Plenty of "reference types" behave like values in all obvious senses. Why should "foo" be a reference but 45L a value?
That's pretty much saying "it is the way it is", which
a) is not even true, see floating point numbers
and
b) will be obsolete with value types.
Needless to say, there is pointless confusion created by Java's design, and there are better approaches available.
All you have to do is to adapt the semantic model from reference equality vs. value equality to identity vs. equality.
Identity checks whether the "bits" are identical, regardless of whether the bits are "references" or values, and equality is a user-defined operation.
"Foo" equality "Foo" // True
"Foo" identity "Foo" // Only true if they point to the same "object"
123 equality 123 // True
123 identity 123 // True
Double.NaN equality Double.NaN // False
Double.NaN identity Double.NaN // True
Which symbols you pick for "equality" and "identity" is largely arbitrary.
I don't think you can get away with that for theoretical and practical reasons:
1.
There is a ton of code out there which does something like
def contains(that: Thing): Boolean =
this.value identity that || this.value equality that
Pretty much every single collection implementation would be broken if this stopped working with value types.
Additionally, you would run into issues with floating point numbers, which would not be found/retrieved anymore if identity were removed (see the sketch after this list).
2.
The point of defining a _sane_ notion of identity/equality across all types is to avoid the "next-best" option: boxing primitives into wrapper classes, which is both slow and has terrible semantics (also shown in the sketch after this list).
3.
I don't really think restricting identity to e.g. reference types makes sense given that equality is defined for every type. Either none of them should be available by default, or both should be.
There _are_ multiple valid ways to compare two things (consider floating point numbers for a second) and making one more privileged than the other feels wrong.
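Two small sketches of points 1 and 2, in Scala against existing JDK classes (nothing proposal-specific, just the current behaviour being relied on):

// 1. NaN is still found in collections today because lookup doesn't use plain IEEE ==:
//    java.lang.Double.equals deliberately treats NaN as equal to itself.
val xs = new java.util.ArrayList[java.lang.Double]()
xs.add(Double.NaN)
xs.contains(Double.NaN)   // true, via Double.equals rather than IEEE ==

// 2. Boxing primitives into wrapper classes has surprising identity semantics:
val a: java.lang.Integer = 1000
val b: java.lang.Integer = 1000
a eq b                    // false: distinct boxes outside the small-integer cache
val c: java.lang.Integer = 100
val d: java.lang.Integer = 100
c eq d                    // true on a typical JVM: values in [-128, 127] are cached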
> Pretty much every single collection implementation would be broken if this stopped working with value types.
Maybe collections of values should be different from collections of references. The sensible use cases for the two are quite different.
> Additionally, you would run into issues with floating point numbers which would not be found/retrieved anymore if identity were removed.
Meh, just allow NaN to compare equal to itself. Equality is supposed to be reflexive.
> The point of defining a _sane_ notion of identity/equality across all types is to avoid the "next-best" option: boxing primitives into wrapper classes, which is both slow and has terrible semantics.
Unboxed primitives don't have identity, only value equality. They align well with what's being proposed.
> I don't really think restricting identity to e.g. reference types makes sense given that equality is defined for every type. Either none of them should be available by default, or both should be.
We could build the distinction into the language, so for every type you define you explicitly choose whether it's value or reference. Scala's already halfway there with the class/case class distinction.
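For illustration, here's how that distinction already plays out in Scala (a plain class keeps reference equality unless you override equals; a case class gets structural equality generated for it):

class Ref(val x: Int)      // ordinary class: == falls back to reference identity
case class Val(x: Int)     // case class: generated equals/hashCode compare the fields

new Ref(1) == new Ref(1)   // false
Val(1) == Val(1)           // true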
> There _are_ multiple valid ways to compare two things (consider floating point numbers for a second)
Disagree; comparison is so fundamental to most types that it's worth privileging. Using the wrong kind of comparison is a very common source of bugs.
> Maybe collections of values should be different from collections of references. The sensible use cases for the two are quite different.
I think all existing code disagrees with that. There has been great value derived from being able to abstract over element types.
What you are proposing would double the required number of collection classes and all of their traits, because it would require separate ones for Collection[E <: AnyRef] and for Collection[E <: AnyVal].
There is literally no reason for introducing this complexity. Go has demonstrated how poorly this idea has worked out in practice.
Additionally, this approach would make it nearly impossible to migrate reference types to value types, because it would break all users of the code.
> Meh, just allow NaN to compare equal to itself. Equality is supposed to be reflexive.
That's a complete non-option. You might not like IEEE's definition of equality, but it is what it is.
Messing with it would break all existing code using floating point numbers.
> Unboxed primitives don't have identity, only value equality.
Their identity is the bits they consist of, just like identity on references is the bits of the reference.
> They align well with what's being proposed.
What is being proposed?
> We could build the distinction into the language, so for every type you define you explicitly choose whether it's value or reference.
We already have that: AnyRef and AnyVal.
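For reference, the two roots side by side (Meters is just a made-up example of a user-defined value class):

class Box(val x: Int)                            // subtype of AnyRef: heap object with its own reference identity
class Meters(val value: Double) extends AnyVal   // value class: typically unboxed at runtime; eq isn't even defined on it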
> Scala's already halfway there with the class/case class distinction.
That doesn't make any sense. The case keyword is basically just a compiler built-in macro to generate some code. It is already doing way too much, and overloading it with even more semantics is not the way to go.
> Disagree; comparison is so fundamental to most types that it's worth privileging. Using the wrong kind of comparison is a very common source of bugs.
What I'm proposing improves the consistency across value and reference types so that it's always obvious which kind of comparison happens:
- identity: Low-level comparison of the bits at hand. Built into the JVM and not overridable.
- equality: High-level comparison defined by the author of the type.
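To make the "bits" notion concrete for a primitive, here's roughly what identity would mean for Double today (a sketch; identicalBits is a made-up helper, not an existing operator):

def identicalBits(x: Double, y: Double): Boolean =
  java.lang.Double.doubleToLongBits(x) == java.lang.Double.doubleToLongBits(y)

identicalBits(Double.NaN, Double.NaN)   // true: same (canonicalized) bit pattern
Double.NaN == Double.NaN                // false: IEEE equality
identicalBits(0.0, -0.0)                // false: different bit patterns
0.0 == -0.0                             // true under IEEE equality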
> What you are proposing would double the required number of collection classes and all of their traits, because it would require separate ones for Collection[E <: AnyRef] and for Collection[E <: AnyVal].
Less than double, because not all collections make sense for both - e.g. a sorted set or sorted map only makes sense if the keys are values.
> There is literally no reason for introducing this complexity. Go has demonstrated how poorly this idea has worked out in practice.
It eliminates a common class of errors. All type-level distinctions add a bit of complexity, but we often consider them worthwhile to make.
> Additionally, this approach would make it nearly impossible to migrate reference types to value types, because it would break all users of the code.
Changing from one to the other is a radical change that should force the user to reexamine code that deals with them.
> That's a complete non-option. You might not like IEEE's definition of equality, but it is what it is. Messing with it would break all existing code using floating point numbers.
Java already deviated from the IEEE definition with Float and Double. The sky didn't fall. Maybe strict IEEE semantics could be offered in their own type where needed, and that type would be neither a value type nor one with meaningful identity. (This would mean the type system wouldn't allow you to use the strict-IEEE type in any standard collection, which I think is correct behaviour; compare e.g. Haskell, where for a long time you could corrupt the standard sorted set structure by inserting two NaNs.)
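For example, the JDK's own total-order operations for doubles already disagree with primitive IEEE ==:

Double.NaN == Double.NaN                           // false: primitive ==, IEEE semantics
java.lang.Double.compare(Double.NaN, Double.NaN)   // 0: the total order treats NaN as equal to itself
java.lang.Double.compare(-0.0, 0.0)                // negative: the total order distinguishes -0.0 from 0.0
-0.0 == 0.0                                        // true under IEEE ==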
> Their identity is the bits they consist of, just like identity on references is the bits of the reference.
That's a low-level implementation detail that may not even be true on all platforms. The language semantics should make sense.
> We already have that: AnyRef and AnyVal.
No, those are just implementation details of how they're passed around. Many AnyRef types have value semantics.
> That doesn't make any sense. The case keyword is basically just a compiler built-in macro to generate some code. It is already doing way too much, and overloading it with even more semantics is not the way to go.
Well, what I'd like in an ideal language is: no universal equality, and opt-in value equality with derivation for product/coproduct types. As for references... I'm not really convinced there's a legitimate use case for comparing references, especially the implicit, invisible references the language uses to implement user classes. If we need reference comparison at all, I'd rather have something a bit more explicit: either an opt-in "the identity of this class is meaningful", or a notion of explicit references that are much more visible in the code (something a bit like ActorRef), or both.
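A rough sketch of the "opt-in value equality" part as a typeclass (Eq, eqv and same are made-up names for the example; Scala 3's CanEqual / strictEquality is in a broadly similar spirit):

trait Eq[A] { def eqv(x: A, y: A): Boolean }

final case class Point(x: Int, y: Int)

object Point {
  // field-by-field equality; a real design would derive this for product/coproduct types
  implicit val eqPoint: Eq[Point] = new Eq[Point] {
    def eqv(a: Point, b: Point): Boolean = a.x == b.x && a.y == b.y
  }
}

def same[A](a: A, b: A)(implicit ev: Eq[A]): Boolean = ev.eqv(a, b)

same(Point(1, 2), Point(1, 2))    // true
// same(new Object, new Object)   // does not compile: no Eq[Object], so no universal equality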
> What I'm proposing improves the consistency across value and reference types so that it's always obvious which kind of comparison happens:
> - identity: Low-level comparison of the bits at hand. Built into the JVM and not overridable.
> - equality: High-level comparison defined by the author of the type.
That's very inconsistent at the language-semantics level; which bits are "the bits at hand" is a low-level implementation detail that should probably be left to the runtime to represent however best suits a particular code path. At the language level, "does 2L + 2L equal 4L?" is the same kind of question as "does "a" + "b" equal "ab"?", and both of those questions are quite different from any question to which reference comparison would be the answer.
That's tautological. The question is not why String behaves as a reference type - obviously, because it is a reference type. The question is, why is String a reference type, when it has such obvious value semantics?
And it's a perfectly valid question. The real answer is that they wanted strings to have methods, so it had to be a class, because primitives can't have methods; and Java doesn't have value classes, so it had to be a reference type. So the reason is a deficiency in the language.
A better question is why .NET's string is a reference type (with an overloaded == to make it behave like a value, even!). It could have easily been a struct. I suspect this is one of those decisions that were made very early on to basically be like Java, because that's what people expect; and then it became impossible to change without major breakage.
The real problem is that most implementations that make the reference/value type distinction mix up semantics and implementation. What we really need is the ability to say that something 1) doesn't have a meaningful identity, and 2) is immutable (in the sense that you can't replace individual components of the object; you can still replace the whole object). When you combine those two, you get something that can be implemented as a value type under the hood, but the programmer doesn't need to think about it. They can just keep treating it as a reference, just as they do in Python. There's no way to tell the difference.
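A minimal sketch of that combination in today's Scala, with the caveat that the language can only approximate it (Complex is a made-up example; every class is still an AnyRef, so a reference identity technically exists even though nothing should depend on it):

// Immutable and identity-free by intent: no program behaviour should depend on
// which "copy" of a Complex you hold, so a runtime would be free to flatten,
// copy or share it as it sees fit.
final case class Complex(re: Double, im: Double) {
  def +(that: Complex): Complex = Complex(re + that.re, im + that.im)
}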