This is really an unscientific claim but I ran a hand crafted/hacked benchmark j...

vbezhenar · on May 16, 2017

This is from Oracle JDK sources:

    public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        ...

So if both strings are interned, this shortcut will work. I guess, your benchmarks were wrong. Microbenchmarks in Java are hard to implement correctly.

hashCode is cached as well in field `hash` and lazily computed:

    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }

Everything there is optimized till the last cycle, I'm sure about that.

derriz · on May 16, 2017

I hadn't spotted that in the hasCode() implementation. Thanks, that makes sense.

Regarding .equals(), indeed that shortcut exists but it didn't help my benchmark because it spent nearly all it's time comparing Strings which were not equal.

I switched the benchmark to compare all equal (interned) Strings and even then there .equals() is about half the speed of the reference equality check. I guess because of the method call overhead?

Yes I would have been sure that everything to do with Strings would have been optimized to the last cycle hence my surprise. I'm not sure about the state of Java intrinsics with the Oracle JDK but I would have thought that most of the String methods would be handled by intrinsics and be very heavily optimized.

Edit: correction - indeed microbenchmarking is tricky and I had a bug. If the (interned) Strings are equal then .equals() and reference comparison run at roughly the same speed.

vbezhenar · on May 16, 2017

I don't think that JVM will do magic and execute something different from actual sources, even for String. On the other hand, optimizing those bytecodes to compile to perfect machine code or playing with code to produce bytecode which will be correctly optimized by JIT is more reasonable approach, not only java.lang.String would benefit from those optimizations, but any other developer with similar code.

Method call overhead is real and calling equals method is slower compared to reference comparison, but eventually JIT might decide to inline this method call, so there'll be no overhead after that. Anyway it's unlikely, that those difference will be noticeable in real code, IMO.

etatoby · on May 16, 2017

You should compare the cost of one (or two) .intern() plus '==' against a single .equals()

For example, if your code needs 10 .intern() and then performs 100 '==', that is possibly going to be faster than performing the equivalent 100 .equals() (or maybe not)

But if you are comparing a single couple of strings, .equals() is always going to be faster.

Not to mention that if you .intern() random strings, you risk running out of PermGen space. Even in the case of 100 comparisons among 10 strings, you may be better off using a custom HashMap.

You can find all this information in the article.

derriz · on May 16, 2017

I'm not suggesting that .equals() should be implemented by 2 inter() operations followed by a reference comparison. But that the implementation check whether the pair of Strings happened to be already .intern()ed, and then use the short circuit of a reference comparison.

A programmer could then decide to intern _some_ non-constant Strings in their application if it made sense (i.e. they were frequently used in comparisons or as hash keys).

And I don't think you can't run out of permgen as the String pool is a fixed size - unless I am misreading that section of the article.

However now that I've thought about a bit more, I think the flaw in the idea is unrelated to your points but that currently it's not cheap to determine if a particular String is interned. The StringTable implementation would have to be changed otherwise it would require adding storage overhead to every String which would not be acceptable. This is also probably why .hashCode() doesn't memoize it's results.