No safe efficient ways to do three-way string comparisons in Go (go101.org)
175 points by tapirl on Oct 26, 2022 | 150 comments



> Basically no one should use strings.Compare

Others have already talked about the performance aspect but I'm just baffled at the comment basically saying nobody should use this function anyway.

I expect the need for 3-way compares isn't that uncommon, so why tell people not to use it?

It's great to have this idea that the compiler should optimize all comparison situations but a) it doesn't yet and b) people still want a 3-way compare function anyway.

Basically it's saying "don't use the function at all", or even "write your own copy of this function". Even if the compiler one day becomes smart enough to optimize the pattern, you'd still end up with tons of code duplication if people followed the advice.

And for people who care about performance, it means they'll use a 3rd party library or implement their own "optimizations" that might not be effective, or might even be buggy.

Literally nobody is benefiting from this.

What's the point of providing this function while actually thinking nobody should use it, and saying so in a code comment, no less, rather than in the documentation?

Now, there may be other reasons, e.g. the author thinking 3-way compare is a bad pattern in the first place. If argued well, maybe I could agree. But that argument isn't made here.

I'm also not saying they needed to optimize this off the bat. A comment saying "we don't think there is a big demand yet, will optimize when we see the demand" would've been acceptable.


The comment is confusing, but the idea seems to be that instead of calling this function, you should inline the code - that is, just write the comparisons yourself. You don't need a function call.
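
That is, something like this minimal sketch, for two strings a and b:

    // inline the three-way comparison with the built-in operators
    cmp := 0
    switch {
    case a < b:
        cmp = -1
    case a > b:
        cmp = +1
    }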

I guess this is for stylistic reasons, but I don't know why anyone would feel strongly about doing it one way or the other.


I don’t think that’s true. I believe they want you to explicitly define how you want to compare strings. Comparing Unicode strings isn’t as straightforward as comparing individual bytes or code points. For example, there are multiple Unicode strings that will yield the character “ï”. If you use a naïve comparison function, you can’t be sure that it will behave as expected when it encounters the word “naïve”.
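
For example, a sketch (assuming golang.org/x/text/unicode/norm is imported):

    a := "nai\u0308ve" // "naïve" built from 'i' plus a combining diaeresis
    b := "na\u00efve"  // "naïve" with the precomposed 'ï'
    fmt.Println(a == b)                                   // false: the bytes differ
    fmt.Println(norm.NFC.String(a) == norm.NFC.String(b)) // true: equal after NFC normalization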


> I believe they want you to explicitly define how you want to compare strings.

If that were true, they wouldn't have made strings comparable in the first place, and they wouldn't recommend that callers inline the implementation (they'd recommend calling some locale-dependent function instead).

An explicit comparison like "naïve" == "naïve" or "naïve" < "naïve" isn't any more clear about how the comparison is performed than Compare("naïve", "naïve") would be.


You’re looking for Collator.CompareString

https://pkg.go.dev/golang.org/x/text/collate#Collator.Compar...
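
For example (a sketch; assumes golang.org/x/text/collate and golang.org/x/text/language are imported):

    c := collate.New(language.English)
    r := c.CompareString("naïve", "naive") // -1, 0, or +1 under English collation rules
    fmt.Println(r)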


Yes, I mentioned that that's basically what it says. But why would you tell people to actively duplicate code? It's like "I don't get what this type of function does and have to look it up all the time, better to have everyone inline this so it's clearer".

To be fair, looking at the bad examples of its use shown by one of the other folks here, I agree some could have been avoided had the authors been forced to inline the comparison; but funnily enough those examples exist anyway, despite the wishes of the function's author.

Nothing was achieved here.


This is a part of Go's rather odd perspective that I appreciate.

I've worked in codebases with library helpers for everything that mostly served to turn two clear lines into a function call. Once such functions exist, folks feel obligated to use them; after all, why duplicate code? So now, what would've been a couple dozen lines of self-contained straightforward code has 3 imports and 5 functions you need to be familiar with to understand it. It also makes compilation more expensive.

I'm not saying library functions are bad or anything like that, but there's value to keeping the typical vocabulary of code small, of not adding new dependencies to solve trivial problems. Especially in a standard library that is widely used and you expect to be maintained for years. Providing functions that solve problems that are impossible or tricky in the language is important; providing `Plus(a, b int) int` that wraps `+` makes the library worse.

I think there's a good argument for providing Compare() and Abs() and other fairly trivial functions even if it is perhaps less effort to not use them in most cases, but for a stdlib I can appreciate the logic of leaving out what isn't providing clear value.


I would understand that if the method was left out. But in this case, a strings.Compare method is provided but the implementation says not to use it. That's the worst of both worlds. The stdlib still has this extra method and users will feel obligated to use it because it exists but the method is inefficient.


The standard library can't ever remove anything because of the Go compatibility promise, and a certain percentage of it is mistakes. Once some functionality is later realized to be poor or incorrect, and the design prevents fixing it, there is not much that can be done other than telling people not to use it anymore. What you refer to as the worst of both worlds is unfortunately inevitable, but at least it's in service of a greater goal.

edit: clarity


That is what happens when they refuse to adopt a deprecation process. Even Java, .NET, C and C++, with their high regard for backwards compatibility, have such processes, and have removed features from their standard libraries and languages.


Go marks features as deprecated, though, both in source code and in the docs. For example, strings.Title() has been deprecated, and the golang.org/x/text/cases package should be used instead. The backwards-compatibility guarantee merely means that the feature won't be removed from the library in Go 1.x, but when you write new code you won't use it, and the preferred way of doing the same thing is mentioned in the docs and version notes. I think it's good not to remove deprecated functions within the same major version.
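
For example, a sketch of the migration (the language tag here is my choice, not mandated by the docs):

    // assumes: import ("golang.org/x/text/cases"; "golang.org/x/text/language")
    s := cases.Title(language.English).String("hello world") // "Hello World", replacing strings.Title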


Except Go is almost certainly never getting another major version; I am waiting for Go 1.10000.0, given how decisions are made in the ecosystem.


What's wrong with Go 2 with tools to automate migrations? Of course that limits the scale of possible changes, but moving methods between packages should be possible with this approach.


Community culture, most likely it will never happen as such.


Change doesn’t have to come at the pace of npm. Let’s be charitable, people. :)


I think manual three-way comparisons are very clear. But so is strings.Compare. For better or worse, copying code is a pretty normal thing to do in Go. See "A little copying is better than a little dependency." [0]

[0]: https://www.youtube.com/watch?v=PAAkCSZUG1c&t=9m28s&themeRef...


I even somewhat agree with that principle, but I don’t think it applies to using the standard library, which is (a) not little and (b) not a new dependency you add to the project.

If that’s the justification to keep an implementation bad on purpose (which I’m not sure is the actual intention, but at least that’s what the referenced comment claims), then I’m not even sure I speak the same language as the people who made that decision.


I seem to recall one of go’s creators snapped back at a similar question about code duplication with “what’s the matter, are your fingers broken?”

I think that sums up the philosophy, though maybe not as well as “Nothing was achieved here”, which I would like to translate into Latin and get on some stickers.


[flagged]


Since we've asked you many times to stop posting flamebait and you're still doing it, I've banned the account. It's not what this site is for, and it destroys what it is for.

If you don't want to be banned, you're welcome to email hn@ycombinator.com and give us reason to believe that you'll follow the rules in the future. They're here: https://news.ycombinator.com/newsguidelines.html.


How much do you know about the authors of go? It is objectively untrue that they don’t understand computers.

Maybe just maybe, people with decades of software experience might understand that the weak link in software is the editor not the author, and are optimizing for different priorities than you have.

But mostly I want to say: avoid assuming someone else doesn’t understand something because they didn’t do what you would do. You should always figure out what they were trying to do first.


Such a bizarre comment for a language designed - by the creator of Unix no less - to work well with tools and to be easily parsed. High likelihood it was posted from a Unix device too.


Go doesn't have many features of other languages. There are several competing goals, including keeping the language small, not hiding complexity, etc.

Manually writing the three-way comparison fits in with these goals. If slices were comparable then I'd wager that bytes.Compare would not exist.
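
(Indeed, []byte doesn't support == or < at all; slices are only comparable to nil, so the functions have to exist there. A quick sketch, assuming "bytes" and "fmt" are imported:)

    a, b := []byte("abc"), []byte("abd")
    // a == b would not compile: slices are only comparable to nil
    fmt.Println(bytes.Equal(a, b))   // false
    fmt.Println(bytes.Compare(a, b)) // -1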


What sticks out to me is the part saying "the compiler should be changed". It's weird enough that this isn't even a doc comment but a comment inside the function (making it harder for people using the function to notice), but even as someone who thinks the passive voice is often unfairly maligned, the phrasing immediately brings to mind the question "who should change the compiler?" Is the Go standard library not maintained by the same group of people as the compiler, or is this comment just a doubly obfuscated "TODO"?


It's not a TODO yet; it's a note that says: if you think this would be a good idea, do that instead. It's a potential TODO-to-be, waiting for the need.


Is that better? That would mean that they have a method in their API that they don't think anyone should use but haven't deprecated it or documented it that way and have no plans to do anything about it.


strings.Compare is mostly used in trivial demonstration programs, and is there because bytes.Compare is there. It makes https://pkg.go.dev/sort#Find documentation simpler. Deprecating strings.Compare would make sort documentation worse.

Real code tends to not be that trivial, and that's why real code most likely shouldn't be using strings.Compare.


Even if the compiler had good optimization of three way compares, this would be useful for passing to a higher order function. Say a function that allows you to specify how strings should be ordered.
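
For instance (a sketch; SortFunc here is a hypothetical helper, not a stdlib one):

    // a higher-order sort that takes a three-way comparator;
    // usage: SortFunc(names, strings.Compare)
    func SortFunc(ss []string, cmp func(a, b string) int) {
        sort.Slice(ss, func(i, j int) bool { return cmp(ss[i], ss[j]) < 0 })
    }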


It is like Deno deprecating fs.exists().[1]

[1] https://github.com/denoland/deno_std/discussions/2102


> you still end up with tons of code duplication if people followed the advice.

Code duplication is the go way


[flagged]


> it is designed to be friendly to enterprises who have to produce code

Our two person engineering team has found this to be the ultimate form of user-friendliness as our goal is to pragmatically deliver software.


I think what they're saying is that you need more code in Go to produce output equivalent to other languages, which is inherently unfriendly to developers. And to a certain extent that is true, but it disregards the intangible benefits of Go, such as its balance between simplicity and the ability to make it perform.


No. I'm rephrasing the design intent for the language, as outlined in various talks. For example, here: https://go.dev/talks/2012/splash.article


The optimal version does seem to exist here (per the comment too):

https://github.com/golang/go/blob/d28bf6c9a2ea9b992796738d03...

So the goal was to intentionally nerf Compare() to discourage code the golang authors considered less clear. I'm not sure bad performance is really discouraging usage though, it just penalizes folks that use stdlib.

I wonder if they'd accept a PR to switch to runtime.cmpstring today?


> "intentionally nerf Compare() to discourage code the golang authors considered less clear"

If that's true, it further proves that I disagree with the philosophy of the Go designers on pretty much everything.

If your language provides multiple ways to do something, they should all be optimized in good faith. To do what you suggested is insane and user-hostile.


A sane language would deprecate the function so the compiler warns (or dies) when encountering it. Go's compiler is happy to die because I have an unused import, but not this?


Well, the question is what you want to optimize for. It's a library function that exists for consistency's sake; it's clear and simple, and reasonably fast (and near optimal in some cases).

For optimal performance but suboptimal clarity, they could use a runtime implementation, but ideally the clear code would be fast, so best not to compromise the clarity for performance unless it proves important.

I've been following Go content for years, and this is the first I can recall hearing of this, so I'm mostly fascinated that people care, given that this seems to be doing what the most classic optimization advice suggests and not messing with it prematurely.


The most classic optimization advice is based on a complete misquote. Here is some critical context:

> The conventional wisdom shared by many of today's software engineers calls for ignoring efficiency in the small; but I believe this is simply an overreaction to the abuses they see being practiced by pennywise-and-pound-foolish programmers

> in established engineering disciplines a 12% improvement, easily obtained, is never considered marginal

> when it's a question of preparing quality programs, I don't want to restrict myself to tools that deny me such efficiencies.

> We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

Keep in mind that in this case the argument is being made against goto, which is effectively an inlined `jmp` instruction that can do all sorts of insane things just to save a few instructions. This quote is discouraging a case where the complexity is extreme and the benefit is minor.

All that people can seem to remember is "premature optimization is the root of all evil".

[PDF] https://dl.acm.org/doi/pdf/10.1145/356635.356640


“Avoid premature optimization” means to do the cleanest/clearest thing now, and only optimize later when you have the full picture. Arguably, that advice doesn’t apply to standard library functions anyway, at least not to the same extent. But if it did: How is adding a new, “bad” implementation any clearer than simply calling the already existing optimized one that the GP linked to?


You should optimize when it’s net useful, not just because you know how. Making a random function faster and harder to maintain is bad engineering if the speed isn’t useful.


> Making a random function faster and harder to maintain is bad engineering if the speed isn’t useful.

1) It's not some random function, it's part of the standard library.

2) Three-way string compares are far from uncommon, and string comparisons are relatively expensive. Therefore, a speedup would be useful.

Optimizing this function would not count as "premature".


Again: How is adding a new, “bad” implementation any easier to maintain than simply calling the already existing optimized one that the GP linked to?


There’s no such thing as premature optimization in a language standard library. Every function should be optimized to the bitter end because everybody uses the standard library.


> If your language provides multiple ways to do something

Yeah, no. The founding principle of Go is to avoid multiple ways. If multiple ways emerge anyway, move things&people around until one of them becomes strongly preferred.

There's nothing inherently bad about the approach, it's a choice which has costs and benefits.


It actually used to call this, but was changed in 2015: https://github.com/golang/go/commit/fd4dc91a96518fdbb47781f9...

I'm not entirely sure I see the problem, or why just using runtime·cmpstring is such a bad thing; it doesn't seem that "overengineered" to me.


Yeah, that's a pretty small amount of clean up. Someone should try to revert the patch and reland it.


It's because people started putting "always use strings.Compare" in their style guides.


Even if strings.Compare is optimized, it is not always better than general comparison operators.

Both strings.Compare and comparison operators have their respective best use scenarios.


> The function is here ONLY for symmetry with package bytes.

> This function should be used ONLY if it makes code clearer.

> It is not here for performance. Remove any performance benefit.

Insane troll logic, pure and simple.


Completely OT but, after 5 years of writing Go, TIL you can assign multiple LHS from multiple RHS:

    c1, c2 := a[i], b[i]


Well, duh... I mean, that's the same mechanism that allows a function to return multiple values, and Go prides itself on its orthogonal features, so of course you can do that. Also handy for swapping: "x, y = y, x". Go isn't always more verbose than other languages...


Really bizarre. It seems like it wouldn't have been much more work to just implement it properly. Instead people are supposed to wait until the compiler magically gets smart enough to optimize the pattern... but the pattern and method are both intentionally slow, so there will never be usage pressure to optimize it.

A reasonable compromise would be to just implement a single pass three-way compare in native Go instead of optimized assembler, and then if users keep requesting it be optimized, at that point make the compiler improvements or write the hand-tuned assembler version.

Otherwise what you're going to get is people using messy workarounds to do a 3-way compare, like doing a byte-wise compare that isn't unicode-correct. Blech.


It was previously implemented as a special internal function https://github.com/golang/go/commit/fd4dc91a96518fdbb47781f9...

So, making it a simple Go function was more work (at least, as an individual change) because they could've left it.

A three-way compare in native Go would likely be slower in most cases than the "slow" version that exists there: in the actually-equal or different-length cases, the "slow" one gets sent directly to optimized platform-tuned assembly, and the other cases still end up in tuned multibyte comparison code that likely wouldn't be possible in the stock Go compiler without clever bounds-check removal. A compiler that can make that three-way pattern quite fast is desirable, and there are reasons to do that independent of this function, but even Rust uses unsafe and farms out to built-in tuned memcmp routines for three-way compare of strings.

My theory is that strings.Compare _was_ known to be faster, and people starting preferring it because it was faster, and that in part prompted the change. Most engineers use a faster approach if available, even if it is a bit clunkier and not necessary (as this comment section shows, many folks are outraged at the idea of code not optimized for maximal performance). Encouraging bad use because a function is unintentionally faster than the naive thing is a bug in a stdlib.


I added the original fast implementation in this CL https://go.dev/cl/2828 because I found it useful, clear, and efficient in tuple "less" comparisons like this:

   if cmp := strings.Compare(x.first, y.first); cmp != 0 {
       return cmp < 0
   }
   if cmp := strings.Compare(x.second, y.second); cmp != 0 {
       return cmp < 0
   }
   ...
Compared to the != then < approach, it makes only a single pass over the string data. To this day I never understood the justification for intentionally making it slower, or why the style of code above isn't reasonable.


Any sort that doesn’t take into account Unicode collation isn’t Unicode correct.


I’d argue that any string comparison which does not take into account collation is inherently broken. Even in the pure ASCII English-language case, a naïve comparison on values won’t give desirable results since abc123 will come before abc99 even though a reader would expect otherwise. Just because we’ve tolerated crappy string sorting for sixty years doesn’t mean we should continue to do so.


For some applications (eg sticking stuff in an ordered data structure), you just need any consistent ordering, but don't care too much about exactly which one.


I've been developing my own version string parser for a couple weeks now, in golang.

It's ridiculous to what lengths you have to go to understand which part of a string comes earlier or later.

Simple example: semantic versioning allows "1.2.3alpha" and also "1.2.3-beta", but which one comes first now...

- Is 1.2.3 > 1.2.3omega?

- Is 1.2.3 > 1.2.3beta?

- Is 1.2.3gamma > 1.2.3?

In the Linux world it gets even funnier because they invented SONAME fields that reflect breaking API changes instead of forcing packages to comply with semantic versioning syntax. Oftentimes there is a package version of e.g. 0.4.7 that has an SONAME of 12.7 on the filesystem.

Add to that the ~prerelease suffix syntax in Debian based distros which are maintained downstream, and all the +buildid or .commithash or -revision123 suffixes and you've landed in string comparison hell.

When I started I would have never guessed that this is such a complex problem to solve in golang.
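
For what it's worth, golang.org/x/mod/semver at least nails down the strict-SemVer precedence rules, though it won't help with the distro-specific suffixes; a sketch:

    // note: this package requires the "v" prefix on versions
    fmt.Println(semver.Compare("v1.2.3", "v1.2.3-beta")) // 1: a release outranks its pre-release
    fmt.Println(semver.IsValid("v1.2.3alpha"))           // false: no hyphen, so not valid SemVer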


> semantic versioning allows "1.2.3alpha"

Perhaps I missed it, but I thought [Semantic Versioning](https://semver.org/) required a “-“ between the patch number and a prerelease identifier since at least version 1.0.0 (with 1.0.0-beta allowing a “.” instead of a “-“), no?


Yes, version string comparison is hard because people have all sorts of unstandardized ideas about version strings. Not sure why you seem to believe there’s golang-specific difficulty here.


> semantic versioning allows "1.2.3alpha" and also "1.2.3-beta"

No, according to the spec, the hyphen is mandatory: "A pre-release version MAY be denoted by appending a hyphen and a series of dot separated identifiers immediately following the patch version."


You interpreted your own copy/pasted answer wrongly.

MAY does not mean the same thing as MUST. MAY is optional, MUST is mandatory.

> https://semver.org/#spec-item-9


You may denote a prerelease version. Or not. But you can't just append any crap you like and call it the prerelease version.


The wording is ambiguous, but the BNF later in the spec [1] agrees with your interpretation. Valid version numbers are three numbers separated by dots, followed by either a minus and dot-separated pre-release versions; or a plus and dot-separated build identifiers.

1: https://semver.org/#backusnaur-form-grammar-for-valid-semver...


No, that is a misreading. The "MAY" indicates that the prerelease identifier itself is optional. However, if you do append one, it must include a leading hyphen.


  - Is 1.2.3 > 1.2.3omega? No
  - Is 1.2.3 > 1.2.3beta?  No
  - Is 1.2.3gamma > 1.2.3? Yes
I don't see ambiguity in your examples.


> - Is 1.2.3gamma > 1.2.3? Yes

I wrote this example, because I knew the answer. And your interpretation (the same as my initial one) is wrong :)

> Pre-release versions have a lower precedence than the associated normal version.

[1] https://semver.org/#spec-item-9


1.2.3gamma is not a pre-release version, it is a malformed version string (assuming SemVer). A proper SemVer is something like [0-9]+[.][0-9]+[.][0-9]+(-[0-9a-zA-Z]+)?([+][0-9a-zA-Z]+)?

> 2. A normal version number MUST take the form X.Y.Z where X, Y, and Z are non-negative integers, and MUST NOT contain leading zeroes. X is the major version, Y is the minor version, and Z is the patch version.


Well, if you want to compare strings according to some rules, then... write a custom comparator.


I'd use a capturing regex to get the first three numerals and capture the remaining string. If the remaining string exists, you can easily ignore the expected leading dash, so your malformed semver suffixes will work, and those can be trivially compared/sorted.

What about Go makes this different? That's how I'd solve this in any language.
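
A sketch of that regex in Go (the pattern and names are just illustrative):

    var verRE = regexp.MustCompile(`^(\d+)\.(\d+)\.(\d+)(.*)$`)

    // FindStringSubmatch returns nil when the input doesn't match at all
    m := verRE.FindStringSubmatch("1.2.3alpha")
    // m[1], m[2], m[3] are the numerals (compare those numerically);
    // m[4] is whatever trails them, with the expected leading dash ignored:
    fmt.Println(strings.TrimPrefix(m[4], "-")) // "alpha"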


Agreed - especially when it is potentially unknown what might follow the first three numerals. Any performance hit would be mitigated by the corresponding reduction of the downstream logic.


Then you can use bytewise comparison instead of alphabetic comparison.


Assuming you normalized the strings before.


Why would that matter, if you only want some well-defined order?


Because with (Unicode) strings, "\u006e\u0303" is defined to be equal to "\u00f1", for example. If you'd do bytewise comparison, as the above comment suggested, you may not reach the same result ¯\_(ツ)_/¯


Whether those two strings are or are not equivalent depends on the context. If we're assuming (as the GP did) a very generic context where we simply want to store arbitrary strings in a sorted data structure, then there is no reason to assume they are supposed to be interpreted as Unicode.

For a simple example, perhaps this is a list of strings that require Unicode normalization to be properly interpreted as human text that you are storing into a TreeMap for efficient retrieval. When you are adding "\u00f1" to the list, you wouldn't want the collection to say that it's already there because it already had "\u006e\u0303".


Then in that case, treat the string as an array of bytes.


It is easy to use a 3-way compare to implement normal comparison operators. It is much harder to recognize when some code is using comparison operators to implement an ad-hoc 3-way compare.

It would be harder still to know how those normal compare functions can be combined to implement an efficient 3-way compare. For strings and other built in types, this might be possible. But for user defined types that compare in weird ways, it could be very hard.

It's much better to let the programmer define the 3-way compare and use that to derive all the comparison operators.
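
That is, for some user-defined type T with a programmer-supplied three-way Compare (names hypothetical):

    // every boolean operator falls out of the one three-way compare
    func Less(a, b T) bool  { return Compare(a, b) < 0 }
    func Equal(a, b T) bool { return Compare(a, b) == 0 }
    func More(a, b T) bool  { return Compare(a, b) > 0 }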


People try to get cute with 3-way compares. Here's a real Java bug, the kind that inspired https://errorprone.info/bugpattern/BadComparable:

    class MyFile implements Comparable {
        ...
        long timestamp;
        ...
        @Override
        public int compareTo(Object other) {
            return (int)(((MyFile) other).timestamp - timestamp);
        }
        ...
    }

The int conversion loses the sign of the subtract. The particular use of the code was sorting files to delete the X number of oldest.

More examples about the dangers with just ints: https://stackoverflow.com/questions/2728793/java-integer-com...

The comparable API in Java came from the C qsort API. It would have been better to just have a less-than method. Kind of surprised to see this in Go near a core API.


But of course the real solution is to use a strong and composable type for your comparison result, rather than remove the entire thing because you originally designed the API wrong.


I like how Rust does it with the `Ordering` type and associated combinators: https://doc.rust-lang.org/stable/std/cmp/enum.Ordering.html

For two values `x` and `y` of the same type with fields `a` and `b`, this lets you compare on `a` first and then on `b` by writing

    x.a.cmp(&y.a).then(x.b.cmp(&y.b))


In fairness that was directly inspired by Haskell’s type of the same name: https://hackage.haskell.org/package/base-4.17.0.0/docs/Prelu...


Not bad, but I think Python's way (using tuples) is cleaner:

  (x.a, x.b) < (y.a, y.b)
btw you can format as code by indenting two spaces.


Indeed. 1) Get strong types 2) Make a type representing Ordering 3) Use this type everywhere for ordering things. Stop worrying about it.

Using -1, 0 and 1 to represent these feels like something that was a clever trick for low level code on a PDP-11 and hasn't been a good idea since the 1970s.


I don't blame the cute three way comparison at all for that problem. Converting number types incorrectly is an endemic problem, and using the right conversion would have made this work quite nicely.


What's the right conversion in this case? How do you narrow a negative long down to a negative int without losing the sign?


I don't know about Java or Go, but any time I've looked at a C compiler's output it's emitted pretty good assembly for "if (a>b) return 1; else if (a<b) return -1; else return 0;" kinds of input when optimization is on. As long as the type being compared is something the compiler can reason about (and/or has a peephole optimization for I guess).


You could call Long.signum(), but that's not the right solution here: you should not subtract the values, but call Long.compare(a, b) directly, which is clear, efficient and correct.

The reason to avoid subtraction is that you can get integer overflow if one operand is negative and one is positive, and then the result of Long.signum(b - a) is different from Long.compare(a, b).
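
The same failure mode is easy to reproduce with Go's fixed-width integers, which wrap on overflow (a sketch; assumes "fmt" and "math" are imported):

    var a, b int32 = math.MinInt32, 1
    fmt.Println(a - b) // 2147483647: the subtraction wrapped, flipping the sign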


It could work for integer types smaller than 4 bytes. For larger integers, while I don't think Java has this built in, you could use saturating subtraction (so it goes to INT_MIN or something on underflow).


For this situation you want it to saturate. Clamp the value between int_min and int_max.


When I look at https://go.godbolt.org/z/8jqMPh135, compiling for a few CPUs, recent compilers only generate a single call to runtime.cmpstring.

⇒ I think the comment is outdated, and there is a safe efficient way to do three-way string comparisons in go.

Caveat: I’m not that good at reading modern assembly, and I do not understand why there also is a call to runtime.memequal in that code. ⇒ Corrections welcome.


I'm not that good at it either, but I think that the runtime.memequal call is coming from the if a == b test, and that it is the if a < b test that is being replaced with a call to runtime.cmpstring.

If so, then I guess there's still a redundant call to runtime.memequal inserted by the compiler.

It's hard to imagine any of this matters at all, in practice, which is probably why the go authors haven't bothered addressing the issue.


Thanks! Never thought == could be a simple buffer comparison (after a length check, I guess). I thought locale and things being Unicode would make that harder.


Ah, this confuses everyone: strings in Golang are really just read-only byte slices. They can contain Unicode, but they don't have to, and the == operator just does a byte comparison.


This is just Go's philosophy of "we're delivering a language for bad programmers, we know what's best for them".


They're allowing devs to resist foolish performance pressure. This is a language for good developers, built by good designers.


This is really interesting. I never use three way string comparison, and so I wondered when other programmers use it. (The obvious use case is sorting, but in Go, sort.Interface expects a Less function and sort.Slice accepts a "Less" function, so you won't use it there.)

I searched through my random checked out applications to see who calls strings.Compare, and why.

Most of them are mistaken sorting. In gVisor, there is this in a test:

    ...
    cmpopts.SortSlices(func(a, b testEntryEventInfo) bool {
        return strings.Compare(string(a.Addr), string(b.Addr)) < 0
    }),
    ...
This pattern shows up a bunch:

    // Less implements sort.Interface.
    func (s fooSlice) Less(i, j int) bool {
        return strings.Compare(s[i].Whatever, s[j].Whatever) == -1
    }
In Go's Snowflake library, there is this:

    if strings.Compare(s0, name) != 0 {
            t.Error("a file was not downloaded by GET")
    }
This is exceedingly interesting to me because you'd expect Compare to cost more than < (it does more stuff), and yet people are somehow baited into writing it. I'm guessing what happens is that people don't know that "<" works for strings, and they reach for strings.Compare, so the documentation is trying to talk them out of a program that doesn't actually make sense. In the case of "!= 0", they just wanted "s0 != name", which I take to mean they had absolutely no idea that strings in Go are comparable at all.

All in all, I get the impression that people coming from other languages to Go have different expectations about what the language can do. The documentation for strings.Compare is a logical place to correct them, but it doesn't bother. Just a comment that says "sucks if you use this".

BTW, there were a couple of correct uses. Gazelle has a sort function like this in a few places:

    sort.SliceStable(sortedFiles, func(i, j int) bool {
        if cmp := strings.Compare(sortedFiles[i].Path, sortedFiles[j].Path); cmp != 0 {
            return cmp < 0
        }
        return sortedFiles[i].DefName < sortedFiles[j].DefName
     })
Here they are punished for the inefficiency of strings.Compare; they would actually take advantage of it in the common case where the paths aren't equal, compared to the easier approach of:

    if xs[i].Path != xs[j].Path {
        return xs[i].Path < xs[j].Path
    }
    return xs[i].DefName < xs[j].DefName
Another correct use was in "goja" (a Javascript interpreter), which uses strings.Compare to implement Javascript's <string>.compareTo(<string>). Whether or not Javascript code is using compareTo correctly is an analysis I do not have the energy to do :)


> so the documentation is trying to talk them out of...

What documentation though? There's an internal comment but it's not documentation and it's not really arguing against the use of this type of comparison (just against use of the implemented function).

If the documentation for this function went on along the lines of "we discourage the use of 3 way compares as we consider it an antipattern, do X instead, there's literally no reason to ever use 3 way compares, we know it, here's why..." maybe, but it doesn't.

Now, maybe 3 way comparison is not common but it is a distinct operation with known optimization opportunities. I wouldn't readily claim that the operation is not needed by anyone.


> If the documentation for this function went on along the lines of "we discourage the use of 3 way compares as we consider it an antipattern, do X instead, there's literally no reason to ever use 3 way compares, we know it, here's why..." maybe, but it doesn't.

But it does. It literally says "It is usually clearer and always faster to use the built-in string comparison" and it always said that.

https://pkg.go.dev/strings@go1.5#Compare

https://pkg.go.dev/strings@go1.19.2#Compare

The entire documentation for this function is four short sentences. If my code editor didn't make it easy and natural to see them, I'd rethink my shit right now.


It doesn't go into why; it's just a statement.


> What documentation though?

Yeah, sorry about that. I read the actual documentation and nothing is mentioned. I realized later on in my comment though ;)


> This is exceedingly interesting to me because you'd expect Compare to cost more than < (it does more stuff), and yet people are somehow baited into writing it.

I would expect it to be the same within a couple nanoseconds, because "more stuff" is an extremely small amount of stuff.

The naive direct implementation of a 3 way compare does the exact same comparisons as a direct implementation of less than, it merely returns more information, which makes it more generally useful.


Yeah. To me, I just think it's weird to ask for two pieces of information and then throw one of them away. Sometimes it's not a performance hit, but I feel like it's one of those things where you should get a feeling "I might be doing this wrong". But if you don't know that < exists (and obviously searched for strings.Less), then you won't get that feeling. That's what I find so fascinating; if other people's feelings are similar to mine, then they simply didn't know that you were allowed to use < here.


I don't see it as two pieces of information. It's returning one idea either way, but the tri-state tells you more than a boolean.


But that is more useful, we should agree, for implementing binary trees or sorting algorithms.


Maybe the overuse of Compare instead of ==, <, or > is from enterprise devs who've done a lot of Java work? Every time I step away from Java for 5+ years and come back, I forget that one has to use string.equals() because using == will generate incorrect results. I imagine people who use it more frequently could easily have the opposite problem.


> I never use three way string comparison, and so I wondered when other programmers use it.

Three-way comparisons show up naturally when doing a binary search or implementing binary trees.

Interestingly, the binary search implementation in the Go stdlib doesn't need it, but that's because it's only doing part of what you'd normally expect a binary search function to do and shifts the responsibility for the actual equality check to the caller [1].

[1] https://go.dev/src/sort/search.go
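
For illustration, a classic binary search sketch where the three-way result saves a second comparison per probe:

    func search(sorted []string, target string) (int, bool) {
        lo, hi := 0, len(sorted)-1
        for lo <= hi {
            mid := lo + (hi-lo)/2
            switch c := strings.Compare(target, sorted[mid]); {
            case c < 0:
                hi = mid - 1
            case c > 0:
                lo = mid + 1
            default:
                return mid, true
            }
        }
        return lo, false // not found; lo is the insertion point
    }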


Ideological purity only works when you actually give people a good alternative. Otherwise you cause more harm than good.

I see it in things like transportation, power generation, recycling. People refuse to use the best methods because then people will never switch - except they never offer the thing that you are supposed to switch to.

I'm actually kinda surprised to see it in a programing language.


What's even the ideology here? It reads like someone's taking a stand, but I can't tell what the stand is. Like, how is someone supposed to alphabetize a list of names? Surely that's a basic task that programming languages should be able to do.


Based on https://go-review.googlesource.com/c/go/+/3012?tab=comments , it sounds like they were originally focusing on _clearer_.

(I interpret that) rsc considers the native binary operators to be clearer than a function with the name `Compare`. Honestly, I'd expect people to use `==` as well over `strings.Compare`.

`==` _does_ work for strings, but `[]byte` doesn't support `==` at all (slices are only comparable to nil), which is why the `bytes.Compare` version exists and is optimized (some assembly version, probably costly to maintain per arch?).


https://github.com/golang/go/commit/fd4dc91a96518fdbb47781f9...

I interpret this as replacing a strange but more optimal call to the runtime with a straightforward but less fast stdlib implementation. rsc's comment seems more like a todo note than an ideological stance (to me at least). Something like: "I removed this weird fast way to do this that was breaking stuff b/c of linker things and replaced it with some straightforward code that's slower. If we want this to be more optimal in the future we can fix it at the compiler level." But I'm just reading into it. Someone should ask him and tell him to explain himself :) It's been 8 years. What have you been doing?


The native binary operators aren't a three way compare though.

The author seems to assume nobody needs a three way compare, which I feel is assuming a bit too much.

In particular because it is there because someone is using/needing it already.

Basically their comment actually says that everyone who does need a three way compare should just re-implement what they wrote there (and maybe they'll make sure everyone's code is optimized later). Sounds pretty bizarre to me to discourage code reuse. And if three way compares are a pattern they want to discourage in the first place it should be elaborated on in documentation.


Sort only uses '<'.


well the comment says it's faster to use the built-in string comparison operators ==, <, >, and so on.

the reason it's faster is because the function is intentionally slow, but


A very good point.

Until there are the resources and demand for a deeper redesign, the current system should be optimized where it reasonably can be, to better meet the same goal.

Sometimes local optimizations are not worth the cost to make them, a local benefit/effort maximum has been reached. Only a more global change can take things further. But this doesn't look like one of those cases.

It is also worth noting that it is very rare that actual global redesigns are ever economical. Virtually every improvement, in anything, at any scale, is a "local redesign" in some sense.

So "up with" economical local optimizations!


I’m so confused? Why wouldn’t you make Compare fast and add the optimization? I’m sure someone will run into a situation logically equivalent to a 3-way compare that the compiler can’t detect.

Hiding language features in optimization passes is a dangerous game. It’s one of the reasons I don’t like implicit tail calls.


At least tail calls are rather easy to detect purely syntactically.

But yes, an annotation that forces the compiler to optimize them (or give you a warning or even error when unable to do so) can be useful.


I hate paternalistic non-optimization like this. Just optimize it - your code sucks and you should improve it!


What I don’t understand is why they don’t first check the lengths.

Strings are so common, it’s insane not to optimize this.


In the generated code, there is a length check. In the implementation (which will often be inlined, so may vary contextually) it does a length check, and if equal does a memequal (tuned platform assembly). If that equality check fails, or the length check isn't equal, it does a runtime.cmpstring (tuned platform assembly). So, when strings are actually equal or are unequal in length, it's pretty much optimal.

The bad case is when the strings are equal in length but not in content, as that can result in two scans of the strings. Still not slow in that case, given that the work it's doing is being done quite efficiently, but some of that work is effectively wasted. On my local Go install (2020 MacBook) the difference is between about 2ns and 4ns for an 8-byte string that differs in the last byte. Naturally, this could vary quite a bit based on cache and string size, but it's fast enough not to worry about until profiles show it matters.
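
For reference, the entire implementation behind that behavior is essentially:

    func Compare(a, b string) int {
        if a == b {
            return 0
        }
        if a < b {
            return -1
        }
        return +1
    }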


This way of doing string comparisons does not seem to be the right one.

The obvious way of comparing strings is to do a comparison of the content for the minimum of their lengths (which should be computed in a branchless way), and that should be done in an optimized assembly loop, for which suitable SIMD instructions are available in most modern CPUs. In many C standard libraries the function memcmp already provides such an optimized implementation, which should be used for the comparison of strings of equal length.

Then, only when the result of the comparison is equal, the 2 string lengths are compared, to provide the final comparison result.

Therefore, when a good memcmp is already available, a string comparison consists of a minimum computation, a memcmp invocation and an optional integer comparison of the lengths.
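
A sketch of that shape in Go, with memcmp left as a hypothetical helper that three-way-compares equal-length byte content:

    func compare(a, b string) int {
        n := len(a)
        if len(b) < n { // min of the lengths (branchless in a tuned version)
            n = len(b)
        }
        if c := memcmp(a[:n], b[:n]); c != 0 { // one pass over the common prefix
            return c
        }
        // equal prefixes: the shorter string orders first
        switch {
        case len(a) < len(b):
            return -1
        case len(a) > len(b):
            return +1
        }
        return 0
    }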


Well sure, fast enough. But this is exactly why we need faster machines... Add some CPU, add some cache.

It's a 100% increase. If you're doing a lot of string parsing, it will add up. Probably not very noticeable, but maybe you'll have extra cache misses in a critical path.

If the function shouldn't be used, why create it at all, or why not emit a compiler error / runtime error?


Wouldn't the normal implementation for == check lengths?

Edit: Never mind, it doesn't: https://github.com/golang/go/blob/d28bf6c9a2ea9b992796738d03...

But checking lengths doesn't really help you: it only tells you when strings are not equal, and you would still have to walk the string to see which one is larger/smaller.


> But checking lengths doesn't really help you: it only tells you when strings are not equal, and you would still have to walk the string to see which one is larger/smaller.

Only for a three-way comparison. If all you care about is equality different lengths gives a fast path for inequality.


Because string comparison is usually alphabetic. If a string is longer/shorter, that says nothing about its ordering. Even if the lengths are equal, you still need to memcmp them to verify they are indeed equal. The length doesn't actually get you new information in this context.

I'm assuming encoding and utf-8 normalization and other similar things are not in scope when answering this.


Another option is to use `go:linkname` and call the runtime function yourself.

    // assumes: import _ "unsafe" (required for go:linkname), plus an empty .s
    // file in the package so the compiler accepts the body-less declaration

    //go:linkname cmpstring runtime.cmpstring
    func cmpstring(a, b string) int


The premise that three-way string comparisons are widely used is spurious at best.



201 instances != widely used by multiple projects. Thank you for playing.

There are many ways to approach this type of problem, and–while I understand that some may want to lean on this approach, based on their experiences with another language–this is an academic comparison, and far less useful than one might believe when building performant code.

To each their own.


Do you mean it is rare?

Man, it is the key to do binary-searching.

> ... based on their experiences with another language ...

Sorry, this is language independent.


Well, it was designed by Google. What did you expect?


Google generally engineers their stuff well. I don't like them much either, but calling them incompetent at building a very popular language, to which their reputation is attached and at risk, is rather presumptuous.


Maybe I'm not understanding the article's premise exactly, but if you're using three-way comparisons frequently, why would you not just implement `diff3`?

It's not that complicated and well documented; in fact I'm sure someone has already done it in Go.

http://www.cis.upenn.edu/~bcpierce/papers/diff3-short.pdf


It's not three-way as in comparing three strings; here they're comparing two strings, with a "three-way" result: a<b, a==b, a>b


Okay that makes a bit more sense, I was sure something wasn't registering right with me reading it.


I mean, from rsc's comment, it's a known issue. I'm guessing no one has cared enough to improve it. Making me think it's not that big of a deal that it does the extra comparison.

If people really need the extra performance, they'll use unsafe to create byte slices backed by the strings, and then use bytes.Compare. Or they'll improve the compiler.


> If people really need the extra performance, they'll use unsafe to create byte slices backed by the strings, and then use bytes.Compare

...and thus very likely end up with buggy code.


If you're having trouble because generating a 3 way string compare in Go isn't fast enough and delegating to bytes is hard to get correct, my recommendation would be to file an issue with Go. It's not difficult for them to make strings.Compare faster, and I imagine real world evidence that it'd be substantially helpful would be enough to motivate it.

Though, if 3 way compare is the bottleneck, it seems like a real possibility that there are algorithmic optimizations that may be more effective than a faster 3 way compare, like using native comparison operators, hashing, using byte slices, etc. Or maybe not; I don't think I've seen a case where it's a bottleneck so I might be inaccurately imagining the cases where it'd happen.


> my recommendation would be to file an issue with Go. It's not difficult for them to make strings.Compare faster, and I imagine real world evidence that it'd be substantially helpful would be enough to motivate it.

I agree. Squeaky wheel gets the grease.


You could type all six lines, or you could link the runtime func and yolo.


Go spaceship!


The Go designers are brilliant. This is exactly the right priority. We would all do better to think more the way they do.


This was a joke right?


No I’m being sincere. Clarity is more important than performance. This is a very thoughtful decision, it relieves devs of the idiotic pressure to micro optimize.


It's amusing to mention clarity, given the topic. Because the cargo cult of this kind of function having return type 'int' is a lack of clear thinking made manifest. An int has 4 billion distinct values, and all they want is three.

Here's clarity:

datatype order https://smlfamily.github.io/Basis/general.html#SIG:GENERAL.o...

function compare https://smlfamily.github.io/Basis/string.html#SIG:STRING.com...

    case String.compare (a, b) of
        LESS    => ...
      | EQUAL   => ...
      | GREATER => ...


So I think the strings.Compare should be implemented efficiently, to avoid breaking user expectations.

Is Go forkable (for lack of a better word)? That's the only question that comes to mind when I read the article. You have the source code, and this shouldn't be a difficult change to make, so I think "fork-and-fix and see who follows along" should be the mentality to practice in this case.


If anyone finds "a standard library string comparison function, while quite fast in absolute terms, isn't as fast as it could be" to be practical justification for a fork, I'd be very surprised.


No one will follow along.

If you want to implement strings.Compare better, you can write a package and publish it for other folks to import. The function is not particularly special.


> No one will follow along.

"The only sure path to failure is to not even try."

Or are you both implying the power of and supporting the massive monopoly that Google already has?


Encouraging people to use a different language entirely (Rust, whatever) is much much more likely to succeed than getting them to use some random Go fork.


> Or are you both implying the power of and supporting the massive monopoly that Google already has?

This is not a reasonable assessment.

Golang is massive; why would anyone switch to not-Golang which isn't backed by a large organisation and just has one string comparison optimisation?

And why would anyone trust a forking entity that forked a runtime to make a change that could be packaged as a library, as the comment you're replying to already suggests?



