No safe efficient ways to do three-way string comparisons in Go (go101.org)
175 points by tapirl on Oct 26, 2022 | 150 comments



> Basically no one should use strings.Compare

Others have already talked about the performance aspect but I'm just baffled at the comment basically saying nobody should use this function anyway.

I expect the need for 3-way compares isn't that uncommon, so why tell people not to use it?

It's great to have this idea that the compiler should optimize all comparison situations but a) it doesn't yet and b) people still want a 3-way compare function anyway.

Basically it's saying "don't use the function at all", or even "write your own copy of this function". Even if the compiler one day becomes smart enough to optimize the pattern, you'd still end up with tons of code duplication if people followed the advice.

And for people who care about performance, it means they'll use a 3rd party library or implement their own "optimizations" that might not be effective, or might even be buggy.

Literally nobody is benefiting from this.

What's the point of providing this function while actually thinking nobody should use it, and saying so in a code comment, no less, rather than in the documentation?

Now, there may be other reasons, e.g. the author thinking 3-way compare is a bad pattern in the first place. If argued well, maybe I could agree. But that argument isn't made here.

I'm also not saying they needed to optimize this off the bat. A comment saying "we don't think there is a big demand yet, will optimize when we see the demand" would've been acceptable.


The comment is confusing, but the idea seems to be that instead of calling this function, you should inline the code - that is, just write the comparisons yourself. You don't need a function call.
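
That is, something like this minimal sketch, for two strings a and b:

    // inline the three-way comparison with the built-in operators
    cmp := 0
    switch {
    case a < b:
        cmp = -1
    case a > b:
        cmp = +1
    }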

I guess this is for stylistic reasons, but I don't know why anyone would feel strongly about doing it one way or the other.


I don’t think that’s true. I believe they want you to explicitly define how you want to compare strings. Comparing Unicode strings isn’t as straightforward as comparing individual bytes or code points. For example, there are multiple Unicode strings that will yield the character “ï”. If you use a naïve comparison function, you can’t be sure that it will behave as expected when it encounters the word “naïve”.
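
For example, a sketch (assuming golang.org/x/text/unicode/norm is imported):

    a := "nai\u0308ve" // "naïve" built from 'i' plus a combining diaeresis
    b := "na\u00efve"  // "naïve" with the precomposed 'ï'
    fmt.Println(a == b)                                   // false: the bytes differ
    fmt.Println(norm.NFC.String(a) == norm.NFC.String(b)) // true: equal after NFC normalization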


> I believe they want you to explicitly define how you want to compare strings.

If that were true, they wouldn't have made strings comparable in the first place, and they wouldn't recommend that callers inline the implementation (they'd recommend calling some locale-dependent function instead).

An explicit comparison like "naïve" == "naïve" or "naïve" < "naïve" isn't any more clear about how the comparison is performed than Compare("naïve", "naïve") would be.


You’re looking for Collator.CompareString

https://pkg.go.dev/golang.org/x/text/collate#Collator.Compar...
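
For example (a sketch; assumes golang.org/x/text/collate and golang.org/x/text/language are imported):

    c := collate.New(language.English)
    r := c.CompareString("naïve", "naive") // -1, 0, or +1 under English collation rules
    fmt.Println(r)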


Yes, I mentioned that that's basically what it says. But why would you tell people to actively duplicate code? It's like "I don't get what this type of function does and have to look it up all the time, better to have everyone inline this so it's clearer".

To be fair, looking at the bad examples of its use shown by one of the other folks here, I agree some could have been avoided had the authors been forced to inline the comparison; but funnily enough those examples exist anyway, despite the wishes of the function's author.

Nothing was achieved here.


This is a part of Go's rather odd perspective that I appreciate.

I've worked in codebases with library helpers for everything that mostly served to turn two clear lines into a function call. Once such functions exist, folks feel obligated to use them; after all, why duplicate code? So now, what would've been a couple dozen lines of self-contained straightforward code has 3 imports and 5 functions you need to be familiar with to understand it. It also makes compilation more expensive.

I'm not saying library functions are bad or anything like that, but there's value to keeping the typical vocabulary of code small, of not adding new dependencies to solve trivial problems. Especially in a standard library that is widely used and you expect to be maintained for years. Providing functions that solve problems that are impossible or tricky in the language is important; providing `Plus(a, b int) int` that wraps `+` makes the library worse.

I think there's a good argument for providing Compare() and Abs() and other fairly trivial functions even if it is perhaps less effort to not use them in most cases, but for a stdlib I can appreciate the logic of leaving out what isn't providing clear value.


I would understand that if the method was left out. But in this case, a strings.Compare method is provided but the implementation says not to use it. That's the worst of both worlds. The stdlib still has this extra method and users will feel obligated to use it because it exists but the method is inefficient.


The standard library can't ever remove anything because of the Go compatibility promise, and a certain percentage of it is mistakes. Once some functionality is later realized to be poor or incorrect, and the design prevents fixing it, there is not much that can be done other than telling people not to use it anymore. What you refer to as the worst of both worlds is unfortunately inevitable, but at least it's in service of a greater goal.

edit: clarity


That is what happens when they refuse to adopt a deprecation process. Even Java, .NET, C and C++, with their high regard for backwards compatibility, have such processes, and have removed features from their standard libraries and languages.


Go marks features as deprecated, though, both in source code and in the docs. For example, strings.Title() has been deprecated, and the golang.org/x/text/cases package should be used instead. The backwards-compatibility guarantee merely means that the feature won't be removed from the library in Go 1.x, but when you write new code you won't use it, and the preferred way of doing the same thing is mentioned in the docs and version notes. I think it's good not to remove deprecated functions within the same major version.
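
For example, a sketch of the migration (the language tag here is my choice, not mandated by the docs):

    // assumes: import ("golang.org/x/text/cases"; "golang.org/x/text/language")
    s := cases.Title(language.English).String("hello world") // "Hello World", replacing strings.Title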


Except Go is almost certainly never getting another major version; I am waiting for Go 1.10000.0, given how decisions are made in the ecosystem.


What's wrong with Go 2 with tools to automate migrations? Of course that limits the scale of possible changes, but moving methods between packages should be possible with this approach.


Community culture, most likely it will never happen as such.


Change doesn’t have to come at the pace of npm. Let’s be charitable, people. :)


I think manual three-way comparisons are very clear. But so is strings.Compare. For better or worse, copying code is a pretty normal thing to do in Go. See "A little copying is better than a little dependency." [0]

[0]: https://www.youtube.com/watch?v=PAAkCSZUG1c&t=9m28s&themeRef...


I even somewhat agree with that principle, but I don’t think it applies to using the standard library, which is (a) not little and (b) not a new dependency you add to the project.

If that’s the justification to keep an implementation bad on purpose (which I’m not sure is the actual intention, but at least that’s what the referenced comment claims), then I’m not even sure I speak the same language as the people who made that decision.


I seem to recall one of go’s creators snapped back at a similar question about code duplication with “what’s the matter, are your fingers broken?”

I think that sums up the philosophy, though maybe not as well as “Nothing was achieved here”, which I would like to translate into Latin and get on some stickers.


[flagged]


Since we've asked you many times to stop posting flamebait and you're still doing it, I've banned the account. It's not what this site is for, and it destroys what it is for.

If you don't want to be banned, you're welcome to email hn@ycombinator.com and give us reason to believe that you'll follow the rules in the future. They're here: https://news.ycombinator.com/newsguidelines.html.


How much do you know about the authors of go? It is objectively untrue that they don’t understand computers.

Maybe just maybe, people with decades of software experience might understand that the weak link in software is the editor not the author, and are optimizing for different priorities than you have.

But mostly I want to say: avoid assuming someone else doesn’t understand something because they didn’t do what you would do. You should always figure out what they were trying to do first.


Such a bizarre comment for a language designed - by the creator of Unix no less - to work well with tools and to be easily parsed. High likelihood it was posted from a Unix device too.


Go doesn't have many features of other languages. There are several competing goals, including keeping the language small, not hiding complexity, etc.

Manually writing the three-way comparison fits in with these goals. If slices were comparable then I'd wager that bytes.Compare would not exist.
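
(Indeed, []byte doesn't support == or < at all; slices are only comparable to nil, so the functions have to exist there. A quick sketch, assuming "bytes" and "fmt" are imported:)

    a, b := []byte("abc"), []byte("abd")
    // a == b would not compile: slices are only comparable to nil
    fmt.Println(bytes.Equal(a, b))   // false
    fmt.Println(bytes.Compare(a, b)) // -1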


What sticks out to me is the part saying "the compiler should be changed". It's weird enough that this isn't even a doc comment but a comment inside the function (making it harder for people using the function to notice), but even as someone who thinks the passive voice is often unfairly maligned, the phrasing immediately brings to mind the question "who should change the compiler?" Is the Go standard library not maintained by the same group of people as the compiler, or is this comment just a doubly obfuscated "TODO"?


It's not a TODO yet; it's a note that says: if you think this would be a good idea, do that instead. It's a potential TODO-to-be, waiting for the need.


Is that better? That would mean that they have a method in their API that they don't think anyone should use but haven't deprecated it or documented it that way and have no plans to do anything about it.


strings.Compare is mostly used in trivial demonstration programs, and is there because bytes.Compare is there. It makes https://pkg.go.dev/sort#Find documentation simpler. Deprecating strings.Compare would make sort documentation worse.

Real code tends to not be that trivial, and that's why real code most likely shouldn't be using strings.Compare.


Even if the compiler had good optimization of three way compares, this would be useful for passing to a higher order function. Say a function that allows you to specify how strings should be ordered.
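
For instance (a sketch; SortFunc here is a hypothetical helper, not a stdlib one):

    // a higher-order sort that takes a three-way comparator;
    // usage: SortFunc(names, strings.Compare)
    func SortFunc(ss []string, cmp func(a, b string) int) {
        sort.Slice(ss, func(i, j int) bool { return cmp(ss[i], ss[j]) < 0 })
    }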


It is like Deno deprecating fs.exists().[1]

[1] https://github.com/denoland/deno_std/discussions/2102


> you still end up with tons of code duplication if people followed the advice.

Code duplication is the go way


[flagged]


> it is designed to be friendly to enterprises who have to produce code

Our two person engineering team has found this to be the ultimate form of user-friendliness as our goal is to pragmatically deliver software.


I think what they're saying is that you need more code in Go to produce output equivalent to other languages, which is inherently unfriendly to developers. And to a certain extent that is true, but it disregards the intangible benefits of Go, such as its balance between simplicity and the ability to make it perform.


No. I'm rephrasing the design intent for the language, as outlined in various talks. For example, here: https://go.dev/talks/2012/splash.article


The optimal version does seem to exist here (per the comment too):

https://github.com/golang/go/blob/d28bf6c9a2ea9b992796738d03...

So the goal was to intentionally nerf Compare() to discourage code the golang authors considered less clear. I'm not sure bad performance is really discouraging usage though, it just penalizes folks that use stdlib.

I wonder if they'd accept a PR to switch to runtime.cmpstring today?


> "intentionally nerf Compare() to discourage code the golang authors considered less clear"

If that's true, it further proves that I disagree with the philosophy of the Go designers on pretty much everything.

If your language provides multiple ways to do something, they should all be optimized in good faith. To do what you suggested is insane and user-hostile.


A sane language would deprecate the function so the compiler warns (or dies) when encountering it. Go's compiler is happy to die because I have an unused import, but not this?


Well, the question is what you want to optimize for. It's a library function that exists for consistency's sake; it's clear and simple, and reasonably fast (and near optimal in some cases).

For optimal performance but suboptimal clarity, they could use a runtime implementation, but ideally the clear code would be fast, so best not to compromise the clarity for performance unless it proves important.

I've been following Go content for years, and this is the first I can recall hearing of this, so I'm mostly fascinated that people care, given that this seems to be doing what the most classic optimization advice suggests and not messing with it prematurely.


The most classic optimization advice is based on a complete misquote. Here is some critical context:

> The conventional wisdom shared by many of today's software engineers calls for ignoring efficiency in the small; but I believe this is simply an overreaction to the abuses they see being practiced by pennywise-and-pound-foolish programmers

> in established engineering disciplines a 12% improvement, easily obtained, is never considered marginal

> when it's a question of preparing quality programs, I don't want to restrict myself to tools that deny me such efficiencies.

> We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

Keep in mind that in this case the argument is being made against goto, which is effectively an inlined `jmp` instruction that can do all sorts of insane things just to save a few instructions. This quote is discouraging a case where the complexity is extreme and the benefit is minor.

All that people can seem to remember is "premature optimization is the root of all evil".

[PDF] https://dl.acm.org/doi/pdf/10.1145/356635.356640


“Avoid premature optimization” means to do the cleanest/clearest thing now, and only optimize later when you have the full picture. Arguably, that advice doesn’t apply to standard library functions anyway, at least not to the same extent. But if it did: How is adding a new, “bad” implementation any clearer than simply calling the already existing optimized one that the GP linked to?


You should optimize when it’s net useful, not just because you know how. Making a random function faster and harder to maintain is bad engineering if the speed isn’t useful.


> Making a random function faster and harder to maintain is bad engineering if the speed isn’t useful.

1) It's not some random function, it's part of the standard library.

2) Three-way string compares are far from uncommon, and string comparisons are relatively expensive. Therefore, a speedup would be useful.

Optimizing this function would not count as "premature".


Again: How is adding a new, “bad” implementation any easier to maintain than simply calling the already existing optimized one that the GP linked to?


There’s no such thing as premature optimization in a language standard library. Every function should be optimized to the bitter end because everybody uses the standard library.


> If your language provides multiple ways to do something

Yeah, no. The founding principle of Go is to avoid multiple ways. If multiple ways emerge anyway, move things&people around until one of them becomes strongly preferred.

There's nothing inherently bad about the approach, it's a choice which has costs and benefits.


It actually used to call this, but was changed in 2015: https://github.com/golang/go/commit/fd4dc91a96518fdbb47781f9...

I'm not entirely sure I see the problem, or why just using runtime·cmpstring is such a bad thing; it doesn't seem that "overengineered" to me.


Yeah, that's a pretty small amount of clean up. Someone should try to revert the patch and reland it.


It's because people started putting "always use strings.Compare" in their style guides.


Even if strings.Compare is optimized, it is not always better than general comparison operators.

Both strings.Compare and comparison operators have their respective best use scenarios.


> The function is here ONLY for symmetry with package bytes.

> This function should be used ONLY if it makes code clearer.

> It is not here for performance. Remove any performance benefit.

Insane troll logic, pure and simple.


Completely OT but, after 5 years of writing Go, TIL you can assign multiple LHS from multiple RHS:

    c1, c2 := a[i], b[i]


Well, duh... I mean, that's the same mechanism that allows a function to return multiple values, and Go prides itself on its orthogonal features, so of course you can do that. Also handy for swapping: "x, y = y, x". Go isn't always more verbose than other languages...


Really bizarre. It seems like it wouldn't have been much more work to just implement it properly. Instead people are supposed to wait until the compiler magically gets smart enough to optimize the pattern... but the pattern and method are both intentionally slow, so there will never be usage pressure to optimize it.

A reasonable compromise would be to just implement a single pass three-way compare in native Go instead of optimized assembler, and then if users keep requesting it be optimized, at that point make the compiler improvements or write the hand-tuned assembler version.

Otherwise what you're going to get is people using messy workarounds to do a 3-way compare, like doing a byte-wise compare that isn't unicode-correct. Blech.


It was previously implemented as a special internal function https://github.com/golang/go/commit/fd4dc91a96518fdbb47781f9...

So, making it a simple Go function was more work (at least, as an individual change) because they could've left it.

A three-way compare in native Go would likely be slower in most cases than the "slow" version that exists there: in the actually-equal or different-length cases, the "slow" one gets sent directly to optimized platform-tuned assembly, and the other cases still end up in tuned multibyte comparison code that likely wouldn't be possible in the stock Go compiler without clever bounds-check removal. A compiler that can make that three-way pattern quite fast is desirable, and there are reasons to do that independent of this function, but even Rust uses unsafe and farms out to built-in tuned memcmp routines for three-way compare of strings.

My theory is that strings.Compare _was_ known to be faster, and people starting preferring it because it was faster, and that in part prompted the change. Most engineers use a faster approach if available, even if it is a bit clunkier and not necessary (as this comment section shows, many folks are outraged at the idea of code not optimized for maximal performance). Encouraging bad use because a function is unintentionally faster than the naive thing is a bug in a stdlib.


I added the original fast implementation in this CL https://go.dev/cl/2828 because I found it useful, clear, and efficient in tuple "less" comparisons like this:

   if cmp := strings.Compare(x.first, y.first); cmp != 0 {
       return cmp < 0
   }
   if cmp := strings.Compare(x.second, y.second); cmp != 0 {
       return cmp < 0
   }
   ...
Compared to the != then < approach, it makes only a single pass over the string data. To this day I never understood the justification for intentionally making it slower, or why the style of code above isn't reasonable.


Any sort that doesn’t take into account Unicode collation isn’t Unicode correct.


I’d argue that any string comparison which does not take into account collation is inherently broken. Even in the pure ASCII English-language case, a naïve comparison on values won’t give desirable results since abc123 will come before abc99 even though a reader would expect otherwise. Just because we’ve tolerated crappy string sorting for sixty years doesn’t mean we should continue to do so.


For some applications (eg sticking stuff in an ordered data structure), you just need any consistent ordering, but don't care too much about exactly which one.


I've been developing my own version string parser for a couple weeks now, in golang.

It's ridiculous to what lengths you have to go to understand which part of a string comes earlier or later.

Simple example: semantic versioning allows "1.2.3alpha" and also "1.2.3-beta", but which one comes first now...

- Is 1.2.3 > 1.2.3omega?

- Is 1.2.3 > 1.2.3beta?

- Is 1.2.3gamma > 1.2.3?

In the Linux world it gets even funnier because they invented SONAME fields that reflect breaking API changes instead of forcing packages to comply with semantic versioning syntax. Oftentimes there is a package version of e.g. 0.4.7 that has an SONAME of 12.7 on the filesystem.

Add to that the ~prerelease suffix syntax in Debian based distros which are maintained downstream, and all the +buildid or .commithash or -revision123 suffixes and you've landed in string comparison hell.

When I started I would have never guessed that this is such a complex problem to solve in golang.
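
For what it's worth, golang.org/x/mod/semver at least nails down the strict-SemVer precedence rules, though it won't help with the distro-specific suffixes; a sketch:

    // note: this package requires the "v" prefix on versions
    fmt.Println(semver.Compare("v1.2.3", "v1.2.3-beta")) // 1: a release outranks its pre-release
    fmt.Println(semver.IsValid("v1.2.3alpha"))           // false: no hyphen, so not valid SemVer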


> semantic versioning allows "1.2.3alpha"

Perhaps I missed it, but I thought [Semantic Versioning](https://semver.org/) required a “-“ between the patch number and a prerelease identifier since at least version 1.0.0 (with 1.0.0-beta allowing a “.” instead of a “-“), no?


Yes, version string comparison is hard because people have all sorts of unstandardized ideas about version strings. Not sure why you seem to believe there’s golang-specific difficulty here.


> semantic versioning allows "1.2.3alpha" and also "1.2.3-beta"

No, according to the spec, the hyphen is mandatory: "A pre-release version MAY be denoted by appending a hyphen and a series of dot separated identifiers immediately following the patch version."


You interpreted your own copy/pasted answer wrongly.

MAY does not mean the same thing as MUST. MAY is optional, MUST is mandatory.

> https://semver.org/#spec-item-9


You may denote a prerelease version. Or not. But you can't just append any crap you like and call it the prerelease version.


The wording is ambiguous, but the BNF later in the spec [1] agrees with your interpretation. Valid version numbers are three numbers separated by dots, followed by either a minus and dot-separated pre-release versions; or a plus and dot-separated build identifiers.

1: https://semver.org/#backusnaur-form-grammar-for-valid-semver...


No, that is a misreading. The "MAY" indicates that the prerelease identifier itself is optional. However, if you do append one, it must include a leading hyphen.


  - Is 1.2.3 > 1.2.3omega? No
  - Is 1.2.3 > 1.2.3beta?  No
  - Is 1.2.3gamma > 1.2.3? Yes
I don't see ambiguity in your examples.


> - Is 1.2.3gamma > 1.2.3? Yes

I wrote this example, because I knew the answer. And your interpretation (the same as my initial one) is wrong :)

> Pre-release versions have a lower precedence than the associated normal version.

[1] https://semver.org/#spec-item-9


1.2.3gamma is not a pre-release version, it is a malformed version string (assuming SemVer). A proper SemVer is something like [0-9]+[.][0-9]+[.][0-9]+(-[0-9a-zA-Z]+)?([+][0-9a-zA-Z]+)?

> 2. A normal version number MUST take the form X.Y.Z where X, Y, and Z are non-negative integers, and MUST NOT contain leading zeroes. X is the major version, Y is the minor version, and Z is the patch version.


Well, if you want to compare strings according to some rules, then... write a custom comparator.


I'd use a capturing regex to get the first three numerals and capture the remaining string. If the remaining string exists, you can easily ignore the expected leading dash, so your malformed semver suffixes will work, and those can be trivially compared/sorted.

What about Go makes this different? That's how I'd solve this in any language.
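
A sketch of that regex in Go (the pattern and names are just illustrative):

    var verRE = regexp.MustCompile(`^(\d+)\.(\d+)\.(\d+)(.*)$`)

    // FindStringSubmatch returns nil when the input doesn't match at all
    m := verRE.FindStringSubmatch("1.2.3alpha")
    // m[1], m[2], m[3] are the numerals (compare those numerically);
    // m[4] is whatever trails them, with the expected leading dash ignored:
    fmt.Println(strings.TrimPrefix(m[4], "-")) // "alpha"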


Agreed - especially when it is potentially unknown what might follow the first three numerals. Any performance hit would be mitigated by the corresponding reduction of the downstream logic.


Then you can use bytewise comparison instead of alphabetic comparison.


Assuming you normalized the strings before.


Why would that matter, if you only want some well-defined order?


Because with (Unicode) strings, "\u006e\u0303" is defined to be equal to "\u00f1", for example. If you'd do bytewise comparison, as the above comment suggested, you may not reach the same result ¯\_(ツ)_/¯


Whether those two strings are or are not equivalent depends on the context. If we're assuming (as the GP did) a very generic context where we simply want to store arbitrary strings in a sorted data structure, then there is no reason to assume they are supposed to be interpreted as Unicode.

For a simple example, perhaps this is a list of strings that require Unicode normalization to be properly interpreted as human text that you are storing into a TreeMap for efficient retrieval. When you are adding "\u00f1" to the list, you wouldn't want the collection to say that it's already there because it already had "\u006e\u0303".


Then in that case, treat the string as an array of bytes.


It is easy to use a 3-way compare to implement normal comparison operators. It is much harder to recognize when some code is using comparison operators to implement an ad-hoc 3-way compare.

It would be harder still to know how those normal compare functions can be combined to implement an efficient 3-way compare. For strings and other built in types, this might be possible. But for user defined types that compare in weird ways, it could be very hard.

It's much better to let the programmer define the 3-way compare and use that to derive all the comparison operators.
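
That is, for some user-defined type T with a programmer-supplied three-way Compare (names hypothetical):

    // every boolean operator falls out of the one three-way compare
    func Less(a, b T) bool  { return Compare(a, b) < 0 }
    func Equal(a, b T) bool { return Compare(a, b) == 0 }
    func More(a, b T) bool  { return Compare(a, b) > 0 }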


People try to get cute with 3-way compares. Here's a real Java bug, the kind that inspired https://errorprone.info/bugpattern/BadComparable:

    class MyFile implements Comparable {
        ...
        long timestamp;
        ...
        @Override
        public int compareTo(Object other) {
            return (int)(((MyFile) other).timestamp - timestamp);
        }
        ...
    }

The int conversion loses the sign of the subtract. The particular use of the code was sorting files to delete the X number of oldest.

More examples about the dangers with just ints: https://stackoverflow.com/questions/2728793/java-integer-com...

The comparable API in Java came from the C qsort API. It would have been better to just have a less-than method. Kind of surprised to see this in Go near a core API.


But of course the real solution is to use a strong and composable type for your comparison result, rather than remove the entire thing because you originally designed the API wrong.


I like how Rust does it with the `Ordering` type and associated combinators: https://doc.rust-lang.org/stable/std/cmp/enum.Ordering.html

For two values `x` and `y` of the same type with fields `a` and `b`, this lets you compare on `a` first and then on `b` by writing

    x.a.cmp(&y.a).then(x.b.cmp(&y.b))


In fairness that was directly inspired by Haskell’s type of the same name: https://hackage.haskell.org/package/base-4.17.0.0/docs/Prelu...


Not bad, but I think Python's way (using tuples) is cleaner:

  (x.a, x.b) < (y.a, y.b)
btw you can format as code by indenting two spaces.


Indeed. 1) Get strong types 2) Make a type representing Ordering 3) Use this type everywhere for ordering things. Stop worrying about it.

Using -1, 0 and 1 to represent these feels like something that was a clever trick for low level code on a PDP-11 and hasn't been a good idea since the 1970s.


I don't blame the cute three way comparison at all for that problem. Converting number types incorrectly is an endemic problem, and using the right conversion would have made this work quite nicely.


What's the right conversion in this case? How do you narrow a negative long down to a negative int without losing the sign?


I don't know about Java or Go, but any time I've looked at a C compiler's output it's emitted pretty good assembly for "if (a>b) return 1; else if (a<b) return -1; else return 0;" kinds of input when optimization is on. As long as the type being compared is something the compiler can reason about (and/or has a peephole optimization for I guess).


You could call Long.signum(), but that's not the right solution here: you should not subtract the values, but call Long.compare(a, b) directly, which is clear, efficient and correct.

The reason to avoid subtraction is that you can get integer overflow if one operand is negative and one is positive, and then the result of Long.signum(b - a) is different from Long.compare(a, b).
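
The same failure mode is easy to reproduce with Go's fixed-width integers, which wrap on overflow (a sketch; assumes "fmt" and "math" are imported):

    var a, b int32 = math.MinInt32, 1
    fmt.Println(a - b) // 2147483647: the subtraction wrapped, flipping the sign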


It could work for integer types smaller than 4 bytes. For larger integers, while I don't think Java has this built in, you could use saturating subtraction (so it goes to INT_MIN or something on underflow).


For this situation you want it to saturate. Clamp the value between int_min and int_max.


When I look at https://go.godbolt.org/z/8jqMPh135, compiling for a few CPUs, recent compilers only generate a single call to runtime.cmpstring.

⇒ I think the comment is outdated, and there is a safe efficient way to do three-way string comparisons in go.

Caveat: I’m not that good at reading modern assembly, and I do not understand why there also is a call to runtime.memequal in that code. ⇒ Corrections welcome.


I'm not that good at it either, but I think that the runtime.memequal call is coming from the if a == b test, and that it is the if a < b test that is being replaced with a call to runtime.cmpstring.

If so, then I guess there's still a redundant call to runtime.memequal inserted by the compiler.

It's hard to imagine any of this matters at all, in practice, which is probably why the go authors haven't bothered addressing the issue.


Thanks! Never thought == could be a simple buffer comparison (after a length check, I guess). I thought locale and things being Unicode would make that harder.


Ah, this confuses everyone: strings in Golang are really just read-only byte slices. They can contain Unicode, but they don't have to, and the == operator just does a byte comparison.


This is just Go's philosophy of "we're delivering a language for bad programmers, we know what's best for them".


They're allowing devs to resist foolish performance pressure. This is a language for good developers, built by good designers.


This is really interesting. I never use three way string comparison, and so I wondered when other programmers use it. (The obvious use case is sorting, but in Go, sort.Interface expects a Less function and sort.Slice accepts a "Less" function, so you won't use it there.)

I searched through my random checked out applications to see who calls strings.Compare, and why.

Most of them are mistaken sorting. In gVisor, there is this in a test:

    ...
    cmpopts.SortSlices(func(a, b testEntryEventInfo) bool {
        return strings.Compare(string(a.Addr), string(b.Addr)) < 0
    }),
    ...
This pattern shows up a bunch:

    // Less implements sort.Interface.
    func (s fooSlice) Less(i, j int) bool {
        return strings.Compare(s[i].Whatever, s[j].Whatever) == -1
    }
In Go's Snowflake library, there is this:

    if strings.Compare(s0, name) != 0 {
            t.Error("a file was not downloaded by GET")
    }
This is exceedingly interesting to me because you'd expect Compare to cost more than < (it does more stuff), and yet people are somehow baited into writing it. I'm guessing what happens is that people don't know that "<" works for strings, and they reach for strings.Compare, so the documentation is trying to talk them out of a program that doesn't actually make sense. In the case of "!= 0", they just wanted "s0 != name", which I take to mean they had absolutely no idea that strings in Go are comparable at all.

All in all, I get the impression that people coming from other languages to Go have different expectations about what the language can do. The documentation for strings.Compare is a logical place to correct them, but it doesn't bother. Just a comment that says "sucks if you use this".

BTW, there were a couple of correct uses. Gazelle has a sort function like this in a few places:

    sort.SliceStable(sortedFiles, func(i, j int) bool {
        if cmp := strings.Compare(sortedFiles[i].Path, sortedFiles[j].Path); cmp != 0 {
            return cmp < 0
        }
        return sortedFiles[i].DefName < sortedFiles[j].DefName
     })
Here they are punished for the inefficiency of strings.Compare; they would actually take advantage of it in the common case where the paths aren't equal, compared to the easier approach of:

    if xs[i].Path != xs[j].Path {
        return xs[i].Path < xs[j].Path
    }
    return xs[i].DefName < xs[j].DefName
Another correct use was in "goja" (a Javascript interpreter), which uses strings.Compare to implement Javascript's <string>.compareTo(<string>). Whether or not Javascript code is using compareTo correctly is an analysis I do not have the energy to do :)


> so the documentation is trying to talk them out of...

What documentation though? There's an internal comment but it's not documentation and it's not really arguing against the use of this type of comparison (just against use of the implemented function).

If the documentation for this function went on along the lines of "we discourage the use of 3 way compares as we consider it an antipattern, do X instead, there's literally no reason to ever use 3 way compares, we know it, here's why..." maybe, but it doesn't.

Now, maybe 3 way comparison is not common but it is a distinct operation with known optimization opportunities. I wouldn't readily claim that the operation is not needed by anyone.


> If the documentation for this function went on along the lines of "we discourage the use of 3 way compares as we consider it an antipattern, do X instead, there's literally no reason to ever use 3 way compares, we know it, here's why..." maybe, but it doesn't.

But it does. It literally says "It is usually clearer and always faster to use the built-in string comparison" and it always said that.

https://pkg.go.dev/strings@go1.5#Compare

https://pkg.go.dev/strings@go1.19.2#Compare

The entire documentation for this function is four short sentences. If my code editor didn't make it easy and natural to see them, I'd rethink my shit right now.


It doesn't go into why; it's just a statement.


> What documentation though?

Yeah, sorry about that. I read the actual documentation and nothing is mentioned. I realized later on in my comment though ;)


> This is exceedingly interesting to me because you'd expect Compare to cost more than < (it does more stuff), and yet people are somehow baited into writing it.

I would expect it to be the same within a couple nanoseconds, because "more stuff" is an extremely small amount of stuff.

The naive direct implementation of a 3 way compare does the exact same comparisons as a direct implementation of less than, it merely returns more information, which makes it more generally useful.


Yeah. To me, I just think it's weird to ask for two pieces of information and then throw one of them away. Sometimes it's not a performance hit, but I feel like it's one of those things where you should get a feeling "I might be doing this wrong". But if you don't know that < exists (and obviously searched for strings.Less), then you won't get that feeling. That's what I find so fascinating; if other people's feelings are similar to mine, then they simply didn't know that you were allowed to use < here.


I don't see it as two pieces of information. It's returning one idea either way, but the tri-state tells you more than a boolean.


But that is more useful, we should agree, for implementing binary trees or sorting algorithms.


Maybe the overuse of Compare instead of ==, <, or > is from enterprise devs who've done a lot of Java work? Every time I step away from Java for 5+ years and come back, I forget that one has to use string.equals() because using == will generate incorrect results. I imagine people who use it more frequently could easily have the opposite problem.


> I never use three way string comparison, and so I wondered when other programmers use it.

Three-way comparisons show up naturally when doing a binary search or implementing binary trees.

Interestingly, the binary search implementation in the Go stdlib doesn't need it, but that's because it's only doing part of what you'd normally expect a binary search function to do and shifts the responsibility for the actual equality check to the caller [1].

[1] https://go.dev/src/sort/search.go
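
For illustration, a classic binary search sketch where the three-way result saves a second comparison per probe:

    func search(sorted []string, target string) (int, bool) {
        lo, hi := 0, len(sorted)-1
        for lo <= hi {
            mid := lo + (hi-lo)/2
            switch c := strings.Compare(target, sorted[mid]); {
            case c < 0:
                hi = mid - 1
            case c > 0:
                lo = mid + 1
            default:
                return mid, true
            }
        }
        return lo, false // not found; lo is the insertion point
    }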


Ideological purity only works when you actually give people a good alternative. Otherwise you cause more harm than good.

I see it in things like transportation, power generation, recycling. People refuse to use the best methods because then people will never switch - except they never offer the thing that you are supposed to switch to.

I'm actually kinda surprised to see it in a programing language.


What's even the ideology here? It reads like someone's taking a stand, but I can't tell what the stand is. Like, how is someone supposed to alphabetize a list of names? Surely that's a basic task that programming languages should be able to do.


Based on https://go-review.googlesource.com/c/go/+/3012?tab=comments , it sounds like they were originally focusing on _clearer_.

(I interpret that) rsc considers the native binary operators to be clearer than a function with the name `Compare`. Honestly, I'd expect people to use `==` as well over `strings.Compare`.

`==` _does_ work for strings, but `[]byte` doesn't support `==` at all (slices are only comparable to nil), which is why the `bytes.Compare` version exists and is optimized (some assembly version, probably costly to maintain per arch?).


https://github.com/golang/go/commit/fd4dc91a96518fdbb47781f9...

I interpret this as replacing a strange but more optimal call to the runtime with a straightforward but less fast stdlib implementation. rsc's comment seems more like a todo note than an ideological stance (to me at least). Something like: "I removed this weird fast way to do this that was breaking stuff b/c of linker things and replaced it with some straightforward code that's slower. If we want this to be more optimal in the future we can fix it at the compiler level." But I'm just reading into it. Someone should ask him and tell him to explain himself :) It's been 8 years. What have you been doing?


The native binary operators aren't a three way compare though.

The author seems to assume nobody needs a three way compare, which I feel is assuming a bit too much.

In particular because it is there because someone is using/needing it already.

Basically their comment actually says that everyone who does need a three way compare should just re-implement what they wrote there (and maybe they'll make sure everyone's code is optimized later). Sounds pretty bizarre to me to discourage code reuse. And if three way compares are a pattern they want to discourage in the first place it should be elaborated on in documentation.


Sort only uses '<'.


well the comment says it's faster to use the built-in string comparison operators ==, <, >, and so on.

the reason it's faster is because the function is intentionally slow, but


A very good point.

Until there are the resources and demand for a deeper redesign, the current system should be optimized where it reasonably can be, to better meet the same goal.

Sometimes local optimizations are not worth the cost to make them, a local benefit/effort maximum has been reached. Only a more global change can take things further. But this doesn't look like one of those cases.

It is also worth noting that it is very rare that actual global redesigns are ever economical. Virtually every improvement, in anything, at any scale, is a "local redesign" in some sense.

So "up with" economical local optimizations!


I’m so confused? Why wouldn’t you make Compare fast and add the optimization? I’m sure someone will run into a situation logically equivalent to a 3-way compare that the compiler can’t detect.

Hiding language features in optimization passes is a dangerous game. It’s one of the reasons I don’t like implicit tail calls.


At least tail calls are rather easy to detect purely syntactically.

But yes, an annotation that forces the compiler to optimize them (or give you a warning or even error when unable to do so) can be useful.


I hate paternalistic non-optimization like this. Just optimize it - your code sucks and you should improve it!


What I don’t understand is why they don’t first check the lengths.

Strings are so common, it’s insane not to optimize this.


In the generated code, there is a length check. In the implementation (which will often be inlined, so may vary contextually) it does a length check, and if equal does a memequal (tuned platform assembly). If that equality check fails, or the length check isn't equal, it does a runtime.cmpstring (tuned platform assembly). So, when strings are actually equal or are unequal in length, it's pretty much optimal.

The bad case is when the strings are equal in length but not in content, as that can result in two scans of the strings. Still not slow in that case, given that the work it's doing is being done quite efficiently, but some of that work is effectively wasted. On my local Go install (2020 MacBook) the difference is between about 2ns and 4ns for an 8-byte string that differs in the last byte. Naturally, this could vary quite a bit based on cache and string size, but it's fast enough not to worry about until profiles show it matters.
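
For reference, the entire implementation behind that behavior is essentially:

    func Compare(a, b string) int {
        if a == b {
            return 0
        }
        if a < b {
            return -1
        }
        return +1
    }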


This way of doing string comparisons does not seem to be the right one.

The obvious way of comparing strings is to do a comparison of the content for the minimum of their lengths (which should be computed in a branchless way), and that should be done in an optimized assembly loop, for which suitable SIMD instructions are available in most modern CPUs. In many C standard libraries the function memcmp already provides such an optimized implementation, which should be used for the comparison of strings of equal length.

Then, only when the result of the comparison is equal, the 2 string lengths are compared, to provide the final comparison result.

Therefore, when a good memcmp is already available, a string comparison consists of a minimum computation, a memcmp invocation and an optional integer comparison of the lengths.
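
A sketch of that shape in Go, with memcmp left as a hypothetical helper that three-way-compares equal-length byte content:

    func compare(a, b string) int {
        n := len(a)
        if len(b) < n { // min of the lengths (branchless in a tuned version)
            n = len(b)
        }
        if c := memcmp(a[:n], b[:n]); c != 0 { // one pass over the common prefix
            return c
        }
        // equal prefixes: the shorter string orders first
        switch {
        case len(a) < len(b):
            return -1
        case len(a) > len(b):
            return +1
        }
        return 0
    }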


Well sure, fast enough. But this is exactly why we need faster machines... Add some CPU, add some cache.

It's a 100% increase. If you're doing a lot of string parsing, it will add up. Probably not very noticeable, but maybe you'll have extra cache misses in a critical path.

If the function shouldn't be used, why create it at all, or why not emit a compiler error / runtime error?


Wouldn't the normal implementation for == check lengths?

Edit: Never mind, it doesn't: https://github.com/golang/go/blob/d28bf6c9a2ea9b992796738d03...

But checking lengths doesn't really help you: it only tells you when strings are not equal, and you would still have to walk the string to see which one is larger/smaller.


> But checking lengths doesn't really help you: it only tells you when strings are not equal, and you would still have to walk the string to see which one is larger/smaller.

Only for a three-way comparison. If all you care about is equality different lengths gives a fast path for inequality.


Because string comparison is usually alphabetic. If a string is longer/shorter, that says nothing about its ordering. Even if the lengths are equal, you still need to memcmp them to verify they are indeed equal. The length doesn't actually get you new information in this context.

I'm assuming encoding and utf-8 normalization and other similar things are not in scope when answering this.


Another option is to use `go:linkname` and call the runtime function yourself.

    // assumes: import _ "unsafe" (required for go:linkname), plus an empty .s
    // file in the package so the compiler accepts the body-less declaration

    //go:linkname cmpstring runtime.cmpstring
    func cmpstring(a, b string) int


The premise that three-way string comparisons are widely used is spurious at best.



201 instances != widely used by multiple projects. Thank you for playing.

There are many ways to approach this type of problem, and–while I understand that some may want to lean on this approach, based on their experiences with another language–this is an academic comparison, and far less useful than one might believe when building performant code.

To each their own.


Do you mean it is rare?

Man, it is the key to do binary-searching.

> ... based on their experiences with another language ...

Sorry, this is language independent.


Well, it was designed by Google. What did you expect?


Google generally engineers their stuff well. I don't like them much either, but calling them incompetent at building a very popular language, to which their reputation is attached and at risk, is rather presumptuous.


Maybe I'm not understanding the article's premise exactly, but if you're using three-way comparisons frequently, why would you not just implement `diff3`?

It's not that complicated and well documented; in fact I'm sure someone has already done it in Go.

http://www.cis.upenn.edu/~bcpierce/papers/diff3-short.pdf


It's not three-way as in comparing three strings; here they're comparing two strings, with a "three-way" result: a<b, a==b, a>b


Okay that makes a bit more sense, I was sure something wasn't registering right with me reading it.


I mean, from rsc's comment, it's a known issue. I'm guessing no one has cared enough to improve it. Making me think it's not that big of a deal that it does the extra comparison.

If people really need the extra performance, they'll use unsafe to create byte slices backed by the strings, and then use bytes.Compare. Or they'll improve the compiler.


> If people really need the extra performance, they'll use unsafe to create byte slices backed by the strings, and then use bytes.Compare

...and thus very likely end up with buggy code.


If you're having trouble because generating a 3 way string compare in Go isn't fast enough and delegating to bytes is hard to get correct, my recommendation would be to file an issue with Go. It's not difficult for them to make strings.Compare faster, and I imagine real world evidence that it'd be substantially helpful would be enough to motivate it.

Though, if 3 way compare is the bottleneck, it seems like a real possibility that there are algorithmic optimizations that may be more effective than a faster 3 way compare, like using native comparison operators, hashing, using byte slices, etc. Or maybe not; I don't think I've seen a case where it's a bottleneck so I might be inaccurately imagining the cases where it'd happen.


> my recommendation would be to file an issue with Go. It's not difficult for them to make strings.Compare faster, and I imagine real world evidence that it'd be substantially helpful would be enough to motivate it.

I agree. Squeaky wheel gets the grease.


You could type all six lines, or you could link the runtime func and yolo.


Go spaceship!


The Go designers are brilliant. This is exactly the right priority. We would all do better to think more the way they do.


This was a joke right?


No I’m being sincere. Clarity is more important than performance. This is a very thoughtful decision, it relieves devs of the idiotic pressure to micro optimize.


It's amusing to mention clarity, given the topic. Because the cargo cult of this kind of function having return type 'int' is a lack of clear thinking made manifest. An int has 4 billion distinct values, and all they want is three.

Here's clarity:

datatype order https://smlfamily.github.io/Basis/general.html#SIG:GENERAL.o...

function compare https://smlfamily.github.io/Basis/string.html#SIG:STRING.com...

    case String.compare (a, b) of
        LESS    => ...
      | EQUAL   => ...
      | GREATER => ...


So I think the strings.Compare should be implemented efficiently, to avoid breaking user expectations.

Is Go forkable (for lack of a better word)? That's the only question that comes to mind when I read the article. You have the source code, and this shouldn't be a difficult change to make, so I think "fork-and-fix and see who follows along" should be the mentality to practice in this case.


If anyone finds "a standard library string comparison function, while quite fast in absolute terms, isn't as fast as it could be" to be practical justification for a fork, I'd be very surprised.


No one will follow along.

If you want to implement strings.Compare better, you can write a package and publish it for other folks to import. The function is not particularly special.


> No one will follow along.

"The only sure path to failure is to not even try."

Or are you both implying the power of and supporting the massive monopoly that Google already has?


Encouraging people to use a different language entirely (Rust, whatever) is much much more likely to succeed than getting them to use some random Go fork.


> Or are you both implying the power of and supporting the massive monopoly that Google already has?

This is not a reasonable assessment.

Golang is massive; why would anyone switch to not-Golang which isn't backed by a large organisation and just has one string comparison optimisation?

And why would anyone trust a forking entity that forked a runtime to make a change that could be packaged as a library, as the comment you're replying to already suggests?



