Probably not.
In general, a feature amounts to enabling or disabling some behavior when performing an operation. In the code, that translates most of the time into a simple if/else.
For example, adding new options usually looks like this PR:
https://github.com/uutils/coreutils/pull/2880/files
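In caricature (a hypothetical sketch with made-up names, not the actual code from that PR), a new option usually just threads a boolean from the argument parser into one branch:

```rust
// Hypothetical sketch: the parser sets a boolean, and the new behavior
// is toggled by a plain if/else.
fn print_line(line: &str, show_ends: bool) {
    if show_ends {
        // e.g. `cat -E` marks the end of each line with a '$'
        println!("{line}$");
    } else {
        println!("{line}");
    }
}

fn main() {
    // Imagine `show_ends` came from parsing a `-E` flag.
    print_line("hello", true); // prints "hello$"
}
```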
The performance wins usually come from using some fancy Rust features.
That’s only correct if you focus exclusively on optional features enabled by feature flags or environment variables. Some features might be around support for things like Unicode where a tool could work 90% of the time but that extra 10% of support requires implementing a lot of additional logic that slows the routine down for the other 90% of use cases too.
Also, ‘if’ causes branching, which costs a small amount of CPU overhead. So it’s not a free operation, and it can quickly add up if it’s needed inside a hot path.
> Some features might be around support for things like Unicode where a tool could work 90% of the time but that extra 10% of support requires implementing a lot of additional logic that slows the routine down for the other 90% of use cases too.
This kind of thing is where Rust really shines. The ecosystem was built post-Unicode, so things tend to support it by default. Ripgrep, for example, has been Unicode-aware from the beginning, and you have to opt out if you don't want that.
I’m aware of Rust’s support for Unicode; I was only using that as an example because it’s easy to visualise, since most people who’ve written any kind of text parsing will understand the additional computational overhead that correctly supporting Unicode costs. But while the example doesn’t directly apply to Rust, I guarantee you that there will be other edge cases in a similar vein that might cause issues.
For new flags, this is easily solved by generating and compiling (via generic monomorphization, macros, or build-time code generation) two versions of all the relevant code (including, of course, any loops calling the relevant code) and switching which one is executed depending on the value of the flag.
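As a minimal sketch of the monomorphization variant (made-up names, not code from the project): a const-generic boolean gets the compiler to emit two specialized copies of the hot loop, so the flag is branched on once, outside it:

```rust
// The const generic is resolved at compile time, so each instantiation
// contains only one side of the `if`; no per-iteration branch remains.
fn count_matches<const CASE_INSENSITIVE: bool>(haystack: &str, needle: char) -> usize {
    haystack
        .chars()
        .filter(|&c| {
            if CASE_INSENSITIVE {
                c.eq_ignore_ascii_case(&needle)
            } else {
                c == needle
            }
        })
        .count()
}

fn main() {
    let flag_from_cli = true; // imagine this came from argument parsing

    // The single runtime branch picks which monomorphized copy runs.
    let n = if flag_from_cli {
        count_matches::<true>("Dog dog DOG", 'd')
    } else {
        count_matches::<false>("Dog dog DOG", 'd')
    };
    println!("{n}");
}
```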
This is true, but also results in larger binaries. Elsewhere in this thread, it was noted that the coreutils compiled from the Rust code are already quite a bit bigger than the GNU counterparts. On many systems this might not matter much (and I guess it's fine if a software suite explicitly makes binary size a non-goal), but on some systems it does.
Well, they could be making some simplifications and assumptions that improve performance.
Also, performance is not always the only factor to consider: you can optimize a program for speed, but also for RAM usage or for the size of the binary itself.
For programs like coreutils, what matters most to me is that the programs are small. Typically you use a lot of commands in a script to do trivial operations (the input to each program is usually small), so simpler programs (with less startup time) are usually better.
After the first time the binary is executed, it should sit comfortably in the OS disk cache. And frankly, on a modern SSD, the sector size is large enough that the only difference is reading 3 sectors instead of 1 to call "ls". It barely matters if it gets batched.
Slightly less than half, but yes. That could certainly be relevant. On the other hand, inevitably some tests will be fragile or even outright wrong, so while your solution passes, a different (possibly better/faster) solution fails because the test is bad.
For example, suppose you're implementing a case-insensitive sort. You write a test and tweak it slightly so that it passes as you expected. I come along and write a slightly faster case-insensitive sort, and mine fails. Upon examining the test I discover it thinks I ought to sort (rat, doG, cat, DOG, dog, DOg) into (cat, doG, dog, DOG, DOg, rat), but I get (cat, doG, DOG, dog, DOg, rat). My answer seems, if anything, better, and certainly not wrong, but it fails your test.
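For what it's worth, the second ordering is exactly what a stable sort with a case-insensitive key produces (illustrative Rust, not from either implementation):

```rust
// Rust's sort_by is stable: entries whose keys compare equal keep their
// original input order.
fn main() {
    let mut words = ["rat", "doG", "cat", "DOG", "dog", "DOg"];
    words.sort_by(|a, b| a.to_lowercase().cmp(&b.to_lowercase()));
    println!("{words:?}");
    // ["cat", "doG", "DOG", "dog", "DOg", "rat"], the "failing" answer
    // above; it preserves the input order of the equal keys.
}
```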
The test is right because it pins down one specific tie-breaking order, based on where upper- and lower-case letters sit in the character set. There is probably a ton of scripting and other software out there that assumes this is how things are sorted. You also want the sort to be stable and deterministic, so that sorting the same sequence twice gives the exact same result. So you need this sort of definition of how things are supposed to be sorted.
So while I agree that, without the historical context and compatibility for human consumption, your way of sorting is probably fine, you could wreak major havoc by offering your sort as a drop-in replacement. If you only have a few users, change management for something like this is relatively easy. For the user base of coreutils? Think twice before attempting that change.
If the test is actually the specification then, sure, I guess, but in most cases it isn't.
Notice that the "failed" example exhibits stability, which you claimed was desirable, while neither exhibits sorting by case (this is, after all, a case-insensitive sort). The "successful" example just swaps some of the list items for whatever reason: maybe it was how their chosen algorithm worked, maybe it's a bug. They wrote the test, so they get to fill in a "correct" answer that matches their behaviour.
Now, striving to pass such tests gets you bug-for-bug compatibility, which is what you want if you're an emulator, but the GNU project started out deliberately not doing bug-for-bug compatibility because it means people accuse you of copying, and so I don't see why this project should be different.
It is possible, but with benchmarks like "head -n 1000000 wikidata.xml" I doubt it. A comment in that PR says "the difference to GNU head is mostly in user time, not in system time. I suspect this is due to GNU head not using SIMD to detect newlines".
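For context, here is an illustrative sketch (using the memchr crate, the usual route to SIMD-accelerated byte search in Rust; I'm not claiming this is uutils' exact code) of why newline detection dominates: `head -n` reduces to locating the n-th newline, and memchr scans many bytes per instruction where a naive loop checks one at a time:

```rust
// Illustrative only: memchr uses SIMD where available, so finding the
// cutoff for `head -n` is much cheaper than a byte-by-byte loop.
use memchr::memchr_iter;

/// How many bytes of `buf` cover the first `n` lines?
fn bytes_for_n_lines(buf: &[u8], n: usize) -> usize {
    if n == 0 {
        return 0;
    }
    memchr_iter(b'\n', buf)
        .nth(n - 1) // position of the n-th newline
        .map(|pos| pos + 1) // include the newline itself
        .unwrap_or(buf.len()) // fewer than n lines: take everything
}

fn main() {
    let data = b"line 1\nline 2\nline 3\n";
    assert_eq!(bytes_for_n_lines(data, 2), 14); // "line 1\nline 2\n"
}
```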
Unfortunately I couldn't find a list of failed/successful tests; if that's available, I'd be happy if someone linked it.