This is only because the Rust implementations are using particularly slow code paths: SIMD/AVX optimizations require a nightly compiler, some optimizations would require unsafe code, and other languages are using particularly hacky code that would never fly in real-world software.
For example, many of the Java/C/C++ benchmarks are using custom optimizations that should be illegal for the benchmarking. Case in point, some feature custom hash maps with hashing algorithms that, while fast, would never be useful in practice because they provide no protection against collisions. You'll see a hashing algorithm written as a C preprocessor macro, for example, that merely fakes having a real algorithm, whereas the Rust examples stick to the tried-and-tested, production-grade algorithms shipped in the standard library.
> Case in point, some feature custom hash maps with hashing algorithms that, while fast, would never be useful in practice because they provide no protection against collisions.
They are useful in this case, aren't they? A custom hash map implementation can also be useful in other scenarios. This is a feature of the language that allows you to do that, so it's not "illegal" to use it. If Rust doesn't allow you to do something while another language does, that doesn't mean it should be illegal.
I don't see any tricks in Go, and it's faster than Rust and more memory-efficient in almost half of the benchmarks. So what's your excuse in those cases?
I believe that what the OP, mmstick, might have been saying is that the hashing algorithm used in the C version might be very different from the hashing algorithm in the Rust version, and this difference might be significant.
I don't speak Rust, so I can't quite tell what's going on here:
Again, I don't speak Rust, so hopefully someone can comment on this, but I don't see that same simple hash being used in the Rust version... I think it might be using a default hash function, which would probably be more collision-resistant.
I am sure that the Rust version could be made to do the same thing, but I am not sure that, as-is, this is an apples-to-apples comparison.
I have wondered the same for a while now, and I hope someone familiar with Rust can explain it: in many of the benchmarks Go is faster and/or uses less memory despite the GC. Like you, I don't see much trickiness in the Go code.
In many cases a garbage-collected language will be faster at allocation than code using naive allocation strategies (e.g. reallocating a hashmap many times as it grows instead of reserving memory upfront). This is because the garbage collector tends to have preallocated memory lying around, while RAII-style code needs to call malloc and free every time a heap object is created.
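To make that concrete, here's a minimal Rust sketch of the two strategies (the key/value types and sizes are just illustrative): growing a map from empty triggers repeated reallocation and rehashing, while reserving capacity up front does a single allocation.

```rust
use std::collections::HashMap;

fn main() {
    let n: u64 = 1_000_000;

    // Naive strategy: the map starts small and must reallocate and
    // rehash every time it outgrows its current capacity.
    let mut grown: HashMap<u64, u64> = HashMap::new();
    for i in 0..n {
        grown.insert(i, i * 2);
    }

    // Reserving up front: one allocation sized for all n entries,
    // so no rehashing happens during the inserts.
    let mut reserved: HashMap<u64, u64> = HashMap::with_capacity(n as usize);
    for i in 0..n {
        reserved.insert(i, i * 2);
    }

    assert_eq!(grown.len(), reserved.len());
}
```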
Another factor could also be that the GC in the Go examples just never runs, since the examples don't allocate enough memory. It's hard to say exactly what's happening without tracing the runtime, which would also affect the benchmarks a bit.
It's important to note that while both languages are natively compiled (and I suspect Rust would edge ahead in that category due to LLVM's backing), most of the overhead would probably come from memory, whether allocations or cache usage, which makes comparing them in microbenchmarks a little inaccurate.
> This is because the garbage collector tends to have preallocated memory lying around,
This is an allocator feature independent of garbage collection; jemalloc does this, for example. GC'd languages have the potential to do this better, since they can do additional analysis whilst tracing for little additional cost, but non-GC'd languages can still benefit from this.
Part of this is also that Go is really good at not allocating from the heap when it can allocate from the stack (escape analysis). Until Go 1.5 the Go garbage collector was pretty weak, but it didn't matter as much as it would have for a language with heavier heap allocation.
Weren't there some issues around a "custom library" for this? I distinctly remember there being some kind of argument about what is legal and what isn't.
That is, I think this is what I'm remembering:
> k-nucleotide will explicitly require built-in / library HashMap.
Since C doesn't have a standard-library hashmap, you can write an entirely custom one just for the benchmark. But since Rust has a built-in hashmap in the standard library, you cannot, even though you could write Rust code that does the same thing as the custom C hashmap.
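For what it's worth, std's HashMap is generic over its hasher, so at least the hash-function half of the trick is expressible in safe Rust. A hypothetical sketch (TrivialHasher is invented for illustration, not taken from any benchmark entry):

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// A deliberately simplistic hasher in the spirit of the C macros under
// discussion: a bare multiply-and-add over the bytes, with no real
// collision resistance. Hypothetical, not from an actual benchmark entry.
#[derive(Default)]
struct TrivialHasher(u64);

impl Hasher for TrivialHasher {
    fn finish(&self) -> u64 {
        self.0
    }

    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = self.0.wrapping_mul(31).wrapping_add(b as u64);
        }
    }
}

fn main() {
    // std::collections::HashMap accepts any BuildHasher, so only the
    // hash function is swapped out; the table itself is still std's.
    let mut counts: HashMap<&str, u32, BuildHasherDefault<TrivialHasher>> =
        HashMap::default();
    *counts.entry("GGTATT").or_insert(0) += 1;
    assert_eq!(counts["GGTATT"], 1);
}
```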
But they say 'don't write a custom hash table', not 'don't write a custom hash function'. Maybe the problem is that the data in this benchmark is just not a good way to exercise the hash tables in a given language the way the benchmark intended. That probably means the benchmark should be modified; the complaint that some implementations use 'laughably bad' hash functions that seem to be measurably decent hash functions for the data at hand seems really strange.
The "library" distinction you're drawing here is arbitrary. Or at least, my understanding of it is.
Am I allowed to use a custom HashMap for the game, or am I required to use the one in std::collections? My understanding is that it's the latter, and so that penalizes Rust or any other language that includes a map in its standard library.
> If you want an additional Rust hashmap implementation then you need to persuade the Rust community to provide one in std::collections.
Again, this means that for Rust, it _has_ to be in the standard library. But C doesn't have one in the standard library. So C programmers get to write their own, and Rust programmers don't.
If it compiles and runs, it is allowed. There is no rule for the benchmark other than producing the same end result with the best performance.
I still don't get all the moralizing over this.
The best thing here for Rust would be to achieve the best performance on this game and nothing else.
To me this game is important, and winning it matters even more. So if Rust doesn't get better results because of "morals", that is so wrong it hurts. But each time I read these responses, I ask myself whether there isn't a real problem there.
I like Rust, but performance is decisive. It is really sad to see Rust keep slipping in this game, and that makes me question whether to keep investing time in Rust.
It isn't a "moral" issue. It is about the usefulness of the benchmarks. If you're using them to make a recommendation about which programming language to choose, it is important to have a sense of how well the results generalize to the kind of programs you will be writing. Using unrealistic hacks targeted specifically at each benchmark runs counter to that goal.
If, on the other hand, you're just treating the benchmarks as a fun competition akin to code golf, then by all means, hacks ahoy!
> The best thing here for Rust would be to achieve the best performance on this game and nothing else.
That's absurd. The benchmarks game doesn't measure real-world performance in any sense whatsoever. And many of the implementations are written in ways that you'd never ever put into production code (for example, the laughably bad hashing functions that are mentioned elsewhere in this thread). From what I understand, the Rust implementations tend to not get up to those shenanigans, which makes them appear worse than the implementations from other languages that do.
It's not the goal of a programming language to be the top on a silly benchmarks game. The goal is to be an excellent language for real-world use.
But people are criticizing the C because the way it is implemented is not "realistic", or because the "hash" is a hack that would never work in real life.
I don't know the background of these people, but they couldn't be more wrong. So that looks like "morals", or maybe they've just never seen real C code out there.
You're making an assumption that the C code is allowed to use the same algorithm as the Rust code. Based on the rules of the game, that's not exactly true.
Which ones do you have in mind? The preprocessor sounds a little cheaty, but picking a hash function that's a better fit for the data is a pretty basic, practical sort of optimization.
If that's the one, it's a custom hash function that is then inlined as a macro. That still seems like a pretty vanilla C optimization that one might write in real C code.
The problem isn't that it's a macro. It's that it's a horrifically bad hash function. It's blazing fast, but completely unsuitable for use in any real code.
If it's a hash function that only works for this exact data set, I can see the argument. If it's a hash function that works for this kind of data (these particularly formatted, particularly constrained strings), it's fair play. Which one is it? 'It's not a good general purpose hash function' alone doesn't seem like a valid criticism, especially along with 'the Rust version uses a general purpose hash function'. Nobody said you had to use a general purpose hash function.
Imagine you got a million strings of variable length such that the last 4 bytes are guaranteed to be a unique value between 0 and 999999. Lopping off the last four bytes is a perfectly good hash function for that kind of data.
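A minimal Rust sketch of that hypothetical, assuming (as stated) that every key is at least four bytes long and the tails are unique:

```rust
// Hypothetical sketch: under the stated constraint that the last four
// bytes of every key are unique, those bytes already are a perfect hash.
fn last_four_bytes_hash(key: &[u8]) -> u64 {
    let n = key.len();
    // Interpret the trailing four bytes as a big-endian integer.
    u32::from_be_bytes([key[n - 4], key[n - 3], key[n - 2], key[n - 1]]) as u64
}

fn main() {
    // Distinct tails map to distinct hashes with zero mixing work.
    assert_ne!(
        last_four_bytes_hash(b"record-0001"),
        last_four_bytes_hash(b"record-0002")
    );
}
```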
Sure, but it's not a good benchmark. If you can demonstrate that you have blazing fast performance, but only for the exact input data that the benchmark uses, then you haven't demonstrated anything at all beyond an ability to tailor your code to a very specific input. But in the real world, we don't get to write our code for a very specific input (after all, if we knew what the input was ahead of time, we could just pre-calculate the answers instead of publishing the program).
So yeah, if you can come up with a tailored hash algorithm that works correctly and is DoS-immune for all possible input for the program, go ahead and use that tailored hash algorithm. But if your hash algorithm is only correct for the specific input data the benchmark game uses, and would not be collision-resistant under real world conditions, then you really shouldn't use that hash as it does not demonstrate anything useful.
Well, here's the thing. I don't think it's for the exact input; it's for a type of input. Custom hash functions for specific types of data are a basic optimization technique, and I find it odd you'd even suggest every hash function should be 'DoS-immune'. There's absolutely nothing 'real world' about this unless you think the world consists entirely of hostile inputs. In the real world, people absolutely optimize.
Your argument seems to be that that's not the intent of the benchmark, which may be true, but it's not at all clear from the rules provided. To me, it looks like the opposite is true: they talk about using a standard hash table, and most of those allow user-specified hash functions.
Rust's default hashmap algorithm is DoS-immune for any kind of input, which is a perfectly logical default for a language that promotes memory safety and is intended for security-critical areas like operating systems, system libraries, and web browsers.
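As a small illustration of that default (a sketch, not benchmark code): each std HashMap seeds its SipHash-based hasher with its own random keys, and one visible consequence is that two maps holding the same data usually iterate in different orders.

```rust
use std::collections::HashMap;

// Each map's RandomState carries its own randomly seeded SipHash keys,
// so hash values are unpredictable to an attacker. One visible side
// effect: iteration order typically differs between maps holding
// identical data, and between program runs.
fn key_order() -> Vec<&'static str> {
    let mut m = HashMap::new();
    for k in ["GGT", "GGTA", "GGTATT", "TTT", "AAA", "ACGT", "TGCA"] {
        m.insert(k, 0u32);
    }
    m.keys().copied().collect()
}

fn main() {
    println!("{:?}", key_order());
    println!("{:?}", key_order()); // usually a different order
}
```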
That's really great, but I don't see how it's related unless your argument is truly 'custom hash functions are bad', in which case I don't really know what else to tell you besides 'that's completely wrong'.