This is only because the Rust implementations are using particularly slow code paths: SIMD/AVX optimizations require a nightly compiler, some optimizations would require unsafe code, and other languages are using particularly hacky code that would never fly in real-world software.
For example, many of the Java/C/C++ benchmarks are using custom optimizations that should be illegal for the benchmarking. Case in point, some feature custom hash maps with hashing algorithms that, while fast, would never be useful in practice because they provide no protection against collisions. You'll see a hashing algorithm written as a C preprocessor macro, for example, that merely fakes having a real algorithm, whereas the Rust examples stick to the tried-and-tested, production-grade algorithms shipped in the standard library.
> Case in point, some feature custom hash maps with hashing algorithms that, while fast, would never be useful in practice because they provide no protection against collisions.
They are useful in this case, aren't they? A custom hash map implementation can also be useful in other scenarios. This is a feature of the language that allows you to do that, so it's not "illegal" to use it. If Rust doesn't allow you to do something while another language does, that doesn't mean it should be illegal.
I don't see any tricks in Go, and it's faster than Rust and more memory-efficient in almost half of the benchmarks. So what's your excuse in those cases?
I believe that what the OP, mmstick, might have been saying is that the hashing algorithm used in the C version might be very different from the hashing algorithm in the Rust version, and this difference might be significant.
I don't speak Rust, so I can't quite tell what's going on here:
Again, I don't speak Rust, so hopefully someone can comment on this, but I don't see that same simple hash being used in the Rust version... I think it might be using a default hash function, which would probably be more collision-resistant.
I am sure that the Rust version could be made to do the same thing, but I am not sure that, as-is, this is an apples-to-apples comparison.
I have wondered the same for a while now, and I hope someone familiar with Rust can explain it: in many of the benchmarks Go is faster and/or uses less memory despite the GC. Like you, I don't see much trickiness in the Go code.
In many cases a garbage-collected language will be faster at allocation than code using naive allocation strategies (e.g. reallocating a hashmap many times as it grows instead of reserving memory upfront). This is because the garbage collector tends to have preallocated memory lying around, while RAII-style code needs to call malloc and free every time a heap object is created.
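To make that concrete, here's a minimal Rust sketch of the two strategies (the key/value types and sizes are just illustrative): growing a map from empty triggers repeated reallocation and rehashing, while reserving capacity up front does a single allocation.

```rust
use std::collections::HashMap;

fn main() {
    let n: u64 = 1_000_000;

    // Naive strategy: the map starts small and must reallocate and
    // rehash every time it outgrows its current capacity.
    let mut grown: HashMap<u64, u64> = HashMap::new();
    for i in 0..n {
        grown.insert(i, i * 2);
    }

    // Reserving up front: one allocation sized for all n entries,
    // so no rehashing happens during the inserts.
    let mut reserved: HashMap<u64, u64> = HashMap::with_capacity(n as usize);
    for i in 0..n {
        reserved.insert(i, i * 2);
    }

    assert_eq!(grown.len(), reserved.len());
}
```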
Another factor could also be that the GC in the Go examples just never runs, since the examples don't allocate enough memory. It's hard to say exactly what's happening without tracing the runtime, which would also affect the benchmarks a bit.
It's important to note that while both languages are natively compiled (and I suspect Rust would edge ahead in that category due to LLVM's backing), most of the overhead would probably come from memory, whether allocations or cache usage, which makes comparing them in microbenchmarks a little inaccurate.
> This is because the garbage collector tends to have preallocated memory lying around,
This is an allocator feature independent of garbage collection; jemalloc does this, for example. GC'd languages have the potential to do this better, since they can do additional analysis whilst tracing for little additional cost, but non-GC'd languages can still benefit from this.
Part of this is also that Go is really good at not allocating from the heap when it can allocate from the stack (escape analysis). Until Go 1.5 the Go garbage collector was pretty weak, but it didn't matter as much as it would have for a language with heavier heap allocation.
Weren't there some issues around a "custom library" for this? I distinctly remember there being some kind of argument about what is legal and what isn't.
That is, I think this is what I'm remembering:
> k-nucleotide will explicitly require built-in / library HashMap.
Since C doesn't have a standard-library hashmap, you can write an entirely custom one just for the benchmark. But since Rust has a built-in hashmap in the standard library, you cannot, even though you could write Rust code that does the same thing as the custom C hashmap.
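For what it's worth, std's HashMap is generic over its hasher, so at least the hash-function half of the trick is expressible in safe Rust. A hypothetical sketch (TrivialHasher is invented for illustration, not taken from any benchmark entry):

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// A deliberately simplistic hasher in the spirit of the C macros under
// discussion: a bare multiply-and-add over the bytes, with no real
// collision resistance. Hypothetical, not from an actual benchmark entry.
#[derive(Default)]
struct TrivialHasher(u64);

impl Hasher for TrivialHasher {
    fn finish(&self) -> u64 {
        self.0
    }

    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = self.0.wrapping_mul(31).wrapping_add(b as u64);
        }
    }
}

fn main() {
    // std::collections::HashMap accepts any BuildHasher, so only the
    // hash function is swapped out; the table itself is still std's.
    let mut counts: HashMap<&str, u32, BuildHasherDefault<TrivialHasher>> =
        HashMap::default();
    *counts.entry("GGTATT").or_insert(0) += 1;
    assert_eq!(counts["GGTATT"], 1);
}
```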
But they say 'don't write a custom hash table', not 'don't write a custom hash function'. Maybe the problem is that the data in this benchmark is just not a good way to exercise the hash tables in a given language the way the benchmark intended. That probably means the benchmark should be modified; the complaint that some implementations use 'laughably bad' hash functions that seem to be measurably decent hash functions for the data at hand seems really strange.
The "library" distinction you're drawing here is arbitrary. Or at least, my understanding of it is.
Am I allowed to use a custom HashMap for the game, or am I required to use the one in std::collections? My understanding is that it's the latter, and so that penalizes Rust or any other language that includes a map in its standard library.
> If you want an additional Rust hashmap implementation then you need to persuade the Rust community to provide one in std::collections.
Again, this means that for Rust, it _has_ to be in the standard library. But C doesn't have one in the standard library. So C programmers get to write their own, and Rust programmers don't.
If it compiles and runs, it is allowed. There is no rule for the benchmark other than producing the same end result with the best performance.
I still don't get all the moralizing over this.
The best thing here for Rust would be to achieve the best performance on this game and nothing else.
To me this game is important, and winning it matters even more. So if Rust doesn't get better results because of "morals", that is so wrong it hurts. But each time I read these responses, I ask myself whether there isn't a real problem there.
I like Rust, but performance is decisive. It is really sad to see Rust keep slipping in this game, and that makes me question whether to keep investing time in Rust.
It isn't a "moral" issue. It is about the usefulness of the benchmarks. If you're using them to make a recommendation about which programming language to choose, it is important to have a sense of how well the results generalize to the kind of programs you will be writing. Using unrealistic hacks targeted specifically at each benchmark runs counter to that goal.
If, on the other hand, you're just treating the benchmarks as a fun competition akin to code golf, then by all means, hacks ahoy!
> The best thing here for Rust would be to achieve the best performance on this game and nothing else.
That's absurd. The benchmarks game doesn't measure real-world performance in any sense whatsoever. And many of the implementations are written in ways that you'd never ever put into production code (for example, the laughably bad hashing functions that are mentioned elsewhere in this thread). From what I understand, the Rust implementations tend to not get up to those shenanigans, which makes them appear worse than the implementations from other languages that do.
It's not the goal of a programming language to be the top on a silly benchmarks game. The goal is to be an excellent language for real-world use.
But people are criticizing the C because the way it is implemented is not "realistic", or because the "hash" is a hack that would never work in real life.
I don't know the background of these people, but they couldn't be more wrong. So that looks like "morals", or maybe they've just never seen real C code out there.
You're making an assumption that the C code is allowed to use the same algorithm as the Rust code. Based on the rules of the game, that's not exactly true.
Which ones do you have in mind? The preprocessor sounds a little cheaty, but picking a hash function that's a better fit for the data is a pretty basic, practical sort of optimization.
If that's the one, it's a custom hash function that is then inlined as a macro. That still seems like a pretty vanilla C optimization that one might write in real C code.
The problem isn't that it's a macro. It's that it's a horrifically bad hash function. It's blazing fast, but completely unsuitable for use in any real code.
If it's a hash function that only works for this exact data set, I can see the argument. If it's a hash function that works for this kind of data (these particularly formatted, particularly constrained strings), it's fair play. Which one is it? 'It's not a good general purpose hash function' alone doesn't seem like a valid criticism, especially along with 'the Rust version uses a general purpose hash function'. Nobody said you had to use a general purpose hash function.
Imagine you got a million strings of variable length such that the last 4 bytes are guaranteed to be a unique value between 0 and 999999. Lopping off the last four bytes is a perfectly good hash function for that kind of data.
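A minimal Rust sketch of that hypothetical, assuming (as stated) that every key is at least four bytes long and the tails are unique:

```rust
// Hypothetical sketch: under the stated constraint that the last four
// bytes of every key are unique, those bytes already are a perfect hash.
fn last_four_bytes_hash(key: &[u8]) -> u64 {
    let n = key.len();
    // Interpret the trailing four bytes as a big-endian integer.
    u32::from_be_bytes([key[n - 4], key[n - 3], key[n - 2], key[n - 1]]) as u64
}

fn main() {
    // Distinct tails map to distinct hashes with zero mixing work.
    assert_ne!(
        last_four_bytes_hash(b"record-0001"),
        last_four_bytes_hash(b"record-0002")
    );
}
```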
Sure, but it's not a good benchmark. If you can demonstrate that you have blazing fast performance, but only for the exact input data that the benchmark uses, then you haven't demonstrated anything at all beyond an ability to tailor your code to a very specific input. But in the real world, we don't get to write our code for a very specific input (after all, if we knew what the input was ahead of time, we could just pre-calculate the answers instead of publishing the program).
So yeah, if you can come up with a tailored hash algorithm that works correctly and is DoS-immune for all possible input for the program, go ahead and use that tailored hash algorithm. But if your hash algorithm is only correct for the specific input data the benchmark game uses, and would not be collision-resistant under real world conditions, then you really shouldn't use that hash as it does not demonstrate anything useful.
Well, here's the thing. I don't think it's for the exact input; it's for a type of input. Custom hash functions for specific types of data are a basic optimization technique, and I find it odd you'd even suggest every hash function should be 'DoS-immune'. There's absolutely nothing 'real world' about this unless you think the world consists entirely of hostile inputs. In the real world, people absolutely optimize.
Your argument seems to be that that's not the intent of the benchmark, which may be true, but it's not at all clear from the rules provided. To me, it looks like the opposite is true: they talk about using a standard hash table, and most of those allow user-specified hash functions.
Rust's default hashmap algorithm is DoS-immune for any kind of input, which is a perfectly logical default for a language that promotes memory safety and is intended for security-critical areas like operating systems, system libraries, and web browsers.
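As a small illustration of that default (a sketch, not benchmark code): each std HashMap seeds its SipHash-based hasher with its own random keys, and one visible consequence is that two maps holding the same data usually iterate in different orders.

```rust
use std::collections::HashMap;

// Each map's RandomState carries its own randomly seeded SipHash keys,
// so hash values are unpredictable to an attacker. One visible side
// effect: iteration order typically differs between maps holding
// identical data, and between program runs.
fn key_order() -> Vec<&'static str> {
    let mut m = HashMap::new();
    for k in ["GGT", "GGTA", "GGTATT", "TTT", "AAA", "ACGT", "TGCA"] {
        m.insert(k, 0u32);
    }
    m.keys().copied().collect()
}

fn main() {
    println!("{:?}", key_order());
    println!("{:?}", key_order()); // usually a different order
}
```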
That's really great, but I don't see how it's related unless your argument is truly 'custom hash functions are bad', in which case I don't really know what else to tell you besides 'that's completely wrong'.