Yes, you've basically gone over the details of why doing parallelized / concurrent performance comparisons between different programming languages is very difficult. I agree with most of what you have written. However, this doesn't change the fact that any comparison of performance between different languages, that ignores relevant concurrent and / or parallelization features, isn't very useful. This is especially true when the selling point for a few languages in this list is concurrency and / or parallization related features.
No, what I'm saying is that it is useful. Because in a concurrent program, you'll almost certainly being using the single threaded techniques developed here. And that the fastest program will be the one that uses the fastest single threaded techniques. So it makes sense to simplify the model and just look at the thing that is going to dominate the runtime of the program.
If a "dumb" thread pool is indeed what ends up being the best concurrent solution, then Rust, Go, C and C++ (and probably most of the other languages) are going to have almost no problems implementing it. It's just Not That Interesting.
Now, maybe I'm wrong. Maybe there is something clever here to exploit in a concurrent program, and that some languages let you do that easier than others. I doubt it. It would be interesting if it turned out to be the case, but it would be a different benchmark. It wouldn't just be about how well a language can deal with simple data transformation tasks, but also about how it deals with tricky concurrency problems. But this doesn't seem like the kind of problem that needs such things. It complicates the model.
Like honestly, just saying that any benchmark that ignores parallelism "isn't very useful" is just totally ridiculous.
> No, what I'm saying is that it is useful. Because in a concurrent program, you'll almost certainly being using the single threaded techniques developed here. And that the fastest program will be the one that uses the fastest single threaded techniques.
You have a point. Maybe I am wrong in saying that it's near useless.
> If a "dumb" thread pool is indeed what ends up being the best concurrent solution, then Rust, Go, C and C++ (and probably most of the other languages) are going to have almost no problems implementing it. It's just Not That Interesting.
Your argument makes sense if every language used the same form of concurrency or parallelism. Unfortunately, that isn't the case. Wouldn't the type of concurrency or parallelism affect performance?
> Now, maybe I'm wrong. Maybe there is something clever here to exploit in a concurrent program, and that some languages let you do that easier than others.
but concurrency / parallelism IS much easier in some languages vs others (did I misunderstand what you wrote?).
> Wouldn't the type of concurrency or parallelism affect performance?
That's what I'm saying: almost certainly not IF the best concurrent solution to this problem is to chunk of the work and feed it off to a bog standard thread pool. Pretty much every language---even C---has tools that make this pretty easy to do.
And I don't see any reason why a dumb thread pool wouldn't be the best answer to this if parallelism were allowed. Like, the fact that Go has goroutines is just not going to help with anything for this kind of work-load. Goroutines help in more complex concurrent situations.
> but concurrency / parallelism IS much easier in some languages vs others (did I misunderstand what you wrote?).
When the extent of the parallelism is a dumb thread pool with no synchronization between them, then it's pretty easy to do that in just about any language.
> That's what I'm saying: almost certainly not IF the best concurrent solution to this problem is to chunk of the work and feed it off to a bog standard thread pool.
That's the thing. This is not the case for all programming languages, even for some of the ones included this performance test. Python has a GIL. Distributing work via threads isn't going to do much unless it's something I/O intensive. Consequently, testing performance via a programming language's available methods to distribute work does matter. Special features like Goroutines also matter.
> When the extent of the parallelism is a dumb thread pool with no synchronization between them, then it's pretty easy to do that in just about any language.
The specific implementations of work distribution still matters for performance doesn't it? Different implementations yield varying levels of performance right?
Python has a `multiprocessing` module that sidesteps the GIL.
You can question me all day. Go build the benchmark. Publish it. Have people scrutinize it. I have my prior based on building this sort of tooling for years and actually publishing benchmarks.
And you still continue to miss my point. Instead, we're in a hypothetical pissing match about code that doesn't exist. Either the parallelism problem is uninteresting, or your benchmark becomes about complex synchronization. The former complicates the benchmark for no good reason. The latter results in a different benchmark entirely.
> Python has a `multiprocessing` module that sidesteps the GIL.
Yes and distributing work with multiple processes is not the same as distributing it via threads. There is a difference in performance in different implementations and variations of concurrency and parallelism.
> we're in a hypothetical pissing match about code that doesn't exist. Either the parallelism problem is uninteresting, or your benchmark becomes about complex synchronization.
Yes, the code doesn't exist because it's difficult to do, and that's what makes it interesting.
> Yes and distributing work with multiple processes is not the same as distributing it via threads. There is a difference in performance in different implementations and variations of concurrency and parallelism.
I literally addressed that in my first comment: "Now, maybe some languages have a harder time passing data over thread boundaries than others and maybe that impacts things. But that doesn't apply to, say, C, Rust, C++ or Go."
> Yes, the code doesn't exist because it's difficult to do, and that's what makes it interesting.
Well, I mean, I haven't done it not because it's hard, but because it appears pointless to me. That's my prior. Show me I'm wrong. I'm happy to learn something new.
> Well, I mean, I haven't done it not because it's hard, but because it appears pointless to me.
That's just your opinion and you haven't convinced me otherwise. Likewise, I don't think I'm going to convince you either.
> I literally addressed that in my first comment
Yes, but that doesn't mean that it's not an important point. Even if we were just comparing "C, Rust, C++ or Go.", different work distribution implementations have different sets of performance. It's hard, but it's still useful to capture that data because the real world doesn't run on a single thread or even a single core. If you're going to continue glossing over this, there's no point in continuing our discussion. We're just going in circles.
I didn't say it wasn't useful to capture the data. Indeed, I've been saying, "happy to see it done and happy to learn something if I'm wrong." Look at what I said above:
> Now, maybe I'm wrong. Maybe there is something clever here to exploit in a concurrent program, and that some languages let you do that easier than others. I doubt it. It would be interesting if it turned out to be the case, but it would be a different benchmark.
So the fact that you keep trying to niggle at the specifics of parallelism tells me that you're completely and totally missing my point. You're missing the forest for the trees.
Looking back at the convo, it started by you saying, "it would be nice to see parallelism considered." Fine. I responded by saying why that particular rabbit hole is probably not be so useful, or would be measuring something totally different than the OP. To which, you kind of flipped the script and said, "However, single thread benchmarks don't reflect the real world unless you're a student." It would have been a lot better to just say, "I disagree, but I don't have time to do the benchmark myself, so we'll have to leave it at that."
There's a lot of goalpost shifting going on. Like, are we arguing whether a parallel benchmark could be useful? Or are we arguing about whether a single threaded benchmark is useless? They are two different arguments. The former gets my response above, navel gazing at why that model probably isn't particularly useful. But the latter is just crazytown.