Comparing Parallel Rust and C++

keldaris · on Nov 7, 2019

This is a wonderfully written comparison benchmark and it deserves attention even for that reason alone. It knows its target audience, explains what's going on succinctly but completely, and avoids most of the usual benchmark pitfalls that result in comparing apples to oranges. Great job.

The one glaring issue that is ignored is the floating point model used. I understand Rust still doesn't have a usable equivalent to -ffast-math, so I assume it wasn't used for that reason. Some discussion of whether it's permissible in this algorithm (I believe so?) and how much advantage that might give to C++ seems crucial when performance is a priority.
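To make the floating point model question concrete: flags like -ffast-math allow the compiler to reassociate sums, and floating-point addition is not associative. A minimal sketch in Rust (the function names and the input values are made up for illustration and are not taken from the benchmark):

```rust
// Floating-point addition is not associative, which is why a compiler needs
// explicit permission (e.g. -ffast-math in GCC/Clang) to reorder a sum.
fn sum_left_to_right(a: f64, b: f64, c: f64) -> f64 {
    (a + b) + c
}

// The same three terms, grouped the way a reassociating optimizer might.
fn sum_reassociated(a: f64, b: f64, c: f64) -> f64 {
    a + (b + c)
}

fn main() {
    // Illustrative values: c is far below the rounding granularity of b.
    let (a, b, c) = (1.0e17, -1.0e17, 1.0);
    println!("(a + b) + c = {}", sum_left_to_right(a, b, c));
    println!("a + (b + c) = {}", sum_reassociated(a, b, c));
}
```

With these inputs the two groupings give different answers, so whether reassociation is permissible depends on how much the algorithm cares about that kind of rounding difference.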

Ironically, reading this has further convinced me that, for all of its disadvantages, I much prefer C++ to Rust for my needs. I'm sure others will draw the opposite conclusion and that's great. Rust is a language that clearly knows what it wants and if your priorities are aligned with that, the performance gap is shrinking and implementation-related reasons to avoid it are rapidly decreasing in number.

dgellow · on Nov 7, 2019

> reading this has further convinced me that, for all of its disadvantages, I much prefer C++ to Rust for my needs.

What are your needs, if I may ask? I'm not asking this to start a thread "rust/C++ is better", I'm just curious regarding situations in which someone decides that one language matches their needs more than another, it's always interesting to see what boundaries people consider for those decisions.

keldaris · on Nov 7, 2019

I mostly use C++ for numerical simulations in physics and associated code (analysis, some visualization, etc.). That means my primary consideration is the ease and convenience of writing high performance code for a narrow set of hardware. I care about the quality of tooling, especially for performance analysis (including things like likwid [1]) and GPGPU computation. I do not care about safety (memory or otherwise) - my code doesn't take arbitrary input, run on shared hardware, do much of anything over networks or have memory safety crashes.

From this rather narrow point of view, Rust does very little to help and quite a lot to hinder me. Rust is very much about memory safety - an issue extremely far down my list of concerns - and to me the borrow checker is an anti-feature I'd love to turn off. None of this is in any way an indictment of Rust - it looks like a very well designed language that knows what it wants to accomplish. It just happens to want the exact opposite things from what I want, and that's fine. I know I'm in a weird minority (most of the people who think like I do seem to be game engine developers).

[1] https://github.com/RRZE-HPC/likwid

pdpi · on Nov 7, 2019

> I do not care about safety (memory or otherwise) - my code doesn't take arbitrary input, run on shared hardware, do much of anything over networks or have memory safety crashes.

This is a common misconception — safety and security are different things. If your simulations make extensive use of parallelism (and I can only assume they do), memory safety also helps you ensure correctness.

I can absolutely appreciate that you might lose other things to achieve that, and that this is not a trivial tradeoff to make (and I'll take you on your word that it is the wrong tradeoff for you!), but it's not as clear cut as "I don't need safety features because I have no security exposure".

blub · on Nov 7, 2019

Memory management errors are just like regular bugs, but they have a disproportionate impact on security.

If the security requirements are permissive, what's the point of investing so much effort in preventing only one category of bug? There's no misconception.

jerf · on Nov 7, 2019

"Memory management errors are just like regular bugs"

I'd have to disagree in this specific context of high-performance parallel numeric simulation. It is terribly easy for memory management errors to produce incorrect simulations that still "work" perfectly, because incorrect handling of floating point numbers will never crash the program, merely corrupt the results in arbitrary, but not necessarily easy-to-detect, ways. Bugs that cause visibly obvious incorrect answers or crash the program are far preferable to bugs that cause subtle errors. It's too easy to "correct" for a bug in a simulation by simply poking it until the buggy simulation seems to return to the behavior you expect, but two bugs don't make correct code.

If you don't really care about accuracy, it may not be a big deal, though... the "shape" of these errors is very difficult to wrap one's head around, and even so I'd still question fairly hard anyone working for or with me who thinks that, because they don't care about accuracy, it's OK to write code that may or may not be thread-safe.

affyboi · on Nov 7, 2019

To tack onto this point, I actually had a couple of issues where subtle memory management bugs led to slightly incorrect (not terribly incorrect) results when I was working on a graphics paper. The worst part about that was it wasn't immediately obvious that there even was an error, until much later when we realized that the results we were getting weren't consistent with what we expected.

Something glaring would be if we were getting values that were zeroed out or something like that, but when the only difference is that a slope is -1.4 instead of -1.6 or something like that, you don't immediately realize something's up.

Once we realized there was a bug, it took a long time to track down, and after that I was so frustrated I started writing a new library from scratch in Rust for our research and it worked out pretty well.

Anecdotal evidence for sure but it was really nice not having to triple check for memory race issues.

keldaris · on Nov 7, 2019

Can you elaborate a bit on how the bug worked and why Rust would have prevented it? I'm genuinely curious because the only bugs I've ever had with comparable presentation and consequences have been logic bugs which no set of language features could possibly prevent.

blub · on Nov 7, 2019

Why did it take a long time to track down the bug? You had a test case which triggered it, so in theory just running under valgrind would have been enough.

Or did it not happen in that case?

I'm not even gonna comment on the rewrite, that reads like an ad. :-)

slavik81 · on Nov 8, 2019

I really appreciate keldaris' posts, as his experience seems to be very similar to mine.

Many simulation algorithms involve manipulating large arrays of data. Rust can check that your indexes are within bounds, but it can't check that your indexes are correct. If you access X+1 to get the element to the left of X, but you should have accessed X-1, then you have introduced the same sort of error as you'd get from memory corruption (and, frankly, one that is much harder to debug. Standard library assertions, Valgrind and ASan are no help at all if you're accessing valid data.)
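A small sketch of that point in Rust, assuming a hypothetical 1-D lattice with periodic wrap-around so that both lookups stay in bounds (the function names are invented for illustration):

```rust
// Hypothetical 1-D periodic lattice: both neighbour lookups are in bounds,
// so no bounds check can tell which one the author meant.
fn left_neighbour(lattice: &[f64], x: usize) -> f64 {
    let n = lattice.len();
    lattice[(x + n - 1) % n] // intended: the element to the left of x
}

fn wrong_neighbour(lattice: &[f64], x: usize) -> f64 {
    let n = lattice.len();
    lattice[(x + 1) % n] // typo: reads the element to the right instead
}

fn main() {
    let lattice = [10.0, 20.0, 30.0, 40.0];
    println!("left of 2:  {}", left_neighbour(&lattice, 2));
    println!("buggy read: {}", wrong_neighbour(&lattice, 2));
}
```

Both functions compile and run without complaint; only a test against a known answer distinguishes them.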

Nevertheless, I found it helped a lot to compile with `-D_FORTIFY_SOURCE=2 -D_GLIBCXX_ASSERTIONS -D_GLIBCXX_DEBUG` during my normal development. Among other things, that adds bounds checks to all container accesses. I leave them off in release mode, as my full-scale simulations already take multiple hours and I need the performance.

blub · on Nov 7, 2019

Any code base which does simulations would likely have some automated testing covering a good chunk of the value domain + extremes, which lends itself very well to ASan and UBSan analysis.

I don't think OP said anything about writing code that's not thread-safe. They specifically said they're carefully designing such code. What they said is that due to the way they handle memory, most related errors don't happen in practice - which is perfectly believable. Floating point errors are a separate issue; they could happen in almost any language.

jerf · on Nov 7, 2019

"ASan and UBSan analysis."

This is a personal quirk, but I consider C++ a different language from C++ + a lot of static analysis tools, on the grounds that most of my major beefs with C++ (and C) can largely be addressed by using powerful static analysis tools from day one. So from my personal, idiosyncratic perspective, I see two very different situations if the original poster is "using C++" or "using C++ and a suite of quality static analysis tools".

I prefer languages like Rust that take advantage of our decades of experience with C/C++ and write the most important stuff directly into the language, but in 2019 I still concede that "C(++) + good static analysis" is still a very economically compelling language for a lot of companies with existing C/C++ codebases. "Upgrading" to C+++SA is not cheap, but it's cheaper than any of the other options, as much fun as "rewrite in Rust" may be.

As for their belief that their code is thread safe, if they are using C++ and not C+++SA, I echo mike_hock's comment. It may be threadsafe, but from this distance and without a lot more evidence, my Bayesian priors say I don't believe it.

keldaris · on Nov 7, 2019

> I see two very different situations if the original poster is "using C++" or "using C++ and a suite of quality static analysis tools".

I guess I'm somewhere in between on that spectrum. I use clang-tidy and valgrind constantly, along with an ever-changing set of profiling tools that occasionally find a potential bug as well. I've used cppcheck in the past, but didn't find it particularly useful, and I use whatever VS2019's static analysis tool is called now if I'm on Windows. I'm also extremely pedantic about compiler warnings, in that I either have a specific reason for disabling a warning entirely for that codebase or it will get fixed. Exhaustive (often mathematically exhaustive) tests go on top of this.

That being said, I don't use any of the sanitizers in my usual workflow. The reason is simple - most of them don't offer anything I need. The huge set of bugs around excessive dynamic memory allocation simply doesn't arise when you don't do dynamic memory allocation, so ASan/LSan/MSan don't give me much. UBSan might theoretically help, but only with issues that have so far been very trivial to debug (accidentally misaligned pointer in explicit SIMD code and the like, results in instant crash and an immediate fix). The one exception is the ThreadSanitizer and when I happen to finally get a non-trivial deadlock, that'll certainly be my first stop. As it is, most of the multithreading I've done recently has been fairly trivial from a compsci point of view (and for good reason, overly fancy algorithms aren't often particularly performant on current architectures).

Regardless, I'm entirely open to the possibility that I've neglected a tool that might help me further. Strong tooling is one of the major advantages of C++ and I've tried most of the major tools at one point or another. I do require demonstrable benefit in the context I work in to continue using them, however, and I've often found it worthwhile to simplify my workflow by cutting something out that turned out to be mostly superfluous.

comex · on Nov 7, 2019

ASan and UBSan are dynamic analyses.

mike_hock · on Nov 7, 2019

As a rule of thumb, if "I don't care about memory safety and thread safety" is followed immediately by "my programs don't have memory corruption or race conditions," the latter statement is usually false.

pdpi · on Nov 7, 2019

Like I said, it's a matter of safety versus security.

Use-after-free, concurrent read/writes, forgetting to acquire a lock are all problems that can crash an application or cause correctness issues. Preventing all of those falls under the "safety" heading, and is still useful in a security-permissive environment.

Now, what do you mean by "investing so much effort"? At the language level, it's valuable effort that lets us achieve security in many security-conscious applications, and we get the safety for free in less security-critical environments as a knock-on. Score! From an application developer's point of view, though, you're right. It might genuinely not matter, especially if you're not otherwise motivated to learn the language (a lot of that effort is a one-off and is amortised over your lifetime as a developer, if you choose to learn it). Not all applications benefit that much from the safety guarantees, and you might just be better off not even putting in the effort of using the language if something else is a better fit for you.

thr0w__4w4y · on Nov 8, 2019

> This is a common misconception — safety and security are different things.

IMO this could either be read as:
- "The misconception that safety and security are different things is common", i.e. they are not different, but the misconception is common.
- "The misconception [above, that safety and security are not different things] is common - safety and security /are/ two very different things."

I hope you mean the latter (safety != security). I consult in security (hardware and software), and I came to security by way of high-availability, high-consequence systems.

I've seen many safe, but insecure systems. I don't know if it's possible to have an unsafe, secure system (I doubt it?) but absolutely, safety != security. There is an overlap of sorts, but to say that these 2 "are not different things" is why I have lots of work.

Wow, sorry, I'll get off my soapbox. I'm still not sure if perhaps we're in "violent agreement" as the saying goes.

pdpi · on Nov 8, 2019

The last sentence in my post would’ve revealed that yes, we agree they’re different things :).

filmor · on Nov 7, 2019

When I was working in LQCD (lattice quantum chromodynamics), people were fighting quite a bit with corrupted memory, which only resulted in a crash in the best of all cases; usually it led to very or (worse) slightly weird results.

keldaris · on Nov 7, 2019

I can't really comment on that case without seeing the code, but maybe you can describe how Rust might prevent those issues - how was the memory corrupted? In the kind of code I write, I find it extremely trivial to avoid the stereotypical memory safety issues. That doesn't mean I don't have bugs, obviously - I make logic errors like everyone else, but no language will ever protect you from that. That's what tests and debugging are for.

pdpi · on Nov 7, 2019

Rust outright forbids a few Bad Things, such as "simultaneous" reads/writes on the same piece of memory, and read-after-free. E.g. reading partially-committed data from non-atomic writes is a really nasty form of data corruption that can easily occur in simulation-style code.

Of course, no language can protect you from all bugs, but some languages can protect you from at least some bugs. Rust can, and does, protect you from all data race-related bugs, and a few other classes of issues.
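As a rough illustration of what that looks like in practice, here is a sketch of several threads updating one shared total. It assumes std::thread::scope from the standard library (stable in newer Rust releases; at the time of this thread, scoped threads came from crates such as crossbeam), and parallel_sum is a made-up name:

```rust
use std::sync::Mutex;
use std::thread;

// Sums `data` using up to `workers` threads that all update one shared total.
fn parallel_sum(data: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);
    let total = Mutex::new(0u64);
    thread::scope(|s| {
        for chunk in data.chunks(chunk_len) {
            let total = &total; // share the lock by reference
            s.spawn(move || {
                let partial: u64 = chunk.iter().sum();
                // Updating a plain `u64` from several threads here would be
                // rejected at compile time; the lock makes the sharing explicit.
                *total.lock().unwrap() += partial;
            });
        }
    });
    total.into_inner().unwrap()
}

fn main() {
    let data: Vec<u64> = (1..=100).collect();
    println!("sum = {}", parallel_sum(&data, 4));
}
```

Replacing the Mutex with an unsynchronized mutable variable captured by every thread is the version the compiler refuses, which is the guarantee being described.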

keldaris · on Nov 7, 2019

I partly addressed this in another comment already, but perhaps I can try to convey a bit more of why I'm completely unimpressed by that argument for my use case.

First of all, the "Bad Things" aren't actually uniformly bad. In its zeal, Rust won't let you modify different elements of the same array from different threads (an issue that comes up almost immediately in the OP) even though you know with perfect mathematical certainty that to do so is perfectly safe. In the context of simulation code that's not nitpicking, it's a usecase that comes up all the time, I can't even think of a codebase I've worked on in the last 5 years that didn't have this pattern in it somewhere. I have never, ever encountered a data race bug in any of those cases (even though there are usually tests to cover it just in case), but switching to Rust would immediately force me to waste time (and code quality, in my view) on working around a language restriction. The same goes for read-after-free and most other memory safety features, because most of the reasoning there simply doesn't apply when you're dealing with statically allocated memory that's never freed, with static guarantees on sizes, etc.

Rust is very deliberately designed to address safety and correctness issues that exist in a style of programming that's just completely orthogonal to what I do. There is inevitably some overlap and it's always possible to construct a contrived case where Rust could potentially prevent a bug I could conceivably make, but the point is that everything I've seen leads me to conclude that any such benefit would be vastly outweighed by the time unnecessarily spent on working around non-issues, dealing with syntax verbosity and semantic baggage (I distinctly do not want the notion of "lifetimes" at the language level at all), all of which means more code and, in all likelihood, more bugs.

All of this sounds very negative, but I really don't mean to be. If you buy into the dynamic allocation heavy style of programming Rust is designed to improve, Rust looks great. It also seems to have an educated and thoughtful user community for the most part, something I actually envy a bit. I just don't think people should fall into the narrow view of thinking that Rust is a universal good - it takes a particular approach, with very stringent limitations and the limitations aren't there because Rust is bad, they are there to make it great... just not for everyone.

steveklabnik · on Nov 7, 2019

> In its zeal, Rust won't let you modify different elements of the same array from different threads (an issue that comes up almost immediately in the OP) even though you know with perfect mathematical certainty that to do so is perfectly safe.

To be clear, Rust will let you do this, as long as you're communicating that you know this to the compiler. split_at_mut and scoped threads are two ways that this is possible without requiring you to write your own unsafe code.
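A minimal sketch of that combination, assuming std::thread::scope from the standard library (added to std after this thread was written; crossbeam offered scoped threads at the time). The function name double_in_halves is invented for illustration:

```rust
use std::thread;

// Doubles every element, with two threads each owning one half of the slice.
fn double_in_halves(values: &mut [f64]) {
    let mid = values.len() / 2;
    // split_at_mut hands back two non-overlapping mutable slices.
    let (left, right) = values.split_at_mut(mid);
    thread::scope(|s| {
        s.spawn(|| left.iter_mut().for_each(|v| *v *= 2.0));
        s.spawn(|| right.iter_mut().for_each(|v| *v *= 2.0));
    }); // both threads are joined before scope returns
}

fn main() {
    let mut values = vec![1.0, 2.0, 3.0, 4.0, 5.0];
    double_in_halves(&mut values);
    println!("{:?}", values);
}
```

No unsafe code appears in the caller; the disjointness of the two halves is what split_at_mut communicates to the compiler.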

keldaris · on Nov 7, 2019

Right, the OP solved this using Rayon's par_chunks_mut, then enumerate and for_each over that, so I assumed that's the most ergonomic way to work around the issue. I certainly wasn't claiming that there are no workarounds, merely noting that from my - obviously biased - point of view this looks like yet another annoying hoop to jump through. There's obviously the opposite point of view which holds this up as a useful safety measure and determining which is correct requires knowing the context you work in. Given the vanishingly tiny amount of time I've spent on debugging issues this restriction might have prevented, Rust just looks very unergonomic to me, but the opposite perspective can be just as valid given the right context.
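For readers who haven't seen the pattern, here is a std-only stand-in for it: chunks_mut plus enumerate, with one scoped thread per chunk. This is not the OP's code and not Rayon (which schedules chunks on a thread pool instead of spawning a thread each); fill_rows and the row-major grid layout are assumptions made for the sketch:

```rust
use std::thread;

// Fills each row of a flat row-major grid with its row index, one thread per row.
fn fill_rows(grid: &mut [u32], width: usize) {
    let width = width.max(1); // chunks_mut panics on a chunk size of zero
    thread::scope(|s| {
        // Each chunk is a disjoint mutable slice, so each thread owns its row.
        for (row, cells) in grid.chunks_mut(width).enumerate() {
            s.spawn(move || {
                for cell in cells.iter_mut() {
                    *cell = row as u32;
                }
            });
        }
    });
}

fn main() {
    let mut grid = vec![0u32; 6];
    fill_rows(&mut grid, 2);
    println!("{:?}", grid);
}
```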

pixel_fcker · on Nov 7, 2019

I think the key difference here is that in your code you’re saying you “know” that it’s safe to write to these certain sections of the array from different threads. That’s an invariant that’s held in your memory (or in a comment maybe).

The Rust equivalent is doing the exact same thing but requires you to formalise that invariant in your code by using one of the methods defined above.

I can see how initially this looks like an extra hoop to jump through but after using it for a while I’ve found the opposite is true: because those invariants are checked by the compiler it’s one less thing I have to keep in my head and I can concentrate on the real problem.

safercplusplus · on Nov 7, 2019

And also one less thing to transfer to the heads of anyone else that might need/want to understand the code. (Often the author him or herself at some point in the future, right?) Though equivalent facilities are available in C++ [1][2] (and can now/soon be enforced[3]).

[1] https://github.com/duneroadrunner/SaferCPlusPlus/blob/master...

[2] https://github.com/duneroadrunner/SaferCPlusPlus/blob/master...

[3] https://github.com/duneroadrunner/scpptool

pdpi · on Nov 7, 2019

> it's always possible to construct a contrived case where Rust could potentially prevent a bug I could conceivably make, but the point is that everything I've seen leads me to conclude that any such benefit would be vastly outweighed by the time unnecessarily spent on working around non-issues

Yeah, I think at this point we're just agreeing very loudly :)

Rust absolutely has some objective advantages over C++, as does C++ over Rust. The choice of which set of (dis)advantages works best is very much something that can only be determined on a case-by-case basis.

87zuhjkas · on Nov 7, 2019

I think C++ might be a better choice if you don't care so much about correct results from your computations. I think there are certainly some scenarios where this is the case.

FpUser · on Nov 7, 2019

This does not sound nice at all. What on Earth does correctness have to do with avoiding C++?

87zuhjkas · on Nov 7, 2019

> Rust is very deliberately designed to address safety and correctness issues that exist in a style of programming that's just completely orthogonal to what I do.

My reasoning is: If Rust addresses correctness issues that C++ does not, then it is more likely that Rust programs produce correct computing results than C++ programs on average. But this is just an assumption and I might be wrong.

> This does not sound nice at all.

Ah that's true, sorry for that, I should have expressed it in a different (nicer) way.

keldaris · on Nov 7, 2019

Since you quoted me, I'd like to clarify my statement. When I said:

> Rust is very deliberately designed to address safety and correctness issues that exist in a style of programming that's just completely orthogonal to what I do.

this doesn't in any way imply that Rust addresses correctness issues that all C++ code is subject to. Rather, Rust addresses correctness issues that are endemic in particular kinds of C++ code, which is qualitatively different from the code I find myself writing (probably much more due to domain specifics than any personal skill). Therefore, in that context, I don't see much reason to believe Rust will on average produce better results than C++ in terms of correctness, even though it might very well do so in a different context (like the context it was actually designed for).

That being said, I obviously do care about correctness. If I had reason to believe that Rust will on average lead to satisfactorily correct code with less effort than it would take me to achieve the same level of correctness in C++, absent any other major contraindications I would switch languages. Personally, I would be rather happy to switch away from C++ - unfortunately, most alternatives so far look considerably worse given the specific context I operate in.

gameswithgo · on Nov 7, 2019

Rust has a lot of correctness features that are useful, and some not related to memory safety, for example 3 that come to mind:

options and results instead of null pointer or using bit flags to indicate invalid states (a recent sudo exploit would not have happened in a language with option types)

everything is an expression so you do not have to create uninitialized variables and then set them later inside a switch or if statement.

much less (no?) undefined behavior

for someone working in a particular C++ niche who has developed strategies to avoid all of these problems already, then switching to Rust certainly may not be worth the cost involved in learning something new, but if you were to start from scratch and pick one of the two languages, there might be good reasons to pick Rust for the same task.
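A short sketch of the first two points, with made-up function names (find_index and describe are illustrative only):

```rust
// Absence is a value of type Option, not a null pointer or a sentinel index.
fn find_index(values: &[i32], target: i32) -> Option<usize> {
    values.iter().position(|&v| v == target)
}

fn describe(values: &[i32], target: i32) -> String {
    // `match` is an expression, so `label` is initialised exactly once and
    // the compiler insists that both cases are handled.
    let label = match find_index(values, target) {
        Some(i) => format!("found at {i}"),
        None => String::from("missing"),
    };
    format!("{target}: {label}")
}

fn main() {
    let values = [3, 1, 4, 1, 5];
    println!("{}", describe(&values, 4));
    println!("{}", describe(&values, 9));
}
```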

keldaris · on Nov 7, 2019

You can have options and results in C++ if you like (I sometimes use custom result types, and I certainly don't use exceptions), but there's no language-level support for them and that's valuable, I agree. Not sure I understand the second point (I don't have to create uninitialized variables in C++, though I may sometimes want to). As for undefined behavior, I don't personally view that as an issue at all for the most part. I write code for a specific set of compilers running on a specific set of hardware, not an abstract standard. The behavior is what the compiler does (or rather, what I cause it to do) and there's nothing undefined or arbitrary about that.

Anyway, I agree that some aspects of Rust unrelated to memory safety are good for correctness. Unfortunately, I can't pick languages in a vacuum, so I have to weigh that against things like GPGPU support (first rate vs. non-existent), tooling quality (particularly profilers), library support (Eigen alone is worth quite a lot) and other factors. If I could ignore all of those real world issues and just choose the better language, I don't know if I would choose Rust, but it would certainly have a decent shot.

lmm · on Nov 8, 2019

> You can have options and results in C++ if you like (I sometimes use custom result types, and I certainly don't use exceptions)

It's not really practical because C++ has no true sum types. You can emulate them with a Java-style visitor pattern but that carries an immense code overhead.

lenkite · on Nov 8, 2019

You have std::variant and std::visit. https://www.bfilipek.com/2018/09/visit-variants.html Or you can use a library: https://github.com/mpark/patterns

lmm · on Nov 8, 2019

> You have std::variant and std::visit. https://www.bfilipek.com/2018/09/visit-variants.html

Which isn't a true sum type because it doesn't nest properly.

> Or you can use a library: https://github.com/mpark/patterns

Interesting; proper pattern-matching is nice, but the lack of type safety is still a major issue.
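For anyone unsure what "sum type" and nested matching refer to in this exchange, a sketch of the Rust side of the comparison (Shape, Node and total_area are invented for illustration; this says nothing either way about what std::variant can do):

```rust
// Two sum types, one nested inside the other.
enum Shape {
    Circle { radius: f64 },
    Rect { w: f64, h: f64 },
}

enum Node {
    Leaf(Shape),
    Pair(Box<Node>, Box<Node>),
}

// A single match destructures both levels; a missing case is a compile error.
fn total_area(node: &Node) -> f64 {
    match node {
        Node::Leaf(Shape::Circle { radius }) => std::f64::consts::PI * radius * radius,
        Node::Leaf(Shape::Rect { w, h }) => w * h,
        Node::Pair(a, b) => total_area(a) + total_area(b),
    }
}

fn main() {
    let tree = Node::Pair(
        Box::new(Node::Leaf(Shape::Rect { w: 2.0, h: 3.0 })),
        Box::new(Node::Leaf(Shape::Circle { radius: 1.0 })),
    );
    println!("total area = {}", total_area(&tree));
}
```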

filmor · on Nov 8, 2019

One thing I remember vividly was an off-by-one error of a colleague in a nested loop. Since we were using periodic boundary conditions, the result looked almost right, but of course the very last read would be garbage. Depending on the lattice size this was not a glaring problem, but sometimes the very last (out-of-bounds) read would trigger a segfault.

Incidentally, I rewrote that piece of code using a self-made `lattice_iterator` in C++ that "linearised" the code just to simplify it and make debugging easier and everything started to work. We found the off-by-one error afterwards by comparing the iteration behaviour of a single run.

This kind of problem can be caught by tests (though it's difficult, you can't tell me that you are really testing each individual iteration behaviour of your code); the advantage of Rust's approach is that this wouldn't even have passed the compiler.
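A sketch of how an off-by-one read of this kind surfaces in Rust, with invented function names. One caveat worth stating: as far as I know, a plain slice index like lattice[i] is checked at run time and panics when out of range, so the loop-bound mistake itself is not what gets rejected at compile time. The sketch uses get, which returns an Option instead:

```rust
// Sums a lattice with an off-by-one loop bound (`0..=n` instead of `0..n`).
// `get` returns None for the out-of-range index instead of reading garbage.
fn sum_with_off_by_one(lattice: &[f64]) -> Result<f64, usize> {
    let mut total = 0.0;
    for i in 0..=lattice.len() {
        match lattice.get(i) {
            Some(v) => total += v,
            None => return Err(i), // report the offending index
        }
    }
    Ok(total)
}

// Iterating over the elements directly leaves no index to get wrong.
fn sum_by_iteration(lattice: &[f64]) -> f64 {
    lattice.iter().sum()
}

fn main() {
    let lattice = [1.0, 2.0, 3.0];
    println!("{:?}", sum_with_off_by_one(&lattice));
    println!("{}", sum_by_iteration(&lattice));
}
```

The iterator version is close in spirit to the `lattice_iterator` rewrite described above: once the traversal is expressed as iteration, the bound is no longer written by hand.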

nwallin · on Nov 8, 2019

It sounds like you were dealing with code written by physicists. Which probably means they were doing literally everything wrong. And there's a snowball's chance in hell you'd be able to convince them to rewrite it in Rust, because "it's too hard". And even if you convinced their boss to force them to rewrite it in Rust, everything would be in unsafe blocks. And they'd complain the language is too verbose because every function has to be cluttered with unsafe blocks. They wouldn't complain about the unnecessary indentation though, because all of their code either isn't indented or is nested off the right side of the screen.

Not that I'm bitter. (I'm bitter)

Academic code being shitty can't be fixed by inventing a better language.

Razengan · on Nov 7, 2019

> I know I'm in a weird minority (most of the people who think like I do seem to be game engine developers).

As an amateur game engine developer, I'm probably in a weird minority for abhorring C/C++/C#/Canything. :)

I love Swift, and Rust or Dart would be my next choices if Swift wasn't available.

With first-class support for native APIs which do most of the work anyway, performance has not been a concern for me on Swift. I see only benefits, like making it hard for some types of bugs to creep in and code that is pleasant to read and a joy to write (though you can write beautiful/ugly code in any language, of course.)

bulldoa · on Nov 7, 2019

Not well versed in the subject: do people develop game engines in garbage-collected languages? Intuitively I would think a game engine requires absolute performance.

gameswithgo · on Nov 7, 2019

Sometimes. And sometimes the core engine may be in C++ but the game logic is in C# (Unity).

There are plenty of games whose performance demands are low enough that a GC language works fine. There are also many examples of real-time 3D games that used garbage collection and were extremely successful (Minecraft, Subnautica). These were high-quality games, though both do suffer from performance downsides related, at least indirectly, to using GC languages.

mratsim · on Nov 7, 2019

There is a growing community of game developers using Nim.

But Nim doesn't have a single GC, it has multiple, and the default one can be compiled with real-time latency constraints so that you can make sure that you don't stop the world for more than 1/200 of a second, for example.

Plus the GC is per type, you can mix raw pointers for manually managed memory and references for GC-managed objects. The GC can only be triggered in code paths with references.

nwallin · on Nov 8, 2019

Game engines absolutely not. They're almost all written in C++ with exceptions and RTTI disabled and no STL. Gamedev C++ can better be thought of as C+. It's kinda sorta not the same language.

However, a lot of game engines use GC languages to operate game logic. Lots of them use Lua, Unity uses C#. The ubiquity of Lua in gamedev has resulted in LuaJIT being shockingly good if you're pushing around a bunch of floats.

Razengan · on Nov 8, 2019

For things like the latest Doom or pushing 3D hardware to its limits, I'd think no.

I can't say how far you could get in pure Swift for 3D engines, but for 2D games Swift is more than good enough, and you can always interop with C/C++ when you really need to.

RMarcus · on Nov 7, 2019

My experience is entirely different -- writing HPC code for supercomputers at Los Alamos National Lab (on and off for 5 years) made me a true Rust believer.

One of the things I spent the most time on with Fortran / C++ codes was debugging wrong-result bugs. About 90% of the time, the wrong result came from some edge-case where an array was wrongly freed too early, an array was accessed out of scope, or a race condition caused an array member to be updated in a non-deterministic manner. Each of these bugs required hours of debugging and was a huge time sink. Once I started working with Rust, I never encountered any of these bugs. After about a year of fighting the borrow-checker, I feel my overall efficiency has greatly improved.

Now, when I go back and write or read C++ code, patterns that the Rust compiler would yell about jump out at me (multiple unprotected mutable references, cloning unique pointers), and I find these are generally a source of the bug I'm hunting. Like sibling comments point out, a lot (but not all) of the things Rust stops you from doing are just bad practice anyway.

Of course, for GPGPU stuff I have to write CUDA or OpenCL, but those are generally small, compact kernels that are easy to reason about end to end.

I'm not suggesting that you are doing this, but for me, I initially resisted Rust for a long time. Rust seemed extremely complex, and whenever I'd try to use it I would run into a wall. The loud Rust community talking about how great Rust was and how easy it was to use once you "got it" made me feel stupid. Instead of being humble, I became arrogant, and I'd say things like "Rust is too restrictive for the high performance applications I care about" or "I write code that Rust would find unsafe but is actually super well-tuned for this architecture." For me, these were mental excuses I made because I was unable to accept that I was having such a hard time with Rust, and I considered myself a "high performance computing software engineer!"

It took me way longer than most to "get" Rust -- over a year of repeatedly forcing myself to learn and stumble through compiler errors before things started to click. A year after that, and I'm still frequently surprised by certain aspects of the language ("really? I need a & in that match statement?" and "oh god, what does this lifetime and trait bound mean..." are two of the most common). But the parts of Rust that have clicked for me (the borrow checker and associated lifetime mechanics) make Rust very enjoyable to write.

Again, I'm not suggesting that you are falling into the same trap I did, I just wanted to post this to encourage anyone else in the "banging their head against the Rust compiler" stage to power through!

keldaris · on Nov 7, 2019

Thank you for sharing your experience! I'd be very curious to hear more about your experience in doing GPGPU work in Rust - it was my understanding that there was virtually no tooling, libraries or support for that kind of thing beyond the existence of the C FFI.

On the broader point, I suspect a large part of the reason we've had such contrasting experiences is just a radically different mindset behind the C++ codebases we've dealt with. Wrongly freed or out of scope arrays scream of exactly the kind of C++ code Rust was designed to address, and as far as I can tell it is indeed great at doing that. On the opposite extreme, when you have statically determined sizes and bounds, all allocations happen at startup and nothing ever gets freed, that entire class of issues simply doesn't arise in the first place. The reason why the overwhelming majority of the bugs I debug are either silly typos or plain logic errors isn't because I'm particularly good at this, it's just a different approach to programming that's easy to pull off in simulation code (or embedded systems, or game engines), but probably rather more difficult in other kinds of applications.

Anyway, I'm glad you're enjoying Rust and I hope it'll have more of a scientific / numerics / GPGPU ecosystem in the future. More viable languages can only be a good thing for us computational scientists.

wyldfire · on Nov 7, 2019

> I do not care about safety (memory or otherwise) - my code doesn't take arbitrary input, run on shared hardware, do much of anything over networks or have memory safety crashes

How do you know it's the case that you don't have memory safety defects in your implementation?

It's almost certainly not the case that you don't care about safety. Out of bounds accesses and writes would make the simulation defective. Defective simulations are useless. However it might make sense if you said that you don't find yourself fixing defects like these often.

keldaris · on Nov 7, 2019

I fix out of bounds errors fairly frequently, but every single one of them is a trivial typo that is caught instantly and takes under 10 seconds to fix and re-test, literally. The reason for that is that in my domain all the memory requirements that matter [1] are known statically down to exact sizes, which are mathematically provable. Making good use of this fact makes most memory safety bugs either impossible (like the usual double frees, use after free, etc.) or trivial to address (like out of bounds accesses). The common approach, which frankly horrifies me, of dynamically allocating different objects all over the place at runtime and trying to care about it to the least possible extent is just utterly alien in this domain (which I think is the point of similarity to game engines). Consequently, things like the borrow checker, that are fundamentally designed to make this common approach safer and less bug prone elicit a "um... why?" sort of reaction from people like me, because they seem to be completely beside the point [2].

[1] Yes, there are scratch buffers for things like log output and other details of convenience. None of them impact the actual simulation and all are easily handled by a trivial bump allocator over a static buffer. Bugs are unlikely and rare, but if they do happen they don't really affect anything of consequence.

[2] The borrow checker is just a random example, I also tend to avoid most of "modern" C++ for the same reasons and write in a fairly orthodox style.
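
To make footnote [1] concrete, here is a minimal sketch of a bump allocator over one fixed buffer. It is written in Rust only to keep the examples in this thread in one language; the `Bump` type and its methods are hypothetical and not taken from anyone's code. It hands out index ranges rather than pointers, which is one way to stay in safe code.

```rust
use std::ops::Range;

/// Hypothetical bump allocator over a buffer whose size is fixed at compile time.
struct Bump<const N: usize> {
    buf: [u8; N],
    used: usize, // cursor: everything below this index has been handed out
}

impl<const N: usize> Bump<N> {
    fn new() -> Self {
        Bump { buf: [0; N], used: 0 }
    }

    /// Reserves `len` bytes and returns their index range, or `None` if the buffer is full.
    fn alloc(&mut self, len: usize) -> Option<Range<usize>> {
        let end = self.used.checked_add(len)?;
        if end > N {
            return None;
        }
        let start = self.used;
        self.used = end;
        Some(start..end)
    }

    /// Gives mutable access to a previously reserved range.
    fn bytes_mut(&mut self, r: Range<usize>) -> &mut [u8] {
        &mut self.buf[r]
    }

    /// Frees everything at once by rewinding the cursor.
    fn reset(&mut self) {
        self.used = 0;
    }
}

fn main() {
    let mut scratch = Bump::<64>::new();
    let line = scratch.alloc(16).expect("16 bytes fit in 64");
    scratch.bytes_mut(line)[0] = b'x';
    assert!(scratch.alloc(64).is_none()); // only 48 bytes remain
    scratch.reset(); // ranges handed out before this point must not be reused
}
```

There is no per-allocation free, so double frees and use-after-free of individual objects cannot occur; the only failure mode is running out of space, which shows up as `None`.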

steev · on Nov 7, 2019

Not to get too much into the weeds, but in my experience in HPC for numerical work out-of-bounds reads/writes are almost never an issue. I say almost because I'm sure someone somewhere has slipped up, but I've literally never had this problem. A priori you know your bounds and looping within them is trivial. I don't think I've ever encountered any data where you did not know the dimensions at program startup. I have experimented with Rust because I thoroughly enjoy the language (but am not an expert), and immediately got bitten by bounds checks during vector indexing.

I also find if you write modern, idiomatic C++ code you rarely, if ever, have to worry about memory safety issues.
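
On the bounds-check point, a small sketch of the usual workaround: indexing with `a[i]` carries a check unless the optimizer can prove it redundant, while iterating over the slices directly leaves nothing to check per element. The function names here are made up for illustration, and whether the check actually costs anything depends on the loop and the compiler version.

```rust
/// Indexed loop: each `a[i]` and `b[i]` is bounds-checked unless the optimizer
/// can prove the index is in range. Panics if `b` is shorter than `a`.
fn dot_indexed(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum += a[i] * b[i];
    }
    sum
}

/// Iterator version: `zip` stops at the shorter slice, so there is no
/// per-element index to check.
fn dot_zipped(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let a = [1.0, 2.0, 3.0];
    let b = [4.0, 5.0, 6.0];
    println!("{} {}", dot_indexed(&a, &b), dot_zipped(&a, &b));
}
```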

pcwalton · on Nov 7, 2019

> I also find if you write modern, idiomatic C++ code you rarely, if ever, have to worry about memory safety issues.

This is empirically not true. Tons of memory safety issues are found (and exploited) all the time in modern idiomatic C++ codebases.

FpUser · on Nov 7, 2019

This is a very generic statement. I write servers, for example, that run non-stop, and frankly I've long forgotten when the production version last caused memory issues. Basically, a combination of being diligent, using memory leak detection tools, and a certain programming style works just fine for me.

For example, my latest game server has been up for more than 2 months already, and the only reason it gets restarted is that I am updating it to a new version.

pcwalton · on Nov 7, 2019

You get a very different view when you're looking at actively attacked codebases. Memory safety issues are everywhere.

https://twitter.com/lazyfishbarrel is very much worth reading.

pnako · on Nov 7, 2019

That twitter account mostly reports security bugs in 20-year-old C libraries. Nice, but hardly an argument against "C/C++", and completely irrelevant to people writing numerical code in modern C++.

For years, we've been telling newbies not to use the expression "C/C++", which is incorrect. Now the Rust community disingenuously keeps pushing this outdated meme; I say it's disingenuous because they are informed enough to know that C and C++ are distinct languages, yet they use the well-known flaws of C to attack C++ when it can advance their moral crusade for memory safety.

pcwalton · on Nov 8, 2019

Browsers are not "20-year-old C libraries".

These issues are every bit as much of problems in C++. In fact, there is a reasonable argument that modern C++ is less safe than old C++, because of features like lambdas that practically invite use-after-free.

pnako · on Nov 8, 2019

Browsers depend on tons of 20-year-old C libraries. At the moment, the top link from the Twitter account you gave above is this one from November 6: https://twitter.com/LazyFishBarrel/status/119228101802954342...

It reports a total of 37 issues in:

  - freetype2 (C lib, 20+ years old)
  - usrsctp (C lib, age unknown)
  - libexif (C lib, age unknown)
  - libxslt (C lib, 20+ years old)
  - imagemagick (C lib, 20+ years old)
  - mruby (C)
  - php (C)
  - openSSL (C, 20+ years old)
  - curl (C lib, 20+ years old)
  - ffmpeg (C lib, 18 years old)
  - ghostscript (C lib, 30 years old)
  - irssi (C, 20 years old)

In that list were also Skia and libsass, two projects actually written in C++.

In Sass, the issue is a nullptr issue: https://github.com/sass/libsass/issues/3001

In Skia the bug was in intrinsics code: https://skia.googlesource.com/skia/+/0f55db539032a23b52897ae...

Of course that's a single data point, but it shows what I think is a reasonable argument: most of the issues indeed happen in (old) C code, for well-known reasons (no standard string, array or collection support, no RAII), but because C++ supports those things by default it largely avoids those issues.

pnako · on Nov 7, 2019

Most of those are issues in old, _C_ code. No one disputes the fact that C is a mine field due to its complete lack of support for things like arrays and strings.

But that's not really an issue in modern C++. It's only really a problem when you want to implement your own data structures with raw pointers, in which case, yes, you have to be careful and write tests and use sanitizers, Valgrind, etc.

FpUser · on Nov 7, 2019

I can not be responsible for however said highly attacked systems were designed and hence can not judge.

In my own cases I use proprietary protocols for client-server communications that more or less ensure that memory bounds are not broken.

Of course attackers might be able to punch holes in lower layers ( UDP for example ) over which I have no direct control but in this case Rust would use the same UDP stack and offer no advantage.

mcqueenjordan · on Nov 7, 2019

I’m glad to hear that your code is unbreakable and without any bugs, but pcwalton’s claim is still absolutely correct.

> This is empirically not true. Tons of memory safety issues are found (and exploited) all the time in modern idiomatic C++ codebases.

FpUser · on Nov 7, 2019

"I’m glad to hear that your code is unbreakable and without any bugs, but pcwalton’s claim is still absolutely correct"

I smell sarcasm here. I do not claim my code to be unbreakable. I do believe it is REASONABLY safe by design. pcwalton's claim is a generic claim about generic code that may have no relevance to particular situations. Mine, for example.

jcelerier · on Nov 7, 2019

let's be serious, chromium and firefox are more 90s style codebases than 2010s. There's thousands of raw malloc calls when I grep in the chromium source tree, and let's not even start talking about firefox where in the same file you've got :

- raw mallocs : https://github.com/mozilla/gecko/blob/central/dom/plugins/ip...

- new / delete : https://github.com/mozilla/gecko/blob/central/dom/plugins/ip...

- "whatever.Allocate<T>" : https://github.com/mozilla/gecko/blob/central/dom/plugins/ip...

and that's not limited to a single file... look at this :

https://github.com/mozilla/gecko/blob/3e6d6e013400af38f85ceb... - some malloc and new, again

- you also get some unique_ptr (because "modern" m'see) : https://github.com/mozilla/gecko/blob/3e6d6e013400af38f85ceb...

- moz_xmalloc because why not ? https://github.com/mozilla/gecko/blob/3e6d6e013400af38f85ceb...

- oh and did you know about our own custom reference counting pointer ? https://github.com/mozilla/gecko/blob/3e6d6e013400af38f85ceb...

etc etc... when you've got 35 different ways to allocate objects used willy-nilly of course things go wrong. Most modern codebases only ever use automatic storage, and unique / shared_ptr.

pcwalton · on Nov 7, 2019

The issues have nothing to do with the differences between STL smart pointers and Mozilla/Chromium smart pointers.

nwallin · on Nov 8, 2019

They are '90s codebases, they're not just "like" '90s codebases. Firefox obviously dates back to Netscape, and webkit (which Safari, Chromium, Opera, Vivaldi, and now Edge are based on) is a fork of KHTML.

Mozilla could have achieved 90% of what it wanted from a rust rewrite with a modern C++11 rewrite at a quarter of the cost. Linter rules that say "no new or delete", "either unique_ptr<T> or shared_ptr<const T>", and "only construct unique or shared ptr via make_unique and make_shared" get them like three quarters of the way there.

The thing that makes rust great is that the static analyzer is built into the compiler and has strict defaults. C++ is the same language, but clang-analyze and clang-tidy are shipped as separate packages and have more permissive defaults.

pcwalton · on Nov 8, 2019

Those rules are completely insufficient. It's worth looking at the actual vulnerabilities here.

There is a reasonable argument that modern C++ is less safe than old C++, because features like lambdas are very prone to use-after-free.
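
For readers wondering how the lambda case looks on the Rust side, here is a minimal sketch (the function name is hypothetical). A closure that merely borrows a local cannot be returned from the function that owns that local; the compiler rejects it, and `move` is the usual fix. This only illustrates the Rust rule and does not settle how often the C++ equivalent bites in practice.

```rust
/// Returns a closure that owns its captured state.
fn make_scaler(factor: f32) -> impl Fn(f32) -> f32 {
    // Without `move`, the closure would borrow `factor`, which is gone once
    // this function returns; rustc refuses to compile that version.
    move |x| x * factor
}

fn main() {
    let double = make_scaler(2.0);
    println!("{}", double(21.0));
}
```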

jzoch · on Nov 7, 2019

>being diligent, using memory leak detection tools and certain programming style

That's the whole point. You need to do these things in C++, not in Rust. In Rust you get it for free and don't need to be an expert or use runtime detection tools or even static analyzers besides your compiler (w.r.t. memory safety and some classes of data races; these tools can still be useful in other domains).

People make the same assertions about dynamically typed languages at scale and how you "only" need to write tests that assert the types or "i wrote the function and know which type is passed duh" or "i write unit tests that would catch this" when a statically typed language tells you at compile time whether or not it will work. No intelligence required.

FpUser · on Nov 7, 2019

You have a valid point. However, practically speaking, smart pointers in C++ eliminate most of the headaches. At least for me personally, so I do not really consider it a big nuisance. But yes, I agree that for many people choosing Rust could be the preferred way.

MaulingMonkey · on Nov 9, 2019

> memory leak detection tools

Hopefully you're not only relying on those - valgrind, address sanitizer, fuzzing tools, static analysis, etc. are a must for network-facing C++ (or unsafe Rust) as far as I'm concerned. You're not just looking for leaks, but use after free bugs, single byte overflows, bad casts triggered by bad data, and a whole slew of other potential problems.

jandrewrogers · on Nov 7, 2019

I can't speak for all modern C++ code bases but this assertion is manifestly false in every modern C++ code base I have come in contact with for a long time now. There have been some gross exploits in publicly audited "safe" Rust code in recent months -- does this mean no one should use Rust? Are you going to make a hobby of denouncing Rust in public forums as a consequence?

I don't understand the desperate need to paint all modern C++ code bases as dangerously unsafe. It is demonstrably not true and doesn't reflect well on the motivations of those that would blindly assert it. Modern C++ has many issues and, like all programming languages, is the scene of many bugs. Just not memory safety issues. Furiously asserting that memory safety is an issue does not manufacture fact.

pcwalton · on Nov 8, 2019

> I don't understand the desperate need to paint all modern C++ code bases as dangerously unsafe.

Because the idea that security vulnerabilities can be fixed by just "modernizing" C++ codebases is actively harming security, by discouraging investment in memory-safe languages.

> It is demonstrably not true and doesn't reflect well on the motivations of those that would blindly assert it.

It is demonstrably true, as http://twitter.com/lazyfishbarrel shows. Perhaps consider that those of us who work on browsers, which are some of the largest most-attacked pieces of software in the world, would know what we are talking about.

zarkov99 · on Nov 7, 2019

I work in a domain where memory defects do matter, but with moderately disciplined development, that is no naked news/deletes, no naked array access, no locks, etc, we hardly see any memory problems in a high concurrency, high throughput, mission critical environment. At least for my group, Rust's memory safety focus would not move the needle.

vbarrielle · on Nov 7, 2019

The very popular C++ linear algebra library Eigen has bounds checking for matrices by default, but you can turn it off by defining NDEBUG. This means most out-of-bounds accesses are found during development.

bluGill · on Nov 7, 2019

Memory safety defects tend to manifest themselves in obvious ways. A few unit tests and some work with memory sanitizers will find them.

pcwalton · on Nov 7, 2019

This is empirically not true. A look through https://twitter.com/lazyfishbarrel confirms this.

bluGill · on Nov 7, 2019

The exceptions tend to escape notice for years. They get a lot of attention (and are often really hard to fix unlike the early ones), but I stand by my statement: most are easily found and fixed - but they are also fixed early in development so you don't hear about them.

RaleyField · on Nov 7, 2019

If it's so obvious then why am I receiving security patches for my Linux desktop almost every day?

bluGill · on Nov 8, 2019

Many reasons.

There is a long tail of exceptions to my statement, hard to find things that escape notice for years.

There are a lot of security issues that are not really memory safety as we are talking about here. (many are memory safety in a way that has nothing to do with getting your allocate/free wrong - using uninitialized memory for example). Some of them are subtle new attacks that were just discovered and now need to be mitigated.

integricho · on Nov 7, 2019

Luckily, we have sanitizers now.

red75prime · on Nov 7, 2019

How are you dealing with data races? Manual synchronization (memory barriers, and the like)?

keldaris · on Nov 7, 2019

For the vast majority of cases I've found it possible (and very much worthwhile) to just think a bit harder and avoid race conditions by construction (for instance, reformulating an algorithm so that writes only occur to different cache lines in a single pass, etc.). That's obviously not always feasible, but in my experience people give up far too quickly.

For the remaining few cases - yes, manually inserting the minimal necessary amount of synchronization (on x86-64 that's often a single instruction). On GPUs you obviously need memory barriers fairly frequently, but that's much less of an issue than on CPUs. Anyway, I don't have any faith in "smart" compilers or language features (which have large costs elsewhere) that try to free me from the simple necessity to carefully think through what my algorithm actually does before I implement it. If you care about getting close to optimal performance, you'll have to do the thinking anyway.
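
As an illustration of "race-free by construction", here is a minimal Rust sketch in which every thread gets its own disjoint slice to write to, so no synchronization is needed beyond the final join. The function name and chunking scheme are assumptions for the example; lining chunks up with cache lines, as described above, is left to the caller and is not guaranteed by this code.

```rust
use std::thread;

/// Squares every element in place, one disjoint chunk per thread.
/// `chunk_len` must be non-zero (`chunks_mut` panics on zero).
fn square_in_parallel(data: &mut [f64], chunk_len: usize) {
    thread::scope(|s| {
        // `chunks_mut` yields non-overlapping mutable slices, so no two
        // threads can ever write to the same element.
        for chunk in data.chunks_mut(chunk_len) {
            s.spawn(move || {
                for x in chunk.iter_mut() {
                    *x = *x * *x;
                }
            });
        }
        // All spawned threads are joined when the scope ends.
    });
}

fn main() {
    let mut v = vec![1.0, 2.0, 3.0, 4.0, 5.0];
    square_in_parallel(&mut v, 2);
    println!("{:?}", v);
}
```

Spawning one OS thread per chunk is only sensible for a handful of large chunks; a real code would use a pool.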

FiberBundle · on Nov 7, 2019

Thinking through potential race conditions is certainly the right thing to do, but we still make mistakes, and concurrency bugs are hard to find and sometimes even to notice. I'd still rather have the borrow checker keep an eye on my code, and if it's unnecessarily conservative in a situation that you know is thread-safe, you can still use unsafe code.

dgellow · on Nov 7, 2019

Thanks for the answer! :)

mratsim · on Nov 7, 2019

ffast-math is not needed when you unroll manually and write your own accumulators.

See my benchmarks [1] with fast-math in the middle and the generated assembly comparison at the bottom.

The base language is Nim but pure C code should generate the same assembly.

[1] https://github.com/numforge/laser/blob/e660eeeb723426e80a7b1....
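
For anyone who has not seen the trick, a minimal sketch of what "write your own accumulators" means, here in Rust rather than Nim and with a made-up function name. Floating-point addition is not associative, so without fast-math the compiler must keep a single-accumulator loop as one serial chain of adds; spelling out four independent accumulators makes the reordering explicit in the source.

```rust
/// Sums `xs` using four independent accumulators.
/// The result can differ in the last bits from a strict left-to-right sum.
fn sum_unrolled(xs: &[f32]) -> f32 {
    let mut acc = [0.0f32; 4];
    let mut chunks = xs.chunks_exact(4);
    for c in &mut chunks {
        // Four independent dependency chains instead of one.
        for lane in 0..4 {
            acc[lane] += c[lane];
        }
    }
    // Elements left over when the length is not a multiple of four.
    let tail: f32 = chunks.remainder().iter().sum();
    (acc[0] + acc[1]) + (acc[2] + acc[3]) + tail
}

fn main() {
    let xs: Vec<f32> = (1..=10).map(|i| i as f32).collect();
    println!("{}", sum_unrolled(&xs));
}
```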

keldaris · on Nov 7, 2019

That's absolutely right (and thank you for the reference), but this benchmark starts off with fairly naive code where -ffast-math may still help considerably.

mratsim · on Nov 7, 2019

In between the start and the end, I went up to 8 accumulators, and in files in the same folder I have the same with SSE and AVX instructions.

I actually wanted to know exactly what ffast-math did, down to the codegen (so you have assembly at the bottom).

nwallin · on Nov 8, 2019

-ffast-math quickly loses its value once you start dispatching raw intrinsics. -ffast-math tells the compiler "just figure out the math for me, I know what I want but don't know how to get it", but raw intrinsics say "use this instruction, but deal with manual register management for me."

If you take the code `v /= v.length();`, then given -ffast-math the compiler can easily compile that as a mul, two fmas, an rsqrt, and three muls.

But if you take the code: (can't format right on mobile) (edit: fixed on desktop)

  __m256 len = _mm256_mul_ps(v.x, v.x);
  len = _mm256_add_ps(len, _mm256_mul_ps(v.y,v.y));
  len = _mm256_add_ps(len, _mm256_mul_ps(v.z,v.z));
  len = _mm256_sqrt_ps(len);
  v.x = _mm256_div_ps(v.x, len);
  v.y = _mm256_div_ps(v.y, len);
  v.z = _mm256_div_ps(v.z, len);

You've told the compiler exactly which instructions to execute. And it will do so faithfully even though it's like two orders of magnitude slower.

I would be very interested to see -Ofast used before SIMD intrinsics start getting thrown around, but it has no value afterwards. The code in the linked article is eventually almost all SIMD intrinsics (v4?) so it won't make a difference.

mkbosmans · on Nov 8, 2019

What are you talking about? Intrinsics don't force the compiler to use a specific instruction. It's not even a 1:1 mapping.

I think all compilers will fuse separate add and mul intrinsics to fmadd.

Both GCC and Clang will convert the _mm256_div_ps to a vmulps. GCC calculates the scaling factor by combining vsqrtps and vrcpps. Clang will emit a vrsqrtps instruction, but as that has only 12 bits of guaranteed accuracy, it fixes the result up before using it to scale x, y and z.

https://godbolt.org/z/rG2rH8
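
The "fixes the result up" step is, as far as I can tell, the standard Newton-Raphson refinement for a reciprocal square root; that is an assumption on my part, since the generated assembly is not reproduced here. A scalar sketch of one such step, with a hypothetical function name and a hand-picked rough estimate standing in for the hardware instruction:

```rust
/// One Newton-Raphson step for y ~ 1/sqrt(x): y' = y * (1.5 - 0.5 * x * y * y).
/// Each step roughly doubles the number of correct bits.
fn refine_rsqrt(x: f32, y: f32) -> f32 {
    y * (1.5 - 0.5 * x * y * y)
}

fn main() {
    let x = 4.0f32;
    let rough = 0.49f32; // stand-in for a low-precision estimate of 1/sqrt(4) = 0.5
    let better = refine_rsqrt(x, rough);
    println!(
        "rough error {:e}, refined error {:e}",
        (rough - 0.5).abs(),
        (better - 0.5).abs()
    );
}
```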

pcwalton · on Nov 7, 2019

Honestly, a lot of this comes down to whether you prefer iterators or C-style for loops. I prefer iterators myself, as it's way too easy to make C-style for loops unreadable. Reasonable people can disagree, of course.

kbumsik · on Nov 7, 2019

Maybe it's because I'm not familiar with Rust, but I've always had the impression that Rust code is very hard to read. For beginners it is just impossible to figure out what the code is doing.

[C++ code]: https://github.com/parallel-rust-cpp/shortcut-comparison/blo...

[Rust code]: https://github.com/parallel-rust-cpp/shortcut-comparison/blo...

edflsafoiewq · on Nov 7, 2019

The Rust is practically line noise. This is just awful

    let pack_simd_row = |(i, (vd_stripe, vt_stripe)): (usize, (&mut [f32x8], &mut [f32x8]))| {
        for (jv, (vx, vy)) in vd_stripe.iter_mut().zip(vt_stripe.iter_mut()).enumerate() {
            let mut vx_tmp = [std::f32::INFINITY; simd::f32x8_LENGTH];
            let mut vy_tmp = [std::f32::INFINITY; simd::f32x8_LENGTH];
            for (b, (x, y)) in vx_tmp.iter_mut().zip(vy_tmp.iter_mut()).enumerate() {

And I tend to write stuff like this in Rust too. The loops especially. It's just so easy to throw together all those iterator combinators and get some ugly blob that's going to make people's eyes glaze over.

cogman10 · on Nov 7, 2019

I honestly don't see what's so bad about this. About the only thing that might be more useful is better variable names and maybe a function or two to clear up exactly what's going on. But even still, this is math heavy code. I don't think I've ever seen math heavy code (particularly dealing with matrix manipulation) that didn't end up looking like this.

brutt · on Nov 7, 2019

Code is easy to follow. I see no problem there.

`pack_simd_row` is a lambda: `|arguments: types| { body }`. Its arguments are:

* `i`, of type `usize` (Rust's counterpart to `size_t`),
* a tuple (anonymous struct) with two fields, `vd_stripe` and `vt_stripe`, which are modified inside the lambda. They are mutable slices of `f32x8`, i.e. SIMD vectors of 8 floats.

Inside the function we have a loop over an iterator that produces a tuple with two fields: `jv` and a nested tuple with two fields, `vx` and `vy`. `vx` and `vy` are elements from `vd_stripe` and `vt_stripe`; `jv` is their index.

Inside the loop we create two temporary mutable variables, `vx_tmp` and `vy_tmp`, which are fixed-size arrays of 8 floats initialized with infinity.

Then we have the next loop, which modifies these temporary arrays in place.

And so on.
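
If it helps, here is the same `iter_mut().zip(...).enumerate()` shape stripped down to plain `f32` slices. Everything in it (the function name, the arguments, what gets written) is hypothetical and only mirrors the structure of the quoted snippet, not what the benchmark computes.

```rust
/// Walks two output rows in lockstep and fills them from `src`.
fn fill_rows(row_a: &mut [f32], row_b: &mut [f32], src: &[f32]) {
    // `zip` pairs up the two mutable iterators; `enumerate` adds the index `j`.
    for (j, (a, b)) in row_a.iter_mut().zip(row_b.iter_mut()).enumerate() {
        // Positions past the end of `src` are padded with infinity, echoing
        // the infinity-initialized temporaries in the quoted snippet.
        let v = src.get(j).copied().unwrap_or(f32::INFINITY);
        *a = v;
        *b = 2.0 * v;
    }
}

fn main() {
    let mut a = [0.0f32; 3];
    let mut b = [0.0f32; 3];
    fill_rows(&mut a, &mut b, &[1.0, 2.0]);
    println!("{:?} {:?}", a, b);
}
```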

jml7c5 · on Nov 7, 2019

While the Rust version does flow logically, succinctness helps a lot more than people seem to appreciate. It's one of the reasons math notation is often so inconsistent: the brevity allows for easier manipulation and scanning. (Though I would not stretch the analogy too far, as inconsistent and overly brief notation can limit understanding while reading proofs.)

Being able to fit part of a program clearly into 5 lines with fewer characters makes it easier to ensure correctness than an algorithm spread over 10 lines that is full of extra "noise". It's why there's so much syntactic sugar in so many languages.

tluyben2 · on Nov 8, 2019

It is easy to follow but I cannot say it's very pleasing to read.

snaky · on Nov 7, 2019

Maybe it's time to propose alternative syntax for Rust?

rezeroed · on Nov 7, 2019

I could produce greater horror with c# linq.

tluyben2 · on Nov 8, 2019

Please give us an example. I have read a lot of linq; it never looked quite this horrible so I am curious.

exDM69 · on Nov 7, 2019

The reason the C++ code looks simple is that it uses OpenMP's `#pragma omp parallel for`, which is a very easy way of doing very simple loop parallelization.

By contrast the Rust example uses parallel iterators from the Rayon crate. If the C++ example had something similar using a C++ library, it would probably be worse.

It's not Rust vs. C++, it's OpenMP directives vs. explicitly writing multithreaded code.

gpderetta · on Nov 7, 2019

Well, the whole purpose of OpenMP is to be a syntactically lightweight way of parallelizing code; it is a well supported standard. C++17 in fact has parallel extensions to the standard library, but if you have access to OpenMP it is often better.

OpenMP is not C++ specific, it is also supported in fortran and C (although the newer OpenMP standards have first class support for C++ iterator semantics). In principle it could also be supported in rust although I assume that making it work with the borrow checker might not be easy.

blub · on Nov 7, 2019

The calls starting with par_ are by far not the biggest readability problem. The Rust code has a lot of noise, whereas the C++ reads almost like a straight C implementation that anyone could understand.

exDM69 · on Nov 7, 2019

Would you like to point out specific instances of this "noise"?

The C++ code looks like straightforward "C code" because it's using the OpenMP parallel for. If it dealt with threads explicitly it would not look as pretty.

I do agree that the Rust code has some verbosity, for example there are a lot of type annotations and using named constants instead of magic values.

For example (C++):

    int na = (n + 8 - 1) / 8;

Versus (you could write this like the above too):

    let vecs_per_col = (n + simd::f32x8_LENGTH - 1) / simd::f32x8_LENGTH;

An example of the type annotations (are they really necessary?):

    let pack_simd_row = |(i, (vd_stripe, vt_stripe)): (usize, (&mut [f32x8], &mut [f32x8]))| {

Comparing the actual internals of the algorithm is quite straightforward, e.g. look at the middle loop (lines 70-82 in Rust vs. 58-97 in C++). Do note that the C++ code is loop unrolled by hand which makes it look simpler.

In the Rust code, there's a pretty cool feature where multithreading can be toggled on/off with a compile time flag, see parts with: #[cfg(not(feature = "no-multi-thread"))]

If the C++ example wouldn't use OpenMP, this would be pretty impossible to achieve without being noisy too.

Also note that even with OpenMP, it would be quite easy to create a memory unsafety issue in the C++ code by for example accessing an array out of bounds inside the loop body.

Apart from these, I have no problems reading and understanding the C++ or Rust code (and I'm a noob with Rust).
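
For those who have not used the compile-time toggle mentioned above, a minimal sketch of the mechanism. The feature name is the one quoted from the benchmark; the function is hypothetical, and the benchmark itself uses Rayon rather than raw threads. In a Cargo project the feature would also need to be declared under `[features]` in Cargo.toml.

```rust
/// Multithreaded variant, compiled unless the `no-multi-thread` feature is enabled.
#[cfg(not(feature = "no-multi-thread"))]
fn sum_halves(data: &[u64]) -> u64 {
    let (left, right) = data.split_at(data.len() / 2);
    std::thread::scope(|s| {
        let l = s.spawn(|| left.iter().sum::<u64>());
        let r = s.spawn(|| right.iter().sum::<u64>());
        l.join().unwrap() + r.join().unwrap()
    })
}

/// Single-threaded variant with the same signature, compiled when the feature is on.
#[cfg(feature = "no-multi-thread")]
fn sum_halves(data: &[u64]) -> u64 {
    data.iter().sum()
}

fn main() {
    // Callers do not need to know which variant was compiled in.
    println!("{}", sum_halves(&[1, 2, 3, 4, 5]));
}
```

Exactly one of the two definitions exists in any given build, so there is no runtime branch and no duplicate-definition error.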

imtringued · on Nov 7, 2019

From what I have read, Rust users don't seem to care about OpenMP at all. Some even go as far as to say that they would prefer using something like a Rayon alternative for C++ instead.

exDM69 · on Nov 8, 2019

It's not obvious how the hypothetical OpenMP for Rust should work. If you'd put #pragma omp parallel for in front of a loop, there would be no compile time guarantees of memory safety or freedom from data races.

OpenMP is suitable for the simplest kind of parallelism in the first place.

If you look at the example code here, there's like 5 lines of pretty simple Rayon code that's almost a drop-in replacement for single threaded iterator code. It performs almost as well as the OpenMP C++ code, and it's guaranteed to be safe.

_pferreir_ · on Nov 7, 2019

It's not the most readable Rust code I've ever seen and I guess that's not the main concern here. That said, I guess people find C++ easier because they're already used to the syntax.

blub · on Nov 7, 2019

Even an amateur C++ programmer or C programmer would understand what that C++ code does. The only slightly opaque thing is the std::tie call. I'd feel comfortable having one of my C-focused colleagues modifying that C++ code.

The Rust code is... typical Rust. The algorithm almost gets lost in the specific peculiarities.

the_why_of_y · on Nov 7, 2019

The first thing an amateur C++ programmer would notice is that that code isn't standard C++ code, what with its weird "#pragma omp" thing that isn't in the index of the Stroustrup book.

gpderetta · on Nov 7, 2019

The nice thing about those pragmas is that the code can be understood even if you ignore them. I do not know if this is true for all omp pragmas (IIRC some of the newer ones might have some non-ignorable semantics) but it is true for this omp for.

the_why_of_y · on Nov 7, 2019

Yes, of course - a language extension can often give you better ergonomics than a library interface, both in implementing the code and in diagnostics reporting. The downside is that it's more effort to implement a language extension, it's more difficult to evolve it, and it restricts user's choice of implementation to those that have implemented the extension. For example, Microsoft Visual C++ supports only OpenMP 2.0, from 2002.

An interesting question in this context would be, is there anything preventing adding an OpenMP extension to Rust?

gpderetta · on Nov 7, 2019

Of course there is a cost in making something part of the language, and the C++ way has always been to put features in the library whenever possible.

Regarding choice of implementation, there are more C++ compilers that support OpenMP than rust compilers.

pcwalton · on Nov 7, 2019

Rayon is basically the "OpenMP of Rust" already.

OpenMP is designed around C-style for loops, which Rust discourages.

gpderetta · on Nov 7, 2019

FWIW OpenMP supports any C++ random access iterator and c++11 range-for loops. I guess you count any external iterator as C-style loop.

MauranKilom · on Nov 7, 2019

*newer OpenMP versions. MSVC is stuck on OpenMP 2.0 in which you can't even use unsigned integers as loop variable. No, seriously.

petschge · on Nov 7, 2019

MSVC is hilariously bad and not used in HPC code. Not sure how much high performance game code is compiled with it.

pcwalton · on Nov 7, 2019

A lot. I don't think OpenMP is used much in games.

neutronicus · on Nov 7, 2019

Yeah OpenMP is mostly for problems with much less data-dependency than games.

imtringued · on Nov 7, 2019

Rayon only covers a fraction of OpenMP's features.

timvisee · on Nov 7, 2019

I think Rust is actually very readable in general.

In this snippet the author went through some insane optimization, and opted to use quite a few attributes, those definitely make it look noisy in this case. FFI bindings are verbose and not so clean either.

rat9988 · on Nov 7, 2019

I don't think it is very hard to parse. I think it is very ugly though. Which kind of makes it unreadable. I mean reading fn is not natural for me. Reading function is way more natural. I feel like the language designers didn't pay attention at all to how natural it will look to developers

pcwalton · on Nov 7, 2019

> I feel like the language designers didn't pay attention at all to how natural it will look to developers

This is not true. We agonized over syntax decisions. "fn" was actually preferred by most of the Rust community.

jandrese · on Nov 7, 2019

I've had the impression that Rust developers come from a more pure-math background than most language designers. People who read mathematical notation all day probably find Rust to be more natural than a typical programming language. For people coming from a traditional computer language background the number of sigils on each line is a point of pain.

IMHO Rust may have tried to roll a little too much into the language. It has the air of the second system effect to it, where every good idea from every language is added together to get something that is less than the sum of its parts and you get code that is hard to decipher until you've learned a full college course worth of syntax.

pcwalton · on Nov 8, 2019

I designed the lifetime syntax and I'm far from having a pure math background. There aren't any sigils in Rust that C doesn't have, other than the lifetime syntax, which needs separate syntax as it's a novel feature.

Can you name a specific feature you want removed from Rust?

jandrese · on Nov 8, 2019

I'm not sure I'd want it removed, but generics tend to be pretty horrific looking to people new to the language.

An example:

  fn largest<T>(list: &[T]) -> T {
      let mut largest = list[0];

      for &item in list.iter() {
          if item > largest {
              largest = item;
          }
      }

      largest
  }

I'm sure experienced Rust devs look at that and say it's fine, but that's definitely a hurdle for new developers.

slavik81 · on Nov 9, 2019

You need to specify the properties of T that you use. So, that first line should be:

  fn largest<T: PartialOrd + Copy>(list: &[T]) -> T {

https://tomtongue.com/docs/rust-wk-20.html#trait-bounds
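
For reference, a minimal sketch of the whole function with those bounds in place. `PartialOrd` is what allows the `>` comparison and `Copy` is what allows moving values out of the slice; the body is an assumption based on the example above, not code from the benchmark article.

```rust
// Sketch of a complete `largest` with the trait bounds discussed above.
fn largest<T: PartialOrd + Copy>(list: &[T]) -> T {
    // Indexing panics on an empty slice, so callers must pass at least one item.
    let mut largest = list[0];

    for &item in list.iter() {
        if item > largest {
            largest = item;
        }
    }

    largest
}
```

Calling `largest(&[3, 7, 2])` returns `7`, and the same function works for `f64` or `char` slices because both types implement `PartialOrd` and `Copy`.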

pcwalton · on Nov 8, 2019

What would you want to see instead? That's pretty much the same syntax as generics in Java.

blub · on Nov 7, 2019

Then the only reasonable conclusion is that the Rust community has different tastes compared to the mainstream programming community. To put it euphemistically.

The Swift team probably also agonized over syntax decisions and they came up with something very different which has won over many people and attracted little criticism.

pcwalton · on Nov 7, 2019

The only concrete thing I've seen brought up so far is that some people don't like the keyword "fn". Swift chose "func". Is there really a huge difference? (I prefer "fn", by the way.)

Swift also doesn't have pervasive use of lifetimes like Rust does. Lifetimes, by their nature, always add some level of "noise".

blub · on Nov 7, 2019

When people do make an effort and outline pain points, those are waved away by Rust programmers or deflected by claiming that they like said syntax choice.

At the end of the day though, these tiny things like fn, the lifetime annotations and many others add up to an unpleasant experience and less readable code.

pcwalton · on Nov 7, 2019

If Rust were an unpleasant experience, it wouldn't have users. I'm sorry that you disagree with the syntax decisions. We can't please everybody.

blub · on Nov 7, 2019

I'm not an extremist, the language clearly has certain advantages. At the same time I do believe it could have a lot more users if it were a more pleasant experience.

If it becomes a mainstream language (like Java, C++, etc), I'd be happy to concede I was wrong. It could be that memory safety will trump ergonomics and it wouldn't be the first time that somewhat painful to use tools become very popular - see Linux, git, etc.

rat9988 · on Nov 7, 2019

Lifetime annotations are another ugly, unreadable quirk. I'm happy to know I'm not the only one.

pcwalton · on Nov 7, 2019

Those aren't syntax though. Without lifetimes you don't have memory safety without GC. There are plenty of languages that don't have memory safety, and plenty of languages with GC, but they don't broadly share Rust's goals.

rat9988 · on Nov 8, 2019

The problem is not lifetime but its annotation.

rezeroed · on Nov 7, 2019

fn is great. I'm not at all mathematical as suggested by others. When I go back to Go I find myself wondering why I have to type the extra two characters.

FpUser · on Nov 7, 2019

I would very much prefer function over fn, just for plain readability. Of course it is subjective but whatever.

gpderetta · on Nov 7, 2019

Stroustrup's law is relevant here:

"For new features, people insist on LOUD explicit syntax. For established features, people want terse notation."

fn is fine.

rezeroed · on Nov 7, 2019

fn usually appears at the start of a line following an empty line - how is it not readable? I honestly think even fn serves little purpose, other than compiler parsing. New paragraph == new function.

FpUser · on Nov 7, 2019

Whatever tickles your fancy. To each their own

leotaku · on Nov 7, 2019

I personally much prefer the smaller "fn" to "function". I started learning programming with python's "def", so maybe it comes from that.

rat9988 · on Nov 7, 2019

def is way better than fn. you can read def at least.

kbenson · on Nov 7, 2019

Read def? What's the point of that? You might say "def" when you see it, but you think of the concept of a function. You should be translating fn to "function" in your head or if you speak it out loud, the same way you say "plus" when you see +. How do you "read" + otherwise? It's the same for every mathematical symbol: there is no actual way to read them, because they communicate a concept that you most likely translate into your own language in your head.

lenkite · on Nov 8, 2019

Previous poster has a small point. Even I am guilty of saying 'f-n' when reading Rust code aloud. Kotlin is the best - all functions are fun

saghm · on Nov 7, 2019

I'm not convinced that the C++ way is better here; either `fn` or `function` makes it clear that you're looking at a function (and incidentally makes it very easy to search for a function with a given name in a file), whereas just having the return type and then the function name makes it ambiguous until you read further whether you're looking at a variable or a function. If I see `fn size`, I know I'm looking at a function before I get to the parentheses, but if I see `int size`, that's still ambiguous until you go a character further.

boring_twenties · on Nov 8, 2019

Unfortunately, one character further is not enough to disambiguate the C++ syntax:

    int size();

Am I declaring a function returning an int? Or defining an object of type int and invoking the default constructor?

bschwindHN · on Nov 7, 2019

fn vs. function is probably the least of your concerns when reading and writing rust. Lifetimes, generic parameters, and complex where clauses are what will throw you.

But if you spend enough time with the language it gets easy enough for these to not be a concern. I personally think it's very much worth learning.

thanatropism · on Nov 7, 2019

I’m still learning Rust, but from the “looks department” I can say so far that it “looks” a lot more like Python etc. than the other systems languages.

It’s almost like Rust starts from a functional programmer’s perspective and adds the necessary complexity to have a “low level” language, whereas C is a simplified assembler and C++ tries to add “high level” amenities.

vhakulinen · on Nov 7, 2019

I have the opposite: I know Rust, but I hardly know C++ beyond C (granted that I probably could figure out the not-so-easy c++ easier than I would figure out the rust version if I didn't know either). Don't get me wrong, I know what you mean. But the same can happen with many other languages too.

cranej · on Nov 7, 2019

For me the Rust code is easy to read. Personally I don't think it's reasonable to ask programming languages to be 'easy to read for people not familiar with them' / 'read like English' / or something similar.

StreamBright · on Nov 7, 2019

Why is that? The number one reason I do not like Rust is the use of ', & and so on to represent something that would be trivial to implement with plain English.

Example:

    struct HReq<'a> {
 
    }

    struct HReq<a: LifeTime> {
      
    }

Isn't that better?

steveklabnik · on Nov 7, 2019

That doesn’t show the difference when you have more than one, and you also didn’t show how much more verbose the usage gets inside the struct.

It’s not a language level constraint to name lifetimes with a single letter. But folks tend to not use longer names for good reason.
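
To make that concrete, here is a small sketch with two lifetime parameters, written once with descriptive names and once with the conventional single letters. The struct fields, the `HReqShort` type and both helper functions are hypothetical and exist only for illustration.

```rust
// Hypothetical request type with descriptive lifetime names.
struct HReq<'method, 'path> {
    method: &'method str,
    path: &'path str,
}

// The same shape using the conventional short names.
struct HReqShort<'a, 'b> {
    method: &'a str,
    path: &'b str,
}

// Every use site has to repeat whichever names were chosen.
fn describe<'method, 'path>(req: &HReq<'method, 'path>) -> String {
    format!("{} {}", req.method, req.path)
}

fn describe_short<'a, 'b>(req: &HReqShort<'a, 'b>) -> String {
    format!("{} {}", req.method, req.path)
}
```

Both versions compile and behave identically; the longer names simply show up again in every signature that mentions the type.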

StreamBright · on Nov 7, 2019

I just have a hard time deciphering what the intention was with these single character things. Are you suggesting to use 'a_lifetime? What is that good reason? I usually use longer names to help the next person reading my code.

steveklabnik · on Nov 7, 2019

Rust was originally implemented in OCaml, and takes a lot of inspiration from it. The 'a syntax is used by OCaml for generic type parameters.

Rust is in a weird spot because we have two different kinds of generic parameters: types and lifetimes. They need to be distinguished from one another somehow. Nobody loves the lifetime syntax, but nobody has ever proposed something that would end up significantly better.

I’m not proposing you should use 'a_lifetime, I’m saying you could. In the end, it ends up obscuring more than helping.

kccqzy · on Nov 7, 2019

That would mislead people into thinking it's a type variable satisfying a trait called LifeTime instead of a lifetime variable. It would be confusing.

xlc0212 · on Nov 7, 2019

How is C++ code better? Rust code is harder to write, but I don’t find it harder to read.

lugg · on Nov 7, 2019

Rust is a very terse language. Inexperienced devs tend to write overly compressed code.

That example isn't too bad once you're used to the syntax and idioms but the lack of white space to space out logical blocks is what's making it hard to read.

Well, that and what I can only describe as writing C++ in Rust, much the same way you can write PHP in JavaScript if you squint hard enough.

Probably doesn't help it's trying to be a comparison which doesn't really work. A lot of that could be refactored out into idiomatic rust and it wouldn't look so bizarre.

gpderetta · on Nov 7, 2019

very nice.

Most language comparison benchmarks are completely useless other than for bragging points, but those, like this one, that go into detail about why one specific implementation is faster or slower than another are much more interesting and help you form an idea of what makes a language slower or faster.

Also, it is interesting that the hand-optimized program is about 100 times faster than the unoptimized one, showing that even today there is room for manual optimization and you cannot trust the compiler blindly, but have to work with it iteratively to get to an optimal solution. I can't figure out from either this article or the original one whether -ffast-math was being used. It would be nice to know if that would help the compiler vectorize and unroll the loop with multiple accumulators.

mratsim · on Nov 7, 2019

If you are interested in the exact same topic (matrix multiplication parallelization), here are other step-by-step tutorials I used:

- UlmBLAS [1], for a HPC course in 14 steps

- BlisLab [2], make sure to check out the tutorial.pdf. It gives you exercises to solve in C and each one builds on the previous one

In Rust, the matrixmultiply crate [3] implements those techniques to reach 80% of OpenBLAS speed, see the blog post GEMM: a rabbit hole [4].

This is a generic approach that can be followed in any low-level language.

I reach 100% of OpenBLAS and MKL-DNN speed in Nim on large matrices as well [5], without any assembly and with generic code that can also generate integer matrix multiply [6].

Regarding fast-math, that's what you do manually, you interleave the fused-multiply adds as they have a latency of about 6 cycles (Broadwell, I don't remember on Skylake)

[1]: http://apfel.mathematik.uni-ulm.de/~lehn/sghpc/gemm/

[2]: https://github.com/flame/blislab

[3]: https://github.com/bluss/matrixmultiply

[4]: https://bluss.github.io/rust/2016/03/28/a-gemmed-rabbit-hole...

[5]: https://github.com/numforge/laser/blob/e660eeeb723426e80a7b1...

[6]: https://github.com/numforge/laser/blob/e660eeeb723426e80a7b1...

gpderetta · on Nov 7, 2019

Thanks for the pointers, I'll take a look.

re fast-math, I was interested in how much additional parallelism the compiler can extract from the naive code. Fast-math should, at least in principle, allow the compiler to unroll the loop and add the additional accumulators, although that violates strict IEEE semantics.
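
As a rough illustration of what "additional accumulators" means, here is a sketch of a plain reduction next to a hand-unrolled one. It is a simplified sum, not the benchmark's kernel, and both function names are made up for this example.

```rust
// Naive reduction: every addition depends on the previous one, so the loop
// is limited by the latency of a single floating-point add.
fn sum_naive(xs: &[f32]) -> f32 {
    let mut acc = 0.0f32;
    for &x in xs {
        acc += x;
    }
    acc
}

// Hand-unrolled reduction with four independent accumulators. The additions
// are reordered, so the result can differ from `sum_naive` in the last bits;
// that reordering is what strict IEEE semantics stop the compiler from
// doing on its own.
fn sum_four_accumulators(xs: &[f32]) -> f32 {
    let mut acc = [0.0f32; 4];
    let chunks = xs.chunks_exact(4);
    let tail = chunks.remainder();
    for c in chunks {
        acc[0] += c[0];
        acc[1] += c[1];
        acc[2] += c[2];
        acc[3] += c[3];
    }
    // Combine the partial sums, then add the leftover elements.
    let mut total = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    for &x in tail {
        total += x;
    }
    total
}
```

The four partial sums have no dependency on each other, which gives the CPU independent add chains to overlap. Whether a given compiler generates this shape from the naive loop without a fast-math style flag is exactly the open question here.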

mratsim · on Nov 7, 2019

Yes exactly.

Usually you can extract 2x to 4x instruction level parallelism on simple add instructions [1] vs [2]

For fused multiply-add, even though the latency is higher, the instruction is also slower, so it's the same. BLIS, OpenBLAS and my code extract 2x parallelism (2 accumulators) at the lowest level because we are restricted by the number of registers and the fact that x86 can only issue 2 FMAs per cycle [3].

In detail: we divide the work until we have a small micro-matrix C of size MR x NR (input matrices of size MxK and KxN, so an output of MxN), and then here are the constraints you have to deal with:

Registers constraints and micro-kernel tuning

  - To issue 2xFMAs in parallel we need to use 2x SIMD registers
  - We want to hold C of size MR * NR completely in SIMD registers as well
    as each value is reused k times during accumulation C[i, j] += A[i, k] * B[k, j]
  - We should have enough SIMD registers left to hold
    the corresponding sections of A and B (at least 4, 2xA and 2xB for FMAs)

On x86-64 with X SIMD registers and 2 FMAs issued per cycle:

   - NbVecs is 2 minimum
   - RegsPerVec = 2 * NbVecs => 4 minimum (for A and for B)
   - NR = NbVecs * NbScalarsPerSIMD
   - C: MR*NR and uses MR*NbVecs SIMD registers 
   - MR*NbVecs + RegsPerVec <= X
      -> MR*NbVecs + 2 * NbVecs <= X
      -> (MR+2) * NbVecs <= X

Some solutions:

   - AVX with 16 registers:
         - MR = 6, NbVecs = 2
           FP32: 8xFP32 per SIMD --> NR = 2x8
                 ukernel = 6x16
           FP64, ukernel = 6x8
         - MR = 2, NbVecs = 4
           FP32: 8xFP32 per SIMD --> NR = 4x8
                 ukernel = 2x32
           FP64, ukernel = 2x16
   - AVX512 with 32 registers
         - MR = 6, NbVecs = 4
           FP32 ukernel = 6x64
           FP64 ukernel = 6x32
         - MR = 2, NbVecs = 8
           FP32 ukernel = 2x128
           FP64 ukernel = 2x64
         - MR = 14, NbVecs = 2
           FP32 ukernel = 14x32
           FP64 ukernel = 14x16
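
The solutions above can be reproduced with a few lines of code. This is only a sketch of the arithmetic: the function name is made up, and it assumes we only want shapes that use every register, i.e. (MR+2) * NbVecs == X rather than <= X.

```rust
// Enumerate micro-kernel shapes that use all `num_regs` SIMD registers:
// (MR + 2) * NbVecs == num_regs, with NbVecs >= 2 and MR >= 1.
// Returns (MR, NbVecs, NR) triples, where NR = NbVecs * scalars_per_simd.
fn ukernel_shapes(num_regs: usize, scalars_per_simd: usize) -> Vec<(usize, usize, usize)> {
    let mut shapes = Vec::new();
    for nb_vecs in 2..=num_regs {
        if num_regs % nb_vecs != 0 {
            continue; // registers would be left over
        }
        let rows_plus_two = num_regs / nb_vecs;
        if rows_plus_two < 3 {
            continue; // MR would be zero or negative
        }
        let mr = rows_plus_two - 2;
        shapes.push((mr, nb_vecs, nb_vecs * scalars_per_simd));
    }
    shapes
}
```

With 16 registers and 8 FP32 lanes this gives the 6x16 and 2x32 kernels; with 32 registers and 16 FP32 lanes it gives 14x32, 6x64 and 2x128, matching the list.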

An in-depth overview of the lowest level details is available in the paper Automating the last mile for High Performance Dense Linear Algebra [5].

In short, the compiler is completely unable to deal with this, and a high performance computing compiler should give an escape hatch to allow hand optimization, as Halide [6] or Tiramisu [7] do.

[1]: https://github.com/numforge/laser/blob/e660eeeb723426e80a7b1...

[2]: https://github.com/numforge/laser/blob/e660eeeb723426e80a7b1...

[3]: https://github.com/numforge/laser/blob/e660eeeb723426e80a7b1...

[4]: https://github.com/numforge/laser/blob/e660eeeb723426e80a7b1...

[5]: https://arxiv.org/pdf/1611.08035.pdf

[6]: https://halide-lang.org/

[7]: http://tiramisu-compiler.org/

fyp · on Nov 7, 2019

In case people missed the link, the reference implementation is an amazing post on its own: http://ppc.cs.aalto.fi/ch2/v7/

xyst · on Nov 7, 2019

So this is where my parallel computing professor got his material from. Definitely a good read for parallel computing, but that matrix multiplication problem was discussed ad nauseam in the first few weeks.

gatherhunterer · on Nov 7, 2019

For some reason many people are choosing to focus on code readability. This is not how a real-world program would be written. Who actually writes a single train-of-thought implementation without isolating concerns or using any code separation features? No realistic coder would expect their team to be satisfied working with code like this. It’s just a benchmark program, it doesn’t actually fit a use case or solve a problem.

jcelerier · on Nov 7, 2019

> This is not how a real-world program would be written.

I have seen many more programs that look exactly like this (at least for the C++ part) than programs with clean separation of concerns in my life.

> No realistic coder would expect their team to be satisfied working with code like this.

That assumes you've got a team (and that they are trained and not a bunch of interns or barely-graduated with 100% year-over-year turnover), and that you are a professional developer and not someone who codes in the context of another profession (researcher, artist, etc.; heck, I once saw a music teacher making small Python apps for the lobby of their music school).

larusso · on Nov 7, 2019

Very cool. I love the visual explanations for certain iterations, which make it easier to see how the preprocessing step prepares the data. What I always miss in such posts is a full explanation of the tooling used to retrieve the resulting assembly code. Don’t get me wrong, I know how to search the internet. But the post goes to quite some length to explain the basic Rust setup. Maybe the post is aimed at veteran C++ programmers. I certainly would appreciate a link or an example command showing how to generate the assembly listings :)

ChrisSD · on Nov 7, 2019

Out of interest, why was lto (link time optimisation) set to false? I doubt this would affect the results much but it's useful for cross-crate inlining.

OskarS · on Nov 7, 2019

Presumably because the thing they're benchmarking is in a single translation unit, so LTO wouldn't matter. And it might screw up the benchmark if the C++ code was optimized between the "test harness" translation unit and the "code to benchmark" translation unit, but Rust wasn't. It's sensible for this kind of benchmark not to use LTO.

fluffy87 · on Nov 7, 2019

Not that it matters but you can use LTO with C++ and Rust, just need to compile and link both with LTO enabled.

Hitton · on Nov 7, 2019

Awesome. I haven't started to learn Rust yet, but I still learned a lot.

The_rationalist · on Nov 7, 2019

Would have been nice to compare code ugliness and performance of OpenMP 5 vs rayon + SIMD.

sdan · on Nov 7, 2019

TLDR: Most of the time C++ with GCC was best.

galangalalgol · on Nov 7, 2019

Were you even looking at the same article? Rust was always better at instruction-level parallelism, and GCC was usually best at cache, depending greatly on which i5/Xeon was used. Clang was always the worst. My main takeaway was that all three are roughly identical and that small processor differences dominated the results.

gpderetta · on Nov 7, 2019

ILP is not the correct metric; total runtime is (I was confused as well initially, but it is well explained in the article). The initial Rust code had higher ILP simply because it has to execute more code for the bounds checks. The C++ code stalls waiting for memory, while Rust can fill up those bubbles with the checks (and that's why the checks do not cause a slowdown initially). But as the code gets progressively optimized, the stalls are removed and the C++ code can match the Rust ILP; at this point Rust needs to execute more code and gets penalized, leading to a higher runtime. After the bounds checks are removed from the Rust code, both implementations converge to the same ILP and runtime.
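
For readers who have not seen what the bounds checks look like in source form, here is a small sketch on a dot product. It is not the article's kernel, and the function names are made up; it only shows the three usual ways of writing the same loop.

```rust
// Indexed loop: each `a[i]` and `b[i]` carries a bounds check unless the
// optimizer can prove the index is always in range.
fn dot_indexed(a: &[f32], b: &[f32], n: usize) -> f32 {
    let mut acc = 0.0f32;
    for i in 0..n {
        acc += a[i] * b[i];
    }
    acc
}

// Same loop with the per-element checks removed by hand. The lengths are
// checked once up front so the unchecked accesses stay in range.
fn dot_unchecked(a: &[f32], b: &[f32], n: usize) -> f32 {
    assert!(n <= a.len() && n <= b.len());
    let mut acc = 0.0f32;
    for i in 0..n {
        // SAFETY: i < n, and n was checked against both lengths above.
        acc += unsafe { *a.get_unchecked(i) * *b.get_unchecked(i) };
    }
    acc
}

// Safe alternative: zipped iterators stop at the shorter slice, so there is
// no index to check on each element.
fn dot_zip(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

All three compute the same value; the difference is only in how much extra work sits in the inner loop, which is the effect described above.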

fluffy87 · on Nov 7, 2019

Yeah, there were differences, but these were peanuts. Using two different versions of GCC will give you similar differences.

gpderetta · on Nov 7, 2019

yes, you can see from the assembly listings in the final stages, that most of the differences were due to very minor code changes, due to details in the optimization passes and not really to language differences.

Someone · on Nov 7, 2019

"Clang was always the worst"

Were you even looking at the same article? :-)

I thought the same as you, until I looked at the results for the "Mid-range laptop CPU with 4 physical cores and 8 hardware threads."

There, clang does best in multi-core for v4 through v7. That made me think that, possibly, the default set of LLVM optimization transformation passes in clang is (better) optimised for Apple’s main audience: mobile.

Also: why are the fastest runs on the "4.3 GHz Mid-range desktop CPU" slower than the equivalent ones on the "3.4 GHz Mid-range laptop CPU"?

maeln · on Nov 7, 2019

Important to mention though: The difference was most of the time very slim.

foota · on Nov 7, 2019

Was it? Seemed to swap between rustc and gcc to me.

vihren · on Nov 7, 2019

It seems that when they squeeze all possible optimizations, GCC was a bit faster, but for the not so optimized code GCC and rustc seem very comparable.

tiborsaas · on Nov 7, 2019

My TLDR: don't base your choice on performance if you have to pick between Rust or C++

wscott · on Nov 7, 2019

I am surprised he didn't benchmark the C++ program with Clang to give a closer comparison to rust. In my experience, despite all its other advantages, Clang still lags GCC a bit in raw performance.

Still a really useful set of comparisons. I am impressed Rust is able to compete with all the magic OpenMP is doing in the background.

steveklabnik · on Nov 7, 2019

I haven’t read the whole thing yet, but clang seems to be benchmarked here: https://parallel-rust-cpp.github.io/results.html

wscott · on Nov 7, 2019

Wow, I totally misread that.