We should have developer benchmarks for programming languages such as "time to first successful compile", "time to first running copy", "time to find the root cause of a certain bug". Similar to that Sun vs NeXT competition back in the day: https://www.youtube.com/watch?v=UGhfB-NICzg
Maybe platforms like Leetcode and HackerRank can publish such statistics.
I'd find such benchmarks way more interesting than comparisons of milliseconds for solutions to some arbitrary algorithms that we'll never use in real life. I'm not saying they aren't useful, but they aren't interesting.
Live competition. Benchmarks would just be metrics recorded during the live broadcast, not goals. You can later aggregate those metrics to come up with accurate benchmarks for many software development scenarios. Think of NBA but for programming.
Just posting numbers without commentary doesn't seem that useful. Per the GitHub "Note that implementations might be using different optimizations, e.g. with or w/o multithreading, please do read the source code to check if it's a fair comparision or not."
I think this shows that Zig isn't a slow language despite its relative youth, but it'd be much more useful if someone did the work to look through the code and provide commentary on how comparable the entries are.
The optimizations both looked correct. Both told the compiler to target broadwell. The fastest nbody was rust, but it was non-portably using x86 intrinsics. Zig has explicit simd vectors in the stdlib and so did better than the portable explicit simd of the third place rust entry. However, zig is using optimized float mode equivalent to gcc -ffast-math so it is almost certainly getting the wrong answers since it didn't use the iterative sqrt trick.
https://github.com/hanabi1224/Programming-Language-Benchmark...
Is zig's optimized float mode also extremely error-prone like gcc's -ffast-math?
Reminder: With gcc/clang, -ffast-math makes it undefined behavior to run a calculation that results in an infinity or NaN. Due to the way UB works, the compiler can end up miscompiling not just the floating-point calculation, but also other code nearby (e.g. delete array bounds checks).
This is why Rust does not have any fastmath-equivalent: it would allow violating memory safety in safe code.
Seems like there should just be a fastmath mode that doesn't consider it UB. IIUC most of the gains come from being able to assume addition, multiplication etc are associative and commutative.
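To make the associativity point concrete, here is a minimal Rust sketch (the function names are mine, purely for illustration). IEEE 754 addition is not associative, so a compiler that is allowed to regroup a sum can change the result even when no infinity or NaN is involved:

```rust
/// Sums as (a + b) + c.
pub fn sum_left(a: f64, b: f64, c: f64) -> f64 {
    (a + b) + c
}

/// Sums as a + (b + c). Mathematically identical, numerically not.
pub fn sum_right(a: f64, b: f64, c: f64) -> f64 {
    // With a = 1e17, b = -1e17, c = 1.0 the inner sum rounds back to -1e17,
    // because 1.0 is smaller than half the spacing between f64 values there.
    a + (b + c)
}
```

With the inputs 1e17, -1e17 and 1.0, the first grouping keeps the 1.0 and the second one loses it. Without fast-math the compiler has to preserve whichever grouping you wrote; with it, it may pick either.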
> I think this shows that Zig isn't a slow language despite its relative youth
Which shouldn't surprise anyone since the heavy lifting is done by LLVM and zig doesn't attempt to add too much semantics on top of it. (For the same reason I don't think benchmarking unsafe Rust makes much sense either)
It makes sense to benchmark unsafe Rust against safe Rust, both for the same algorithm without changes as well as alternative architectures to get around the constraints of the borrow checker and type system. I would go as far as saying that if you want to use unsafe this should be a precondition. It happens too often that someone reaches for unsafe first "for performance wins" that then aren't manifested in practice.
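A rough sketch of what "same algorithm, safe vs unsafe" could look like in practice. Everything here (function names, the tiny timing helper) is made up for illustration, and a real comparison would want a proper benchmarking harness and many runs:

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Safe version: each `xs[i]` is bounds-checked unless the optimizer proves it redundant.
pub fn sum_indexed(xs: &[u64]) -> u64 {
    let mut total = 0u64;
    for i in 0..xs.len() {
        total = total.wrapping_add(xs[i]);
    }
    total
}

/// Unsafe version of the exact same loop, skipping the bounds check.
pub fn sum_unchecked(xs: &[u64]) -> u64 {
    let mut total = 0u64;
    for i in 0..xs.len() {
        // SAFETY: `i` comes from `0..xs.len()`, so it is always in bounds.
        total = total.wrapping_add(unsafe { *xs.get_unchecked(i) });
    }
    total
}

/// Calls `f` on `xs` `runs` times; returns the last result and the total elapsed time.
pub fn time_it(f: fn(&[u64]) -> u64, xs: &[u64], runs: u32) -> (u64, Duration) {
    let start = Instant::now();
    let mut last = 0;
    for _ in 0..runs {
        // black_box discourages the optimizer from hoisting or deleting the call.
        last = black_box(f(black_box(xs)));
    }
    (last, start.elapsed())
}
```

The idea would be to call `time_it` with both functions on the same input in a release build and only keep the unsafe one if the difference is measurable and matters.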
The compiler might make unfortunate decisions which no optimization can fix. E.g. the amount of pointer chasing may affect things a lot. Mutability and aliasing influence drastically which optimizations LLVM would be allowed to make, also stack vs heap allocation for short-lived objects, etc.
I think a more interesting comparison is Zig vs Nim, where Nim beats or at the very least matches Zig's performance in multiple tests while having an infinitely nicer, friendlier syntax.
I think Nim looks good, but I'm more likely to learn Rust or Zig personally. I've tried to put my finger on why that is, and I think it's because Nim looks like another good language that is just good all around; it makes good trade-offs and finds a local optimum by being good at everything. A lot of languages have tried this though; I've seen it all before. Rust did something new: they accepted that "yeah, our language might be really painful to write your seat-of-the-pants business logic in" and they did the borrow checker stuff. Zig is similar, keeping many of the pains of C, but making some important improvements.
I feel the same way. I wish Nim was more popular, but I'm not letting that stop me from using it. It's already more than useful enough. I spent the last few weeks writing Nim code (specifically an OpenCV binding that wraps just the C++ libraries I need). Nim lets me write fast, compiled, "Pythonic" code. And the INim interactive shell lets me program "1 line at a time" the same way I would with the Python REPL or JavaScript console.
That Pythonic syntax is something that prevents Nim from gaining more popularity. I think it would be a great move on Nim's side to introduce dual syntax support: Python-like and C/D/Java/JavaScript/TypeScript/Go/Rust/Zig/...-like.
You can often/usually put ()s around things if you want. It's just an uncommon style. (Much like you can format bracey code like lisp with the close brackets bunched up all at the end of one line instead of one per line, but people rarely do.)
Afaik, nim was influenced by and originally developed in D-lang. Both are wonderful languages and a real joy to develop on. My ultimate dream is a language that's low level like zig but has the flexibility of nim.
I imagine leetcode has a lot of data on benchmarks of highly optimized solutions implemented in different languages. I wish they would do a blog post or something.
Seems kind of pointless, since the implementations for different languages are independently written and thus the results depend mostly on the skill of the programmers who wrote them (at least for languages that are efficient), and on the quality of the specific standard library features being used if any.
Not entirely pointless since this is open source and programmers of some caliber may be able to contribute better or more efficient code. All benchmarks should be taken with a huge grain of salt since this is rarely what these languages are used for by the majority. You also shouldn’t be choosing a language solely based on speed rather than on how well it fits the individual application. Memory usage though is something useful and impressive in these examples.
It's not a good idea to use this site (or the Computer Language Benchmarks Game https://benchmarksgame-team.pages.debian.net/benchmarksgame/ that it is partially based on) as an indicator of how many rewrites are necessary in order to generate good programs. The skill levels of the contributors and the size/popularity of the various programming language communities can vary a lot. The benchmark rules have been changing over time and contributors have been figuring out better algorithms over time so these both result in the contributed programs getting updated over time as well. Older programming languages have been around longer than newer programming languages and will have had more contributed programs consequently. Some programming languages are under more active development so require more revamps of existing programs. Etc...
You could certainly make an argument that an easier language allows a programmer to reach a certain skill level faster or maybe even reach a higher peak skill level. By extension you could also argue that, by some definitions, the skill level of that language's community might be higher. However assuming you had equally skilled Rust and Zig programmers, I think it is wrong to say that the Rust programmers would require more rewrites to match or surpass the Zig programmers.
Additionally, there are multiple reasons why one programming language might have nine different programs:
-The programs could have been written by someone who prefers to make small iterations and perform many submissions instead of someone who likes to make bigger changes with fewer submissions.
-The programs could be submitted by a novice who needs to make more submissions than would be required by a pro.
-The programming language may be older and has had more submissions.
-The programming language may be more actively changing and requires more updates in order to fix old programs.
-The programmers may also be making more submissions to improve other characteristics like memory usage, code size, code readability, compatibility, etc... that are unrelated to performance.
In short, previously the benchmarks game explicitly stated that the program numbers were arbitrary and signified nothing; and the same reasoning applies to hanabi1224's website.
Tldr; they’re about even. Which is what I’d expect given they’re both performance oriented languages which compile via llvm.
As I see it, performance wise Rust has one advantage and one disadvantage compared to zig. Rust’s advantage is that it can add the equivalent of C’s noalias all over the place because of the rules imposed by the borrow checker. This can help the optimizer. And the drawback of rust is that all array accesses are bounds checked. (Well, at least in safe rust). But thanks to prediction intrinsics, the slow down from this is much less than I always expect. Bounds checks do bloat the binary size though.
So rust and zig trading blows benchmark to benchmark is about what I would expect to happen. And that’s exactly what I’m seeing here.
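For anyone wondering what the noalias advantage looks like at the source level, here is a small sketch (the function is hypothetical). In C, two pointer parameters of compatible types may point at the same object unless you write `restrict`, so the compiler has to reload through the second pointer after every store through the first. In Rust the borrow rules already exclude that case:

```rust
/// `dst` is an exclusive borrow and `src` a shared one, so safe code can
/// never call this with both referring to the same `i32`.
pub fn add_twice(dst: &mut i32, src: &i32) {
    // Because writing through `dst` cannot change `*src`, the optimizer is
    // free to load `*src` once and reuse it for both additions.
    *dst += *src;
    *dst += *src;
}
```

Trying `add_twice(&mut a, &a)` is rejected at compile time. Whether the compiler actually passes this guarantee on to LLVM as noalias depends on the compiler version.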
> it can add the equivalent of C’s noalias all over the place
There's an open proposal to do this in Zig as well, with the ability to opt out at the individual parameter level (and with safety checks in debug builds).
Either way we can definitely thank Rust for blazing the trail. noalias in LLVM had never been stress-tested to that degree, and they were finding and fixing noalias-related optimizer bugs for years.
People keep being surprised that bounds-checking doesn’t really seem to incur that much cost but frankly it seems pretty straightforward to me.
In the years of Rust code I’ve written, I don’t think I’ve ever actually indexed into an array manually. If I have it’s been an incredibly small number of cases. I’m almost always iterating, which makes bounds checks essentially unnecessary.
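A small sketch of the difference, with made-up function names. The iterator version never forms an index, so there is no per-element check to pay for or to optimize away; the indexed version has one on `b[i]` that can actually fire:

```rust
/// Iterator version: `zip` stops at the shorter slice, no indexing involved.
pub fn dot_iter(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

/// Indexed version: `b[i]` panics if `b` is shorter than `a`.
pub fn dot_indexed(a: &[f64], b: &[f64]) -> f64 {
    let mut total = 0.0;
    for i in 0..a.len() {
        total += a[i] * b[i];
    }
    total
}
```

Note the two are not strictly equivalent on mismatched lengths: the first silently truncates, the second panics.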
As usual, I recall the remarks of C.A.R. Hoare in his 1980 Turing Award lecture regarding bounds checking in Algol compilers, his customers' point of view, and how it should be unlawful to do otherwise.
Never, ever, since 1986, has bounds checking been the major source of performance issues in applications I have written.
I suspect it's more that bounds checking actually helps performance in many circumstances in that it can improve branch prediction. Not always, but sometimes.
Inserting branches into your code can only make it slower or equally fast, not faster than branchless code, so what you say doesn't make a lot of sense.
Not necessarily. If the branch is predictable, it may be better than branchless, particularly if the alternative is a cmov. From Agner:
> As a rule of thumb, we can say that a conditional jump is faster than a conditional move if the code is part of a dependency chain and the prediction rate is better than 75%. A conditional jump is also preferred if we can avoid a lengthy calculation [...] when the other operand is chosen
This sounds purely theoretical. And even if the compiler were able to coalesce multiple branches into fewer, such branchy code still cannot be faster than the branchless version.
Sadly, not yet. Even when (Safe) Rust's type system guarantees the noalias annotations will be inserted correctly, the Rust compiler team has been very cautious about turning this on because of numerous LLVM miscompilation issues. It was turned on two years ago in https://github.com/rust-lang/rust/pull/82834 but was subsequently rolled back in https://github.com/rust-lang/rust/pull/86036 because of a miscompilation bug.
Some restrictions aid performance, but making sure that pointers have not run wild like elephants in a porcelain shop requires extra checks, which take time.
Only if the restrictions are static and don't need to be checked dynamically, naturally. For dynamic arrays or dynamic indices it's not always possible to do that in Rust.
I see bounds checks as an advantage. I don't want random stuff from my program tbh. And if someone really wants to disable bounds checks for certain stuff, they can always do that.
I'm divided. The majority of the time I know how big the array is from information the compiler doesn't have. However, I have to admit that 10% of the time I'm off by one.
Bounds checks can sometimes be optimized out, such as when using iterators or a for loop over a fixed range. LLVM is pretty decent at optimizing out redundant checks too.
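One idiom I've seen for helping the optimizer along when you do need indices is to re-slice up front, so the length is established once before the loop. A minimal sketch (hypothetical function; whether the per-iteration checks really disappear depends on the compiler version and optimization level, so check the generated code if it matters):

```rust
/// Adds the first `n` elements of `src` into `dst`, element-wise.
pub fn add_prefix(dst: &mut [u32], src: &[u32], n: usize) {
    // One check per slice here; this panics if `n` exceeds either length.
    let dst = &mut dst[..n];
    let src = &src[..n];
    // Both slices now have length exactly `n`, matching the loop range.
    for i in 0..n {
        dst[i] = dst[i].wrapping_add(src[i]);
    }
}
```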
You can noalias in C as well, but it is used so infrequently in production C code that llvm didn’t even compile it correctly until a couple years ago. Figuring out when parameters in a C program can safely be declared noalias is very tricky, and I’ve almost never seen anyone bother with it. I assume the same is true in zig.
Nbody looked about the same; zig had really long lines for stuff that rustfmt split. But that aside, I agree the zig metaprogramming really shone here.
I think that these kinds of benchmarks should always be taken with a grain of salt, as anyone with some experience with benchmarks knows that there are many ways to distort the results.
Having said that, performance is a key feature for both languages and from what I can see the methodology seems legit.
I'd prefer to also have a C or C++ implementation to use as a performance baseline and to get a clearer overall picture of where the current optimizers stand relative to each other.
Also, competition is good, I believe that we'll all benefit from it in the end, even if the heated debates and quasi-religious stances can be annoying, in the long run this is mostly noise.
Slightly on a tangent here, but I think someone with more knowledge of either the Zig implementation/language or programming languages and runtimes in general should take a second pass at Zig's Wikipedia page.
I see a lot of assertions which don't seem substantiated by the provided references; some of the references look almost unrelated, and some stuff also looks wrong (I am not a Zig expert).
> The goals of Zig are in contrast to the many similar languages introduced in the 2020s time-frame, like Go, Rust, Carbon, Nim and many others. Generally, these languages are more complex with additional features like operator overloading, functions that masquerade as values (properties), generic types and many other features intended to aid the construction of large programs. These sorts of features have more in common with C++'s approach, and these languages are more along the lines of that language.
> reference (https://www.infoworld.com/article/3113083/new-challenger-joi...)
The provided reference doesn't really mention this statement, and C++ did not originate a lot of those features.
On a more abstract level, not having abstraction features doesn't always make a language simpler.
> A common solution to these problems is a garbage collector (GC), which examines the program for pointers to previously malloced memory, and removing any blocks that no longer have anything pointing to them. Although this greatly reduces, or even eliminates, memory errors, GC systems are relatively slow compared to manual memory management, and have unpredictable performance that makes them unsuited to systems programming.
I am not sure how authoritative the given reference is, but even assuming it is, there is no mention of manual vs GC speed, or anything about how GC systems are inadequate for systems programming (whatever that means in this context). I know the trade-offs between memory management styles are still hotly debated, and a lot of progress has been made with regard to modern implementations.
> Another solution is automatic reference counting (ARC), which implements the same basic concept of looking for pointers to removed memory, but does so at malloc time by recording the number of pointers to that block, meaning there does not need to perform an exhaustive search, but instead adds time to every malloc and release operation.
This doesn't sound like the best way to describe ARC... Again, the provided references don't substantiate this definition.
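For comparison, here is how reference counting behaves in a language where you can observe the count directly. This uses Rust's `Rc` as a stand-in (it is a library type, not the compiler-inserted ARC of Swift or Objective-C, so treat it as an analogy). The count changes when handles are copied and dropped, not at allocation time, and nothing ever goes "looking for pointers":

```rust
use std::rc::Rc;

/// Returns the strong count at three points: after creation, after cloning
/// a second handle, and after dropping that second handle.
pub fn observe_counts() -> (usize, usize, usize) {
    let first = Rc::new(String::from("shared"));
    let at_creation = Rc::strong_count(&first);

    // Cloning the handle increments the count; the String is not copied.
    let second = Rc::clone(&first);
    let after_clone = Rc::strong_count(&first);

    // Dropping a handle decrements the count; the String is freed only
    // when the count reaches zero.
    drop(second);
    let after_drop = Rc::strong_count(&first);

    (at_creation, after_clone, after_drop)
}
```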
> Zig aims to provide performance similar or better than C, so GC and ARC are not suitable solutions. Instead, it uses a modern, as of 2022, concept known as optional types, or smart pointers.
I don't think smart pointers and optional types are the same thing. From a quick glance at the official Zig website, this looks like a mistake.
> Instead of a pointer being allowed to point to nothing, or nil, a separate type is used to indicate data that is optionally empty. This is similar to using a structure with a pointer and a boolean that indicates whether the pointer is valid, but the state of the boolean is invisibly managed by the language and does not need to be explicitly managed by the programmer. So, for instance, when the pointer is declared it is set to "unallocated", and when that pointer receives a value from a malloc, it is set to "allocated" if the malloc succeeded.
This mixes up optional types with the way Zig uses them to represent references.
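To illustrate why "optional type" and "smart pointer" are different things, here is a sketch using Rust's `Option`, which plays roughly the same role as Zig's `?T` (the function names are mine). An optional reference is about making the empty case explicit in the type; it has nothing to do with allocation state or ownership, and it needs no hidden boolean:

```rust
use std::mem::size_of;

/// An optional reference: either `Some(&value)` or `None`.
pub fn first_even(xs: &[i32]) -> Option<&i32> {
    xs.iter().find(|x| **x % 2 == 0)
}

/// The caller has to handle the empty case before it can read the value.
pub fn describe(found: Option<&i32>) -> String {
    match found {
        Some(x) => format!("found {x}"),
        None => String::from("nothing"),
    }
}

/// `Option<&T>` is guaranteed to be pointer-sized: `None` reuses the null
/// bit pattern instead of storing a separate flag.
pub fn option_ref_is_pointer_sized() -> bool {
    size_of::<Option<&i32>>() == size_of::<&i32>()
}
```

As far as I know Zig gives the same size guarantee for optional pointers, but check the language reference rather than taking my word for it.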
Could you please stop posting unsubstantive comments and/or flamebait? You've unfortunately been doing it repeatedly. It's not what this site is for, and destroys what it is for.
It's naive to think that someone will actually make proper use of custom allocators in Zig. Low-level, performance-oriented programming is hard, especially when you are doing more exploratory work without clear goals up front.
And when you do know what you want, explicitly optimizing those parts is equally possible in the two languages.
Would you mind elaborating on this? I don't think I've read anything (yet?) that touches on performance differences due to that particular aspect of the languages.
The only thing I can think of is that Zig makes it easier to use different allocators on a per-collection basis, whereas for Rust the only way to alter the allocator used by the standard containers is to change the global allocator. But Rust programs aren't allocating that much in the first place, certainly not so much that swapping out an allocator on a per-collection basis is going to triple the performance of arbitrary programs; at best, fine-grained allocator control will give modest gains on certain extremely allocation-heavy workloads. And collections in Rust can certainly use custom allocators if they want to, e.g. https://crates.io/crates/bumpalo , so that's not something inherent to the language.
> to use different allocators on a per-collection basis
It's far more than that. There are different memory management optimization patterns that are hard to implement in Rust, e.g. splitting allocations into different arenas based on allocation lifetime.
> But Rust programs aren't allocating that much in the first place
Are Rust programs allocating mostly from the stack then?
> is going to triple the performance of arbitrary programs
The difference between malloc and any hand-rolled O(1) allocator is enormous, it's just people rarely benchmark this particular aspect.
There is a reason why Rust used to ship jemalloc and why #[global_allocator] is a thing. I would argue there's even more reason to avoid touching heap in the first place.
> Are Rust programs allocating mostly from the stack then?
Yes, overwhelmingly so.
The basic rule of manual optimization is that the upper bound for the improvement that can be gained by optimizing any one "thing" is equal to the proportion of the program's runtime that is devoted to that thing. E.g. if a flamegraph shows that 10% of a program's runtime is devoted to a specific function, then improving the performance of that function is going to improve the performance of that program by at most 10%. (We might call this "generalized Amdahl's law": https://en.wikipedia.org/wiki/Amdahl%27s_law )
Therefore, in order to improve the performance of a program by 2-3x by changing allocators, that requires that a program be spending at least 50% to 66% of its runtime on allocation. And it would be a highly anomalous Rust program that was allocator-bound in this way.
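The arithmetic behind those numbers, as a small sketch (function names are mine). If a fraction p of the runtime is made s times faster, the overall speedup is 1 / ((1 - p) + p / s), and letting s go to infinity gives the ceiling 1 / (1 - p):

```rust
/// Overall speedup when a fraction `p` of the runtime becomes `s` times faster.
pub fn overall_speedup(p: f64, s: f64) -> f64 {
    1.0 / ((1.0 - p) + p / s)
}

/// Smallest fraction of runtime a component must occupy for the overall
/// speedup to reach `target`, even if that component became free.
/// Derived by solving 1 / (1 - p) = target for p.
pub fn min_fraction_for(target: f64) -> f64 {
    1.0 - 1.0 / target
}
```

So a 2x overall gain needs the optimized part to be at least 1/2 of the runtime, and 3x needs at least 2/3. For the 10% function, the best case is removing 10% of the runtime, which is a speedup of 1/0.9, about 1.11x.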
How do you measure cache line utilization with a flame graph? I'm not sure it's as simple as you're describing. Poor memory access patterns from heap allocations can kill performance via death from a thousand cuts. It's not about spending "X% of the time on allocation," it's about every computation spending unnecessary time reading and writing to the cache due to poor choice of allocation strategy. Custom allocators can ensure data layout fits the access pattern.
More specifically, arrays are about the fastest thing around, and handing out objects from within a preallocated array tends to give the best access performance by a large margin. From what I understand you basically have to sidestep the Rust borrow checker to achieve this. Which does raise some interesting questions.
I think the original poster was also saying that heap allocations are slow, which is true, but I agree it'd be easier to tell if your program is having trouble with that.
I'm unclear how cache line utilization is relevant here. In both Zig and Rust, the choice of allocator has no effect on the layout of data in a given collection. In other words, this Zig code:
    var foo_list = ArrayList(u8).init(foo_allocator);
    try foo_list.append(1);
    try foo_list.append(2);
    try foo_list.append(3);
...and this Rust code:
    let mut foo_list = Vec::new();
    foo_list.push(1);
    foo_list.push(2);
    foo_list.push(3);
...are both going to produce an array of [1,2,3] stored in the heap. The choice of allocator only affects where that array itself ends up being stored. Fetching an element of foo_list is going to cache the other elements in the list regardless of its location in memory, so the choice of allocator doesn't matter for that purpose. The only thing that could make a difference is if you want multiple different collections to be fetched in the same cache line, but if your collections are so small that multiple collections fit in the same cache line then your data isn't big enough to be worrying about optimizing your cache utilization (and furthermore, even if you carefully lay out your collections in such a way, as soon as any of your growable arrays need to be reallocated that completely throws all of your careful organization out the window).
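If anyone wants to check the contiguity claim for themselves, a minimal sketch on the Rust side (the helper name is made up). It reports how far each element sits from the start of the buffer, whichever allocator produced that buffer:

```rust
/// Byte offset of each element from the start of the slice's buffer.
pub fn element_offsets(v: &[u32]) -> Vec<usize> {
    let base = v.as_ptr() as usize;
    v.iter()
        .map(|x| x as *const u32 as usize - base)
        .collect()
}
```

For a three-element `Vec<u32>` this yields offsets 0, 4 and 8: the elements are packed back to back, so the allocator decides where the block lives and not how it is laid out internally.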
> More specifically, arrays are about the fastest thing around, and handing out objects from within a preallocated array tends to give the best access performance by a large margin. From what I understand you basically have to sidestep the Rust borrow checker to achieve this.
Where did you get this impression? Rust has stack allocated arrays, and you can hand out references to them just as well as you can to any other owned type. I can think of no distinction that the borrow checker makes between stack- and heap-allocated types.
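A sketch of what I mean, with a hypothetical `Particle` type. The pool is a plain array on the stack, and references into it are handed out without any unsafe code. The one place the borrow checker does push back is taking two mutable references into the same array by index, which is what `split_at_mut` is for:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Particle {
    pub x: f32,
    pub v: f32,
}

/// Hands out two disjoint mutable references into the same pool.
pub fn pair_mut(pool: &mut [Particle], i: usize, j: usize) -> (&mut Particle, &mut Particle) {
    assert!(i < j && j < pool.len());
    // Splitting proves to the compiler that the two halves do not overlap.
    let (left, right) = pool.split_at_mut(j);
    (&mut left[i], &mut right[0])
}

pub fn demo() -> [Particle; 4] {
    // Preallocated on the stack; no heap allocation involved.
    let mut pool = [Particle { x: 0.0, v: 1.0 }; 4];
    let (a, b) = pair_mut(&mut pool, 0, 3);
    a.x += a.v;
    b.x += 2.0 * b.v;
    pool
}
```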
While I agree with you, my point was specifically the cost of a malloc. As someone who has actually profiled a pool vs malloc, this is kind of a salient point for me.
In some workloads short-lived allocations are common. Like memory is allocated for milliseconds and then deallocated - and X% of time will be spent just inside the allocator trying to find the best block to allocate. At this point you want to make sure that you don't spend more time in the allocator than allocations actually live.
I wonder how common business tasks can be coerced to this model.
Also I believe this is not a good excuse - the language should still provide a way to properly manage allocations even if it is not needed most of the time.
> The basic rule of manual optimization is that the upper bound for the improvement that can be gained by optimizing any one "thing" is equal to the proportion of the program's runtime that is devoted to that thing.
Yep, and this is why you should always measure things before optimizing. But then again, you can save yourself a lot of time by not doing stupid things like allocating from heap in a hot loop and profiling just to discover such basic mistakes.
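The "allocating in a hot loop" mistake and its usual fix, as a deliberately contrived sketch (the functions are hypothetical and the scratch buffer is not strictly needed for this computation). The second version allocates the scratch space once and reuses it, relying on `Vec::clear` keeping the capacity:

```rust
/// Allocates a fresh scratch Vec on every iteration.
pub fn squares_sum_fresh(batches: &[&[u32]]) -> Vec<u64> {
    let mut out: Vec<u64> = Vec::with_capacity(batches.len());
    for batch in batches {
        let scratch: Vec<u64> = batch.iter().map(|&x| u64::from(x) * u64::from(x)).collect();
        out.push(scratch.iter().sum::<u64>());
    }
    out
}

/// Reuses one scratch Vec across iterations.
pub fn squares_sum_reused(batches: &[&[u32]]) -> Vec<u64> {
    let mut out: Vec<u64> = Vec::with_capacity(batches.len());
    let mut scratch: Vec<u64> = Vec::new(); // created once, outside the loop
    for batch in batches {
        scratch.clear(); // drops the elements but keeps the allocation
        scratch.extend(batch.iter().map(|&x| u64::from(x) * u64::from(x)));
        out.push(scratch.iter().sum::<u64>());
    }
    out
}
```

Both return the same results; the second only touches the allocator when a batch is larger than any seen before.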
All references in Rust must have an explicit owner. It's common to create additional structures whose entire purpose is to manage the lifetime of your actual data structures. This means more memory consumption and potentially extra indirection, e.g. slot map handles vs pointers.
You don't need heap allocations in order to have an explicit owner. Data on the stack is owned, and references to stack-allocated data can be handed out just fine.
Sure, but you can't always use the stack. It has a limited amount of memory and doesn't handle dynamic data structures well, e.g. for a growable undirected graph you need auxiliary data structures for managing its nodes.
Long story short, heap allocation is painfully slow. Any sort of malloc will always be slower than a custom pool or a bump allocator, because it has a lot more context to deal with.
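For readers who haven't seen one, this is roughly all a bump allocator does. It is a toy sketch in safe Rust that hands out offsets into one preallocated buffer (the type and method names are made up, and alignment is relative to the buffer start rather than to real addresses). Allocation is a round-up plus an add, and everything is freed at once by resetting one integer, which is where the speed comes from. No timings are claimed here:

```rust
/// A toy bump arena over a single preallocated byte buffer.
pub struct Bump {
    buf: Vec<u8>,
    next: usize,
}

impl Bump {
    pub fn with_capacity(bytes: usize) -> Self {
        Bump { buf: vec![0; bytes], next: 0 }
    }

    /// Reserves `size` bytes at an offset that is a multiple of `align`
    /// (which must be a power of two). Returns the offset, or None if full.
    pub fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        assert!(align.is_power_of_two());
        // Round `next` up to the requested alignment.
        let start = self.next.checked_add(align - 1)? & !(align - 1);
        let end = start.checked_add(size)?;
        if end > self.buf.len() {
            return None;
        }
        self.next = end;
        Some(start)
    }

    /// Frees every allocation at once.
    pub fn reset(&mut self) {
        self.next = 0;
    }

    /// Number of bytes consumed so far, including alignment padding.
    pub fn used(&self) -> usize {
        self.next
    }
}
```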
Rust makes it especially hard to use custom allocators, see bumpalo for example [0]. To be fair, progress is being made in this area [1].
Theoretically one can use a "handle table" as a replacement for pools, you can find relevant discussion at [2].
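A minimal sketch of the handle idea applied to the growable undirected graph mentioned above (the types are hypothetical, and there is no node removal or generation counter, which a real slot map would add). Nodes live in one table owned by the graph, and edges store indices instead of pointers, so nothing outside the graph has to own or borrow a node:

```rust
/// A handle is just an index into the graph's node table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NodeId(usize);

#[derive(Default)]
pub struct Graph {
    /// adjacency[i] lists the neighbours of node i.
    adjacency: Vec<Vec<NodeId>>,
}

impl Graph {
    pub fn add_node(&mut self) -> NodeId {
        self.adjacency.push(Vec::new());
        NodeId(self.adjacency.len() - 1)
    }

    /// Undirected edge: recorded on both endpoints.
    pub fn add_edge(&mut self, a: NodeId, b: NodeId) {
        self.adjacency[a.0].push(b);
        self.adjacency[b.0].push(a);
    }

    pub fn neighbours(&self, n: NodeId) -> &[NodeId] {
        &self.adjacency[n.0]
    }
}
```

The trade-off is the one already described: every access goes through the table, which is an extra indirection and a bounds check compared to following a raw pointer.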