
This is really an example of a bigger problem with benchmarks. What's measured here is the difference between never using an object (so the actual work to prepare it can be elided, albeit by the OS rather than the programming language) and doing all the work up front in the expectation you'll use it, but then not using it.

u8 turns into "Hey, Linux kernel, zero these pages if I ever read them" (and the benchmark then never reads them), whereas the opaque type turns into a memset() that zeroes all the pages up front.
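
Concretely, the two cases being compared are roughly this (a sketch, not the article's code; MyByte is a stand-in name for its custom byte type):

    #[derive(Clone, Copy)]
    struct MyByte(u8);

    fn main() {
        // u8 is specialized: this becomes alloc_zeroed, i.e. ask the kernel for
        // zero pages that only get faulted in if you ever actually touch them.
        let _fast = vec![0u8; 1 << 30];

        // A custom type takes the generic clone path: allocate, then fill every
        // slot, which here compiles down to memset()ing the whole gigabyte now.
        let _slow = vec![MyByte(0); 1 << 30];
    }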

Not doing any work is, of course, a million times faster, but your real program would have needed to do the work; otherwise why bother having the vector? At that point the benefit disappears.

Where possible, design your benchmarks to really do the thing you think you're measuring. If what you're measuring is nothing, be sceptical about any supposed "performance" numbers for it: you're probably exploring the same space as the people who wanted to find out how much the human soul weighs (trick question, there is no such thing, but they put a lot of effort into trying to measure it anyway).




Does any of this matter?

It's just an example to show that zero cost abstractions are actually not zero cost at all.

The language would have you believe the abstraction is zero cost. If you were to really believe it, you would never think about whether you need to allocate a vector of u8 or your custom byte type. They should be one and the same. That's the point of the promise of zero cost abstractions.

By the way, pre-allocating a lot of virtual memory upfront is not unheard of.

In this situation, to avoid paying the cost of the abstraction, you would have to stop and think: "I can't allocate _my_ byte type! I must allocate u8, then cast the result to a vector of my custom byte type. Maybe the compiler won't like the cast, so I have to create an unsafe block and do some pointer casting?" (I don't know if that's what you would need to do in Rust, or if it would have been something else.)
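
Something like this, maybe (just a sketch, not vetted; MyByte is a placeholder for the custom byte type and zeroed is a made-up helper name):

    use std::mem::ManuallyDrop;

    #[repr(transparent)]
    #[derive(Clone, Copy)]
    struct MyByte(u8);

    fn zeroed(len: usize) -> Vec<MyByte> {
        // Allocate as u8 so we hit the specialized, lazily zeroed path...
        let mut v = ManuallyDrop::new(vec![0u8; len]);
        // ...then reinterpret the same buffer as the custom type.
        // SAFETY: MyByte is repr(transparent) over u8, so the pointer, length
        // and capacity also describe a valid Vec<MyByte>.
        unsafe { Vec::from_raw_parts(v.as_mut_ptr() as *mut MyByte, v.len(), v.capacity()) }
    }

    fn main() {
        let v = zeroed(1 << 20);
        assert_eq!(v.len(), 1 << 20);
    }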


> you would have to stop and think "I can't allocate _my_ byte type! I must allocate u8

No, this is premature optimization. Rather, you would write the code in the most obvious way, and then profile it to figure out which optimizations are worth doing. At that point you can comment your code explaining why it looks all wonky :)


And applying this thinking to everything means that your entire codebase is pervaded with slowness with no obvious hot spots, and you jump for joy since that obviously means your code is as efficient as it could possibly be (because otherwise the profiler would show some peaks, right?).


I know this was meant as (light) sarcasm, but have you ever profiled a nontrivial program that turned out to have zero peaks? My gut tells me this would be really difficult to do by accident, in the same way that writing crypto code that is resistant to timing attacks is hard.


I have definitely seen code where the peaks are very shallow, and flattening them does nothing for the performance.


You only need to encounter this problem a few times before you stop trusting the compiler at all.


This definitely brings it closer, but on my machine, touching the entire array still ends up being a couple of seconds slower (5.4s for the calloc side, 7.8s for the clone side).
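
Roughly what I'm timing (a sketch, not the article's exact benchmark; MyByte stands in for the custom byte type):

    use std::time::Instant;

    #[derive(Clone, Copy)]
    struct MyByte(u8);

    fn main() {
        const N: usize = 1 << 30; // 1 GiB

        let t = Instant::now();
        let mut a = vec![0u8; N];           // specialized: calloc / lazily zeroed pages
        a.iter_mut().for_each(|b| *b = 1);  // touch every page
        println!("calloc side: {:?} (last = {})", t.elapsed(), a[N - 1]);

        let t = Instant::now();
        let mut b = vec![MyByte(0); N];     // generic path: memset up front
        b.iter_mut().for_each(|x| x.0 = 1); // touch every page
        println!("clone side:  {:?} (last = {})", t.elapsed(), b[N - 1].0);
    }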


It's not using clone but memset. Of course memsetting memory to 0 when the operating system already set it to 0 is still silly and something that could be further optimized, but it's not cloning things.


It does clone: https://stdrs.dev/nightly/x86_64-pc-windows-gnu/src/alloc/ve...

It might be optimized to a memset, but it's the clone codepath, as opposed to the u8 specialization codepath discussed in the article.

The original commenter is saying the benchmark isn't useful because it doesn't touch memory, but you get similar results even if you do touch the memory.



