
This is really an example of a bigger problem with benchmarks. What's measured here is the difference between never using an object (so the actual work to prepare it can be elided, albeit by the OS rather than the programming language) and doing all the work up front in the expectation you'll use it, but then not using it.

u8 turns into "Hey, Linux kernel, zero these pages if I ever read them" (and the benchmark then never reads them), whereas the opaque type turns into a memset() that zeroes all the pages up front.
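
Concretely, the two cases being compared are roughly this (a sketch, not the article's code; MyByte is a stand-in name for its custom byte type):

    #[derive(Clone, Copy)]
    struct MyByte(u8);

    fn main() {
        // u8 is specialized: this becomes alloc_zeroed, i.e. ask the kernel for
        // zero pages that only get faulted in if you ever actually touch them.
        let _fast = vec![0u8; 1 << 30];

        // A custom type takes the generic clone path: allocate, then fill every
        // slot, which here compiles down to memset()ing the whole gigabyte now.
        let _slow = vec![MyByte(0); 1 << 30];
    }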

Not doing any work is, of course, a million times faster, but your real program would have needed to do the work; otherwise why bother having the vector? At that point the benefit disappears.

Where possible, design your benchmarks to really do the thing you think you're measuring. If what you're measuring is nothing, be sceptical about any supposed "performance" numbers for it: you're probably exploring the same space as the people who wanted to find out how much the human soul weighs (trick question, there is no such thing, but they put a lot of effort into trying to measure it anyway).




Does any of this matter?

It's just an example to show that zero cost abstractions are actually not zero cost at all.

The language would have you believe the abstraction is zero cost. If you were to really believe it, you would never think about whether you need to allocate a vector of u8 or your custom byte type. They should be one and the same. That's the point of the promise of zero cost abstractions.

By the way, pre-allocating a lot of virtual memory upfront is not unheard of.

In this situation, to avoid paying the cost of the abstraction, you would have to stop and think: "I can't allocate _my_ byte type! I must allocate u8, then cast the result to a vector of my custom byte type. Maybe the compiler won't like the cast, so I have to create an unsafe block and do some pointer casting?" (I don't know if that's what you would need to do in Rust, or if it would have been something else.)
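
Something like this, maybe (just a sketch, not vetted; MyByte is a placeholder for the custom byte type and zeroed is a made-up helper name):

    use std::mem::ManuallyDrop;

    #[repr(transparent)]
    #[derive(Clone, Copy)]
    struct MyByte(u8);

    fn zeroed(len: usize) -> Vec<MyByte> {
        // Allocate as u8 so we hit the specialized, lazily zeroed path...
        let mut v = ManuallyDrop::new(vec![0u8; len]);
        // ...then reinterpret the same buffer as the custom type.
        // SAFETY: MyByte is repr(transparent) over u8, so the pointer, length
        // and capacity also describe a valid Vec<MyByte>.
        unsafe { Vec::from_raw_parts(v.as_mut_ptr() as *mut MyByte, v.len(), v.capacity()) }
    }

    fn main() {
        let v = zeroed(1 << 20);
        assert_eq!(v.len(), 1 << 20);
    }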


> you would have to stop and think "I can't allocate _my_ byte type! I must allocate u8

No, this is premature optimization. Rather, you would write the code in the most obvious way, and then profile it to figure out which optimizations are worth doing. At that point you can comment your code explaining why it looks all wonky :)


And applying this thinking to everything means that your entire codebase is pervaded with slowness with no obvious hot spots, and you jump for joy since that obviously means your code is as efficient as it could possibly be (because otherwise the profiler would show some peaks, right?).


I know this was meant as (light) sarcasm, but have you ever profiled a nontrivial program that turned out to have zero peaks? My gut tells me this would be really difficult to do by accident, in the same way that writing crypto code that is resistant to timing attacks is hard.


I have definitely seen code where the peaks are very shallow, and flattening them does nothing for the performance.


You only need to encounter this problem a few times before you stop trusting the compiler at all.


This definitely brings it closer, but on my machine, touching the entire array still ends up being a couple of seconds slower (5.4s for the calloc side, 7.8s for the clone side).
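
Roughly what I'm timing (a sketch, not the article's exact benchmark; MyByte stands in for the custom byte type):

    use std::time::Instant;

    #[derive(Clone, Copy)]
    struct MyByte(u8);

    fn main() {
        const N: usize = 1 << 30; // 1 GiB

        let t = Instant::now();
        let mut a = vec![0u8; N];           // specialized: calloc / lazily zeroed pages
        a.iter_mut().for_each(|b| *b = 1);  // touch every page
        println!("calloc side: {:?} (last = {})", t.elapsed(), a[N - 1]);

        let t = Instant::now();
        let mut b = vec![MyByte(0); N];     // generic path: memset up front
        b.iter_mut().for_each(|x| x.0 = 1); // touch every page
        println!("clone side:  {:?} (last = {})", t.elapsed(), b[N - 1].0);
    }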


It's not using clone but memset. Of course memsetting memory to 0 when the operating system already set it to 0 is still silly and something that could be further optimized, but it's not cloning things.


It does clone: https://stdrs.dev/nightly/x86_64-pc-windows-gnu/src/alloc/ve...

It might be optimized to a memset, but it's the clone codepath, as opposed to the u8 specialization codepath discussed in the article.

The original commenter is saying the benchmark isn't useful because it doesn't touch memory, but you get similar results even if you do touch the memory.



