Looking at performance counter data is good, but I would have liked to see a real validation of the hypothesis that bounds checking is to blame for the extra branches and instructions. That is, modify the Rust compiler to not emit bounds checks (or maybe there is even a flag for this?) and look at performance and counters. I would imagine that this would bring the data for Rust to pretty much the same as C. But other compiler (micro-)optimizations might be at play as well.
Also, from the paper's Conclusions: "The cost of [Rust's] safety and security features are only 2% - 10% of throughput on modern out-of-order CPUs." 10% network throughput is a lot.
This, imo, is absolutely correct (it is a dark idea to have a "let's be unsafe for more performance" flag), but maybe an experimental build of the Rust compiler could have this as a configuration option? Possibly the toolchain could warn every step of the way if such a 'tainted' module is ever linked, etc.
It just seems like this sort of question is going to recur, and being able to persistently track the overhead of checking (it would allow you to monitor specific performance improvements) is much nicer than having someone do a one-off experiment.
If it is implemented, it will be used. And people will put it in their own builds.
We already have one “secret” escape flag feature, and people do use it, as much as we don’t talk about it and tell people not to use it when they find it.
Maybe put a tainted flag in it that causes the linker or runtime to fail? Then don't open-source or release the modifications that let the linker/runtime skip that failure check, and refuse to let anyone check a "fix" that bypasses it into an official build...
This seems like an incredibly important cost. Surely it's worth doing a bit of ugly magic to be able to keep track of it persistently.
Thanks to both of you for the insightful discussion. A flag would be helpful for testing, but it's true that if it's there, it will be used. Still, this can be tracked as part of a CI system by keeping around a patch for disabling bounds checks and regularly building and benchmarking a patched version. Less nice, but should get the job done.
I’m not 100% sure if there’s a source exactly, but we don’t like safety and correctness to depend on what flags you pass or do not pass. We don’t offer a fast-math flag either for similar reasons.
The odd one out is overflow, and that's only because overflow is well defined in Rust (a "program error") rather than UB. It gets checked in debug builds but currently not in release builds, though the spec allows checking there too.
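To make that split concrete, here's a minimal sketch (the `black_box` trick is just my way of keeping the optimizer from rejecting the overflow at compile time; the `overflow-checks` setting in a Cargo profile is the real knob for turning checks on in release builds too):

```rust
use std::hint::black_box;

fn main() {
    // black_box keeps the value opaque, so the overflow below is a
    // runtime event rather than a compile-time error.
    let x: u8 = black_box(u8::MAX);

    // With overflow checks on (the debug default, or `overflow-checks =
    // true` in a release profile), this panics with "attempt to add with
    // overflow". With checks off, it wraps to 0. Both outcomes are
    // defined behaviour -- a "program error", never UB.
    let y = x + 1;
    println!("{y}");
}
```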
What do you think of Julia's macro-based approach?
That is, there are `@inbounds` and `@fastmath` macros that turn off bounds checking/enable fast-math flags in the following expression.
`@fastmath` works simply by swapping functions (eg `+`) with versions (eg, `Base.FastMath.add_fast`) that have the appropriate llvm flags.
When testing Julia libraries, all `@inbounds` are ignored (ie, it'll emit bounds checks anyway).
I assume it's already possible for a user to similarly implement `inbounds!` and `fastmath!` macros in Rust that replace `[]` with `.get_unchecked()`, etc. (I haven't checked if there are already crates.) But it sounds like it should be easy enough for folks to use this approach in performance-sensitive regions (in particular, loops that may need these flags to vectorize).
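Something along those lines seems doable today. A minimal sketch, assuming a made-up `inbounds!` macro that keeps the checked path while debug assertions are on (mirroring how Julia's test mode ignores `@inbounds`):

```rust
/// Hypothetical analogue of Julia's `@inbounds`: index without a bounds
/// check in release builds, but keep checked indexing (and its panics)
/// whenever debug assertions are enabled, e.g. under `cargo test`.
macro_rules! inbounds {
    ($slice:expr, $idx:expr) => {{
        let (s, i) = (&$slice, $idx);
        if cfg!(debug_assertions) {
            s[i] // checked indexing while testing
        } else {
            // SAFETY: the caller promises `i < s.len()`.
            unsafe { *s.get_unchecked(i) }
        }
    }};
}

fn sum(xs: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..xs.len() {
        total += inbounds!(xs, i);
    }
    total
}

fn main() {
    println!("{}", sum(&[1, 2, 3])); // 6
}
```

That said, for a loop this simple, `xs.iter().sum::<u64>()` already lets the optimizer drop the check with no unsafe at all, which is usually the better first move.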
I guess my thought is that much of Rust's correctness comes from the compiler being able to assert, at compile time, that some type (and thus some memory address) will only ever be used in a correct way.
For example, if we were dynamically linking a Rust crate into a Rust binary, is it necessary to check bounds in both, or can some of that be deferred because we can assume the binary being linked against has already done the bounds checks?
I know it's a bit contrived since ideally we'd just compile statically, but I think it's still potentially valid. If both pieces of software have the guarantees then ideally you can factor out some of the overhead.
Not really: indexing out of bounds without this check would invoke undefined behaviour. A compile-time flag would not be able to distinguish the cases where a bounds check is required for the program to be correct from the cases where the index is provably within bounds and the check is unnecessary.
Who wants a compile-time flag that makes valid programs have undefined behaviour? Nobody, especially when you consider that UB in any language really does mean undefined: in the best case the program crashes, in the worst it deletes all your files.
What's wanted is a way to tell the compiler "no, in this specific case which I have determined to be a bottleneck in my program, I want to omit bounds checking because due to XYZ it's impossible for the index to ever be out of bounds" and that's exactly what this method provides.
They can just profile to find out which functions in their program are consuming the most CPU, check whether those functions contain any bounds checks, and, if so, write the single line of code required to tell the compiler "trust me, it is impossible for this index to ever be out of bounds, a bounds check is not necessary".
If they are right, and bounds checks are the issue, doing this should recover the performance difference.
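In Rust, that single line is typically a switch from `[]` to `slice::get_unchecked` inside an `unsafe` block, with a comment recording why the index can't escape the bounds. A minimal sketch with a made-up hot function:

```rust
/// Hypothetical hot function: sum every `stride`-th element of `xs`.
fn strided_sum(xs: &[u64], stride: usize) -> u64 {
    assert!(stride > 0); // hoist the invariant out of the loop
    let mut total = 0;
    let mut i = 0;
    while i < xs.len() {
        // SAFETY: the loop condition guarantees `i < xs.len()`, so this
        // index can never be out of bounds; `get_unchecked` just tells
        // the compiler not to verify that again.
        total += unsafe { *xs.get_unchecked(i) };
        i += stride;
    }
    total
}

fn main() {
    println!("{}", strided_sum(&[1, 2, 3, 4, 5, 6], 2)); // 1 + 3 + 5 = 9
}
```

Worth noting: for patterns this regular, the optimizer often proves the index in bounds and elides the check on its own, so it pays to confirm in a profiler that the unsafe version actually changes anything.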
Yeah, I wonder what world they live in. If I could sell a limb to get 10% more audio plug-ins in my DAW, you can be sure my bedtime book would be "Life Pro-Tips for Quadruple Amputees".
2-10% for an already fast user space driver is nothing.
State of the art for a lot of these use cases is still the kernel driver, which is ~7 times slower. Sure, all that stuff is moving to XDP/eBPF/AF_XDP, but that is still ~20-30% slower than a user-space driver.
Also, these 2-10% only show up when underclocking the CPU while running the unrealistic benchmark of forwarding packets bidirectionally on only one core (trivial to parallelize).
In the end it's about 6-12 extra cycles spent in the driver. That's not a lot if you have a non-trivial application on top of it.
Fortunately for your body, this problem is easily solvable with hardware. Modern DAWs scale well with multithreading, for regular use cases at least.
I don't know your use case, but generally, if you have so much VST processing on a single track that it loads a core of a modern CPU, you're either doing something really creative, like sculpting a sound, or some heavy-handed audio restoration. Both are candidates for freezing/rendering to a stem. YMMV, of course.