The more pertinent question to me is whether we can implement some new static analysis that understands buffer re-use and can hoist buffer initialization out of the loop. Rather than make the programmer write obfuscated code for efficiency, it is usually better to have the compiler do the heavy lifting.
P.S. Also, folks, don't re-use buffers without zeroing unless you absolutely need the performance and know what you're doing.
Because the compiler optimizes based on the assumption that consecutive reads yield the same value. Reading from uninitialized memory may violate that assumption and lead to undefined behavior.
(This isn't the theoretical ivory tower kind of UB. Operating systems regularly remap a page that hasn't yet been written to.)
If you read from somewhere you have never written, who cares whether the compiler optimizes on the assumption that reading from there again gives the same value, even though that isn't actually true?
Anyone who wants to be able to sanely debug. Code is imperfect, mistakes happen. If the compiler can optimise so that any mistake anywhere in your program could mean insane behaviour anywhere else in your program, then you get, well, C.
(E.g. imagine doing a write to an array at offset x - this is safe in Rust, so the compiler turns that into code that checks that x is within the bounds of that array, then writes at that offset. If the value of x can change, then now this code can overwrite some other variable anywhere in your program, giving you a bug that's very hard to track down)
I see what you're getting at: situations in which the compiler trusts that the location has not changed, but needs to re-load it because the cached value is not available. When the location is reloaded, the security test (like a bounds check) is not re-applied to it, yet the value being trusted is not the one that had been checked.
This is not exactly an optimization though, in the sense that it will mess up even thoroughly unoptimized code (if anything with higher likelihood, since caching optimizations are absent).
So that is to say, even the generation of basic unoptimized intermediate code for a language construct relies on assumptions like that certain quantities will not spontaneously deviate from their last stored value.
That's baked into the code generation template for the construct that someone may well have written by hand. If it is optimization, it is that coder's optimization.
The intermediate code for a checked array access, though, should be indicating that the value of the indexing expression is to be moved into a temporary register. The code which checks the value and performs the access refers to that temporary register. Only if the storage for the temporary registers (the storage to which they are translated by the back end) changes randomly would there be a problem.
Like if some dynamically allocated location is used as an array index, e.g. array[foo.i] where foo is a reference to something heap allocated, the compiler cannot emit code which checks the range of foo.i and then refers to foo.i again in the access. It has to evaluate foo.i to an abstract temporary and refer to that. In the generated target code, that will be a machine register or a location on the stack. If the machine register or the stack is flaky, all bets are off, sure. But we have been talking about memory that is only flaky until it is written to. The temporary in question is written to!
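A rough sketch of that shape; the lowering in the comments is conceptual, not actual compiler output, and the names are made up:

struct Foo { i: usize }

fn get(array: &[u32], foo: &Foo) -> u32 {
    array[foo.i]
    // conceptually becomes:
    //   let tmp = foo.i;                         // foo.i loaded once into a temporary
    //   if tmp < array.len() {
    //       unsafe { *array.get_unchecked(tmp) } // the same tmp is used for the access
    //   } else {
    //       panic!("index out of bounds")
    //   }
    // If the generated code instead reloaded foo.i after the check, the check would
    // be worthless whenever the underlying memory could change in between.
}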
> The intermediate code for a checked array access, though, should be indicating that the value of the indexing expression is to be moved into a temporary register. The code which checks the value and performs the access refers to that temporary register. Only if the storage for the temporary registers (the storage to which they are translated by the back end) changes randomly would there be a problem.
You'd almost certainly pass it as a function parameter, prima facie in a register/on the stack, sure, and therefore in unoptimised code nothing weird would happen. But an optimising compiler might inline the function call, observe that the value doesn't escape, and then if registers are already full it might choose to access the same memory address twice (no reason to copy it onto the stack, and spilling other registers would cost more).
I don't know how likely this exact scenario is, but it's the kind of thing that can happen. Today's compilers stack dozens of optimisation passes, most of which don't know anything about what the others are doing, and all of which make basic assumptions like that the values at memory addresses aren't going to change under them (unless they're specifically marked as volatile). When one of those assumptions is broken, even compiler authors can't generally predict what the effects will be.
Makes sense. When a temporary holds the result of a simple expression with no side effects that is expected to evaluate to the same value each time, the temporary can be dropped and the expression re-evaluated. An obvious example of this is constant folding: we set a temporary t27 to 42; well, that can just be 42 everywhere, so we don't need the temporary. The trust that it will "evaluate to the same value each time" rests on assumptions, and if those assumptions are wrong, things are screwed.
I like that direction better but it requires the ability to declare data-flow based contracts whereas Rust’s tools are only lifetime and type contracts. Is there a language that has data-flow based contracts?
That would be easier but is not required. These days you don't need compiler hints to unroll loops or hoist invariants, even though doing either incorrectly could change the result. It would take some complicated analysis, but I think it could be done safely in some cases.
No? It depends on your definition of inner loop I guess.
If you're doing some sort of zero-copy IO, the time to clear the buffer might be non-trivial (not huge, but non-trivial). It's true that you need a large enough buffer that syscall/ffi overhead doesn't dominate, but that's not unrealistic.
It's rare that we care about this, that's true; that's why Rust has generally been fine with "just zero buffers". There are definitely domains that care, though.
In some languages, like Java, Go, and probably JavaScript, this is probably true, depending on how much memory needs to be initialized. But in Rust, FFI isn't any more expensive than any other non-inlined function call.
Loop unrolling and invariant hoisting are static transformations. What the "read" function does semantically isn't captured by that machinery today, and the compiler wouldn't be able to infer it automatically. It would have to be told that information, and there would need to be unsafe annotations for things like syscalls and FFI boundaries. The other approach is to change the API, which is what BorrowedBuf is.
If you can think of a different approach of how the compiler can figure out automatically what memory has become initialized by a random function call I’m all ears.
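For reference, a rough sketch of what the BorrowedBuf approach looks like from the caller's side. This is a nightly-only API at the time of writing, so the exact feature names and method signatures may differ:

#![feature(read_buf, core_io_borrowed_buf)] // nightly-only; feature names may change

use std::io::{BorrowedBuf, Read};
use std::mem::MaybeUninit;

fn read_without_zeroing(reader: &mut impl Read) -> std::io::Result<usize> {
    // Backing storage that is never zeroed.
    let mut storage = [MaybeUninit::<u8>::uninit(); 8192];
    let mut buf = BorrowedBuf::from(&mut storage[..]);
    // The cursor tracks how much of the buffer the reader has initialized,
    // so the contract lives in the type rather than in compiler analysis.
    reader.read_buf(buf.unfilled())?;
    Ok(buf.filled().len()) // only the bytes that were actually read
}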
That's what I glossed over as "complicated analysis". In my mind, if a compiler can understand register and stack use (required for static transformations), it can (theoretically, and with some effort) understand heap use. Am I wrong?
Yes, you are wrong. This isn’t basic constant hoisting.
The compiler doesn’t reasonably have any of that information to understand what read is filling in at runtime because that information is encoded purely at runtime and the compiler has no reasoning mechanism even close to answering runtime data flow questions.
There's also a lot of complexity around which transformations are even possible: the information about what is legal at the language level is often erased before it gets down to the stack/register level, and vice versa, the language layer knows nothing about registers and very little about the stack.
This is the same reason that, when compiling something like:

for _ in 1..10 {
    let x: String = create_new_string();
    eprintln!("{x}");
}

the compiler fails to hoist x out of the loop even if the returned string is just String::from("ABC"), unless maybe LTO is on (and even then maybe not). Basically the compiler's "magic" is limited to static transformations that follow the as-if rule: the compiler must know the transformation is blindly identical, and the amount of reasoning it does about the structure is often very limited.
Said another way, if the compiler could do the optimizations you're hypothesizing, it would be equivalent to applying a mid-level performance engineer to every code base it encounters.
The problem is that that kind of analysis would require full program optimization, rather than optimizing individual functions, because the signature of the call to read doesn't provide enough information for the compiler to know if it ever reads the data. And indeed, it can't know until link time.
I think it's theoretically possible, but at the cost of much longer compile times, and greater complexity in the compiler.
There would need to be contractual declarations on the read method, which the compiler can enforce, that tell it the first N elements of the input &mut slice are clobbered, where N is the returned length. That's basically what BorrowedBuf is accomplishing via the type system and runtime enforcement of the contract. Using a non-existent syntax:
fn read<T, N: size_t>(&mut self, buf: &mut [MaybeUninit<T>] becomes &[T; N] after call) -> N {
… enforces the body initializes N elements out of buf
}
and then a rule that a &mut [T] can also be supplied transparently to such functions, which today could only accept a &mut [MaybeUninit<T>].
A more likely interface you could write today would look like:
fn read_uninit<T>(&mut self, buf: &mut [MaybeUninit<T>]) -> (&[T], &[MaybeUninit<T>]) {
… enforces the body initializes N elements out of buf
}
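For what it's worth, a rough sketch in that spirit, written as a free function with an in-memory src standing in for whatever actually produces the bytes (a syscall in the real case); all names here are made up:

use std::mem::MaybeUninit;

// Hypothetical sketch: fill a prefix of the uninitialized buffer, then return
// the initialized prefix plus the still-uninitialized tail.
fn read_uninit<'a>(
    src: &[u8],
    buf: &'a mut [MaybeUninit<u8>],
) -> (&'a [u8], &'a mut [MaybeUninit<u8>]) {
    let n = src.len().min(buf.len());
    for (dst, &byte) in buf.iter_mut().zip(src) {
        dst.write(byte); // initializes exactly the first n elements
    }
    let (filled, rest) = buf.split_at_mut(n);
    // SAFETY: the first n elements were just written above.
    let filled = unsafe {
        std::slice::from_raw_parts(filled.as_ptr() as *const u8, filled.len())
    };
    (filled, rest)
}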
You still have to cast &[T] into &[MaybeUninit<T>] somehow.
> You still have to cast &[T] into &[MaybeUninit<T>] somehow.
unsafe { std::mem::transmute(slice) }
This is probably the only way that will ever exist, because
let slice: &mut [NonZeroU8] = ...;
// the cast in question; today this is only possible via unsafe:
let slice_uninit: &mut [MaybeUninit<NonZeroU8>] = ...;
let nonzero_uninit: &mut MaybeUninit<NonZeroU8> = &mut slice_uninit[0];
*nonzero_uninit = MaybeUninit::zeroed(); // 0 is an invalid bit pattern for NonZeroU8
slice[0]; // Undefined behavior for sure by now.
Is all safe except for the cast.
I.e. MaybeUninit<T> allows you to write invalid bit-patterns to T, so you can't safely cast a reference to T to it (and if you do unsafely cast a reference to T to it you can't soundly write an invalid bit pattern). All current forms of safely making a MaybeUninit take ownership of the value they are declaring to be MaybeUninit for this reason.
I guess at some point we might get methods for this on types that can take on all bit patterns - if/when that's encoded as a trait.
I think an ergonomic way to do that would be to have read return not an integer, but a slice of that integer's length.
Problem would be: how do you express "you can only access the buffer you sent me through the read-only slice I returned, but you have to free that same buffer when you're done calling me"?
I think that can be done using a function creating a read buffer for a given input stream that
- during calls to read is ‘owned for writing’ by that stream (so, it has to borrow a capability that the creator of the buffer doesn’t have. I don’t think Rust currently supports that)
- where stream.read returns a read only slice whose lifetime is bound to that of the buffer
So, the creator of the buffer can only pass it to read to get a slice back that contains precisely the data read.
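A rough sketch of the second half of that idea; it's hypothetical and doesn't capture the "owned for writing" capability, only the lifetime-bound, exact-length return:

use std::io::{self, Read};

struct ReadBuffer {
    storage: Vec<u8>,
}

impl ReadBuffer {
    fn new(capacity: usize) -> Self {
        ReadBuffer { storage: vec![0; capacity] }
    }

    // Returns a read-only slice, tied to the buffer's lifetime, containing
    // precisely the bytes that were read on this call.
    fn fill_from<'a, R: Read>(&'a mut self, stream: &mut R) -> io::Result<&'a [u8]> {
        let n = stream.read(&mut self.storage)?;
        Ok(&self.storage[..n])
    }
}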
Fair, but note there is a significant subset of Rust-targeted programmers who dislike the compiler doing things like that. They also dislike the compiler doing things like auto-initializing every loop iteration, but two wrongs wouldn't make it right, just less wrong.