There's probably no way for the compiler to prove safety. Rust is designed to allow 100% safe bare metal development, like a perfectly safe C that still allows you to get close to the hardware, and that's tough.


> There's probably no way for the compiler to prove safety.

That's already the case for the AnyBitPattern stuff though. (Indeed, according to the docs, AnyBitPattern types already get cast from uninitialized bytes, which in C/LLVM semantics are not necessarily frozen, even if in practice Linux would not be remapping the pages they're in.)
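
For reference, a minimal sketch of the kind of cast in question, assuming the parent means the bytemuck crate's AnyBitPattern trait (in real code you'd use bytemuck's derive macros; the manual impls are shown here for clarity):

    use bytemuck::{pod_read_unaligned, AnyBitPattern, Zeroable};
    use std::mem::size_of;

    #[derive(Clone, Copy, Debug)]
    #[repr(C)]
    struct Header {
        magic: u32,
        len: u32,
    }

    // SAFETY: every bit pattern is a valid Header (two plain u32s, no padding).
    unsafe impl Zeroable for Header {}
    unsafe impl AnyBitPattern for Header {}

    fn parse(bytes: &[u8]) -> Header {
        // Copies out of the byte slice; no validation is needed because any
        // bit pattern is valid (panics only if the slice is too short).
        pod_read_unaligned(&bytes[..size_of::<Header>()])
    }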

> Rust is designed to allow 100% safe bare metal development

I wouldn't say that, since bare-metal Rust always needs some unsafe; rather, it's designed to allow managed, contained use of unsafe constructs in code that's, say, 98% safe. The whole purpose of this BorrowedBuf is already something like that.
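
A trivial sketch of that containment pattern (hypothetical function, just to illustrate): the unsafe is inside, the invariant is checked inside, and callers stay 100% safe.

    fn first_or_zero(bytes: &[u8]) -> u8 {
        if bytes.is_empty() {
            0
        } else {
            // SAFETY: we just checked that index 0 is in bounds.
            unsafe { *bytes.get_unchecked(0) }
        }
    }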


I'm failing to understand the connection to "safety" here. Reading a byte whose value you don't know isn't "unsafe". It's literally (!) the desired behavior when reading foreign data from an external source, which is in fact the use case in the article.

There's no safety problem as long as the arbitrary value is deterministic, which it is, being process RAM. The "uninitialized data read" bugs reported from instrumentation tools in C code are because the code is assuming the value has some semantics. The read itself yields no meaningful value and is presumably an artifact of the bug, but it is safe.


> There's no safety problem as long as the arbitrary value is deterministic, which it is, being process RAM.

The article discusses how it is in fact, on Linux with memory returned from at least one very common allocator, not deterministic. Ctrl-f tautology.


That's just a terminology collision. All RAM access is deterministic in the sense that the value will not change until written. It's not "predictable" in the sense that the value could be anything.

C code that reads uninitialized data is presumed to be buggy, because it wouldn't be doing that unless it thought the memory was initialized. But the read itself is not unsafe.

Rust is just confused, and is preventing all reads from uninitialized data a priori instead of relying on its perfectly working type system to tell it whether the uninitialized data is safe to use. And that has a performance impact, as described in the linked article, which has then resulted in some terrible API choices to evade it.


> All RAM access is deterministic in the sense that the value will not change until written.

Again, the article literally points out how this is not true given modern allocators. The memory that Linux exposes to processes will change without being written to, prior to being initialized, given how allocators manage it. This isn't a fiction of the C standard or Rust reference; it's what actually happens in the real world on a regular basis.

Rust is not confused, it is correctly observing what is allowed to actually happen to uninitialized memory while the process does nothing to it.

You could change the C/Rust specification of that memory. You could, in your C/Rust implementation, declare that the OS swapping out pages of uninitialized memory counts as a write just like any other, and that it's the programmer's (allocator's) responsibility to make sure those writes obey the normal aliasing rules. Doing so would be giving up performance, though, because the fact that writing to memory has the side effect of cancelling collection of freed pages is a powerful way for processes to quickly communicate with the OS. (You'd probably also cause other issues with memory-mapped IO, values past the end of the stack changing, and so on, but we can focus on this one issue for now.)


You have some misconceptions about C and undefined behavior.

> C code that reads uninitialized data is presumed to be buggy, because it wouldn't be doing that unless it thought the memory was initialized. But the read itself is not unsafe.

The read itself is very much unsafe because it's undefined behavior. The compiler is allowed to assume the programmer never lets a read from uninitialized memory happen, so if one does happen, any false conclusion can follow from the false assumption.

This is a problem even in trivial cases; see [1] and try commenting out and switching around the calls to foo and bar. The behavior is very unintuitive, because reads from uninitialized memory are unsafe.

[1]: https://godbolt.org/z/71vbPYT6G
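
The linked example is C, but here's a hypothetical Rust analogue of the same footgun (not the linked snippet, and deliberately incorrect code):

    use std::mem::MaybeUninit;

    fn main() {
        // Undefined behavior: producing a u32 from memory that was never
        // written. The compiler may assume this never happens, so the
        // branches below can be folded, dropped, or contradicted; the value
        // is not merely "arbitrary but stable". Miri rejects this program.
        let x: u32 = unsafe { MaybeUninit::<u32>::uninit().assume_init() };
        if x < 10 {
            println!("small");
        }
        if x >= 10 {
            println!("large");
        }
        // Under UB, neither, either, or both branches may execute.
    }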


> You have some misconceptions about C and undefined behavior.

The discussion is about RAM and Rust, not C. And the particular use of uninitialized data in C that corresponds to the linked article (as a target buffer for a read call) is clearly not undefined behavior.

This is a classic HN tangent, basically. You're making the discussion worse and not better.

The question is "Why can't Rust act like C when faced with empty buffers?", and the answer has nothing to do with undefined behavior or unsafe. They just got it wrong.


Okay, I think I see. In the linked article, due to the behavior of `read`, uninitialized memory is never read. The same would be true in equivalent C code.

However, in C, the programmer doesn't need to prove to the compiler that uninitialized memory is never read; they are just expected to prevent it from happening. In this case, it's clear to the programmer that there's no undefined behavior.

In Rust though, the compiler must be able to statically verify no undefined behavior can occur (except within unsafe code, where the burden shifts back to the programmer). It's not possible to statically verify this in either the Rust or C case, because not enough information is encoded into the type signature of `read`. The article discusses a couple of ways that information might be encoded so that Rust can be more like C, and discusses their trade-offs. C explicitly sidesteps this by placing the responsibility entirely on the programmer.

So to directly answer your question "Why can't Rust act like C when faced with empty buffers?", it's because the Rust compiler cannot yet statically verify there's no undefined behavior in this case, even though there is in fact no undefined behavior, and one of the primary design goals of Rust is to statically prevent undefined behavior.

And to what's perhaps the initial question, this is discussed using the term "safety" simply because Rust defines things which can't be statically verified to not invoke undefined behavior as "unsafe". Perhaps a better term would be "not yet statically provable as safe", but it's a bit of a mouthful.
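
For the concrete shape being discussed, a sketch using the unstable BorrowedBuf / read_buf APIs (nightly-only at the time of writing; feature names and signatures subject to change):

    #![feature(read_buf, core_io_borrowed_buf)]

    use std::io::{BorrowedBuf, Read};
    use std::mem::MaybeUninit;

    fn read_some(mut r: impl Read) -> std::io::Result<Vec<u8>> {
        // Storage starts uninitialized; no zero-fill needed.
        let mut storage: [MaybeUninit<u8>; 4096] = [MaybeUninit::uninit(); 4096];
        let mut buf: BorrowedBuf<'_> = storage.as_mut_slice().into();
        // read_buf tracks how many bytes the reader wrote into the buffer,
        // so only the filled (initialized) prefix is ever exposed as &[u8].
        r.read_buf(buf.unfilled())?;
        Ok(buf.filled().to_vec())
    }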


> it's because the Rust compiler cannot yet statically verify there's no undefined behavior in this case

Uh... yes it can. It's a memory write to the uninitialized region. Writes are not undefined, nor unsafe, and never have been. They aren't in C, they aren't in hardware. Writes are fine.

The bug here is API design, not verification constraints.


The issue isn't writes to uninitialized memory, it's reads from uninitialized memory. The compiler doesn't know how much of the buffer `read` writes. The docs say it returns an unsigned integer with how many bytes it wrote, so a programmer can know that a later read from `buffer[0..num_bytes_written]` is valid, but the compiler doesn't know what the number returned from `read` represents. From the compiler's point of view, the whole buffer needs to be initialized for reads from it to be valid, regardless of what `read` does. That means it has to be initialized before it's passed to `read`, since otherwise the compiler can't prove that the elements later read from the buffer are initialized.
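
That's why the safe idiom on stable Rust today pre-initializes the whole buffer, paying for bytes that `read` may never touch; a minimal sketch:

    use std::io::Read;

    fn read_safe(mut r: impl Read) -> std::io::Result<Vec<u8>> {
        // Zero-fill up front so every byte is initialized before read() runs;
        // this redundant work is exactly what the article wants to avoid.
        let mut buf = vec![0u8; 4096];
        let n = r.read(&mut buf)?;
        buf.truncate(n); // only the first n bytes came from the reader
        Ok(buf)
    }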


I'm basically going to give up and declare victory. You're saying more or less exactly what I said above and that you took issue with, which is that the fundamental problem here is one of API design (Rust's kinda sucks) and not the safety of the underlying primitive being abstracted, which has never been at issue. And certainly nothing about undefined behavior, given that it's 100% well defined.


The write is fine, but the subsequent read isn't (unless you know that the write happened, which the Rust compiler doesn't).


> All RAM access is deterministic in the sense that the value will not change until written

That's correct at a hardware level. It's not correct from a userspace program's perspective when interacting with Linux if the memory in question is uninitialized.


> That's correct at a hardware level.

Not even there, due to memory caches.


I might be one of today's lucky ten thousand.

Suppose you're writing an OS for a modern computer. You have full access to the subset of hardware actually exposed to you. You have a single process (having not yet done anything with IPI, APIC, ...) so far. I think two things are true, and I'm much more confident about the first one:

1. If you write to a region of RAM, then at any point in the future (ignoring hardware failures) if you read that same region you'll read the value you wrote, unless you issue write instructions to that same region first.

2. If you read from a region of RAM, then at any point in the future (ignoring hardware failures) if you read that same region you'll read the value you previously read, unless you issue write instructions to that same region first.

Caches matter, but they're hidden from software, except where that hiding is too expensive (hence, atomics and whatnot for you to synchronize between processes).

Is my view of the world too simplistic? Is there some way in which those caches are even more poorly behaved than I imagined?


1. Yep, this is an abstraction HW works hard to preserve.

2. The specific scenario I had in mind was: write value X to address A from core 1, write value Y to A from core 2, read value from address A from core 1 (still X in core 1's cache), core 1's cache gets invalidated, read value from address A from core 1 again (now it fetches Y from memory).

See [1] for reference on how it applies to C specifically, especially "Absent any constraints on a multi-core system, … one thread can observe the values change in an order different from the order another thread wrote them".

[1]: https://en.cppreference.com/w/c/atomic/memory_order
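
A sketch of that kind of reordering in Rust terms (whose memory model mirrors C's here), using relaxed atomics; on strongly ordered hardware like x86 you may never actually observe it, but the model permits it:

    use std::sync::atomic::{AtomicU32, Ordering};
    use std::thread;

    static A: AtomicU32 = AtomicU32::new(0);
    static B: AtomicU32 = AtomicU32::new(0);

    fn main() {
        let writer = thread::spawn(|| {
            A.store(1, Ordering::Relaxed);
            B.store(1, Ordering::Relaxed);
        });
        let reader = thread::spawn(|| {
            // With Relaxed ordering the model allows observing B's store
            // before A's, even though the writer issued them in order.
            let b = B.load(Ordering::Relaxed);
            let a = A.load(Ordering::Relaxed);
            if b == 1 && a == 0 {
                println!("stores observed out of order");
            }
        });
        writer.join().unwrap();
        reader.join().unwrap();
    }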


> Is my view of the world too simplistic?

It's not. There are no artifacts of cache incoherence visible on modern devices that are treated as part of the C language undefined behavior rules. The language runtime can assume memory is just memory.

Obviously there are visible artifacts, but all that code lies outside the realm of standard C, even though you write it in C and build it with a C compiler. This is one of the reasons I find arguments that start from the UB rules as postulates unpersuasive. At the end of the day we have to write code for real hardware.

The tl;dr version is that Rust as currently understood is just never going to be able to express stuff like incoherent DMA spaces. As witnessed here it still struggles with representing read() in a way that doesn't require zero-filling the buffer.


> At the end of the day we have to write code for real hardware.

We have to write code for real (often future) compilers, and compilers are generally unsympathetic to code whose behaviour is undefined under the standard (I think this is unreasonable behaviour by compiler maintainers, but that argument has been lost).

In my experience C compilers don't really think or care about what happens for code that triggers UB, so it wouldn't at all surprise me if it were possible to get a compiler to emit code that exposes cache incoherence (i.e. code that reads and writes memory in a pattern that does not have defined behaviour under the underlying platform's memory model, because it lacks the necessary barrier instructions, so that on that hardware the result is a cache incoherence effect). Probably not on x86 with its friendly memory model, but maybe on ARM or the like (Alpha used to be a notorious place to see such bugs).


I don’t understand your point and you’re wrong on a couple of things.

> C code that reads uninitialized data is presumed to be buggy, because it wouldn't be doing that unless it thought the memory was initialized. But the read itself is not unsafe. Rust is just confused, and is preventing all reads from uninitialized data a priori instead of relying on its perfectly working type system to tell it whether the uninitialized data is safe to use

Reads of uninitialized memory are unsafe, full stop. That's literally what Rust's memory safety is about. If you give that up you're not in safe land, and you can always use unsafe & all the risks that come with that to try to write more optimal code / abstractions.

This article is literally about the mechanisms Rust is trying to stabilize for using the type system to track writes into a block of uninitialized memory, so that it can take a &mut [MaybeUninit<T>] and give you back a &[T] after a call to read has written into the slice. But reading uninitialized memory is by definition tautologically unsafe. It doesn't mean that the computer will pull out a knife and kill you, but it does mean you're no longer memory safe.
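
A minimal sketch of that write-then-bless pattern underneath such APIs (the helper is hypothetical; the stabilized API will differ):

    use std::mem::MaybeUninit;

    // Hypothetical helper: initialize the first n slots, then expose exactly
    // that prefix as a normal &[u8].
    fn fill_prefix(buf: &mut [MaybeUninit<u8>], n: usize) -> &[u8] {
        assert!(n <= buf.len());
        for slot in &mut buf[..n] {
            slot.write(0xAB); // writing through MaybeUninit is always allowed
        }
        // SAFETY: the loop above initialized exactly the first n elements.
        unsafe { std::slice::from_raw_parts(buf.as_ptr().cast::<u8>(), n) }
    }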


> Reads of uninitialized memory are unsafe, full stop. That's literally what Rust's memory safety is about. If you give that up you're not in safe land, and you can always use unsafe & all the risks that come with that to try to write more optimal code / abstractions.

Reading uninitialized (but correctly owned) memory is not inherently unsafe. It has strange (and mostly useless) semantics, but it's not a safety issue. E.g. something like the OpenSSL key generation that Debian broke - [uninitialised buffer] ^= [generated random values] - is safe, and afterwards you can read from that buffer and get (safe, real, stable) values.


I’m not sure what point you’re arguing. C and C++ treat uninitialized reads as UB, full stop. Any C/C++ compiler can do whatever it wants if it realizes that’s the case. Rust is no different; it’s just that it makes it a compile error in “safe” mode and forces you to annotate it with unsafe to be explicit that it’s intentional (while still following the rules, mind you: accessing uninitialized memory is still UB even in unsafe Rust, and doing so gives the compiler freedom to break your code).


> C and C++ treat uninitialized reads as UB, full stop.

Not true; if it's an object that has its address taken, of a type that does not have a trap representation, then reading it results in an unspecified value and is not UB. Which aligns with what one might naively expect to happen.


This is true of Rust as well since it has a very similar memory model. You just have to explicitly tell the compiler you know what you’re doing in Rust through the unsafe keyword.


So reading an uninitialised u8 is not UB? I found the Rust documentation pretty unclear: u8 does not have a "restricted set of valid values", which suggests it should be OK, but the documentation also says "the only cases in which reading uninitialized memory is permitted are inside unions and in “padding” (the gaps between the fields of a type)".


> All RAM access is deterministic in the sense that the value will not change until written.

No, this is directly addressed in the article.

RAM access to uninitialized memory is not deterministic and can change. See MADV_FREE.
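
A Linux-only sketch of the mechanism, assuming the `libc` crate, to make the claim concrete: after madvise(MADV_FREE) the kernel may reclaim the page lazily, so until the next write a read can see either the old contents or a fresh zero page.

    use libc::{madvise, mmap, MADV_FREE, MAP_ANONYMOUS, MAP_FAILED,
               MAP_PRIVATE, PROT_READ, PROT_WRITE};

    fn main() {
        unsafe {
            let len = 4096;
            let p = mmap(std::ptr::null_mut(), len,
                         PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            assert_ne!(p, MAP_FAILED);
            let p = p as *mut u8;
            p.write(1); // dirty the page so it is actually backed
            madvise(p.cast(), len, MADV_FREE);
            // Between here and the next write, reading *p can yield 1 or 0,
            // depending on whether the kernel has reclaimed the page.
            let _racy = p.read_volatile();
            p.write(2); // a write cancels the lazy reclaim for this page
        }
    }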


That's a VM feature. I mean, yes, if you change your process's memory space between accesses you break whatever the compiler might have assumed. You don't need fancy flags, either! Just, y'know, munmap() will do.

This is yet another unrelated tangent. Can you explain why it is you think the presence of MADV_FREE disallows Rust from allowing writes to uninitialized memory?


You’re missing the point. It doesn't matter that this is a VM feature and not a hardware feature. The fact that it exists at all means that reading from uninitialized memory is unsound. You can write to it all you want, but not read from it.


Once more, the use case in question is WRITING to uninitialized memory, not reading from it. Rust is applying the constraints of the latter (and thus requiring "unsafe") to the former, which is not an unsafe operation.


No. Rust is only applying any constraints to reading from it.

The entire discussion is how to make an API which defines which portions of the memory can be read from because they have been written to, so we can expose an API that cannot be misused. There have never been any constraints on writing to uninitialized memory.


What does safe mean here? Everything can be interpreted as a [u8], right?


[u8] guarantees to the compiler that two reads through the array at the same location without any intervening writes return the same value.

Turns out that's not the case on freshly returned uninitialized allocations. The first read could return old data (say "1"), and the second read could return a freshly zeroed page ("0").
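
A sketch of what that guarantee lets the optimizer do (hypothetical function; the point is what the compiler may assume):

    fn reads_agree(x: &[u8]) -> bool {
        // &[u8] promises the bytes are initialized and unchanged for the
        // borrow's lifetime, so the two loads below may be folded into one
        // and the function compiled to simply return true.
        let a = x[0];
        let b = x[0];
        a == b
    }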


No, it's not so.

If the allocation is backed by the kernel, then it will be zero-filled for security reasons. If it's backed by user-space malloc then who knows; but there's never a scenario where a mallocated page is quietly replaced by a zero-filled page behind the scenes.


Ctrl-f tautology in the article; it turns out that is not the case, because of MADV_FREE.


https://www.ralfj.de/blog/2019/07/14/uninit.html perhaps (the OP also talks about this when linking to a talk about jemalloc)


Interesting.

Tl;dr: it's not to do with any hardware concept; the compiler can substitute any value for a read of uninitialised memory, and the value does not have to be stable.



