From my point of view, it should be the same as in Ada, Modula-3, C#, Go, Swift ...

geofft · on May 27, 2016

I used to hold this position, but I heard an example (I forget the source, but I think it's on the web) that convinced me otherwise: a &str is no different from a &[u8] in terms of representation, except for the type-system guarantee that it contains valid UTF-8. (Hence the syntax &str instead of &[str]: "str" already refers to a sequence of bytes, the way [u8] does.) In a valid UTF-8 string, if you see the first byte of a multibyte sequence, you can assume that there is at least one more byte. It would be nice if we could write decoders that used that property without having to do bounds checks: the type system promises us that &strs are, in fact, valid UTF-8. But changing a &str to be invalid UTF-8 isn't inherently a memory-unsafe operation.
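A minimal sketch of the kind of decoder described above, assuming the caller passes a byte index that lies on a char boundary. Once the leading byte announces how many continuation bytes follow, the &str invariant is the only thing justifying reading them with `get_unchecked` instead of checked indexing. `decode_at` is a hypothetical helper, not a standard-library function.

```rust
/// Hypothetical helper: decode the char that starts at byte index `i` of `s`,
/// returning the char and its encoded length in bytes.
fn decode_at(s: &str, i: usize) -> (char, usize) {
    // Only the leading byte's position is checked.
    assert!(i < s.len() && s.is_char_boundary(i), "i must start a char");
    let bytes = s.as_bytes();
    let b0 = bytes[i];
    // Sequence length and initial payload bits, from the leading byte alone.
    let (len, mut cp): (usize, u32) = match b0 {
        0x00..=0x7F => (1, b0 as u32),
        0xC0..=0xDF => (2, (b0 & 0x1F) as u32),
        0xE0..=0xEF => (3, (b0 & 0x0F) as u32),
        _ => (4, (b0 & 0x07) as u32),
    };
    for k in 1..len {
        // SAFETY: `s` is valid UTF-8 (type invariant), so a leading byte that
        // announces `len` bytes is followed by `len - 1` continuation bytes.
        // No bounds check is performed here.
        let b = unsafe { *bytes.get_unchecked(i + k) };
        cp = (cp << 6) | (b & 0x3F) as u32;
    }
    // SAFETY: valid UTF-8 only ever decodes to valid Unicode scalar values.
    (unsafe { char::from_u32_unchecked(cp) }, len)
}
```

If a &str could ever hold invalid bytes, the `get_unchecked` call above could read past the end of the buffer, which is exactly why the invariant ends up mattering for memory safety.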

So we're left with two options. The first is to say that, despite the type system, a UTF-8 decoder for &str isn't permitted to do anything that would be invalid/undefined/wrong if done to an arbitrary &[u8]. (In other words, &str is merely a hint, and everyone must code defensively as if any &str could be an arbitrary byte string.) The second is to narrow "safe" down to "does not break type system invariants," even though the set of possible type system invariants is pretty large.
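For contrast, a sketch of the first option: a decoder written as if its input could be any byte string, so every continuation byte is fetched with a checked `get`, and malformed input yields `None` instead of undefined behaviour. `decode_at_checked` is likewise a hypothetical name, and this is looser than a full validator (it does not reject overlong encodings).

```rust
/// Hypothetical defensive decoder: assumes nothing about `bytes`.
fn decode_at_checked(bytes: &[u8], i: usize) -> Option<(char, usize)> {
    let b0 = *bytes.get(i)?;
    let (len, mut cp): (usize, u32) = match b0 {
        0x00..=0x7F => (1, b0 as u32),
        0xC0..=0xDF => (2, (b0 & 0x1F) as u32),
        0xE0..=0xEF => (3, (b0 & 0x0F) as u32),
        0xF0..=0xF7 => (4, (b0 & 0x07) as u32),
        _ => return None, // stray continuation byte or invalid lead byte
    };
    for k in 1..len {
        // Bounds-checked read: a truncated sequence returns None.
        let b = *bytes.get(i + k)?;
        if b & 0xC0 != 0x80 {
            return None; // expected a continuation byte (10xxxxxx)
        }
        cp = (cp << 6) | (b & 0x3F) as u32;
    }
    // `char::from_u32` rejects surrogates and values above U+10FFFF.
    char::from_u32(cp).map(|c| (c, len))
}
```

Every check here is one the `&str` version could skip; that extra work is the cost of treating `&str` as "merely a hint".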

I think Rust actually has a good claim to being different from other languages here, given how much more of a type system it has, and given how much more it tries to do with newtype wrappers and zero-cost abstractions. The inability to use newtypes for optimization would be pretty unfortunate in a language that otherwise does so many excellent things with detailed typing. There are some languages where types are just hints (I think Objective-C basically works this way), but that's definitely not Rust's style.
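A small illustration of using a newtype for optimization, with a hypothetical `Idx16` wrapper: the only safe constructor checks that the value is below 16, so a lookup into a 16-element table can skip its bounds check. The unchecked constructor has to be `unsafe`, even though an out-of-range `Idx16` is not, by itself, a memory error; it only becomes one once `lookup` trusts the invariant.

```rust
/// Hypothetical newtype whose invariant is: the wrapped value is < 16.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Idx16(u8);

impl Idx16 {
    /// Checked constructor: the only safe way to build an `Idx16`.
    pub fn new(i: u8) -> Option<Idx16> {
        if i < 16 { Some(Idx16(i)) } else { None }
    }

    /// Unchecked constructor.
    ///
    /// # Safety
    /// `i` must be < 16, because `lookup` relies on that invariant.
    pub unsafe fn new_unchecked(i: u8) -> Idx16 {
        Idx16(i)
    }
}

/// Reads from a 16-element table without a bounds check.
pub fn lookup<T: Copy>(table: &[T; 16], i: Idx16) -> T {
    // SAFETY: every `Idx16` holds a value < 16 (type invariant).
    unsafe { *table.get_unchecked(i.0 as usize) }
}
```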

Manishearth · on May 28, 2016

> But changing a &str to be invalid UTF-8 isn't inherently a memory-unsafe operation.

It kind of is related, though, since there are APIs that do unsafe things based on this assumption, IIRC.

The type has some guarantees, and while those guarantees aren't themselves about memory safety, breaking them can cause memory safety issues.
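Two standard-library APIs (as they exist in current Rust) that show how this plays out: `std::str::from_utf8` validates its input and returns an error for malformed bytes, while `str::as_bytes_mut`, which would let a caller write arbitrary bytes into a `str`, is an `unsafe fn`. The helper names `to_str_checked` and `uppercase_ascii_in_place` are hypothetical.

```rust
use std::str;

/// Hypothetical helper: the checked path from bytes to &str.
fn to_str_checked(bytes: &[u8]) -> Option<&str> {
    // `from_utf8` validates and returns Err on malformed input.
    str::from_utf8(bytes).ok()
}

/// Hypothetical helper: mutate a str's bytes in place.
fn uppercase_ascii_in_place(s: &mut str) {
    // SAFETY: `as_bytes_mut` is `unsafe` because the caller must leave the
    // bytes valid UTF-8. Swapping one ASCII byte for another ASCII byte
    // keeps them valid; non-ASCII bytes (all >= 0x80) are never touched.
    let bytes = unsafe { s.as_bytes_mut() };
    for b in bytes.iter_mut() {
        if b.is_ascii_lowercase() {
            *b -= b'a' - b'A';
        }
    }
}
```

In practice the standard library already offers a safe `str::make_ascii_uppercase` for this; the point of the sketch is only that writing raw bytes into a `str` sits behind `unsafe`.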

GolDDranks · on May 27, 2016

Yes, and I think most people agree. "Unsafe" is not a general marker for "proceed with caution"; it's specifically for memory safety. However, the debate is about what counts as memory safe. That matters, because it affects what the compiler is allowed to optimize and what it isn't.
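One concrete place where a type invariant feeds the optimizer is `std::num::NonZeroU32`, added to the standard library after this thread. Because the compiler is told a `NonZeroU32` is never zero, it can use zero to represent `None`, so `Option<NonZeroU32>` is documented to be the same size as `u32`. That is also why `NonZeroU32::new_unchecked` is `unsafe`. The helper names below are hypothetical.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// Hypothetical helper: (size of u32, Option<u32>, Option<NonZeroU32>).
fn layout_sizes() -> (usize, usize, usize) {
    (
        size_of::<u32>(),
        // Every u32 bit pattern is a valid value, so Option needs extra space.
        size_of::<Option<u32>>(),
        // The "never zero" invariant frees the zero bit pattern to mean None.
        size_of::<Option<NonZeroU32>>(),
    )
}

/// Hypothetical helper: checked construction via `NonZeroU32::new`.
/// A zero smuggled in through `new_unchecked` would break the layout above.
fn nonzero(n: u32) -> Option<NonZeroU32> {
    NonZeroU32::new(n)
}
```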