"Throws: UTFException if invalid UTF sequence and useReplacementDchar is set to UseReplacementDchar.yes"
My guess is that this is a mistake and should instead say UseReplacementDchar.no since it makes sense to throw an exception if you can't use U+FFFD here, rather than do both.
Anyway, in my view this is bad the same way the Billion Dollar Mistake is bad, and Rust made the right choice here. Arrays of stuff are great, but they aren't strings. Having to sprinkle "or maybe not" cases all over these libraries, because of course these might not really be strings, results in exception fatigue for your developers, which in turn results in lower quality software and more effort for the conscientious developers who stick it out.
D's strings are less stupid than C's (and thus some of the C++ strings) but they're still just arrays which are maybe but maybe not actually text.
The problem does need solving, but it only needs solving once. D's approach means programmers need to make this decision over, and over, and over again everywhere they have an alleged "string". Or they must track somehow (by convention perhaps?) whether string A is or is not "really" a string.
If you have type safety, you can make the choice just once.
Rust's String::from_{utf8,utf16}_lossy turn valid UTF-8/16 sequences into strings, and "fix" invalid ones with U+FFFD.
Meanwhile String::from_{utf8,utf16} attempt the same but return an Err instead of doing the replacement on failure, if that's what the programmer wants.
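To make the two flavours concrete, here's a minimal sketch using only the standard library. The wrapper function names are mine, purely for illustration; the calls inside them are the real std ones. (Note that from_utf8_lossy hands back a Cow<str>, borrowing when the input was already valid.)

```rust
use std::borrow::Cow;
use std::string::{FromUtf16Error, FromUtf8Error};

// Lossy: every invalid sequence becomes U+FFFD, so this cannot fail.
pub fn utf8_lossy(bytes: &[u8]) -> Cow<str> {
    String::from_utf8_lossy(bytes)
}

// Strict: the caller gets an Err and decides what to do with bad input.
pub fn utf8_strict(bytes: Vec<u8>) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes)
}

// The UTF-16 pair works the same way on u16 code units.
pub fn utf16_lossy(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}

pub fn utf16_strict(units: &[u16]) -> Result<String, FromUtf16Error> {
    String::from_utf16(units)
}
```

Either way, what comes out the other end is a type that is guaranteed to hold valid UTF-8.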
Imagine if all D's numeric functions took the same attitude as its string functions, insisting on being passed arrays of bytes so that each function can parse those bytes, decide whether this is actually a 16-bit unsigned integer (for example), and if so do what's expected, otherwise perhaps return an error. We'd spot right away that this was not a practical design.
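Here's roughly what that thought experiment looks like, sketched in Rust rather than D. Everything here is hypothetical (the function names, the choice of little-endian); it's only meant to show how the "is this really a number?" check leaks into every operation when the type doesn't carry the guarantee.

```rust
use std::convert::TryFrom;

// Hypothetical "untyped" style: every numeric function re-parses its inputs.
pub fn parse_u16(bytes: &[u8]) -> Option<u16> {
    // Fails unless the slice is exactly two bytes long.
    let arr = <[u8; 2]>::try_from(bytes).ok()?;
    Some(u16::from_le_bytes(arr))
}

pub fn add_untyped(a: &[u8], b: &[u8]) -> Option<u16> {
    // Two "or maybe not a number" checks before the real work starts.
    let x = parse_u16(a)?;
    let y = parse_u16(b)?;
    x.checked_add(y)
}

// Typed style: the arguments are already known to be u16, so the only
// failure left is the one that belongs to addition itself (overflow).
pub fn add_typed(a: u16, b: u16) -> Option<u16> {
    a.checked_add(b)
}
```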
D's choices here are conventional, but I've come to expect a lot more and so I'm disappointed when I can't have it.
That's surely the whole point: every D std.string function is also a string decoder, with varying features. But a suitably decoded "string" is still just the same type, whereas Rust has a distinct type for actual UTF-8 strings.
I think the point is that you run the Unicode validation once on your [u8] array, which gives you a &str (or String for the lossy variants). From then on, you know you have valid Unicode and don't need to keep checking.
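A small sketch of that, assuming a made-up pipeline (shout, count_chars and process are hypothetical names; std::str::from_utf8 is the real validation call):

```rust
use std::str::Utf8Error;

// These take &str, so they never have to ask "is this really text?".
pub fn shout(text: &str) -> String {
    text.to_uppercase()
}

pub fn count_chars(text: &str) -> usize {
    text.chars().count()
}

// The one place where raw bytes are checked. Everything past the `?`
// works with a &str that is guaranteed to be valid UTF-8.
pub fn process(raw: &[u8]) -> Result<(String, usize), Utf8Error> {
    let text = std::str::from_utf8(raw)?;
    Ok((shout(text), count_chars(text)))
}
```

The error case exists in exactly one signature; shout and count_chars can't fail on encoding grounds.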
On the other hand, the sad reality is that even when you have a plethora of string types to accommodate reality, like Rust does, people will just not care, out of convenience. See how Rust build scripts communicate paths to cargo via stdout, and how most of them just use Path::display (or something similar or worse) to do that, which is lossy. Rustc itself doesn't handle paths correctly either. IIRC, all in all, it's basically impossible to compile Rust code from a non-UTF-8 path.
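For anyone who hasn't run into the Path::display issue, a minimal sketch. It assumes a Unix target (it uses std::os::unix), and the path and helper names are invented for the example:

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix-only: paths are arbitrary bytes
use std::path::Path;

// A made-up path containing the byte 0xFF, which is never valid UTF-8.
pub fn non_utf8_path() -> &'static Path {
    Path::new(OsStr::from_bytes(b"build/\xFFout"))
}

// What ends up on stdout if you print the path with display().
pub fn as_printed(path: &Path) -> String {
    path.display().to_string()
}

// The exact bytes, which is what you need to find the file again.
pub fn as_exact_bytes(path: &Path) -> &[u8] {
    path.as_os_str().as_bytes()
}
```

The printed form has U+FFFD where the 0xFF byte was, so whoever reads it back gets a different path. Path::to_str returns None for such a path, which is the honest answer, but it's also the less convenient one.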
D's string is not text by itself because it is an array of UTF-8 code units. However, we have this infamous feature called auto-decoding in the standard library that presents strings as Unicode code points.
On the other hand, D's dstrings are more like text because they are not only UTF-32 but also random-accessible code points. (D does not address multiple representations of graphemes at the language level. For example, at the language level, ğ is different from "g and combining breve", but there are std.uni and std.utf modules that help.)
> D's string is not text by itself because it is an array of UTF-8 code units.
Bytes. It's an array of bytes. D's char type isn't actually restricted to UTF-8 code units, char x = '\xFF'; works just fine even though that's not UTF-8.
https://dlang.org/phobos/std_utf.html#.byUTF