
Thanks for the bug report. I filed it for you: https://issues.dlang.org/show_bug.cgi?id=23405

Having string be a magic builtin type does not eliminate the problem of dealing with invalid UTF sequences.

Invalid UTF sequences are inherent to the Unicode design, and programmers are left on their own to deal with them. The options are:

1. ignore them

2. use the replacement char

3. throw an exception (or other error indication)

D enables the programmer to pick whichever they need, on a case-by-case basis.




> Thanks for the bug report. I filed it for you: https://issues.dlang.org/show_bug.cgi?id=23405

#23405 was resolved as fixed a week ago. It isn't fixed. I guess at least I didn't waste my time filing the bug.


The problem does need solving, but it only needs solving once. D's approach means the programmer needs to make this decision over and over again, everywhere they have an alleged "string". Or they must track somehow (by convention, perhaps?) whether string A is or is not "really" a string.

If you have type safety, you can make the choice just once.

Rust's String::from_{utf8,utf16}_lossy turn valid UTF-8/16 sequences into strings, and "fix" invalid ones with U+FFFD.

Meanwhile, String::from_{utf8,utf16} attempt the same, but return an Err instead of substituting a replacement on failure, if that's what the programmer wants.
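To make the difference concrete, here is a minimal sketch of the four standard-library calls side by side. The wrapper function names are hypothetical and exist only for illustration; the `String::from_*` calls themselves are from Rust's standard library.

```rust
use std::borrow::Cow;
use std::string::{FromUtf16Error, FromUtf8Error};

/// Lossy UTF-8: invalid sequences become U+FFFD. Borrows when the input is
/// already valid, which is why the return type is Cow rather than String.
fn decode_utf8_lossy(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}

/// Strict UTF-8: any invalid sequence yields an Err instead of a replacement.
fn decode_utf8_strict(bytes: Vec<u8>) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes)
}

/// Lossy UTF-16: unpaired surrogates become U+FFFD.
fn decode_utf16_lossy(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}

/// Strict UTF-16: an unpaired surrogate yields an Err.
fn decode_utf16_strict(units: &[u16]) -> Result<String, FromUtf16Error> {
    String::from_utf16(units)
}
```

Either way the choice is made at the point of decoding, and the result is an ordinary string type that the rest of the program can use without rechecking.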

Imagine if all D's numeric functions took the same attitude as its string functions, insisting on being passed arrays of bytes so that each function can parse those bytes, decide whether they actually encode a 16-bit unsigned integer (for example), and if so do what's expected, otherwise perhaps return an error. We'd spot right away that this was not a practical design.
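As a rough sketch of that thought experiment (written in Rust, with hypothetical function names and an assumed little-endian encoding), compare a numeric function that re-parses bytes on every call with one that takes an already-parsed `u16`:

```rust
use std::convert::TryFrom;

/// "Bytes everywhere" style: every numeric function must first decide
/// whether its inputs really are 16-bit unsigned integers.
fn add_from_bytes(a: &[u8], b: &[u8]) -> Option<u16> {
    let a = u16::from_le_bytes(<[u8; 2]>::try_from(a).ok()?);
    let b = u16::from_le_bytes(<[u8; 2]>::try_from(b).ok()?);
    a.checked_add(b)
}

/// Typed style: parse once at the boundary...
fn parse_u16(bytes: &[u8]) -> Option<u16> {
    let arr = <[u8; 2]>::try_from(bytes).ok()?;
    Some(u16::from_le_bytes(arr))
}

/// ...and downstream functions only ever see a real u16.
/// The only remaining failure mode is arithmetic overflow.
fn add(a: u16, b: u16) -> Option<u16> {
    a.checked_add(b)
}
```

In the first style the "is this even a number?" failure can surface in every function; in the second it can only surface in `parse_u16`.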

D's choices here are conventional, but I've come to expect a lot more and so I'm disappointed when I can't have it.


I don't see the difference here. D offers the same options when processing a string.


That's surely the whole point: every D std.string function is also a string decoder, with varying features. But a suitably decoded "string" is still just the same type, whereas Rust has a distinct type for actual UTF-8 strings.


I think the point is that you run the Unicode validation once on your [u8] array, which gives you a &str (or an owned String, or a Cow<str> from the lossy UTF-8 variant). From then on, you know you have valid Unicode and don't need to keep checking.
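A minimal sketch of that pattern, assuming the bytes arrive from some untrusted source. The function names are hypothetical; `std::str::from_utf8` is the standard-library validator.

```rust
use std::str::Utf8Error;

/// Boundary: the only place that inspects raw bytes.
/// On success the returned &str borrows the same bytes, now known to be valid UTF-8.
fn validate(bytes: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(bytes)
}

/// Downstream code takes &str, so it has no invalid-sequence case to handle.
fn count_chars(text: &str) -> usize {
    text.chars().count()
}

/// Another downstream function: again no validation and no error path.
fn shout(text: &str) -> String {
    text.to_uppercase()
}
```

The decision about invalid input (reject, replace, or something else) is made where `validate` is called, and the type of `text` carries that decision to every later function.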



