> I can't even imagine what sort of convenience methods one could implement on strings that deliberately have no specified representation!

A lot. Go's `string` type, for example, is conventionally UTF-8 encoded, but this is not enforced. Consider how much simpler this code[1] could be if `Vec<u8>` were more convenient to use as a string type.

There's a reason the regex crate provides APIs for both `&str` and `&[u8]`: being able to treat `&[u8]` as if it were a string is occasionally convenient. Importantly, without this API, ripgrep couldn't feasibly exist!

Other examples include file path handling. I need to be able to run globs (via regexes) on them, and the only way I can do that in a way that is zero cost on Unix is to get the bytes from the underlying `&OsStr` (on Windows, I do a lossy decode, which avoids the extra allocation in most places, but still requires the UTF-8 check). In particular, on Unix, this isn't even a matter of performance but rather of correctness, since file paths can contain arbitrary bytes and indeed have no specified representation! (Other than some rules like "no NULs and no /.")
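A minimal sketch of that Unix/Windows split, using only the standard library. The names `os_str_bytes` and `has_suffix` are made up for illustration, a plain suffix check stands in for a real glob, and this is not ripgrep's actual code:

```rust
use std::borrow::Cow;
use std::ffi::OsStr;

/// Hypothetical helper: view an OS string as bytes for matching.
/// On Unix this borrows the underlying bytes: no allocation, no UTF-8 check.
#[cfg(unix)]
fn os_str_bytes(s: &OsStr) -> Cow<'_, [u8]> {
    use std::os::unix::ffi::OsStrExt;
    Cow::Borrowed(s.as_bytes())
}

/// Elsewhere (e.g. Windows), fall back to a lossy decode. This still has to
/// check for valid UTF-8, but only allocates when the check fails.
#[cfg(not(unix))]
fn os_str_bytes(s: &OsStr) -> Cow<'_, [u8]> {
    match s.to_string_lossy() {
        Cow::Borrowed(text) => Cow::Borrowed(text.as_bytes()),
        Cow::Owned(text) => Cow::Owned(text.into_bytes()),
    }
}

/// Hypothetical stand-in for a glob like `*.txt`, matched directly on bytes.
fn has_suffix(path: &OsStr, suffix: &[u8]) -> bool {
    os_str_bytes(path).ends_with(suffix)
}
```

On Unix, a file name that is not valid UTF-8 (say, one built with `OsStr::from_bytes`) goes through `has_suffix` byte for byte, which is the correctness point: nothing is decoded, so nothing can be lost.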

It is often very convenient, in practice, to simply assume UTF-8 or at least an ASCII-compatible encoding, rather than enforcing it as an invariant. For example, the ripgrep config parsing linked above assumes the file contains ASCII-compatible text on Unix, and that's it. The file could contain latin-1 or UTF-8, it doesn't matter, and that is required for correctness. (Because the file itself could contain file paths, which may be arbitrary bytes.)
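To make the "ASCII-compatible and nothing more" assumption concrete, here is a rough sketch of a line-oriented parser over raw bytes. It assumes a format of one argument per line with `#` comment lines; the names `parse_args` and `trim_ascii` are hypothetical, and this is not the code behind [1]:

```rust
/// Hypothetical parser: one argument per line, `#` starts a comment line.
/// The only encoding assumption is that b'\n', b'#' and ASCII whitespace
/// mean what they mean in ASCII; every other byte is passed through as-is.
fn parse_args(contents: &[u8]) -> Vec<Vec<u8>> {
    contents
        .split(|&b| b == b'\n')
        .map(trim_ascii)
        .filter(|line| !line.is_empty() && !line.starts_with(b"#"))
        .map(|line| line.to_vec())
        .collect()
}

/// Strip leading and trailing ASCII whitespace without decoding anything.
fn trim_ascii(mut line: &[u8]) -> &[u8] {
    while let [first, rest @ ..] = line {
        if first.is_ascii_whitespace() {
            line = rest;
        } else {
            break;
        }
    }
    while let [rest @ .., last] = line {
        if last.is_ascii_whitespace() {
            line = rest;
        } else {
            break;
        }
    }
    line
}
```

With a `String`-based parser, a line containing a latin-1 path would either be rejected or mangled by a lossy decode. Here it just comes out the other side as the same bytes.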

(To be clear, I think the UTF-8 invariant for `String`/`&str` was the best choice, certainly. What I'm saying here isn't that one should use `Vec<u8>` for strings in lieu of `String`, but rather that using `Vec<u8>` as a string can be extremely useful in certain circumstances.)

[1] - https://github.com/BurntSushi/ripgrep/blob/7120f3225862f6c71...



