> a way to not allow byte slice functionality on a thing that is clearly not a byte slice
This already exists in the form of structs or opaque types. Both of these approaches would end up being implemented in "userspace" anyways, whether that's standard library or third-party.
However, (UTF-8) strings are byte slices. You can do simple manipulation with them as byte slices safely and validly. Split on spaces? Sure. Tokenize? Sure. Find substring? Sure. You can't do things that depend on, say, grapheme clusters, but you can safely do most things that depend on bytes. For most purposes, treating strings as byte slices is safe and correct.
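To make that concrete, a minimal Rust sketch (str in Rust is already a UTF-8 byte slice; find_bytes is just an illustrative helper):

```rust
/// Naive byte-level substring search. Works on valid UTF-8 because
/// UTF-8 is self-synchronizing: a byte-level match is a text-level match,
/// as long as both sides use the same byte representation.
fn find_bytes(haystack: &str, needle: &str) -> Option<usize> {
    let (h, n) = (haystack.as_bytes(), needle.as_bytes());
    if n.is_empty() {
        return Some(0);
    }
    h.windows(n.len()).position(|w| w == n)
}

fn main() {
    let text = "héllo wörld";
    // Splitting on an ASCII space never cuts a multi-byte character in half.
    let words: Vec<&str> = text.split(' ').collect();
    assert_eq!(words, vec!["héllo", "wörld"]);
    // Byte-level find agrees with the standard library's str::find.
    assert_eq!(find_bytes(text, "wörld"), text.find("wörld"));
}
```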
Doing find-substring via find-byte-subsequence won't behave correctly in many cases, because semantically equivalent strings can have multiple different byte-sequence representations. Treating strings as byte slices exposes a lot of footguns; it shouldn't be easy, just as, e.g., treating floating-point numbers as byte sequences shouldn't be easy.
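A quick sketch of the kind of mismatch I mean (the two literals are just the common precomposed and decomposed encodings of "é"):

```rust
fn main() {
    // Both strings display as "café", but their byte sequences differ:
    let precomposed = "caf\u{e9}";  // U+00E9
    let decomposed = "cafe\u{301}"; // 'e' followed by U+0301 combining acute accent
    assert_ne!(precomposed.as_bytes(), decomposed.as_bytes());
    // A byte-subsequence search therefore misses a match a human would expect:
    assert!(decomposed.find(precomposed).is_none());
}
```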
Technically the shortest UTF-8 representation is _the_ representation and _correctly_normalized_ Unicode is uniquely represented, but fair enough: unknown input may be slightly malformed. Complexities like this are why one shouldn't underestimate the nuances (and runtime costs!) of implementing proper Unicode. As for representing text as byte sequences, that is the most basic way to represent strings of text without placing any assumptions on them. It's the assumption of potentially incorrect invariants that's the issue. If you have the faculties to handle Unicode correctly (and very few languages do), then something more opaque may be a better fit than a byte slice.
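On the "shortest representation" point, a quick Rust sketch: validation rejects overlong encodings outright, which is part of what makes the canonical byte form well defined, and part of the cost of accepting unknown input:

```rust
fn main() {
    // '/' encoded the short (valid) way, and as an overlong two-byte sequence.
    let valid = [0x2F_u8];
    let overlong = [0xC0_u8, 0xAF];
    assert!(std::str::from_utf8(&valid).is_ok());
    // Overlong encodings are malformed UTF-8 and must be rejected;
    // this is exactly the kind of check unknown input has to pay for.
    assert!(std::str::from_utf8(&overlong).is_err());
}
```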
> Technically the shortest UTF-8 representation is _the_ representation and _correctly_normalized_ Unicode is uniquely represented
Not necessarily the shortest (NFC means not using composed characters from later revisions of the standard), and you only get a normalised representation if you've actually normalised it - if you've just accepted and maybe validated some UTF-8 from outside then it probably won't be in normalized form. IMO it's worth having separate types for unicode strings and normalized unicode strings, and maybe the latter should expose more of the codepoint sequence representation, but I don't know if any language implements that.
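Roughly what I have in mind, as a Rust sketch (the type names are made up, and normalization itself leans on the third-party unicode-normalization crate):

```rust
use unicode_normalization::UnicodeNormalization;

/// Any valid UTF-8, as accepted from the outside world.
struct UnicodeString(String);

/// Invariant: contents are in NFC. Normalization happens only here,
/// so downstream comparisons can be plain byte comparisons.
struct NormalizedString(String);

impl UnicodeString {
    fn normalize(self) -> NormalizedString {
        NormalizedString(self.0.as_str().nfc().collect())
    }
}

impl NormalizedString {
    /// Safer to expose: for two NFC strings, byte equality
    /// corresponds to canonical equivalence.
    fn as_bytes(&self) -> &[u8] {
        self.0.as_bytes()
    }
}

fn main() {
    let raw = UnicodeString(String::from("cafe\u{301}")); // 'e' + combining accent
    let norm = raw.normalize();
    assert_eq!(norm.as_bytes(), "caf\u{e9}".as_bytes()); // precomposed U+00E9
}
```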
> it shouldn't be easy just as e.g. treating floating-point numbers as byte sequences shouldn't be easy.
That's a nice analogy.
> Doing find-substring via find-byte-subsequence won't behave correctly in many cases, because semantically equivalent strings can have multiple different byte-sequence representations.
Unfortunately that's nearly impossible to do sanely in the general case, no matter how the string is represented.
I'm curious: what would be a good reason why treating floating-point numbers as byte sequences should be any harder than the minimum needed to make it obvious that's what you're doing (provided their binary format is well defined)?
There are footguns in making that representation easy to access. E.g., if you hash the byte sequence to use floats as hash-table keys, it will almost work, but you'll hit a very subtle bug because 0 and -0 hash differently even though they compare equal. And frankly, for most of the things you'd do with the byte sequence, there are more semantically correct ways to do them. There should be a way to access that representation, but it shouldn't be something you'd stumble into doing accidentally, IMO.
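The 0 vs -0 case as a quick Rust sketch (to_bits/to_ne_bytes expose the raw representation):

```rust
fn main() {
    let (a, b) = (0.0_f64, -0.0_f64);
    // The two values compare equal under IEEE 754...
    assert!(a == b);
    // ...but their bit/byte representations differ, so hashing the raw
    // representation puts "equal" keys into different buckets.
    assert_ne!(a.to_bits(), b.to_bits());
    assert_ne!(a.to_ne_bytes(), b.to_ne_bytes());
}
```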
You are talking about what stringy things can be done with byte slices and I'm talking about all the byteslicy things that shouldn't be done with strings.
Like subslicing. And accessing individual bytes in it.
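For example, a quick sketch of the subslicing footgun:

```rust
fn main() {
    let s = "héllo";
    // Byte index 2 lands in the middle of the two-byte 'é'.
    let chopped = &s.as_bytes()[..2];
    // The result is not valid UTF-8 on its own, so it isn't a string any more.
    assert!(std::str::from_utf8(chopped).is_err());
    // Indexing byte 1 gives 0xC3, the lead byte of 'é', not a character.
    assert_ne!(s.as_bytes()[1], b'e');
}
```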