I happen to think baking Unicode into your concept of a string is fundamentally misguided, so all string operations following from that premise are inherently wrong. The very first example, contrasting encoded byte length with String.length("é") = 1 and calling the latter the "proper length", walks into a shibboleth: it puts Elixir on the side of String.length("ﷺ") = 1, even with the grapheme-cluster concept, for which the only salvation is integrated font rendering.
It's practical and informative, but I can't consider it well-thought-out.
ed: to clarify, ﷺ is an Arabic ligature which represents far more than one (linguistic) character. A more accessible example might be "ﬃ".
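A minimal sketch in iex (assuming a font that actually renders these ligature codepoints):

    # Each ligature is a single codepoint, so both the codepoint count
    # and the grapheme count report 1, even though a reader sees
    # several linguistic characters.
    String.length("ﷺ")                   #=> 1  (U+FDFA, a whole Arabic phrase)
    String.length("ﬃ")                   #=> 1  (U+FB03, the ffi ligature)
    String.codepoints("ﷺ") |> length()   #=> 1
    byte_size("ﷺ")                       #=> 3  (UTF-8 bytes, for contrast)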
I could be wrong, but I think the reason String.length returns one is to have a consistent idea of what happens with monospaced console output. Things in the Elixir standard library exist "when you need them for Elixir itself", and working monospaced console output formatting is needed in a few parts of Elixir. If you only care about bytes, you can use byte_size, as indicated in the docs.
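A minimal sketch of that distinction (iex):

    # String.length/1 counts grapheme clusters; byte_size/1 counts UTF-8 bytes.
    String.length("é")        #=> 1
    byte_size("é")            #=> 2
    String.length("e\u0301")  #=> 1  (e + combining acute: one grapheme, two codepoints)
    byte_size("e\u0301")      #=> 3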
No, codepoint length is totally useless for monospaced console output; see the third example. Grapheme clusters are closer, but still wrong in the presence of wide characters.
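A concrete sketch (column behavior assumed to follow East Asian Width rules; actual rendering depends on the terminal):

    # One grapheme, but most monospaced terminals draw this CJK character
    # two columns wide, so padding by grapheme count misaligns output.
    String.length("字")                   #=> 1
    String.pad_trailing("字", 3) <> "|"   #=> "字  |"  (renders 4 columns wide, not 3)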
I've written a fuzzing library that tests random Unicode inputs, and the width of the output was sensible on three platforms (Linux, Mac, and PowerShell).