> Technically the shortest UTF-8 representation is _the_ representation and _correctly_normalized_ Unicode is uniquely represented
Not necessarily the shortest (NFC means not using composed characters from later revisions of the standard), and you only get a normalised representation if you've actually normalised it - if you've just accepted and maybe validated some UTF-8 from outside then it probably won't be in normalized form. IMO it's worth having separate types for unicode strings and normalized unicode strings, and maybe the latter should expose more of the codepoint sequence representation, but I don't know if any language implements that.
Not necessarily the shortest (NFC means not using composed characters from later revisions of the standard), and you only get a normalised representation if you've actually normalised it - if you've just accepted and maybe validated some UTF-8 from outside then it probably won't be in normalized form. IMO it's worth having separate types for unicode strings and normalized unicode strings, and maybe the latter should expose more of the codepoint sequence representation, but I don't know if any language implements that.