Hacker News new | past | comments | ask | show | jobs | submit login

> The whole point of UTF-8 is that it's identical to Latin-1 except in cases where it can't be.

It's only compatible with ASCII, not Latin-1. You are maybe confusing UTF-8 with the Unicode codepoint space, of which the first 256 codepoints are the same as Latin-1.




Just in case there's any confusion:

- 7-bit ASCII encodes 128 characters; latin-1 encodes 256 characters; UTF-8 encodes all unicode characters

- latin-1 and UTF-8 both extend 7-bit ASCII; the three encodings are equivalent for ordinals 0-127

- UUID characters are part of 7-bit ASCII, so they have the same encoding in ASCII, latin-1 and UTF-8


100% correct - UUID strings are ASCII < 128 .. but there are imposters for character code 45 ("-"). Assumption based approaches to conversions or field handling should not be assumed to be safe ones just because UUID is "technically" ASCII. Only the keeper of the field can truly enforce that constraint.


Aha, thanks for calling me out. I was indeed thinking of ASCII.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: