Hacker News new | past | comments | ask | show | jobs | submit login

It is beyond correct. Mysql's "utf8" only stores stuff from 1-3 bytes.

https://mathiasbynens.be/notes/mysql-utf8mb4

You have to use utf8mb4 to have proper utf8 support. Yes its stupid, this is the main reason I prefer not using mysql for anything. Too many gotchas lurking about.




Exactly, but in any case even mysql version of UTF-8 should store an UUID the same way as ASCII.


In dealing with conversions or characterset handling the UUID pitfall is the hyphen. This is an extremely vulnerable character when dealing with multiple data handling and feed sources (to say nothing of data conversions).

The hyphen has a handful of imposters and one of the more troublesome ones is in extended ASCII as "en dash" (there's another as "em dash"). These do not have a direct mapping to unicode as they are not "hyphens" but they sure do look like them to humans in a hurry. It will throw your meticulously thought out and implemented keys (or any other well-measured column) into a tailspin. Dealing with extended ASCII implementations has been the most hazardous area of characterset handling, at least for me.

Although probably unlikely the UUID is not a guaranteed direct ASCII to UNICODE conversion. I'd advise awareness and perhaps even some level of caution if you have a complex data flow (or, more apt, a seemingly ridiculously simple and bulletproof one).

Extended ASCII sucks so if you use it and don't have one or both feet nailed to the floor then get off of it, sooner.


Store, yes. But indexes don't do variable length (think about the index being a btree of binary values), so it gets expanded. UTF8 (utf8mb4) will thus expand to 4 bytes per character. And MySQL (pre 5.7) will stop you from making such stupid indices.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: