It is beyond correct. MySQL's "utf8" only stores characters that take 1 to 3 bytes. https://m...
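A quick way to see the 3-byte ceiling is to measure the UTF-8 width of each character. This is a minimal sketch; `max_utf8_bytes` and `fits_mysql_utf8` are hypothetical helper names, and the check only models the "at most 3 bytes per character" rule described above.

```python
def max_utf8_bytes(text: str) -> int:
    """Return the widest UTF-8 encoding of any single character in text."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def fits_mysql_utf8(text: str) -> bool:
    """True if every character needs at most 3 bytes (MySQL's legacy "utf8")."""
    return max_utf8_bytes(text) <= 3


print(max_utf8_bytes("abc"))              # 1 (plain ASCII)
print(max_utf8_bytes("\u00e9"))           # 2 (e with acute accent)
print(max_utf8_bytes("\u20ac"))           # 3 (euro sign)
print(max_utf8_bytes("\U0001F600"))       # 4 (emoji, outside the BMP)
print(fits_mysql_utf8("\U0001F600"))      # False: needs utf8mb4
```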

nvivo · on June 9, 2017

Exactly, but in any case even MySQL's version of UTF-8 should store a UUID the same way as ASCII.
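The point is easy to check: a canonical UUID string only contains hex digits and hyphens, all below U+0080, so its UTF-8 bytes are identical to its ASCII bytes. A small sketch using Python's standard `uuid` module:

```python
import uuid

u = str(uuid.uuid4())           # canonical 8-4-4-4-12 lowercase hex form
ascii_bytes = u.encode("ascii")
utf8_bytes = u.encode("utf-8")

# Every character is 0-9, a-f or '-', so UTF-8 encodes each as one byte.
print(ascii_bytes == utf8_bytes)  # True
print(len(utf8_bytes))            # 36
```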

seorphates · on June 9, 2017

When dealing with conversions or character set handling, the UUID pitfall is the hyphen. It is an extremely vulnerable character when you deal with multiple data-handling and feed sources (to say nothing of data conversions).

The hyphen has a handful of impostors, and one of the more troublesome ones in extended ASCII is the "en dash" (there's also the "em dash"). When converted to Unicode, these do not map to the ASCII hyphen-minus (U+002D), because they are not hyphens at all, but they sure look like them to humans in a hurry. One of them will throw your meticulously designed and implemented keys (or any other well-measured column) into a tailspin. Dealing with extended ASCII implementations has been the most hazardous area of character set handling, at least for me.
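To make the impostor problem concrete, here is a sketch assuming the "extended ASCII" source is Windows-1252 (cp1252), where bytes 0x96 and 0x97 are the en dash and em dash. The UUID value and the `find_dash_impostors` helper are hypothetical, for illustration only.

```python
# In Windows-1252, 0x96 and 0x97 decode to the en dash and em dash.
en_dash = b"\x96".decode("cp1252")
em_dash = b"\x97".decode("cp1252")
print(hex(ord(en_dash)), hex(ord(em_dash)))  # 0x2013 0x2014

# Hypothetical UUID whose first hyphen was swapped for an en dash upstream.
genuine = "123e4567-e89b-12d3-a456-426614174000"
mangled = genuine.replace("-", en_dash, 1)

print(mangled == genuine)            # False, though they look alike
print(len(mangled.encode("utf-8")))  # 38: the en dash takes 3 bytes in UTF-8

# Common hyphen look-alikes (not an exhaustive list).
DASH_IMPOSTORS = "\u2010\u2011\u2012\u2013\u2014\u2015\u2212"


def find_dash_impostors(value: str) -> list[tuple[int, str]]:
    """Return (index, code point) for every hyphen look-alike in value."""
    return [(i, hex(ord(c))) for i, c in enumerate(value) if c in DASH_IMPOSTORS]


print(find_dash_impostors(mangled))  # [(8, '0x2013')]
```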

Although it's probably unlikely, a UUID is not guaranteed to survive an ASCII-to-Unicode conversion intact. I'd advise awareness, and perhaps even some caution, if you have a complex data flow (or, more aptly, a seemingly ridiculously simple and bulletproof one).
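One form that caution can take is a strict check at the point where data enters your system. This is a minimal sketch, assuming you want only the canonical 8-4-4-4-12 layout with ASCII hex digits and the ASCII hyphen-minus; `is_canonical_uuid` is a hypothetical name.

```python
import re

# Canonical layout: ASCII hex digits separated by ASCII hyphen-minus (U+002D).
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def is_canonical_uuid(value: str) -> bool:
    """Reject anything but the exact canonical form (fullmatch: no stray newline)."""
    return UUID_RE.fullmatch(value) is not None


print(is_canonical_uuid("123e4567-e89b-12d3-a456-426614174000"))       # True
print(is_canonical_uuid("123e4567\u2013e89b-12d3-a456-426614174000"))  # False (en dash)
```

Python's `uuid.UUID(...)` also rejects the en-dash variant with a `ValueError`, but it is more permissive in other ways (it accepts braces, a `urn:uuid:` prefix, or no hyphens at all), which is why the sketch uses an explicit pattern.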

Extended ASCII sucks, so if you use it and don't have one or both feet nailed to the floor, get off it sooner rather than later.

sroussey · on June 10, 2017

Store, yes. But indexes don't handle variable length (think of the index as a B-tree of fixed-width binary values), so the key gets expanded to its maximum width. UTF-8 (utf8mb4) will thus expand to 4 bytes per character. And MySQL before 5.7 will stop you from creating such stupid indices.
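The arithmetic behind this can be sketched as column length in characters times the charset's maximum bytes per character. The 767-byte figure below is an assumption: it is the InnoDB key-prefix limit of older MySQL releases (COMPACT/REDUNDANT row formats, large prefixes off), and `index_key_bytes` is a hypothetical helper, not a MySQL API.

```python
# Maximum bytes per character that MySQL reserves when sizing an index key.
MAX_BYTES_PER_CHAR = {"ascii": 1, "latin1": 1, "utf8": 3, "utf8mb4": 4}

# Assumed legacy InnoDB key-prefix limit (older MySQL defaults).
LEGACY_KEY_LIMIT = 767


def index_key_bytes(char_length: int, charset: str) -> int:
    """Worst-case key width: every character assumed to be maximally wide."""
    return char_length * MAX_BYTES_PER_CHAR[charset]


columns = [
    ("VARCHAR(255)", 255, "utf8"),
    ("VARCHAR(255)", 255, "utf8mb4"),
    ("CHAR(36)", 36, "ascii"),
    ("CHAR(36)", 36, "utf8mb4"),
]
for name, length, charset in columns:
    size = index_key_bytes(length, charset)
    print(f"{name} {charset}: {size} bytes, fits={size <= LEGACY_KEY_LIMIT}")
# VARCHAR(255) utf8: 765 bytes, fits=True
# VARCHAR(255) utf8mb4: 1020 bytes, fits=False
# CHAR(36) ascii: 36 bytes, fits=True
# CHAR(36) utf8mb4: 144 bytes, fits=True
```

Note that a UUID in a utf8mb4 `CHAR(36)` still fits, but its key is four times wider than the same column declared as ASCII.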