
Nope, tveita is right.

MySQL's storage has some space limits defined in terms of bytes, and to express that in terms of characters it has to allow for the worst case.

If it allowed a 255-byte space to be used as VARCHAR(255) in UTF-8, then inserting a 255-character string full of emoji would fail.
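To make the worst case concrete, here is a rough Python sketch (not MySQL itself; the helper name `utf8_len` and the sample strings are just made up for illustration). It compares the character count and the UTF-8 byte count of two 255-character strings:

```python
# Illustration only: plain Python string encoding, not MySQL's storage engine.
ascii_text = "a" * 255           # 255 characters, 1 byte each in UTF-8
emoji_text = "\U0001F600" * 255  # 255 characters, 4 bytes each in UTF-8

def utf8_len(s: str) -> int:
    """Number of bytes the string occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))

print(len(ascii_text), utf8_len(ascii_text))  # 255 255
print(len(emoji_text), utf8_len(emoji_text))  # 255 1020
```

Both strings are 255 characters long, but only the first one fits in 255 bytes, which is why a limit stated in characters has to budget for the widest character.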




The name VARCHAR is confusing since it's not properly defined what a character is. 1 byte? Something in the Unicode table? If it were named VARBYTE(255), it would be pretty obvious that inserting a 255-emoji string would fail.

https://eev.ee/blog/2015/09/12/dark-corners-of-unicode/
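As a small illustration of how slippery "character" is, here is a Python sketch using only the standard `unicodedata` module (the variable names are mine). The same visible "é" can be one code point or two, with different byte counts:

```python
import unicodedata

composed = unicodedata.normalize("NFC", "\u00e9")    # single code point U+00E9
decomposed = unicodedata.normalize("NFD", "\u00e9")  # 'e' plus combining acute U+0301

# Both render as the same glyph, but the counts differ.
print(len(composed), len(composed.encode("utf-8")))      # 1 2
print(len(decomposed), len(decomposed.encode("utf-8")))  # 2 3
```

So even before emoji come into it, "length" depends on whether you count bytes, code points, or what the user sees on screen.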


That's not a real issue. I can understand this may bite some people that don't read the docs or don't understand encoding, but it's more misinformation than a technical problem.

UTF-8 is variable length anywhere, not just inside MySQL. If you don't want this behavior, you can use either UCS-2 or UTF-32. UTF-16 and UTF-8 are variable-length encodings, period.

Most databases that use char length for Unicode actually work with UCS-2 and use 2 bytes per char, like MS SQL Server.
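A quick way to see the variable vs. fixed width difference is to encode a few sample characters in Python (a sketch; the `widths` helper and the sample list are my own, and Python has no separate UCS-2 codec, so UTF-16 stands in for it here):

```python
# Bytes needed per character in each encoding (little-endian variants, no BOM).
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]  # a, é, €, and an emoji

def widths(ch: str) -> dict:
    """Map encoding name to the number of bytes used for one character."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

for ch in samples:
    print(repr(ch), widths(ch))
```

UTF-8 takes 1, 2, 3, and 4 bytes for these four characters, UTF-16 takes 2, 2, 2, and 4, and UTF-32 always takes 4. The emoji needs a surrogate pair in UTF-16, which is exactly the part that UCS-2 cannot represent.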


Not sure what deserved a downvote here... Is there anything incorrect?


It's not incorrect, but a strawman (I didn't downvote you btw).

The question was why UTF-8 in MySQL has weird limits. It wasn't about what MySQL should have done in an alternative universe.


Yeah, but I didn't point out what MySQL should have done in an alternative universe. MySQL DOES support fixed-length Unicode, just use the correct encoding: ucs2 or utf32. What it does with UTF-8 is what any system that supports UTF-8 must do.



