Hacker News new | past | comments | ask | show | jobs | submit login

I think this is correct for CHAR and incorrect for VARCHAR.

A CHAR column always allocates the maximum size, so a utf8 (MySQL's weird "UTF-8, but only for the BMP" encoding) CHAR(100) needs to reserve 300 bytes per row, and a utf8mb4 (MySQL's actual UTF-8 encoding) CHAR(100) needs to reserve 400 bytes per row.

But a VARCHAR is dynamically allocated, so, yes, a VARCHAR(100) that stores 100 lower-ASCII characters is only going to use 102 bytes of storage (using 2 bytes for the string length).

See, e.g., the "tip" section at the end of https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8...




Author mentions compound keys, so issue was probably with the index key prefix length, which is by default limited at 767 bytes. That limits you to a maximum of having just first 255 characters of string data indexed with utf8, but in latin1 (one byte per character) you can have 767 characters long prefix. So with latin1 it's perfectly ok to have compound index on 2 varchar(255) columns, but you can't do that with utf8. Converting from latin1 to utf8 will break all those indices.


The 767 limit is with the default row format. In 5.7 the row format changes, and allows 3000+


With innodb it is variable storage regardless of varchar or char when UTF8.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: