How big do you think an index on a 255 character string is (at least in MySQL)? And if you're not putting limits on your strings, do you think it's likely you're putting limits on your indexes?
Warning: fixed-size and limited-size are not the same thing. VARCHAR(12) is a variable-length string which cannot go beyond 12 characters; CHAR(12) is a fixed-size string which will be padded to 12 characters.
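As a sketch of the difference (str_demo is a made-up table name):

```sql
CREATE TABLE str_demo (
    fixed   CHAR(12),     -- fixed-size: blank-padded to 12 characters
    limited VARCHAR(12)   -- limited-size: stored as-is, at most 12 characters
);

INSERT INTO str_demo VALUES ('abc', 'abc');
-- "fixed" holds 'abc' padded with spaces to 12 characters;
-- "limited" holds just the 3 characters 'abc'.
```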
I'm pretty sure Rails uses VARCHAR (or its equivalent in other DBs) exclusively, but unless a size is specified it will use VARCHAR(255). VARCHAR(16) will enforce a maximum size (through truncation) and may (depending on the database[0]) lead to optimizations by storing the string differently and/or result in smaller row (and table) sizes.
As for TFA, his example using phone numbers is not very good: while there are standard sizes for phone numbers (E.164 allows for 15 digits including a country code of 1-3 digits, giving national number maximums of 12 to 14 digits depending on the country), people usually fuck them up hard (by assuming the sizes and formats of their home countries), and once you add local formatting conventions it's an even bigger mess (depending on the country, you can have up to half a dozen format characters interspersed between digit groups).
[0] It does not in Postgres: as your link notes, Postgres dynamically decides how to store a string (in-row or out-of-row, for instance) on the spot based on its size. On the other hand, maximum string size does affect Postgres indexes: a b-tree index (the default) will blow up on strings beyond 4k chars, and this is not checked statically; Postgres only errors out when you actually try to insert such a string.
Of course, if you're trying to index a string column of such sizes with anything but full-text indexes you're probably doing something wrong anyway, but if you're using an unbounded VARCHAR (which maxes out at 1GB due to implementation details) you may find yourself in that situation even though you only meant to store 4-letter codes.
> VARCHAR(16) will enforce a maximum size (through truncation)
Thankfully, real databases won't do that. They will blow up and leave the application to do the truncation.
If there is one feature I want a database to have, it's that I'm guaranteed to read the data back that I stored there.
That also means that if I store "foobar" in a varchar(3) field, I don't want to read back "foo" with no way of ever finding out that I never actually stored "foo" there to begin with.
Storing a 6-character string into a field dimensioned for three characters is an error, and it should be reported as one, not papered over with silent data corruption.
If you read the PostgreSQL docs you'll see that it only truncates trailing spaces and throws an error if any non-space characters would be cut off. Apparently, this behavior is required by the SQL specification.
So, in your example of storing "foobar" in a varchar(3) field, PostgreSQL will throw an error and not store the data. You could, however, store "foo " in that same field without any problems, and you would get "foo" back.
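Both cases can be sketched against a Postgres session (varchar_demo is a made-up table name):

```sql
CREATE TABLE varchar_demo (name VARCHAR(3));

-- Excess non-space characters: Postgres rejects the row.
INSERT INTO varchar_demo VALUES ('foobar');
-- ERROR:  value too long for type character varying(3)

-- Excess trailing spaces only: silently trimmed, per the SQL spec.
INSERT INTO varchar_demo VALUES ('foo   ');
SELECT name FROM varchar_demo;   -- returns 'foo'
```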
MySQL will silently truncate it without throwing an error however. And even when you know about it, you'll still get bitten occasionally and SERIOUSLY kick yourself when you realise the error.
Postgres is better for the little things, but when it goes wrong it tends to be harder to find the reason why. At least in my experience. That said, I'll still reach for Postgres over MySQL these days.
> MySQL will silently truncate it without throwing an error however.
It will indeed, for folks that prefer that or haven't looked into enabling alternate behavior. Set STRICT_TRANS_TABLES (for InnoDB) or STRICT_ALL_TABLES (for any storage engine) in MySQL's config and it'll throw the following error instead of silently truncating data:
[Err] 1406 - Data too long for column '[some column]' at row [XYZ]
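For instance, a minimal sketch (the mode can also be set globally in my.cnf under sql_mode; strict_demo is a made-up table name):

```sql
-- Enable strict mode for the current MySQL session:
SET SESSION sql_mode = 'STRICT_ALL_TABLES';

-- Now an over-long insert errors out instead of truncating:
CREATE TABLE strict_demo (name VARCHAR(3));
INSERT INTO strict_demo VALUES ('foobar');
-- ERROR 1406: Data too long for column 'name' at row 1
```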
I know that Postgres does this the right way. I was referring to the initial comment suggesting that truncation happens; of all the databases I know, only MySQL does that, and it's totally wrong IMHO.
>On the other hand maximum string size has an effect on Postgres indexes: b-tree (the default) will blow up on strings beyond 4k chars, and this is not statically checked by postgres, it will blow up while trying to insert a string of more than 4k chars.
What do you mean by "blow up" here?
create table foos (col varchar(100000));
create index on foos(col);
insert into foos values (repeat('s', 100000));
Indeed, there is no need to use "varchar" with Postgres. One should use the modern "text" data type (which is like varchar, but practically unlimited).
Unlike MySQL's text datatype, it has no technical limitations; it can be used for fulltext indexing, it can be used with any table, and it is stored in the actual row (not externally).
varchar is an alias for "character varying"; without a length modifier it is exactly the same as text.
I use both varchar and text: varchar for things that should be one line (e.g. a username), text for things that might span multiple lines. Formtastic will show an HTML input field for varchar columns and a textarea for text columns.
(Of course, you can insert paragraphs into the varchar columns, but it's a form of documentation.)