
It made more sense at the time. Earlier days of the web, dynamic scripting languages used everywhere, etc. No emoji, barely any use of 4-byte characters at all. Large corps weren't using open source DBs, or sometimes open source anything. Simpler times!

And for the sake of comparison -- if I understand correctly, prior to just 2 months ago, SQLite didn't even have the option for strict type enforcement [1]. Do you judge SQLite as harshly, or think its creator is out of touch with the database world?

[1] https://www.sqlite.org/datatype3.html




Relational databases have been used in production since the late 1970s. You would be hard pressed to find any that silently corrupted user data like that, ever, until MySQL came along.

SQLite is more of an embedded database, but if it silently truncated character data under some conditions, that would make it equally unusable. Better to have no type enforcement than defective type enforcement.


I think you misunderstood my point. Yes, relational databases have been around a long time. But in the early days of MySQL, most of the companies using MySQL differed quite a bit from the companies typically using DB2, Sybase, Oracle, etc. I've been using MySQL professionally since 2003 and am speaking from first-hand experience here: the web development world was largely separate from the traditional enterprise world at that time. It was considered a different use case; it's not like today, where the biggest companies in the world are tech companies built on open source software.

Regarding "silent", MySQL at least emits a warning when this happens. Still not great / obviously wrong in a modern context, but just for clarity's sake it wasn't completely silent.

As for "defective type enforcement", that's a subjective opinion. In C, atoi("123hello456") returns 123 -- once you hit an invalid character, everything from that point is truncated. Is C atoi defective? Perhaps, but it's an opinion.

With strict mode disabled, four-byte characters worked the same way in MySQL's utf8mb3: they are invalid in that character set, and the value gets truncated at the first invalid character. Would simply stripping the invalid characters, but keeping valid characters after them, necessarily be more "correct"? No, it's an implementation decision, especially considering that emoji did not exist at the time and barely any non-BMP (4-byte) utf8 characters were used anywhere.
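
To make that concrete, here's a rough sketch of the old behavior, assuming a server of that era with the strict modes cleared from sql_mode (the table and column names are made up for illustration):

    SET SESSION sql_mode = '';   -- non-strict: bad input becomes a warning, not an error

    -- on servers of that era, "utf8" meant the 3-byte utf8mb3 encoding
    CREATE TABLE comments (
      body VARCHAR(255) CHARACTER SET utf8
    );

    -- U+1F600 needs 4 bytes, so it is invalid in utf8mb3
    INSERT INTO comments (body) VALUES ('before 😀 after');

    SHOW WARNINGS;              -- Warning 1366: Incorrect string value: '\xF0\x9F\x98\x80...'
    SELECT body FROM comments;  -- returns 'before ', everything from the emoji onward is gone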


It is certainly true that most users of relational databases before MySQL used them for applications where throwing away data is a potential catastrophe, and most uses of MySQL were for toy applications where, if that happened, it was no big deal. It was about a decade before MySQL even supported transactions.

atoi is deficient by modern standards, but at least it clearly documents what the behavior is and always has. It doesn't truncate anything. For a database, truncating data is unforgivable. That is what errors are for, so your database doesn't act as a black hole where your data may silently disappear forever.

And yes, if you can't return an error for some reason, you should preserve as much data as possible. Throwing away hundreds or thousands of characters after an initial anomaly is insane. But a database should never do that anyway: it should either store what you give it, or return an error. Same with a filesystem or any other trustworthy means of data storage. "Sorry, we were too lazy to tell you we threw your data away" isn't acceptable, nor is relying on anyone to check for warnings that should be hard errors.
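
For what it's worth, that hard-error behavior is what MySQL's strict mode gives you, and strict mode has been part of the default sql_mode since 5.7. A minimal sketch, reusing the hypothetical comments table from the example upthread:

    SET SESSION sql_mode = 'STRICT_ALL_TABLES';

    -- same 4-byte character, same 3-byte column as before
    INSERT INTO comments (body) VALUES ('before 😀 after');
    -- ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x98\x80...' for column 'body'
    -- nothing is stored; the statement fails instead of silently truncating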


I would argue yes, that atoi is defective. Stripping characters is equally defective.

A QA engineer walks into a bar and orders "123hello456" beers. The correct response is not for the barkeep to pour them 123 beers, it's to ask "I beg your pardon?" So the truncation in general is the problem. If the input is invalid - too many bytes, wrong encoding, string too long - the operation should fail. Cleaning up the input isn't the db's job, that is the business logic's job.


All competent software tokenizes an input string before calling atoi on an extracted piece of it. Otherwise you can't even tell the difference between "0" and an empty string.


Thanks for the info, I've always wondered when SQLite would get type enforcement. Let's hope more awesome data types get added over time.

The world is moving towards strict typing, and since this is what my brain is wired for, I'm very happy.
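
For anyone curious what the new opt-in looks like, here's a minimal sketch (assumes SQLite 3.37 or later; the table and values are made up):

    -- ordinary SQLite tables coerce or simply store whatever you hand them;
    -- adding STRICT makes the declared column types actually enforced
    CREATE TABLE events (id INTEGER, payload TEXT) STRICT;

    INSERT INTO events VALUES (1, 'ok');           -- fine
    INSERT INTO events VALUES ('abc', 'rejected');
    -- Runtime error: cannot store TEXT value in INTEGER column events.id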



