Hacker News new | past | comments | ask | show | jobs | submit login

I think you misunderstood my point. Yes, relational databases have been around a long time. But in the early days of MySQL, most of the companies using MySQL differed quite a bit from the companies typically using DB2, Sybase, Oracle, etc. I've been using MySQL professionally since 2003 and am speaking from first-hand experience here: the web development world was largely separate from the traditional enterprise world at that time. It was considered a different use case, it's not like today where the biggest companies in the world are tech companies built on open source software.

Regarding "silent", MySQL at least emits a warning when this happens. Still not great / obviously wrong in a modern context, but just for clarity's sake it wasn't completely silent.

As for "defective type enforcement", that's a subjective opinion. In C, atoi("123hello456") returns 123 -- once you hit an invalid character, everything from that point is truncated. Is C atoi defective? Perhaps, but it's an opinion.

With strict mode disabled, four-byte characters worked the same way in MySQL utf8mb3: four-byte characters are invalid in this data type, and everything gets truncated at the first invalid character. Would simply stripping the invalid characters, but keeping valid chars after them, necessarily be more "correct"? No, it's an implementation decision, especially considering that emoji did not exist at the time and barely any non-BMP (4-byte) utf8 characters were used anywhere.




It is certainly true that most users of relational databases before MySQL used them for applications where throwing away data is a potential catastrophe, and most uses of MySQL were for toy applications where if that happened no big deal. It was about a decade before MySQL even supported transactions.

atoi is deficient by modern standards, but at least it clearly documents what the behavior is and always has. It doesn't truncate anything. For a database, truncating data is unforgivable. That is what errors are for, so your database doesn't act as a black hole where your data may silently disappear forever.

And yes, if you can't return an error for some reason, you should preserve as much data as possible. Throwing away hundreds or thousands of characters after an initial anomaly is insane. But a database should never do that anyway, it should either store what you give it, or return an error. Same with a filesystem or any other trustworthy means of data storage. Sorry we were too lazy to tell you we threw your data away isn't acceptable, nor is relying on anyone to check for warnings that should be hard errors.


I would argue yes, that atoi is defective. Stripping characters is equally defective.

A QA walks into a bar and orders "123hello456" beers. The correct answer is not for the barkeep to give them 123 beers, its to ask "I beg your pardon?" So the truncation in general is the problem. If the operation fails - too many bytes, wrong encoding, string too long - the operation should fail. It's not the db's job, that is the business logic's job.


All competent software tokenizes a input string before calling atoi on an extracted piece of it. Otherwise you can't even tell the difference between "0" and an empty string.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: