A varchar named something like "DateAsInscribed", with a related date field that you can search on, plus some well-defined policy about what happens when gravestones have invalid dates inscribed on them.
The problem is, you probably don't work out that you need that until you've got a million rows stored in a date column, and when you discover it, you then start asking yourself "I wonder how many of our dates have been auto-magically 'corrected' from accurate-but-invalid inscribed dates into valid-but-not-as-inscribed ones."
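Something like this, as a rough sketch of that two-column design (table and column names are made up for illustration):

    -- Keep the inscription verbatim; keep a separate searchable date for the
    -- rows whose inscriptions map cleanly onto a real calendar date.
    CREATE TABLE gravestones (
        id                 INTEGER PRIMARY KEY,
        date_as_inscribed  VARCHAR(100) NOT NULL,  -- e.g. '30-02-1890'
        interpreted_date   DATE NULL               -- NULL when no sensible mapping exists
    );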
I guess it uses a mixed-radix number with radixes 10-10-10-10-12-31 or 10000-12-31 (or, maybe, 10000-13-32 to allow for zero months and days) if the config flag ALLOW_INVALID_DATES (http://dev.mysql.com/doc/refman/5.5/en/server-sql-mode.html#...) is set.
I still fail to see why anybody would want that or even the default 'if you cannot figure it out, use 0000-00-00' mode, though. That flag makes a broken system more broken, and if someone wants more flexibility in storing dates, he could always use char(8) or so.
In the context of this article: if you use your database as a dumb store and put all logic in your application, why would you let MySQL decide for you that, e.g., 2000-12-34 becomes 0000-00-00 and not, for instance, 2001-01-03?
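For what it's worth, what actually happens depends on sql_mode (and on the server version; this is roughly the 5.x behaviour, and table t is just for illustration):

    CREATE TABLE t (d DATE);

    -- Old permissive default: the bad value is "corrected" to the zero date,
    -- with nothing but a warning to show for it.
    SET SESSION sql_mode = '';
    INSERT INTO t (d) VALUES ('2000-12-34');   -- stored as '0000-00-00'

    -- Strict mode rejects it outright instead:
    SET SESSION sql_mode = 'STRICT_ALL_TABLES';
    INSERT INTO t (d) VALUES ('2000-12-34');   -- ERROR 1292: Incorrect date value

    -- ALLOW_INVALID_DATES only relaxes the day-of-month check to 1-31,
    -- so '2004-02-30' would be stored as-is but '2000-12-34' still would not.
    SET SESSION sql_mode = 'ALLOW_INVALID_DATES';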
>if you use your database as a dumb store and put all logic in your application, why would you let MySQL decide for you that, e.g., 2000-12-34 becomes 0000-00-00 and not, for instance, 2001-01-03?
I'm not letting MySQL decide for me intentionally. My application should be checking my dates; if I ever get as far as attempting to store 2000-12-34 in the database, it's because I made a mistake in my code.
So when live customer data discovers some untested path, what do you want to happen? In my experience in real applications, silently storing "corrupt" data (which I can fix by hand as soon as I discover the bug) is better than throwing an error back to the end user, and those are pretty much the only options.
> silently storing "corrupt" data (which I can fix by hand as soon as I discover the bug) is better than throwing an error back to the end user,
I think this statement might be a good test if you want to predict which camp a given developer will fall into.
I frequently store enough data that fixing anything by hand is a large task, and my experience with these types of errors is that the silently corrupted data (no need for quotes, that's what corrupted means) is sometimes corrupted in a lossy way, so you can't fix it by hand or in any other way.
Even if you can fix it by hand and it's not lossy, I still find the fail-fast philosophy is right most of the time: I want an error logged so I get notified and can fix it, even if that means an end user sees an error (there was one, after all).
I might be biased, having had the experience of exactly this type of MySQL error destroying months of data that was the result of very expensive marketing, because no one noticed until they tried to analyze it. MySQL was silent and our testing had missed it (if it had thrown an error, our testing would have found it easily).
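(For anyone who runs into the same thing: assuming it was the usual silent zero-date behaviour, the standard fix is to turn on strict mode so the insert fails loudly; the table below is hypothetical, the sql_mode flags are the real ones.)

    -- Make MySQL reject invalid and zeroed dates instead of storing them silently.
    -- (Can also be set in my.cnf so it survives restarts.)
    SET GLOBAL sql_mode = 'STRICT_ALL_TABLES,NO_ZERO_DATE,NO_ZERO_IN_DATE';

    INSERT INTO campaign_signups (signup_date) VALUES ('2013-02-30');
    -- ERROR 1292: Incorrect date value: '2013-02-30' for column 'signup_date'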
CHAR(10) worst case, or probably something a lot more like rowname.year INT, rowname.month INT, etc. Yes, you could do your own homemade date type like that in Postgres, and your query would look like "SELECT " and then you'd write your own date-handling routines in the DBMS, but it would be icky. Compare the execution time of "Select from blah order by somedate limit 10" on each design, especially if the DB and webserver are on separate boxes.
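Roughly the comparison I mean (names made up; actual timings depend on the indexes, but the point is what the query and its index have to look like in each design):

    -- Design A: a native date column; an ordinary index on it sorts correctly.
    SELECT id, somedate
    FROM blah
    ORDER BY somedate DESC
    LIMIT 10;

    -- Design B: homemade year/month/day INT columns; every query (and any
    -- composite index) has to spell the ordering out by hand.
    SELECT id, yr, mo, dy
    FROM blah
    ORDER BY yr DESC, mo DESC, dy DESC
    LIMIT 10;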
It comes down to the fundamental question of who defines bad data: the DEV in his model or the DBA in his table design. Worst case is both with no coordination; second worst case is both with coordination (wasted effort).
I suspect, given a large enough sample of gravestones, you'd find inscriptions like "Christmas Day 1832" or "The last day of Winter 1906". I suspect the argument for keeping the "30-02-1890" data intact would apply equally to my made-up examples. I'd design this with a "date as inscribed" varchar column and an "interpreted date for search/sorting purposes" date column.
I'd do varchar with a table method and a check constraint. Not hard, not a lot of effort. Still allows for conversion.
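Something along these lines, reading "table method" as a helper function the constraint calls (Postgres syntax; the validation logic is only a placeholder for whatever policy you settle on):

    -- Hypothetical helper: reject inscriptions we consider unusable.
    -- Placeholder rule: anything non-empty is accepted.
    CREATE FUNCTION is_acceptable_inscription(txt varchar) RETURNS boolean AS $$
        SELECT btrim(txt) <> ''
    $$ LANGUAGE sql IMMUTABLE;

    CREATE TABLE gravestones (
        date_as_inscribed  varchar(100) NOT NULL
            CHECK (is_acceptable_inscription(date_as_inscribed)),
        interpreted_date   date NULL   -- filled in later, by hand or by a converter
    );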
A more interesting question is what happens when you have to store local calendar values which are non-Gregorian, like '1712-02-30', a date that actually existed in Sweden (the extra leap day added when the country aborted its gradual conversion to the Gregorian calendar and reverted to the Julian). PostgreSQL treats all dates as Gregorian, so Julian dates and weird pseudo-Julian dates like that one have to be handled by conversion.
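Concretely, PostgreSQL just refuses that value, so the local-calendar date has to live in a text column next to a converted Gregorian one (column names here are illustrative):

    -- PostgreSQL validates dates against the (proleptic) Gregorian calendar:
    SELECT DATE '1712-02-30';
    -- ERROR:  date/time field value out of range: "1712-02-30"

    -- So the local Swedish-calendar value goes in as text, with the Gregorian
    -- equivalent computed by the application (March 11th, if I've got the
    -- conversion right) stored alongside it for searching and sorting:
    CREATE TABLE events (
        local_date_text  varchar(10) NOT NULL,   -- '1712-02-30'
        gregorian_date   date NULL
    );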
This is good and consistent. If you are recording dates and you need to know what date they represented, you need a consistent calendar. If you want to convert Gregorian to Julian, that can be done, but you'd have to code that no matter what DB you are working with.
Otherwise you run into weird issues, like determining the length of an interval that spans two calendars when you may not even know where the boundary falls, because calendars changed at different times in different countries.
Wasn't it you who said "I don't want to have to store as a CHAR or VARCHAR and have to write my own date handling routines in my app"? rowname.year and rowname.month sound a lot like writing your own date handling routines.
And I am confused as to why you wouldn't use varchar or char to record, you know, inscribed writings. I mean if it says 1890-03-300 I assume you'd want the extra zero recorded, right?