Hacker News
Dammit, MySQL (draconianoverlord.com)
141 points by stephen on July 22, 2010 | hide | past | favorite | 98 comments



The design philosophy differences between Postgres and MySQL should be a case study.

When I first started using SQL databases, Postgres had a reputation for being a fully ACID compliant database that paid a performance penalty for it, and was hard to use. They made it "correct" as a top priority.

MySQL had a reputation for being blazingly fast, even outperforming expensive commercial DB's in some areas. However, things like foreign keys were missing. There were debates in the MySQL community as to whether foreign keys were even necessary, since you could just do all that stuff in the application layer.

That seemed reasonable to me at the time, and MySQL was trivially easy to install at a time when Postgres was a huge pain, so I picked MySQL. I almost immediately got bitten by the lack of foreign keys, and, although I didn't know it at the time, spent most of my time coding application logic that was a workaround for MySQL's lack of views. So I switched to Postgres; it solved many of my problems, and I have tried to use it whenever possible.

Every release the MySQL people get closer to Postgres's features, but there's always something missing, and as of a few years ago at least, all of these features were dependent on what backend table type you were using, which became especially annoying if you didn't have control over the installation (cheap web hosting.)

The funny thing is, MySQL's performance on microbenchmarks really meant nothing in more complex applications. A few years ago I was doing something more than a simple search that involved a few joins and intersections with MySQL; the query optimizer in MySQL seemed determined to pick the slowest possible path to return a result (intersection of a small set of rows with a huge one somehow would cause MySQL to check every row of the large set for a match). It ended up being 50 times faster to implement the logic in PHP and use two queries instead of one.

So I switched to a host with Postgres access, and the identical query that took forever in MySQL was no problem for Postgres.

After a few experiences like that, you begin to see the virtue in doing things right the first time and optimizing later. You also see the virtue in using a database system that takes the onus off of the application developer to implement logic that should be in the DB.


MySQL's lack of views for a long time was just killer. I'm truly surprised MySQL has done as well as it has for as long as it has. I thought its popularity would have tailed off when people realized just how much it was missing (I felt a little bit unwell when I found it didn't have views), but it seems that it just hit critical mass.


It goes to show you how important a few critical features (buzz, easy installation, performance) can be to early uptake, and it snowballs from there.

From my perspective, for a long time now there has been absolutely no reason to choose MySQL over Postgres if you want a DB. If you just wanted a fast data store with replication and without the ACID features, you probably really wanted Berkeley DB over MySQL.

The reason this is an interesting study is because MySQL has been so popular for so long. They did the right things at the right time to become popular.

And the timing WAS perfect: an easy-to-use, easy-to-install database in 1999, before the .com bubble burst, when so many people (like me) were using a database for the first time and didn't know the first thing about RDBMSs.


I have a feeling that the basic "need" for ORMs was originally driven purely by the unwieldiness of querying a MySQL database. When everything is in the application layer, you have to write abstractions on top of it to keep it comprehensible.

When you have triggers and views, on the other hand, you can just "ask the right questions" of the database, and get your information; the SQL becomes self-documenting, rather than a tangled mess.


http://en.wikipedia.org/wiki/Object-relational_impedance_mis...

Whatever their merits, ORM layers and the problems they try to address existed long before, and quite independently of, the particulars and limitations of MySQL.


I've always felt that begs the question, by assuming that a mismatch is somehow fundamental, rather than an implementation artifact.

Just as real RDBMSes don't strictly adhere to the formal math relational model, object-oriented languages vary in their, for lack of a better word, orientation.

Looked at another way, what about a relational-object model instead? Making a useful one for MySQL doesn't seem as plausible, given its chronic feature absence.


Just as the "need" for sharding is driven by MySQL's lack of a sensible way to do big joins.


Yeah, I remember this. Every feature MySQL was lacking, they had "good reasons" that you didn't need it anyway. Even transactions, if you needed to roll back, do it in the application. Atomic update across several rows or several tables, why would anyone need that? Foreign keys? Too slow. Triggers? Do it in the application. Stored procs? Do it in the application. And so on and so on.

If you ignore the hype ("OMG! Slashdot uses it!!") MySQL occupies a dead ground between SQLite and Postgres... There's just no reason for it to exist.


On the contrary, it's great for sabotaging enemies by suggesting they use it.


After a few experiences like that, you begin to see the virtue in doing things right the first time and optimizing later.

You know what they say about premature optimization being the mother of all fuckups...


Another thing that makes Postgres a huge pain compared to MySQL: upgrading. Any major version upgrade requires a full dump and restore of all data.


Yeah, but where Postgres would nag you to dump and restore, YourSQL respects your time and just truncates your data.



The last MySQL limitation to bite me was that time columns are truncated to whole-second granularity - no milliseconds or microseconds.

http://bugs.mysql.com/bug.php?id=8523

If you want sub-second granularity, the advice is to store milliseconds-since-epoch as a BIGINT and have your application convert it to a time:

http://feedblog.org/2007/05/26/why-doesnt-mysql-support-mill...
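The linked workaround can be sketched like this: keep milliseconds-since-epoch in the BIGINT column and do the conversion at the application boundary. Integer arithmetic (rather than a float division) keeps the sub-second part exact; the function names here are illustrative, not from the linked post.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def millis_to_datetime(ms: int) -> datetime:
    # timedelta takes integer milliseconds, so no float rounding occurs.
    return EPOCH + timedelta(milliseconds=ms)

def datetime_to_millis(dt: datetime) -> int:
    # Decompose the timedelta ourselves to stay in exact integer math.
    delta = dt - EPOCH
    return delta.days * 86_400_000 + delta.seconds * 1000 + delta.microseconds // 1000

ts = millis_to_datetime(1279800000123)   # value stored in the BIGINT column
assert ts.microsecond == 123000          # sub-second part survives
assert datetime_to_millis(ts) == 1279800000123
```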


I had the same problem (and, really, 5 years?), but "solved" it at the time with ISO8601 strings. They sort, they're standard, they support microseconds, they're fairly common in web services. Slower than a reasonable native type though, obviously.
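The reason ISO 8601 strings work as a stand-in is that, with fixed-width fields and a single time zone, lexicographic order matches chronological order. A quick illustration:

```python
from datetime import datetime, timezone

# Fixed-width UTC ISO 8601 strings sort lexicographically in
# chronological order, so they can stand in for a native timestamp.
stamps = [
    datetime(2010, 7, 22, 14, 0, 0, 250000, tzinfo=timezone.utc),
    datetime(2010, 7, 22, 14, 0, 0, 125000, tzinfo=timezone.utc),
    datetime(2010, 7, 21, 23, 59, 59, tzinfo=timezone.utc),
]
fmt = "%Y-%m-%dT%H:%M:%S.%fZ"
as_text = [d.strftime(fmt) for d in stamps]

# String sort and datetime sort agree (this breaks with mixed offsets
# or variable-width fields, hence the fixed format and UTC).
assert sorted(as_text) == [d.strftime(fmt) for d in sorted(stamps)]
```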


I only wish I were so lucky as to have been spoiled by Postgres for so long.

My favorite quote from an experienced DBA is "MySQL's biggest feature is that it's retarded." Sadly, as someone who actually likes technology and working with it, I can't possibly see that as a feature.

DDL Doesn’t Respect Transactions

My Oracle DBA friend tells me Oracle doesn't, either, not that this is any kind of excuse.

Maybe with the first-class replication in the 9.0 release, it will happen sooner rather than later.

I thought RDS wasn't so great on MySQL replication, either, or is that only if one wishes to do it off-AWS?


MS SQL Server and Oracle use CREATE SCHEMA and ALTER SCHEMA for DDL transactions. I think it is the ANSI/ISO standard way to do it. But, also "The CREATE SCHEMA command does not support Oracle extensions to the ANSI CREATE TABLE and CREATE VIEW commands (for example, the STORAGE clause)."


I'm certainly no expert on standard SQL, but I suspect that's academic.

Since my knowledge of transactional DDL is PostgreSQL[1], which supports effectively everything, with no onerous locking, I now have the question: what's the difference? Was my DBA friend in error, or merely out of date?

[1] first-hand knowledge, anyway. My 3rd-hand knowledge of MSSQL is that it's more or less Sybase, which does fully support DDL inside a transaction.


Yes, MSSQL does support transactional DDL, via locking rather than versioning. Not sure how PostgreSQL implements it. Anyone? Row versioning would be more concurrency-friendly, and arguably more "right".

I'd be interested if anyone took advantage of versioned transactional DDL to persist arbitrary data structures on the fly.


Postgres does it with versioning, as that's how everything is stored on disk.

As I've seen DDL outside of a transaction take much longer than inside one, it may use locking there.


Can you link to documentation of ALTER SCHEMA for Oracle that does transactional DDL?

I don't think it exists.


You are right. I think ALTER SCHEMA might not be in standard SQL.


Oracle DDL occurs as transaction per command.

    CREATE TABLE a (b NUMBER);     -- one transaction
    INSERT INTO a (b) VALUES (1);
    CREATE TABLE c (d NUMBER);     -- another transaction; commits the insert


Oracle DDL doesn't respect transactions, but at least you can wrap it into an autonomous transaction so it won't commit your main transaction.

That said, it's a great feature. It allows you to deploy your stored procs transactionally, instead of the locking nightmare that Oracle's model may result in.


I see that many of you are discussing PostgreSQL. I have to admit that I'm already using PostgreSQL 9.0beta3 in production, and it's been rock stable so far.


Does Postgres have reliable, tolerably-easy-to-install-and-configure master/slave replication yet? That's the only thing that kept us off postgres last time we were looking, we definitely need a fast, reliable, read-only slave.


I think what you're looking for is the marquee feature of 9.0:

http://developer.postgresql.org/pgdocs/postgres/release-9-0....


Oh, man, sweet. :) Now to get the Ops guys on board...


Any "Ops guys" who don't find something like Slony tolerably easy to install might not be suited to your environment.

I've found the "feature" that MySQL replication is integrated[1] to be a huge problem, especially when faced with messes made by those who installed/implemented it because understanding something more complex (like the realities of database replication) would have been too high a barrier to entry. It also, of course, meant that competing solutions were rare, something that, in the case of MySQL, I do consider a benefit.

[1] I hesitate to use the term "built in," since there's a giant piece missing: initial synchronization. They want me to do what?! A full dump and restore, before I even turn on the replication? That, as well as the "tip" of just copying files over always struck me as amazingly manual.


The way we do initial synchronization with MyISAM:

* Everything is on LVM

* Run FLUSH TABLES WITH READ LOCK

* Take an LVM snapshot then record binary log filename and position

* Run UNLOCK TABLES

* Mount snapshot, rsync to new server and bring up replication with recorded file and position

Very fast and only locks tables for a short time.
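Scripted, the procedure above might look roughly like this (a pseudocode-style sketch in shell syntax; volume group names, sizes, paths, and the replica hostname are all made up). One wrinkle the bullet list glosses over: FLUSH TABLES WITH READ LOCK is released when its session ends, so a single client connection has to hold the lock across the snapshot.

```sh
#!/bin/sh
# Sketch only: names and sizes are illustrative.
# Hold the lock in one background session and record the binlog position.
mysql -N -e "FLUSH TABLES WITH READ LOCK; SHOW MASTER STATUS; SELECT SLEEP(600);" \
    > /tmp/binlog_position.txt &
LOCK_PID=$!
sleep 2                                    # crude wait for the lock to be taken

lvcreate --snapshot --size 5G --name mysql_snap /dev/vg0/mysql

kill "$LOCK_PID"                           # ending the session releases the lock

mount /dev/vg0/mysql_snap /mnt/snap
rsync -a /mnt/snap/ replica:/var/lib/mysql/
umount /mnt/snap && lvremove -f /dev/vg0/mysql_snap
# On the replica: CHANGE MASTER TO ... with the file/position recorded in
# /tmp/binlog_position.txt, then START SLAVE.
```

In practice a purpose-built tool (e.g. mylvmbackup) handles the lock-holding session and error cases more robustly than an ad-hoc script.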


Beware of FLUSH TABLES WITH READ LOCK potentially causing an exclusive lock. The percona guys give an explanation: http://www.mysqlperformanceblog.com/2010/04/24/how-fast-is-f...


Everything is on LVM

An excellent idea, which I, too, have advocated. It is, of course, external and not integrated with MySQL.


With a live snapshot, doesn't MySQL performance take a massive hit?

I have seen benchmark numbers from the MySQL Performance Blog people.


I'm an Ops guy by trade, and I would sacrifice non-redundant organs to have worked for more companies that used Postgres.

Of course, my startup does.


More than once, our DBAs have come to us asking how to handle discrepancies between masters' and slaves' copies of the same rows. That this is even possible says all you need to know about how reliable MySQL's replication design is. The solution to this problem (2PC) is older than I am.


That was the big feature missing when I explored using PostgreSQL. At the time, many many years ago, it was black and white. Went with MySQL, and never had any issues. I was happy to see they are finally getting solid replication in 9.

About time. =)


MySQL: Visual Basic's "On Error Resume Next" for databases.


This didn't mention that check constraints are completely ignored. That one bit me pretty hard the last time I tried to use MySQL.

For whatever reason, app developers kept getting longitude and latitude backwards. I wrote a check constraint that prevented that from being possible (in many cases), so if an app was doing it backwards, the DB would reject it.

I ended up losing data (in dev -- which really meant I lost 45 minutes in having to restore it) in testing this. Then I read the docs. I too frequently expect things to do what I tell them to.


Only the auto-updating timestamp thing has really got in my way in the past. Much as I liked Postgres when I used it, I have such easy access to MySQL and I so rarely encounter any problems that I kinda ignore the toy jibes.

Maybe I'm not pushing it enough but if that's the case, why should I jump through hoops to set up something I won't be making good use of?


It depends how valuable your data is to you. For many/most companies, data is incredibly valuable, and as the company grows, it gets more valuable over time.

If you find the usability of MySQL better than Postgres, and you don't put a lot of value on your data, then choosing MySQL is fine. The author is right to point out that a lot of the complaints are really just "this doesn't have to be this way." You should really expect your database to enforce non-null, especially if it lets you specify it and doesn't warn you it's not going to do a damn thing about it.

As I mentioned in my comment above, you can fix buggy applications, but you can't fix incorrect data.


don't put a lot of value on your data, then choosing MySQL is fine

Funny (and I use the term loosely) you should mention that.

I've just recently witnessed a situation where a BI[1] analyst was brought in around the same time I was and realized that, not only was there likely data missing, but, worse, much of the data was wrong. Said analyst has been spending a remarkable amount of effort merely convincing everyone else that the reporting logic/infrastructure wasn't broken so much as the data.

I'm not convinced MySQL itself is to blame, but it's Awfully Suspicious, considering the use of MyISAM and a kludgy, if not exotic, replication setup.

[1] Business Intelligence, aka reporting. I'm not sure how common the term is. Presumably a small company that hires someone for this role considers the data to be extremely valuable.


> You should really expect your database to enforce non-null

It does. The author of the article was wrong.


You're technically correct (the best kind of correct!) -- it's not inserting NULL. But it is inserting a value it pulled out of its ass, which I posit is actually worse. For text types it uses the empty string, for numbers it uses 0, etc. It seems to flag this as a "warning" but said warning does not appear in my terminal or in the mysqld logs, so I have no idea what it says.


Type "show warnings" to see them.

I really don't understand why people have such a problem with this (and boy oh boy am I losing a lot of karma for this opinion).

It's not like it forces this on you. First, you can turn it off; second, it only happens if you leave out the columns from the insert. Don't leave out the columns and you have nothing to worry about.

I personally find it quite useful. I set column defaults as needed for real data, but when testing I let MySQL put in empty values - it saves having to type each and every column that I don't care about.


When you add "NOT NULL" to a schema, you should be thinking: "make sure the user specifies a value for this column" not "this column should be defaulted to zero." This is insane.

If you want to specify a default, well, specify a default.
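The behavior being asked for here is what SQLite, Postgres, and MySQL under a strict sql_mode all do: omitting a NOT NULL column with no DEFAULT is an error at insert time. A minimal sketch using Python's built-in sqlite3 (table and column names are made up):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE payroll (name TEXT NOT NULL, salary INTEGER NOT NULL)")

# NOT NULL with no DEFAULT: omitting the column is rejected, not zero-filled.
try:
    db.execute("INSERT INTO payroll (name) VALUES ('alice')")  # salary omitted
except sqlite3.IntegrityError as e:
    print(e)  # e.g. "NOT NULL constraint failed: payroll.salary"

# If you want a default, specify one; then the omission is well-defined.
db.execute("ALTER TABLE payroll ADD COLUMN bonus INTEGER NOT NULL DEFAULT 0")
db.execute("INSERT INTO payroll (name, salary) VALUES ('bob', 50000)")
assert db.execute("SELECT bonus FROM payroll").fetchone()[0] == 0
```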


Oh of course, and this was from a purely personal POV. Without trying to sound too flippant, my data can go to hell without me losing too much sleep - even if it never has in 10 or so years using MySQL. I keep decent backups and I know what behaviour to expect so personally, I'm fine with it. It's one of those 'educated' leaps of faith I take every day.

If my data integrity was so very important, I'd pass the buck to someone who knew their beans and they could find & maintain me a safe db. I'm a competent amateur (hence MySQL), not a DBA :)


Is this still true when considering silent corruption, instead of catastrophic failure? The latter is much easier to recover from...


Like I said, I haven't experienced any problems. I guess when I do then maybe I'll curse the day I foolishly told the world I was happy with mysql :)


I wrote about how I ran into his complaint #1 here: http://www.trk7.com/blog/programming/innodb-mysql-implicit-a... . MySQL is quite annoying, but it seems to do the job for most of the projects I have worked on.


The only feature that tipped in MySQL's favor was easy replication. Now that Postgres has it too, there's no more reason to use it.


MySQL is the same thing as PHP: an illusion of simplicity, and as a result a vast and active community and availability in every distro and on all the cheapest hosting, while code quality and design decisions came second.

Hype and community: that is why it was sold for a billion dollars.


Postgres has no unsigned 64 bit integer. :(


No, but you can define your own:

http://bytes.com/topic/postgresql/answers/423780-numeric-typ...

Also note the issues with MySQL and its large integers :) Although they might have fixed things since that post.


Thanks a ton for this.


What's the beef with the timestamp? I would expect, by default, that a timestamp reflects the last change to the row. Most systems use timestamps for optimistic locking/replication purposes.


Um... no. Shouldn't a timestamp reflect a specific point in time? And it shouldn't change unless there is an ON UPDATE condition set. That would be the correct thing to do according to the principle of least surprise. You could then set a column to auto-update, but it shouldn't be the default action.

I'm fairly shocked that I haven't been hit with this "feature". Thankfully, I usually set a DEFAULT on timestamp columns.


In MySQL (and SQL Server, apparently), it reflects the specific point in time when this row was last updated, and neither database will let you have more than one such field in a table. The data type everyone here seems to want is "datetime", which behaves like you want a "timestamp" to behave.


> neither [MySQL nor SQL server] ... will let you have more than one [timestamp] field in a table.

Wait, what?


There can be only one because it exists solely to be the "last updated" field. There's no point in having more than one. In MySQL, it simply only updates the first -- so you get a sometimes magic field.


I would think it useful to have 'created' and 'updated' fields, for instance. And silently updating only the first...


But you can have this: the "created" field is a datetime field, and the "updated" field would be "timestamp". This is far less problematic, in my opinion, than things like requiring a table with exactly one row to do simple selects, or pretending the empty string and null are the same (both of these behaviors are Oracle, by the way, not postgresql).
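A sketch of that workaround in MySQL's dialect (table and column names are illustrative; until MySQL 5.6 only one TIMESTAMP column per table could carry the CURRENT_TIMESTAMP semantics, which is why the "created" column has to be a DATETIME):

```sql
-- "created" is a plain DATETIME the application sets once;
-- "updated" is the auto-maintained TIMESTAMP discussed above.
CREATE TABLE article (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  body    TEXT,
  created DATETIME  NOT NULL,
  updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
                    ON UPDATE CURRENT_TIMESTAMP
);
```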


The problem here is that one word ("timestamp") is used to mean two completely different things:

1) a datatype (date with time)

2) a pseudo-version number for concurrency control

Both are equally as valid, and depending on your background, when you see the word "timestamp" with zero context you may automatically refer in your head to one definition or another.

I don't see what the problem here is if you expect #1 but got #2 when you used MySQL. This seems like the type of problem that bites you the very first time you encounter it, and then you read the documentation and say "Oh I understand, 'timestamp' has a different definition here, what I really need is the 'foo' datatype. Isn't it interesting that one word can have more than one meaning!"


ISO standards define the meaning of "timestamp" in SQL. MySQL actually implements the standard type, but gives it the wrong name! They should have used some other name for the bizarre mishmash of data type and trigger they also invented.


"Both are equally valid", except that the former (a data type) is more equally valid than the latter (pseudo-version), since the SQL Standard defines it that way.


I primarily use the timestamp data type in SQL Server, which is a data type that isn't actually a time (or date) at all, but instead an incrementing database-wide value that is automatically updated on every row update.


I often use timestamps so I know when a record was created. If I want it to reflect the last change to a row, I would set it to have a trigger on a "last_updated" column.

I haven't been hit with this bug either, but this was by accident (or design in the GUI interface I was using). I find it quite shocking.


Every one of his complaints is a practice I've constantly yelled at everyone around me not to use -- for my whole career.

Application logic belongs in the application. Don't rely on a database to do it for you, or you're going to wind up with a giant ball of mud driven by side-effects.


I don't see anywhere where he is calling for application logic. He's making sure that the data has integrity, and the database absolutely should be in charge of that. If your database can't enforce integrity, then you're in a very, very bad place.

Putting sane integrity checks into a database should be a priority. You can fix broken applications, but you can't fix broken data (especially if you collected it from users or sensors).


There was nothing about application logic in the post. Quite the opposite in fact (transactional DDL and deferred constraints are things which have no reason to be handled in the applicative layer). It was only about the internal consistency of the data.


Don't rely on the database to not put null in a column marked NOT NULL? That's like saying don't rely on C to not put a struct in a char*. It's got nothing to do with application logic.


He's wrong about that BTW.

It doesn't put a NULL there; it puts zero for a number field, or a blank (zero-length) string for a string field. Date fields get the zero date (0000-00-00).

Edit: I get a downmod for this? The group think here is amazing. I suppose if I bash MySQL I'll get some upmods even if the bashing is incorrect?


Inserting a default value is even worse than inserting NULL. Now you can't tell what data was wrongly inserted if you tried to clean up the mess later.


Correct. NULL is NULL; if the system is designed so that it has to have a default value, it should at least scream at you: "I can't do NULL, you are going to get a 0 here." Or better yet, just error at design time, not run time, and say: sorry, put a value here, I don't do NULL. Run-time errors are the worst kind of errors, and it is little unexpected behaviors like that which can create huge problems in a run-time environment.


What?

Have you ever programmed SQL? Because what you wrote doesn't make sense.

No data is inserted wrongly - you are simply leaving out a field. There is no mess. Just an unused field.

Are you thinking it's like csv where if you leave out a column all the others are shifted? It's not like that.

The standard says that if you leave out a column that does not have a default, the SQL should return an error. Instead MySQL puts in a default (but only if you tell it to in the configuration). Putting a NULL in a non-NULL field would be much worse.


Oh I agree they are both very bad. Putting NULL in a non-NULL field is totally wrong.

However I really do think that inserting an unexpected default value is worse than inserting NULL into a NON-NULL field. The NULLs will cause problems, but they are problems you can see and resolve.

The default values are silent errors that will corrupt your data and be very difficult to recover from in the future. You can only guess which data was wrongly inserted.


I still don't see how it's possible to insert wrong data, or corrupt it??

There IS no data. How do you corrupt something that doesn't exist?

And NULL doesn't help either. NULL is valid data, NULL is not a replacement for programming errors (which is what this is).

This argument is pointless. People love to bash on MySQL, they look for the silliest things. The more popular something is the more people bash on it.

I understand that, but at least bash on real problems? Like the transactional DDL - that's a real problem. This? This is nonsense. (It's actually a very useful - and optional - feature BTW.)


I'm not sure you understand the use of NULL.

NULL is not "default value" or "I don't care", NULL signifies "this might have a value, I just don't know what it is".

There is a very significant difference between a payroll record which states your pay is "0" vs. NULL. If the database is putting in default values, you have no way of knowing whether the employee really does have a salary that simply was never recorded, or whether the employee is genuinely unpaid.
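The distinction matters beyond bookkeeping: SQL aggregates skip NULLs, so an "unknown" salary stays out of the statistics, while a made-up 0 silently corrupts them. A small demonstration with sqlite3 (the payroll table is illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE payroll (name TEXT, salary INTEGER)")
db.executemany("INSERT INTO payroll VALUES (?, ?)",
               [("alice", 60000), ("bob", 40000), ("carol", None)])

# AVG ignores NULLs: carol's unknown salary is excluded from the average.
avg_with_null = db.execute("SELECT AVG(salary) FROM payroll").fetchone()[0]
assert avg_with_null == 50000

# Replace the unknown with a default of 0 and the statistic quietly goes wrong.
db.execute("UPDATE payroll SET salary = 0 WHERE salary IS NULL")
avg_with_zero = db.execute("SELECT AVG(salary) FROM payroll").fetchone()[0]
assert round(avg_with_zero, 2) == 33333.33
```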


I understand NULL very well, and that's not the only use for a NULL.

NULL also means "value does not exist", not just "value is unknown". For example if a student is not in a class, put NULL in the class id.

NULL is perfectly valid data, and is not a replacement for a programming bug.

And with mysql if your salary field is defined as accepting NULL then you will get a NULL in there.

And to use your example if the field accepts NULL, you would also have no way of knowing if the salary was not negotiated vs a programming bug.

If you want to argue the insert should fail, then fine, no problem. (And MySQL can do that.)

But arguing that putting in NULL is better (in a field that does not accept NULL), is simply wrong. I'll say it again: NULL is not a replacement for a programming bug - NULL is valid data, and should not be used to find programming errors.


> For example if a student is not in a class, put NULL in the class id.

I think you mean "don't insert a row in the student_class table, which is a many-to-many join between student and class".

As a general rule of thumb, if your data schema requires NULLs for things like that, then your schema is wrong, for most of the reasons that people are trying to point out. NULLs are the absence of data, and should really only be used for exceptional circumstances - hence the reason that silently inserting NULLs into NOT NULL fields is a Bad Thing(tm).
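Following the student/class example above, "not enrolled" becomes an absent row in the join table rather than a NULL class id. A sketch with sqlite3 (schema and names are illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE class   (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE student_class (
        student_id INTEGER NOT NULL REFERENCES student(id),
        class_id   INTEGER NOT NULL REFERENCES class(id),
        PRIMARY KEY (student_id, class_id)
    );
    INSERT INTO student VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO class   VALUES (10, 'databases');
    INSERT INTO student_class VALUES (1, 10);  -- only alice is enrolled
""")

# Unenrolled students fall out of a LEFT JOIN; no NULL column required.
unenrolled = db.execute("""
    SELECT s.name FROM student s
    LEFT JOIN student_class sc ON sc.student_id = s.id
    WHERE sc.class_id IS NULL
""").fetchall()
assert unenrolled == [("bob",)]
```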


Pretty sure he's arguing that it should be an error.


With any database there comes a time when a developer or administrator has to make manual changes to the data through an ad-hoc script. It can be straight SQL or a perl/php/ruby thing.

The easier you make it for him not to cause horrible corruption in the data by forgetting to update that additional table that depends on whatever he is updating, the better life will be.

Triggers, foreign keys and constraints are all excellent ways to do just that.


What if you have multiple applications talking to the same database?


I think, in 2010, a "cloud-based web platform with a RESTful API" is 1990's "database." We don't let multiple applications talk to a single database; we stick a web server in front of the database and make them talk to that. If all the web API is doing is allowing CRUD on your data (with validation and some minimal triggered interactions) then, in effect, you're just creating your own, completely custom DBMS by doing this.


Yes, and this has a lot of advantages. One advantage is openness of APIs for mashups etc. But the big advantage driving the NoSQL/custom DB is of course scalability and performance. Incidentally performance was also a key consideration 40 years ago when everyone rolled their own custom data stores. The modern SQL database evolved as a response to all the problems that created, which by now are long forgotten.

My point is not to throw the baby out with the bathwater. If you need massive scale then of course you need fresh approaches, but you are sacrificing a lot by abandoning a centralized DB.

If your data is moderately sized, then you will be trading away a lot of data integrity, queryability and flexibility by giving up an SQL database. Certainly consumer web services of the type that are en vogue in Silicon Valley need new approaches. However, the majority of applications out there are probably still best served primarily by a relational DB, especially when you consider the value of a byte of corporate data vs. the value of a byte of Facebook data.


in effect, you're just creating your own, completely custom DBMS by doing this.

I've argued for years that this is what the "sharding" hype (and to some degree the current NoSQL hype) was mostly about.

I'm not suggesting a custom DBMS can't possibly be the right answer, but it seems silly to layer it on top of something as heavyweight as a relational database.


If you don't use relational databases, you're throwing away decades of work on a problem that's very similar to the one you're trying to solve. If you have a brilliant new scheme that forces you to do that, fine, but throwing away finely tuned systems because they are complex and relatively opaque might not be the best first step.


Exactly. I would advocate not throwing any of it away, absent rare circumstances.

I implied, though foolishly didn't state outright, that I believe a custom DBMS is only very rarely the best answer.


I made the same argument to my database teacher. He pointed out that that isn't going to do you much good when you have more than one application talking to your database, since you would then have to write the validation code in each application.


Better yet, require that all applications call through some common API (such as a library or web service) to access the data.


Why not put that API inside the database (i.e. stored procedures and views)?


You can require this, but its tricky to enforce. Especially on DBAs.


You're creating the wrong impression here by calling MySQL a toy and citing a handful of user preference issues you have with arcane features.

http://www.mysql.com/customers/

MySQL is used by the largest and most successful companies in the world. It is incredibly performant and stable in the most brutal environments.

Who on earth expects transactions to be available when you're creating a schema? "drop table if exists" is a fine way to back-out if your deployment dies.

Deferred key constraints? You should be spanked for using those in the first place.


This is a weak argument. Deferred key constraints and the other intra-transactional feature mentioned are hardly bad practice, since you're relying upon the database to cover your ass. Go figure, that's what a database is supposed to do. You should be free to do anything in a transaction you wish, it is, after all, not committed until you say so.

The appeal to authority doesn't add any weight to the argument.


MySQL is used by the largest and most successful companies in the world. It is incredibly performant and stable in the most brutal environments.

I hate that line. You know what most of these companies have in common? They were started by guys with only a few years in the field and a vision, some of them not the best developers in the world, just guys with a great idea for a product and a passion. The other thing they have in common is that they have spent millions, if not more, rolling their own solutions to compensate for a technology decision they made a long time ago in a garage.

This is the way it works, the new guys come in, they say well company X uses this, they build something on it because X is using it and they hit growing pains at a certain point.

Twitter is experiencing those growing pains now, fortunately they seem to be throwing out the bad and trying new things. A good deal of companies become dogmatic about their choice and given their success just throw money at it to build compensating workarounds.

MySQL was not chosen by those companies because of its merits, it was chosen because those companies started in a garage, and the technical selection was "what are the other guys using in their garage".


Or just because MySQL is really easy to get started with when using Rails and other frameworks, and because when you're starting a business, you think "if I ever actually run into MySQL scaling issues, that's a good problem to have".


Name one database that won't have scaling issues for sure. Scaling DB-wise is not only a good problem, it is inevitable if you constantly grow.


Name one database that won't have scaling issues for sure

You are, in effect, asking someone to prove a negative.

I don't think there's any contention that scaling will always be an issue, but, rather, where along the scale curve debilitating issues occur and how painful the resolutions to those issues are.

For example, with MySQL, without sacrificing ACID properties[1], one cannot even separate a table's indexes onto a spindle separate from its data. For rotating storage, this makes creating a new index astonishingly painful. Even with SSDs, between more sequential/bulk I/O and eliminated contention for the paths to the drive, there can be a performance improvement.

Then there's query optimization. Is there even anything remotely close in MySQL to the EXPLAIN ANALYZE functionality in PostgreSQL?

I posit that scaling issues with MySQL come early and, once they do, require sweeping and expensive changes to address. One can't merely exercise a little cleverness and cheap hardware to tide one over for another factor of 2-4 growth.

[1] i.e. InnoDB, not MyISAM
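For reference, this is the Postgres facility being asked about: EXPLAIN ANALYZE actually executes the query and reports real row counts and timings alongside the planner's estimates, which is how you catch the kind of pathological join choice described earlier in the thread. A minimal, hedged example (table and column names are made up):

```sql
-- Runs the query and prints actual vs. estimated rows and per-node timing.
EXPLAIN ANALYZE
SELECT o.id
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.region = 'EU';
```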


It's an ok environment. There is a lot of tooling around it which is to its advantage.

Once you start doing something more advanced, though, you start seeing its flaws. I've had a pretty miserable time with its query optimizer and sprocs.



