There are other reasons to choose NoSQL. For example, when Craigslist was using MySQL and they had to change their schema, it took MONTHS to propagate the change across all their slaves.
You can also have a mixed strategy of using both an RDBMS and NoSQL, getting consistency where you need it while staying flexible in the face of architecture changes.
Lastly -- have you looked at total overall cost? Setting up a large cluster with MySQL has a large operational cost, and it may not be partition tolerant: if the wrong servers go down, the failure can cascade to your whole data store.
In a NoSQL system, "doing data migrations up front" often just means you've changed the fields around in your document types and called it done. Now 95% of your documents are wrong, missing those changes. Maybe you have linkages between different types of document (akin to a foreign key) -- now a lot of those might be pointing to nothing.
In this model, it's the application's job to anticipate and work around these inconsistencies. Assertions that check for data integrity, if you have them, have to be modified to work around this. For a lot of the web applications we talk about these days, who cares - it's people's lists of friends and TODO notes. It's simple data and some dangling records aren't going to hurt anyone.
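To make that concrete, here's a minimal sketch (plain Python dicts standing in for a document store) of the kind of defensive reads an application ends up doing when old documents were never migrated; the field names and fallbacks are made up for illustration:

```python
def friend_names(user_doc, users_by_id):
    """Return display names for a user's friends, tolerating stale documents."""
    names = []
    # Old documents may predate the 'friend_ids' field entirely.
    for friend_id in user_doc.get("friend_ids", []):
        friend = users_by_id.get(friend_id)
        if friend is None:
            # Dangling reference: the "foreign key" points at nothing.
            continue
        # Newer documents use 'display_name'; older ones only had 'name'.
        names.append(friend.get("display_name") or friend.get("name", "unknown"))
    return names
```

Every read path accumulates checks like these, which is exactly the "work around these inconsistencies" cost.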
In the SQL world, we instead write migration scripts that migrate the structure and data all at once. This is a little more work up front, but as long as you stick to a mostly normalized form and use good tools (which we now have) this is not a big deal ("MONTHS" to migrate across slaves sounds like some very wrong decisions were made earlier on). The application can move straight to supporting the new structure and entirely forget that the old one existed. In this world, we can also have really complex relationships between fields, like when we're storing accounting or medical data linked to temporal records of changes. The application can consume this structure without worrying about consistency.
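For the sake of illustration, here's a rough sketch of what such a migration script might look like in Python with psycopg2 -- the table, columns, and connection string are invented, and this is just one way to do the structure-plus-data change in a single step:

```python
import psycopg2

# Hypothetical migration: add a 'display_name' column, backfill it from
# existing data, then enforce NOT NULL once every row has a value.
MIGRATION = """
ALTER TABLE users ADD COLUMN display_name text;
UPDATE users SET display_name = first_name || ' ' || last_name;
ALTER TABLE users ALTER COLUMN display_name SET NOT NULL;
"""

def run_migration(dsn="dbname=app"):
    conn = psycopg2.connect(dsn)
    try:
        with conn:                      # one transaction: all or nothing
            with conn.cursor() as cur:
                cur.execute(MIGRATION)
    finally:
        conn.close()

if __name__ == "__main__":
    run_migration()
```

Because the whole thing runs in one transaction (PostgreSQL DDL is transactional), readers see either the old schema or the fully populated new column, never a half-migrated state.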
I think the problem is that when you change a table structure in MySQL, it takes quite a bit of time to do the actual writes. Adding a column tends to force a rewrite of the whole table. Braindead, I know...
I think the problem is that MySQL isn't really a standard SQL-world db. It has some of the advantages of one, but not all of them and some very annoying gotchas.
> The application can move straight to supporting the new structure and entirely forget that the old one existed.
Not always. Changing the schema can break an application, in particular when the database supports multiple applications.
> It's simple data and some dangling records aren't going to hurt anyone.
It depends on the application. In healthcare, people literally die because of dangling, inconsistent records.
The problem with anticipating and working around these inconsistencies is that the workaround is added ad hoc in code, rather than through a model that defines the data change. You need a model: http://chronicdb.com/preparing_schema_changes
PostgreSQL can do the majority of migrations you would need to make instantaneously (adding nullable columns, adding/dropping indexes, etc.). MySQL suffers roughly O(rows) time when changing the schema, but with PostgreSQL now at feature parity on almost everything else, there's little reason to go NoSQL over something like Postgres if you are worried about schema changes. With respect to fault tolerance, this is only a problem if your dataset outgrows a single master (pretty rare) and you haven't figured out a proper way to shard (most places shard on users; it's pretty hard to believe anything is un-shardable).
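For anyone keeping score, a hedged illustration of which changes are cheap versus expensive on the PostgreSQL versions of this era (table and column names invented):

```python
# Cheap or non-blocking changes: catalog-only edits, or index builds that
# avoid holding a long write lock.
CHEAP_OR_NON_BLOCKING = [
    # Nullable column, no default: a catalog-only change, effectively instant.
    "ALTER TABLE users ADD COLUMN nickname text",
    # Dropping an index is quick; building one concurrently still scans the
    # table but does not block writes for the duration.
    "DROP INDEX users_email_idx",
    "CREATE INDEX CONCURRENTLY users_email_idx ON users (email)",
]

# Changes that touch every row and therefore take O(rows) time.
FULL_TABLE_REWRITE = [
    # Adding a column with a NOT NULL default rewrote every row on the
    # PostgreSQL versions of this era.
    "ALTER TABLE users ADD COLUMN plan text NOT NULL DEFAULT 'free'",
    # Changing a column's type generally rewrites the table as well.
    "ALTER TABLE users ALTER COLUMN id TYPE bigint",
]
```

The first group touches only the system catalogs; the second touches every row, which is where the O(rows) cost comes from.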
I don't want to presume that I'm smarter than Craigslist's engineers, but why is this so hard? Schema changes should be captured in migrations, which are programs, and could be coordinated to run in parallel on thousands of machines in a moment with a shell script over SSH.
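In that spirit, a back-of-the-envelope sketch of the "coordinate it over SSH" idea in Python -- the hostnames, database name, and migration path are all placeholders, and a real replication setup adds wrinkles this ignores (replication lag, taking slaves out of rotation, and so on):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Placeholder fleet and migration command.
HOSTS = ["db-slave-%03d" % i for i in range(1, 201)]
REMOTE_CMD = "mysql app_db < /opt/migrations/2011_add_display_name.sql"

def migrate(host):
    # ssh propagates the remote command's exit status; record it per host.
    result = subprocess.run(["ssh", host, REMOTE_CMD],
                            capture_output=True, text=True)
    return host, result.returncode

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=32) as pool:
        for host, status in pool.map(migrate, HOSTS):
            print(f"{host}: {'ok' if status == 0 else 'FAILED'}")
```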
The fact that ALTER TABLE locks on most databases is not the primary reason changing a schema is hard. The primary reason is that a schema change breaks the application [1].
Migrations are programs, but your existing live application is also a program. Changing the schema breaks your live application to prepare for the new version of the application.
This is true for NoSQL too. If you change the format of your documents, your applications might need to be modified to support both the old and the new format at the same time.
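A small sketch of what that dual-format support tends to look like: readers normalize both shapes, writers always emit the new one, so documents get upgraded lazily as they are touched. The nested 'address' structure here is a hypothetical example, not anything from the article:

```python
def normalize(doc):
    """Return a document in the new shape, upgrading old ones on the fly."""
    if "address" in doc:                      # already the new format
        return doc
    # The old format stored the address pieces at the top level.
    upgraded = {k: v for k, v in doc.items()
                if k not in ("street", "city", "zip")}
    upgraded["address"] = {
        "street": doc.get("street"),
        "city": doc.get("city"),
        "zip": doc.get("zip"),
    }
    return upgraded

def save(collection, doc):
    """Persist a document (dict standing in for the store), always in the new format."""
    collection[doc["_id"]] = normalize(doc)
```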
MySQL had some historical issues with schema changes -- adding a column to a table with a few million records could take hours. While there is a lot of work to do -- every page of the table gets restructured -- in competing products such a change takes a tiny fraction of the time (limited only by I/O performance).
Most changes would still block, yes, but at least in PostgreSQL many changes will only block for a short time (since they do not require a full table rewrite, only an update to the table description), and in my experience rewriting a table is generally quick for a decent-sized database (tables with a handful of millions of rows).
Flexibility of schema definition and flexibility of schema change are two different things. Defining schemas only involves data. But changing schemas involves not just data, but code too.
> There are other reasons to choose NoSQL. For example, when Craigslist was using MySQL and they had to change their schema, it took MONTHS to propagate the change across all their slaves.
That isn't a reason to choose NoSQL. That's a reason to:
- doubt MySQL -- many of the purported downsides of SQL solutions (for instance the ridiculous "avoid joins" meme, which has zero bearing on any good database product) are actually MySQL problems. Or rather, they were -- the product has made some pretty incredible strides.
- understand and embrace normalization. Most of the "we keep trying to change our schema and SQL is just so restrictive" stories could often be described as "the problems with denormalization" (a rough sketch of the pattern follows below).
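To illustrate what that tends to mean in practice, a toy example (table and column names invented): the denormalized shape bakes a fixed number of phone fields into the row, so "we need a fourth phone number" becomes a schema change, while the normalized shape absorbs it as plain data.

```python
# Denormalized: the repeating group lives in the row, so growing it means DDL.
DENORMALIZED = """
CREATE TABLE customers (
    id      serial PRIMARY KEY,
    name    text NOT NULL,
    phone1  text,
    phone2  text,
    phone3  text      -- adding phone4 later means ALTER TABLE everywhere
);
"""

# Normalized: the repeating group is its own table, so no schema change needed.
NORMALIZED = """
CREATE TABLE customers (
    id    serial PRIMARY KEY,
    name  text NOT NULL
);

CREATE TABLE customer_phones (
    customer_id  integer NOT NULL REFERENCES customers (id),
    phone        text NOT NULL    -- any number of phones, no schema change
);
"""
```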