
There is a lot in this article that shows the author has very little RDBMS knowledge. For instance:

"What is slow are the arbitrarily complex index interactions that come with the relational world. "

Indexes are only arbitrary if they are arbitrarily created. That has no bearing on the relational world, any more than a bad driver makes all cars bad.

"Given the example above I think SQLite could be a great implementation leveraging the power of stored procedures to implement atomic datasets."

First, SQLite doesn't support SPs (you can write extensions in C, however), and second, I've got over 10 years of RDBMS experience and I've never heard of an "atomic dataset". Has anyone else? I struggle to understand how a set can be atomic.

"Using BerkleyDB or Innostore could remove the slowness and pain of interacting with the data through SQL."

This kind of comment needs to be addressed. There is NO pain in using SQL; the perceived pain comes from a lack of understanding and skill on the part of the user. Don't call something painful simply because you haven't taken the time to understand it.




From the author:

First off, I apologize for my thoughts being all over the place at 3 AM ;)

I've written several transactional systems using Oracle and DB2 in the financial industry in a former life. The trappings of SQL are directly proportional to its power. As demonstrated in mysql-sr-lib, however, they can be packaged neatly to great effect. Also, I was talking about transactional operations on datasets.

Lastly, on the slowness of SQL: I just meant that query plans have a cost and are not free. There have been several experiments lately showing the amazing performance you can get by working with InnoDB directly instead of going through MySQL, for example. Basho is just one of the players that has done work in this space. There is also a hook that can be embedded directly in MySQL to bypass the standard workflow and interact with the datastore directly.

My main point in all this is that it will take some time to reach a 1.0 version of a viable, modern persistence solution in Redis. Time would be better spent now at least evaluating other open source persistence engines before diving head first into a very shallow pool. Most of the current persistence tools have been battle-tested in a number of environments; antirez should leverage that, considering the small size of his team of just two people.


Thanks for the article. It's always good to read the open opinions of others and freely debate them .. sort of like the scientific community :D

I agree with your point about query plans; they do cost a hell of a lot. This is one of the reasons we use caching solutions such as Redis and Memcached on top of a well-planned-out relational database that we can poke and prod with SQL.

What's the hook for MySQL you are talking about?


Bret Taylor of FriendFeed/Facebook fame implemented a scalable, schema-free system on top of MySQL where he could arbitrarily attach thin index tables to JSON collections. When migrations happened, the collections would be scanned and the index tables rebuilt. Digg has done similar work, but no open source project ever saw the light of day. This schema-free system powered most of the site (one true datastore).
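
For anyone curious, here's a rough sketch of the idea, loosely based on what Bret Taylor described publicly (table and column names are illustrative, not FriendFeed's exact schema): each record is an opaque blob, and the "thin" index tables hold nothing but the indexed attribute and the entity id, so they can be dropped and rebuilt by re-scanning the entities.

  -- Entities are stored as opaque blobs (e.g. JSON); indexes live in
  -- separate, rebuildable tables.
  CREATE TABLE entities (
      added_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
      id       BINARY(16) NOT NULL,
      updated  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
      body     MEDIUMBLOB,
      UNIQUE KEY (id)
  ) ENGINE=InnoDB;

  -- A "thin" index table: only the indexed attribute plus the entity id.
  -- A migration is just: drop this table, re-scan entities, rebuild it.
  CREATE TABLE index_user_id (
      user_id   BINARY(16) NOT NULL,
      entity_id BINARY(16) NOT NULL UNIQUE,
      PRIMARY KEY (user_id, entity_id)
  ) ENGINE=InnoDB;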

Cracking open Redis and then looking at mysql-sr-lib brought that FriendFeed idea back full circle. Basically, I think mysql-sr-lib 2.0 could be a 1:1 implementation of Redis commands as MySQL stored procedures. The only adaptation I would make is that the functions would be namespaced (a table per namespace) so that collections could be made. It would be up to the client to decide how to take advantage of namespacing: use it for functional collections or for sharding, for example? Clients would also have to use a distribution strategy to map keys to a namespace/collection. The idea is a lot clearer in my mind now thanks to discovering mysql-sr-lib.
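
To make that concrete, here is a minimal sketch of what a namespaced GET/SET pair might look like as MySQL stored procedures (the table and procedure names are hypothetical, not from mysql-sr-lib):

  -- One key/value table per namespace; the client picks the namespace.
  CREATE TABLE kv_default (
      rkey   VARBINARY(255) NOT NULL PRIMARY KEY,
      rvalue MEDIUMBLOB
  ) ENGINE=InnoDB;

  DELIMITER //
  CREATE PROCEDURE redis_set_default(IN k VARBINARY(255), IN v MEDIUMBLOB)
  BEGIN
      INSERT INTO kv_default (rkey, rvalue) VALUES (k, v)
      ON DUPLICATE KEY UPDATE rvalue = v;   -- SET semantics: overwrite
  END //

  CREATE PROCEDURE redis_get_default(IN k VARBINARY(255))
  BEGIN
      SELECT rvalue FROM kv_default WHERE rkey = k;
  END //
  DELIMITER ;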


[MySQL] SPs are a pain to implement, encapsulate, and maintain. Looking at other RDBMSs, SPs don't appear to be very portable and don't seem to offer speed advantages unless you only use simple queries.

To me it seems better to write SQL and make use of the DB drivers. Why would you consider SPs?


As a fan of SPs:

They offer a clean and easy (IMHO) way of making sure I've got the exact same code for common jobs encapsulated and tracked within the database. One routine to add a user (for example), common to all client interfaces, handles all the exceptions in exactly the same way and tracks all the security in exactly the same way. I've had my share of problems where you get an odd bug because the app has subtly different SQL for the same job in different places, which just doesn't happen with stored procs.
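
As a rough illustration (the table, columns, and validation rule are made up), the "one routine to add a user" idea looks something like this, so every client goes through the same checks and raises the same error:

  DELIMITER //
  CREATE PROCEDURE add_user(IN p_email VARCHAR(255), IN p_name VARCHAR(100))
  BEGIN
      -- Every interface hits the same validation and the same error.
      IF EXISTS (SELECT 1 FROM users WHERE email = p_email) THEN
          SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'duplicate email';
      END IF;
      INSERT INTO users (email, display_name, created_at)
      VALUES (p_email, p_name, NOW());
  END //
  DELIMITER ;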

In theory you can get this benefit from any common code repository and indeed, where SPs haven't been available, I have written code with a common SQL cache held outside the database. SPs have the advantage, though, of being incredibly easy to update - no need for a build and deploy; you can correct the code on the live server almost immediately, without disturbing it, if you need to - and of doing so in a way that is then fully trackable by the database itself. I can easily query the information schema to find all procedures that reference a certain object and haven't been updated in the last week, for example.
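
That last query might look roughly like this against MySQL's information_schema (the object name here is just a placeholder):

  SELECT routine_schema, routine_name, last_altered
  FROM information_schema.routines
  WHERE routine_type = 'PROCEDURE'
    AND routine_definition LIKE '%orders%'      -- references the object
    AND last_altered < NOW() - INTERVAL 7 DAY;  -- untouched for a week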

They've also got security benefits. It's common practice in some areas to lock the tables down completely and restrict access purely to SPs by permission, which can offer major benefits because you've got far greater control over which user accounts can do what to your database.
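
The locked-down setup is mostly a matter of grants; something along these lines (account and procedure names are hypothetical):

  -- The application account can call procedures but cannot touch tables directly.
  GRANT EXECUTE ON PROCEDURE app_db.add_user TO 'web_app'@'%';
  GRANT EXECUTE ON PROCEDURE app_db.get_user TO 'web_app'@'%';
  -- No SELECT/INSERT/UPDATE/DELETE on app_db.* is ever granted to 'web_app'.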

They're not perfect, and I agree they're not portable (though in practice I've needed that very, very rarely), but I find the benefits hugely outweigh the downsides, and I will pull a face if told not to use them because "it's simpler just to treat the database like a slightly funny version of Excel" (or words to that effect, which I have actually been told, and don't get me started...!)


One application is to allow very fast patching. If we need to patch an issue and our query is in code that will take N minutes to update, perhaps compile, and push out to every web server, it can take a tiny fraction of that time if all you have to do is update the SP on the DB server(s).

I can do an alter proc with literally seconds of overhead (beyond the obviously required time to actually change the SQL). If it doesn't work, or makes things worse, I'm seconds away from reverting or taking a second bite at the apple. If your SQL is in the app on your web servers, most places running at any sort of scale can't match that.

(as a result of being heavily [99.8%+] SP based, we've had to develop some compensating technology to allow us to do releases without downtime when each release needs its own specific SPs. Solved, but it took work. We're also predominantly MS-SQL with limited MySQL, but I doubt that changes much.)
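
For what it's worth, in MS-SQL the whole "patch" is a single statement along the lines of the sketch below (the procedure and query are invented for illustration); reverting is just running the previous version's ALTER again.

  ALTER PROCEDURE dbo.get_order_summary
      @order_id INT
  AS
  BEGIN
      SET NOCOUNT ON;
      -- the patched query goes here; the change is live as soon as this runs
      SELECT order_id, status, total
      FROM dbo.orders
      WHERE order_id = @order_id;
  END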


If it's not a proprietary secret, can you elaborate on the compensating technology you guys developed?

I prefer encapsulating database logic in SPs as well, and am always interested in learning how others have solved some of the problems incurred with them.


I've written about it briefly before on HN, and it's very beneficial but not proprietary tech. Basically, we create a "shim" database containing our app's sprocs, views, and functions, plus views onto the transactional tables, which live in another database.

That means that release N can be pointed at the "real" database, while release N+1 can be pointed at the shim DB, and they're both using the same transactional data. You can run both in parallel until you commit to N+1, whereupon you shut down all the release N app servers, update the sprocs, views, and functions in the main DB, and (optionally) point your web servers back at the main DB. There are a few other details to take care of, but that's the gist of it, and the details are minor, or at least we found them to be.
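
A stripped-down sketch of the shim setup, with invented names (MS-SQL style): the shim database holds release N+1's sprocs, views, and functions, and its base "tables" are really views onto the main database's tables.

  -- shim database: release N+1's code lives here
  CREATE DATABASE shop_shim;
  GO
  USE shop_shim;
  GO
  -- pass-through view: both releases see the same transactional rows
  CREATE VIEW dbo.orders AS
      SELECT * FROM shop_main.dbo.orders;
  GO
  -- Release N+1's sprocs are created in shop_shim and reference dbo.orders,
  -- so N (against shop_main) and N+1 (against shop_shim) run in parallel.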

Disclaimer: the above is my experience on MS-SQL at a pretty good sized eCommerce site. Other RDBMSs may not work quite as well, or your app may rely heavily on features that don't work in "shim mode".


Ah, very cool, thanks.


"Using BerkleyDB or Innostore could remove the slowness and pain of interacting with the data through SQL."

I think the author was referring to building Redis on top of SQLite, not to talking SQL from an application.


OK, I can accept that. But wouldn't it remove the SQL layer, rather than "could remove" it?

The author(s) just don't seem to have an understanding of the subject matter. It appears they are just throwing buzzwords and jargon into sentences in the hope that some kind of substance ensues.

I'm sure the author(s) have good intentions and it all makes sense to them, but at the end of the day it is very misleading information. And to quote their About page: "Do you love high quality, trusted, and authoritative news" ..


As far as "Atomic Datasets", he's referring to the ability to send off an atomic operation against a data structure. Yes, you can fake this by normalizing and storing it in a table (or any number of other permutations), but for counters, the ability to fire off an atomic increment is very useful.


Your 'atomic operation' sounds a lot like a transaction. Or am I misunderstanding?


Not really.

If you have a bunch of counters, you don't want a hot counter to require a transaction every time you want to increment a page view. I'm sure there's a solution in high-end DBs, but in MySQL/Postgres, UPDATE somecounter SET value = value + 1 typically ends up being very painful under any significant write load. With Redis and Mongo, there are "atomic" increment operations that do a +X on a given field of a list or hash, and they are very fast.
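
To spell out the contention (table and key names are illustrative): every increment below takes a row lock on the same row, so concurrent writers queue up behind each other, whereas Redis's INCR is a single in-memory operation.

  CREATE TABLE somecounter (
      name  VARCHAR(64) PRIMARY KEY,
      value BIGINT NOT NULL DEFAULT 0
  ) ENGINE=InnoDB;

  -- Every writer locks the same 'page_views' row under InnoDB.
  UPDATE somecounter SET value = value + 1 WHERE name = 'page_views';

  -- Redis equivalent (not SQL): INCR page_views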


See Twitter's Rainbird for what a production-scale solution looks like: 100k writes per second, with a huge skew in which rows get hit.


It would also be interesting to find out what kind of logic is being performed in those stored procedures. It sounds like somebody is relying on them too much for logic beyond data logic.



