I'll be honest (and chances are I'll get a great explanation as to why I'm wrong...

Maro · on Aug 5, 2009

I wrote a key-value store['] and now I'm building a service which handles lots of data/requests that uses it. My short conclusion: KV stores are to (My)SQL what Assembly is to C++[''].

As in, when you design a relational DB by the book, you first create a normalized schema with indexes, and subsequently issue SELECTs with WHEREs and JOINs and GROUP BYs and ORDER BYs and LIMITs against them in a declarative manner.

With a KV store you basically have direct access to the btree layer, so you have fine control over what's going on. Just as with Assembly, it is much harder to organize your data/program; most people don't need this kind of low-level access, and will shoot themselves in the leg given the possibility. Currently it only makes sense to use a KV store if you need this kind of low level access for performance/scalability reasons (or you want really strong replication which only exists in Keyspace, which is a KV store).

Hardcore SQL admins will say that one can optimize the RDBMS , and they're right; this is the usual trade-off when your high-level abstraction's default behaviour is too slow and you start to break open the abstraction to tune the underlying layer: in my opinion, it makes sense to just abondon the higher-level abstraction at this point --- but this is a matter of taste since you're loosing many other conveniencies along the way.

The other use-case is sharding. With a KV store it's much more natural to store parts of your tree on different servers and just issue GETs and SETs over the network. This transparency is one of the things you gain if you go down a level in the layers of abstraction.

['] http://scalien.com/keyspace

[''] Prolog really

qhoxie · on Aug 5, 2009

Am I wrong in finding a happy medium between multiple technologies?

No, you are correct. The main source of contention lies in people trying to use a particular technology outside of its comfort zone. Many in the NoSQL crowd suggest that this is a common symptom of SQL installations. Unfortunately, a good portion of the arguments that address this point are hyperbolic and seem to suggest that SQL is rarely the right tool. This is what keeps degrading the signal to noise ratio.

bjclark · on Aug 5, 2009

To elaborate on qhoxie's statement, NoSQL is great for some things, but there is this idea out there that it's a panacea for scaling your webapp. And while all the things highlighted in the article are performant, few of them even scale, and none of them are perfect.

bsaunder · on Aug 5, 2009

I think you are missing another fundamental difference. The NoSQL solutions are more like a dynamically typed language where as the YeSQL ones are statically typed.

In NoSQL, I don't have to tell the database layer what I'm stuffing into the value (or even how I'm building the keys). Of course the application is taking on a lot of responsibility that previously belonged in the persistence layer.

I think this is the true draw to the NoSQL philosophy. Especially when you consider the apparent (to me) rise in meta-programming.

silentbicycle · on Aug 5, 2009

On the other hand, it also means that if you want the database to check any structural constraints on your data, such as that any comments posted must be associated with a valid user, you're stuck writing it in PHP, Ruby, etc., rather than using functionality in a database that has likely already had man-centuries of optimization and debugging. Oh, and every check means sending the entire record (or table!) over the network, unpacking, probably instantiating objects, and iterating over it in PHP/Ruby/Python/etc., repacking it (with stored statements, if you're lucky), and then sending it back. That's a lot of extra processing that RDBMSs are already designed to handle internally.

Static typing is a good analogy, but the main purpose of static typing is expressing the semantics of data structures in a way that the system itself can reason with them. (OCaml or Haskell's type systems are much better examples of static typing's utility than Java's or C's, FWIW.) If you're still prototyping and/or the data structures are still in flux, then something completely dynamic will avoid a lot of extra work, but once things settle down, having the database itself be smarter about processing the data is worth consideration. (Of course, if your entire data set fits in a Python dictionary, then using a real database is overkill anyway.)

bjclark · on Aug 5, 2009

I don't know that being schemaless is fundamentally different. Instead of thinking that key/value stores are schemaless, I think of them as having a single rigid schema: key and value. The data you stick into a k/v db will still have essentially, a schema, you just end up writing your application to be flexible to data inconsistency.

The comparison to dynamic vs static typing doesn't hold much water.

Can you explain how meta-programming relates to this topic at all?

silentbicycle · on Aug 5, 2009

Metaprogramming: I think he/she is associating schema-less DBs with dynamically typed languages and metaprogramming, and static typing with languages such as C that tend to lack sophisticated metaprogramming facilities.

I'm not sure it's a valid comparison, though: OCaml has a statically typed Lisp-style macro engine (camlp4), for example. (In all honesty, though, I've never used it. Lazy evaluation, the packaging system, and other language features cover many of the same use cases.) I think it's a case of assuming the C family's type system is the cutting edge of static typing, when it's actually pretty archaic.

The point about K/V databases having a rigid schema of (key_type -> untyped_value) is a good one, by the way. Of course, association tables (AKA "dictionaries") are pretty versatile as a basic collection type - looking at Lua, Python, Awk, or Javascript. They're not ideal for all cases, but they're a good start for most.

bsaunder · on Aug 6, 2009

Sure I can see the single rigid schema view. But I'd rather see the K/V as just a persistence mechanism. I view the schema as being more related to the application.

meta-programming relationship: Since you can store anything in the value part of a K/V store. You could (and I do) store the schema definitions for the other values in the data store. And while you are at it, you could store code segments.

So imagine the following key store (in pseudo code):

{ key: 1728273, value: { fields: { first_name: string, last_name: string } } }

{ key: 8274289, value: { type: 1728273, data: { first_name: "Bruce", last_name: "Saunders" }}}

You could stuff code segments in there in a similar way. So I think K/V stores are potentially highly related to meta-programming. Though certainly the two could exist entirely separately of each other.

msluyter · on Aug 5, 2009

I may be missing something, but can't you create reasonably performant KV stores in relational databases? I'm thinking, for example, of IOT tables (indexed by key) in Oracle, or possibly pl/sql function caching in 11g.

bjclark · on Aug 5, 2009

No, you're not. This point is made in the article.