Hacker News new | past | comments | ask | show | jobs | submit login
Should You Go Beyond Relational Databases? (thinkvitamin.com)
52 points by olegp on June 24, 2009 | hide | past | favorite | 30 comments



This was and interesting article, but there I would have a significant quibble with it.

His list of "symptoms" that a relational database is not right for you are more often symptoms that your relational database was not well designed (or that the problem you were trying to solve changed along the way) rather than that the relational model itself does not fit your needs.

I think this applies to all his "structural symptoms" but especially with "Do you have tables with lots of columns, only a few of which are actually used by any particular row?" This is more often a sign that the database was not properly normalized than anything else.

This of course is not to say that the relational model is right for everything. In some cases, object oriented databases are the way to go and if you truly need a vast level of scalability and you can afford to relax the ACID standard then it makes sense to look at non-relational options.

There is room in the world for both relational and non-relational models, but I do not think fashion should play a role in choosing a technology personally.


If your objects have a wide variety of attributes, either you need lots of columns with many of them empty, or you need some sort of attribute table with object-key-value triples. Both are bad relational design, but the structure is inherent in the data. If you're having to use a fixed DB schema, I can't see any way around bad design.

When you mention object databases, can you name any examples? As far as I know object databases where talked about a lot some years ago, but never really gained significant widespread adoption.

Also, the article makes the point that non-relational databases are not just about scalability. In fact, in the case of graph databases, scaling problems are just as pronounced as with relational databases. The different data models can enable you to make good database designs, even if the structure of your data is against you.


If your objects have a wide variety of attributes, either you need lots of columns with many of them empty, or you need some sort of attribute table with object-key-value triples. Both are bad relational design, but the structure is inherent in the data. If you're having to use a fixed DB schema, I can't see any way around bad design.

PostgreSQL HStore is very handy in situations such as this: http://www.postgresql.org/docs/8.3/static/hstore.html


So your solution is to point to a database capacity that violates the first normal form?


Um, no. "Set of key-value pairs" can be a perfectly valid atomic data type as far as the database is concerned, and so a database containing tables with hstore columns can be in any normal form you want, just like a database with varchar columns can be normalized, as long as you don't give the contents of the varchar or set any structural meaning as far as the database is concerned. There is nothing in the relational model that forbids storing composite values (even whole other database tables [edit: of which hstore is just a trivial example]) as a value of a field. Whether it violates any normal form doesn't depend on the type of the elements stored, but on their interpretation in your data model. Although that doesn't mean that I would be surprised if I saw a use of hstore that is ill-advised. Quite the opposite, actually.


"If your objects have a wide variety of attributes, either you need lots of columns with many of them empty, or you need some sort of attribute table with object-key-value triples. Both are bad relational design...

Frequently you can avoid having object that have a wide variety of attributes most of which are empty in the first place. That often indicates that the objects are not of the same type and that that table should be broken into other tables of objects that are truly alike.

That, though, is not always the case. When you truly have that situation with objects of the same kind, then I do not see why allowing the numerous empty columns (as long as they are all bound to the key and only the key) or the "object-key-valuy" tables are bad relational designs. If I am missing something, please let me know.

As for the object databases, the only one I have really heard of is db4o as another posted mentioned. I do believe that they are used in some areas of physics though. I have never used them personally, but I have heard of people using them in small scale projects just as a way to avoid the object-relational impedence mismatch complications.

To your last paragraph, I agree completely. There is room and a place for both relational and non-relational databases.


I've been playing with the db4o OODBMS. So far my tests are simple and I've been using tinker toy Scala programs to test against it, but I love the simplicity. I could definitely make a case for using it in an OLTP department-sized application. Beyond that level, I haven't tested enough to be comfortable giving advice either way.


Pretty interesting. I never heard of db4o before. I'm skimming the tutorial (http://www.db4o.com/about/productinformation/resources/db4o-...), and it looks fairly nice.

It seems that db4o supports some pretty flexible query mechanisms. That leads me to worry about performance. One of my recurring nightmares of relational databases and SQL is optimizing queries on non-trivial constantly-evolving schemas, and I can't help wondering if db4o has the same horrors in store.


Unfortunately my tests have been too basic to give any advice in your situation. Adding data elements has been nothing more complex than adding it to the class. Queries that filter on elements in a single class are trivially easy - but I think there are types of applications that need nothing more. Querying across classes would require more work than the relational equivalent of querying across multiple tables.

HOWEVER, my sense is that bringing a relational mindset to object databases is not going to realize the full potential of them; you have to think about storing the data elements differently.

Object databases seem great for OLTP apps. In particular, ones that require working with individual accounts/patient charts/etc, i.e. query a person's account, modify a data element, and save it back to the db. For something more analytical, like spotting trends in historical data, I'd prefer the full power of an RDBMS and the SQL language that goes with it.


(or that the problem you were trying to solve changed along the way)

All non-trivial problems change along the way.


> not to say that the relational model is right for everything. In some cases, object oriented databases are the way to go

Allow me to channel CJ Date and point out current popular databases are not truly relational. A proper relational database built around Tutorial D would pretty much always be the way to go, even if your problem is highly OO.


I've had problems with MemcacheDB and I'm curious to hear about other people's experiences.

Basically, I've found that under high read/write loads I get occasional socket timeouts (tested on linux/osx). I think the underlying reason for the timeouts are Berkeley DB locks when data gets flushed to disk. Beyond the hassle of getting intermittent socket timeouts the real problem is that the memcache client API fails gracefully because it was specifically designed to be fault tolerant. You can check for socket errors and retry queries but you'll still get unpredictable query times.

I'm coming to the conclusion that it's an architecture issue - MemcacheDB is a persistent database abstracted behind an interface specifically designed for non-persistence. The abstraction leaks when the database locks.


I have had similar issues like you (i.e. very poor performance of Berkley DB, especially under load). I am sure thought one can configure this (some options are listed here: http://www.oracle.com/technology/documentation/berkeley-db/d... ), I have tried different options, but I can't recall the effects.

Anyway, after testing MemcacheDB and Tokyo Tyrant in production my conclusion was to use Tokyo Tyrant instead of MemcacheDB. Tokyo Tyrant implements the memcached protocol and performs really well under load (and has TONS of features such as master-master replication, Lua scripting, different types of engines [hash, b-tree or memory]). You can also check LightCloud, which is a distributed key-value database built on top of Tokyo Tyrant.


Thanks for the response - I'm glad to hear I'm not the only one experiencing these issues. I've read a lot of articles recommending MemcacheDB but none mention these issues. I wish the authors had actually tested MemcacheDB before writing about it. I wasted a day working on this.


Have you tried TokyoCabinet? It serves a similar purpose (a persistent key/value store which speaks the memcache protocol) and I've only heard good things about its performance.


Ooh, my head hurts. Too many options, but this article is a good place to see the variety post-relational databases. It's also good to see that innovation is alive an well in the database arena. I think that it is also good that all the options will make a good architect think about his design. To use these highly distributed designs you give up things, like umpteen indexes per table, ACID compliance, a column for every day of the year, and 20-way joins.

I think its a good to go away from relational and look at these things and come back and look at your current practices. Maybe every attribute doesn't need a column and an index. Maybe use a JSON bag to hold the loose parts. Maybe throw away a bunch of indexes; create them on the fly for the monthly report. Maybe you can give away some of your referential integrity.


FWIW I've been using the JSON bag approach for the loose parts on my current project, and it's been much easier than "the standard approach" both from maintenance and speed of feature development.


Standard SQL cannot query transitive relationships, i.e. variable-length chains of joins which continue until some condition is reached

Actually it can: SQL:2003 includes recursive queries (WITH RECURSIVE), and several popular database systems implement it.


Which still returns a flat data structure that you have to manually use to rebuild the tree you're looking for. I you really need the ability to easily store and retrieve trees of data, don't use a relational database.


If you can rebuild the tree using only the information returned in that flat data structure, and the database performance is good, then why not just rebuild it in code?

It's trivial to scale out stateless code; databases are the hard part.


You're not going to get better performance by tearing apart a tree, sticking into a database that doesn't store it well, and then taking it back out and rebuilding it from scratch every time. If you want performance, use a database that will store your data structure as is and avoid the disassemble/reassemble step as well as avoiding writing the extra code to do it.

Relational databases didn't win because they're fast, they won because they're flexible for querying and language neutral, both features which necessarily slow the database down.

> It's trivial to scale out stateless code; databases are the hard part.

Not really relevant to the conversation. There are databases that can store trees of objects just fine as is, not all databases force you to store everything in tables.


> Not really relevant to the conversation. There are databases that can store trees of objects just fine as is, not all databases force you to store everything in tables.

To me, the conversation is about trade-offs, and if the alternative database (for example) forces you to get rid of ACID transactions, then it's not as cut-and-dry as you purport.


Uhhggg... who said anything about getting rid of ACID transactions? Seriously, what kind of FUD have you been reading. Transactions have absolutely nothing to do with the format a database uses to store it's data. Most object databases have ACID transactions AND can store your data-structures in their native format.

Some of these new fangled distributed databases forgo transactions because transactions and distribution don't really go well together, they are opposing forces. Don't confuse issues that have nothing to do with each other, and don't assume that only relational databases support ACID compliant transactions.

Look at a real object database like Gemstone that is directly comparable to your big iron Sql database. These new distributed key/value pair database are little more than distributed persistent hash tables, they're different beasts entirely and aren't directly comparable in features because they're meant to solve different problems.


Now this is getting interesting...

Is there anything open-source (and language-agnostic) that's comparable to Gemstone?


Gemstone has a free version limited to 4 gig, but few if any object databases are language agnostic, that would defeat the purpose. Object systems are language specific, storing them in their native format is thus generally also language specific.

There's an open source one called Magma in Squeak that's like a Gemstone lite, but I'd take the Gemstone version any day because it'll scale to any level you'll ever need, ever. I'm working on a Gemstone project now, and after having used it, nothing else I've seen comes close to the massive productivity it offers.

By the way, databases shouldn't be language agnostic, this leads to the ever present anti pattern of using the database as an integration point between many programs which turns the database into a giant ball of mud global variable that becomes impossible to change.

Integration databases suck. Application should own their own data and integrate with other application via services. That's how the web became so successful and that's how big ass enterprises should be ran as well, many small apps loosely coupled, not one giant global db where every app is bound to a generic schema that isn't suited for what it actually needs.


This article is confused and misleading. Implementation minutiae such as tables with lots of columns is not the right way to think about the problem of selecting a database model or technology. The important questions to ask are: why do we need this data, where does it come from, who or what will need to use it, what parts are most important, how long must we keep it and what can we afford.


Any recommendations for a solid OODB for ruby? I've had great success years ago with Java based and Smalltalk based ones. I may be in the market for an OO backend for a new ruby project.


Maglev will be one soon, and it's just Gemstone under the hood so it'll be bad ass.


I like the rules of thumbs about whether you want a key-value store or a graph database or relational DBs.


Where does LucidDB fall into this scheme?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: