>For example, suppose that for every user you store an array of cities they've e...

sendob · on Oct 1, 2013

That would be one way, or

On Postgres, you could use arrays: http://www.postgresql.org/docs/9.3/static/arrays.html also has some great contrib modules as well like: http://www.postgresql.org/docs/9.3/static/intarray.html

Mongodb documents their multikey indexes for a similar approach: http://docs.mongodb.org/manual/core/index-multikey/

not to take away from rethinkdb, awesome to see a project move rapidly!

petergeoghegan · on Oct 1, 2013

Importantly, in Postgres the arrays are indexable with a GIN (inverted) index. So you can ask arbitrary questions along the lines of: show me all rows in this table that have any of the following integers: [5, 7, 9] somewhere in their array column. This will be performed very efficiently.

In recent versions, Postgres will even keep track of statistics for arrays (and other, similar non-scalar types), so that you can get a good selectivity estimate, which is often critical to getting a good plan from the query planner.

Nate75Sanders · on Oct 1, 2013

That's not storing a collection in a table. That's approximating the storage of a collection in a table by using standard RDBMS features.

If you think these are the same somehow, I encourage you to think about the number of disk seeks required for recovering the full collection in both cases.

vidarh · on Oct 1, 2013

If all you need is the indexed values, then most mature RDBMs's supports using covering indexes in queries and satisfying the query entirely from the index.

Since the point of B-tree's is to minimize disk seeks and packing values with the same key close together, the number of disk seeks can often be made just as small as if the collection had been stored in the "owning" record.

Secondly, whether or not it is is an implementation detail. There's nothing inherently preventing RDBMSs from supporting "inlining" of collections from related tables. But in practice most RDBMSs choose to support that exactly through the much more generic optimization of supporting queries using covering indexes...

In other words: It only takes a lot of disk seeks if you use a RDBMS that is lacking in features and/or haven't added the proper indexes.

corresation · on Oct 1, 2013

If you think these are the same somehow

I didn't say they were the same, but instead was replying to the statement "RDBMSes typically can't do that, but document stores can.", but of course an RDBMS can store an "array of cities you've ever lived in".

I encourage you to think about the number of disk seeks required for recovering the full collection in both cases.

Ignoring XML, array types, or materialized views, for the RDBMS it is generally zero on virtually any modern platform, as the common case now is a database that is 100% in memory. A $79/m OVH server has 32GB. A low-end Dell server can have 256GB for $3000 (there are extremely few databases where the hot data surpasses even a GB). And the best thing about normalization is that it, like LZ, is a form of compression, yielding a more likely scenario that your database fits in memory.

That's all a side topic, however, and is neither here nor there on the long running NoSQL/SQL debate.