There was a very interesting presentation by a professor. I'm not sure which university he was from, but he clearly knew his field.
He talked about how the database world is about to change. ACID is really expensive in terms of resources, and so are the harder parts of a relational schema (foreign keys, checks, etc.). And the architecture of classic RDBMSes is pretty wasteful -- they use an on-disk format but cache it in memory.
He talked about how there are basically three new paths for DBMSes to follow.
1) Some drop the restrictions to become faster. This is the NoSQL stuff, because you don't really need ACID for writing to a Facebook wall.
These are called NoSQL databases.
2) OLAP: in data warehousing, the usual way to do things is that you load a ridiculous amount of data into the database and then run analytical queries, which tend to use aggregation heavily and often touch just a few dimensions, while DWH tables tend to be pretty wide.
For this, a column store makes perfect sense. It is not very quick on writes, but it can do very fast aggregation and selection of just a few columns.
This is called a column store.
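To make the column-store point concrete, here's a toy sketch (my own illustration, not any real engine's code) of why aggregating one column is cheap when data is stored column-wise: the scan touches only that column's array, while a row store has to walk every wide row.

```python
# Row store: a list of wide rows; SUM(amount) must read every full row.
rows = [
    {"customer": "a", "region": "EU", "amount": 10.0},  # imagine ~200 more columns
    {"customer": "b", "region": "US", "amount": 20.0},
    {"customer": "c", "region": "EU", "amount": 5.0},
]
row_total = sum(r["amount"] for r in rows)

# Column store: the same table kept as one dense array per column.
columns = {
    "customer": ["a", "b", "c"],
    "region":   ["EU", "US", "EU"],
    "amount":   [10.0, 20.0, 5.0],
}
# SUM(amount) scans one array and never touches the other columns.
col_total = sum(columns["amount"])

assert row_total == col_total == 35.0
```

The results are the same, but the column layout reads a tiny fraction of the bytes when the table is wide, which is exactly the DWH access pattern described above.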
3) In OLTP, you need throughput, but the question is: how big is your data, and how fast does it grow? RAM sizes tend to grow exponentially, while your number of customers will probably grow linearly, maybe faster, but not by much. So your data could fit into memory, now or in the future.
This lets you build a very fast database. All you need to do is switch the architecture to memory-based: store the data in its in-memory format, both in memory and on disk. You don't read from disk during normal operation; you just use it to persist the data on shutdown.
This is called a main memory database.
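Here's a minimal sketch of that main-memory idea (a hypothetical toy, not how any real product is implemented): data lives in an in-memory structure, and the disk is only touched once to load a snapshot at startup and once to write it at shutdown.

```python
import os
import pickle


class MainMemoryDB:
    """Toy main-memory store: disk is used only for a shutdown snapshot."""

    def __init__(self, snapshot_path):
        self.snapshot_path = snapshot_path
        # The only disk read: load the previous snapshot, if any, at startup.
        if os.path.exists(snapshot_path):
            with open(snapshot_path, "rb") as f:
                self.data = pickle.load(f)
        else:
            self.data = {}

    def put(self, key, value):
        self.data[key] = value      # pure memory write, no disk I/O

    def get(self, key):
        return self.data.get(key)   # pure memory read

    def shutdown(self):
        # The only disk write: persist the whole in-memory image in one go.
        with open(self.snapshot_path, "wb") as f:
            pickle.dump(self.data, f)


db = MainMemoryDB("/tmp/mmdb.snapshot")
db.put("answer", 42)
db.shutdown()

# A fresh instance recovers the state from the snapshot.
db2 = MainMemoryDB("/tmp/mmdb.snapshot")
assert db2.get("answer") == 42
```

Of course a real system also needs some form of logging so a crash between snapshots doesn't lose data, which is where the command-logging discussion further down comes in.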
So, that was the presentation. It was awesome, and if someone can find it, please post a link! My search-fu was not strong enough.
...
What interests me is that we have had NoSQL databases for some time already, and we have at least one huge (and very expensive) column store: Teradata. But this seems to be the first actual main memory database.
My dream would be to switch Postgres to a main memory or column store mode, but I guess that's not happening very soon :)
If you've ever worked with large-scale "enterprise" data warehouses, they tend to be slow and clunky. Back in 2006ish SAP took the whole Data Warehouse (well, mainly just the data cubes) and chucked it into a columnar database (at the time it was called TREX, then it became BW Accelerator) - http://en.wikipedia.org/wiki/TREX_search_engine
TREX existed well before 2006. SAP also bought a Korean company called P* (IIRC) which did a non-columnar (traditional relational) database and threw it into memory. SAP also had a product called APO LiveCache - http://scn.sap.com/community/scm/apo/livecache - which lived around the same time.
This has now all evolved into a standard offering called SAP HANA - http://www.saphana.com/welcome - In its second year of existence I believe SAP did roughly $360m in sales on HANA alone.
Also, IIRC is InnoDB basically the open source version of exactly what you're talking about with "Postgres to main memory"?
InnoDB isn't anything like that - it's a transactional database engine that's been around since the 90's and has since become the standard storage engine for MySQL - it competes directly with Postgres' storage layer.
The date on some of those posts is interesting. 2009 is quite a while ago now, and I'd suggest that columnar datastores haven't exactly taken over. Some implementations have made some progress (e.g. Cassandra), but OTOH many non-traditional datastores have added traditional-database-like features (e.g. Facebook's SQL front end on their NoSQL system), and traditional databases have added NoSQL features too.
Stonebraker is a very smart person but he's also not shy about promoting his own companies/research. You generally get a well-informed but very opinionated take on things from him.
VoltDB, for example, is good for certain complex workloads over large but not-too-large data sets. For a lot of situations it isn't really an alternative to memcache+MySQL or a NoSQL solution.
If I may drool a little, you guys represent the heart of Hacker News. Insightful summary, mentioning that somewhere somebody gave such a talk. As I was reading the first comment I was silently cheering for "a librarian's follow-up", and there it was!
Regarding the log mechanism Prof. Stonebraker prefers (command logging vs. ARIES): almost all newer data stores use command logging with checkpoints (e.g., Redis, Mongo) and ship changes to other nodes similarly.
Having run a large production Redis environment, I can say that a large redo log makes startup/recovery painful. I'm not convinced command logging is the most efficient approach in all scenarios, especially with counters, where the command is more verbose than the resulting change.
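The counter case can be sketched in a few lines (a toy of my own, not Redis or ARIES internals): a command log accumulates one entry per increment and recovery must replay them all, while an after-image of the value stays a single small record no matter how many increments happened.

```python
command_log = []   # one entry per executed command
value_log = {}     # after-image style: latest value per key

counter = 0
for _ in range(1000):
    counter += 1
    command_log.append("INCR counter 1")  # verbose: a command string each time
    value_log["counter"] = counter        # compact: overwritten after-image

# Recovery from the command log: replay every single command.
replayed = 0
for cmd in command_log:
    _op, _key, delta = cmd.split()
    replayed += int(delta)

# Recovery from the after-image: just read the last value.
assert replayed == 1000
assert value_log["counter"] == 1000
# Replay work (and log size) grows with the number of commands:
assert len(command_log) == 1000
```

This is why checkpoints matter so much with command logging: without them, recovery time grows with the whole command history instead of with the size of the data.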
You mentioned OLTP. Erlang's Mnesia store comes to mind, but as far as I'm aware it's limited to a 4GB data set. I'm not sure if that qualifies as a main-memory db, but it might be similar.
> My dream would be to switch Postgres to main memory or column store mode, but I guess that's not happening very soon :)
If it can be done alongside the traditional architecture, be it in a fork or without touching existing code, and if you can at least start the work, it could happen soon.
A few years ago I was tinkering with Haskell and looked at a framework called HAppS which kept all its state in memory. It doesn't look like the project has been active lately.
What you're talking about was the HAppS-State component of the HAppS application server, a project which is indeed not active anymore. Happstack is the active fork of HAppS and had a "happstack-state" component for a while, but this was eventually rewritten from scratch, made independent of Happstack, and is now known as acid-state [1]. It's even used for the new Hackage server that powers the Haskell package ecosystem.