InnoDB faster than NoSQL solutions?

coffeemug · on Dec 20, 2010

You cannot get 750K transactions/second with durability for writes, or for reads when they go out of RAM. An HDD drive operates at roughly 250 ops/second, an SDD at roughly 10k ops/second (huge oversimplification, but roughly correct). Even when you batch things and sacrifice a little bit of latency, you can't get even close to 750K QPS.

InnoDB is a great engine, but we need to know exactly what the benchmarking number means before we can start relying on it.

morgo · on Dec 20, 2010

Right: It's very workload sensitive how this is going to help. If your data set fits in memory, it's probably very good. If you're waiting on IO, the cost of SQL parsing becomes trivial in comparison.

What is actually interesting, is that InnoDB actually has some optimizations around writing that NoSQL databases do not (configure multiple read/write threads - required for raid controllers and faster SSDs). So while the benchmark does show the absolute best case, it's not like everything else is a write off.

A B+Tree index (assuming hot spots) can usually scale quite well for out of memory fit. This is different for example, from a hash index which aims to have random distribution to each bucket.

(Disclaimer: I work for Percona, the company that releases Percona Server).

arethuza · on Dec 20, 2010

I guess the only thing that might easily give you that kind of performance is a high end storage device like a RamSan-630:

http://www.ramsan.com/products/2

Although in this context it would be cheating!

foobarbazetc · on Dec 21, 2010

Put a RAID card with a BBU into your machine and 250 ops/second is irrelevant.

br1 · on Dec 21, 2010

You can get more writes than the number of ops/second of your disk by abandoning B-Trees for TokuDB. (I don't work for them. I'm just a fanboy).

amalcon · on Dec 20, 2010

It doesn't necessarily need to be all on one disk. Of course, 75 independent SDDs is pretty unreasonable to expect.

teoruiz · on Dec 20, 2010

There was a more detailed discussion about HandlerSocket in another thread:

http://news.ycombinator.com/item?id=1886137

Original article: http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-a...

antirez · on Dec 20, 2010

I think that this exact post, a few months ago, could get very different comments, possibly along the lines "wow, so what the hype about NoSQL is at all?".

Instead here I see tons of comments that are to the point, that show how we are evolving as the hackers community about the ability to understand tradeoffs with database systems. This is a huge win.

Also I bet the reverse is true, that the "magical features" about NoSQL systems to be fast, scalable, handling huge data sets, fault tolerant while distributed, everything at the same time, is hardly believed at this point.

meursault · on Dec 20, 2010

Doesn't bypassing the SQL parser kind of make this a NOSQL solution?

seunosewa · on Dec 20, 2010

It's also a NoACID solution.

jerf · on Dec 20, 2010

It says you still get the ACID from InnoDB in the article; I have no way to independently verify that, but since people are talking about the speed gains coming from bypassing the SQL parser it would make sense that InnoDB still sees the same basic queries and therefore has the same properties it usually does. InnoDB, despite its association with MySQL, does have ACID and transactions and stuff.

aidenn0 · on Dec 20, 2010

I'd want to verify that claim, since it is also avoiding mutex contention.

morgo · on Dec 20, 2010

Each modification is going to be atomic, but you don't have transactions.

It still has durability against partially written writes via InnoDB's double write buffer - so it's more ACID than most NoSQL solutions.

m0th87 · on Dec 20, 2010

Yes, I'm sure it is (for some definitions of faster) and isn't (for others).

No benchmarks, no comparative NoSQL code, and the example is trivial. There's nothing interesting here.

cies · on Dec 21, 2010

indeed.

foljs · on Dec 21, 2010

> No benchmarks, no comparative NoSQL code, and the example is trivial. There's nothing interesting here.

Yes, nothing EXCEPT raw information, that you have to access yourself. Is only spoon-feeding "interesting"?

I, for one, didn't know about this alternative MySQL connection pipeline at all. Now, I know at least that it exists, how it works, etc. I can do the benchmarks myself, if I need to.

elliottcarlson · on Dec 20, 2010

Is the same InnoDB data available via MySQL itself still using standard SQL syntax, or is utilizing InnoDB in this manner locking it out from that?

Having a dual angle of accessing the data could be very beneficial when it comes to reporting etc, while maintaining a high throughput access for the actual production code.

danudey · on Dec 20, 2010

You do get both forms of access, so you can make queries from both interfaces.

I believe there is a locking mechanism whereby the HandlerSocket interface holds its own lock for all its concurrent client, which it releases occasionally to allow MySQL to have access as well ('occasionally' in this case might be 'a hundred times a second' for all I know).

Note that this doesn't give you SQL or any of the benefits. For example, you can't do joins (but you could write them manually, since you get better throughput manually so the speed might outweigh the inefficiencies of multiple round-trips.

ams6110 · on Dec 20, 2010

Unlikely that you could manually join tables any faster than MySQL can, given that the optimization of joins in MySQL has surely been given considerable attention.

danudey · on Dec 20, 2010

Maybe not, but if you're bypassing MySQL's internal mutexes and SQL parser to get significantly faster results, it's possible that those speed improvements would outweigh doing two fetches via HandlerSocket.

runningdogx · on Dec 21, 2010

This sounds like Drizzle, where they've modularized almost everything, and you can get similarly ultrahigh tps by keeping connections open and bypassing the standard sql parser and query planner. (Obviously, the assumption for such high numbers is that your disk array can keep up or that the transactions are only hitting memory.)

js4all · on Dec 21, 2010

People would not be discussing speedier solutions, if there weren't these fast NoSQL DBs. Plus 1 for NoSQL.

Another aspect where this solution will always lose under real load, is the speed loss due to locking and blocking. CouchDB for instance uses MVCC, which never blocks.

shykes · on Dec 21, 2010

Riak uses InnoDB as its default disk storage engine. Membase is built around the well-established memcache protocol. Both are very promising projects.

I see an encouraging trend in enhancing proven technology, instead of re-writing everything indiscriminately.

peschkaj · on Dec 21, 2010

The current version of Riak uses Bitcask as the default disk storage engine, but InnoDB is an option that you can use. They each work well for specific workloads - Bitcask scales up until your keys (not total data) can't fit in RAM.

shykes · on Dec 21, 2010

Thanks for the correction. I still by everything in my comment except for the word "default" :)