InnoDB faster than NoSQL solutions? (mysqldba.blogspot.com)
33 points by petervandijck on Dec 20, 2010 | 25 comments



You cannot get 750K transactions/second with durable writes, or with reads once they go out of RAM. A hard drive operates at roughly 250 ops/second, an SSD at roughly 10k ops/second (a huge oversimplification, but roughly correct). Even when you batch things and sacrifice a little bit of latency, you can't get even close to 750K QPS.

InnoDB is a great engine, but we need to know exactly what the benchmarking number means before we can start relying on it.
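
To put rough numbers on that, here is a back-of-envelope sketch in Python. It uses only the oversimplified per-device figures quoted above (250 and 10k ops/second) and assumes one physical operation per transaction with no batching or caching; the function name is made up for illustration.

```python
import math

def devices_needed(target_ops_per_sec, device_ops_per_sec):
    """Devices required if every transaction costs one physical op."""
    return math.ceil(target_ops_per_sec / device_ops_per_sec)

TARGET = 750_000   # claimed transactions/second
HDD_OPS = 250      # rough ops/second for one hard drive (figure from the comment)
SSD_OPS = 10_000   # rough ops/second for one SSD (figure from the comment)

hdds = devices_needed(TARGET, HDD_OPS)
ssds = devices_needed(TARGET, SSD_OPS)

# Alternatively: how many transactions would have to share each
# physical op on a single SSD to reach the target rate.
batch_per_op_single_ssd = TARGET / SSD_OPS

print(f"HDDs needed: {hdds}")
print(f"SSDs needed: {ssds}")
print(f"Transactions per op on one SSD: {batch_per_op_single_ssd:.0f}")
```

Under those assumptions, the claimed rate is only reachable by spreading the load over many devices, batching heavily, or never touching the disk at all.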


Right: how much this helps is very workload-sensitive. If your data set fits in memory, it's probably very good. If you're waiting on IO, the cost of SQL parsing becomes trivial in comparison.

What's actually interesting is that InnoDB has some optimizations around writing that NoSQL databases do not (for example, configurable multiple read/write threads, which are required for RAID controllers and faster SSDs). So while the benchmark does show the absolute best case, it's not as if everything else is a write-off.

A B+Tree index (assuming hot spots) can usually scale quite well when the data no longer fits in memory. This is different from, for example, a hash index, which aims to distribute keys randomly across buckets.

(Disclaimer: I work for Percona, the company that releases Percona Server).


I guess the only thing that might easily give you that kind of performance is a high end storage device like a RamSan-630:

http://www.ramsan.com/products/2

Although in this context it would be cheating!


Put a RAID card with a battery backup unit (BBU) into your machine and the 250 ops/second limit is irrelevant.


You can get more writes than the number of ops/second of your disk by abandoning B-Trees for TokuDB. (I don't work for them. I'm just a fanboy).


It doesn't necessarily need to be all on one disk. Of course, 75 independent SSDs is pretty unreasonable to expect.


There was a more detailed discussion about HandlerSocket in another thread:

http://news.ycombinator.com/item?id=1886137

Original article: http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-a...


I think this exact post, a few months ago, would have drawn very different comments, possibly along the lines of "wow, so what is all the NoSQL hype about?".

Instead, I see tons of to-the-point comments here, which show how we as a hacker community are getting better at understanding the tradeoffs of database systems. This is a huge win.

I also bet the reverse is true: hardly anyone still believes in the "magical features" of NoSQL systems, that they are fast, scalable, able to handle huge data sets, and fault tolerant while distributed, all at the same time.


Doesn't bypassing the SQL parser kind of make this a NoSQL solution?


It's also a NoACID solution.


It says you still get the ACID from InnoDB in the article; I have no way to independently verify that, but since people are talking about the speed gains coming from bypassing the SQL parser it would make sense that InnoDB still sees the same basic queries and therefore has the same properties it usually does. InnoDB, despite its association with MySQL, does have ACID and transactions and stuff.


I'd want to verify that claim, since it is also avoiding mutex contention.


Each modification is going to be atomic, but you don't have transactions.

It still has durability against partially written writes via InnoDB's doublewrite buffer, so it's more ACID than most NoSQL solutions.


Yes, I'm sure it is (for some definitions of faster) and isn't (for others).

No benchmarks, no comparative NoSQL code, and the example is trivial. There's nothing interesting here.


indeed.


> No benchmarks, no comparative NoSQL code, and the example is trivial. There's nothing interesting here.

Yes, nothing EXCEPT raw information, that you have to access yourself. Is only spoon-feeding "interesting"?

I, for one, didn't know about this alternative MySQL connection pipeline at all. Now, I know at least that it exists, how it works, etc. I can do the benchmarks myself, if I need to.


Is the same InnoDB data available via MySQL itself still using standard SQL syntax, or is utilizing InnoDB in this manner locking it out from that?

Having a dual angle of accessing the data could be very beneficial when it comes to reporting etc, while maintaining a high throughput access for the actual production code.


You do get both forms of access, so you can make queries from both interfaces.

I believe there is a locking mechanism whereby the HandlerSocket interface holds its own lock for all its concurrent clients, which it releases occasionally to allow MySQL to have access as well ('occasionally' in this case might be 'a hundred times a second' for all I know).

Note that this doesn't give you SQL or any of its benefits. For example, you can't do joins (but you could write them manually; since you get better throughput that way, the speed might outweigh the inefficiencies of multiple round-trips).


Unlikely that you could manually join tables any faster than MySQL can, given that the optimization of joins in MySQL has surely been given considerable attention.


Maybe not, but if you're bypassing MySQL's internal mutexes and SQL parser to get significantly faster results, it's possible that those speed improvements would outweigh doing two fetches via HandlerSocket.


This sounds like Drizzle, where they've modularized almost everything, and you can get similarly ultrahigh tps by keeping connections open and bypassing the standard SQL parser and query planner. (Obviously, the assumption for such high numbers is that your disk array can keep up or that the transactions are only hitting memory.)


People would not be discussing speedier solutions if there weren't these fast NoSQL DBs. Plus 1 for NoSQL.

Another aspect where this solution will always lose under real load is the speed loss due to locking and blocking. CouchDB, for instance, uses MVCC, which never blocks.


Riak uses InnoDB as its default disk storage engine. Membase is built around the well-established memcache protocol. Both are very promising projects.

I see an encouraging trend in enhancing proven technology, instead of re-writing everything indiscriminately.


The current version of Riak uses Bitcask as the default disk storage engine, but InnoDB is an option that you can use. They each work well for specific workloads - Bitcask scales up until your keys (not total data) can't fit in RAM.
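
The "keys, not total data" limit follows from the general design of keeping an in-memory index of every key that points into an append-only log. Here is a toy Python sketch of that idea; it is not Bitcask's actual code or on-disk format, `TinyLogStore` is a made-up name, and an in-memory `BytesIO` stands in for the data file.

```python
import io

class TinyLogStore:
    """Toy append-only store: values live in a log, keys live in RAM."""

    def __init__(self):
        self.log = io.BytesIO()   # stand-in for an on-disk data file
        self.keydir = {}          # key -> (offset, size), always in memory

    def put(self, key, value: bytes):
        # Always append; older values for the same key become garbage.
        self.log.seek(0, io.SEEK_END)
        offset = self.log.tell()
        self.log.write(value)
        self.keydir[key] = (offset, len(value))

    def get(self, key):
        entry = self.keydir.get(key)
        if entry is None:
            return None
        offset, size = entry
        self.log.seek(offset)     # one seek, one read per lookup
        return self.log.read(size)

store = TinyLogStore()
store.put("a", b"first")
store.put("b", b"x" * 1000)
store.put("a", b"second")         # overwrites the index entry only

print(store.get("a"), len(store.keydir))
```

The index grows with the number of keys regardless of how large the values are, so RAM is exhausted by key count rather than by total data size.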


Thanks for the correction. I still stand by everything in my comment except for the word "default" :)



