Just going to point out that Percona, IMHO, is potentially hugely underappreciated in the MySQL world...
At $work we are managing a fairly heavy MySQL instance and having scaling issues (~5TB, lots of blobs, frequent user-managed schema changes for interactive/batch data analysis, and many concurrent complex joins involving multiple 1M+ row tables). After reviewing the various MySQL derivatives, it seems the Percona people are really focused on 'real deal' operational reliability/scalability issues (e.g. minimizing transaction locking for coherent backups, real-time db cloning, replication data integrity testing, etc.). I strongly recommend anyone looking at MySQL flavors not to overlook their offerings. Looking forward to doing some production tests of their XtraDB MySQL flavor in the coming months (for now our usage has been limited to unit tests and the Percona Toolkit/XtraBackup).
I've found MySQL to have issues with datasets that don't fit into the InnoDB buffer pool. Are you doing a lot of partitioning to make everything work well?
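Something like this is what I have in mind (purely illustrative; the table and column names are made up, and note that MySQL requires the partitioning column to be part of every unique key, which is often the awkward bit):

    -- hypothetical example: range-partition a large table by date so queries
    -- and purges only touch a few partitions at a time
    ALTER TABLE big_table
      PARTITION BY RANGE (TO_DAYS(created_at)) (
        PARTITION p2016 VALUES LESS THAN (TO_DAYS('2017-01-01')),
        PARTITION p2017 VALUES LESS THAN (TO_DAYS('2018-01-01')),
        PARTITION pmax  VALUES LESS THAN MAXVALUE
      );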
If everything is behind an ORM, have you considered trying out PostgreSQL w/ the MySQL FDW? PostgreSQL 10's improvements to FDWs make it a truly viable option.
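A rough sketch of the wiring, assuming the mysql_fdw extension (the server name, host, credentials, and schema below are all placeholders):

    -- on the PostgreSQL side
    CREATE EXTENSION mysql_fdw;

    CREATE SERVER mysql_box FOREIGN DATA WRAPPER mysql_fdw
      OPTIONS (host '10.0.0.5', port '3306');

    CREATE USER MAPPING FOR CURRENT_USER SERVER mysql_box
      OPTIONS (username 'app', password 'secret');

    -- pull the MySQL tables in as foreign tables, queryable from Postgres
    IMPORT FOREIGN SCHEMA app_db FROM SERVER mysql_box INTO public;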
Also, to clarify: I'm just saying this as pgsql seems to handle queries that involve disk access and large datasets much better. I have found that as long as your hot dataset fits into the InnoDB buffer pool and/or you're doing only key lookups (e.g. select * from tbl where pk=1;), MySQL is most certainly faster for real-world usage. If you're pulling and sorting/filtering millions (or billions) of rows per query, I find that pgsql stands up extremely well to that. Even in a single huge-server setup.
Also a semi-unrelated ditty, but as someone who has to run migrations on pretty big (100M-row) tables, ALTER TABLE with LOCK=NONE, ALGORITHM=INPLACE is pretty great (here's a table of what you can use per situation: https://dev.mysql.com/doc/refman/5.6/en/innodb-create-index-...)
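e.g. something like this (table/column names are placeholders); the nice part is that naming ALGORITHM and LOCK explicitly makes MySQL error out instead of silently falling back to a blocking table copy if the change can't actually run online:

    -- add a column without blocking reads/writes; fails fast with an error
    -- if INPLACE isn't supported for this particular change
    ALTER TABLE big_table
      ADD COLUMN notes VARCHAR(255) NULL,
      ALGORITHM=INPLACE, LOCK=NONE;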
If you made it this far in my comment, you'll probably realize I didn't grasp the article well. I just re-read it and, tl;dr, MyRocks works well (as expected) with datasets that don't fit into the InnoDB buffer pool. :)
Our workload is application-specific and currently requires MySQL - 100% of queries are on primary keys, so I think most 'hot' paths tend to stay resident, at least as far as query operations go. Also, we tend to fetch 'chunks' and operate on those in memory rather than running things server-side, so some of the processing is offset by local operations.
Would be good to dig a bit deeper into this - thanks for the pointers.
The Percona Toolkit is critical for any MySQL DBA. So flexible, and invaluable for many aspects of running a production fleet.
edit: I'm getting nostalgic about doing things with the toolkit, like running a 24-hour lagged slave because one developer kept dropping tables from the live primary database that he'd decided were "obsolete" (when they weren't). Sadly, we could never remove his root access.
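(For anyone who wants that same safety net: pt-slave-delay was the classic toolkit way to do it, and on MySQL 5.6+ you can also do it natively, roughly like this:)

    -- on the replica; 86400 seconds = 24 hours
    STOP SLAVE;
    CHANGE MASTER TO MASTER_DELAY = 86400;
    START SLAVE;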
Also of particular value: pt-online-schema-change, pt-table-checksum, and pt-query-digest. With the latter you can do some neat stuff, like tcpdump-ing live traffic and pumping it in to find out interesting things about live performance.
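Something along these lines (the interface, port, and packet count are whatever fits your setup; 3306 assumed here):

    # capture a chunk of live MySQL traffic
    tcpdump -s 65535 -x -nn -q -tttt -i any -c 1000 port 3306 > mysql.tcp.txt
    # turn it into a ranked report of the heaviest queries
    pt-query-digest --type tcpdump mysql.tcp.txt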
Back in the day I was a Cloud Engineer at a small ISP/VPS provider + managed support shop (largely LAMP). Percona Server was our MySQL install, and we used their other tools pretty heavily for operations (backups, migrations, etc.).
If I ever really needed a MySQL consultant, they would be top of the list for sure. I would say they are Tier N support, and I'm somewhere around Tier N-Y lol.
> Just going to point out that Percona, IMHO, is potentially hugely underappreciated in the MySQL world...
Certainly not by MySQL users. Percona has run the yearly MySQL conference (now Percona Live) for years and has done excellent work on tools, DBMS distributions, and performance. Vadim's work in the cited article is pretty typical of the high standard of performance analysis.
Indeed, it is quite possible I am hugely out of the loop...
For those of us less 'in the know', I think there is a tendency to look at MariaDB and Oracle and kind of overlook Percona - this is what I was getting at.
I actually think the correct conclusion is the reverse: if you can fit your tables in RAM, choose Inno; otherwise, Rocks outperforms Inno once the working set no longer fits in memory. OP's testing is a little confusing because the working set remains constant and the memory is scaled, whereas normally we think of this in terms of how much workload we can scale on a node with fixed resources.
That's the right conclusion. Most use cases fit into memory, and when all of your data does, InnoDB will win vs Rocks. Very large users are the ones who would more typically benefit from Rocks, as their data sets may dramatically outstrip available memory.
There is another side to the MyRocks/RocksDB story - compression.
With much higher compression than is possible with InnoDB, MyRocks can stay in memory (file cache) when equivalent InnoDB workloads are already IO-bound. Plus you can save a lot on disk storage (and IO, if you're paying for it).
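A sketch of what that can look like (the column family options and table definitions here are illustrative, and which compression algorithms are available depends on how your RocksDB was built):

    -- my.cnf (illustrative): per-level compression for MyRocks column families
    -- rocksdb_default_cf_options=compression=kLZ4Compression;bottommost_compression=kZSTD

    -- MyRocks table, compressed via the column family options above
    CREATE TABLE events (id BIGINT PRIMARY KEY, payload BLOB) ENGINE=ROCKSDB;

    -- InnoDB page compression for comparison (KEY_BLOCK_SIZE=8 targets ~2x on 16K pages)
    CREATE TABLE events_inno (id BIGINT PRIMARY KEY, payload BLOB)
      ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;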
This is a common misconception. The larger your scale, the larger the cost of sub-optimal performance. If you spend $1000/month on your infrastructure, halving infrastructure cost will save you $500/month, which can't justify much investment. Now if it is $100M/month, saving $50M a month is worth a lot... this is why Facebook, for example, has created many custom-built, highly optimized systems like RocksDB.
MyRocks is LSM-based. An LSM differs from a B-tree in that you need to do multiple "physical" lookups and merge the results to perform one logical lookup. There are tricks like Bloom filters employed to optimize this, but it is still not as efficient in memory as a B-tree, especially for range lookups. Here is some information on Wikipedia: https://en.wikipedia.org/wiki/Log-structured_merge-tree
As in any situation when it comes to performance: measure your own real-world performance before making any decisions.
Benchmarks are nice to show that some things work better than others in theory or in "perfect" examples, but real databases tend to be much more complicated and messy, and sometimes just writing smarter queries, creating an index, or changing application logic may have a much bigger impact.
Perhaps, but it depends on your workload, as is described in the article. In any case: do your own measurements on your own database/application to see if it works for you.
Also: no, I am not a paid sponsor.