
I am one of the creators of MyRocks at FB. We have a few common MySQL features/operations we don't use at FB. Notably:

1) Schema Changes by DDL (e.g. ALTER TABLE, CREATE INDEX)

2) Recovering primary instances without failover

We use our own open source tool, OnlineSchemaChange, to do schema changes (details: https://github.com/facebook/mysql-5.6/wiki/Schema-Changes). It is heavily optimized for MyRocks use cases, such as using bulk loading for both primary and secondary keys. ALTER TABLE / CREATE INDEX support in MyRocks is limited and suboptimal: it does not support Online/Instant DDL (so writes to the same table are blocked during ALTER), and it takes the non-bulk-loading path and tries to load the entire table in one transaction, which may hit the row lock count limit or run out of memory. We have plans to improve the regular DDL paths in MyRocks in MySQL 8.0, including support for atomic, online and instant schema changes.

I am also realizing that a lot of external MySQL users still don't have auto failover and try to recover primary instances if they go down. This means single instance availability and recoverability are much more important for them. We set rocksdb_wal_recovery_mode=1 (kAbsoluteConsistency) by default in MyRocks, which actually degraded recoverability (a higher chance that the instance refuses to start even when it could be recovered from the binlog). We're changing the default to 2 (kPointInTimeRecovery) so that it is more robust without relying on replicas for recovery.

It would be a really bad experience to hit OOM because of 1) and then fail to restart because of 2). We have relationships with MariaDB and Percona, and will make the default behavior better for users.




Thanks for explaining this! Really appreciate that you joined in here.

We've been test running our real-time dwh etls on myrocks (and postgres and timescale and even innodb) to compare with our previous workhorse, tokudb. We've chewed through cpu years iterating over every switch and setting we can think of, to find the optimum config for our workloads.

For example, we've found that myrocks really slows down if you do a SELECT ... WHERE id IN (....) with too long a list of ids.
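
(A rough sketch of one common workaround for long IN lists: split the ids into smaller batches and run one query per batch. This is an assumption, not something measured here; the table name `events`, the default batch size of 1000, and the `%s` placeholder style are all hypothetical and would need tuning against a real workload and driver.)

```python
from typing import Iterator, List, Sequence, Tuple


def chunked_in_queries(
    ids: Sequence[int],
    table: str = "events",      # hypothetical table name
    batch_size: int = 1000,     # hypothetical batch size, tune per workload
) -> Iterator[Tuple[str, List[int]]]:
    """Yield (sql, params) pairs, each covering at most batch_size ids."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(ids), batch_size):
        batch = list(ids[start:start + batch_size])
        # One placeholder per id, so the driver handles quoting.
        placeholders = ", ".join(["%s"] * len(batch))
        yield f"SELECT * FROM {table} WHERE id IN ({placeholders})", batch


if __name__ == "__main__":
    for sql, params in chunked_in_queries(list(range(5)), batch_size=2):
        print(sql, params)
```

Each pair can be passed to a DB-API cursor as `cursor.execute(sql, params)`, with the results concatenated on the client side.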

So we have lots of thoughts and data points on things my team have found easy, hard, painful, better etc. I'd be happy to share with you folks.

(FWIW we are moving from tokudb to myrocks now, with tweaks to how we do data retention and gdpr depersonalization and things)

Ping me on willvarfar at google's freemail domain if that's useful!



