>A database that crashes should not be corrupt!!! Isn't this reliability 101? Why doesn't myrocks have chaos monkey stress testing etc etc?
Because Facebook has little incentive to ensure that RocksDB works well in your use case. MyRocks was built for Facebook and anything that Facebook doesn’t do probably isn’t particularly hardened. They aren’t going to invest time doing chaos monkey stress testing on codepaths they don’t use. Things like durability might not be super important to them because they will make it up in redundancy.
I remember being burned by something similar during the early days of Cassandra. I’m sure Cockroach has hit the same bugs.
The problem is that the program runs out of RAM. The challenge is to write the data and metadata in such a way that a program crashing at any point for any reason is recoverable.
This is the basic promise of the Durability in ACID, and people using MyRocks expect it.
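To be concrete about what "recoverable at any point" means, here is a minimal Python sketch of the usual write-temp, fsync, rename ordering. It is purely illustrative: `atomic_write` is a name I made up, and this is not how MyRocks or RocksDB persists anything. It only shows the discipline of never exposing a half-written file.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so a crash leaves either the old or the new file.

    Hypothetical helper for illustration only; assumes a POSIX-like filesystem
    where renaming within one directory is atomic.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the bytes to disk before they become visible
        os.replace(tmp_path, path)  # atomically swap the new file into place
    except BaseException:
        # On any failure before the swap, discard the temp file and keep the old data.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # fsync the directory so the rename itself survives power loss (where supported).
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

If the process dies before `os.replace`, the old file is untouched; if it dies after, the new file is complete. A real storage engine needs far more than this (logging, checksums, recovery code), but the ordering idea is the same.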
Rather than actually making sure that MyRocks is durable, they simply slap on a 'max transaction rows' limit to make it unlikely you run out of RAM. When you hit the limit you just get an error, so you can't do stuff like ALTER TABLE or UPDATE on large tables.
Of course it's easy to run out of RAM despite these thresholds, and when you google the error messages it's easy to find advice that leads you to raise the thresholds, and even to set a 'bulk load' flag that disables various checks you probably haven't investigated.
The whole approach is wrongheaded!
A database that crashes should not be corrupt!!! Isn't this reliability 101? Why doesn't myrocks have chaos monkey stress testing etc etc?
</exasperated ranting>