It's pretty amusing that "it writes your data to disk before acknowledging the write" is something that has to be described as "impressive" and written in bold text when talking about a database. MongoDB really has lowered the bar.
That said, I love the look of Rethink and I can't wait to give it a try.
Everything you need to understand about persisting data to a physical medium has been written by Richard Hipp et al. You can't cheat. You can easily make async write calls from your own app to a synchronised storage engine without your app absorbing the cost. When you have to keep written logs that are accessed by government agencies, you need to think about what atomicity means for a transaction in your business. Is it when the data enters the building as an electrical signal, or when you flip the bits on a spinning disk?
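For anyone who hasn't seen "don't acknowledge until it's on disk" spelled out, here is a minimal sketch, assuming a POSIX-like OS where `os.fsync` asks the kernel to push a file's data to the device. `durable_write` is a hypothetical helper, not any database's API. It skips flushing the directory entry of a newly created file, and a drive's own volatile cache can still lie underneath `fsync`.

```python
import os


def durable_write(path: str, data: bytes) -> int:
    """Append `data` to `path` and return only after fsync reports it flushed.

    Only after this returns is it reasonable to acknowledge the write to a client.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(data)
        while view:
            n = os.write(fd, view)  # os.write may write fewer bytes than asked
            view = view[n:]
        os.fsync(fd)  # block until the kernel says the data reached the device
    finally:
        os.close(fd)
    return len(data)
```

The cost the comment talks about is the `fsync` call: it is the slow part, and skipping it is how a system gets fast acks that can evaporate on power loss.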
Methinks the world has forgotten that high throughput systems existed long before the web of recent years. Most of what the web world thinks is high throughput is hilariously slow. The ability to run up another instance to scale sideways has ruined people. It doesn't scale in a linear fashion.
OP here - many NoSQL/document DBs will trade off write acks for eventual consistency. I really liked their approach of pushing toward durability by default; that in particular was what impressed me, and I should have made that clearer.
Does it write to a disk or to a log? If it writes to disk, it may still not be completely fail-safe unless it also writes to a log before writing to disk. Postgres, for example, has a write-ahead log, and it flushes that log to disk before acknowledging the write.
RethinkDB has a log-structured storage engine. There is no separate log like in Postgres, the log is implicitly integrated into the storage engine. You don't have to write data twice (like you would with a traditional log), but you're still guaranteed safety in case of power failure. The design is roughly based on this paper: http://www.cs.berkeley.edu/~brewer/cs262/LFS.pdf.
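To make "the log is the storage engine" concrete, here is a toy log-structured key-value store. This is NOT RethinkDB's engine; the `LogStore` class, the record layout, and the CRC-per-record scheme are assumptions for illustration. Every write is an append followed by `fsync`, an in-memory index points at each key's newest record, and on startup the log is rescanned and any half-written tail (say, from a power cut) is dropped.

```python
import os
import struct
import zlib
from typing import Dict, Optional

# Hypothetical on-disk record: [crc32][key length][value length][key][value].
# The CRC covers the two lengths and the payload.
_CRC = struct.Struct(">I")
_LENS = struct.Struct(">II")
_HDR = _CRC.size + _LENS.size


class LogStore:
    """Toy log-structured key-value store: the log *is* the data file."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.index: Dict[bytes, int] = {}  # key -> offset of its newest record
        self.end = self._recover()  # byte offset where the next record will go
        self.f = open(path, "ab")

    def _recover(self) -> int:
        """Rebuild the index by scanning the log; cut off any torn tail."""
        if not os.path.exists(self.path):
            return 0
        with open(self.path, "rb") as f:
            data = f.read()
        pos = 0
        while pos + _HDR <= len(data):
            (crc,) = _CRC.unpack_from(data, pos)
            klen, vlen = _LENS.unpack_from(data, pos + _CRC.size)
            start = pos + _HDR
            body = data[start : start + klen + vlen]
            lens = data[pos + _CRC.size : start]
            if len(body) != klen + vlen or zlib.crc32(lens + body) != crc:
                break  # incomplete or corrupt record: treat it as never written
            self.index[body[:klen]] = pos  # later records for a key win
            pos = start + klen + vlen
        if pos < len(data):
            os.truncate(self.path, pos)  # drop the torn tail so new appends stay readable
        return pos

    def put(self, key: bytes, value: bytes) -> None:
        lens = _LENS.pack(len(key), len(value))
        body = key + value
        record = _CRC.pack(zlib.crc32(lens + body)) + lens + body
        self.f.write(record)
        self.f.flush()
        os.fsync(self.f.fileno())  # durable before the write is acknowledged
        self.index[key] = self.end
        self.end += len(record)

    def get(self, key: bytes) -> Optional[bytes]:
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset + _CRC.size)
            klen, vlen = _LENS.unpack(f.read(_LENS.size))
            return f.read(klen + vlen)[klen:]

    def close(self) -> None:
        self.f.close()
```

Because old records are never overwritten in place, a crash can only damage the tail, which is why no separate log is needed. A real engine also has to garbage-collect superseded records, which this sketch skips.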
So does Postgres. That alone isn't enough, because you can get situations like torn pages, where part of a database page is old data and part is new data and nothing makes sense anymore. A log fixes that: you write to the log first, so if the database page gets mangled you have a secondary source to restore it from.
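Here is a sketch of the torn-page problem and the log-based fix, assuming each page carries a CRC32 of its contents and that a full copy of the page was written to the log before the in-place write. PostgreSQL's `full_page_writes` setting is the real-world analogue, but this is not its on-disk format; `seal`, `read_page`, and the 16-byte `PAGE_SIZE` are hypothetical.

```python
import zlib
from typing import Dict, List

PAGE_SIZE = 16  # hypothetical tiny page size so the bytes are easy to inspect


def seal(payload: bytes) -> bytes:
    """Build a page: 4-byte big-endian CRC32 of the body, then the zero-padded body."""
    if len(payload) > PAGE_SIZE:
        raise ValueError("payload larger than a page")
    body = payload.ljust(PAGE_SIZE, b"\x00")
    return zlib.crc32(body).to_bytes(4, "big") + body


def is_intact(page: bytes) -> bool:
    """A torn page (half old bytes, half new) will almost surely fail the CRC check."""
    return len(page) == 4 + PAGE_SIZE and int.from_bytes(page[:4], "big") == zlib.crc32(page[4:])


def read_page(page_no: int, data_pages: List[bytes], logged_images: Dict[int, bytes]) -> bytes:
    """Return a page body, repairing it from the log's full-page image if it is torn."""
    page = data_pages[page_no]
    if not is_intact(page):
        page = logged_images[page_no]  # copy that reached the log before the in-place write
        data_pages[page_no] = page     # write the good copy back over the torn one
    return page[4:]
```

A torn page here would be the first half of the new write glued onto the second half of the old page: the checksum no longer matches the body, and the logged image replaces it. Without the logged copy you could detect the damage but not undo it.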
>Does it write to a disk or to a log? If it writes to disk, it may still not be completely fail-safe unless it also writes to a log before writing to disk.
Why are we still discussing this on a tech forum in the 21st century, in Silicon Valley? Shouldn't it (together with isolation, ACID, CAP, etc.) be basic knowledge taught in elementary school? Like: you can't expect Daddy to buy you the firetruck Mommy promised if Mommy hasn't been able to talk to Daddy yet... though until Mommy talks to Daddy, you can probably convince Daddy to buy you a railroad...
Why? It was stupid and unsafe 10-15 years ago when MySQL was doing it, too, and all the devs who had been using more mature DBs (Oracle, DB2, etc.) complained about how bad it was.
I was nodding in agreement right up until the word "Oracle". Essentially any history of databases will say that for years Oracle was not an RDBMS even by non-strict definitions (the claim is that Ellison didn't originally understand the concept correctly), and it certainly did not offer ACID guarantees.
Possibly Oracle had fixed 100% of that by the time MySQL came out, but now we're just talking about the timing of adding in safety, again -- and both IBM and Stonebraker's Ingres project (Postgres predecessor) had RDBMS with ACID in the late 1970s, and advertised the fact, so it wasn't a secret.
Except in the early DOS/Windows world, where customers hadn't learned of the importance of reliability in hardware and software, and were more concerned simply with price.
Oracle originally catered to that. MySQL did too, in some sense.
In very recent years, it appears to me that people are re-learning the same lessons from scratch all over again, ignoring history, with certain kinds of recently popular databases.
I am curious as to why. The underlying systems have only gotten more reliable and faster than they were 10-15 years ago. Back then, writing to disk was actually _more_ of a challenge than it is now, with SSDs that have no seek time.
I don't think it's gotten any easier to verify that something was actually persisted to disk though.
The hard part has always been verifying that the data is actually persisted to the hardware. And the number of layers between you and the physical storage has increased not decreased. And the number of those layers with a tendency to lie to you has increased not decreased.
For some systems it's not considered to be persisted until it's been written to n+1 physical media for exactly these reasons. The OS could be lying to you by buffering the write, and the driver software for the disk could be lying to you as well by buffering the data. Even the physical hardware could be lying to you by buffering the write.
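A rough sketch of "not persisted until every copy is flushed", assuming POSIX semantics (opening a directory to `fsync` it fails on Windows) and that each target path lives on a different physical device. `write_replicated` and `fsync_dir` are hypothetical helpers. They can only get past the OS's buffering; if the drive acknowledges from its own volatile cache, no amount of `fsync` from here will reveal it, which is exactly the verification problem described above.

```python
import os
from typing import List


def fsync_dir(dir_path: str) -> None:
    """Flush a directory so a new or renamed entry in it survives a crash (POSIX-only)."""
    fd = os.open(dir_path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def write_replicated(paths: List[str], data: bytes) -> int:
    """Write `data` to every path; report success only after each copy is fsync'd."""
    for path in paths:
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
            f.flush()             # Python's buffer -> OS page cache
            os.fsync(f.fileno())  # OS page cache -> device, as far as the OS can tell
        os.replace(tmp, path)     # atomic rename within one filesystem
        fsync_dir(os.path.dirname(os.path.abspath(path)))  # make the rename itself durable
    return len(paths)  # number of copies the OS has confirmed
```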
In many ways writing may have gotten more reliable but verifying the write has gotten way harder.
There's a lot of FUD going around when it comes to MongoDB write durability. Please read the manual.
Mongo lets the user decide whether or not to wait for fsync when writing to an individual node. This is not the default configuration. If you want it, you can enable it. You may complain that Mongo has bad defaults for your particular use case. It continues to have bad defaults to this day. Saying MongoDB is unable to acknowledge writes to disk is pure FUD.
It's as if MySQL shipped with libeatmydata enabled by default. The defaults should be safe. Mongo made not just a bad choice, but a really idiotic decision to make their default configuration non-durable.
That's one opinion fitting one set of use cases. There are plenty of use cases where speed is more important than durability.
Hell, Redis's default config doesn't enable the append-only log, but you don't see the HN hate train jumping all over Redis. This is because Redis use cases typically don't require that level of durability.
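To make the speed-versus-durability knob concrete, here is a hypothetical append-only log with a configurable fsync policy. The policy names are loosely modeled on Redis's `appendfsync` setting (`always`, `everysec`, `no`), but this class is not Redis, and it uses a record count instead of a one-second timer so the behaviour is deterministic. `unsynced` counts writes that were already acknowledged but could still vanish in a power cut.

```python
import os


class AppendOnlyLog:
    """Append-only log with a configurable fsync policy (illustrative only).

    "always": fsync after every record (slowest; no acknowledged record is at risk)
    "batch":  fsync once every `batch` records (faster; up to batch-1 acked records at risk)
    "never":  leave flushing entirely to the OS
    """

    def __init__(self, path: str, policy: str = "always", batch: int = 100) -> None:
        if policy not in ("always", "batch", "never"):
            raise ValueError(f"unknown policy: {policy}")
        self.f = open(path, "ab")
        self.policy, self.batch = policy, batch
        self.unsynced = 0  # acknowledged records that may not be on disk yet
        self.fsyncs = 0    # how many (expensive) fsync calls we paid for

    def append(self, record: bytes) -> None:
        self.f.write(record + b"\n")
        self.f.flush()  # hand the bytes to the OS; not yet durable
        self.unsynced += 1
        if self.policy == "always" or (self.policy == "batch" and self.unsynced >= self.batch):
            os.fsync(self.f.fileno())
            self.fsyncs += 1
            self.unsynced = 0

    def close(self) -> None:
        self.f.close()
```

Writing ten records with `batch=4` costs two fsyncs instead of ten, at the price of two acknowledged records that a crash could still lose. Whether that trade is acceptable is the use-case question being argued here.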
Redis doesn't market itself as a general purpose database, more as an advanced memcached, which is why those make sense. People who value performance over durability can of course change the setting, being aware of what they are getting into. That's very different from someone who doesn't realize that just because his database seems to save his data doesn't mean it won't eat it tomorrow because he didn't know to change the configuration. I stand by my judgment of hopelessly stupid.
> WriteConcern is at least well-named: it corresponds to "how concerned would you be if we lost your data?" and the potential answers are "not at all!", "be my guest", and "well, look like you made an effort, but it's ok if you drop it."
> There's a lot of FUD going around when it comes to MongoDB write durability. Please read the manual. Mongo lets the user decide whether or not to wait for fsync when writing to an individual node. [...] It continues to have bad defaults to this day. Saying Mongodb is unable to acknowledge writes to disk is pure FUD.
> Let the downvotes ensue.
There's a lot of FUD going around when it comes to Ford Model X car not having brakes enabled. Please read the manual. Ford Model X lets the user decide whether or not to enable brakes. [...] It continues to have bad defaults to this day. Saying Ford Model X is unable to brake is pure FUD.
Firstly, MongoDB's write durability was set to use the safest option on all of the drivers at the time, so your point makes no sense. Secondly, we aren't ignorant users of the system. We are highly technical, and as such your analogy again makes no sense.
Typically when you COMMIT, the changes will be written to the transaction log, which is sequential, and then later written asynchronously to the data files. So you get the performance of sequential writes and the flexibility of random writes, which is nice. But once something is COMMIT'd it is permanent; it will survive any crash after COMMIT returns. If it has not yet been written to the data files, the recovery process will do that.
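A minimal sketch of that commit path, assuming JSON lines for the log and a single JSON file standing in for the data files (`TinyWALStore` and its method names are hypothetical, not any real database's API). `commit` appends to the sequential log and fsyncs before returning; the data file is only rewritten by `checkpoint`; on startup the log is replayed, so every commit that returned survives, while a half-written final line is treated as never committed.

```python
import json
import os


class TinyWALStore:
    """Commit = append to a sequential log and fsync; the data file is updated lazily."""

    def __init__(self, wal_path: str, data_path: str) -> None:
        self.wal_path, self.data_path = wal_path, data_path
        self.data = self._load_data()
        self._replay_log()  # crash recovery
        self.wal = open(wal_path, "a", encoding="utf-8")
        self.checkpoint()   # fold replayed commits into the data file, drop any torn tail

    def _load_data(self) -> dict:
        if not os.path.exists(self.data_path):
            return {}
        with open(self.data_path, encoding="utf-8") as f:
            return json.load(f)

    def _replay_log(self) -> None:
        # Entries are idempotent "set key to value" records, so replaying one that
        # already reached the data file is harmless.
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as f:
            for line in f:
                if not line.endswith("\n"):
                    break  # torn final write: that COMMIT never returned
                entry = json.loads(line)
                self.data[entry["key"]] = entry["value"]

    def commit(self, key: str, value) -> None:
        self.wal.write(json.dumps({"key": key, "value": value}) + "\n")
        self.wal.flush()
        os.fsync(self.wal.fileno())  # COMMIT returns only once the log record is durable
        self.data[key] = value       # in-memory state; the data file catches up later

    def checkpoint(self) -> None:
        tmp = self.data_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.data_path)  # data file now reflects every commit...
        self.wal.truncate(0)             # ...so only now is it safe to discard the log
        os.fsync(self.wal.fileno())

    def close(self) -> None:
        self.wal.close()
```

The ordering in `checkpoint` matters: the data file is made durable before the log is truncated. Doing it the other way round would open a window where a crash loses data that was already reported as committed.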
Writing to memory before writing to disk can be safe if you do it right. You need to deploy multiple instances in a replica set with a quorum threshold to guarantee safety. This is what Cassandra provides out of the box. I don't think MongoDB made it clear to its users from the start that you should never run a single database instance if you don't want to lose data.
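Here is a toy version of the quorum rule, with in-memory `Replica` objects standing in for nodes (`Replica` and `quorum_write` are hypothetical names; this is not Cassandra's or MongoDB's actual protocol). It leaves out timeouts, retries, and repair. Note that a majority ack from replicas that only hold the write in memory lowers the probability of losing it; it does not make the write safe.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Replica:
    name: str
    up: bool = True
    store: Dict[str, str] = field(default_factory=dict)

    def apply(self, key: str, value: str) -> bool:
        """Pretend to persist the write; a down replica never acknowledges."""
        if not self.up:
            return False
        self.store[key] = value
        return True


def quorum_write(replicas: List[Replica], key: str, value: str) -> bool:
    """Acknowledge to the client only if a strict majority of replicas acknowledged."""
    acks = sum(r.apply(key, value) for r in replicas)
    return acks >= len(replicas) // 2 + 1
```

With three replicas the write succeeds with one node down and fails with two down. In the failing case the write still landed on the surviving replica, and deciding what to do with that partial write is a large part of what real replication protocols deal with.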
There's a very marked difference between "safe" and "low probability of failure". With uncommitted writes, even with a quorum, there's still a chance that you lose the write.
I have lost plenty of data on three separate occasions with MongoDB, never running it by itself, always with at least a 3-member replica set. (This was 1-2 years ago; I'm sure it's improved.) But it's not accurate to blame the data loss issues only on documentation.