
It's pretty amusing that "it writes your data to disk before acknowledging the write" is something that has to be described as "impressive" and written in bold text when talking about a database. MongoDB really has lowered the bar.

That said, I love the look of Rethink and I can't wait to give it a try.




Everything you need to understand about persisting data to a physical medium has been written by Richard Hipp et al. You can't cheat. You can easily issue async write calls from your own app to a synchronized storage engine and avoid paying the cost yourself. When you're required to keep logs that government agencies will access, you need to think about what the atomicity of a transaction means in your business. Is it when it enters the building as an electrical signal, or when you flip the bits on a spinning disk?

Methinks the world has forgotten that high throughput systems existed long before the web of recent years. Most of what the web world thinks is high throughput is hilariously slow. The ability to run up another instance to scale sideways has ruined people. It doesn't scale in a linear fashion.


OP here - many NoSQL/document DBs will trade off write acks for eventual consistency. I really liked their approach of pushing toward durability by default - that in particular was the thing that impressed me, which I should have made clearer.


Does it write to a disk or to a log? If it writes to disk, it may still not be completely fail-safe unless it also writes to a log before writing to disk. Postgres, for example, has a write-ahead log, and it writes to disk before acknowledging the write.


RethinkDB has a log-structured storage engine. There is no separate log like in Postgres; the log is implicitly integrated into the storage engine. You don't have to write data twice (like you would with a traditional log), but you're still guaranteed safety in case of power failure. The design is roughly based on this paper: http://www.cs.berkeley.edu/~brewer/cs262/LFS.pdf.



So does Postgres. That alone isn't enough, because you can get situations like torn pages, where part of the database page is old data, part is new data, and nothing makes sense anymore. A log fixes that: by writing to the log first, you have a secondary source to restore from if the database page gets messed up.
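
Illustrative recovery check (made-up page layout and a hypothetical log.latest_copy_of helper, just to show why the second copy matters):

    import zlib

    def read_page(data_file, page_no, page_size, log):
        data_file.seek(page_no * page_size)
        page = data_file.read(page_size)
        stored_crc = int.from_bytes(page[-4:], "big")
        if zlib.crc32(page[:-4]) != stored_crc:   # torn write: half old bytes, half new
            page = log.latest_copy_of(page_no)    # fall back to the copy in the write-ahead log
        return page

Without the log there is nothing to fall back to, which is the torn-page problem in a nutshell.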


>Does it write to a disk or to a log? If it writes to disk, it may still not be completely fail-safe unless it also writes to a log before writing to disk.

Why are we still discussing this on a tech forum in the 21st century, in Silicon Valley? Shouldn't it (together with isolation, ACID, CAP, etc.) be basic knowledge taught in elementary school? Like: you can't expect Daddy to buy you the firetruck that Mommy promised if Mommy hasn't been able to talk to Daddy yet... though until Mommy talks to Daddy you can probably convince Daddy to buy you a railroad...


Yeah, that should be table stakes.


Except it wasn't MongoDB that did that. People had the same knock against MySQL with MyISAM.


Right, I'd be a lot more forgiving of MongoDB if they had been bringing the product to market 10-15 years earlier.


Why? It was stupid and unsafe 10-15 years ago when MySQL was doing it, too, and all the devs who had been using more mature DBs (Oracle, DB2, etc.) complained about how bad it was.


I was nodding in agreement right up until the word "Oracle". Essentially any history of databases will say that for years, Oracle was not an RDBMS even by non-strict definitions (the claim is that Ellison didn't originally understand the concept correctly), and certainly did not offer ACID guarantees.

Possibly Oracle had fixed 100% of that by the time MySQL came out, but now we're just talking about the timing of adding in safety, again -- and both IBM and Stonebraker's Ingres project (Postgres predecessor) had RDBMS with ACID in the late 1970s, and advertised the fact, so it wasn't a secret.

Except in the early DOS/Windows world, where customers hadn't learned of the importance of reliability in hardware and software, and were more concerned simply with price.

Oracle originally catered to that. MySQL did too, in some sense.

In very recent years, it appears to me that people are re-learning the same lessons from scratch all over again, ignoring history, with certain kinds of recently popular databases.


I am curious as to why. The underlying systems have only gotten more reliable and faster than they were 10-15 years ago. 10-15 years ago writing to disk was actually _more_ of a challenge than it is now, with SSDs that have zero seek time.


I don't think it's gotten any easier to verify that something was actually persisted to disk though.

The hard part has always been verifying that the data is actually persisted to the hardware. And the number of layers between you and the physical storage has increased not decreased. And the number of those layers with a tendency to lie to you has increased not decreased.

For some systems it's not considered persisted until it's been written to n+1 physical media, for exactly these reasons. The OS could be lying to you by buffering the write; the driver software for the disk could be lying to you by buffering the data; even the physical hardware could be lying to you by buffering the write.

In many ways writing may have gotten more reliable but verifying the write has gotten way harder.
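
Even the textbook userland incantation only gets you as far as the device is honest (a rough POSIX-ish sketch in Python, not a guarantee):

    import os

    def durable_write(path, data):
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
        try:
            os.write(fd, data)
            os.fsync(fd)          # ask the OS to push it to the device
        finally:
            os.close(fd)
        dfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
        try:
            os.fsync(dfd)         # persist the directory entry as well
        finally:
            os.close(dfd)

    # Even after both fsyncs, a drive with a volatile write cache can still drop the data on power loss.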


There's a lot of FUD going around when it comes to MongoDB write durability. Please read the manual.

Mongo lets the user decide whether or not to wait for fsync when writing to an individual node. This is not the default configuration. If you want it, you can enable it. You may complain that Mongo has bad defaults for your particular use case. It continues to have bad defaults to this day. Saying Mongodb is unable to acknowledge writes to disk is pure FUD.
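
For what it's worth, opting in from a driver looks roughly like this with pymongo (database and collection names are made up; j=True waits for the journal, w="majority" for replica acknowledgment):

    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://localhost:27017")
    orders = client.shop.get_collection(
        "orders",
        write_concern=WriteConcern(w="majority", j=True),  # ack only after journal + majority
    )
    orders.insert_one({"sku": "abc-123", "qty": 1})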

Let the downvotes ensue.


It's like if MySQL shipped with libeatmydata configured by default. The defaults should be safe. Mongo made not just a bad choice, but a really idiotic decision to make their default configuration non-durable.


> The defaults should be safe.

That's one opinion fitting one set of use cases. There are plenty of use cases where speed is more important than durability.

Hell, Redis default configs don't enable the append-only log, but you don't see the HN hate train jumping all over Redis. This is because Redis use cases typically don't require that level of durability.

edit for source: cmd+f for "appendonly" https://raw.githubusercontent.com/antirez/redis/2.8/redis.co...
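
(If you do want the durability, flipping it on with redis-py is a couple of config calls; "always" fsyncs every command, "everysec" is the usual compromise:)

    import redis

    r = redis.Redis(host="localhost", port=6379)
    r.config_set("appendonly", "yes")        # enable the append-only file
    r.config_set("appendfsync", "everysec")  # fsync the AOF at most once per second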


Redis doesn't market itself as a general-purpose database, more as an advanced memcached, which is why those defaults make sense. People who value performance over durability can of course change the setting, being aware of what they are getting into. That's very different from someone who doesn't realize that just because his database seems to save his data doesn't mean it won't eat it tomorrow, because he didn't know to change the configuration. I stand by my judgment of hopelessly stupid.


Well, MySQL does ship like that. Only instead of not saving your data, it will mangle it in an effort to insert it in the database somehow.


It loses data even with WriteConcern.MAJORITY[1].

Emin Gün Sirer summarized[2] it best:

> WriteConcern is at least well-named: it corresponds to "how concerned would you be if we lost your data?" and the potential answers are "not at all!", "be my guest", and "well, look like you made an effort, but it's ok if you drop it."

[1] https://aphyr.com/posts/284-call-me-maybe-mongodb

[2] http://hackingdistributed.com/2013/01/29/mongo-ft/


>Let the downvotes ensue.

Can we not have this reddit-ism take hold?


> There's a lot of FUD going around when it comes to MongoDB write durability. Please read the manual. Mongo lets the user decide whether or not to wait for fsync when writing to an individual node. [...] It continues to have bad defaults to this day. Saying Mongodb is unable to acknowledge writes to disk is pure FUD.

> Let the downvotes ensue.

There's a lot of FUD going around when it comes to Ford Model X car not having brakes enabled. Please read the manual. Ford Model X lets the user decide whether or not to enable brakes or not. [...] It continues to have bad defaults to this day. Saying Ford Model X is unable to brake is pure FUD.

Let the downvotes ensue.


Oh great, an analogy. Now we can start debating its subtleties instead of discussing the matter at hand.


You are just being childish.

Firstly, MongoDB's write durability was set to use the safest option on all of the drivers at the time, so your point makes no sense. And secondly, we aren't ignorant users of the system; we are highly technical, and as such your analogy again makes no sense.


For some reason I had it in my head that most databases don't actually block until fsync() returns; instead, the guarantee you get is that:

1. if execution continues, everything agrees on the state of the transaction

2. if execution halts, because of a crash or whatever, you'll come back online at a consistent state from the past


Typically, when you COMMIT, the changes are written to the transaction log, which is sequential, and only later written asynchronously to the data files. So you get the performance of sequential writes and the flexibility of random writes, which is nice. But once something is COMMIT'd it is permanent: it will survive any crash after COMMIT returns. If it has not yet been written to the data files, the recovery process will take care of that.
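
In sketch form (illustrative Python, not any particular engine's internals):

    import json, os

    class Engine:
        def __init__(self, log_path):
            self.log = open(log_path, "ab")
            self.dirty = []                      # committed changes not yet in the data files

        def commit(self, changes):
            self.log.write(json.dumps(changes).encode() + b"\n")
            self.log.flush()
            os.fsync(self.log.fileno())          # sequential append + fsync, then COMMIT returns
            self.dirty.append(changes)           # a checkpointer applies these to the data files later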


Haha, very true!! My #1 complaint against MongoDB was the silent data loss scenario. Anyway, I am curious what RethinkDB has to offer.


Writing to memory before writing to disk can be safe if you do it right. You need to deploy multiple instances in a replica set with a quorum threshold to guarantee safety. This is what Cassandra provides out of the box. I don't think MongoDB made it clear to its users at the start that you should never run a single database instance if you don't want to lose data.
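
With the Python driver the Cassandra knob is per statement, roughly like this (keyspace, table, and values are made up):

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    session = Cluster(["127.0.0.1"]).connect("shop")
    insert = SimpleStatement(
        "INSERT INTO orders (id, sku) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.QUORUM,  # ack only once a majority of replicas accept
    )
    session.execute(insert, (1, "abc-123"))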


There's a very marked difference between "safe" and "low probability of failure". With uncommitted writes, even with a quorum, there's still a chance that you lose the write.


I have lost plenty of data on three separate occasions with MongoDB, never running it by itself, always with at least a 3-member replica set. (This was 1-2 years ago; I'm sure it's improved.) But it's not accurate to blame the data loss issues only on documentation.



