This is a great post. The author explains his methodology for testing this behavior (including a wrong turn at the beginning, which was instructive); does thorough research; demonstrates striking performance graphs; and provides a simple and effective solution to the problem.
I'm not convinced I like the methodology, but I do like the question.
I've done some similar testing of my own to measure the performance impact of using libraries' mechanisms for flushing writes to disk, and I found I could get pretty decent performance.
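The kind of measurement I mean can be sketched roughly like this (the record format, counts, and file layout are made up for illustration; a real test would use the library's own write path, not plain file I/O):

```python
import os
import tempfile
import time


def write_records(path, n, fsync_each):
    """Write n small records; optionally flush+fsync after each one."""
    with open(path, "wb") as f:
        for i in range(n):
            f.write(b"key%d=value%d\n" % (i, i))
            if fsync_each:
                f.flush()
                os.fsync(f.fileno())


def bench(n=200):
    """Time buffered writes vs. fsync-per-write over n records."""
    results = {}
    for fsync_each in (False, True):
        fd, path = tempfile.mkstemp()
        os.close(fd)
        start = time.perf_counter()
        write_records(path, n, fsync_each)
        results["fsync" if fsync_each else "buffered"] = time.perf_counter() - start
        os.remove(path)
    return results


times = bench()
print(times)
```

On a spinning disk the fsync-per-write mode is typically orders of magnitude slower; on some filesystems or with write caching the gap narrows, which is exactly why this kind of benchmark tells you about performance but not about actual durability.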
However, this type of testing can prove that something isn't durable; failing to observe a durability failure doesn't prove that it is durable.
The only answer I'd really trust would be something along the lines of SQLite's testing strategy. Those guys get it right.
They've done tech talks and the like on it, but I'd strongly recommend that anyone intending to write software read this: http://www.sqlite.org/testing.html
This topic is mostly section 3.3, but there's a lot to test.
Judging from my experience storing millions of keys in Tyrant, I would say it's very durable. While this is a great blog post, I think the author misses that Tyrant supports master-master replication and "hot" backups, which make Tyrant _much_ more durable. Writing failover code is also pretty easy for a master-master system (much easier than for a master-slave system like MySQL, which for the most part requires manual work, while failover for a master-master system can be automated).
Would the result still hold for single-threaded applications? If so, it is related to something other than locking during sync. It would be reasonable to expect better throughput from batched syncing, since the disk head would thrash less.
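A single-threaded batched-sync test could be sketched like this (the batch size, record format, and counts are arbitrary; the point is one fsync per batch instead of one per record):

```python
import os
import tempfile


def write_with_batched_sync(path, records, batch_size):
    """Write records sequentially, fsyncing once per batch rather
    than after every record, so the disk head seeks less often."""
    with open(path, "wb") as f:
        for i, rec in enumerate(records, 1):
            f.write(rec)
            if i % batch_size == 0:
                f.flush()
                os.fsync(f.fileno())
        # Sync any trailing partial batch before closing.
        f.flush()
        os.fsync(f.fileno())


records = [b"rec%06d\n" % i for i in range(1000)]
fd, path = tempfile.mkstemp()
os.close(fd)
write_with_batched_sync(path, records, batch_size=100)
size = os.path.getsize(path)
os.remove(path)
print(size)
```

The trade-off is the usual one: with a batch size of 100 you can lose up to 100 records in a crash, in exchange for roughly 100x fewer sync operations.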