> UPS systems will stay active for a few minutes, based on their capacity, and t...

mtravis · on Nov 25, 2013

Hi, yid. UPS protects against multiple simultaneous system crashes. A single system crash gets failed over, no problem. If both UPS systems detect their upstream PDUs as being out, then the InfiniSQL management protocol will initiate graceful shutdown, including persisting to disk. For write() issues, at least initially, I think that stuff in commodity hardware (such as ECC memory) is sufficient protection in most cases. Attaching a high end storage array, or using ZFS, also protects against low level disk problems. I don't see those problems as needing to be solved for a 1.0 release, but am very much open to contributions that address those issues any time you want!
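To make the shutdown rule above concrete, here is a minimal sketch in Python. It is not InfiniSQL code (the comment says this part is not implemented yet); the `UpsStatus` type, its `upstream_pdu_live` field, and the `should_shut_down` function are hypothetical names chosen for illustration. It only encodes the stated condition: shut down gracefully when every UPS reports its upstream PDU as out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class UpsStatus:
    """Hypothetical status report from one UPS (not an InfiniSQL type)."""
    name: str
    upstream_pdu_live: bool  # True while the PDU feeding this UPS has power


def should_shut_down(statuses: Sequence[UpsStatus]) -> bool:
    """Return True only when every UPS reports its upstream PDU as out.

    With no reports at all we return False, on the assumption that
    missing information should not trigger a shutdown by itself.
    """
    if not statuses:
        return False
    return not any(s.upstream_pdu_live for s in statuses)


if __name__ == "__main__":
    both_out = [UpsStatus("ups-a", False), UpsStatus("ups-b", False)]
    one_out = [UpsStatus("ups-a", True), UpsStatus("ups-b", False)]
    print(should_shut_down(both_out))
    print(should_shut_down(one_out))
```

While only one feed is out, the cluster keeps running on the other; the persist-to-disk step would start only once the function returns True.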

The fundamental insight about not needing transaction logs is pretty simple actually: if the power is guaranteed to either stay on, or to allow the system to quiesce gracefully, then the cluster will not suddenly crash. That's the motivator for transaction logs--to make sure that the data will still be there if the system suddenly crashes. Get rid of the need for transaction logs, get rid of the transaction logs.

Regarding consensus, I expect that there will be a quorum protocol in use amongst an odd number greater than 2 of manager processes, each with redundant network and power. But the specific protocol I haven't ironed out. If there's something I can grab off the shelf then it may be preferable to implementing from scratch, but I haven't gotten there yet.
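Since the specific protocol is still open, the following only sketches the majority arithmetic behind "an odd number greater than 2 of manager processes". The function names `quorum_size` and `has_quorum` are assumptions for illustration, and this is plain majority counting, not a full consensus protocol such as Paxos or Raft.

```python
from typing import Iterable


def quorum_size(n_managers: int) -> int:
    """Smallest strict majority for an odd manager count greater than 2."""
    if n_managers < 3 or n_managers % 2 == 0:
        raise ValueError("manager count must be odd and greater than 2")
    return n_managers // 2 + 1


def has_quorum(votes: Iterable[bool], n_managers: int) -> bool:
    """True when the number of agreeing managers reaches a majority.

    `votes` holds one boolean per manager that responded; managers that
    did not respond are simply absent and count as not agreeing.
    """
    agreeing = sum(1 for v in votes if v)
    return agreeing >= quorum_size(n_managers)


if __name__ == "__main__":
    # Three managers, one unreachable: two agreeing votes still decide.
    print(has_quorum([True, True], 3))
    # Five managers, only two agree: no decision.
    print(has_quorum([True, True, False], 5))
```

An odd count means a network split can never leave two halves that each hold a majority, which is presumably why an odd number is called for.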

This stuff hasn't been implemented yet, but the core around which it can be implemented has been.

Do I sense a volunteer? ;-)

yid · on Nov 25, 2013

I feel like you're banking a little too heavily on external measures for error protection/durability like UPSes and ECC memory, rather than embracing the inevitable fact that corruption and failures will occur. In short, I'd really need to see a white paper on why you're not reinventing the wheel before I'd use InfiniSQL.

Kudos for engaging the community though; please do keep us posted as you progress.

mtravis · on Nov 25, 2013

Actually, there's precious little that an application can do if a memory chip fails, or if ECC gets too many corrupted bits. If it gets too many corrupted bits, the kernel will generally do something like halt the system. I am not familiar with any application which does a write-read-write (or similar) to memory, but would be curious to learn about them. I'm sure such an algorithm can also be used in InfiniSQL.

I am familiar with applications such as IBM WebSphere MQ, which does a triple-write to disk for every transaction to overcome corruption problems. But even IBM will recommend turning that parameter off for performance reasons if the storage layer performs that kind of verification, such as HDS or EMC arrays. So, if an InfiniSQL user wants that level of storage protection, they can buy it.
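For readers wondering what application-level write verification might look like, here is a rough Python sketch of a write, sync, read-back, and compare loop. It is a generic technique, not WebSphere MQ's actual triple-write mechanism and not anything in InfiniSQL; `write_verified` and its `attempts` parameter are hypothetical.

```python
import hashlib
import os


def write_verified(path: str, data: bytes, attempts: int = 3) -> bool:
    """Write `data` to `path`, then read it back and compare checksums.

    Retries up to `attempts` times and returns True on the first match.
    Caveat: the read-back may be served from the OS page cache, so this
    catches some software-level corruption but does not prove the bytes
    on the physical medium are correct.
    """
    expected = hashlib.sha256(data).digest()
    for _ in range(attempts):
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the device
        with open(path, "rb") as f:
            if hashlib.sha256(f.read()).digest() == expected:
                return True
    return False


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "txn.bin")
        print(write_verified(target, b"commit record"))
```

Every call pays for an fsync and a full re-read, which illustrates why the advice is to turn such checks off when the storage layer already does the verification.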

Regarding InfiniSQL's planned use of UPS systems, that's pretty much the identical design to how the above-mentioned storage arrays protect block storage from power loss. I'm just moving the protection up to the application level.

Thanks for the conversation and thoughtful comments. Please go to http://www.infinisql.org/community/ and find the project on Twitter, sign up for the newsletter, hit me on LinkedIn.

----

Hi, yid. For some reason I can't reply to your last comment. I think I didn't explain things well enough--I do plan to implement synchronous replication to protect against single replica failure. I describe the plan somewhat in http://www.infinisql.org/docs/overview/

Much of the code for the replication is in place (and it actually worked a few thousand lines of code ago) but it's currently not functional.

I have a big backlog to work on, but n-way synchronous replication is definitely in there.

yid · on Nov 25, 2013

> Actually, there's precious little that an application can do if a memory chip fails, or if ECC gets too many corrupted bits.

That's where replication and (distributed) consensus come in, usually at the application level.

spof3 · on Nov 26, 2013

Or Resilient Distributed Datasets https://www.usenix.org/system/files/conference/nsdi12/nsdi12...

MichaelGG · on Nov 25, 2013

Well, one example is that bug that caused a panic on a certain date (I think it may have been leap year related) - machines just die then. The only way to recover from that would be some neat reboot system that recovers the data from RAM. But all systems have data loss, so this problem affects any main-memory system.

You may get more volunteers by publishing a paper outlining the core concept. I clearly remember reading things like the H-Store paper, or the Dremel paper, and saying "damn, this makes sense and is really cool". Implementation details can be worked out and engineering approaches tried. But the underlying concept should be clear.

mtravis · on Nov 25, 2013

I'll consider this--meantime, will you please follow me on Twitter and/or sign up for the newsletter, so that you can be informed when the document is ready?

ams6110 · on Nov 26, 2013

UPSs don't always work as expected.

noise · on Nov 26, 2013

For example, I have real-life experience from this event:

http://programming.oreilly.com/2007/07/365-main-datacenter-p...

And that was a top-tier datacenter at the time. Good luck doing better on your own, and just punt if you are using the cloud.

mtravis · on Nov 26, 2013

I totally agree--the architecture I'm calling for is to have redundant UPSs, each managed by InfiniSQL processes--for ultimate availability. If people just want high performance but want to live with a datacenter / cloud provider's ability to maintain power, then I want to support them in that, too.

But I've suffered power outages in data centers, and they'll eventually come around to bite everybody.