> UPS systems will stay active for a few minutes, based on their capacity, and the manager process will gracefully shut down each daemon and write data to disk storage. This will ensure durability--even against power failure or system crash--while still maintaining in-memory performance.
How does a UPS ensure durability against system or program crashes, disk corruption in large clusters, and other failures that can affect a simple write()?
> The real killer for database performance is synchronous transaction log writes. Even with the fastest underlying storage, this activity is the limiting factor for database write performance. InfiniSQL avoids this limiting factor while still retaining durability
How do you plan to implement this (since it appears it hasn't been implemented)? What is your fundamental insight about synchronous transaction logs that makes InfiniSQL capable of being durable while (presumably) not having a synchronously written transaction log? If your answer is the UPS, please see my first question.
Edit: I don't see any mention of Paxos anywhere. Could you explain what you're using for consensus?
Hi, yid. UPS protects against multiple simultaneous system crashes. A single system crash gets failed over, no problem. If both UPS systems detect their upstream PDUs as being out, then the InfiniSQL management protocol will initiate a graceful shutdown, including persisting to disk. For write() issues, at least initially, I think that stuff in commodity hardware (such as ECC memory) is sufficient protection in most cases. Attaching a high-end storage array, or using ZFS, also protects against low-level disk problems. I don't see those problems as needing to be solved for a 1.0 release, but am very much open to contributions that address those issues any time you want!
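To make the shutdown trigger described above concrete, here is a minimal sketch of the decision only. It assumes each UPS reports a simple on-battery flag; the function name `should_quiesce` and the UPS names are hypothetical and do not come from InfiniSQL.

```python
def should_quiesce(ups_on_battery):
    """Decide whether to start a graceful shutdown.

    ups_on_battery maps a UPS name to True when that UPS reports its
    upstream PDU feed as out (hypothetical status format).
    Returns True only when every known UPS is running on battery.
    """
    # With no status reports at all, do not trigger a shutdown.
    if not ups_on_battery:
        return False
    return all(ups_on_battery.values())


if __name__ == "__main__":
    # One feed lost: keep running on the surviving feed.
    print(should_quiesce({"ups-a": True, "ups-b": False}))
    # Both feeds lost: quiesce and persist to disk.
    print(should_quiesce({"ups-a": True, "ups-b": True}))
```

A real manager would also need to weigh remaining battery runtime against the time needed to flush data, which this sketch leaves out.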
The fundamental insight about not needing transaction logs is pretty simple actually: if the power is guaranteed to either stay on, or to allow the system to quiesce gracefully, then the cluster will not suddenly crash. That's the motivator for transaction logs--to make sure that the data will still be there if the system suddenly crashes. Get rid of the need for transaction logs, get rid of the transaction logs.
Regarding consensus, I expect that there will be a quorum protocol in use amongst an odd number greater than 2 of manager processes, each with redundant network and power. But the specific protocol I haven't ironed out. If there's something I can grab off the shelf then it may be preferable to implementing from scratch, but I haven't gotten there yet.
This stuff hasn't been implemented yet, but the core around which it can be implemented has been.
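For illustration, the quorum rule mentioned above (an odd number of managers greater than 2) reduces to a strict-majority check. This is only a sketch of the counting rule, not a consensus protocol such as Paxos, and the name `has_quorum` is hypothetical.

```python
def has_quorum(votes_for, cluster_size):
    """Return True when votes_for is a strict majority of cluster_size.

    Assumes the manager count is odd and greater than 2, as described
    in the comment above; anything else is rejected.
    """
    if cluster_size < 3 or cluster_size % 2 == 0:
        raise ValueError("expected an odd number of managers greater than 2")
    if not 0 <= votes_for <= cluster_size:
        raise ValueError("votes_for must be between 0 and cluster_size")
    # Strict majority: more than half of all managers, not just of those reachable.
    return votes_for > cluster_size // 2


if __name__ == "__main__":
    print(has_quorum(2, 3))  # two of three managers agree
    print(has_quorum(2, 5))  # two of five managers agree
```

With an odd cluster size, two disjoint groups can never both hold a majority, which is why the manager count is kept odd.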
I feel like you're banking a little too heavily on external measures for error protection/durability like UPSes and ECC memory, rather than embracing the inevitable fact that corruption and failures will occur. In short, I'd really need to see a white paper on why you're not reinventing the wheel before I'd use InfiniSQL.
Kudos for engaging the community though; please do keep us posted as you progress.
Actually, there's precious little that an application can do if a memory chip fails, or if ECC gets too many corrupted bits. If it gets too many corrupted bits, the kernel will generally do something like halt the system. I am not familiar with any application which does a write-read-write (or similar) to memory, but would be curious to learn about them. I'm sure such an algorithm can also be used in InfiniSQL.
I am familiar with applications such as IBM WebSphere MQ which does a triple-write to disk for every transaction, to overcome corruption problems. But even IBM will recommend turning that parameter off for performance reasons if the storage layer performs that kind of verification, such as HDS or EMC arrays. So, if an InfiniSQL user wants that level of storage protection, they can buy it.
Regarding InfiniSQL's planned use of UPS systems, that's pretty much the identical design to how the above-mentioned storage arrays protect block storage from power loss. I'm just moving the protection up to the application level.
Thanks for the conversation and thoughtful comments. Please go to http://www.infinisql.org/community/ and find the project on Twitter, sign up for the newsletter, or hit me up on LinkedIn.
----
Hi, yid. For some reason I can't reply to your last comment. I think I didn't explain things well enough--I do plan to implement synchronous replication to protect against single replica failure. I describe the plan somewhat in http://www.infinisql.org/docs/overview/
Much of the code for the replication is in place (and it actually worked a few thousand lines of code ago) but it's currently not functional.
I have a big backlog to work on, but n-way synchronous replication is definitely in there.
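As a rough sketch of what n-way synchronous replication means here: a write is acknowledged only after it has been applied to every reachable replica, so a single replica failure leaves the data intact elsewhere. The `Replica` class and `replicated_write` function are hypothetical in-memory stand-ins, not InfiniSQL code, and skipping a failed replica is a simplifying assumption (a real system would need failover and resynchronization).

```python
class Replica:
    """Hypothetical in-memory replica used only for this sketch."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.data = {}

    def apply(self, key, value):
        # Simulate an unreachable replica by raising an error.
        if not self.healthy:
            raise ConnectionError(f"replica {self.name} is unreachable")
        self.data[key] = value


def replicated_write(replicas, key, value):
    """Apply a write to every reachable replica before acknowledging it.

    Returns the names of the replicas that hold the write. Raises
    RuntimeError if no replica accepted it, so the caller never gets
    an acknowledgement for data that exists nowhere.
    """
    acked = []
    for replica in replicas:
        try:
            replica.apply(key, value)
            acked.append(replica.name)
        except ConnectionError:
            # Assumption: a failed replica is skipped rather than blocking the write.
            continue
    if not acked:
        raise RuntimeError("no replica accepted the write")
    return acked


if __name__ == "__main__":
    replicas = [Replica("a"), Replica("b", healthy=False), Replica("c")]
    print(replicated_write(replicas, "k", 1))
```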
Well, one example is that bug that caused a panic on a certain date (I think it may have been leap year related) - machines just die then. The only way to recover from that would be some neat reboot system that recovers the data from RAM. But all systems have data loss, so this problem affects any main-memory system.
You may get more volunteers by publishing a paper outlining the core concept. I clearly remember reading things like the H-Store paper, or the Dremel paper, and saying "damn, this makes sense and is really cool". Implementation details can be worked out and engineering approaches tried. But the underlying concept should be clear.
I'll consider this--meantime, will you please follow me on Twitter and/or sign up for the newsletter, so that you can be informed when the document is ready?
I totally agree--the architecture I'm calling for is to have redundant UPSes, each managed by InfiniSQL processes--for ultimate availability. If people just want high performance but want to live with a datacenter / cloud provider's ability to maintain power, then I want to support them in that, too.
But I've suffered power outages in data centers, and they'll eventually come around to bite everybody.