
Re: IBM outage

https://news.ycombinator.com/item?id=23471698

TL;DR: connectivity to and from the IBM Cloud datacenters (which include SoftLayer) was generally unavailable, globally, for a couple of hours. If you were in multiple IBM datacenters, you were as down as if you were in only one. (Mostly, anyway: I was poking around as it was wrapping up, and some datacenters came back earlier than others.)

> It's the transferring of the database and maintaining a consistent version of it in both locations. Moving snapshots every X minutes doesn't maintain consistency. I would like to read about any company that is able to do this, as honestly it sounds really hard to me.

The gold standard here is two-phase commit. Of course, that adds cross-datacenter round trips to every transaction, so people tend not to do that. The close-enough version is MySQL (or other DB) replication: monitor that the replication stream is pretty current, and hope not a lot is lost when a datacenter dies. There's room to fiddle with failover and reconciliation. I recommend against automatic failover for writes, because it gets really messy if you end up in a split-brain situation: some of your hosts see one write server available and others see another, and you may accept conflicting writes. A few minutes running like that can mean days or weeks of reconciliation, if you didn't build for reconciliation.
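A minimal sketch of the two-phase commit idea, assuming hypothetical in-memory `Participant` objects that each stand in for the database in one datacenter. It shows why every site has to be reachable for any write to succeed: the coordinator needs a YES vote from all of them before anyone commits, and a single NO aborts everywhere. It deliberately omits the hard parts (durable logs, timeouts, recovering from a coordinator crash), and in a real deployment each phase costs a round trip to the farthest datacenter, which is where the per-transaction delay comes from.

```python
# Minimal two-phase commit sketch. All names are hypothetical; this is not
# any real database's API, and it ignores durability, timeouts and crashes.
from dataclasses import dataclass, field
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


@dataclass
class Participant:
    """Stands in for the database in one datacenter."""
    name: str
    healthy: bool = True
    staged: dict = field(default_factory=dict)     # txid -> pending writes
    committed: dict = field(default_factory=dict)  # visible key/value data

    def prepare(self, txid: str, writes: dict) -> Vote:
        # Phase 1: stage the writes and promise to commit them if asked.
        if not self.healthy:
            return Vote.NO
        self.staged[txid] = dict(writes)
        return Vote.YES

    def commit(self, txid: str) -> None:
        # Phase 2: make the staged writes visible.
        self.committed.update(self.staged.pop(txid))

    def abort(self, txid: str) -> None:
        # Discard staged writes, if this site staged any.
        self.staged.pop(txid, None)


def two_phase_commit(txid: str, writes: dict, participants: list) -> bool:
    """Commit on every site or on none of them."""
    # Phase 1: collect a vote from every site (one round trip each).
    votes = [p.prepare(txid, writes) for p in participants]
    if all(v is Vote.YES for v in votes):
        # Phase 2: everyone agreed, so tell every site to commit.
        for p in participants:
            p.commit(txid)
        return True
    # Any NO vote aborts the transaction everywhere.
    for p in participants:
        p.abort(txid)
    return False


east, west = Participant("east"), Participant("west")
print(two_phase_commit("tx1", {"balance": 100}, [east, west]))  # True
west.healthy = False
print(two_phase_commit("tx2", {"balance": 50}, [east, west]))   # False
print(east.committed, west.committed)  # {'balance': 100} {'balance': 100}
```

With one site down, nothing commits anywhere, which is exactly the availability cost that pushes people toward async replication instead.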

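For the replication route, a minimal sketch of the "monitor that the replication stream is pretty current" part. It assumes a hypothetical heartbeat scheme: the primary periodically writes its current timestamp into a row that replicates, and each replica's copy of that row is read as `last_heartbeat` (it also assumes the hosts' clocks roughly agree). The lag in seconds approximates how much recent write history you'd lose if the primary's datacenter died right now. In keeping with the advice against automatic write failover, `promotion_candidate` only suggests the least-lagged replica; a person still makes the call.

```python
# Replication-lag check. The heartbeat scheme and all names are hypothetical.
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReplicaStatus:
    name: str
    last_heartbeat: float  # primary's timestamp as most recently seen on this replica


def replication_lag(replica: ReplicaStatus, now: float) -> float:
    """Approximate seconds of primary history this replica has not applied yet."""
    return max(0.0, now - replica.last_heartbeat)


def lagging_replicas(replicas: List[ReplicaStatus], max_lag_seconds: float,
                     now: Optional[float] = None) -> List[str]:
    """Names of replicas far enough behind that someone should be alerted."""
    now = time.time() if now is None else now
    return [r.name for r in replicas if replication_lag(r, now) > max_lag_seconds]


def promotion_candidate(replicas: List[ReplicaStatus], now: float) -> ReplicaStatus:
    """Least-lagged replica, i.e. the one that would lose the fewest writes.
    This only suggests a candidate; an operator still decides whether to promote."""
    return min(replicas, key=lambda r: replication_lag(r, now))


replicas = [ReplicaStatus("site-a", 995.0), ReplicaStatus("site-b", 940.0)]
print(lagging_replicas(replicas, max_lag_seconds=30, now=1000.0))  # ['site-b']
print(promotion_candidate(replicas, now=1000.0).name)              # site-a
```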

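To make the split-brain cost concrete, a minimal sketch under a simplified, hypothetical key/value model (not how MySQL actually stores data): `writes_a` and `writes_b` hold the last value each side wrote per key while both were accepting writes. Keys touched by only one side merge mechanically; keys both sides changed differently land in `conflicts` for someone to resolve by hand. Real schemas are worse than this: counters, uniqueness constraints and colliding auto-increment IDs don't reduce to a per-key comparison.

```python
# Split-brain reconciliation sketch over a simplified key/value model.
# All names are hypothetical; real schemas need application-specific rules.
from typing import Dict, Tuple


def reconcile(base: Dict[str, str], writes_a: Dict[str, str],
              writes_b: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, Tuple[str, str]]]:
    """Merge writes both sides accepted during a split brain.

    base:     state before the split
    writes_a: last value written per key on side A during the split
    writes_b: same for side B
    Returns (merged, conflicts); conflicting keys keep their base value.
    """
    merged = dict(base)
    conflicts: Dict[str, Tuple[str, str]] = {}
    for key in writes_a.keys() | writes_b.keys():
        in_a, in_b = key in writes_a, key in writes_b
        if in_a and in_b and writes_a[key] != writes_b[key]:
            # Both sides changed it differently: nothing here can decide who wins.
            conflicts[key] = (writes_a[key], writes_b[key])
        else:
            # Only one side wrote it, or both wrote the same value.
            merged[key] = writes_a[key] if in_a else writes_b[key]
    return merged, conflicts


base = {"order:1": "pending", "user:7:email": "old@example.com"}
side_a = {"order:1": "shipped"}
side_b = {"order:1": "cancelled", "user:7:email": "new@example.com"}
merged, conflicts = reconcile(base, side_a, side_b)
print(merged)     # {'order:1': 'pending', 'user:7:email': 'new@example.com'}
print(conflicts)  # {'order:1': ('shipped', 'cancelled')}
```

Every entry in `conflicts` is a business decision (was order 1 shipped or cancelled?), which is why a few minutes of split brain can turn into weeks of cleanup.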
