
The short story is that we do need high-capacity network links to function well. By "high capacity" I mean at least double-digit megabit links between your datacenters.

A query that inherently requires shuffling because the data is geographically distributed can't get around the bandwidth cost of performing that shuffle. At the very least, with the literally simplest query plan, all the raw data has to be transported to a single node/datacenter, and I doubt there's a query and network setup where that's more efficient than doing the networked shuffle itself.
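A rough back-of-envelope sketch (the datacenter count and data sizes below are illustrative, not measured):

    # Bytes that must cross inter-datacenter links for a query that
    # touches all the data, under two simple plans.
    def gather_to_one_node(n_dcs, bytes_per_dc):
        # every datacenter except the coordinator ships its full share
        return (n_dcs - 1) * bytes_per_dc

    def hash_shuffle(n_dcs, bytes_per_dc):
        # each datacenter keeps ~1/n of its rows and sends the rest
        return n_dcs * bytes_per_dc * (n_dcs - 1) / n_dcs

    for n in (3, 5, 10):
        d = 10 * 1024**3  # 10 GiB per datacenter (hypothetical)
        print(n, gather_to_one_node(n, d), hash_shuffle(n, d))

Either way roughly (N-1) x D bytes end up on the wide-area links; the shuffle just spreads that traffic across many node pairs instead of through one coordinator, which is why the bandwidth requirement doesn't go away.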

I don't think you need gigabit networks, but you're certainly going to want at least 10 megabit links. We haven't tried to benchmark scenarios where we're bandwidth-constrained, so I can't tell you precisely what the minimums are. All the cloud scenarios we've tested (on GCE, Azure, AWS, DigitalOcean) are constrained on other dimensions (e.g. CPU cores, memory, disk IO).
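For a rough sense of what those link speeds mean in practice (the 1 GiB shuffle size here is a hypothetical, not a benchmark result):

    # Time to push a given amount of shuffle traffic over one link.
    def transfer_seconds(bytes_moved, link_mbit_per_s):
        return bytes_moved * 8 / (link_mbit_per_s * 1_000_000)

    one_gib = 1024**3
    print(transfer_seconds(one_gib, 10))    # ~859 s on a 10 Mbit link
    print(transfer_seconds(one_gib, 100))   # ~86 s on a 100 Mbit link
    print(transfer_seconds(one_gib, 1000))  # ~8.6 s on a gigabit link

So on a 10 Mbit link the network quickly becomes the bottleneck for large shuffles, while at hundreds of megabits the other dimensions (CPU, memory, disk IO) tend to dominate first.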

And thank you :)



That makes sense. I think part of the reason these kinds of databases are well suited to cloud operations is the guaranteed throughput of the cloud providers' own network backbones, which is almost impossible for any single "regular" organization to match, at least at that price. I think we're at the point where running at huge scale with all these features, outside the cloud, is becoming nearly (but not completely) impossible.

Thank you very much for your detailed answer and good luck with the continued rollout!



