> we set out to build and deliver Google’s infrastructure to everyone else
This statement rings pretty true as Kubernetes (also known as k8s) has some Google biases. Not all cloud providers will have such an easy time providing all the infrastructure necessary to run CoreOS + k8s smoothly.
For example, Kubernetes assigns each Pod (k8s unit of computation) an IP address, which is only simple to do if your cloud provider supplies something like a /24 private block to your nodes. CoreOS came up with the VXLAN-based Flannel project to make this model more portable[0], but Layer 2 over Layer 3 isn't something I'd like to throw haphazardly into my production environments. Google Compute Engine conveniently provides this setup as an option.
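A rough sketch of the model, with made-up CIDRs and node IPs rather than anything Kubernetes or Flannel actually does in code: each node owns a /24 slice of a cluster-wide pod range, and something -- GCE's programmable routes, or an overlay like Flannel -- has to map each slice back to the node that owns it.

    package main

    import (
    	"fmt"
    	"net"
    )

    // Illustration only: hypothetical node IPs and an example 10.244.0.0/16 pod range.
    func main() {
    	nodeIPs := []string{"10.0.0.11", "10.0.0.12", "10.0.0.13"}

    	for i, hostIP := range nodeIPs {
    		// Node i serves its pods out of 10.244.i.0/24.
    		podSubnet := net.IPNet{
    			IP:   net.IPv4(10, 244, byte(i), 0),
    			Mask: net.CIDRMask(24, 32),
    		}
    		// One route per node, however many pods it runs. If the provider
    		// can't carry these routes, an overlay has to encapsulate instead.
    		fmt.Printf("dst %-16s -> via node %s\n", podSubnet.String(), hostIP)
    	}
    }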
Another example of Google-favoritism is the strong preference for centralized storage--particularly GCEPersistentDisk. At first I was concerned about centralized storage by default, since we know disk locality is a Good Thing (TM), but after reading a paper that claimed networking is improving faster than disks are[2], I felt somewhat better about this. However, it's still pretty obvious that a Google Persistent Disk is the way to go with k8s[3].
That said, I'm really happy that Google has open-sourced this project because it is indeed a functioning, tested, and easy-to-use distributed system. I'm sure that the devs aren't aggressively shutting out other cloud providers and that these biases are probably just a side-effect of their resource allocation process and the problems that they intend to solve (e.g. GCEPersistentDisk used to be a core type instead of a module--it has since gotten better). It's still important to evaluate a technology's biases and potential evolution before throwing your product on it.
I'm working on adding AWS support for Kubernetes. Just last week I finished Load-Balancer (ELB) & Persistent Storage (EBS) support, and they're currently going through the pull-request review process. Once they merge (I'd guess a week or two?), AWS will be on par with Google Compute Engine feature-wise.
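To give a flavor of what that support wraps, these are roughly the raw AWS calls involved -- shown with aws-sdk-go and made-up names, ports, and IDs, not the actual Kubernetes provider code:

    package main

    import (
    	"fmt"

    	"github.com/aws/aws-sdk-go/aws"
    	"github.com/aws/aws-sdk-go/aws/session"
    	"github.com/aws/aws-sdk-go/service/ec2"
    	"github.com/aws/aws-sdk-go/service/elb"
    )

    func main() {
    	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))

    	// A Service exposed externally needs (roughly) a classic ELB that
    	// forwards the service port to a port on the worker nodes.
    	elbClient := elb.New(sess)
    	_, err := elbClient.CreateLoadBalancer(&elb.CreateLoadBalancerInput{
    		LoadBalancerName:  aws.String("k8s-example-service"), // hypothetical name
    		AvailabilityZones: aws.StringSlice([]string{"us-east-1a"}),
    		Listeners: []*elb.Listener{{
    			Protocol:         aws.String("TCP"),
    			LoadBalancerPort: aws.Int64(80),
    			InstanceProtocol: aws.String("TCP"),
    			InstancePort:     aws.Int64(30080), // hypothetical node port
    		}},
    	})
    	fmt.Println("create ELB:", err)

    	// Persistent storage needs (roughly) an EBS volume created in the
    	// node's AZ and attached to the instance that will run the pod.
    	ec2Client := ec2.New(sess)
    	vol, err := ec2Client.CreateVolume(&ec2.CreateVolumeInput{
    		AvailabilityZone: aws.String("us-east-1a"),
    		Size:             aws.Int64(10), // GiB
    	})
    	if err == nil {
    		_, err = ec2Client.AttachVolume(&ec2.AttachVolumeInput{
    			VolumeId:   vol.VolumeId,
    			InstanceId: aws.String("i-0123456789abcdef0"), // hypothetical worker node
    			Device:     aws.String("/dev/xvdf"),
    		})
    	}
    	fmt.Println("create/attach EBS:", err)
    }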
I have found the Kubernetes team to be nothing other than extremely supportive of efforts to support AWS & non-Google platforms. It takes a little longer to translate some of the Google-isms to other platforms, but I'm glad there's real thinking behind those decisions, versus just adopting the lowest common denominator.
I'm mostly concerned about feature lag and vendor lock-in, so I'm happy to hear that this will be out so soon. I'm excited to try it out.
> I have found the Kubernetes team to be nothing other than extremely supportive of efforts to support AWS & non-Google platforms.
I don't doubt it one bit; in my experience, people on the Kubernetes IRC channel have always been really helpful and supportive. I just tend to be a little more pessimistic when it comes to resource allocation: a Google team probably prioritizes support for Google platforms, and that's no one's fault or foul play.
About disk locality -- I've read that paper and know that Google increasingly has the philosophy that disk locality is irrelevant.
However, I don't buy it for 2 reasons:
1. Highly available distributed services need to have geographical diversity, i.e. they should be "multihomed". This is true on AWS or in Google's internal data centers. That means you have WAN latency, in which case locality again becomes the primary design concern for performance (see the first sketch after this list).
Pre-Spanner, Google's solution was to use application-specific logic to be multihomed -- i.e. nearly rewrite your application, depending on how stateful it is. Spanner isn't a silver bullet either. You still have to solve latency problems, just within the ontology of Spanner rather than the application.
It's bad for your code to ignore latency within the data center, and then later add (incorrect) hacks to work around latency between data centers. If you pay attention to network boundaries from the beginning, it will be easier to multi-home.
2. A single machine is still your domain of failure. Even if it doesn't matter for performance, you still have to think about individual machines in order to handle failures.
The interfaces between machines should be idempotent so that failures can be handled gracefully, and many distributed storage services have complicated performance vs. durability knobs controlling how many machines/processes must have accepted a write (the second sketch after this list gestures at that knob).
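To put rough, entirely illustrative numbers on point 1: a request that makes a chain of dependent reads is fine at intra-DC round-trip times and hopeless at WAN round-trip times.

    package main

    import (
    	"fmt"
    	"time"
    )

    // Back-of-the-envelope latency math for point 1. The RTTs and the number
    // of dependent reads are made up, but representative.
    func main() {
    	const sequentialReads = 20 // dependent lookups one user request might make

    	intraDC := 500 * time.Microsecond // same data center
    	crossDC := 70 * time.Millisecond  // cross-region / WAN

    	fmt.Println("same DC: ", time.Duration(sequentialReads)*intraDC) // ~10ms, easy to ignore
    	fmt.Println("cross DC:", time.Duration(sequentialReads)*crossDC) // ~1.4s, dominates everything
    }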
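And for point 2, the kind of knob I mean, as a toy sketch rather than any real storage system's API: the write path decides how many replica acks count as "durable", and retries only stay safe if the write is idempotent.

    package main

    import "fmt"

    // Toy model only: a write is reported durable once writeQuorum of the
    // replicas have acknowledged it. Tuning that number is the usual
    // performance-vs-durability trade-off; keying the write makes retries
    // after a failure idempotent.
    type replica struct{ name string }

    func (r replica) ack(key, value string) bool {
    	// In a real system this is an RPC that can fail or time out.
    	return true
    }

    func write(replicas []replica, writeQuorum int, key, value string) bool {
    	acks := 0
    	for _, r := range replicas {
    		if r.ack(key, value) {
    			acks++
    		}
    		if acks >= writeQuorum {
    			return true // "durable enough" by this configuration
    		}
    	}
    	return false
    }

    func main() {
    	replicas := []replica{{"a"}, {"b"}, {"c"}}
    	// writeQuorum=1 is fast but loses data if that one machine dies;
    	// writeQuorum=3 is safest but as slow as the slowest replica.
    	fmt.Println(write(replicas, 2, "user:42", "hello"))
    }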
So I think Google does have a "single system image" bias, and you are right that Kubernetes has these Google-isms in its architecture.
I have serious trouble with [2]. Disks not evolving as fast as the network? Under what rock have they been living?
The paper seems to peg local disk bandwidth at 150 MB/s, and then compare it to remote network disk access at... 150 MB/s. NVMe is going to grant us 2.2 GB/s of bandwidth and 450K IOPS (from a single consumer-grade product), so that paper is off by more than an order of magnitude. Local disk is non-volatile storage sitting a PCIe lane away from your CPU. I just don't see how disk locality is not going to be crucial for many workloads, for decades to come.
In 2020 a flash-only SAN isn't going to deliver 20 Gbit/s to each of 100 blades in the rack. A 4TB NVMe card on each blade will, though...
Look at Intel's latest Xeon-D SoC: yeah, it's got dual 10GbE, but you're not going to get 7.7 GB/s over that... [1]
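Putting the numbers in this thread side by side, as back-of-the-envelope arithmetic rather than measurements:

    package main

    import "fmt"

    // Figures taken from this thread; arithmetic only, not a benchmark.
    func main() {
    	const (
    		blades         = 100
    		perBladeGbit   = 20.0 // 20 Gbit/s promised to every blade
    		localNVMeGBps  = 2.2  // one consumer NVMe card, GB/s
    		dualTenGbEGBps = 2.5  // 2 x 10GbE NIC, theoretical max in GB/s
    	)

    	aggregate := blades * perBladeGbit / 8 // Gbit/s -> GB/s
    	fmt.Printf("SAN + fabric must sustain ~%.0f GB/s for the rack\n", aggregate)
    	fmt.Printf("local NVMe: %.1f GB/s per blade, ~%.0f GB/s aggregate, no shared fabric\n",
    		localNVMeGBps, localNVMeGBps*blades)
    	fmt.Printf("dual 10GbE tops out around %.1f GB/s per blade\n", dualTenGbEGBps)
    }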
You should look at their measurements and assumptions in context. As you can see from the URL, it was written in 2011 when the NVMe working group was first formed. It was also written in the context of cluster-based applications in a data center and specifically mentions SSD and cost effectiveness. Storage cost effectiveness is critical at these scales because your data is growing by terabytes per day.
You also mention blades, which brings up the next point of context: operations like Google and Facebook don't use blades the way your average enterprise does, because they aren't leasing rack space or working within a limited physical footprint. They don't need the same U-to-performance ratio, so they can save money by using commodity hardware. Their applications also scale readily, so losing entire boxes is meaningless within a certain threshold.
Each pod has its own IP address that is routable anywhere in the cluster. This makes life much easier because you don't have to do port forwarding onto the host node.
In all current k8s setups, each Minion/Worker node has a subnet that it allocates these Pod IP addresses out of. This isn't necessarily a hard requirement, but it tends to be much easier to make work, since you only have O(Workers) routes to configure instead of O(Pods). Long term, though, I think we'd rather do away with per-node subnets and simply allocate an IP address for each Pod individually.
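To make the O(Workers) vs. O(Pods) difference concrete (cluster size is just illustrative):

    package main

    import "fmt"

    // Illustrative numbers only: how many routes the fabric has to carry
    // under each scheme.
    func main() {
    	const (
    		workers     = 100
    		podsPerNode = 30
    	)
    	fmt.Println("subnet per worker:", workers, "routes")             // O(Workers)
    	fmt.Println("address per pod:  ", workers*podsPerNode, "routes") // O(Pods)
    }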
[0] https://github.com/coreos/flannel
[1] https://github.com/GoogleCloudPlatform/kubernetes/blob/maste...
[2] http://www.eecs.berkeley.edu/~ganesha/disk-irrelevant_hotos2...
[3] Do you see any other providers here? https://github.com/GoogleCloudPlatform/kubernetes/tree/maste...