My team runs a grand total of 1 server, 1 daemon and up to 5 concurrent batch jobs. Sometimes we bring up 1 extra server for experimentation.
It is handy to use GKE autoscaling for the batch jobs and not break the bank. Sure, we could set up some batch-job manager on raw VMs, or spend some time selecting and learning yet another AWS service.
It is handy to go to the GKE "Workloads" page and see how many servers are currently active, and how many pods each one is using. Sure, we could get the same view with raw VMs, but it would take a few extra minutes or a brittle script to pull it off.
Do we _need_ Kubernetes? No. Does it make us more productive? Yes. Am I confident that in my next assignment I'll be able to reuse these skills, regardless of which VM provider they use? 90% yes, pending EKS becoming usable.
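For what it's worth, that "Workloads" view boils down to one API call. A rough sketch with the official kubernetes Python client - assuming a configured kubeconfig; the "default" namespace is just an example:

    # pip install kubernetes
    from kubernetes import client, config

    config.load_kube_config()  # reads ~/.kube/config
    apps = client.AppsV1Api()

    # Roughly what the "Workloads" page shows: each deployment and how
    # many of its pods are currently ready.
    for d in apps.list_namespaced_deployment("default").items:
        ready = d.status.ready_replicas or 0
        print(f"{d.metadata.name}: {ready}/{d.spec.replicas} pods ready")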
Echoes of "you don't need C, just use Assembly, it's simpler and faster" from the early-'90s gaming industry.
I think the recommendation is not to use another orchestration service, but to keep the infrastructure simple and not use any orchestration or service discovery at all.
Adding service discovery and container orchestration will probably not make your product better. Instead, it will add more moving parts that can fail and make your operations more complex. So IMO a "containerized microservice architecture" is not a "feature" you should add to your stack just because; it is a feature you add once the benefits outweigh the costs, which only happens at huge scale.
Most people know that "Google {does, invented} containers". What not so many developers seem to realize is that a Google Borg "container" is a fundamentally different thing from a Docker container. Borg containers at Google are not really a dependency management mechanism; they solve the problem of scheduling heterogeneous workloads, developed by tens of thousands of engineers, on the same shared compute infrastructure. This, however, is a problem that most companies simply do not have, because they are not running at the required scale. At reasonable scale, buying a bunch of servers will always be cheaper than employing a Kubernetes team.
And if you do decide to add a clever cluster scheduler to your system, it will probably not improve reliability; it will actually do the opposite. Even something like Borg is not a panacea: you will occasionally get weird interactions when two incompatible workloads happen to be scheduled on the same hardware, i.e. operational problems you would not have had without the "clever" scheduler. So again, unless you need it, you shouldn't use it.
I think the problem that Docker does solve for small startups is that it gives you a repeatable and portable environment. It makes it easy to create an image of your software that you are sure will run on the target server, without having to talk to the sysadmins or ops department. But you don't need Kubernetes to get that. And while I can appreciate this benefit of using Docker, I still think shipping a whole Linux environment for every application is not the right long-term way to do it. It does not really "solve" the problem of reliable deployments on Linux; it is just a (pretty nice) workaround.
Without service discovery, you end up generating configuration all over the place just to wire your services together, telling every component which IP address and port to find the others on - super tedious. Add in multi-tenancy, and you're in a bunch of pain. Now try to take that setup and deploy it on-prem for customers that can't use cloud services - you rapidly want something that abstracts away the cluster and its wiring at the address-allocation level.
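To make that concrete: without discovery, every environment (and every tenant) needs something like the following generated and kept up to date by hand - the names and addresses here are made up:

    # Per-environment wiring, regenerated whenever anything moves.
    SERVICES = {
        "orders":   ("10.1.4.17", 8080),
        "payments": ("10.1.4.22", 8081),
        "search":   ("10.1.5.3", 9200),
    }

    def endpoint(name: str) -> str:
        host, port = SERVICES[name]
        return f"http://{host}:{port}"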
I'm not sure I understand this. You have a product that is split into many different components, and when you deploy it to a customer site the components run on different hosts, so you have a bunch of service-address wiring to do for every deployment?
Could something like mDNS be a lightweight solution to that problem?
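Something along these lines, using the python-zeroconf library - the service name and address are made up, and this is only a sketch of the idea:

    import socket
    from zeroconf import Zeroconf, ServiceInfo

    # Service side: announce yourself on the local network via mDNS.
    info = ServiceInfo(
        "_http._tcp.local.",
        "orders._http._tcp.local.",
        addresses=[socket.inet_aton("192.168.1.10")],  # made-up address
        port=8080,
    )
    zc = Zeroconf()
    zc.register_service(info)

    # Client side (any peer on the LAN): look the service up by name
    # instead of carrying a hardcoded IP:port in its config.
    found = Zeroconf().get_service_info("_http._tcp.local.",
                                        "orders._http._tcp.local.")
    if found:
        print(found.parsed_addresses(), found.port)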
And I am also genuinely curious how Kubernetes would solve that. When you install Kubernetes on all of these machines, don't you still have to do that configuration manually? So isn't it just a choice to configure Kubernetes rather than your own application for each deployment? If it really is that much simpler to set up a Kubernetes cluster than your app, maybe the solution is to put some effort into the configuration management part of your product?
Based on your comments throughout this entire thread, I'm not sure you understand the value k8s proposes.
Managing many nodes is the reason orchestration software exists. Your suggestion to "put some effort into configuration management" is effectively naive, homegrown orchestration.
Build or buy? That's the same argument - except k8s is free and available from many providers.
Ah, if you look under the covers, Kubernetes service discovery is actually kube-proxy programming kernel-space virtual load balancers (iptables or IPVS rules) on each node in your cluster. The DNS record just points to a virtual ClusterIP that those rules intercept.
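E.g., from inside a pod, assuming a Service named "my-service" in the "default" namespace:

    import socket

    # The cluster DNS record for a Service resolves to its ClusterIP - a
    # virtual IP that doesn't belong to any pod. kube-proxy's iptables/IPVS
    # rules on the node intercept traffic to it and forward it to one of
    # the backing pods.
    ip = socket.gethostbyname("my-service.default.svc.cluster.local")
    print(ip)  # e.g. 10.96.0.42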
If your docker containers all build off of the same OS and then you run them on a host using that OS, won't the containers share the lower filesystem layers with the host?
Services use an internal tool called Apollo to describe the service, deploy it to EC2 VMs, and auto-scale. Apollo inspired AWS CodeDeploy and AWS AutoScaling.
Services are reachable via load balancer sets similar to AWS ELB/ALB/NLB.
You reach a service by the LB-set's DNS name.
If you don't want to use AWS or AWS-specific deployment tools, you could use Puppet Pipelines to do the same on Google Cloud or Azure. Puppet Pipelines was built by people who previously built some of those internal Amazon tools and offers similar functionality but cross-cloud.
And if you want even fewer moving parts, just go PaaS or serverless.