Google admits Kubernetes container tech is too complex (theregister.com)
573 points by pjmlp on Feb 26, 2021 | 432 comments



I work on a 3-person DevOps team that just finished migrating ~20 services from GCE VMs running docker-compose to GKE.

It's taken us a little over a year. Partly because K8s has a steep learning curve, but also because safely transitioning services without disrupting product teams adds a lot of overhead.

The investment is already yielding great returns. Developers are happy. Actual quote: "Kubernetes is the biggest quality-of-life improvement I've experienced in my career."

The same is true for me as a DevOps engineer. I never have to write a script to orchestrate a rolling deployment ever again. I'll never have to touch Puppet or wrangle OS upgrades. I no longer need to use Terraform to scale out a service.

Now that we've completed this transition, one of my goals for the next year is to get the team to a place where we spend 75% of our time on higher-level projects -- working on developer tools and Kubernetes extensions and setting site reliability standards, instead of the firefighting and operational work that has traditionally taken up so much of our time. Couldn't be happier about this change.

... That is until management decides we're overhead and fires us to free up budget for feature developers. ;)


... Until they hire you back because entropy exists :P

I'm on the DevOps side as well, going through the same transition. K8s also allows insane customization, and I have some colleagues who are unintentionally delaying our rollout so they can play around with developing more tooling for deployments, which is really frustrating. The k8s scene seems to be filled with constant scope creep and refactoring to get things just perfect before use. Either way, I agree the benefits far outweigh this annoyance. I'm so excited to get to spend my time developing tooling instead.

However, I don't think we're entirely free from managing servers the old way with Chef / Puppet / Ansible. Unless you're purely hosted, there's still the rule of thumb that you shouldn't run services that hold state in k8s. But with persistent volumes I do see that changing, though I'm not sure everyone agrees that's a good idea.


"I have some colleagues that are delaying our rollout unintentionally so they can play around with developing more tooling for deployments which is really frustrating."

My impression is that the primary purpose of Kubernetes is to give SRE teams political air cover to rewrite a lot of their existing processes. Whether Kubernetes is actually required for that, or even net superior seems questionable. This unsexy work becomes justifiable because it's coupled to a mainstream accepted tech modernization.

You see this same phenomenon with database migrations, where what the team really needed to do was just rewrite an app to use the existing database properly. But no one is going to approve that work. So what happens is people convince themselves that the existing tech sucks and use that to rationalize doing the rewrite. The result doesn't always end up net superior, because sure, you did the rewrite, but you are also eating the operational cost of integrating a new technology into the org.


Yes, the number of proposed db switches I've seen is remarkably high. I once interviewed for a role as a database developer and was confused to find out they didn't have the database that the role pertained to. One of the early questions in the interview was how quickly I could migrate production from MS SQL to Postgres. Needless to say, that was a gigantic red flag, and I hope they found the right person for that job.

I’ve also seen a switch from rdbms to Hadoop because a company had “millions” of rows. Luckily on this one I only had to rewrite a handful of queries.


I've got relatively modestly-specced SQL Servers handling tables with hundreds of millions and even billions of rows without breaking a sweat. Somebody either just really wanted a new toy to play with, or has no idea what indexes are.


Exactly, I've seen SQL Servers handle billions of rows with 2,000 columns. I also think people who work too long at one company don't realize how problems are solved elsewhere.


> I’ve also seen a switch from rdbms to Hadoop because a company had “millions” of rows. Luckily on this one I only had to rewrite a handful of queries.

Wat. That's gross. It probably costs them more per query now than the rdbms did.


Didn’t even think about that part because I don’t know too much about Hadoop other than it seemed impractical.


I guess millions sounded like a lot to a decision maker :)


> So what happens is people convince themselves that the existing tech sucks and use that to rationalize doing the rewrite.

That certainly is a thing that happens, but you could use that to dismiss any technology at all. In the case of Kubernetes, it makes operations a lot easier to the (important) effect that the development teams can do a lot of their own operations work. This is important since they're the ones who are empowered to solve operations problems and it also eliminates the blame game between ops and dev. Further, it eliminates a lot of coordination with a separate ops team--the dev teams aren't competing to get time from an ops team; they can solve their own problems, especially the most common ones. This also has the nice property of freeing the SREs to work on high-level automation, including integrating tools from the ecosystem (e.g., cert-manager, external-dns, etc).

Kubernetes certainly isn't the final stage in the evolution, but it's a welcome improvement.


> but you could use that to dismiss any technology at all

No, you can't; you need three (-ish) factors:

1. The technology is sufficiently incompatible with what you're currently using that you need a rewrite to use it (eg, this generally doesn't happen with gcc -> llvm, for example).

2. The technology is sufficiently (faux-)popular that it's possible to convince a pointy-haired boss that you need to switch to it (eg, this won't work with COBOL anymore, though unfortunately its successor Java is still going).

3. The technology sucks.

And really, if you want to dismiss a technology, point 3 ought to be enough all on its own (particularly since that's presumably the reason you want to dismiss that technology).


I think in your eagerness to 'gotcha' me, you missed my point. :)

Anyway, we're trying to assess Kubernetes' value proposition (i.e., to answer "does it suck?"). If your system for answering that question depends on already knowing the answer, it's not a very useful system.


> we're trying to assess Kubernetes' value proposition (i.e., to answer "does it suck?").

Well, I'm not, since I already know that, but if you don't know that yet, then your position makes more sense. (That is, using "dismiss" in the sense of finding out that it sucks, rather than (as I read it) in the sense of justifying a refusal to use technology that you already know sucks.)

Unfortunately, due to market-for-lemons dynamics, it's usually not possible to convey knowledge that a particular technology sucks until things have already gone horribly wrong. See eg COBOL or (the Java-style corruption of) Object Oriented Programming.


> you are also eating the operational cost of integrating a new technology into the org.

and in short order you will reap the savings of being able to hire people who already know your devops/infra tech stack, and can hit the ground running. not to mention being able to benefit from the constant improvements that come from outside your org.


Maybe. I don't buy that just because people are on Kubernetes they won't still kludge it up with custom in-house scripts or "extensions". Give it time.


Ha, are you me? I really pushed for us to follow the "change-as-little-as-possible and ship to prod quickly" route. Prod is where things get hard, and it's better to find out what's hard sooner rather than later.

We are running a handful of stateful services in K8s (things like MongoDB for which GCP doesn't have a compelling and affordable managed offering). It's definitely more complex than transitioning a stateless service, but so far our experiences with StatefulSets and PersistentVolumes have been good. And this allows us to sunset Puppet/OS management completely. I should note that we _are_ being extremely careful about backups. We also run each stateful service in a dedicated node pool for isolation. Who knows, maybe a year from now we'll be shaking our heads and saying "that was a TERRIBLE idea" but for now, so far so good.
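
For a concrete picture, here's a minimal sketch of that pattern; the names, image, and sizes are invented, and the node pool label is the standard GKE one:

  apiVersion: apps/v1
  kind: StatefulSet
  metadata:
    name: mongodb
  spec:
    serviceName: mongodb
    replicas: 3
    selector:
      matchLabels:
        app: mongodb
    template:
      metadata:
        labels:
          app: mongodb
      spec:
        # keep the stateful workload on its own node pool for isolation
        nodeSelector:
          cloud.google.com/gke-nodepool: stateful-pool
        containers:
          - name: mongodb
            image: mongo:4.4
            volumeMounts:
              - name: data
                mountPath: /data/db
    volumeClaimTemplates:        # each replica gets its own PersistentVolume
      - metadata:
          name: data
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 100Gi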

We're running on GKE, so lots of things that would be hard in on-prem environments (ingress, networking, storage) are easy.


> We're running on GKE, so lots of things that would be hard in on-prem environments (ingress, networking, storage) are easy.

Agreed. The on-prem story is still really messy, but I think there's a lot of third-party work to build on-prem distributions that are cut and dry. Unfortunately, there are lots of them right now and it's not clear what the advantages and pitfalls are of each. Things will settle and this problem will be solved with time, but for now it's quite a pain point.


Management never hires people back, that's admitting failure.

They hire other people with supposedly the same skill stack and then have them rebuild it from scratch.


"Kubernetes is the biggest quality-of-life improvement I've experienced in my career."

Can you elaborate? What exactly about Kubernetes improved the developers' lives?


1. Reliable rolling deployments.

We didn't have these before. Yes, you can implement them without K8s (I have, at other companies), but to get the full set of features K8s provides, such as deploying N services in parallel, taking no more than X% of your capacity offline at a time, short-circuiting in the event the app is dead on arrival, and connection draining with timeouts, you end up with a VERY complicated multi-threaded codebase. (A sketch of these knobs follows at the end of this comment.)

2. Seamless horizontal scale-out.

Want to scale up your app in a test environment from 3 replicas to 6 to do some performance testing? This used to be a DevOps ticket that would take a few days -- the DevOps engineer tries running Terraform, but oh no, a CentOS package update seems to have broken our Puppet manifests and we have to fix that first. Now the developer makes a PR to the GitOps repo where they adjust a single YAML setting (see the sketch at the end of this comment).

3. GitOps/ArgoCD.

ArgoCD is possibly my favorite piece of software of all time. It provides incredible visualizations of what's happening with your Kubernetes infrastructure. It really increases your confidence and trust in the system to be able to watch a deployment rollout or scaling operation happen in real time. ArgoCD makes it spectacularly obvious when something has gone wrong -- you still sometimes need to go spelunking through the GCP console or use kubectl to inspect resources, but to a much lesser degree. I cannot emphasize enough how magical it is.

These are all things our initial implementation has delivered. We are also planning to leverage K8s for things like on-demand pull request preview environments, hosted developer environments, and canary deploys, which are MUCH harder to implement in a world without K8s (trust me, I've done it).
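
To make items 1 and 2 above concrete, here's a hedged sketch of the Deployment fields involved; the names, percentages, and image are all invented:

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: my-service
  spec:
    replicas: 3                      # the single setting a developer bumps to 6 for a load test
    strategy:
      type: RollingUpdate
      rollingUpdate:
        maxUnavailable: 25%          # never take more than X% of capacity offline
        maxSurge: 25%                # extra pods allowed above the desired count during rollout
    minReadySeconds: 10              # a new pod must stay Ready this long before it counts
    progressDeadlineSeconds: 600     # short-circuit: mark the rollout failed if it stalls
    selector:
      matchLabels:
        app: my-service
    template:
      metadata:
        labels:
          app: my-service
      spec:
        terminationGracePeriodSeconds: 30   # drain window before old pods are killed
        containers:
          - name: app
            image: registry.example.com/my-service:1.2.3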


Yeah, this list is not surprising, and none of the things you mentioned exclusively requires Kubernetes or running containers in prod. Like I mentioned in another comment, it feels like the purpose of Kubernetes is that its "brand" provides political cover to introduce these practices to the engineering org.


I don't know what to tell you except that with years of experience implementing this infrastructure outside Kubernetes, letting Kubernetes handle it is cheaper. That doesn't mean Kubernetes is a good fit for every organization or workload.

You can roll your own anything with enough time and manpower. Whether it makes sense to do so depends on your circumstances.


Well for example, I'm a lead devops on a team of 2-3. We have 6 core services.

For the last several years, due to static load we've been fine with 2 instances per service. Now that we want autoscale (even though all the real load is really just the DB), it seems we could get ourselves onto AWS autoscale in ~1 month, though it would require some coding.

Spending 3 DevOps engineers for 1 year on something is a red flag to me. If your changes save devs 1 hour per deploy, it'll pay off 6,000 deploys from now.


It's not the only thing we've been doing, just the most important.

I think it's great you guys have 2.5 dedicated people operating 6 services across 12 prod instances; with that kind of ratio our team would be 15 people!


I agree with your point overall, I just want to add that some DevOps efforts aren't to save DevOps time, but to prevent errors.

When you make a change manually, there's a chance that you forget something, have a typo, etc. Those problems disappear when those same things are automated.


None of it requires Kubernetes; Kubernetes just does it out of the box better than any hand-rolled custom stack that tries to do the same. It's a true joy to work with compared to the cobbled-together stuff I've built and/or used in the past.


I agree that ArgoCD is amazing. What's so beautiful is how well it integrates into K8s. Every element of ArgoCD is a CRD (custom resource definition). It's less of a piece of software (other than the interface) and more of an extension to K8s to provide Continuous Delivery. Saying ArgoCD does not require K8s is like saying you don't need Photoshop to run a Photoshop plugin.

I've used Puppet, Chef, and Ansible in production over the last decade. For me, Ansible replaced Puppet and Chef five or six years ago. For the last two years the only thing I have used Ansible for is to manage desktops and Android boxes AT HOME. Configuration Management systems like Ansible are fun and cool, but have been mostly obsoleted by the combo of Terraform and K8s. If you're on AWS, Kops can replace a lot of what Terraform does.

Kubernetes makes running services easier and more reliable. I've spent more time learning Istio than K8s itself. The learning curve of K8s is minimal compared to Drupal.


It is not a brand; it's when all the capabilities come together in a single package, with one general way to do things, that you get the benefits. You could do all this with IaaS, but there were too many ways to do it and not enough lines in the sand between dev and ops in the IaaS world. IaaS should be considered a legacy approach.


You’re using hosted k8s! That explains everything! I was so confused how a devops person wasn’t cursing the name. Implementing and managing k8s is where all the complexity and headache lives.


Updated original comment to clarify :)

Although, to be fair, the article is about GKE.


Have you seen Kops? Rancher? Loft? I don't think it's that terrible to manage k8s at all using the tools available to you. K8s definitely used to be difficult to manage, but that isn't the case anymore.

But literally every cloud provider out there has a managed solution so at this point you really only need to do it for DC work or if you like to do it.

Amazon - EKS
Google - GKE
Azure - AKS

anything beyond those 3 is a rounding error but...

Linode - LKE
DigitalOcean - DigitalOcean Kubernetes
IBM - IBM Cloud Kubernetes Service
Rackspace - KAAS


I've used Rancher, and while it is very nice, it still punts a lot of the hard bits (interacting with the outside network, VM creation, and storage, especially shared persistent storage) to the ops team to figure out.

Creating a big ol' cluster of app servers, which is the part k8s does really well, was never really the hard ops problem.


Creating the cluster of servers isn't the value of kubernetes. We all had clusters of servers well before k8s, openstack, and things like AWS.

The benefit of k8s is the orchestration of those clusters. Spinning up 6 new http servers and getting them added to the load balancer automatically. Generating 4 new memcached nodes and getting them registered in DNS so clients pick them up and add them to the hashring.

The benefit of k8s is the scaling and elastic capabilities. It can trigger vertical scaling by spinning up larger pods or horizontal scaling by adding/removing pods.

Anyone thinking that people are using k8s because it can create app servers doesn't understand why anyone is using k8s. If all we needed to do is create a cluster of app servers, we wouldn't be using k8s.

That being said, a cluster of app servers still needs orchestration and config management and we had a ton of crazy solutions for that prior to k8s.


I previously managed my company's Kubernetes clusters using Kops on AWS. My company switched to Google, so I had to move everything to GKE. I miss Kops; it gave me more control and made life easier. I don't want to use the old crappy deprecated DNS server, but with GKE that's what you get. I don't get to control what goes on the masters of my clusters because they're not mine anymore. Having GKE manage the masters means more headaches and complexity because I no longer have control. I love K8s, but GKE does a pretty poor job implementing it. It boggles my mind that the main company behind K8s is so incompetent at implementing it.


It's a bit of a pain, but you can definitely switch to CoreDNS on GKE if you need to. Also, as someone who has used GKE, EKS, MicroK8s, and Minikube, I would say the easiest-to-use implementation of Kubernetes as a Service is GKE. Not saying it's easy or the world's most thought-out product, but I gotta give Google a tiny bit of credit for at least being easier to use than the competition.


If your company is implementing and managing their own k8s clusters they are doing it wrong. Use a cloud hosted solution.


Can only speak for myself personally, but having gone through a similar moment, it was 50% docker QoL improvements and 50% k8s QoL improvements -- most significantly around prod/dev parity. Every engineer in my org really did gain a significant amount of confidence around being able to spin up and deploy new services (when necessary) without being nervous about something going wrong in prod that wasn't configured properly in a lower environment.


I'm really happy with the local development story too. We use skaffold to pick and choose which resources we run locally, and then we use the same definition to generate our artifacts which keeps things in line.

It reminds me of a docker-compose file, except I can publish it too.
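
In case it helps, a rough sketch of what that can look like; the apiVersion, image name, and paths are assumptions, so adjust for your Skaffold version and repo layout:

  apiVersion: skaffold/v2beta12
  kind: Config
  build:
    artifacts:
      - image: my-service              # built locally from the repo's Dockerfile
  deploy:
    kubectl:
      manifests:
        - k8s/deployment.yaml
        - k8s/service.yaml
  profiles:
    - name: minimal                    # pick and choose which resources run locally
      deploy:
        kubectl:
          manifests:
            - k8s/deployment.yaml

Then something like `skaffold dev -p minimal` rebuilds and redeploys on change, and the same manifests are what get applied in the cluster.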


> "Kubernetes is the biggest quality-of-life improvement I've experienced in my career."

Just wait until the developers discover Google App Engine, Heroku, or DigitalOcean App Platform.


Yeah, I think that's the point though. Kubernetes enables the orchestration and observability of a PaaS in a much more flexible way, so you can get all of that while still matching the requirements of your business.

I think Heroku and DigitalOcean App Platform are still going to be popular for small setups (as will things like Amplify) but when you outgrow those (or realize you are paying too much for them) then Kubernetes is a reasonable option.


+1. One benefit I see of Kubernetes is that it can handle pretty much whatever you throw at it, so you can run everything in the one service.

Need to run a bespoke database? Yeah, it can do that. Need to migrate an old service running in a VM that needs a disk? It can do that too.


Oh man, I can't believe we didn't think of migrating our mature 20-service stack with complex hardware, security, and compliance requirements to App Engine. Dang it. I should've known to consult HN first!


How much of this is specific to Kubernetes and how much is more accurately because you moved from managing your own VM infrastructure to using managed services? Cloud providers have (often long had) ways of saying "run this code" without managing a VM and without involving kubernetes, including with docker support. And they are often very easy to use though not without their faults. An example being Azure app service.

I do agree that kubernetes is a pleasant experience from an application developer perspective with an existing cluster, but in my experience it was not without excessive pain and long hours by those administering the cluster. A year doesn't surprise me in your case, which brings to mind this question.


> I no longer need to use Terraform to scale out a service.

Are you still using Terraform to specify your deployments? From experience, you need something there to manage all your yamls: deployments, configmaps, secrets. Especially if you have multiple environments.


As someone with limited experience with containers, how does K8s allow you to move away from things like Puppet for configuration management? Does it offer some substitute that alleviates the need for something like Puppet or Ansible?


Yes, because whatever you used for app-specific configuration, like libraries and packages, is now done in the Dockerfile and containerized. So the same thing run locally is run in the cloud. Then, as far as the infrastructure for running code (load balancers, service discovery, Docker), that is all given to you just by running K8s. So you are more concerned with shipping immutable containers to k8s than with provisioning "machines" (a sketch follows at the end of this comment).

Then you can focus on containers, which can be run, tested, and built anywhere without fear of broken updates or one thing stepping on another. Back in the days of Ansible and Chef we found we had very low confidence in upgrading hosts live, so we would build immutable hosts and blue-green deploy them to production. But why think in the scope of hosts and VMs when really you just have an application that needs to run somewhere?

K8s IMO isn't the end-all; I think eventually we will get to something that doesn't need containers at all and you just run processes. But it is a good step for now. Also, once you have your stuff containerized, it makes other non-k8s things like AWS Lambda easy.

Edit: Also, yes, you can use those tools to set up generic k8s nodes, but when we ran bare metal we used kubeadm to make immutable CoreOS nodes. I don't think that's used anymore (haven't checked), but really the best way to set up k8s is to deploy really thin hosts that have nothing but Docker and k8s on them. VMware and others have solutions for this too, where you don't have to mess with building hosts.
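
A hedged sketch of that split, with invented names: the image carries the code and libraries (built from the Dockerfile), while environment-specific settings ride along as a ConfigMap, so nothing gets provisioned onto hosts:

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: app-config
  data:
    LOG_LEVEL: info
    FEATURE_X_ENABLED: "false"
  ---
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: app
  spec:
    replicas: 2
    selector:
      matchLabels:
        app: app
    template:
      metadata:
        labels:
          app: app
      spec:
        containers:
          - name: app
            image: registry.example.com/app:1.4.2   # immutable, same image you ran locally
            envFrom:
              - configMapRef:
                  name: app-config                  # per-environment config injected as env vars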


Thanks for the detailed response. It looks like I've still got a lot to learn - I've just lately been playing with LXC to get more familiarised with containers. I've previously looked at Helm apps and they seemed to be very similar to Puppet manifests. From what you said it seems like the approach is to have immutable containers for each application, set up via Dockerfiles, which somehow also simplifies the upgrade process? Does that mean you just deploy a new version/container of an application linked to the same underlying database (for example) when you need to run an upgrade?

So if you had a fleet of ten containers running the same application in a load balanced config, I'm guessing you'd need to upgrade all of them at once (with downtime) rather than upgrading them one by one (because then the database would be inconsistent)? I'm assuming that since the containers are immutable the data is stored elsewhere.


Helm is so bad it raises my blood pressure just hearing the name. Helm tries to apply the old way of doing things (like as you said Puppet) and makes it worse than ever. K8s yaml config is simple and elegant, don't try hiding it under templates. Kustomize is the proper way of working with K8s yaml. Helm fights against it.


Helm isn't great (god, Go templates, shudder), but I love being able to bundle all the manifests for an application together with well-documented parameters/values... If you want to do something like conditionally switch from a LoadBalancer service to Ingress in a test environment... I have no idea how you'd handle that in Kustomize, but it's straightforward in Helm. Ultimately it seems like you end up with the same complexity, just expressed via 100 layers of Kustomize transforms instead of with a conditional in your template.


There are two different types of patches in Kustomize. It does take a bit to get used to it as it's different. jid (Interactive jq) makes the process a lot easier. It's actually pretty easy to make changes like this.
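
For anyone curious, a hedged sketch of the LoadBalancer-vs-Ingress case from the comment above, done as a Kustomize overlay; the directory layout and names are invented:

  # overlays/test/kustomization.yaml
  apiVersion: kustomize.config.k8s.io/v1beta1
  kind: Kustomization
  resources:
    - ../../base                # the base declares the Service with type: LoadBalancer
    - ingress.yaml              # an Ingress that exists only in this overlay
  patchesStrategicMerge:
    - service-patch.yaml

  # overlays/test/service-patch.yaml
  apiVersion: v1
  kind: Service
  metadata:
    name: my-service            # must match the name in the base
  spec:
    type: ClusterIP             # switch away from LoadBalancer in the test environment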


> So if you had a fleet of ten containers running the same application in a load balanced config, I'm guessing you'd need to upgrade all of them at once (with downtime) rather than upgrading them one by one (because then the database would be inconsistent)?

That depends entirely on your application and the upgrade itself. Assuming we are discussing 10 different containers (e.g. 10 micro-services), k8s will normally update them in parallel but not atomically; it would be up to application or deployment-time logic to ensure they are updated as a single 'transaction'. If they are 10 copies of the same container, then k8s itself has tools for rolling upgrades where you can control the rollout.

Also, depending on application logic, the upgrade could be done in such a way that there is no need to synchronize the services, they could work with the DB as is.


In Kubernetes, deploying 10 containers takes a minute or two. I haven't worked with incredibly large deployments, but really deploying any amount of containers could easily take a minute or two if you have enough nodes. There is no downtime. Database inconsistency can cause problems but also any problems like that can be mitigated by doing a two-phase change to the database, and such changes are pretty rare and also devs instinctively avoid making those kinds of schema changes.


You get load balancing and rolling deploys for free in K8s. What you do is update the Deployment with a new Docker image, and yes, the old pods are immutably replaced. In the case of an HTTP service, k8s will wait until a pod (container) responds healthy before it is put in the loop. Then it steps down old pods according to your rolling-deploy settings and replaces them. You can define what that looks like, e.g. a max number of extra pods above the desired count and a minimum number of pods that must stay up.
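
The "responds healthy" part is a readiness probe; a minimal sketch with made-up image, path, and port:

  apiVersion: v1
  kind: Pod
  metadata:
    name: my-app
    labels:
      app: my-app
  spec:
    containers:
      - name: app
        image: registry.example.com/my-app:2.0.0
        ports:
          - containerPort: 8080
        readinessProbe:              # the pod only receives traffic once this passes
          httpGet:
            path: /healthz
            port: 8080
          periodSeconds: 5
          failureThreshold: 3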


> So the same thing run locally is run in the cloud

Who is preparing the Dockerfiles? Developers and system administrators / security people do not generally prioritize the same things. We do not use k8s for now (therefore I know very little about it), so this might not be relevant, but how do you prevent shipping insecure containers?


Generally developers. When running in a container, most of the attack surface is the app itself, and if it is compromised the damage is supposed to be limited to the container. There have been container escape exploits in the past, though. With a container, you treat the container as the thing that you run and give resources to without trusting it, just as if you were running a bare application. All the principles of granting an application resources, such as least privilege, apply to containers too (a sketch follows at the end of this comment).

But since you are not running multiple things or users in one space in a container, something such as an out-of-date vulnerable library can't be leveraged to gain root access to an entire host running other sensitive things too.

In Kubernetes, and Docker in general, one container should not be able to compromise another, or k8s itself. But there are other issues if an attacker can access a running container, such as now having network access to other services like databases. Again, these are all things that can and should be locked down, just as when provisioning hosts directly.
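
A rough sketch of what "locked down" can mean at the pod level; the image and numbers are invented, the fields are standard:

  apiVersion: v1
  kind: Pod
  metadata:
    name: locked-down-app
  spec:
    automountServiceAccountToken: false   # no API credentials unless the app needs them
    containers:
      - name: app
        image: registry.example.com/app:1.0.0
        securityContext:
          runAsNonRoot: true
          runAsUser: 10001
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop: ["ALL"]
        resources:                         # least privilege applies to resources too
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 256Mi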


You still need something to provision the base OS and all the stuff under K8s (docker daemon, ntp, storage, networking, etc.) that it relies on, unless you go with a fully hosted solution.

Ansible or Puppet still excel at that kind of work.


And it looks like the parent went with a hosted solution which explains everything. Having to manage all the underlying services that k8s glues together is a huge PITA.


It enables 12 factor apps, and if you are using a cloud provider there is no infra setup, so you do not need puppet/ansible/etc. It's a better way to deploy apps hands down.


One great boon is using Kotlin code to factor deployment descriptors into reusable parts that are somewhat statically type-checked. I wasn't responsible for the migration from the legacy system to Kubernetes, but that seemed to me like a big win. Aside from being able to update the whole system, or only parts of it, so trivially.

For me the scariest part would be that it's not tech I would use (i.e. update the descriptors) regularly, so what if something goes wrong? How quickly would I be able to identify the problem? I have no answer to that because I'm off the project.


> ... That is until management decides we're overhead and fires us to free up budget for feature developers. ;)

It's a real risk, at least perceptually. Mitigate it by documenting the number of developers' hours you save with automation and better infrastructure. It's pretty hard to argue against a trend of increased developer productivity. Include your own hours; you're targeting saving 30 hours a week of your own time and that frees you up to improve other things even faster.


You spent more than 3 person-years migrating twenty services from one container orchestration system to another and you think it was worth it? Not sure that seems worth it to me...


Nope! We went from not having a container orchestration system to having one.

If you're determined to be skeptical of the value-add from Kubernetes, my random post on HN ain't gonna convince you :) The developer experience improvement is obvious to people actually interacting with K8s.

I do want to clarify that this isn't the only thing we accomplished in the past year. Just the thing we shipped that was our biggest priority and had (by far) the biggest impact.


If you have about 20 developers spending 10% of their time fighting with your current system, it adds up fast. 10% may be high, but 20 may be low. Whether it was worth the time depends a lot on the exact situation.


That was my experience with Kubernetes when I set it up for a QA cluster a few years back. That learning curve is more like a sheer cliff face, but once you get to the top the view is really nice. Getting there is a hell of a hike, though. It's easier now since every major cloud has a managed offering, but my journey was pre-EKS so it was doubly intense for me.


> That is until management decides we're overhead and fires us to free up budget for feature developers.

Snake eating its own tail.

But not only do I not understand K8s, I'm also an idiot.


I'm tired of the complainers of the complainers. First a technology seems cool, then after engineers become experienced with the tool, the warts reveal themselves and you get a vocal group of complainers.

It doesn't end there... after the complaining has gone on for a really long time, two things happen. First, so much time has passed that you get domain experts (DevOps people) whose entire job is to mess with Kubernetes. Second, the complainers have been complaining so long that people get tired of it.

Now the people whose entire job revolves around Kubernetes are so tired of listening to the complaining that they start complaining about the complainers.

It happened with JavaScript. Javascript was around for so long people started complaining about the complainers. In fact it’s been around so long that a whole generation of people who’ve never used any other language was literally born. These people started off as the complainers against the complainers but now they outnumber the complainers so you rarely see people talk shit about JavaScript anymore.

Actually JavaScript has been around so long that the entire language has changed and part of the terribleness was fixed by making another language (typescript) compile into JavaScript.

Which brings me back to Kubernetes. Kubernetes is a bad tool with no alternative, precisely because it requires a dedicated 3-person team a year to get things up and running.

A good tool would be something that lets me get it up and running in a week just by reading some docs. Even better, an hour. Could such a tool exist and replace Kubernetes? Yes. Does such a tool exist? No.

I am a complainer and you are a complainer of complainers. Some time from now, one of two things will likely happen: Kube will be so integrated into the infrastructure ecosystem that wrappers will be written on top of it, just like how React and TypeScript have replaced JavaScript. If that doesn't happen, then a whole new tool will replace it.

I'm sorry to say, but the ideal we are shooting for here is a tool that will ultimately make DevOps a general thing that all developers can deal with, rather than requiring an entire specialist team. Again, no such tool exists yet, but it certainly could, especially when the inventor of the tool has become a complainer.

If the inventor of the tool becomes a complainer, that validates the complainers. And now the complainers of the complainers have nothing left to say.


There already are a number of platforms that make deploying and operating apps very simple! Heroku, AppEngine, Lambda, so many others.

Your 5-person startup does not need a DevOps engineer. Your 50-person org should start thinking about it. Your 500-person company probably needs multiple DevOps engineers -- having every dev team independently figure out how to handle things like deployments, reliability, and security is chaotic and wasteful. There are a lot of details that only start to matter at scale, and both Kubernetes and dedicated infrastructure teams are for this use case.


Yeah, my point is, if you're migrating to Kube without using some service that handles all the details, even the 5-person startup needs a dedicated DevOps team.


Nice! Thanks to you we just reached level 3: The complainers of complainers of complainers! Level 4 is coming!


> A good tool would be something that lets me get it up and running in a week just by reading some docs. Even better, an hour. Could such a tool exist and replace Kubernetes? Yes. Does such a tool exist? No.

Instead of complaining, why don't you build this tool? That's the problem I have with complainers.


You're complaining about me complaining.

Instead of complaining why don't you build me the tool to stop me from complaining? It's the same reason why I'm not building the tool.

That's the problem I have with complainers complaining about other complainers. Why don't you guys do something about my complaining rather than complain about it?


Why complain about the complainers? Because some complainers will never be happy no matter what the state of the world is. Ironically your solution (build more things) is the exact thing the complainers complain about with regard to the frontend world. They complain too many things have been built. You can’t ever satisfy everyone.


I understand their rationale. We manage thousands of Kubernetes clusters, and end users can find lots and lots of creative ways to shoot themselves in the foot:

- I can store anything in a secret? Let's have thousands of cat images. Etcd then stops working because we have over 2GB of funny cats in the key store.

- I can run a root Pod? Let's mount the docker socket and start building images with it. Oh, and by the way, I never clean those images up and my Node simply fills up. Also, I add some additional docker networks that break Pod-to-Pod networking.

- Istio is nice - why don't we add automatic injection for Pods in all namespaces? Including kube-system? And then they brick kube-proxy and the cluster stops working.

- I can use validating webhooks for better security? Let's watch all resources. To keep it even more secure, let's set the failure policy of the webhook to Fail, so the apiserver never admits any modification without making a call to our webhook. What's that? My single-replica webhook Pod was evicted (we didn't add any resource requests and limits) and now it cannot even be recreated or scheduled, because kube-controller-manager and kube-scheduler cannot update their leases, lost leadership, and are now idling, effectively bricking the entire cluster.
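
A sketch of that last footgun, with invented names; the dangerous parts are the wildcard rules plus failurePolicy: Fail:

  apiVersion: admissionregistration.k8s.io/v1
  kind: ValidatingWebhookConfiguration
  metadata:
    name: catch-everything
  webhooks:
    - name: validate.example.com
      failurePolicy: Fail            # if the webhook is down, every matching request is rejected
      rules:
        - apiGroups: ["*"]
          apiVersions: ["*"]
          resources: ["*"]           # watching everything, including kube-system objects
          operations: ["*"]
      clientConfig:
        service:
          namespace: default
          name: my-webhook           # the single-replica service that got evicted
          path: /validate
      admissionReviewVersions: ["v1"]
      sideEffects: None

A namespaceSelector that excludes kube-system (plus resource requests on the webhook Deployment itself) would have kept the control plane out of the blast radius.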

Google would reduce the pain points with this change; however, they would still face countless other issues with Kubernetes.


I fully agree with your points and would sum them up as "Kubernetes has a steep learning curve, a (quite) large interface and ample opportunities to shoot yourself into the foot with it" (plus, they're very funny).

However playing the devil's advocate here: If you actually took the steps of learning the basic abstractions, then for me it's really hard to see what you could still get rid of.

If you actually go all-in and fit your application to the principles of Kubernetes-native applications (instead of the other way around), then it works nothing short to amazing.

We're running 120 microservices in GKE and the difference from our previous custom-built setup is night and day. I let my infra team go surfing together for two weeks because, without changes coming in, it mostly flies on autopilot.

Let's not kid ourselves, distributed computing is _hard_ and Kubernetes is a testament to that. I'm not saying it can't be made more accessible by further standardization, but there are fundamental limits to how easy it can be made.

Which, by the way, leads to my only pet peeve with it: I feel most of the complexity of K8s comes from the fact that it got hyped as an enterprise product, and then lots of features were built to support shoving non-cloud-native workloads into Kubernetes even though it was never designed for them.

If you don't do or need all of that, the amount of interface, complexity and footguns shrinks significantly. Maybe it's time to better pull them apart in the documentation.


> If you actually took the steps of learning the basic abstractions, then for me it's really hard to see what you could still get rid of.

This argument basically boils down to "Developers just need discipline, and stop blaming the tools". While this is a sound argument on paper, the intrinsic complexity of software systems makes it hard to pin the blame on developers. BTW, this is the same argument Uncle Bob makes, which is not so popular with many mainstream developers.

You're right about feature creep in k8s though.


I get what you're saying, but my point is a bit more nuanced:

If your goal is to build highly reliable and available services to end users that are secure and scalable with a team of more than 10 engineers, eventually you will run into more than 50% of the concepts in Kubernetes anyway and end up re-inventing them.

Scaling up and down, node draining, finding out whether services are healthy, RBAC, resource distribution, secrets management, service hardening, introspection capabilities, explicit declaration of dependencies and endpoints and many, many more.

My point is: Sure, if your goal isn't that, it doesn't make sense to start out using Kubernetes.

But if at least eventually that's what you need, imho it's way preferable to just learn and apply well-proven abstractions instead of reinventing the wheel along the way and ending up with a less maintainable, less capable, and less standardized solution that you won't find anyone to maintain.

When I read some of the comments here suggesting to "just spin up docker-compose with Traefik in front" (disclaimer: I really like Traefik), it reminds me of how some of the ops messes I historically had to care for got started.


Agreed, the truth usually lies somewhere in between, and my point was that we can't absolve the tools/ecosystems and put it squarely on the devs. That definitely doesn't absolve teams either; they need to do their homework before jumping on the bandwagon. K8s is great if you know what you're signing up for.


We are sometimes too proud to admit we simply "are not good enough at something".

It’s always easier to shift the blame somewhere else.

That's why some radical but correct concepts are so hard to push.


IMO, fixing the BEAM VM to work easily on a cluster could be better for distributed systems than k8s.


That would do nothing to help the 99% of users who don't use BEAM VM languages


They'd have a reason to start using BEAM VM languages, though.


To a person who isn't using Erlang or ASP.NET, suggesting that adopting either of those stacks will solve any of our problems without creating a thousand new ones sounds equally like a non-starter to me.

To add a counter-example, I have lots of Ruby experience and I've just joined a Go team. I won't tell them to use Ruby, I will just do it where it makes sense and saves us time. (And then we'll have two problems... enter "limiting blast radius")

Point of my counter-example is, I'm extremely skeptical that all the world's problems can be solved by adopting a new mono-culture, whatever it is. There are 100% always gonna be some problems that are better solved in a different language. PHP is the best way to run Wordpress, for example (ok, so it's the only way to run Wordpress, but you get the idea... "Wordpress is the best way to..."), but I've been in high-functioning IT organizations that won't touch that with a ten foot pole, because "it's another language to support, and PHP is icky."

We also got rid of a perfectly fine Wiki in favor of centralized Knowledge-base software for similar reason. "Better to just have one KB. We don't need to be hosting another thing." So the chances of moving everything over to BEAM VM are next to nil, unless you are a product-focused company with just one product, or happen to have an absolute champion leading the effort to migrate all the things. For all the other things, you need to have a consistent answer too.

No tool is one-size-fits-all. Where Kubernetes shines most is in any environment that isn't running a single monolith or building a software monoculture, or that can't manage that for whatever reason (those are all basic use cases that are frankly easy enough to manage without adding the extra complexity of Kubernetes on top; if you don't need it, don't use it!). IMHO, diversity in infrastructure is a plus, and Kubernetes turns out to be a technology that enables it.


The BEAM VM is fantastic but the scope of Kubernetes + docker is completely different.

For example you still need a way to get BEAM onto hosts, still need to manage the OS on the host, still need to setup networking, RBAC etc.


Ok, but we have a similar setup that runs on GCE instances. Deploying involves building an image and pushing a button. We don't really have the need for an Infrastructure team.


I once misconfigured iptables and locked myself out of our build server. Had to call lab support in a different country. Is Linux too complicated? Joyent famously took down their whole region by rebooting the wrong nodes. It's almost like running distributed networks of supercomputers at scale is hard or something...


> Joyent famously took down their whole region by rebooting the wrong nodes.

In case anyone's interested, here's a pretty funny and educational talk by Bryan Cantrill about that particular incident:

GOTO 2017 • Debugging Under Fire: Keep your Head when Systems have Lost their Mind

https://www.youtube.com/watch?v=30jNsCVLpAE


For two systems of equal functionality, the one that allows or encourages fewer footbullets is the better design. Not all complexity is essential.


What kubernetes complexity is non-essential and can be replicated by simpler solution?


I once did an `apt autoremove` on a custom install of CentOS handed to my team. Apt uninstalled python (and a lot more), and apt depends on python to run, so that was a bummer. The easiest way out was to reinstall the OS.


These are some delightfully specific hypotheticals.


There is a benefit: fixing each of these issues fixes it for everyone using K8s.

One of the goals of K8s is the normalization/standardization of a complex topic, to better share knowledge.


> I can store anything in a secret? Let's have thousands of cat images.

Why would someone want to store non-secret information as a secret?


"A common mistake that people make when trying to design something completely foolproof is to underestimate the ingenuity of complete fools"

— Douglas Adams


Because mutable persistence in Kubernetes can be super annoying to manage and people might grasp for whatever lifeline they can find.

If you have a managed object store or even relational database outside of k8s, the thought of storing arbitrary data in secrets probably doesn't come to mind. But if your enterprise spools up a cluster and tells you to use nfs PVCs with no other storage solution, suddenly you might start getting creative.


I think it's just a cute way of saying "data". Like saying you're seeding Linux ISOs when you're actually seeding pirated movies on BitTorrent.


What makes you think Kubernetes "secrets" are appropriate for storing secret information? They're not secure (not without adding a bunch of other nonsense on top of them).


> Why would someone want to store non-secret information as a secret?

Top reason given to me by developers: "I don't want to spend time thinking about the distinction."


> - I can run a root Pod? Let's mount the docker socket and start building images with it.

Just in case you haven't figured out the proper way to do this, you should use docker:dind.
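
For reference, a hedged sketch of the dind pattern; image tags, names, and the disabled TLS are assumptions. It still needs a privileged container, but it gets its own daemon rather than the node's Docker socket:

  apiVersion: v1
  kind: Pod
  metadata:
    name: image-builder
  spec:
    containers:
      - name: dind
        image: docker:20.10-dind
        securityContext:
          privileged: true                 # dind needs privilege, but it's an isolated daemon
        env:
          - name: DOCKER_TLS_CERTDIR
            value: ""                      # plain TCP on localhost only, for this sketch
      - name: builder
        image: docker:20.10                # plain docker CLI image
        command: ["sleep", "infinity"]
        env:
          - name: DOCKER_HOST
            value: tcp://localhost:2375    # talk to the sidecar daemon, not the node's socket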


kaniko is an even better way


If only it didn't have weird issues when you start stacking things in a Dockerfile.


like what? I'd like to hear your experiences with it


Oh yes, secret management with kubectl is needlessly complicated.

Sure, just put your secret data in a file, and then we'll use your file name as the key of the secret.

Cronjobs sometimes have weird bugs as well.

A lot of its complexity is due to the fact that it's an evolving system, and that's fine. But I see some things end up way more complex or unreliable than they need to be, due to overengineering or to use cases no one needs.
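
For context, the file-name-as-key behavior is the `kubectl create secret generic foo --from-file=...` shortcut; writing the manifest yourself lets you name the keys explicitly. A minimal sketch with invented names and values:

  apiVersion: v1
  kind: Secret
  metadata:
    name: app-credentials
  type: Opaque
  stringData:                    # plain strings; the API server stores them base64-encoded under .data
    DATABASE_URL: postgres://user:pass@db:5432/app
    API_TOKEN: changeme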


I remember trying out docker sometime back in late 2013. Something about it never fully stuck with me. I always felt like the final boundary for a piece software should be the process, not the computer. Plopping an entire VM into a zipfile and saying "here's your software" felt like lazy engineering to many of us at the time (and still today).

For our current stack, the answer has been to make the entire business application run as a single process. We also use a single (mono) repository because it is a natural fit with the grain of the software.

As far as I am aware, there is no reason a single process cannot exploit the full resources of any computer. Modern x86 servers are ridiculously fast, as long as you can get at them directly. AspNetCore + SQLite (properly tuned) running on a 64 core Epyc serving web clients using Kestrel will probably be sufficient for 99.9% of business applications today. You can handle millions of simultaneous clients without blinking. Who even has that many total customers right now?

Horizontal scalability is simply a band-aid for poor engineering in most (not all) applications. The poor engineering, in my experience, is typically caused by underestimating how fast a single x86 thread is and exploring the concurrent & distributed computing rabbit hole from there. It is a rabbit hole that should go unexplored, if at all possible.

Here's a quick trick if none of the above sticks: If one of your consultants or developers tells you they can make your application faster by adding a bunch of additional computers, you are almost certainly getting taken for a ride.


I agree with you. However, I spent years and years trying to compile software that came with cryptic install instructions, or having the author insist that since it works on their machine I'm just doing something stupid. Docker was largely able to fix that.

It's a somewhat odd solution for an all-too-common problem, but any solution is still better than dealing with such an annoying problem. (Source: I made Docker the de facto cross-team communication standard in my company. "I'll just give you a Docker container, no need to fight trying to get the correct version of nvidia-smi to work on your machine" type of thing.)

It probably depends on the space and the types of software you're working on. If it's frontend applications, for example, then it's overkill. But if somebody wants you to, let's say, install multiple Elasticsearch versions + some global binaries for some reason + a bunch of different GPU drivers on your machine (you get the idea), then Docker is a big net positive. Both for getting something to compile without drama and for not polluting your host OS (or VM) with conflicting software packages.


Completely agree. The whole compile chain for most software, the reliance on linked libraries, implicit dependencies like locale settings changing behavior, and basically decades of weird accidents and hacks to get around memory and disk size limits can be a nightmare to deal with. If you're using slow dynamic languages, or modern frontend bundlers, all the implicit C extension compilations and dependencies can still be a pain.

The list goes on and on; it's bizarre to me to think of this as the true, good way of doing software and to think of Docker as lazy. Docker certainly has its own problems too, but it does a decent job of encapsulating the decades of craziness we've come to heavily rely on. And it lets you test these things alongside your own software when updating versions and be sure you run the same thing in production.

If docker isn’t your preferred solution to these problems that’s fine, but I don’t get why it’s so popular on HN to pretend that docker is literally useless and nobody in their right mind would ever use it except to pad their resume with buzzwords.


I don't know. I can't take anyone seriously who says it's hard to type

./configure

make

At a prompt, then install missing libs. Unless you have to maintain updates regularly, “It’s just so hard” seems like a damn meme.


When library versions start to cause issues: like V3.5 having a bug, so you need to roll back to V3.4... that's when ./configure && make starts to have issues.

Yeah, it happens with .so files, .dlls ("dll hell"), package managers and more. But that's where things like containers come in to help: "I tested Library Foo version V3.4 and that's what you get in the docker". No issues with Foo V3.5 or V3.6 causing issues... just get exactly what the developer tested on their box.

Be it a .dll, a .so, a #include library, some version of Python (2.7 with import six), some crazy version of a Ruby Gem that just won't work on Debian for some reason (but works on Red Hat)... etc. etc.


There are basically two options; maintain up-to-date dependencies carefully (engineer around dll-hell with lots of automated testing and be well-versed in the changelogs of dependencies) or compile a bunch of CVEs into production software.

There really isn't any middle ground (except to not use third-party libraries at all).


That assumes you are using free software without a support contract where the vendor has no incentive to maintain long term support for libs by only applying security patches but not adding any features to old versions. I understand this goes against the culture of using only the latest or "fixing" vulns by upgrading to a more recent version (which may have a different API or untested changes to existing APIs).

That makes sense for a hobbyist community but not so much for production.

In a former job we needed to fork and maintain patches ourselves, keeping an eye on the CVE databases and mailing lists and applying only security patches as needed rather than upgrading versions. We managed to be proactive and avoid 90% of the patches by turning stuff off or ripping it out of the build entirely. For example, with OpenSSH we ripped out PAM, built it without LDAP support, without Kerberos support, etc., and kept patching it when vulns came out. You'd be amazed at how many vulns don't affect you if you turn off 90% of the functionality and only use what you need.

We needed to do this as we were selling embedded software that had stability requirements and was supported (by us).

It drove people nuts as they would run a Nessus scan and do a version check, then look in a database and conclude our software was vulnerable. To shut up the scanners we changed the banners but still people would do fingerprinting, at which point we started putting messages like X-custom-build into our banners and explained to pentesters that they need to actually pentest to verify vulns rather than fingerprinting and doing vuln db lookups.

Point being, at some point you need to maintain stuff and have stable APIs if you want long lasting code that runs well and addresses known vulns. You don't do that by constantly changing your dependencies, you do it by removing complexity, assigning long terms owners, and spending money to maintain your dependencies.

So either you pay the library vendor to make LTS versions, or you pay in house staff to do that, or you push the risk onto the customer.


> then install missing libs. Unless you have to maintain updates regularly

You lost me there already. Why should there be missing libs, and why would you not have to maintain updates regularly in a production environment?

So let me see if I got this right, it's basically:

1. ./configure
2. make
3. ???
4. profit!

Doesn't sound like a perfectly good solution to me.


Aren't we conflating compile complexities with runtime complexities here? There are plenty of open-source applications that offer pre-compiled binaries.


That difference isn't as black and white as you're making it out to be, sometimes it's just a design decision whether certain work is done at compile time or runtime. And both kinds of issues, runtime or compile-time, can be caused by the kinds of problems I'm talking about like unspecified dependencies.


This is why I wish GitHub actually allowed automated compilation. That way we could all see exactly how binaries are compiled and wouldn't need to set up a build environment for each open source project we want to build ourselves.


I am totally on board with the idea of improving productivity. The issue I see is that this is avoiding a deeper problem - namely that the software stack requires a max-level wizard to set up from scratch each time.

Refactoring your application so that it can be cloned, built, and run within 2-3 keypresses is something that should be strongly considered. For us, these are the steps required to stand up an entirely new stack from source:

0. Create new Windows Server VM, and install git + .NET Core SDK.

1. Clone our repository's main branch.

2. Run dotnet build to produce a Self-Contained Deployment

3. Run the application with --console argument or install as a service.

This is literally all that is required. The application will create & migrate its internal SQLite databases automatically. There is no other software or 3rd-party service that must be set up as a prerequisite. The development experience is the same; you just attach the debugger via VS rather than starting the console app or service.

We also role play putting certain types of operational intelligence into our software. We ask questions like "Can our application understand its environment regarding XYZ and respond automatically?"


The issue Docker solves for me is not the complexity or number of steps but the compatibility.

I built a service that installs in 10 lines and could be run through a makefile, but I assume specific versions of each system library and don't intend to test against the hundreds of possible system dependency combinations, or to just assume it will be compatible anyway.

The dev running the container won't be building their own Debian install with the specific versions required in my doc just to run the install script from there; they just instantiate the container and run with it.


> Plopping an entire VM into a zipfile and saying "here's your software" felt like lazy engineering to many of us at the time (and still today).

At the risk of nitpicking, docker images aren't the equivalent of VM images, as they don't include a kernel.


This isn't nitpicking at all, it's an important distinction!

Docker is not virtualization, it's just an abstraction that makes some Linux process isolation features easier to manage.

It also allows you to bundle whatever dependencies you have in the same bundle, but that is not the same as having a VM.


Linux containers and equivalent technologies are virtualisation (specifically OS virtualisation[1]), just not a VM. Hardware virtualisation (VMs) isn't the only kind of virtualisation that exists.

[1]: https://en.wikipedia.org/wiki/Operating_system-level_virtual...


By that logic, processes are arguably virtualisation too. They do after all use virtual memory.

Threads, processes and containers exist on a continuum.


The key is that "containers" don't actually exist -- they're just processes running under a variety of different namespaces.


It's true that Docker isn't a first-class abstraction at the level of the Linux kernel, but BSD has jails, and Solaris has Zones. This is important in some respects, but I don't see that it informs things here. Containers are still 'a thing' regardless of how they're implemented.


Curious to learn more about how jails + zones are implemented. In Linux land, I find the notion that containers are a coherent abstraction really hinders developers from understanding how their application is deployed.


Indeed they are! The notion that each process has its separate address space is called virtual memory for that reason.

See also cgroups: while this feature is used by container runtimes, it predates Docker and can be used standalone with normal processes.


Indeed. I made a website for testing npm packages inside a cgroup/unshared "container" - about 6 months before docker came out.

If only I had realised that could have been useful for more than testing npm packages...


OCI “docker” containers are at this point a description of a process. How it’s realized is up to the implementor. runc realizes the container with kernel namespacing and runv realizes the container with hardware virtualization.

Both if implemented to spec will be logically equivalent and drop-in replacements for one another.


I don't know if that is the most important reason it is not equivalent. It also doesn't have any system processes; there is no systemd, sshd, no crond, etc. It doesn't need its own firewall rules configured or its own security managed. I could go on but I think you already get the point.


Indeed - essentially, a container is a glorified chroot with a few (important) bells and whistles attached to it.


I agree with the other commenter; this is an important distinction. It has no hypervisor and is really just a normal process using standard kernel features: cgroups (resource limiting) and namespaces (resource isolation). It's really not so different to chroot.


Containers solve the dependency problem by simply pretending it doesn't exist.

I used to do UNIX integration work in the late 1990's early 2000's and containers weren't really a thing. So you had to make sure libs from one program didn't crap on another program. And developers had to be conscious of what dependencies they included in their code. Nowadays they don't have to care as much because of containers. Every program can have its own dependencies. Thereby solving the integration problem.

A better solution would be to actually integrate programs and their dependencies into working systems, but no one has time for that. Software bloat is fine. Computers are cheap and fast. And actually understanding what we're doing would be too expensive. So just wrap your half-finished crapware up in a giant black box and dump it on a server.


I'm very interested in understanding what I'm doing and what I'm bundling into my software.

What I'm not interested in, is this kind of walking uphill in the snow both ways:

> So you had to make sure libs from one program didn't crap on another program. And developers had to be conscious of what dependencies they included in their code.

ie. having to understand what everybody else is doing in order for my software to run properly. No thanks. That's not why I'm here.

I'll put the exact dependencies I want, in the versions which work best for my software, into a Docker image or whatever tool offers a similar level of isolation, and I'll be working on my code while everybody else spends their time fighting over the ABI compatibility of C system libraries.


Why is this a better solution?

Separating the dependencies between programs allows you to test and release independently to allow incremental upgrades. IMO that is better.


> Horizonal scalability is simply a band-aid for poor engineering

And don't even get me started on having instances labeled "large" that have less memory and CPU capacity than my personal backup laptop (currently on loan to my 8yo for COVID reasons)...


But that doesn’t make any sense. We’re not talking about physical hardware; we’re talking about tiny, tiny slices of it. When VMs are the logical isolation boundary in your infra they get really small: 512 MB is a lot of memory for a single-purpose server.


> 512 MB is a lot of memory for a single purpose server.

Maybe. But when the time comes 512 MB doesn't seem like much anymore, what do you do? Do you pick the next larger instance or do you split the load across more 512 MB slices of a computer?


Don't forget the ridiculously low and restricted IOs unless you pay a lot.

But you can use other cloud providers with better value.


wrt horizontal scaling. Do recall that the motivation for this strategy by Google was cheap-as-possible servers that failed constantly. Back when they built racks using legos the problem was hardware reliability. They had to 'scale' horizontally for reliability (given cost constraints) as much as load.

People have since bought into the marketing reasons for 'being in the cloud' and having 'infinite scalability' but that largely misses the point (and the pain) that caused many of these technologies and patterns to be developed in the first place.

The best example of how to scale without buying into this pattern I know of is Stack Overflow. At least circa 2016 - https://nickcraver.com/blog/2016/02/17/stack-overflow-the-ar...


> Horizonal scalability is simply a band-aid for poor engineering in most (not all) applications.

Maybe if your app handles < 10k concurrent connections. Otherwise it is the most cost efficient solution and exists because it solves the scaling problem in the best way as of today.


When you approach the limits of what your kernel can handle, then it may be time to split your workload across boxes or to carve smaller boxes out of your metal (and probably directly attach NICs to the VMs so the host OS doesn't have to deal with them). Making your workload horizontally scalable is always a sound engineering choice.

But...

If you split a horizontally scalable workload across a dozen virtual servers that are barely larger than the smallest laptop you can get from Best Buy, you are just creating self-inflicted pain. Chances are the smallest box you can get from Dell can comfortably host your whole application.

The fact remains the odds of you needing to support more than 10K simultaneous connections are vanishingly small.


> When you approach the limits of what your kernel can handle

Even before. If you want low latency. And banks handle more than 10K concurrent every day.

Cost example: https://pt.slideshare.net/markmyers106/vertical-vs-horizonta...


Yes, some banks need it, as do giants like Google, Facebook, etc.

Chances are very high that the problem domain you are working in does not.

Like the author said, “1%”; I think maybe 5% to 7%.

The point being that masses of software are developed every day on a cargo-cult adoption of solutions they do not require.


> The point being that masses of software are developed every day on a cargo-cult adoption of solutions they do not require.

This is certainly true, but there is a possible benefit: standardization. Having a standard skillset allows employees greater flexibility since they can jump employers and still expect to be rapidly useful. Similarly, if your company uses a standard toolkit, there's going to be less training overhead for new hires. Now, the devil is in the details, and I'm inclined to agree that you'd be better off hiring someone that can think outside the box and keep the tooling simpler. But using the standard toolkit will work reasonably well across several orders of magnitude in scale.


10k? It's not 1999 anymore. Look at Netflix to see the state of the art in saturating NICs with commodity hardware and off the shelf NGINX plus FreeBSD.


Does anyone know of any raw stats at what one beefy server can handle with a typical HTTP CRUD app? <10k doesn't seem right?


10k concurrent idle connections is no problem, but 10k rps is a decent amount of traffic. What is a typical CRUD app? It really depends on what software you're using too. You can do >10k rps with a DB read/write on every request with PHP on a low-end server, but if you throw a heavy framework into the mix then that would not be possible.


Agree with you. When you say cost efficient, it means that we can scale out our poorly written, slow software to more servers to handle increased traffic, instead of hiring more expensive engineers and rewriting the software properly so that it could handle the increased load from a single instance.


VMs also autoscale.


I get where you're coming from, but it turns out that plopping the entire VM into a zipfile ends up being a good way to have the kind of reproducibility that makes your operations sane. Do you pin specific versions of dependencies for your install scripts? Might as well pin the whole image. It's like 88% of the benefit of reproducible builds, at 3% of the cost, and that's not nothing.

Mind you, that's still Docker, though, not Kubernetes.

> If one of your consultants or developers tells you they can make your application faster by adding a bunch of additional computers, you are almost certainly getting taken for a ride.

Eh. There's a redundancy play in there somewhere too, if you know how to pull it off. (Big if.)


Horizontal scaling brings down total compute use if you have many distinct uses.

Imagine if your only unit of compute is a single bulky machine. You don't fully saturate it, but you need a second machine to avoid downtime anyway. Now you spin up a second or third service and suddenly you need 5 or 10 machines and your compute utilization is 20%. You can pack things in tighter, but then you have a knapsack problem, and that's easier to solve efficiently with many small blocks even if it costs you a 1% overhead or whatever.


> Plopping an entire VM into a zipfile and saying "here's your software" felt like lazy engineering to many of us at the time (and still today).

I've seen this (a long time ago) in the education world market. Very small school with a STEM program. They had specific scientific software they wanted undergrads to use (and some of it was pretty proprietary and used to interface with lab equipment) + a pre-configured IDE.

Instead of going through the compatibility matrix of OS and their versions they just gave all students a VM image that would "just work". Everyone could bring in their own devices and as long as you could run an hypervisor everything would "just work".


Personally I think there is beauty to one text file that defines a reproducible build environment. Carefully controlled dependencies are quite nice.

That said, I haven't really used docker as a runtime container.


Horizontal scaling is not a _performance_ tactic, but rather about availability and cost. Having higher availability means trading off consistency, or in other words, using distributed systems. Also, you cannot elastically scale vertically without also scaling horizontally. In other words, horizontal systems are cheaper if you have fluctuations in traffic.


Performance is not the only reason to scale. Redundancy / failure tolerance.

Cassandra is a big PITA, but it hasn't gone down (knock on wood) in the 6 years I've been using it. PARTS of it have...

How many distinct services do you have in your monorepo? A monorepo is fine...

until it isn't.


> Something about it never fully stuck with me. ... Plopping an entire VM...

There's your first miss.


After that single node crashes, the app you're running on that one server will run a little more slowly..

How big should the one server that serves the whole of netflix be..?


Stack Overflow has always run on a couple of IIS instances. If you’re not bigger than them, don’t worry.

You can pretend to be Netflix or Google, and build your tech stack like they do. Or you can stop wasting your resources setting up a tech stack that you’re never going to get a return on investment on.


Why a few though? Couldn't they do with just 1..? That would make people on HN happier, it seems.

Stack Overflow is not a unit of measurement that anyone would be able to take seriously or find useful? How many stack overflows is one asana? Or how many stack overflows is one trello?

Horizontal scaling, Docker, and K8s have many benefits that are obvious to the industry. You don't need to be Google to deploy and use them. If you deploy one server for each app and each team vs. deploying a common K8s cluster, where is the higher investment? You claim more ROI with more physical hardware and more servers?


> Why a few though?

Because it’s less dangerous and cheaper?

> Horizontal scaling, Docker, and K8s have many benefits that are obvious to the industry.

Which is why SO runs on more than one IIS...

You don’t need a tech stack that is apparently too complex even for Google (considering the article) in order to scale horizontally.

> If you deploy one server for each app and each team vs deploying a common K8s cluster where is the higher investment?

The investment comes from the complexity. We’ve seen numerous proofs of concept in my country, and in my sector of work, where different IT departments spent 2-5 full years' worth of man-hours trying to adopt the perfect devops tech stack.

Maybe that’s because they were incompetent; you’re free, and possibly right, to claim so, but that’s still professional teams expending real-world resources and failing.

From a management perspective, and this is where I’m coming from much more than a technical perspective mind you, the most expensive resource you have is your employees. If software is so complex that I need one or two full-time operators to run it, well, let’s just say I could run more than a million Azure web apps, and have our regular Microsoft-certified operators handle it.

> You claim more ROI with more physical hardware and more servers?

I haven’t owned my own iron since 2010. All our on-prem servers, and we still do have those, are virtual and running on rented iron.

I think we may be speaking past each other though. My point is financial and yours appears to be mostly technical. If you can set up and run your K8s without expending resources, then good for you; a lot of companies and organisations have proven unable to do that though, and in those cases I think they would’ve been better off not doing it until they needed to.


Kubernetes is not too complex. There are things to learn no doubt, but it's easy to reason about once you cross the initial learning curve.

Of course transitions can fail. People can think "yeah, let's do this small thing" and end up biting off a much bigger problem than they thought they were getting into. But that problem exists in the whole of tech. "Let's just use our present people and switch from all proprietary to all open source in 3 months.." yeah, best of luck with that... You need a solid team, and going all in on K8s is hard; you need technical talent and leadership to drive this.

Agreed, it may not be for everyone. The benefits are both technical and financial: fewer compute resources used, more reliable deploys, more resilient services. The problems being solved by this are not trivial. There are tangible benefits. Is it a risk? Of course it is. The risk is not in the technology, the risk is in the competence of the team deploying it. If it can't change and adapt, maybe a lot more fundamental things need to change in that organization than just deploying a new orchestration layer.

My only point is, this shift from dedicated servers to VMs and now to containers is a fundamental shift in how things are done. People can hate on it all they like, but it's a better way of doing things and everyone will catch up eventually.


> Or how many stack overflows is one trello?

I shall not express any opinion on this topic other than to say that Trello is not a good example to bring up. The entire customer base of Trello is not using a single shared board and thus they could scale in any direction they wanted to maximize ROI.


"You can handle millions of simultaneous clients without blinking. Who even has that many total customers right now?"

Apparently netflix has that many customers. Then again, if you split Netflix into regions and separate all the account logic from the streaming, the recommendations engine and the movie-content, you could perhaps run the account logic for one region in one server.


Besides just giving you a warm fuzzy feeling of running only one server, what is the point of running one server?

Do you also object to them running in the cloud in VMs and not on physical hardware that they own? Sounds like an old man's "kids these days" rant..



Of course, everything runs on servers. The question is who owns and maintains the hardware.

The link you shared just says they manage their OS layer; of course they do. Everyone running on AWS VMs is responsible for their own OS layer. Whether they want precise control over their OS doesn't change their preference for who owns and manages the hardware.


Well, in the case of Netflix, Netflix runs the servers, from hardware to the custom FreeBSD build, as they explained in multiple talks.


Even if you run one container, containers let you use off the shelf apps instead of only off the shelf libs.

I'm not interested in being "not lazy." I only care about user value and ability to provide user value (tech debt/cost/velocity).

For the price of an M5A.16xlarge to get those cores, I can get like 27 m4.larges and have way more fault tolerance.

I think your disdain is misplaced.


Processes were the original containers.


I suppose your apps ran on Windows. It isn't a problem (or at least it's a smaller problem) in the Windows ecosystem, especially the enterprise one, since usually you're using the same version and the installation steps are handled by infra with AD. Even without AD, the installer and Windows version are usually the same, and Microsoft is usually great at backwards compatibility.

But that's not the case on Linux, or at least non-enterprise / non-LDAP Linux. Installing mysql/redis/elastic/dotnet core/etc. on each machine may be different, with different installers available.

With Docker I just need to instruct them to install Docker and set up docker-compose, and everything is handled via containerization.


Back when I was doing big boys UNIX, the respective package managers took care of everything.

Which we later replicated in Red-Hat with RPM.

Bare bones OS install + bunch of OS packages => done.

And in what concerns containers I was working with HP-UX Vault in 1999.


Agreed to an extent. Even outside RHEL, you have yum for RPM-based distros, Aptitude for deb-based distros, pacman for Arch, etc.

You need to start with the same base operating system, and you need to make sure that you pull in the same versions of packages in case there are backwards-incompatible bugs, or version bumps such that the dynamically-loaded library is no longer detected (hence the common albeit dangerous workaround of "add a symlink").

(If you're using rpm directly, then you need to bundle the actual packages that you're installing, or point to specific packages that you're confident won't change. And at that point, what's the difference between your approach and Docker?)

The challenge that I believe Docker solves (or, at least, attempts to) is environment reproducibility: without it, you have dependency hell.


NixOS fixes this issue (and then some). I wish it had won instead of Docker. Maybe it took too long to become stable.


Nix and Docker are complementary, not enemies.

We use both together, since Nix is the only sane way to package Docker images, in my opinion.


Forgive my ignorance, but can't that be solved by statically linking things?

Don't you just end up with a less hackjob version of a container when you do that?


Putting on my Asbestos Longjohns: The more I look into the k8s ecosystem, the more I'm convinced that it's one of those things that suits FAANG etc., but the regular Joe developer has caught on to the fad and wants to add it to his repertoire, even though it's overkill. After all, no one got fired for buying IBM and recommending Kubernetes. Most teams need simpler deployment strategies that others have succinctly mentioned elsewhere on this page.


Kubernetes solves a very real and significant problem. But before you start using it, make sure you have the problem it solves.

If you're looking to have your small app eventually grow into a large one, read up on K8s and just make sure you're not blocking future-you from making your app work on it. E.g., work well in a container (which is useful for automated testing, deps management, etc), have a simple 'ping' endpoint to make sure the app is up, have a better config story than "recompile to change these variables", use a logging library, and tolerate any other services you're using to sometimes be down.

All useful things for a grown-up app to do anyways, all a bit of a PITA, and all better than trying to operate an app that doesn't do them.
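
To make one of those concrete: the 'ping' endpoint is what Kubernetes probes eventually point at, along these lines (a sketch only; the image name, port, and /healthz path are placeholders, not anything prescribed):

  # Sketch of a Deployment wiring a "ping" endpoint into k8s probes.
  # Image, port, and /healthz path are placeholders.
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: myapp
  spec:
    replicas: 2
    selector:
      matchLabels:
        app: myapp
    template:
      metadata:
        labels:
          app: myapp
      spec:
        containers:
          - name: myapp
            image: registry.example.com/myapp:1.0.0  # placeholder
            ports:
              - containerPort: 8080
            readinessProbe:        # only route traffic once the app reports ready
              httpGet:
                path: /healthz
                port: 8080
            livenessProbe:         # restart the container if it stops answering
              httpGet:
                path: /healthz
                port: 8080
              initialDelaySeconds: 10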


Exactly this. Kubernetes is a service orchestrator, not a hosting platform. Those getting caught up in it just want a hosting platform, but those getting value out of it want a service orchestrator.

If you have one monolithic backend service (and most web applications really should start out this way), Kubernetes offers almost no benefits over alternatives.


Putting on my tinfoil hat: it suits FAANG to have potential competitors burn their runways on baroque tech fads like Kubernetes or <insert-react-state-management-architecture-of-the-month-here>. Extra credit if they end up hosting their overly complex solution on your platform.


I was once told that development teams smaller than 20 developers have no business using k8s, due to the complexity it brings. If something as essential as the infra is so complex that it is not readily understood by everyone on the team, a few (more than one) team members need to become the experts on the matter. For small teams this is simply not worth it.


As part of a small team currently using Kubernetes, I suspect it’s more how you use it - the tools and ecosystem have matured immensely in the last couple of years since I’ve first started using it.

I don’t think it suits all teams and use cases, but for us it’s absolutely fantastic and without going down the rabbit-hole of cloud-provider specific tools and recreating half the issues it solves, I’m not super sure what we’d use.


Agreed. I'm a solo technical founder and have been using k8s for all my hosting for 3+ years. It's so easy (for me) that I'm fine paying a premium for the managed service (GCP) since it saves me lots of time, my most valuable resource.

I've already climbed most of the learning curve so YMMV, but as a team of one with dozens of WordPress, MySQL, and bespoke app servers, Kubernetes makes ops manageable so I can spend time on things that really matter.

Deploying new web apps is trivial, declarative manifests are easy to reason about, TLS certs are issued and renewed automatically (cert-manager), backups are cheap and reliable (daily GCP snapshots), making changes to the cluster via declarative terraform is a breeze, etc etc. No way I could manage all the ops without leaning so heavily on the core foundation provided by k8s.


Curious: do you have a single workload (like a WP site) that requires more than one physical computer in resources?

I think that's the first thing with k8s: it all starts with an app that requires several physical nodes.


As in - a single workload that can't fit on a single physical machine? No I don't, although I certainly could if I needed to. Most of my workloads are either low-traffic WP sites or bespoke web-based business tools for clients with very bursty traffic.

Most of the value I get from k8s is the hands-off nature of it - I get slack notifications (prometheus+alertmanager) if anything is happening I need to address (e.g. workload down, node down, API not responding, etc). Otherwise I can safely ignore my cluster and know everything's good. Spinning up a new WP site takes 10m with backups, TLS, monitoring, etc built in.


I’ve run a production kubernetes cluster that was hosting a DGraph cluster of 3 machines on its own, some ML workloads, and 4-5 products (each consisting of multiple services) and that was more than a single machine would have been able to handle.

Well _technically_, sure, we could have run a bunch of those products on a single machine, but there goes your durability, and the memory overhead on some of them was quite high; properly fitting them onto a single machine would have required more optimisation and technical skill than the devs I was working with had or were inclined to apply.


Definitely; if you as a company want to do Kubernetes or even cloud services (beyond an easy managed service like Beanstalk or GCE), you need to have a dedicated expert on it. Or more abstractly, one full-time unit (which can be distributed). If it's some guy's part-time hobby, it will not work.


I think this is what went wrong with k8s. I saw lots of interest from hobbyists, and people proposing k8s in small teams. It became common to hear "you do containers in production? Use k8s!" That's just a big disappointment waiting to happen.


If something as essential as the infrastructure is so complex that you need a dedicated expert on it, it's bad infrastructure. To take an offhand analogy, you don't need a dedicated highway maintenance engineer in order to drive your car.


I think Kubernetes in principle gets a lot of things very right - but it has over time grown into this huge amorphous blob of complexity that makes it very easy to shoot yourself in the foot with, as many people said :)

That issue is not endemic to Kubernetes, but rather to any larger system past a certain age, you learn stuff as you go along and would do stuff differently if you did it again today - but you can't easily, because you cannot break compatibility for everybody using your stuff.

As a concrete example from the Kubernetes world, there is a talk by Tim Hockin [1] about how today, they would fundamentally design the api-server differently and base pretty much everything on CRDs.

[1] https://www.youtube.com/watch?v=ji0FWzFwNhA


The industry and the k8s project are still figuring out the right way to do things that don't require the organization, size, and technical choices Google made.


A friend of mine is a contributor to k8s itself, and of course, this all comes incredibly easy to them. Following their recommendation, I gave it a shot for my single-person, single-node (!) homelab, all without using MicroK8s, k3s or similar.

After a week of almost full-time work, I threw in the towel. Admittedly, I also had to learn concepts like reverse proxies alongside, too, so I was by no means well-equipped to begin with.

Yet, tossing together some docker-compose.yml files and "managing" them with a Python script has worked very well. Kubernetes really scarred me in that sense, and I am healed! Also, Caddy has helped me in actually enjoying configuring the webserver.
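
For the curious, the whole replacement fits in a couple of files along these lines (a rough sketch; the service name, port, and domain are made up):

  # docker-compose.yml (sketch)
  version: "3.8"
  services:
    caddy:
      image: caddy:2
      ports:
        - "80:80"
        - "443:443"
      volumes:
        - ./Caddyfile:/etc/caddy/Caddyfile:ro
        - caddy_data:/data        # keeps TLS certs/state across restarts
    myapp:
      image: myapp:latest         # placeholder image
      expose:
        - "3000"
  volumes:
    caddy_data:

The Caddyfile then only needs something like "myapp.example.com { reverse_proxy myapp:3000 }", and Caddy takes care of the certificates.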


Wait, so running kubeadm init and then removing a master taint took you a whole week to figure out? How long ago was that?


No, setting up ingress did. Couldn't get reverse-proxying to work.


Ah ok. For single-node homelab setups I just throw everything on hostNetwork; second choice is NodePort (if there are port conflicts). In general, k8s ingress on bare metal requires a deeper understanding of its network design.


I would (probably) spin up an ingress-controller on ports 80 and 443, using hostNetwork, then use Ingresses from then on (and as it's a single-node cluster, just create a wildcard DNS A record, and possibly an anchor for other CNAMEs to point at (depending on DNS server) pointing at the IP said ingress-controller is running on).

Does mean that anything that upsets the ingress controller is an outage, but for experimentation, that's probably OK.
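
To make that concrete, each app then only needs an Ingress along these lines (a sketch; the hostname and service name are made up, and this assumes something like ingress-nginx registered as the "nginx" class):

  apiVersion: networking.k8s.io/v1
  kind: Ingress
  metadata:
    name: myapp
  spec:
    ingressClassName: nginx            # whatever class your controller registered
    rules:
      - host: myapp.lab.example.com    # covered by the wildcard A record
        http:
          paths:
            - path: /
              pathType: Prefix
              backend:
                service:
                  name: myapp          # ClusterIP service in front of your pods
                  port:
                    number: 80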


Yeah, of course running random docker-compose files and containers from the internet and blissfully exposing your MongoDB or whatnot service unsecured to the whole world seems like an easy, non-complicated alternative. Kubernetes has a few shitty defaults, like exposing a service account to all pods by default or allowing pod image tags to be mutated, but most of the functionality it provides is a must-have when you actually care about your SLA. Rolling updates with health checks and a configured back-off time? Separate ingress for OAM and live traffic with automatic HTTPS, etc.? I could go on.
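
Just the first of those, in plain Kubernetes, is a handful of declarative lines (a fragment of a Deployment spec; the numbers are picked arbitrarily):

  spec:
    replicas: 3
    minReadySeconds: 10        # new pod must stay ready this long before it counts
    strategy:
      type: RollingUpdate
      rollingUpdate:
        maxUnavailable: 0      # never drop below the desired replica count
        maxSurge: 1            # roll one extra pod at a time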


Are you having a bad day?

I am talking about a homelab, a single server at home, for home use. It's much safer now, with Docker compose, because I understand it and I wrote the core exposed part's configuration, the Caddyfile, myself, manually. I know exactly what's exposed, and it's exactly right the way it is!

The remaining risk comes from the services themselves having security holes, but k8s has that very same risk.


From my experience it's actually sold as a simpler alternative to other infra provisioning. So you end up with situations where a team deploys whatever with a helm chart, and it sets up the stuff like magic and they build on it. Then when something goes wrong they literally have no idea how to fix anything and it becomes a waking nightmare.


As a longtime and frequent user of k8s, I stay away from helm charts. I tried them out when they first got popular but I found they introduced more friction than they solved on the whole.

Not every addon/tool for the k8s ecosystem is worth it. I also don't bother with the ever-growing list of service meshes... not enough value to me for the overhead.

K8s is definitely the simpler alternative for me but there is still a lot of essential complexity in k8s due to the nature of the problems it's trying to solve. Mostly I like building on top of a solid foundation of standardized k8s API objects (pods, services, volumes, etc).

Tldr; Bring in only the add-ons and tools you really need so you don't add more complexity than necessary. Don't get swept up in the hype and marketing from other devs and cloud vendors.


This is such a great point and so frequently skimmed over in k8s discussions. We as tech folks tend to focus on the front page blog posts about 1000s of nodes and all of the orchestration that goes into complicated top 1% high-traffic/high-complexity use-case setups. In reality there are a lot of profitable businesses out there happily running a simple cluster set up with a single-digit number of deployments chugging along on it with zero down-time.

Really when looking at tools in the k8s ecosystem, it's better to approach it as you would importing a new library into your application. Most decent devs wouldn't blindly import a new lib so that they can copy/paste a single line of code they found online for a business critical function, and k8s tools should be no different. We must think about what value does a given tool bring, and is it worth the cost of learning/maintenance? Sometimes the answer is a resounding "yes", but too often the question isn't even asked.


I like Kubernetes; I don't overly like Helm charts because, yes, they work, but you can install one without having to think about what it's putting in your cluster.

Also, I don't much like Go's templating syntax.


> The more I look into the k8s ecosystem, the more I'm convinced that it's one of those things that suits FAANG etc., but the regular Joe developer has caught on to the fad and wants to add it to his repertoire, even though it's overkill.

You are absolutely spot on because this is how not to pass the behavioral interview for Engineering Manager.


> but the regular Joe developer has caught on to the fad and wants to add it to his repertoire, even though it's overkill

There are very strong financial incentives for every individual developer and sysadmin to adopt Kubernetes, regardless of the impact it has on the organisation as a whole. In a sense this is engineering reaching the level of corporate maturity of the sales department who will optimise everything for their commission regardless of the organisations ability to deliver it at a profit, or even at all.


I'm sure there's a name for this phenomenon. Companies want stable software, and Regular Joe wants better pay, but companies won't pay unless Joe starts doing crazy complex stuff that complicates things further.


Regular Joe learns complex stuff at your expense. He then leaves for greener pastures and higher pay thanks to the boost to his resume. You are then left with complex stuff you need to maintain and so you have to hire another Regular Joe for a higher salary than your first Regular Joe.


The name is "poor management of resources". Regular Joe should move on before trying this masochism.


> There are very strong financial incentives for every individual developer and sysadmin to adopt Kubernetes, regardless of the impact it has on the organisation as a whole.

Then that organization is doing a terrible job of aligning incentives. I'm guessing their pay structure isn't terribly merit-based nor high enough that people aren't constantly thinking about other jobs.

If this is about FAANG (your comment wasn't, but others were), perhaps part of this is exposing larger problems in many smaller orgs. (note: I'm ex-FAANG and happily so)


Sorry to have to be the one to tell you: sometimes architectural decisions are driven by factors other than YAGNI. Right now you have throngs of young developers paying $50k+ a year for the privilege to learn how to use Docker and Kubernetes while in college, and when 90% of them inevitably get rejected from FAANG after graduation, you'll be able to hire them on the cheap and entice them with development stacks they're comfortable with.


In my opinion, k8s starts to shine when you have to manage hundreds of containers. When you have just dozens of them it's overkill, but there's no way to smoothly slot in another solution between "docker-compose up -d" and spinning up a k8s cluster: you will (or think you will) hit a maintainability ceiling again and have to migrate to k8s.


There actually is: Hashicorp Nomad fits solidly in between those two options.

Nomad is way simpler to get a cluster up and running, has a great configuration syntax (I'll take HCL over YAML any day), and has first-class Terraform/Consul/Vault integrations.

Onboarding devs is fairly straightforward, if they can write a docker-compose.yml, it's an easy transition to a nomad job specification.

It took me, working by myself, ~4 months to get our current Hashistack (Vault/Consul/Nomad) stood up using Terraform + Ansible. Two members of my team have been working to replace the Hashistack with a self-hosted K8s deployment; they just went over the 1-year mark and we still do not have something capable of hosting the workload currently running on the Hashistack.

This got a little long-winded, but I feel like this "it's docker-compose or K8s, take your pick" mentality has led to a bunch of needless time being spent by smaller teams/companies on solutions that just aren't right for them.


What I think K8s (EKS, GKE, DO hosted environments at least) provides is a nice way to integrate things like GitLab. This gives you a really easy-to-use CI/CD pipeline for very little work and configuration. This allows you to deploy production from your main branch and spin up feature branches that can be tested by the people that requested the feature very easily. This does not require additional effort once the system is set up.

Also you can get blue/green deployments and rolling deployments with little to no effort, which can be very nice to have.


I think that an important distinction is between deploying on k8s and operating it. For a small team (not measured in the dozens), the latter is unaffordable but the working style of the former is still powerful.

This feature helps a lot with that problem by bringing GCP closer to where AWS has been with Fargate. k8s will still be more work than using AWS ECS but it might also be preferable if you dislike using the provider’s components and want the control of, for example, doing your own load balancing and storage management.


k8s and its ecosystem represent a data center in software. Data centers are fairly complex constructs. It is then to be expected that this complexity will shine through in k8s' API, UI, UX. k8s' main mission seems to be to provide a complete digital data center, not an easy to use one, and I would argue that that is exactly the right choice. Over time, as the core of the beast is figured out, there will be (as there have been) more and more opportunities taken to actually help users navigate that complexity and/or resolve it into more natural and less error-prone interfaces. But in the meantime it seems like it's mostly on the community to provide (usually temporary) solutions for the most pressing usability concerns.


That reminds me of Basecamps article about "The majestic monolith" https://m.signalvnoise.com/the-majestic-monolith/


Kubernetes has to be most complex software I've ever tried to learn. I eventually gave up and decided to stick with simple single machine docker-compose deployments. I figure by the time any of my personal projects actually need to scale beyond 1 machine, I'd probably have enough revenue that I can afford to hire someone else to worry about it.


It's the right attitude. The management fees alone for this Autopilot thing are $0.10 per hour, or about $70/month. It's a bargain considering all the hidden costs Kubernetes imposes in terms of requiring people who know how to tame the complexity associated with it (i.e. very expensive devops people costing orders of magnitude more than that). Automating those people away is worth money.

I like Cloud Run for the same reason: I can use it without needing a lot of devops skills on my team and without sacrificing my own time (because I have those skills but have more valuable things to do). It allows me to focus on keeping my CI/CD pipeline (Cloud Run sets that up with a button click) busy with new functionality. And our hosting costs are close to $0 because we stay below the free tier until we actually need to scale.

Edit. corrected the typo 700->70


Hey, it's William from Google here. You're right about the costs, I just wanted to point out that Autopilot does include one cluster in GKE's free tier. So you'll only pay the ~$73/month if you have more than 1 cluster.

There's (almost) no limit to what you can run in one cluster too, and Kubernetes namespaces can help to separate different environments to allow for sharing.

Cloud Run sounds like the perfect solution for your workloads though!


Yep, Cloud Run is pretty good. Unfortunately, it doesn't cover all cases (e.g. stateful stuff like websockets and chunking, and recurring jobs).

For those cases, I still have a GKE cluster around.


Nitpick FYI: WebSocket is coming. https://cloud.google.com/blog/products/serverless/cloud-run-...

(Still agree that Cloud Run isn't for everyone.)


I think you mean 70$, not 700$


Eh yeah, thanks for correcting me.


Long time ago I worked on HA setups for telecom (Wimax/LTE) equipment. Kubernetes is complicated but has nothing on those systems. Just to give you some idea - https://www.metaswitch.com/hs-fs/hubfs/Blogs/3gpp-ts-23-228-... (doesn’t even cover everything)


The very term "HA" still gives me nightmares. It can be very hard to get HA to work correctly. Many years ago, I worked in a startup and one of our main offerings was an HA network device. It was unbelievably finicky to get it to work in the first place and even harder to update the software on an HA cluster.


It's HA by intimidation. The cluster is complex enough that nobody even wants to touch it, and since human errors are the most common type of error out there, it breaks much less often.


Yes I believe this is why you see things like k3s in some iot/edge deployment scenarios. Because other alternatives for HA like OpenSAF have been severely lacking for years


I’d recommend giving Docker Swarm + Traefik a shot. It’s dead simple to set up manually and has very little “magic” in how things work under the hood. Plus much of your existing Docker Compose config will work out of the box. It vastly simplified the deployment process too.

I previously avoided Docker Swarm for ages since I assumed it involved the same level of complexity as k8s. I also initially figured that managed k8s would be a safer bet than managing my own Swarm cluster, but if you’ve used anybody’s managed k8s (or read https://k8s.af), you’ll realize that every cloud provider has their own closed source fork of k8s with plenty of nasty bugs that you can’t do anything about.
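
To give a flavour: once Traefik itself is running in the swarm, routing a service through it is mostly a matter of labels in the compose file (a sketch; the hostname, port, and network name are made up):

  version: "3.8"
  services:
    myapp:
      image: myapp:latest                # placeholder
      networks:
        - proxy
      deploy:
        labels:                          # in swarm mode Traefik reads deploy.labels
          - "traefik.enable=true"
          - "traefik.http.routers.myapp.rule=Host(`myapp.example.com`)"
          - "traefik.http.services.myapp.loadbalancer.server.port=3000"
  networks:
    proxy:
      external: true                     # overlay network Traefik is attached to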


Docker Swarm is pretty much abandonware/on life support at best, so one should avoid using it for new stuff.

Hashicorp's Nomad is the best choice on the complexity-for-features scale IMHO, and that's why I'm writing an article about how great it is, how much easier some things are, and what's missing compared to Kubernetes.


And yet, Docker Compose is pretty popular for local development, so much so, that it's not uncommon to find a docker-compose.yml in the repositories for many open source projects. And Docker Swarm builds on that, by bridging the gap between Docker Compose and multi-server deployments, with tools like Swarmpit and Podman for easier management of it as well, much like Rancher does for Kubernetes. I agree that Docker Swarm isn't developed as actively as it should be, but disagree that it should be avoided and disagree that it should be allowed to die. In my opinion, it's a more minimalistic and more sane approach to container orchestration with minimal up front investment (just install Docker and edit your Compose files a bit with deploy constraints, you're ready to go).

Hashicorp's Nomad is good if you have a strong engineering department or need to run mixed workloads (e.g. both containers and native processes) because its abstractions are well suited for this, but HCL, their DSL for describing deployments, doesn't map nicely to either Docker or Docker Compose files, knowledge bases, or tutorials. Nomad's integration with Consul is a major boon, but the need to run your own CA for safe communication between nodes, Nomad's read-only Web UI, and the oddity of HCL at times also make it a non-starter for me and some other people.

At the end of the day, these are just two data points, sadly the job market for Kubernetes also dwarfs everything else and sadly many companies will be burned by this and will learn nothing at the end of the day. Ideally, i think that the best route would be evaluating the orchestrators and other technologies that you want to use by doing pilot projects and such, and looking at them in real world circumstances, to determine their fit for your goals and needs (Web UIs will matter for some, but not for others, for example; as will onboarding and the need for long term investment vs plug and play).

Edit: as for Kubernetes, personally i find the K3s distribution to be an almost reasonable alternative to Swarm/Nomad, if the situation calls for it: https://k3s.io/

Edit #2: it would actually be pretty awesome to read more about your experience in this article that you're writing!


Absolutely. We use Swarm in production at ecoeats and it's a dream for simple clustering with multiple services. Using Hetzner cloud's volume plugin gives EBS-like functionality too.


> Nomad's read-only Web UI,

Nomad's UI has a good number of functionalities that can be controlled through it. Sure, there are some lower-level operations that are CLI-only (though this seems to be something they are actively working to improve on), but most of that probably won't be needed by someone just trying to run a couple of containers on a single node.


Correction:

> Swarmpit and Podman

I actually meant Swarmpit ( https://swarmpit.io/ ) and Portainer ( https://www.portainer.io/ ).

Podman is another container runtime that acts as an alternative to Docker (even if it is not feature complete), so i misspoke.


We use Swarm for a small cluster in production as well. Extremely easy, and zero-downtime deployments are fantastic. I can explain it to someone else and quickly get them up to speed. Having said that, the fact that it seems to be on life support has made me look at other alternatives, even a simple docker-compose per node.


I quite like swarm for its compatibility with docker-compose, in particular to have the option of zero downtime deploys. If I wanted to manage a real cluster though I'd probably use nomad or GKE to avoid getting burned when the system is under load.


I use Swarm at home (only on a single node, because it turns out 3 are overkill for my needs) and it's been running great for 6 months so far. Before that I tried various incarnations of k8s, and eventually they'd just destroy themselves and require a rebuild (the main issue was persistent storage).

My only complaint with Swarm is there isn't an easy way to expose containers directly on the network (like host networking). I have a few containers (wireguard and minidlna) which need this, so those are running through docker-compose. I've tried macvlan but wasn't able to get that working in Swarm mode.


Did you try this?

  networks:
    - host
https://docs.docker.com/network/host/
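
(If plain "host" under networks doesn't fly as a swarm service, I think the stack-file form needs the engine's built-in host network declared as external, roughly like this; untested on my side:)

  version: "3.8"
  services:
    mediaserver:
      image: myimage:latest      # placeholder
      networks:
        - hostnet
  networks:
    hostnet:
      external: true
      name: host                 # the engine's built-in host network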


Maybe it has changed since I built it, but I wasn't able to get this working with Swarm services. I had to convert them to docker-compose to make it work. The docs suggest it should work with Swarm mode though, so maybe I need to try again.

Ideally I'd like to give each service its own IP on the network, which was possible with how I had k8s setup.


Mind sharing a bit more on your home setup?

Asking because I never saw the point in multiple container replicas for simple self-hosted stuff. One container each has served me well so far [0] (Nextcloud, Bitwarden, GitLab), and if they crash, they just get restarted. Multiple containers increase throughput, supporting more users, is that it? It just sounds nightmarish in regards to storage and parallel, conflicting writes.

[0]: One container per component (web, db, cache, ...).


I do have just one container of each thing, but I was originally planning to have multiple nodes. I have 3x ThinkCenter Tiny M73 (with the Pentium CPU) and thought one would be a bit underpowered for everything I wanted to run (it was for k8s), so was planning to distribute services automatically across the swarm. One node is more than enough though, so I'd actually be fine with just docker-compose, but splitting everything up into separate 'services' is nice.


This is a honest question, but may I ask what you found to be hard about it?

For a long time I was scared of it because so many people say it's crazy complex. But actually it took no more than a dozen hours for me to learn it and get a working setup on AWS.

Maybe being full stack and having a strong knowledge of Linux and Docker helped.

Now I'm not pretending to be an expert with it, and there are certainly traps and mistakes that I didn't experience yet. But I don't understand what people find to be hard about it.


Maybe your starting instances are workhorses, but if you go from something like a t3.medium to two it’s a ~$60 / mth increase... Not something I’d personally optimize for.

Also why Docker in the first place? I’m genuinely wondering - in the stacks I run (Express / Python) it doesn’t seem necessary at low scale. Elastic Beanstalk, Heroku, Digital Ocean etc all offer facilities for single-command deploys that work out of the box.


I like Docker for the 'keeping it clean' aspect. Install php, composer and stuff just because one of the projects you host requires it? Nope. Have to make excessive configuration changes on a system component just to run another application? Nope. Forgot how you set it up and now you are struggling on the new machine? Use the Dockerfile.


Vagrant and Ansible can help.


I use Docker without the network virtualization as a package manager.

Docker makes it easy to run the same version of code in different places and lets things run next to each other without version conflicts.

Also, I think you’re in a very small minority not to care about $720/yr increases in your hobbies.


GP was talking about projects that had revenue, and about "hiring someone" past a single instance.

I replied that beyond a single instance, you can probably get away with not hiring a K8s devops person and just spinning another instance. I'm not sure you've read this whole thing right.

And yes, I certainly wouldn't mind paying an additional $720/yr for a project that had revenue; I almost certainly wouldn't want to spend money hiring a specialist, or spend time hyperoptimizing that myself. I make that in about a dozen hours of work, so counting how far one can go down the rabbit hole of optimizing server costs, and the associated opportunity cost, the economics are crystal clear.

I don't have any successful personal projects but I have significant experience working with clients, and they are sold on the reasoning pretty much every single time ("I can charge you $3,000 for developing this feature, or we can use a paid service for $720 a year").

I also don't see how Docker is going to save you that much money; if you need a certain amount of compute, you need a certain amount of compute. AWS ElasticBeanstalk for instance charges nothing for spinning up an additional instance compared to EC2; there is no overhead for the PaaS aspect of it, like there would be in Heroku. Digital Ocean app platform is the same as EB, AFAIK.


GP talked about using Docker to do complex single host deployments until they needed horizontal scaling, which given max VM power, is after they can afford someone to manage it for them.

That makes me think they have multiple services of variable workload packed onto a single host, eg web server, async, and DB all on a single host via Docker.

That’s the antithesis of EB, which can only do horizontal scaling. Docker provides a way to replicate those multiple services in a deployment configuration when you want to set up a host image.

Not using Docker as an easy way to pack it all onto a single box as long as they can is just wasted expense.


I’m leaning towards using Ubuntu/systemd to start all my services, and using a single container per project with SQLite.

This way:

- I can easily move from dev to prod by using the private container registry

- I have apt-get on the server and just use the default Ubuntu

- No distributed/network state

I used to use Core OS, but all these container OSes are here today and gone tomorrow, and normally have their own config standard. At least with Ubuntu they have LTS and a bunch of Google pages for fixing stuff.


systemd-nspawn is the most underrated piece of software on any modern Linux system. It just works and does exactly what it needs to do. Not to mention using a directory structure as a container image?! What is this sorcery?!


Yes, k8s is not built for solo outfits and small projects.

In places with 10+ developers and 10+ services on a single cluster it works surprisingly well from the user (developer) side.


Hmm, I guess I should shut down my business then since I'm a solo founder using k8s exclusively for my hosting for 3+ years...

Definitely a learning curve but honestly not bad if you like ops. Absolutely possible and beneficial for any size team. As always, it depends on what your goals are and how you want to use k8s.


In my opinion (and I'm biased as I work on GKE Autopilot), GKE is viable for a 1 developer project, especially with Autopilot mode (I have my own hobby projects deployed on GKE).

If you were self-hosting k8s, then I'd agree with you.


I think it is more like: 10+ servers OR 10+ developers. you can have 10+ services as a single developer ;)


Docker Swarm is a good alternative for you then; it comes out of the box with Docker and you only need to add a few lines to your docker-compose files to make them Swarm-compatible.
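
Something like this per service is usually all it takes (a sketch; the numbers are arbitrary):

  version: "3.8"
  services:
    web:
      image: myapp:latest        # placeholder
      deploy:
        replicas: 2
        update_config:
          parallelism: 1
          order: start-first     # start the new task before stopping the old one
        restart_policy:
          condition: on-failure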


> simple ... docker-compose deployments

So, not simple.


Sadly, this is a typical Register headline.

Google did not say "Kubernetes is too complex" but rather, they are making this new tool - called Autopilot - that is an abstraction layer on top of Kubernetes for certain types of applications / companies.

This Autopilot system still uses Kubernetes AFAICT.


I love The Register for that, they just add hyperbole to something which would never come out from a company itself, and often they are right.

K8s is too complex, and I say it as k8s dev(ops) who dreams the next great thing will come soon which will save me from piles of YAMLs and will bring back programming fun ;) /s


K8s isn't too complex for what it does and can do.

It's just that a lot of people think they have to use k8s because it's trendy, even if it doesn't fit their needs and scale at all. Then they whine that it's too complex.

classic PEBCAK issue.


They have a quote from the GKE lead directly saying "Despite 6 years of progress, Kubernetes is still incredibly complex."

But it is a surprisingly negative headline. Autopilot sounds like a cool tool to simplify container orchestration!


> But it is a surprisingly negative headline.

That's the Register's schtick, they're snarky about everything.


They are usually closer to the truth than evangelists telling us how this new tool is going to solve all our problems.


Reality is too negative for social media.


El Reg's snark is what makes them fun to speak with, as they do cut to the heart of it.

GKE Autopilot is still fundamentally Kubernetes and supports the k8s APIs. GKE has simplified a lot of Kubernetes tasks that can be a pain, such as upgrades & scaling, but as many people said on other threads Kubernetes definitely is not for all workloads. For workloads that actually do need to scale then Autopilot removes a lot of the time consuming tasks required to get Kubernetes running and staying running. There are restrictions so check the docs on if it would work for how you use k8s.

Disclosure: I run product for GKE at Google so I'm definitely not a neutral voice on this...


I'm looking for a job now.

Interested in having someone who has had to set up 1000s of containers in k8s and hated all of it?


I almost replied "who would look at Kubernetes and think it needs more layers?" but it sounds like Autopilot is a new cloud service - the customer (hopefully) doesn't interact with the underlying Kubernetes layer. I guess it kinda makes sense.


View from a Googler: Autopilot stitches together existing GKE functions and is just a cluster creation decision. The layer it adds is automating the different tools together and SRE support magic so that we're monitoring it now.


I mean, it's really too complex to run k8s yourself so Google made an automatic tool so you don't have to.


And? None of this is contradictory unless you're reading too much into the headline.


No, it is not a typical Register headline, it is a half of a typical Register headline. However, HN has limits on headline lengths and it has been cut instead of being editorialized which afaik is another HN guidelines violation.


Paging @dang, could we get the title changed? It’s clickbait in its current state.


I have a love/hate relationship with K8s; I have several customer projects that I work on deployed on GKE. For the most part it has been a rock-solid set-and-forget experience, but...

The problem for me is that these projects are not under full-time development. Most of them see maybe one or two deployments per year. It seems like every time I do a deployment, some part of the YAML config structure has been deprecated, with no clear path on how to migrate. In the end I have to spend a day figuring out how to rewrite the YAML configs to do _exactly the same thing_ as before. This is really hard to explain to a customer.
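
(A typical example of the kind of churn I mean, not necessarily my exact case: Deployment manifests written against the old API group simply stop applying after a cluster upgrade and have to be rewritten, even though the workload itself hasn't changed.)

  # old manifest: rejected by clusters running Kubernetes 1.16+
  apiVersion: extensions/v1beta1
  kind: Deployment

  # rewritten manifest: same workload, new API group, selector now mandatory
  apiVersion: apps/v1
  kind: Deployment
  spec:
    selector:
      matchLabels:
        app: myapp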

But it's not just K8S itself that suffers from this, third party components do this too. Stuff like ingress control is a nightmare, I've seen the nginx ingress controller (not to be confused with the 'other' nginx ingress controller) grow from a 50 LOC base config to multiple KLOC of YAML config you are supposed to pipe to your production environment. The ACME/LE extension (cert-manager) is even worse, the default config is 26KLOC long! [0]. And you are supposed to pipe this config straight into your environment. The amount of new and amazingly complex components that are added with each new release is staggering.

If you are an average Joe developer and you want to use containers in production, just stick with a machine with docker-compose on it. It's a lot easier to maintain and has far fewer surprises down the road. It is also much easier to quote for maintaining a VPS box than to get out a crystal ball and predict when Google will deprecate parts of your setup.

[0] https://github.com/jetstack/cert-manager/releases/download/v...
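
For a sense of the contrast, a docker-compose setup for a small app can be a single short file along these lines (image names, ports and volumes here are just placeholders, not a recommended config):

    version: "3.8"
    services:
      web:
        image: registry.example.com/myapp:1.4   # placeholder image
        restart: unless-stopped                 # come back after crashes/reboots
        ports:
          - "80:8080"
        env_file: .env
      db:
        image: postgres:13
        restart: unless-stopped
        volumes:
          - pgdata:/var/lib/postgresql/data     # keep state across container restarts
    volumes:
      pgdata:

Upgrades then boil down to `docker-compose pull && docker-compose up -d`, which is most of what many small deployments actually need.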


> If you are an average Joe developer and you want to use containers in production, just stick with a machine with docker-compose on it. It's a lot easier to maintain and has far fewer surprises down the road.

This is exactly what I am doing. It's easy to configure, and the configuration is almost the same between local test and production machines. The only downside is that it's hard to monitor whether the service is up without additional tools, which I've been too lazy to research.

Additionally, if you don't have good enough CI/CD yet, you can deploy with a volume-linked container.


I can understand the use of kubernetes in very large orgs to manage clusters of hundreds of nodes, but it seems to me the complexity isn’t justified if you only have 1-100 say. There are lots of possibilities between 1 server and 100, and lots of ways to have simple replicable deploys if your needs are simple (probably 95% of businesses).

Simple load balancers without auto-scaling work fine!

For smaller non-critical services sometimes even one reliable server is an acceptable trade-off.

Do most companies really need kubernetes given the complexity and difficulty it introduces?


Maybe I can offer an answer to your question, I have worked at a couple of companies where we ran "small" scale k8s clusters (1-100 nodes as you say).

We have chosen k8s and I would again, because it's nice to use. It's not necessarily easier; as you point out, the complexity of managing the cluster is considerable. But if you use a managed cluster like EKS or DO's k8s offering, you don't have to worry too much about the nodes, the unit of worry is the k8s config, and then for deployment you can use Docker.

I like Docker, because it's nice. It's nice to have the same setup locally as you have remotely.

In my experience the tooling around k8s is nice to manage declaratively, I never liked working with machines directly because even tools like Chef or Ansible feel very flimsy.

The other thing you can do is run on ECS or similar, but there the flexibility is a lot lower. So k8s for me offers the sweet spot of being able to do a lot quickly with a nice declarative interface.

I'd be interested to hear your take on how to best run a small cluster though.


Thanks, that's really interesting. Everyone has different challenges and requirements, and of course different experiences.

For smaller setups (say 1-10 services) I'm quite happy with cloud config and one VM per process behind one load balancer per service. It's simple to set up, scale and reproduce. This setup doesn't autoscale, but I've never really felt the need. We use Go and deploy one static binary per service at work with minimal dependencies so docker has never been very interesting. We could redeploy almost all the services we run within minutes if required with no data loss, so that bit feels similar to K8s I imagine.
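
As a rough sketch of what "cloud config" means here, each VM boots from something like the following cloud-init file (unit name and paths are placeholders, not our actual config; the binary is assumed to be baked into the image or pulled separately):

    #cloud-config
    write_files:
      - path: /etc/systemd/system/myservice.service
        content: |
          [Unit]
          Description=My Go service
          After=network.target
          [Service]
          ExecStart=/usr/local/bin/myservice
          Restart=always
          [Install]
          WantedBy=multi-user.target
    runcmd:
      - systemctl daemon-reload
      - systemctl enable --now myservice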

For even smaller companies (many services at many companies) a single reliable server per service is often fine - it depends of course on things like uptime requirements for that service but not everything is of critical importance and sometimes uptime can be higher with a single untouched service.

I think what I'd worry about with a k8s config which affects live deployments is that I could make a tweak which seemed reasonable in isolation but broke things in inscrutable ways - many outages at big companies seem to be related to config changes nowadays.

With a simpler setup there is less chance of bringing everything down with a config change, because things are relatively static after deploy.


>We use Go and deploy one static binary per service at work with minimal dependencies so docker has never been very interesting.

how do you deploy your static binary to the server? (without much downtime ?)


Sorry that should have said one binary per node really, not per service (though it is one binary per service, just on a few nodes for redundancy and load).

Services behind a load balancer so one node at a time replaced then restarted behind that, and/or you can do graceful restarts. There are a few ways.

They're run as systemd units and of course could restart for other reasons (OS Update, crash, OOM, hardware swapped out by host) - haven't noticed any problems related to that or deploys and I imagine the story is the same for other methods of running services (e.g. docker). As there is a load balancer individual nodes going down for a short time doesn't matter much.


> how do you deploy your static binary to the server? (without much downtime ?)

Ask yourself how you would solve this problem if you deployed by hand, and automate that.

1. Create a brain-dead registry that keeps track of what runs where (service name, ip address:port number, id, git commit, service state, last_healthy_at). If you want to go crazy, do it 3x.

2. Have haproxy or nginx use the registry to build a communication map between services.

You are done.

For extra credit (which is nearly cost-free), with 1. you can now build a brain-dead simple control plane by sticking an interface on it that lets someone/something toggle services automatically. For example, if you add a percentage gauge to services, you can do hitless rolling deploys or canary deploys.
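
To make 1. concrete, a registry record really doesn't need to be anything fancier than something like this (the field names and format are purely illustrative); the haproxy/nginx config is then regenerated from the current set of records:

    # one record per running service instance (illustrative format only)
    - service: billing-api
      id: billing-api-7f3a
      address: 10.0.4.17
      port: 8080
      git_commit: 4be1d2c
      state: healthy              # healthy | draining | down
      weight: 100                 # the percentage gauge used for rolling/canary deploys
      last_healthy_at: 2021-02-26T14:03:11Z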


No, but it's not just complexity. We're a small company, but after spending about a month to set up our GKE cluster a couple of years ago (incl. learning Terraform in the process), it's been rock solid, low involvement and reliable. Declarative resource specification is a legitimate game changer and you couldn't pay me to go back.


I'm not sure why you're getting downvoted, unless people are afraid that you're sponsored by Google. Your experience aligns with mine. Running a SaaS app by myself that serves loads of around 1 rps with bursts of 400+ rps, k8s works really great for my use case.

But I grant you, it has been a very complex, painful journey to get this working right, and in fact I'm still making tweaks and adjustments. Plus, often times it's really unclear if I should scale vertically or horizontally.


"k8s works really great for my use case. [..] But I grant you, it has been a very complex, painful journey to get this working right, and in fact I'm still making tweaks and adjustments."

TBH, it does not sound like it's working great for your use-case =) Wouldn't something like Cloud Run, App Engine Standard or Heroku be much simpler and cheaper? Or just a single $5-15 per month* VM?

* depending on how much oomph you need for those 400 rps


Sadly no because I need stateful RAM. A single VM might work but I'm scared


Standardised declarative config/setup I've found really useful. Declarative resource specification (as in n servers for service y) I don't really have a use for, though I can see how in larger companies with hundreds of servers and lots of employees it would be attractive. If it is rock solid, reliable and impossible to mess up, I could see it being really nice (as your experience sounds), though it feels to me like end users of services like GKE should not have to know about Kubernetes or its complexity, nor have a chance to mess it up by editing configs directly.

I guess the ultimate goal of services like GKE autopilot is for you not to have to worry about kubernetes at all, just give them your workload and have it run on whatever resources they think is appropriate?

I do think it's important to recognise though that there are lots of ways to host services, and simpler with less abstractions is often better and more reliable and certainly easier to debug when things go wrong.


I mean yeah, I’m not arguing against autopilot and would probably have used it if it existed when we kicked off. Cloud Run had just come out and I really wanted to use that but the networking wouldn’t do what we needed, so had to go all in. But in the end, we learned enough Kube, we specified it all with Terraform, we haven’t tried to do too many fancy or complicated things, and we have reliable infrastructure fed with a simple deployment/build pipeline (just using cloud build, it’s clunky but it works). I mean maybe I’m conflicted as we’re part of GCS, but the system works well for us. If it didn’t we could’ve jumped to Azure or AWS and got credits there.


Sounds like a nice reliable experience to me, thanks for sharing your experience on it.

If I hear more reports like this I might just have to try out GKE :)


Kubernetes adds a vast amount of complexity, and my rationale is that this is because it centers scaling on the wrong unit (the operating system).

Docker introduced a great level of abstraction and reproducibility over platforms. However, Docker (or OS-based containers) are the most atomic unit of computation on Kubernetes. Which causes centering scaling on the Instance, instead of the Application or even the functions.

This leads to a lot of unintended side effects, of which complexity is the most evident, since now you need to handle scaling, monitoring and compliance at the VM layer rather than at the function or app level.

I believe a VM centered on the app (WebAssembly VMs) or functions (serverless approach) is the right computation unit to allow proper scalability and a simpler and more powerful system on the long term.


>I believe a VM centered on the app (WebAssembly VMs) or functions (serverless approach) is the right computation unit to allow proper scalability and a simpler and more powerful system on the long term.

Docker's co-founder agrees:

"If WASM+WASI existed in 2008, we wouldn't have needed to created Docker. That's how important it is. Webassembly on the server is the future of computing."

https://twitter.com/solomonstre/status/1111004913222324225


So we circle back to app servers like Java EE WAR/EAR stuff.


With the exception that Wasm apps are based on an open standard, almost any language can target it, they are more lightweight, and they can also run in the browser :)


I’m not sure I agree with the lightweight point. WebAssembly is missing features needed by many high-level languages. As a result, they have to resort to less than ideal tricks. For example, C# on WebAssembly runs using an interpreter even though normally it is JIT compiled.


> WebAssembly is missing features needed by many high-level languages

Agreed, although at some point in the not-very-far future most of those missing features will be resolved. So in my mind it's just a matter of time.

The Wasm Community Group is doing awesome work on that :)


Ikr? Enter weblogic/websphere java ee (bleh)


To me, this essentially defines "the cloud", which exists because providers have convinced CTOs that scaling is an infrastructure concern.

The scaling of their primary business (amazon.com, google.com) is at least partially an infrastructure problem - so they had to solve this anyways. Why not try to sell it too?

But it's very rare that you can scale a system by only scaling infrastructure. Ironically, probably the only way this will work is if you scale vertically - which means you don't need K8s and want to avoid the cloud like the plague.

There are all types of application-specific consistency issues that need to be treated as first class parts of the system for horizontal scaling to work.


> Which causes centering scaling on the Instance, instead of the Application or even the functions.

That is probably why Red Hat invested early in Kubernetes. Operating systems did matter for them. Applications did not.

> I believe a VM centered on the app (WebAssembly VMs) or functions (serverless approach) is the right computation unit

I agree with that. The thing I worry about is that a k8s variant like Krustlet, running Wasm instead of containers, might introduce the same complexity into the Wasm world. We need a more app-centric scaling solution.


What happened to "focus on the business logic / application"?

Are we just making rabbit holes out of rabbit holes of abstraction using kubernetes?

I just use and push code to Heroku and I'm done for the day, simple. NoOps I call it.

I wish more tools and platforms were like this.


My background is in embedded, so I admittedly know extremely little about web development, but whenever I'm curious and sit down to read about microservices and containers and orchestration and all that stuff, my mind starts to numb and I can't help but conclude that 99% of companies that use it probably don't need to. And that they're just a complex way for engineers to keep themselves spinning their wheels and not actually working on an application. Like that guy who insists on doing nothing but refactoring and rearchitecting and moving things from one layer in the stack to another, rewriting in different languages, integrating new third-party libraries that do the same thing as their existing third-party libraries, but not actually adding anything that improves the product. How did the world get into this situation where you need all this complexity just to deploy and configure not even an application, but part of an application? Amazing!


> And that they're just a complex way for engineers to keep themselves spinning their wheels and not actually working on an application...How did the world get into this situation where you need all this complexity

Isn't this just labeling the knowledge that you don't have (and could probably read up on) as potentially unnecessary complexity?

I mean, every time I hear about embedded, I keep hearing about byte boundaries, RTOSes, compiler chains, musl, JTAG, and a million other things that make my mind numb. But I assume embedded engineers need to know all those things, because of the constraints unique to their field. Some of them could be just "a complex way for engineers to keep themselves spinning their wheels and not actually working on an application", but a lot of them provide value in the real world and solve some specific problem.

Cloud infrastructure management frameworks are the same.


Fair enough--we perceive things we're ignorant about as complex. I buy that.

I guess I was comparing it to how things were (or more precisely, how I perceived things) 20 years ago. You'd rack a server or two, install Linux, stick the application in /usr/local/bin, make sure Apache was set up, and you were off and running. Simple enough. It probably didn't scale to 2020-sized Internet user counts, though.


> You'd rack a server or two, install Linux, stick the application in /usr/local/bin, make sure Apache was set up, and you were off and running. Simple enough.

That sounds incredibly complex! :) Where would you find a place to rack a server? Are there minimum rates for such a place, or can you rack a single server for one month? Would you be able to rack it yourself? What if there was some problem with the PSU, and the server lost power when the data center was fine? What if the network card failed? What about if a hard drive in the server blew out?

Cloud infra in 2021 has pretty turnkey answers to all of those (Kubernetes is not one for any of the ones I mentioned, except maybe for "what happens when any node goes down"). So yes, the complexity might be high, but so are the capabilities.

I mean, maybe you are running a mom-and-pop shop with a server rack in the basement for some reason, and have pretty daytime-specific hours. That's fine, and I still think you can do the server rack thing. Or at least that's the way I see it.

Perhaps one piece of meta-commentary here is that server hardware has not become substantially cheaper, faster, smaller, and easier to set up since 20 years ago. I can't really set up a server in my small bedroom that's capable of serving 10k concurrent users and doesn't deafen me with its noise. Maybe if I could, all this cloud stuff might be less ubiquitous?


Yeah, everyone feels that way until they run into several issues that suddenly make containerization look like a great idea.


examples?


> How did the world get into this situation where you need all this complexity just to deploy and configure not even an application, but part of an application? Amazing!

Because everyone else is doing it.

Things have gotten so complex that we need complex solutions, but we're afraid to make and own the solution ourselves because of cost, focus on core business, or recruitment/knowledge concerns. So we need to find the next best open-source solution to leverage the collaborative effort, reduce time and cost, and have a large community to fall back on for support. And because everyone jumps on board the same train, all use cases must/will be accounted for (else the solution will fade into a niche), meaning the solution becomes a problem in and of itself.

Repeating the cycle once again.


There's an xkcd for that.


I do work in web app development, and what you say is spot on.


I used Docker for embedded.

Running a cluster across your IoT fleet (1k+ devices; 5-10 apps each) gives a nice interface for pushing out tasks, choosing applications for a device, configuring supervisor-device relationships, etc. It turns out from-scratch Docker containers on ARM are very portable.

I think you’re being dramatic — embedded is notorious for crazy builds of weird config flags to even get “hello world” to compile, and you’re waving your arms pretending that concepts from Erlang abstracted away from the language are too much.

Docker is just cgroups and namespaces, with a zip file of code. More or less literally.

Orchestration is the same mess it’s always been — back to at least the telephone days, when Erlang used the same concepts.


That sounds fascinating. Would be very interesting to read more about your IoT fleet, tasks, and deployment with that stack.


In my day, we used to just insert the server into the rack, plug it into the network, terminal into it and configure it. I'm not being sarcastic here. There was a clear correspondence between what we were doing and what it meant for the infrastructure. Now there are so many layers of abstractions that we've basically forgotten it's all just CPUs, hard disks, memory modules and network connections.


And Heroku is great for simple apps.

When you start having a lot of pieces to manage, e.g. cache, database, auth, multiple applications, then Kubernetes comes into its own.

Because then you can scale, monitor, trace, debug, log, backup, audit, encrypt and visualise all of those pieces in exactly the same way.

And do it irrespective of which cloud you use or whether it's even in the cloud at all.


Even if your default starting stack is somewhat complex, for example separate client and server apps, database, cache, and queue, most one-click cloud providers (e.g. Heroku or Elastic Beanstalk) offer ways of unifying logs, monitoring, simple provisioning, etc. You are "locked in" in that you can't move to another provider within an hour, but the lock-in is still very low: you can use generic technologies (e.g. Memcached or Redis) and the impact is limited to a few config files...

I’m not telling people what to do, if you like K8s or Docker or what have you knock yourself out, and I mean it. For example, people keep telling other people on HN not to use React and that’s a hill I’d die on - and could write a dissertation defending it. I’m just wondering what the dev experiences of others are so I may learn from them.


Looking at the istio-linkerd-traefik-consul-whatever-the-heck mess of cloud-native ecosystem projects, I want to decree that backend engineers are no longer allowed to make fun of Javascript engineers for having too many frameworks anymore.


Not a good analogy.

None of the tools you mention are integrated into your app, nor do they change how you write code. In fact your application shouldn't even know what service mesh you are using.

I can write a back end application once and then have it run with istio/linkerd/traefik/whatever with zero code changes.

That doesn't happen in the Javascript world. The choice of framework directly affects your code.


Mind blowing to me that "Heroku but with docker images" doesn't seem to exist. Would love to be corrected!


Heroku does allow building / deploying Docker images - https://devcenter.heroku.com/categories/deploying-with-docke...


Azure AppService has some pretty good support for that.

https://azure.microsoft.com/en-us/services/app-service/conta...


That's pretty much Cloud Run, no?

https://cloud.google.com/run/


I'm looking for something like that, but I'm afraid of using any google developer services, such as GCP, for personal projects. What if I breach their TOS somehow and get banned, or what if I didn't breach their TOS and still get banned?

Can't afford to test my luck until I've finished migrating all my accounts off my gmail.


Why don't you just create a new Google Account specifically for this?


Associated account bans are a thing in the Android world. I don't know if this extends to other Google platforms.


I'd also need a new credit card, I'm pretty sure they link accounts via shared CC. And even then, I'm pretty sure they still link accounts via other means. Did I link my gcp email in my android gmail app? Did I use the same IP address for both accounts? Or any of the thousand ways google has to know two accounts belong to the same person.

Honestly, creating a second google account might itself increase the chances of getting banned. Nobody knows with Google, and that's the problem.


Doesn't that require another active SIM card?


I'm using my main google account on GCP and just yesterday started to take precautions about this.

I created a new google account with no relation to my main one and added it as an owner to my GCP project, so hopefully in the event of either account being banned I can still access everything.


Cloud Run is great. I'm using an Nginx image to serve my static website. However, if I remember correctly, you can only respond to HTTP(S). So though it may be enough for most use cases, it is not quite the same as being able to run any container in the cloud.


WebSockets now work on Cloud Run. We implemented that last month. Cloud Run does have a few limitations: no service discovery built in, no docker-compose support, a limited set of options for CPU/memory, no persistent disk, etc. But it's great for things like a simple Spring Boot server or any kind of stateless service. But you won't be running Redis or a database there. It's just not designed to do that. It's also not great for running batch jobs; we tried and our jobs kept getting killed/throttled. Use a VM for that.

Luckily, there are lots of other Google services you can plug in for that sort of stuff. Cloud Run is great if you are planning to use them.

Kubernetes is what you use when you want to mix stateful and stateless stuff so you can avoid depending on those services. That makes sense if you need to support multiple clouds or on premise installations. But otherwise, it's a lot of extra complexity and devops even before you consider the overhead of managing the kubernetes cluster. There are lots of companies that talk themselves into needing this where the need is arguably a bit aspirational. I've been on more than one expensive project where we served absolutely no traffic at all with hundreds of dollars worth of kubernetes clusters idling for months on end that had no realistic hopes of ever getting more than a very modest amount of traffic even if everything worked out as they planned.


websocket support: awesome ! have been waiting for that ! :)


gRPC and WebSockets are in preview. Doesn't look like you can use arbitrary ports though.

https://cloud.google.com/run/docs/triggering/grpc


You can open just 1 https port but you can map whatever port in your docker container to that. Some websocket implementations work with a second port and that just doesn't work. But you should probably split those services into two. But if you use something that can mix websockets and normal https traffic over 1 port, it works great.


For my usage (lightweight game servers), I need WebSockets, which are still in beta and they currently force WS connections to close after 1 hour, which is a dealbreaker for me.


Dokku is "it" for me, but I'm sure there are others. Works pretty well, but it's (to the best of my knowledge) for one node only.


It is coming. Check https://www.qovery.com/ and https://www.hyscale.io/

(Not affiliated with either)


Think that's AWS Fargate, but admittedly I don't do a lot with Heroku.


CloudFoundry has the ability to host your own docker images easily AFAIK.

Edit: fixed name


Digital Ocean's App Platform can work this way.



Dokku looks super cool but it's still a lot more involved than Heroku. Heroku doesn't make you think about provisioning servers, updating infrastructure, or manually setting up common integrations like backups and logging.

Just to illustrate the point, dokku's docs[0] for logging say "Warning: The default docker-local scheduler will "store" these until the next deploy or until the old containers are garbage collected - whichever runs first. If you require the logs beyond this point in time, please ship the logs to a centralized log server."

Alright, well, that's gonna be a whole lot more work than two clicks in Heroku to send all my logs to any logging provider of my choice.

[0] https://dokku.com/docs/deployment/logs/


render.com


> I just use and push code to Heroku and I'm done for the day, simple. NoOps I call it.

I just get in my Honda Civic and drive to my office. Simple. I call it "Sedan".

(Honda Civic driver looking at an 18-wheeler on the highway and not understanding why somebody would use that)

In all seriousness though, Kubernetes solves a specific set of problems. Just because you personally don't have these problems, doesn't mean that Kubernetes is bad, or that people who use it, don't need it.


Not seeing anyone mention it, so I'll just share: https://k8slens.dev/

Lens has been a huge boon for helping us manage our kube cluster and regular devops operations, and even 15 minutes with it helped me grok a number of complex Kubernetes concepts that I've struggled with for a while now.

Everyone who works with k8s for a living should at least know of this tool; imho it's fantastic with Prometheus.


Using a client like that helped me get up to speed as well. If anyone wants a terminal based k8s client, I like https://github.com/derailed/k9s


This looks awesome, I wish someone told me about this tool earlier.



I'm scared. From 2000 til 2013 I did exclusively server-side development. In 2013 I transferred (@ Google) into doing mobile and embedded work, and missed the whole Docker thing & the K8s transition. I've now transferred back over to doing cloud/server/backend work and I'm just a little terrified of what I've gotten myself into. In addition to re-learning Google's Borg stack, learning Golang, and readjusting my headspace, there's a whole giant world of cloud tech and vocabulary that frankly wasn't happening 7-8 years ago. And then I see articles like this... haha..


Those 13 years of doing things the old way will come in really handy when the abstraction leaks and you're the one who actually knows how to troubleshoot firewalld


Do what every one else does for borg: copy-pasta something simple (hint: look at old versions of other apps' configs) and ask lots of questions.

As for cloud tech: https://www.amazon.com/Mastering-Kubernetes-container-deploy... will cover 90% of it for you.


Kubernetes is dramatically complex, to the point where I refuse to touch it if I have to do any configuring myself. Unfortunately there are not really any good alternatives. I work on the Azure stack myself, and it has only Azure Container Instances, which Microsoft recommends not using for production purposes (without explaining why). For deploying a simple machine learning model on Azure Machine Learning, you need to set up a whole Kubernetes cluster, and if I remember correctly it also needs a minimum of six machines, which is a lot for simple projects. At least with Azure ML they have an abstraction layer over Kubernetes so that you don't have to deal with any of its intricacies.


From TFA: "The maximum number of pods per node is 32, as opposed to 110 on standard GKE."

This is sad. I don't understand why providers do it. It makes it very expensive to run small staging clusters. There is some reference on the current state here:

https://docs.google.com/spreadsheets/u/0/d/1yhkuBJBY2iO2Ax5F...


In autopilot mode you pay for your resource reservations, not nodes.

The pricing is... not cheap: https://cloud.google.com/kubernetes-engine/pricing

Comparable-ish with Fargate. You probably wouldn't be using this with an eye to save money unless you think (or have measured) that you'd spend more on operations, security or compliance otherwise.


Googler, opinions are my own.

Direct link to the docs: https://cloud.google.com/kubernetes-engine/docs/concepts/aut...

Google has a tool by the same name (Autopilot) internally that does close to the same thing. There was a paper published on it 10 months ago, discussed here: https://news.ycombinator.com/item?id=22980467


Is this really the same thing? The paper is talking about auto adjusting workload limits based on job history, right?


Probably the scaling aspect of GKE Autopilot is the only part that is similar. The rest looks similar to functionality provided by other tools internally.

But they likely share little in common.


As a small nonprofit with a tech team consisting of mostly volunteers, a few interns, and a small % of our paid staff time, this was a big concern for us.

Things are always fine when you're starting, but if you don't understand the system it's hard to troubleshoot when it breaks. Understanding Kubernetes is hard. Obviously there are ways to outsource cluster management, but we don't have the budget of a VC funded startup.

We ended up choosing Hashicorp's products -- Nomad for orchestration, Consul for service discovery, and Vault for secret management. Each one is just a binary with a <20 line config file, and it takes ~two days to read and understand the docs well enough that we have been able to troubleshoot issues quickly.


As someone helping a nonprofit with their engineering needs, I’m also looking into Nomad+consul. Do you have any pointers for me?


Complexity slows you down, regardless of the form it comes in. You should only invest in a complex system/paradigm/language if you really need it. Before K8s, OpenStack was all the rage, and I've seen several companies waste millions trying to set up and operate their own cluster.

IMHO, most of what K8s offers can be obtained cheaper and in a simpler way using more "traditional" DevOps approaches and systems. I'm currently operating an IT infrastructure consisting of more than 20 different components, using only basic Linux technologies and open-source packages (ssh, iptables, ipsec, ferm, (r)syslog,...) plus Ansible to orchestrate it all. Never encountered a problem that I wasn't able to debug and fix within a few hours, and managed to have more than 99.99 % uptime so far. I understand that this approach might not work for large companies, but it seems to me a lot of startups are going down the K8s route just for the sake of it, and their DevOps processes become incredibly brittle and slow as a result.
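
To give a flavour of what "Ansible to orchestrate it all" looks like in practice, each component boils down to a short play roughly like this (host group, paths and service name are placeholders, not my actual setup):

    # deploy-myservice.yml -- illustrative sketch only
    - hosts: app_servers
      become: true
      tasks:
        - name: Copy the release binary
          copy:
            src: build/myservice
            dest: /usr/local/bin/myservice
            mode: "0755"
        - name: Install the systemd unit
          template:
            src: myservice.service.j2
            dest: /etc/systemd/system/myservice.service
          notify: restart myservice
      handlers:
        - name: restart myservice
          systemd:
            name: myservice
            state: restarted
            daemon_reload: true

Nothing here is hidden behind an abstraction: if a deploy misbehaves, it's plain files, systemd and SSH all the way down.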


I can't fight the feeling that this is all circling back to the application server stuff that was popular for a time. And, really, I can't tell when those went so wrong. :(


The circle of tech:

1. Someone has an idea. It's alright. Really good for their use case. Someone else hears about it, likes it, and adapts it to a similar use case. And so on and so forth until the idea has a large user base

2. Employees of large companies hear about the idea and implement it

3. Marketing gets a hold of the idea, gives it a flashy name, and uses it in promotions

4. A majority of the loudest voices in the industry get on board so everyone starts forcing this idea on every imaginable use case

5. A lot of people realize this is all too much extra work without much benefit so they start looking for more appropriate solutions

6. return to 1

It's happened with mainframes, object-oriented programming, services, micro-services, web-all-the-things, blockchain, streaming data (Kafka), SQL, NoSQL, containers, agile, VMs, cloud, and many other things.

It's just that technology people are the worst at getting caught up in the fad of the day, and I wish we could try not to do that as much.


You have a lot of tech companies digging up gold, and a whole lot more selling shovels to the rest of the crowd.

Most of the software tech world is bullshit. Most best practices are bullshit. And, if I were feeling a bit conspiratorial, I'd say the large tech companies do this on purpose. They promote fads that overburden any smaller company with more modest budgets, thus keeping competitors at an arm's length. Resume-driven-development plays a role in this as well.

A lot of companies following the FAANG cargo cult are digging their own grave and don't even know it. They don't realize they don't have the manpower for microservices. Or the calendar time to make it work and still get product out the door before their competitor that doesn't even unit test eats their entire lunch.


There's hope that companies from poorer countries, which have to be competitive, will push the right technologies. I see it happening in the UK, Brazil, Russia, etc.

But it's no good being a big company when WhatsApp can be built by the one person you failed to hire.


> Marketing gets a hold of the idea, gives it a flashy name, and uses it in promotions

After a half-decade in the industry, I am beginning to realize there is a lot of bad engineering hiding behind marketing and emoji. So many of these web tools have nightmarish interfaces and add complexity that, for most of the industry, is unnecessary. But their docs pages are full of rocket ships and confetti.... I swear, seeing a page with rocket ship emoji has become such a turnoff. Why do we need our engineering handed to us sprinkled with decorations like a cupcake?

And the imposter syndrome that the entire industry seems to share dictates that a large percentage of developers feel like they need to be using these tools as a badge to wear saying, "Yes, I am with it and hireable."

Meanwhile, the technology itself keeps churning because there is now a profession full of people believing that creating and open sourcing the Next Big Thing in ____ Technology is the best way to move their career forward. People keep reinventing wheels because everyone is focused on making a name for themselves with the new; no one notices the person doing mundane maintenance on Rails or whatever.

I think it's been this way since the 70s/80s to some extent, after having done some reading on the history of the profession. I think it's just scaled with the number of programmers and the Internet has applied its intensification effects.


Good news, you can invest in my new startup where we are solving just that issue with a decentralized social technology blockchain oracle that will make and enforce these decisions, reducing tech churn and enhancing developer productivity.


Ashok?


This is quite accurate. On top of that:

1. Build an opinionated solution that you control fully (e.g. difficult to fork).

2. Convince everybody to adopt it. Nobody gets fired for choosing $bigcorp

3. Grow it complex and expensive to maintain over time.

4. SELL software and services to manage its complexity.

...and the industry is getting more marketing/fad/resume-driven every day.


This is not just limited to tech - what you are describing is essentially the Gartner hype cycle.


Look up the hype cycle.


I think the single biggest mistake people make with Kubernetes is implementing it too soon. The last company I worked for spent piles of time fighting K8s when a simple, well-implemented cluster would have done the job.

It makes a lot of sense to build portable infrastructure, but you can scale a long ways with much simpler technologies.


I hear this refrain consistently on HN but of the options I've tried, GKE is the most pleasant for my single person app.

- App Engine has a bunch of weird limitations and slow deploy times. Qualitatively, it feels like the spotlight has moved on.

- Running my own compute instances felt like reinventing Kubernetes, especially once you roll your own deploy mechanism and throw load balancing in the mix. I also don't buy that it's easier.

- Cloud run is promising but for database heavy apps it's a non-starter.

- GKE was pretty smooth. It feels like it gets a lot more love than App Engine. The UI was functional with lots of depth. Once I push a docker image, GKE updates the nodes to serve the latest version. Load balancing was a matter of ~3 yaml files at 10 lines a pop.
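
For anyone curious what those three files look like, it's roughly this kind of sketch (names, image and ports are placeholders):

    # deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: myapp
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: myapp
      template:
        metadata:
          labels:
            app: myapp
        spec:
          containers:
            - name: myapp
              image: gcr.io/my-project/myapp:latest
              ports:
                - containerPort: 8080
    ---
    # service.yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: myapp
    spec:
      selector:
        app: myapp
      ports:
        - port: 80
          targetPort: 8080
    ---
    # ingress.yaml
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: myapp
    spec:
      rules:
        - http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: myapp
                    port:
                      number: 80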


You're missing my point here. I'm not suggesting you replace K8s with some other thing which does the same things K8s does, but that you build out a simpler, easier-to-maintain solution until you need something like K8s. Many, many, many businesses can do just fine without the kind of complexity K8s (or App Engine or GKE) brings to the table.

There are a lot of businesses that will never have to deal with dynamic scaling; engineering autoscaling into those solutions is pointless. If you need that kind of scaling then K8s is fantastic. My point is a lot of people turn to K8s well before they need to, or without understanding why they might need it.


I've only played with k8s for like an hour in total, but it's pretty obvious to me that k8s fell very much victim to the second system effect.

It's an insanely complex solution to a very niche problem - scaling stateless web app backend nodes written in scripting languages.

Stray even a little bit off the garden path and you start feeling pain.


You only played with it an hour and you reached all these conclusions?


Yes, because I saw that my (not-so-niche, actually) use case wasn't even considered, and "noped" outta there as fast as I could.


What was your use case?


Some ETL logic packaged in Docker containers with a bit of simple scheduling and orchestration.

K8s sounds like a good idea on paper - you get reproducibility and resilience "for free" - but then I found out I have to effectively roll my own everything with k8s anyway, and went with Jenkins instead.


> scaling stateless web app backend nodes written in scripting languages

K8s solves a far wider problem space. Need to run a data store? Use a StatefulSet and PersistentVolumes. Need an occasional task? Jobs and CronJobs. Need to know what's happening? Metrics and logs have APIs. Load balancing? Ingress? Firewalls? Security?
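
For instance, the "occasional task" case is only a handful of YAML (schedule, names and image are placeholders; older clusters use batch/v1beta1 instead of batch/v1):

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: nightly-report
    spec:
      schedule: "0 3 * * *"            # every night at 03:00
      jobTemplate:
        spec:
          template:
            spec:
              restartPolicy: OnFailure
              containers:
                - name: report
                  image: registry.example.com/report-job:latest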

If someone knew nothing of operating systems other than a class on MINIX, I suspect running a massive datacenter would be easier with k8s than running a medium system of debian boxes.


Build debs, use the package manager. VMs are already an artificial abstraction, just use them.


I would never voluntarily deploy anything using k8s or similar cloud stacks and services. But in practice the bulk of the complexity isn't in pushing and running the actual software, it's in provisioning attached resources (disks, databases), sharing certificates and keys, routing and load balancing, etc.

Way too complex for my tastes, but if you're convinced you need a highly automated control plane for those things then pushing .deb or .rpm packages doesn't even come close to solving that problem.

As usual it comes down to how you define the problem. You can't dissuade people from using k8s by comparing and contrasting k8s to alternatives; you've already ceded the debate over how to define the problem at that point. Whatever its relative merits, k8s is a reasonable approach to the problem of automating the control plane for "scaleable" services.


Debs move state (and computation) and provide a way to resolve dependencies. Done right, they also provide rollback. Debs can move a whole lot more than just the software.

Not saying it is the answer to everything, but it can do a whole lot more than software. Most of our problems are self-made; we can unmake them by changing the rules we operate under.

> As usual it comes down to how you define the problem.

Totally.

Redefine the problem until the solution is tractable and simple. K8s is the problem to a solution.


Ex Amazon SDE here. You are absolutely right.


My place has Kubernetes for when we are ready to scale. I am not sure it has gone above the default two pods. It did cause problems with moderately large uploads (5gb), as the memory gets chewed up far faster when using multiple small machines versus one big machine. I am not sure we will ever actually need this level of scaling.


This was glaringly obvious from day one.

Containers add vast complexity; Kubernetes adds another layer of complexity on top.


Containers don't have to be complex. Docker and docker-compose are very simple to use. Docker Swarm (RIP) and Nomad are similar to Kubernetes but orders of magnitude simpler.


Docker Swarm mode is alive and well and not going anywhere. It's right there, built into the regular Docker CLI.


That is great news. Thank you.


I used to be a Docker fanboy. But after writing a few semi-complex Dockerfiles, and seeing more of how complex and FUBAR things like networking are [1], I changed my tune.

This was years ago, so maybe they greatly simplified things. But somehow I doubt it =/

[1] " Docker, by default, punches massive holes through your firewall in non-obvious ways. People don't realize that with a default Docker configuration, containers are ignoring any normal firewall rules you may have setup with iptables or ufw." - https://news.ycombinator.com/item?id=25834444


Can you explain the whole concept of Kubernetes to someone whose knowledge of computers is limited to making simple webpages with HTML and using Excel/VBA?


In a 'normal' scenario, you might host your web page on an Nginx web server, on a Linux server somewhere. A container lets you do the same thing, but in an isolated area of the operating system. That means you can have an isolated area which has your web pages, the Nginx web server, and some other dependencies, all grouped up together into a distributable package called an image. And then you can pull that image on other servers and just start running it and it'll have your web pages and your Nginx running nicely.

So far that's Docker, or containers to put it more generally.

Now if your web page is so amazing that it receives a lot of traffic, your little container is going to get overwhelmed. And if it falls over, then it's dead and nobody can see your web page until you bring the container back up. Fortunately there are tools, called orchestration tools, that let you manage this aspect of running containers. You can tell orchestration tools how to figure out whether a container is unhealthy and needs replacement, whether to bring it back if it falls over, and importantly, how many copies of the image to run to handle the traffic. And if you need to push an updated image with an updated web page in it, how to gently make that new container available to the world without interrupting the traffic.

There's more to orchestration, you also tell containers how to talk to each other if needed, how to manage secrets, encryption, load balancing. There are lots of aspects of hosting that fall into this.

The two main orchestration tools I know of are Docker Swarm and Kubernetes. Docker Swarm is bundled with Docker already. It's pretty easy to shift from normal Docker use to Docker Swarm use, it works well enough for small-medium deployments. Kubernetes is a tool for much larger and highly flexible use cases, and it has a lot of levers and buttons and swiss army knives with its own swiss army knives. Many aspects of Kubernetes like the load balancing and secrets are all pluggable and you can use different tools in there.
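
To make that "how many copies, and how do I know it's healthy" idea concrete, in Docker Swarm it's just a few extra lines in the same kind of compose file (the values are only an illustration, and the health check assumes curl exists in the image):

    services:
      web:
        image: registry.example.com/mysite:2.0
        deploy:
          replicas: 3                  # run three copies behind Swarm's built-in load balancing
          update_config:
            order: start-first         # start the new copy before stopping the old one
        healthcheck:
          test: ["CMD", "curl", "-f", "http://localhost/"]
          interval: 30s

Kubernetes expresses the same ideas with more moving parts (Deployments, probes, Services), which is where a lot of the extra complexity comes from.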

Now you're at this article's topic, which is Kubernetes. K8s as it's called has a larger mindshare of the ops world, therefore everyone wants to use it, but it's very complicated, so a tool has been introduced to try to simplify it.


You know how there are linked references in Excel files that refer to other Excel files? What if you want to keep multiple copies of your Excel file with macros that refer to different files with data? And so you can run them on Windows, Mac or in your browser?

Kubernetes basically lets you define your references not as "c:\jon\reports\fy2020_final_final_2_comments_review_Bob_final.xlsx", but as "fy-report", with "fy-report" being defined elsewhere.

This is required to help you run the same program in different circumstances without breaking everything. You can say "run it on this slow computer with this test data", or you can "run it on many big computers with real data", but the program is exactly the same.

What makes it so complicated is that Kubernetes tries to abstract everything a given program would need, so it's not just references to external files, but practically the whole computer with all its network connections that must be defined elsewhere using Kubernetes' special language.

A lot of this special language is the same for 99% of programs, like in Excel you want your VLOOKUP to work on a $-pinned range with the last parameter set to FALSE 99% of the time. This makes people make the same stupid mistakes and finding them is hard.

And of course, this special language means you have to relearn a lot you know about running programs on computers, like when you move from Excel formulas to VBA.


I might start using that when people suggest adding new tech. If they can't explain it at that level, then it's going to be a difficult sell.


This is going to use some amusing metaphors.


At a previous job, our build pipeline

* Built the app (into a self-contained .jar; it was a JVM shop)

* Put the app into a Ubuntu Docker image. This step was arguably unnecessary, but the same way Maven is used to isolate JVM dependencies ("it works on my machine"), the purpose of the Docker image was to isolate dependencies on the OS environment.

* Put the Docker image onto an AWS AMI that only had Docker on it, whose sole purpose was to run the Docker image.

* Combined the AWS AMI with an appropriately sized EC2 instance.

* Spun up the EC2 instances and flipped the AWS ELBs to point to the new ones, blue-green style.

The beauty of this was the stupidly simple process and complete isolation of all the apps. No cluster running multiple apps with diverse CPU and memory requirements simultaneously. No K8s complexity. Still had all the horizontal scaling benefits etc.


That's indeed the best approach for those who are already using a public cloud provider, to delegate the heavy lifting to ASG/MIG and ELB/NLB/GCLB/iLB, while keeping the stack nicely isolated.

At work, we have had multiple big incidents inflicted by the "share everything" nature of Kubernetes. Sadly, it never occurred to the devops team that something like this was possible. The same goes for the HN crowd here, who just assume that a container scheduler is a must, be it Kubernetes, Nomad or Docker Swarm.

I blame this on Google Cloud brainwashing. Let's see how long it will take for Thomas Kurian to sunset https://cloud.google.com/compute/docs/containers/deploying-c...


The implementation details and management tools aside, the mental model for what is basically interconnected hardware should never be this complex.


Hey that's only BE, throw in Angular with Observables and NgRx on the FE and people will start jumping out of the windows from insanity.


That's why I don't trust anything from Google. They just release overly complex software (because their devs can deal with this crap anyway, and it's good to let others suffer xD)

The only thing that's somewhat good is Go.


...and I haven't even mentioned writing tests for it.


K8s absolutely has a steep learning curve, but it feels like we always focus on the most complex setups and use cases so that we can admire the shiniest new features.

If you're going from "our app is running directly in tomcat in our on-prem data center" to "we want to move to a mirco-services architecture with a service mesh, canary deployments, multi-cluster observability, fault injection, automatic global failover, etc...", you're almost certainly going to have a bad time.

I really like the concept of "innovation tokens" [0] here. Pick one or two big innovations at a time and don't add more until you're comfortable with what you have. Get your app running in Docker first, and then get it running in a basic k8s setup, instead of leapfrogging to the more advanced stuff. Chances are if you didn't need a service mesh for your original non-k8s deployment, you probably don't need one for the v1.0 first pass of your k8s rollout.

[0]https://mcfunley.com/choose-boring-technology


I'm surprised to see this k8s consensus on HN. Kubernetes has a very steep learning curve. I spent a week over Xmas 2019 just bringing up and tearing down clusters until I got something that was functional. The documentation is not particularly friendly, so there was a lot of trial and error. But eventually it clicked. Then a couple of months later we decided to invest by slowly moving non-production services over, and then ultimately moving everything over.

We have not had any major k8s incidents in the past 12+ months of running our own HA cluster. We even run HA Postgres using local volumes. We've pretty much moved everything over and couldn't be happier.

We have red teamed disaster scenarios, we can bring the full cluster up on another cloud provider using vanilla k8s in about an hour (including a restore from Postgres S3 backups + wals). And this is all with a very small team.


As I understand it, the main benefit of Kubernetes is the ability to pack multiple applications in a single VM. This way, one uses fewer paid resources to deploy a service mesh. In return for actually being able to show that you are not wasting any RAM or IO, you give up isolation.

The "declarative" part of Kubernetes control buried in YAML seems to fail to live up to that label.

Is there any literature that shows that the cost savings from packing more than one service per VM are significant enough to outweigh the cost of the messes created by middling developers at non-Google companies (who have trouble reasoning about the functioning of even a single-process application) when they think they should do what Google does?


99.9 percent uptime for pods? That's 10 minutes downtime per week. We use simple VMs (on Google Cloud) at work and deploy services to them using Nix, and they have much less downtime than this. Does Kubernetes make it so difficult to do better than 99.9?


It's for the same people that make web apps that only work on 95% of browsers. They just don't care.


95% of browsers?! If the k8s was set up by MS advocates or some other consulting company, the web app works _only_ on Edge.


Well sure there is always room for improvement. I was already a 20 year Linux veteran when I started getting into k8s.

Imho it was very simple to understand all the components. But I can't deny that it does require a solid understanding of sysadmin concepts and containers to get ahead.

I was quite proud to have grasped Kubernetes within a few months, without any formal training. I have no finished education in my country and am not from an English-speaking country, so I often struggle with technical docs that use too much academic English.

Still today when new issues arise I can relatively quickly understand why they're happening.

In some cases though you must read the docs carefully.


Relevant question https://news.ycombinator.com/item?id=25944804

I still wonder how people could give a green light to something that makes their life _harder_, not _better_, just because "it's Google".

My guess is that you can make more money on something as overcomplicated and developer-unfriendly as k8s.

Imagine how many companies would never even exist if k8s were just a slightly more advanced version of Swarm.

How many talented developers would be free to make something valuable for the world,

instead of just solving stupid problems created by other developers with questionable design choices.


Docker provided a simplified model with sensible defaults and a decent toolset for packaging and virtualizing single-node app environments into containers, so people went out and looked for a solution to virtualize multi-app workloads.

Honestly, I'm not sure Kubernetes is it. K8s seems like a leaky implementation detail when what people want is just to virtualize a workload (with similar simplicity to Docker) and have it work wherever the standard 'workload virtualiser' runs.


I thought I read somewhere that the creator of WebSphere expressed similar opinions about his creation, and others have even called it mentally abusive.

Perhaps the enterprise-class stuff is always just for large division-of-labor efforts. Like the "Liberty" offshoot of WebSphere, perhaps this is really just saying that if smaller groups and fewer people want to mess with it, it needs some new thinking around more consolidated abstractions.


K8s is an amazing piece of tech, but there is still a big cost and complexity inherent in adopting & managing it.

I enjoy using it and playing with it, but so many use cases can be addressed with something simpler - either just Docker / Swarm / AWS ECS etc. alternatives or just going for VMs with well defined CI/CD processes that let you tear down the infrastructure and set it up again easily.

What really interests me are the concepts K8s builds on that are not usually recognized - to me K8s seems a lot like a JVM, except that it operates at the infrastructure (and not runtime) level.

I enjoy experimenting with these concepts when applied back in the runtime world - it is for instance interesting to run 100s of servers with the JVM and let them load/execute new dependencies and code at runtime (the JVM is very well suited for that).

This is an area that is not yet well explored and would probably deserve more attention, as it allows for rapid distributed computing that is infrastructure/platform independent (the downside is that it requires (just) the JVM and the isolation is not perfect).


Once I finally forced myself to learn K8s, I became depressed. I didn't think people could design something so poorly and force the whole world to adopt it. But, I guess that's another lesson: stupidity can easily hide inside complexity.

Now, can you get along with K8s? Of course you can! Human beings have been adapting to harsh environments for hundreds of thousands of years. We can figure out some shitty software. So that means that probably, K8s will not actually improve, because people will probably adapt to it rather than change it.

So I think it's time to create a new distributed computing platform. Something with a simpler design, that has the necessary functionality built in rather than bolted on. Something that is very powerful but also terse. And of course, something that only gets complicated when you need to do something complicated, and allows you to stay simple as long as possible. The ability to scale the complexity, essentially.


So based on the article, Autopilot is designed for GKE??? (https://cloud.google.com/blog/products/containers-kubernetes...)

I'm learning Kubernetes and am deploying my own test cluster on my ARM-based board at this very moment; I already spent 3 days on K3s and had to give up due to a problem (https://github.com/k3s-io/k3s/issues/2509#issuecomment-78657...). I must say, this is way, way harder than Docker Swarm.

Just want to know, will Google actually make Kubernetes a bit friendlier for general non-GKE users?


Disclosure: I am a Googler on GKE

Google actually works a huge amount with the community to simplify Kubernetes across a number of SIGs. It's always a trade-off between the increased flexibility and options people want as they use it for more workloads, and simplicity.

Autopilot is just for GKE. You can use GKE on other clouds and also onprem (bare metal or VMs with Anthos)


Actually, deploy Kubernetes is not that hard after I learned the few concepts.

However, I found myself spend too much time on finding out what settings should be put into which files. I really hope there is a tool that can help users to generate and configure these settings.

I think a good example of this is `npm config edit`. When invoked, it'll open the correct config file and list all available settings with their default values. Users can then enable settings by uncommenting them as needed.

Maybe adding similar functionality to those CLI tools (kubeadm, kubectl, etc.) could greatly improve user-friendliness?
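
If I recall correctly, kubeadm already has something in this direction (`kubeadm config print init-defaults` prints a default configuration you can edit), but the npm-style "list everything commented out, uncomment what you need" experience could go further. A rough, hypothetical Java sketch of what such a generator might emit; the setting names and defaults here are just examples, not an authoritative list:

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class ConfigScaffold {
        public static void main(String[] args) {
            // Example setting names and defaults, purely to illustrate the
            // "everything listed, commented out" style of `npm config edit`.
            Map<String, String> defaults = new LinkedHashMap<>();
            defaults.put("networking.podSubnet", "10.244.0.0/16");
            defaults.put("apiServer.timeoutForControlPlane", "4m0s");
            defaults.put("etcd.local.dataDir", "/var/lib/etcd");

            StringBuilder out = new StringBuilder("# Uncomment a line to override its default.\n");
            defaults.forEach((key, value) ->
                    out.append("# ").append(key).append(": ").append(value).append('\n'));
            System.out.print(out);
        }
    }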


Fundamentally, the problem that leads to all these solutions is that there isn't any way to abstract away (at a low level) the difference between horizontal and vertical scaling of computing power. You will always have to care about the difference between scaling a node and scaling nodes when developing any sort of web-based software stack. It doesn't matter whether you're running VMs or containers, or are on the cloud or serverless. The fact that you can't abstract away the difference between horizontal and vertical scaling will always dictate how you design the architecture of your service, all the way down to the coding patterns you follow. That's really the problem. K8s and containers are solving a symptom of this problem, not the problem itself.


Any best guesses as to if and when we'll reach the "Kubernetes: The Good Parts" stage? A condensed subset of k8s's currently available features that would prevent most of the footguns and make k8s feasible for <20-developer projects.


Slightly off topic: I use Google Cloud’s built in auto-scaling for instance groups, and I’m very happy with it. It offers a reasonable GUI, and a way for me to scale VM instances up and down based on e.g. CPU usage. It seems to do 80% of what I want at 5% the complexity of Kubernetes.

What I miss the most is having my infrastructure defined as code, instead of via the GUI. But given that I have only four services (out of which two use preemptible VMs and only one needs to scale) it’s not really a problem — it wouldn’t take me many minutes to replicate this setup at another cloud provider.


You should try out Terraform—combined with managed instance groups it basically makes GCP into one big Kubernetes instance without having to manage it yourself.


Looks like a bunch of people in this thread missed the point of containers: they're more lightweight and more compute-efficient than VMs and physical machines.

Dedicated servers are heavily under-utilized and over-provisioned, because once you allocate a server to a team they don't want to give it back.

VMs solved this problem and changed how servers were provisioned. Docker and K8s are the next step in that progression. People who compare K8s to the 'next JS framework' need to do some serious context alignment...


The more I learn about Kubernetes, the more I'm happy with our decision to run ECS instead. It seems to give me 80% of the benefit for 20% of the effort.


I really went deep into k8s. But as an engineer building services and features, I mostly don't care about the lower-level platform. I just want to deploy code and services. The whole sysops/devops part should "just work".

I really like "Google Cloud Run" in this sense. Just deploy your Dockerfile, and you're done.

A single yaml file. Limited configuration. Scaling just works. No cluster management.

I just want to build a great product.


As someone who has never worked with containers, I find this rather amusing: Kubernetes is so complex that Google needs to roll out an "autopilot" feature, yet "it has won in the critically important container orchestration space". Makes you wonder what the alternatives must have looked like?!


Could the downvoter please explain?


https://github.com/kubernetes-sigs/image-builder

It's a tool that allows configuring base images; it is actually a thin wrapper around Packer and Ansible.


In other news, water is wet.

Of course it's complex, it's managing a complex and very deep problem space through a mostly consistent generalized interface. Whether it is better than the alternatives depends on how deep your problems in that space go.


We switched our developer environment to k8s instead of docker-compose and our lives improved dramatically. Software is complex these days and having nice orchestration is awesome both in development and in production.


Isn't Kubernetes meant to make container management easy :P


"Auto industry introduces automatic transmission, admits manual transmissions were too complex." - probably actual Register headline from 1939.


Removing ssh is stupid

They should remove the need for CONFIGURING ssh.

Now they have removed the entire control plane of the container. How is a developer supposed to debug something then?


Hey, I work on this project -- thanks for your feedback.

You can still shell into your own containers to debug workloads; we removed SSH access to the nodes in order to manage the nodes for you (we can't take on that management if people can SSH in and make unsupported changes).


Is there an authoritative book on kubernetes? Sort of the analogue to "The C++ Programming Language" for C++?


If I were a devops engineer, I'd always push for a migration to Kubernetes. That's job security.


The pricing calculator does not work for Autopilot, which is really, really sad. It always shows "Estimated Component Cost: per 1 month" when configuring a pod with zero/1 GB ephemeral storage, 0.25 CPU, and 500 MiB memory running 24/7.


Hi merb, it's William from Google.

If I enter the following, it seems to work for me: Replicas: 1, CPU: 0.25, Memory: 500MiB, Ephemeral Storage: 1 GiB.

The result is $9.92 per month for the us-central1 location.


More accurate headline: The Register publishes ad for Google Cloud services


The world needed more expensive servers!


Think of all that is involved in running a single-machine application. You need to load the code, including shared libraries, into memory, resolve symbols to memory addresses, figure out where to allocate memory from, whether static, stack, heap, and when it's heap, when to swap pages to disk. To move data to and from disk, you need to know the sector and offset, the start and stop points for other files. When loading from and storing to memory, you need to know what memory blocks belong to what processes. The reason this doesn't seem complex to application developers is because the compiler and the kernel do this for you. You just give a file name and a variable name and those are automagically resolved to memory addresses and disk sectors and address spaces are kept separate without you needing to worry about it.

The issue here is we don't have a kernel and compiler for distributed applications. So instead we have no choice but to expose that complexity to developers. They need to specify the IP address and port of a remote service to invoke. We've solved the IP problem in part with DNS, but now what happens when you want to migrate or load balance? Kubernetes solves this with service discovery, so you just name a service and the container orchestration engine worries about resolving that to a pod at runtime. But you still need to specify a port. We haven't yet figured out how to automate allocating ports like we managed to automate register allocation on a CPU, especially when application code itself might need to know them. We still require you to know how much storage you need and specify that in the definition of a persistent volume. Ideally, it would be as easy as it is to ask for an array of integers in a programming language. The compiler will figure out how much storage an application requires and give you that much from some pre-allocated pool you as a developer don't have to worry about.
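
To make the port point concrete, here is a minimal sketch (assumptions, not anyone's official example) of what in-cluster application code typically still looks like. The service name "orders" and port 8080 are made up; cluster DNS resolves the bare name to a ClusterIP for you, and Kubernetes also injects ORDERS_SERVICE_HOST / ORDERS_SERVICE_PORT style environment variables for services that existed when the pod started, but the application still has to know that a port is in play at all:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class CallOrders {
        public static void main(String[] args) throws Exception {
            // Cluster DNS resolves the bare service name to a ClusterIP,
            // so the address is abstracted away for us...
            String host = "orders";

            // ...but the port still has to come from somewhere the developer
            // chose: here, the env var Kubernetes injects for services that
            // existed when the pod started, with a hard-coded fallback.
            String port = System.getenv().getOrDefault("ORDERS_SERVICE_PORT", "8080");

            HttpRequest request = HttpRequest
                    .newBuilder(URI.create("http://" + host + ":" + port + "/api/orders"))
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
        }
    }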

But again, there is no such pre-allocated storage pool. There is no such compiler. There is no POSIX filesystem or memory standard for multi-machine networked systems. There's a fragmented system of vendor-locked services providing storage servers, database servers, cache servers, http servers, message queues, some more open than others. Kubernetes is an attempt to provide abstractions that make it possible to define an entire network of such individual servers declaratively and it's a noble effort. But it's complex because the underlying problem space is complex. Distributed computing is at the point in its evolution right now that single-machine computing was in around 1950 or so, when you needed to tell the program exactly where in memory to find and store a variable, exactly where on disk to fetch a block of bytes. Will it ever get to where we are now with device drivers, compilers, and kernel allocators and schedulers doing all the heavy lifting for you? This is what a Kubernetes engine is trying to be, but it's early in the game. Very early. I don't see the point in writing an article implicitly shaming the developers for trying to provide higher level abstractions that make the simple cases easier, any more than criticizing a kernel developer in 1960 for inventing virtual memory as if they're admitting that symbol to address resolution in a multi-processing system is too complex. Of course it's too complex! And we're trying to make it less complex.


What would be a recommended solution for running customer-specific deployments (up to ~10) plus a few shared services using containers?

We are currently on AWS ECS, which works very well for us, but we need to move to a GDPR/Privacy Shield compliant environment hosted in the EU. I was going to look into managed K8s but am turned off by this thread :-)


Nah. It is not complex enough yet; k8s 2.0 should introduce blockchain - kubecoin! /s


And scam retail. I worship Scott Locklin! Savior of Truth!


But isn't Kubernetes meant to make managing containers easy?


so what do we use instead?


HashiCorp Nomad. Works on Linux, Windows, and the BSDs. Can run Docker, QEMU, Podman, and jails workloads. Scales nicely, is easy to troubleshoot, and upgrades are trivial.


'build.js', 'deploy.js'. Makefiles if you want to scale.



