Ask HN: If Kubernetes is the solution, why are there so many DevOps jobs?
437 points by picozeta on June 1, 2022 | 416 comments
Arguably, the goals of DevOps align partly with the goals of system administrators in former days: provide reliable compute infrastructure for

  1) internal users: mainly developers by providing CI/CD
  2) external users: end users
Nowadays we call people that do 1) DevOps and people that do 2) SREs (so one could argue that the role of sys admins just got more specialized).

The platform of choice is mostly Kubernetes these days, which promises among other things stuff like

  - load balancing
  - self-healing
  - rollbacks/rollouts
  - config management
Before the cloud days, this stuff was implemented using a conglomerate of different software and shell scripts, run on dedicated "pet" servers.

In particular, a main criticism is "state" and the possibility of changing that state by e.g. messing with config files via SSH, which makes running and maintaining these servers more error-prone.

However, my main question is:

"If this old way of doing things is so error-prone, and it's easier to use declarative solutions like Kubernetes, why does the solution seem to need sooo much work that the role of DevOps seems to dominate IT related job boards? Shouldn't Kubernetes reduce the workload and need less men power?"

Don't get me wrong, the old way does indeed look messy; I am just wondering why there is a need for so much DevOps nowadays ...

Thanks for your answers.




  >   1) internal users: mainly developers by providing CI/CD
  >   2) external users: end users
  >
  > Nowadays we call people that do 1) DevOps and people that do
  > 2) SREs (so one could argue that the role of sys admins just
  > got more specialized).
Both are called sysadmins.

SRE is a specialized software engineering role -- you'd hire SREs if you wanted to create something like Kubernetes in-house, or do extensive customization of an existing solution. If you hire an SRE to do sysadmin work, they'll be bored and you'll be drastically overpaying.

DevOps is the idea that there shouldn't be separate "dev" and "ops" organizations, but instead that operational load of running in-house software should be borne primarily by the developers of that software. DevOps can be considered in the same category as Scrum or Agile, a way of organizing the distribution and prioritization of tasks between members of an engineering org.

---

With this in mind, the question could be reframed as: if projects such as Kubernetes are changing the nature of sysadmin work, why has that caused more sysadmin jobs to exist?

I think a general answer is that it's reduced the cost associated with running distributed software, so there are more niches where hiring someone to babysit a few hundred VMs is profitable compared to a team of mainframe operators.


> so there are more niches where hiring someone to babysit a few hundred VMs is profitable

This makes a lot of sense. The same thing happened in the past with new technology, such as the electronic spreadsheet:

"since 1980, right around the time the electronic spreadsheet came out, 400,000 bookkeeping and accounting clerk jobs have gone away. But 600,000 accounting jobs have been added."

Episode 606: Spreadsheets!, May 17, 2017, Planet Money


In 1980 - there were 90M employees in the US. Now there's 151M.

Given that the US has transitioned out of manufacturing and into business services - I don't think much of this is explained by technology creating new jobs.

I think it's just explained by the workforce growing - and the shift in the US's role in the global economy.


The workforce grows as more goods and services are produced.

So the real question is - whether the ratio of different roles changed vs total population.


Basically, "why technological innovation creates and transforms jobs, instead of causing unemployment".


...except that 400k in 1980 translates to 608k in 2017, if you factor in overall labor force size. That means even though there were 600k jobs "created", it's still a net loss.

[1] https://fred.stlouisfed.org/series/CLF16OV


I think the point of the comment in discussion is:

"Though the spreadsheet was supposed to make clerks obsolete, it in fact just upgraded their job requirements to allow the operation of spreadsheets"

And probably their salaries too.

This is more in line with the discussion related to OP as well: DevOps employees exist, even though the work has apparently been automated.


This is a weak argument, since it ignores the productivity gains in all the industries that necessitated more accountants. Accountants don't only do each other's taxes, so more productivity among accountants is an indicator of a more productive economy.


ATMs caused a decrease in teller transactions, but led to an explosion of bank branches.


SREs don't normally write Kubernetes alternatives. They are the people who operate Kubernetes, write automation that interacts with it, and advise teams on how to run their software on it to solve business problems like ensuring availability.


> but instead that operational load of running in-house software should be borne primarily by the developers of that software

Go back and read a few DevOps books and blogs by the founders of it. We will always need separate disciplines for dev and ops, just like we need mechanical engineers and mechanics/racecar drivers. But we need them to work together and communicate better to solve problems better and not throw dead cats over walls.

You can of course give devs more powerful tools, more access, and more agency to enable them to develop the software better. Sentry.io is a great example of what is needed; it makes everyone's life easier, and devs can diagnose issues and fix bugs quickly without anyone in their way. That doesn't require operations work because it's just simplifying and speeding up the triage, debug, fix, and test phases. That's the fundamental point of DevOps.


My second job in large-scale software was at Google, which used the "DevOps model" since before DevOps was named. I have no need to read a blog on it.

You want the person who designs the car to know what a car is, and to be able to diagnose basic issues like "the fuel gauge says 'empty' and the engine won't start". And there's no analogy to an Indy car driver in software; every distributed system is self-driving.

The most popular alternative to "DevOps" is a team of developers who do not run the software, and may not even have the skills or capabilities needed to boot up the server process. They do their development in an IDE, run unit tests to verify functionality, and do not have permission to log in to the production environment.

Meanwhile the "ops" side consists of people who may know basic shell scripting, or some Python if they're a go-getter, but are unable to read stack traces or diagnose performance excursions.


throwaway787544 deleted their reply to this post. My response was as follows:

---

  > So you're familiar with Six Sigma then? Value stream mapping? TPS?
  > W.E. Deming? Martin Fowler? There's more to DevOps than deployments
  > and CI/CD.
None of those have any relationship to DevOps.

  > I haven't worked at Google, but I expect somebody gave you some tools
  > and some access to cloud infra and said "good luck".
I was on Borg-SRE. My job was to build the tools and maintain the cloud infra.

  > If you're lucky there was a small army of ops people behind the
  > scenes keeping things running for you
The point of having an SRE team is to avoid having a "small army" of ops people. Manual operation of compute infrastructure is uneconomical at scale.

  > if you weren't lucky you were expected to know how to correctly build
  > and run complex systems at scale by yourself
Google does expect its developers to understand basic principles of distributed systems, yes.

  > > every distributed system is self-driving
  >
  > I didn't know Google was in the bridge selling business.
What I mean is that when your service runs 24/7 and downtime is reported in the New York Times, there is no room for manual action during normal operation. The system must have the capacity to maintain itself and self-recover.


Responding to the inlined kubectl command mentioned above:

As an outsider, that command looks really easy to mess up.

  * shell interactions with quotes
  * double quotes
  * interpolating into image namespace with no autocomplete
  * easy to forget an argument
  * do you get autocomplete against the deployment name?
Comparison: C# declaration of a complex type - it's less complex than the `kubectl` command above, but IDEs offer way more support to get it right.

  * var x = new List<Dictionary<CustomType,int?>>()
This will light up warnings if you get anything wrong.

You get:

   * go to definition of `CustomType`
   * autocomplete on classnames
   * highlighting on type mismatches
   * warnings about unused vars.
   * if you initialize the var, the IDE will try hard to not let you do it wrong
So structurally,

1) in the code realm, for doing minor work we have really strong guarantees,

2) in the deployment realm, even though it's likely going to hit customers harder, the guarantees are weaker.

I think this is behind the feeling that the k8s system is not ready yet.


Normally you're never going to run a kubectl command in prod that changes anything except for kubectl apply -f. The YAML file may or may not be script- or template-generated, but it likely is. In a lot of shops they're just going to write a wrapper around the Kubernetes API in whatever language the shop uses, whether it be Python, Go, or whatever. And there are plenty of linters for the YAML and ways to test without impacting prod.
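
For illustration, a minimal sketch of the kind of manifest that `kubectl apply -f` takes (the name and image here are made up; the tag is the bit that CI usually templates in):

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: web                                    # hypothetical
  spec:
    replicas: 3
    selector:
      matchLabels:
        app: web
    template:
      metadata:
        labels:
          app: web
      spec:
        containers:
          - name: web
            image: registry.example.com/web:1.2.3   # tag typically filled in by CI
            ports:
              - containerPort: 8080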


  > So you're familiar with Six Sigma then? Value stream mapping? TPS?
  > W.E. Deming? Martin Fowler? There's more to DevOps than deployments
  > and CI/CD.

>> None of those have any relationship to DevOps.

Deming. I think maybe you mean that Deming has less of a relationship to SRE (Google style). The DevOps Cafe guy wears a Deming t-shirt and talks about Deming all the time. Maybe he has denounced Deming for some reason?

Value stream mapping. This has been brought up in at least one talk at DevOps Days.

Six Sigma. There's some obvious overlaps if you look it up.


Thank you. I didn't want to get into a whole "thing" because the parent just doesn't know what they're talking about and I can't teach them everything in a comment thread, but suffice to say that Gene Kim and Jez Humble would agree with me, and that anyone who wants to know more can look up books and blogs by them.


> My second job in large-scale software was at Google, which used the "DevOps model" since before DevOps was named. I have no need to read a blog on it.

Cool, you probably know my old colleague Hugo; he was one of the first SREs at Google, in a team of like 20 or so. (I wasn't there, could be mistaken.)

Anyway, what "Production" was, is different than what devops is. DevOps is different things to different people. In the beginning it was "What happens when systems administrators do Agile".

(Genuinely), confusion came about because of the "100 deploys a day" talk and the conference being called "dev ops days" (to include developers).

The two things got conflated and now everyone thinks devops are.. well, build engineers or sysadmins or developers who learned a bit of bash and terraform.


> build engineers or sysadmins or developers who learned a bit of bash and terraform.

From what I have seen, it's also getting familiar with the inner workings and process of the company, as well as current and past runtime tendencies for performance and uptime.


> it's also getting familiar with the inner workings and process of the company, as well as current and past runtime tendencies for performance and uptime.

That sounds like sysadmins.


Let's say there are 5+ leading definitions of DevOps. I'll add this one, which I think is quite unpopular but true:

DevOps is how Engineering respects the work of sysadmins.


> And there's no analogy to an Indy car driver in software, every distributed system is self-driving.

You typically build systems to sell a product. The driver is the customer that uses the app. You build the app to be as simple as possible to effectively operate while providing the best possible outcome for that customer. The driver doesn't care or need to know how the engine or the rest of the car works, just that the engine and car works well.


Did Google really use the “DevOps model”? Observing from the outside it looked like SRE was their alternative


The major point is that "mechanical sympathy" for how to operate software should be considered early in the development cycle. Fulfilling the business goals in operation should be a major design consideration that might warrant trade-offs in other areas. Traditionally, this has been seen as more of an afterthought, and the DevOps solution to the problem is that involving the people who design and maintain software in its operation automatically creates incentives to make operational improvements, because the people making decisions are the people feeling the consequences of those decisions.


Generalization:

- each role in a company tries to optimize/nudge the whole organization toward that role's convenience.

- specialization improves the local optimum (advances a certain role) at the cost of the global optimum (everybody has to dance around the new role's processes)

- joining several roles into one creates the opposite result: the optimum is searched for at a more global level (not necessarily found)

- Separation of responsibilities (aka creation of a new role) can generate a fractal (e.g. tester of the left winglet's blue stripe's thickness meter)

- complete joining of roles will create homogeneous chaos after n employees (everybody should do everything)

prediction: we will see constant experimentation; first roles will be split, then some will get joined, then split again, then joined again. (people can't search for a local optimum and the global optimum at the same time)


I agree in general. An extreme example I witnessed ~8 years ago: dev team is building a Linux server. Dev team is not allowed to touch production. Server is deployed via scp by the ops team. Ops team is on the other side of the planet and generally pretty unknowledgeable. Dev team deploys by creating a ticket and literally telling the ops team what to type in the terminal. Ops team fails at copy/pasting the instructions. Deployments fail, and the dev team has to fix things by telling the ops team what to type over the phone.

Lesson learned: kubernetes isn't that bad


If you look at construction, they have a system that's very similar to what DevOps espouses. Lots of trades and specializations, because building to code is complicated and important, and having the experience of doing one job well is important to be able to work around problems on the fly. But - this I think is the critical part - they must know about the other trades that are affected by their work, so as not to impede others' work or the project by accident. It's as much about compassion for the guy coming after you as it is reducing cost and speeding up construction. All of that is the aim of DevOps.


This is a very insightful comment.


> DevOps is the idea that there shouldn't be separate "dev" and "ops" organizations,

I agree with this definition of DevOps. However the vast, vast, vaaaast majority of real life uses of the term "DevOps" I've seen are just rebranded sysadmins. Sometimes it at least implies a more engineering approach to their coding. But in these institutions the Devs and Ops are very much separate groups of people, unfortunately.


Agreed. The DevOps that I'm familiar with is two job descriptions in one job, one headcount, one pay.


I agree. I think the grandparent is maybe describing an ideal state, but in practice most "DevOps" and "SRE" positions I see go by on job boards are for either sysadmins who can do some automation and infrastructure as code, or possibly devs who know how cloud infrastructure and networking work, depending on your perspective.

If you're lucky the hiring manager might have at least read the Phoenix Project or the Google SRE books.


essentially Jevons paradox



Kubernetes is a Google scale solution. Lots of teams said “hey if Google does it then it must be good!”…but forgot that they didn’t have the scale. It caught on so much that for whatever reason it’s now the horrendous default. I’ve worked on at least 3 consulting projects that incorporated K8s and it slowed everything down and took way too much time, and we got nothing in return - because those projects only needed several instances, and not dozens or hundreds.

If you need fewer than 8 instances to host your product, run far away anytime anyone mentions k8s


Exactly.

I am consulting with a startup right now that chose to go all-in on docker/k8s. The CTO is half-shocked/half-depressed by the complexity of our architecture meetings, although he used to be a banking software architect in his previous assignments. Every question I ask ends up in a long 15-minute monologue by the guy who architected all of it, even for the most simple questions. They are soon launching a mobile app (only a mobile app and its corresponding API, not even a website) and they already have more than 60 containers running and talking to each other across three k8s clusters, and half of them interact directly with third parties outside.

Even as I am being paid by the hour, I really feel sad for both the CTO and the developers attending the meeting.

k8s is definitely not for everyone. Google has thousands of hardware systems running the same hypervisor, same OS, same container engine, and highly specialized stacks of micro-services that need to run by the thousands. And even then, I am not sure that k8s would satisfy Google's actual needs tbh.

Ironically, there are some companies that highly benefit from this and they are not necessarily "large" companies. In my case, k8s and devops in general made my life infinitely easier for on-site trainings: those who come with a poorly configured or decade-old laptop can actually enjoy the labs at the same pace as every other attendee.


60 containers sounds like an architecture problem, not a Kubernetes problem. Kubernetes does not stop you from running 1 container in 1 pod receiving ingress and talking to a database.


Presuming too (it's hard to tell) that they mean 60 different types of containers. One of my clusters currently has ~311 containers, but that's mostly due to replication.

If I count actually different containers (like, unique PodSpecs, or so), that count drops to ≈30. Even that is "high", and from an architectural standpoint, it isn't really a number I'd use. E.g., we have a simple daemon, but it also has a cronjob associated with it. So it has "2" PodSpecs by that count. But architecturally I'd call it a single thing. How it implements itself, that's up to it.

A lot of our "unique PodSpec" count, too, comes from utility type things that do one thing, and do it well. Logging (which comes from our vendor) is 3 PodSpecs. Metrics is another 3. We have a network latency measurement (literally ping shoved into a container…): PodSpec. A thing that checks certs to ensure they don't expire: PodSpec. HTTP proxy (for SSRF avoidance): PodSpec. A tool that rotates out nodes so that their OSes can be patched: PodSpec. Let's Encrypt automation (a third party too): 3 PodSpecs … but hey, it does its job, and it's a third party tool, so what do I care, so long as it works and the API between me and it suffices (and honestly, its logs are pretty good. When it has had problems, I've usually been able to discern why from the logs). DB backup. But most of these don't really add much conceptual overhead; any one is maybe tied (conceptually) to our applications, but not really to all the other utilities. (E.g., there isn't really coupling between, say, the cert renewer and the logging tooling.) A confused/new dev might need to have it explained to them what any of those given tools do, ofc., but … many of them you can just Google.

… in previous jobs where we didn't use Kubernetes, we mostly just ignored a lot of the tasks that these tools handle. E.g., reboot a VM for patches? It was a custom procedure, depending on VM, and what is running on that VM. You needed to understand that, determine what the implications were … etc. And the end result was that reboots just didn't happen. K8s abstracts that (in the form of PDBs, and readiness checks) and can thus automate it. (And ensure that new loads don't need TLC that … an app dev realistically isn't going to be given the time to give.)
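
For reference, the PDB piece of that is tiny - a hedged sketch with made-up names, telling the cluster how many pods must stay up while nodes are drained for patching:

  apiVersion: policy/v1
  kind: PodDisruptionBudget
  metadata:
    name: web                  # hypothetical
  spec:
    minAvailable: 2            # keep at least 2 pods running during voluntary disruptions (e.g. node drains)
    selector:
      matchLabels:
        app: web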

If we needed a common thing on every node? That would be rough. We did finally get to having a common base VM image, but even then, all of the per-app VM images would need to be rebased on the newer one, and then all rolled out, and who/how would one even track that? And … in practice, it didn't happen.


Thank you for your comment. It's a mix of both what you described: different types of containers + what I have flagged as "utility containers" (stuff they just installed because it serves a purpose very well).

The problem I see with this approach is that it has become very difficult to evaluate system-wide topics such as accessibility (or security, or performance) as we constantly deal with a very diverse technological stack and increasingly complex attack surface.

In my opinion, this makes finding competent people who can actually evaluate or assess work almost impossible, unless you hire a Lemming who will run some third-party scanner he found on GitHub: if the scanner doesn't say something is awful or critical, then almost everyone at the table is instantly convinced the system is perfectly robust.

I try to warn my clients by asking them if they think that a judge will be satisfied if they answer "we ran the scanner the other day and the scanner said it was all good" after a customer sues them for failing to comply with a disabilities act.


Yep, that's microservices gone mad


It's the gateway drug


this is not the fault of kubernetes but of microservice architecture gone horribly wrong


Good god, there's a reason people worship monoliths. And here I think my company's app is over-engineered for using Sidekiq and Lambda for similar workloads.


> Kubernetes is a Google scale solution

The problem is that it's _not_ a Google scale solution. It's something that _looks_ like a Google scale solution, but is like a movie set compared to the real thing.

for example: https://kubernetes.io/docs/setup/best-practices/cluster-larg...

no more than 5k nodes.

It's extraordinarily chatty at that scale, which means it'll cost you on inter-VPC traffic. I also strongly suspect that the whole thing is fragile at that size.

Having run a 36k node cluster in 2014, I know that K8s is just not designed for high scale high turnover vaguely complicated job graphs.

I get the allure, but in practice K8s is designed for a specific usecase, and most people don't have that usecase.

for most people you will want either ECS (it's good enough, so long as you work around its fucking stupid service scheme) or something similar.


Right; and I don't feel that's a knock against K8s; it's just the trade-offs it decided to make. A true Google-scale solution would be far worse to use, you can be sure of that.

K8s is a "middle 80%" scale solution. It's not made to run Google (though Google uses it a ton internally). It's also not made for your average four-person startup (though, if you've got that experience internally, it's not a bad choice; it's not Heroku, but it's better than a lot of deployment options out there).

All I'd say is: I've worked in a "scale-up" B2B org under $1M in ARR. We were pretty monolithic; just a backend NodeJS app and a frontend SSR React app, basic. Five engineers by the time I left. We used K8s (EKS+Fargate). Maybe 50 pods total, across two environments. It was fantastic. We never had to say No to any weird customer, product, or engineering decision which would be difficult in either more managed, or more legacy, systems. Customer wants a custom domain and'll pay $50k for it? Like five lines of YAML and update Route 53, done. Datadog sidecar container so we can ingest some APM traces? Ten lines of YAML copy-pasted from their docs, done. Update the cluster? Click one button in the AWS UI. Every developer wants their own staging environment? Ok, a bit more work, but: create some namespaces, retool the CI a bit, and we can deploy separate databases in there as well since it's only staging; actually pretty straightforward.
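
To make the "five lines of YAML" concrete: the custom-domain case is roughly an extra host rule on an Ingress, something like this sketch (names and domain are made up):

  apiVersion: networking.k8s.io/v1
  kind: Ingress
  metadata:
    name: customer-acme                     # hypothetical
  spec:
    rules:
      - host: app.acme-customer.example     # the customer's vanity domain
        http:
          paths:
            - path: /
              pathType: Prefix
              backend:
                service:
                  name: web                 # existing Service, hypothetical
                  port:
                    number: 80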

Half the stuff we did with k8s would have taken three times as long with more native AWS solutions, and some of it probably would have been impossible on something like Heroku. K8s strikes a balance. It's not the simplest thing in the world. I wouldn't grab it on day 1 of a startup's engineering journey. But I wouldn't knock a startup that does.


I'm using k8s even for my personal 2 node cluster. It's just so convenient to be able to use all the automation tools.

I can leave the cluster alone for weeks, and it'll take care of itself, my CI will build new docker containers, tooling will start rolling them out across the cluster, if deployments fail they get rolled back, and I get an email, etc.

At some point I was hands-off with the cluster for 6 months and everything kept itself up to date and running just fine.


Google's Borg clusters don't have wildly larger sizes than that despite having more years of development and a lot of motivation at that scale. They instead have a lot of clusters and transparent inter-cluster network topologies (i.e. you can choose clusters with very high bandwidths to each other).

The fundamental design just isn't infinitely scalable, and at a certain point, you might rather have some bulkheads/autonomy or regional diversity.


Curious how you ran a 36k node cluster. Did it involve elastic scaling, etc.? What are the alternatives to k8s?


Home grown: https://www.semanticscholar.org/paper/Robust-large-scale-ren...

off the shelf: https://hradec.com/ebooks/CGI/RMS_1.0/rfm/User_Interface/Alf...

although that was with something like 6-10k nodes, because there was an upper limit to how many dispatches Alfred could do - it was single-threaded, from the early 90s, and not really designed to scale that high

https://renderman.pixar.com/tractor is probably what they use now, or https://www.opencue.io/

but any grid engine style dispatcher/manager will do what you want. It'll give you the primitives to manage wildly larger scale than k8s.

These clusters were on real steel, as elastic clusters were horrendously expensive, and the storage was/is nowhere near fast enough.

Nowadays, I'd use AWS batch, or at a push airflow.


Yeah, I don't know if it's because ECS was my first container orchestration experience but every time I look at teams trying to do k8s on AWS I think how much easier ECS would be.


The complexity difference between bog-standard ECS+Fargate and EKS+Fargate deployments rounds down pretty small. Biggest I've seen: ALB integration, IAM integration, and maybe certificate management. Most of that stuff is out-of-box on ECS, but on EKS you need some extra containers or configuration to watch the K8s API and provision stuff for you (if you want to use it; you can also just go pure-k8s) (edit: just to be clear, they provide all this for you; it's not out-of-box, but it's easy-to-add-box, e.g. [1]).

An argument could be made for something like CodeDeploy being better integrated on ECS, but that's more of a "k8s doesn't need CodeDeploy but ECS might" kind of thing. And even then, I wouldn't touch it.

An argument could also be made that upgrading ECS clusters is a bit easier, as the cluster itself, uh, doesn't have a "version". But on Fargate it's pretty painless on EKS, and Fargate ECS tasks do have a "platform version" that generally doesn't have to be worried about (version: LATEST), but it is nonzero nonetheless.

Which is really to say that both ECS and EKS puke complexity, because it's AWS, but the volume is pretty similar.

[1] https://docs.aws.amazon.com/eks/latest/userguide/aws-load-ba...


Agreed. I had the displeasure of doing CodeDeploy to ECS Fargate a few weeks ago for a side project and IMO it was overly complex.


Yup. You can get pretty dang far with just "the task definition is in cloudformation, so update the image in cloudformation and submit the template". For a bit more complexity, add ALB routing weights in, and the case for CodeDeploy is kind of weak. My statement that ECS needs it was probably overzealous; for most situations it doesn't, and if there's some crazy functionality it does I'm not aware of, you'd probably need something else in k8s as well (e.g. istio).


We’ve moved a small-scale business to Kubernetes and it made our lives much easier.

Anywhere I've worked, the business always prioritizes high availability and close-to-zero downtime. No one sees a randomly delivered feature. But if a node fails at night - everybody knows it. Clients first of all.

We’ve achieved it all almost out of the box with EKS. Setup with Fargate nodes was literally a one-liner of eksctl.

Multiple environments are separated with namespaces. Leader elections between replicas are also easy. Lens is a very simple to use k8s IDE.

If you know what you’re doing with Kubernetes (don’t use EC2 for nodes, they fail randomly), it’s a breeze.
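
For context, the eksctl one-liner being referred to is roughly `eksctl create cluster --fargate`; the equivalent config-file form looks something like this sketch (cluster name and region are placeholders):

  apiVersion: eksctl.io/v1alpha5
  kind: ClusterConfig
  metadata:
    name: demo                 # placeholder
    region: us-east-1          # placeholder
  fargateProfiles:
    - name: default
      selectors:
        - namespace: default
        - namespace: kube-system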


We don't have an issue with that last point; we run lots of EC2 EKS nodes and they don't fail randomly. Were you setting resource requests and limits correctly? EKS nodes can fall over randomly if you don't reserve resources on the nodes for system processes and your workloads eat up all the resources. That's probably not well documented either.


EC2 instances are inherently unreliable and that's not a knock on them, that's exactly the contract that you get using them and you're supposed to plan your architecture around the fact that at any moment an EC2 instance could die. We lose about 2-3 EC2 nodes per day (not like our app stops, like Amazon's own instance health goes red) and we couldn't care less.


What percentage of EC2 nodes is that?


Empirically around 0.1%


Setting limits is important, but it always has been. Kubernetes nodes typically don't have swap, so without setting container limits, some critical process can OOM. With swap enabled, memory grows => pathological swapping ensues => caches get dropped, making disk performance suck, and all the while your system is shuffling pages between memory and disk. So of course load hits 50+ and the machine turns into a 'black hole'. I've even seen a single VM do that and cause so much disk IO that it took out the whole hypervisor (which had a single RAID volume).
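
For anyone following along, the per-container knobs being discussed look like this (numbers are placeholders, not recommendations); reserving headroom for the node's system daemons is a separate kubelet setting (system-reserved/kube-reserved):

  apiVersion: v1
  kind: Pod
  metadata:
    name: worker               # hypothetical
  spec:
    containers:
      - name: worker
        image: registry.example.com/worker:1.0
        resources:
          requests:            # what the scheduler reserves for this container on the node
            cpu: 250m
            memory: 256Mi
          limits:
            memory: 512Mi      # exceeding this gets the container OOM-killed, not the whole node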


Kubernetes can't (currently) scale to Google sizes. It's designed for small- or medium-sized businesses, which might have 50,000 VMs or fewer.

There are entire SaaS industries that could fit into a single Google/Facebook/Amazon datacenter.


> small- or medium-sized businesses, which might have 50,000 VMs or fewer

Holy shit, is this considered small to medium enterprise now?


a single cluster supports approx 5,000 nodes and ~110 pods per node

The estimated maximum single cluster is 300,000 containers.

That's pretty medium; I've run more than a million processes before, and Nomad has 1 million containers as its challenge https://www.hashicorp.com/c1m

borg can handle this easily.


I don't know anyone running 5k node K8S clusters. That said, borg (as an end user) appears to just keep scaling, but it uses a different model and makes different assumptions than K8S.


Sure. Most people have more clusters before they hit 5k nodes on a single cluster.

But I've been in situations where it would have been worthwhile. I've been in situations with 30,000 machines that needed to be controlled. Splitting them out into very many clusters would mean a lot of wasted overhead in configuration and administration, and you lose nodes to masters.


I'm afraid this is why people pick Kubernetes. They believe a small business needs those tens of thousands of VMs distributed across thousands of nodes and so on.

With some exceptions, I believe that's a few orders of magnitude above what a small business can run on. Nowadays people just start their day by drinking some K2l-aid and spinning up a "basic" 6-node cluster for a development prototype.

Maybe I'm wrong, of course.


> There are entire SaaS industries that could fit into a single Google/Facebook/Amazon datacenter.

Forget a whole datacenter, even just one rack is an unimaginable amount of computing power, these days!


Fact: A well-equipped Raspberry Pi 4 has more memory, more compute power, more storage, and vastly faster networking than the Cray supercomputer I worked with at a major oil company in the 1990s! With the exception of the craziness around click-tracking (web ads and marketing have warped compute use even more than crypto), the data required to run even enterprise-scale businesses today is not really all that large.

For almost all purposes, we don't really need thousands of containers running on unimaginably fast computers, coordinated by AI-driven automation systems. What we need is software that is not morbidly obese.


I mean, a single V100 GPU (~= 100 TFLOPS) has FLOPS throughput similar to top-end mid-00s supercomputers like Blue Gene, at least superficially. And you can squeeze 4+ into a 1U if you have enough cooling and power.

https://en.wikipedia.org/wiki/History_of_supercomputing


Your scale (50,000 VMs) is way too high for small and medium sized businesses :-)

Is my observation correct that unicorns start to see that scale?


That’s a bit out of date. K8s can do 5,000 nodes and 300k VMs within its performance envelope: https://kubernetes.io/docs/setup/best-practices/cluster-larg...


A VM would be a node. A pod isn't a VM, it's a process tree.


Scaling is oversold and under criticized.

Folks, listen, if StackOverflow can run on this: https://nickcraver.com/blog/2016/02/17/stack-overflow-the-ar...

So can your doctor's appointment website, your little ML app or Notion clone.

"But...". No. You ain't gonna need it. Do some load testing, prove it to yourself. Now, multiply the load by 100x, reserve AWS resources and you're good to go.


We just moved our single instance web app with 50 users to K8. Not kidding. I totally bailed on that one. It started with moving it to the cloud (meh) and ended up with K8 in the cloud. A few of the guys wanted to pad the resume so we just turned our heads.


You don't end up spending time only on Kubernetes, because k8s is just part of the solution, a container scheduler. You have to bring logs, monitoring, and a container registry, as well as a CI system with custom jobs, and do the integration of everything.


This is true. But I had to bring all those things anyways, when I didn't run on k8s. I still needed some form of all of that. (Though "container registry" might be "package store", or something, depending on specifics of the implementation. Some form of artifact store.)

And with-k8s and without-k8s to me is pretty similar: we vendor or FOSS most of it. The major cloud vendors all have container registries (of … varying quality…); similarly, at a previous company we used S3+a small shim as a Python package store. (We later moved to a vendored solution.)

ELK for logs meant having a daemon set up per VM. Easier in k8s where I can push a DaemonSet to the entire cluster. With VMs … it's a per-app nightmare, really. Even then, that's really not perfect. In practice, in both situations, I feel like you end up having to integrate the apps with the metrics/logs providers. There's just not a common format. Sometimes, there are some libraries, e.g., there's some stuff for Prom's HTTP metrics APIs. Logs … eugh. Nothing amazing; getting structured logging requires per-app changes regardless of what you do. Sure, in either VM or k8s, you can just "suck up syslog/journald / docker logs", but what format are those in? They're not, is the answer, and I find most places do a "one text log per line" assumption (and then have stuff with multiline logs that just gets destroyed/corrupted/lost by the logging daemon) and it misses out on any sort of structured logs. jsonlines through those channels is a slight step up, but usually requires app changes.
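
A hedged sketch of the "one DaemonSet for the whole cluster" pattern mentioned above (the image is a stand-in for whatever log shipper you actually run):

  apiVersion: apps/v1
  kind: DaemonSet
  metadata:
    name: log-shipper          # hypothetical
  spec:
    selector:
      matchLabels:
        app: log-shipper
    template:
      metadata:
        labels:
          app: log-shipper
      spec:
        containers:
          - name: shipper
            image: registry.example.com/log-shipper:2.0   # stand-in for fluent-bit/vector/etc.
            volumeMounts:
              - name: varlog
                mountPath: /var/log
                readOnly: true
        volumes:
          - name: varlog
            hostPath:
              path: /var/log   # read node/container logs on every node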


For sure, I had all this stuff before k8s as well


> Kubernetes is a Google scale solution. Lots of teams said “hey if Google does it then it must be good!”…but forgot that they didn’t have the scale.

It's also a Google engineer caliber solution. Lots of teams said “hey if Google engineers do it then it must be good!”…but forgot that they didn’t have the same in-house talent as Google.


Asking as someone who has only dipped his toes into devops lately and is looking to learn K8s: what is considered a reasonable "lightweight" alternative to Kubernetes these days?


You may think you want an alternative but you don't.

The API is the main drawcard of k8s in the first place, if you are off in ECS land all you are doing is wasting a bunch of time on a dead-end.

I would instead focus on getting to understand the basics of the API by using a hosted k8s service like GKE or EKS. Stick with some basic manifests, i.e deployments, services, ingress.

Once you have some stuff running you can start learning how it really works and goes together, i.e. what are pods, why are pods immutable, what is a ReplicaSet, how does a Deployment orchestrate multiple ReplicaSets, what are Endpoints, and what is the difference between pod readiness and liveness.

Don't cheat yourself this early in the game, just learn things the right way from the start and save yourself a bunch of work.
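
As a taste of how small those basic manifests are, a Service that fronts a Deployment labelled `app: web` is just this (names are hypothetical):

  apiVersion: v1
  kind: Service
  metadata:
    name: web                  # hypothetical
  spec:
    selector:
      app: web                 # routes to pods carrying this label
    ports:
      - port: 80               # Service port
        targetPort: 8080       # container port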


I agree. I'd say the appeal of Kubernetes is the declarative configuration of your workloads. It's a language on its own that's incredibly versatile and exhaustive. This infrastructure abstraction layer is here to stay.


If you are on AWS you should check out ECS Fargate (serverless). It is really really good. Probably one of their more polished products.

If you want to stay on the Kubernetes route, check out k3s. Super easy to set up and usable for small production workloads.


As a security engineer, I always cringe when anything involving containers is referred to as "serverless".

I always thought that one of the advantages of going serverless was that you didn't have to worry about keeping the underlying operating system up to date. No need to do a monthly "sudo apt update && apt upgrade" or whatever. But containers throw that all away when container images enter the world.

Instead of updating your operating system, you're updating your images...and it's basically the same thing.


Is anyone's goal of 'serverless' that they no longer have to deal with updating the OS?

Most would say even a server-ful system (k8s, or whatever) should be considered 'cattle not pets' with immutable nodes replaced as needed anyway. No update, just replace. Just like building a new image and having new pods (or serverless whatevers) pull it.


The cattle not pets abstraction always struck me as wildly bizarre. Whoever came up with that phrase, did they grow up on a farm?

I’ve never cordoned off an individual head of cattle and lobotomized it, which is kinda what we do when debugging issues. We take the pod out of rotation, flip a bunch of configs, then give it some traffic to see the new debugging statements.


From a purely security standpoint, "updating your OS" and "updating your image" are equivalent. What matters to the security people are that you're running the latest OpenSSL that isn't vulnerable to the newest branded vulnerability.

If you're truly "serverless" by my interpretation of it, then you wouldn't care. Your cloud provider will have updated their infrastructure, and that's all that matters.


Yeah I see what you're saying, that's a fair enough interpretation of it I just don't think it's the only one.

In fact almost nothing is serverless (well, the truth comes out! ;)) by that definition, since even Lambda has runtime versions to choose/upgrade, Managed-Acme has Acme versions, etc.

SES, SNS, SQS, etc. sure, but I suppose no compute, since you need libraries, and libraries have versions, and you can't have them (significantly/major versions) changing under your feet. (Or if they don't have versions they're of course destined to have known security holes.)

(Or it's not even about libraries if you want to say no you don't need libraries - it's just about having to interface with anything.)


AppEngine was the original serverless platform


I second this. There are a few limitations in Fargate that are annoying but overall it's solid and easy to use.


How does k3s compare with MicroK8s, for the purpose of this topic?


> what is considered a reasonable "lightweight" alternative to Kubernetes these days

Before I used Kubernetes for my side projects (and only used it at work), I always thought it was hard to operate and very tricky. If you start with an empty "default" cluster and then just add bits when you need them, it's actually not that complicated and doesn't feel too heavyweight. I'd suggest just playing around with a simple example and then seeing how it goes.

There are things that are used in "production" clusters that you don't need at the beginning, like RBAC rules, Prometheus annotations, etc.


HashiCorp Nomad is a good alternative.


I've started running Nomad in my homelab, and it is a great piece of software. Although I feel like the question is sort of flawed, if you want to learn Kubernetes, you are going to need to run Kubernetes - or one of the downsized versions of it.

If you want to learn about containers, distributed workloads, etc, then Nomad is a great option that is easy to learn/adopt piecemeal.


Definitely check out all the AWS offerings under ECS.

There are now even on-prem ECS variants, which means not having to pay AWS very much and still getting the benefit of them running and maintaining the control plane.


Not an alternative, but if you want to get a feel for Kubernetes and managing various kinds of servers, check out KubeSail and their options for K8s/K3s

It's ridiculous overkill, but I'm looking at a NextCloud server on one of their PiBox hardware servers for the house. (You don't need a PiBox - their stuff will run fine on little instances from AWS/DigitalOcean/Hetzner, etc., or a spare PC you have lying around...)


K8s ate everyone else. The alternative is to use a Heroku-like PaaS.


In my particular market sector it seems everyone is using some form of cloud functions (SAM, Serverless Framework etc), and migrating away from K8S/containers.

Regarding PaaS stuff like Heroku, the only people I know that are still using that are solo hackers.


That's interesting. In the latest Who Is Hiring discussion, there seems to be more than 10 times as many references to Kubernetes compared to serverless. https://news.ycombinator.com/item?id=31582796


Interesting, you're right; I see 3 occurrences of "lambda" and 26 of "kubernetes". It's probably just the particular "bubble" I live in.


if you only use Services, Deployments and ConfigMaps then k8s can be simple too
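
E.g. a ConfigMap is about as simple as the API gets - a made-up example that a Deployment's pod template could pull in via envFrom/configMapRef:

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: web-config           # hypothetical
  data:
    LOG_LEVEL: info
    FEATURE_FLAGS: "new-checkout=false"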


Threads like the below are why DevOps jobs exist and why Kubernetes infrastructure skills pay so much and why there's such a large demand.

Yes, it's quite complicated.

No, an API to control a managed EKS/GKE cluster + terraform + Jenkins/Azure DevOps/etc. does not mean that magically the developer can 'just deploy' and infrastructure jobs are obsoleted. That's old AWS marketing nonsense predating Kubernetes.

There's a whole maintenance of the CI/CD factory and its ever demanding new requirements around performance, around Infosec requirements, around scale, and around whatever unique business requirements throw a wrench in the operation.

Sticking to ECS, I guess, is a valid point. What Kubernetes gives you is a more sophisticated, highly available environment built for integration (Helm charts and operators and setups that, when they work, give you more levers to control resource allocations, separation of app environments, etc.)

And as an aside, I've been doing this for 20 years and long before Kubernetes, before Docker, hell, before VMs were used widely in production, I observed the developer mindset: Oh but it's so easy, just do X. Here, let me do it. Fast forward a year of complexity later, you start hiring staff to manage the mess, the insane tech debt the developers made unwittingly, and you realize managing infrastructure is an art and a full time job.

A story that is visible with many startups that suddenly need to make their first DevOps hire, who in turn inherit a vast amount of tech debt and security nightmares.

Get out of here with, it's just API calls. DevOps jobs aren't going away. It's just the DevOps folks doing those API calls now.


Reading the comments here validates my experience. When K8s was pitched as a way to make this all run smoothly, I thought, "Great! I'll write my code, specify what gets deployed and how many times, and it'll Just Work(tm)." I built a service which had one driver node and three workers. Nothing big. It deployed Dask to parallelize some compute. The workload was typically ~30 seconds of burst compute with some pretty minor data transfer between pods. Really straightforward, IMO.

Holy smokes, did that thing blow up. A pod would go down, get stuck in some weird state (I don't recall what anymore), and K8s would spin a new one up. Okay, so it was running, but with ever-increasing zombie pods. Whatever. Then one pod would get in such a bad state that I had to nuke all pods. Fortunately, K8s was always able to re-create them once I deleted them. But I was literally deleting all my pods maybe six or seven times per day in order to keep the service up.

Ultimately, I rewrote the whole thing with a simplified architecture, and I vowed to keep clear of K8s for as long as possible. What a mess.


This can probably be chalked up to you're-doing-it-wrong (sorry), but not knowing your precise scenario, it's hard to know what went wrong. Maybe really old versions misbehaved (I only started a few years ago and it's been smooth sailing), but I've never seen your problem on any of our stuff, and we have dozens of different services on a bunch of languages/frameworks, and none of them just give up for no reason (though a lot often die for predictable and self-induced reasons).

I think there was some jank on AWS CNI drivers at one point that delayed pod init, but that's probably the most wtf that I've personally bumped into thankfully.


> This can probably be chalked up to youre-doing-it-wrong

Yes, and the unforgiving part of k8s is that there is a right way documented somewhere - you might just have spent 3 days sifting through docs and posts and community forums to find it.

It's sometimes worth it, sometimes not. My main gripe with k8s would just be that there are no "simple things", and it shouldn't be pitched as making things easier for small shops. Even if a small use case can be done elegantly, it will probably require a pretty comprehensive and up-to-date knowledge of the whole system to keep that elegance.


Yep very much so. Doing it wrong ™ applies to any deployment and shouldn’t be held against k8s. We have over a hundred services deployed in who knows how many pods in a dozen environments and it’s definitely not that unstable.


> Doing it wrong ™ applies to any deployment and shouldn’t be held against k8s

I think there's definitely a huge asterisk there if the tool makes it very easy to "do it wrong", hard to "do it right", etc.

Of course with k8s it's tough because it's capturing computation! Hard for it to "know" what one is trying to do inside the containers. And in the case of k8s the only thing I could think of that is ... kinda in that space is managing volumes, since it runs into the dilemma of adding persistence to ephemeral things.


I imagine it's akin to management expecting you to spend your off hours learning all this "great new tech" while they think working off hours is reading online articles on hacker news to "stay up to date".


> it’s definitely not that unstable

So how unstable is it?


Not at all unstable in my experience


> This can probably be chalked up to youre-doing-it-wrong (sorry)

I think you're absolutely right. I freely admit that I knew NOTHING about K8s before embarking on this project (and still pretty much know nothing about it now), and I was able to cobble together something that 'worked', but that doesn't mean it was right.

And as another commenter points out, "a huge asterisk there if the tool makes it very easy to 'do it wrong'". I would rather be very clearly told that I've got it wrong and be prevented from progressing further vs. making something that superficially seems right then crashes and burns in prod.

I'm sure there are folks that can wield Kubernetes with great effectiveness, and good on them, but I found it to be supremely frustrating and the wrong tool for the right job. Not that I have a better solution, so I'm admittedly just kind of complaining.


We've had great success running celery applications in k8s, so it's surprising to hear dask was a problem for you. Especially considering dask recommends k8s as a deployment option.


Love Dask. Very robust and therefore very easy to get wrong. When you need a longer term solution that uses Dask, it pays to architect things well, in advance vs on the fly in a sandbox.


First: DevOps is a culture, not a job. Most places have so many DevOps roles because they are doing it wrong.

In the olden days of 10 years ago, most operations teams worked around the clock to service the application. Like, every day there would be someone on my team doing something after hours, usually multiple somethings. Tools like Kubernetes and cloud (AWS, GCP, Azure) have added significant complexity but moved operations to more of a 9-to-5 gig. Less and less do I see after-hours deployments, weekend migrations, etc. Even alert fatigue goes way down because things are self-healing. This is on top of being able to move faster and safer, scale instantly, and everything else.

The operations side used to be a lot of generalist admin types and DBAs. With today's environment, you need a lot more experts. AWS alone has 1 trillion services and 2.4 billion of those are just different ways to deploy containers. So you see a lot more back-end roles because it's no longer "automate spinning up a couple of servers, install some software, deploy, monitor and update". It's a myriad of complex services working together in an ephemeral environment that no one person understands anymore.


The number of places that get the meaning of DevOps wrong is too high. So much so that it is often easier to use it wrong in order to express an idea.


I do think there is a space for a developer team model that is easy to maintain, hard to screw up and gives say 80% of the productivity gains.


New technology sometimes creates more work even though it makes the previous work easier. When the electronic spreadsheet was introduced in the 1980s, even though it made accountants more productive, the number of accountants GREW after the electronic spreadsheet was introduced. Sure, one accountant with an electronic spreadsheet could probably do the work of 10 or 100 accountants who didn't have the electronic spreadsheet, but accounting became so efficient that so many more firms wanted accountants.

"since 1980, right around the time the electronic spreadsheet came out, 400,000 bookkeeping and accounting clerk jobs have gone away. But 600,000 accounting jobs have been added." Planet Money, May 17, 2017, Episode 606: Spreadsheets!


Kubernetes in a sense is very similar to Linux back in the 2000s - it was nascent technology in a hot market that was still absolutely evolving. The difference now is that everyone knows the battle for the next tier of the platform is where people will be able to sell their value (look at RedHat selling to IBM, saddled with the legacy of maintaining an OS as a tough growth proposition). For a while people thought that Hadoop would be the platform, but it never grew to serve a big enough group's needs back in 2013-ish, and coupled with the headaches of configuration management, containerization hit, and it's now positioned at the intersection of OS, virtualization, CI, and every other thing people run applications on in general. It may be the most disruptive thing to our industry overall since the advent of Linux in this respect (people thought virtualization was it for a while, and it's shown to have been minor comparatively).

A lot of this stuff really is trying to address the core problem we've had for a long time that probably won't ever end - "works fine on my computer."


I have noticed a pattern that keeps popping up: I've seen many orgs invoking docker/k8s simply as an abstraction layer to allow mapping of commit hashes in a repo to objects in deploy environments.


Depending upon the nature of the artifacts, that's not necessarily the worst abstraction for modeling deployments (I still think that deployments are the big elephant in the room that k8s doesn't solve either, when it really needs to be better standardized as a profession IMO, but that's another topic). ArgoCD arguably makes this work more intuitively, and it's one of the most popular K8S ecosystem components today.


Works fine on my cluster. Marking as "can't reproduce".


In my opinion, the main benefit of Kubernetes for large companies is that it allows for a cleaner separation of roles. It's easier to have a network team that's fully separate from a storage team that's fully separate from a compute team that's fully separate from an application development team because they all work around the API boundaries that Kubernetes defines.

That's valuable because, on the scale of large companies, it's much easier to hire "a network expert" or "a storage expert" or even "a Gatekeeper policy writing expert" than to hire a jack of all trades that can do all of these things reasonably well.

The corollary from this observation is that Kubernetes makes much less sense when you're operating at a start-up scale where you need jacks of all trades anyway. If you have a team of, say, 5 people doing everything from OS level to database to web application at once, you won't gain much from the abstractions that Kubernetes introduces, and the little that you gain will probably be outweighed by the cost of the complexities that lurk behind these abstractions.


The other problem I see large enterprises dealing with when it comes to K8s is the lack of cultural change in developer and infrastructure processes.

Traditional deployment models also have ways to separate the network, from the security, from the storage, from the application stack. But it's typically done through strict change control processes.

I see many orgs choose to change to K8s because it offers improvements to operational tasks related to provisioning all of those changes, and speeds up the old change control gateways.

However, K8s is tuned to operate extremely profitably in organisations that need to make large numbers of changes to their infrastructure or software stack all of the time, and it begins to break down at scale (at least from a cost point of view) compared to alternative solutions if organisations are not meeting some minimum number of deployments a day.

IT orgs have chosen to adopt this huge piece of operations software that is itself fairly monolithic, and requires large amounts of upkeep and maintenance to keep it running smoothly and provide constant availability.

But even though they've adopted this massive new fixed cost in their IT operations, they continue to use old change control processes, often because restructuring old teams - teams that have traditionally made the company a lot of money - into new teams proves to be an incredibly risky exercise.

And so the net outcome is that their overall IT operations processes marginally improve at best, while they have simply absorbed a new fixed cost on top of all their existing ones. What's worse, I see in some orgs that the additional cost pressure is being noticed, but they attribute (IMHO incorrectly) this cost pressure to a lack of competence in the new technology (K8s), and begin a massive hiring spree to try and find specialists to better tune the technology stack. The solution, IMHO, should instead be to push hard on existing teams to simplify and downsize, and to create incentives to interact with the infrastructure APIs more aggressively, letting sysadmins/DevOps/SREs deal with faults, errors and failures the way they always have, but with the new fancy tools that let them work more efficiently.


High Availability, Scalability, Deployments, etc. are NOT the goal of Kubernetes; they are features that are not exclusive to Kubernetes, nor is Kubernetes necessarily better at them than others.

The goal of Kubernetes is to improve the portability of people by introducing abstraction layers at the infrastructure layer - These abstractions can seem overly complex, but they are essential to meet the needs of all users (developers, operators, cloud providers, etc)

Before Kubernetes, in order for a developer to deploy an application they would need to (send an email, create terraform/cloudformation, run some commands, create a ticket for the loadbalancer team, etc.) - these steps would rarely be the same between companies or even between different teams in the same company.

After Kubernetes you write a Deployment spec, and knowing how to write a Deployment spec is portable to the next job. Sure, there are many tools that introduce opinionated workflows over the essentially verbose configuration of base Kubernetes objects, and yes, your next job may not use them, but understanding the building blocks still makes it faster than if every new company / team did everything completely differently.
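For illustration, a minimal Deployment spec of the kind being described looks roughly like this (names and image are hypothetical):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web                                      # hypothetical name
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - name: web
            image: registry.example.com/web:1.0.0    # hypothetical image
            ports:
            - containerPort: 8080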

If you only have a single team/application with limited employee churn - then the benefits may not outweigh the increased complexity.


> these steps would rarely be the same between companies or even between different teams in the same company.

And the quality can differ a lot too. I used to think that k8s is not necessary, since our team has mastered both stateless and stateful app deployment on VM, with ansible calling aws/gcp API. Everything just works.

And then I joined another company, which has a hotchpotch of unorthodox terraform and ansible code and a homebrew service discovery layer, with frequent incidents in the early mornings, especially on weekends, when autoscaling of aws/gcp VM would fail due to a myriad of reasons.

With k8s, there is a minimum quality.


The thing you're noticing is the usual thing that happens when new labor saving technology is invented:

1. What people expect: less work needs to be done to get what you had before.

2. What people don't expect: more is expected because what used to be hard is now simple

So while it may have taken a few weeks to set up a pet server before, and as a stretch goal you may have made your app resilient to failures with backoff retry loops etc., now that's a trivial feature of the infrastructure, and you get monitoring with a quick helm deploy. The problems haven't disappeared; you're just operating on a different level of problems now. Now you have to worry about cascading failures and optimizing autoscaling to save money. You are optimizing your node groups to ensure your workloads have enough slack per machine to handle bursts of activity, but not so much slack that most of your capacity is wasted idling.

Meanwhile, your developers are building applications that are more complex because the capabilities are greater. They have worker queues that are designed to run on cheap spot instances. Your CI pipelines now do automatic rollouts, whereas before you used to hold back releases for 3 months because deploying was such a pain.

Fundamentally, what happens when your tools get better is you realize how badly things were being done before and your ambition increases.


Because everything has gotten bigger and more complicated.

It's like asking "if the computer saves us all so much work, why do we have more people building computers than we ever had building typewriters"?

Something can "save labor" and still consume more labor in aggregate due to growth.


Jevons Paradox


I think this Kelsey Hightower quote has summarized my experience working with Kubernetes:

> Kubernetes is a platform for building platforms. It's a better place to start; not the endgame.

https://twitter.com/kelseyhightower/status/93525292372179353...

Everywhere I've worked, having developers use and develop Kubernetes directly has been really challenging -- there's a lot of extra concepts, config files, and infrastructure you have to manage to do something basic, so Infra teams spend a lot of resources developing frameworks to reduce developer workloads.

The benefits of Kubernetes for scalability and fault tolerance are definitely worth the cost for growing companies, but it requires a lot of effort, and it's easy to get wrong.

Shameless plug: I recently cofounded https://www.jetpack.io/ to try and build a better platform on Kubernetes. If you're interested in trying it out, you can sign up on our website or email us at `demo [at] jetpack.io`.


The short answer is: Because of Kubernetes.

The longer answer is: When you switch to Kubernetes, you are introducing _a lot_ of complexity which, depending on your actual project, might not be inherent complexity. Yes, you get a shiny tool, but you also get a lot more things to think about and to manage in order to run that cluster, which in turn will require that you get more devops on board.

Sure, there might be projects out there, where Kubernetes is the right solution, but before you switch to it, have a real long hard thinking about that and definitely explore simpler alternatives. It is not like Kubernetes is the only game in town. It is also not like Google invents any wheels with Kubernetes.

Not everyone is Google or Facebook or whatever. We need to stop adopting solutions just because they get hyped and used at a big company. We need to look more at our real needs and avoid introducing unnecessary complexity.


I agree 100%. Keep things simpler, invest the $ that would have been spent on k8s staff on performance tuning courses & upskill your devs in squeezing every last millisecond out of a couple of dedicated servers. You can do lots with a couple of beefy machines if the software is tuned.

Everything has become so abstracted away these days that performance isn’t even a consideration because people don’t understand the full stack. And I don’t mean “full stack” as it’s slung around these days. EG: doing 20 round trips to the database looking up individual records (as each is an object code-side), when you could just do one. Things like that are opaque & many devs wouldn’t even care or know, but a little bit of education can make a huge difference.


This is all true enough. The benefit of k8s is that if you need all that complexity, or even most of it, then k8s gives it to you in a fairly standard way. Then everything you build on top of it has a well-documented platform you can hire people to help maintain rather than having to train people for six months on a home-grown solution before they're productive.


The premise of your question is invalid. Have you ever tried setting up a Kubernetes cluster and deploying apps in it? Kubernetes doesn't save work, it adds work. In return, you get a lot of benefits, but it wasn't designed to reduce human work, nor was it designed to eliminate devops jobs. It was designed for scalability and availability more than anything. Most people using Kubernetes should be using something simpler, but that's a separate problem.


I don't know my dude, all 3 major clouds offer "canned" k8s services that you can set up in a ridiculously short amount of time with Terraform and your CI platform of choice.

I agree with some other comments in this thread about a general fervor in the Enterprise space to "modernize" needlessly. This conversation usually lands on the company copying what everyone else is doing or what Gartner tells them to do. Cue "DevOps".

100 percent agree with your comments on something simpler. I can't tell you how many times I've debated with our Analytics teams to just use Docker Compose/Swarm.


> all 3 major clouds offer "canned" k8s services that you can set up in a ridiculously short amount of time with Terraform and your CI platform of choice

I don't agree. I spun up a Kubernetes cluster in Azure, which was indeed easy. But then I had to figure out how to write the correct deployment scripts to deploy my docker containers to it, and how to configure all the security stuff. After more than a week of trying to figure it out, I decided to ditch the whole solution and go for Azure Container Instances instead. It was too much for me to learn about all the concepts of Kubernetes, how you configure them, how to make it work for solutions that are not as simple as the example on the website, and how to navigate through the various different methods of deploying stuff.

Maybe I'm just too dumb. But I wasn't going to invest a month of my time into doing something that should be simple enough for an average developer to accomplish.


You're not dumb, you're just new to it, and it's fundamentally hard stuff anyways, and if you can find a higher-level abstraction that lets you get work done faster, then all the better. However, the question is comparing Kubernetes to traditional VM-based infrastructure (especially with pet nodes) whereas you're comparing Kubernetes to a higher-level abstraction.

For what it's worth, deploying in Kubernetes is pretty easy once you figure it out (and often finding the information is the hardest part). All you need to do is update the Deployment resource's "image" parameter. You can do that with `kubectl patch` like so:

    kubectl patch deployment foo -p '{"spec":{"template":{"spec":{"containers":[{"name":"main","image":"new-image"}]}}}}'
Kubernetes will handle bringing up a new replicaset, validating health checks, draining the old replicaset, etc.


Please, do not manage deployments with imperative kubectl commands, I beg of you.


The question is: what tool to use? I'm a solo developer running a very small kube cluster for my hobby project. I very much wanted to have a declarative version controlled state of my cluster. Every time I try googling solutions I get flooded with some enterprise Saas offerings that do nothing I want.

I managed my stateful sets/services for a while with terraform, but my experience was absolutely terrible and I have stopped that eventually. I now use "kubectl patch" and "kubectl apply" with handwritten yaml, but the workflow feels very clunky.

Intuitively it seems obvious to me that there must be a tool helping with that, but for some reason I was absolutely not able to find anything that would be even a little bit helpful. I am considering writing a couple python scripts that will automate it.


kubectl apply on a directory full of yaml works fairly well for small stuff. Check it into git and it's version controlled.
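A minimal sketch of that workflow (the directory name is hypothetical):

    # manifests live in a version-controlled directory
    git add k8s/ && git commit -m "bump image tag"
    # preview what would change, then apply the whole directory
    kubectl diff -f k8s/
    kubectl apply -f k8s/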

If you need something more flexible than that, try making your own helm chart. Helm will give you some text templating, pre and post hooks, some basic functions, and some versioning and rollback functionality.

You can start simple by just pasting in your existing k8s yaml, and then just pull out the pieces you need into variables in your values file. If you need to change an image version, then you just update the variable and `helm upgrade mychart ./mychart`
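A rough sketch of what that looks like (chart layout and names are hypothetical):

    # values.yaml
    image:
      tag: "1.0.0"
    # templates/deployment.yaml references it, e.g.:
    #   image: "registry.example.com/myapp:{{ .Values.image.tag }}"
    # bump the value and roll it out:
    #   helm upgrade mychart ./mychart
    # or override it on the command line:
    #   helm upgrade mychart ./mychart --set image.tag=1.0.1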


kubectl apply on a directory doesn’t work because deleting a resource manifest won’t delete the corresponding resources.


It does work, as long as you’re not deleting anything. :) That might be good enough for a “very small kube cluster for my hobby project”

I only use ‘kubectl apply’ for small stuff where I only have a couple resources. Anything more complicated and a tool like helm is much more useful.


'kubectl delete -f dir/' will delete all resources in the directory.


Right, but the only resource you want to delete is no longer in that directory, so you've now deleted every resource except the one you actually wanted to delete. :)


Ah, I slightly misunderstood. Still, you should always have your manifests under version control so this shouldn't ever be a problem :)


I'm a fan of emrichen, but simple text templating (Jinja) works almost as well. (j2cli means you can just provide a yaml file with per-stage configs.)


Fluxcd, or argo CD if you want a nice UI


I probably wouldn't do this, but what problem does this cause?


The declarative approach is a more sustainable way to run Kubernetes. If you define some desired state in manifests and apply them to a cluster, they can be applied again to new clusters or the same one and Kubernetes will attempt to maintain the desired state.

This state can be version controlled, written in stone, whatever you prefer and it can always be attained.

When administrators start issuing imperative commands to a cluster, state starts being changed and there is no record[0] of the state Kubernetes is being asked to maintain.

[0] Not entirely true, the state can always be retrieved from the cluster so long as it hasn't failed.


I find this to be a common misconception, stemming from a misunderstanding of what "declarative" means (especially common when people are discussing tools like Terraform).

Firstly as you point out, there is a record of the state Kubernetes is being asked to maintain: it's in the API server as the spec of each resource.

Secondly, using `kubectl` "patch" in the manner described is not making changes to the cluster state directly, it's making changes to the specification of what should be maintained, and the various controllers effect the state changes.

Fundamentally, the argument seems to come down to "you don't have a record of what you once asked the API server to do", and that's fair enough - you don't. But that has nothing to do with imperative or declarative models.

I'm not advocating actually doing this on a day-to-day basis, but the arguments against it are not ones of imperative vs declarative.


> Fundamentally, the argument seems to come down to "you don't have a record of what you once asked the API server to do", and that's fair enough - you don't. But that has nothing to do with imperative or declarative models.

You do actually have this record. First of all, because k8s has an audit log, and secondly because deployments maintain a revision history (so you can always roll back--kubectl even supports this via `kubectl rollout undo`).
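For instance (the deployment name is hypothetical):

    kubectl rollout history deployment/foo               # list recorded revisions
    kubectl rollout undo deployment/foo                  # back to the previous revision
    kubectl rollout undo deployment/foo --to-revision=2  # or to a specific one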


You're correct, but when approaching this from the human perspective, it's less about the technology and the reality and more about "do I have a record of the last state I asked the cluster to be in before I fucked it up?". Ultimately that's what matters. Auditability is great, but in practice, when the shit hits the fan, can I get the application that brings in our company's money, and that our customers rely on to breathe, back to the state it was in after 5 years and 73 updates?

Updating the manifests, pushing them to version control and having CD deploy them encourages humans to "do the right thing".

It would be quite easy, though, to just tweak that one environment variable while I patch in the new image version to update that one service; until the entire cluster dies and I can't retrieve the last definition and need an equivalent cluster up within the hour.

This is really more about the practice of writing down the state you want the cluster to be in (the spec) and showing it to the cluster, than just ordering it to do one thing without context.

In this sense, Declarative Vs Imperative is just a proxy term for "do I have a record of the state I asked the cluster to keep?"


Given that Kubernetes' docs[0] discuss using imperative commands, I think it's a fairly reasonable way to describe it.

[0] https://kubernetes.io/docs/tasks/manage-kubernetes-objects/i...


Why is applying a full manifest “declarative” but applying a patch is “imperative”? That’s clearly an error.


Applying a full manifest is "this is your state now".

Applying a patch is "make these changes to your existing state".

That dependency on existing state is a difference, and it seems to map reasonably well to what declarative/imperative seem to usually be used to mean in this context.


1. A strategic merge patch says "this is your state now" (the only difference is the scope of the state in question, with a full manifest including a bunch of extraneous stuff)

2. "make these changes to your existing state" is still declarative


> Applying a full manifest is "this is your state now".

It isn’t though. It’s “please make the state look like this eventually”. You do not patch state, just spec. Controllers effect the changes.


You're both correct, but you're talking about different states. The parent's "state" is the state of the etcd database and your state is the actual state of the resources that the controllers are managing.

That said, the parent is wrong that a full manifest is declaring the (etcd) state in a way that a (strategic merge) patch isn't--both are declaring etcd state, but a strategic merge patch is doing so in finer-grained increments. A strategic merge patch can declare zero state or many full manifests, while applying full manifests can only work in increments of complete resource manifests. But both are telling Kubernetes "this is your (etcd) state now".


Then open a PR.


This is a strange way to concede an argument, but I’ll take it.


People are incorrectly assuming that using kubectl implies invoking it from an administrator's laptop. Of course, you can and should invoke kubectl from your CD pipeline. The CD pipeline maintains its own record of runs, and Kubernetes deployments have a revision history.

Moreover, people in this thread also don't know what "declarative" means. The patch is declarative, and "declarative" doesn't provide the claimed benefits. For example, as with the patch command, I can create, apply, and delete a Kubernetes manifest file (indeed, I can apply directly from stdin!) and there is no additional record of the change beyond that which would exist if I had just applied the patch.
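A quick sketch of applying straight from stdin (the ConfigMap is just a hypothetical example):

    cat <<'EOF' | kubectl apply -f -
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: example-config
    data:
      greeting: hello
    EOF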


A patch is declarative in that it's an idempotent command which requires you to fully define the patch. It's still something that is being manually defined rather than contained in a file to be applied. Of course, if that file is just being locally held and modified on someone's laptop instead of committed to version control, it's a moot point.

I will grant you that `kubectl delete -f` breaks my argument, since it's imperatively modifying objects given a declarative manifest. Re: changes via stdin, I mean sure; you can also pipe a deployment manifest into `ex` or `sed` to make changes on the fly and then echo it back to `kubectl apply -` but I wouldn't recommend it.


Yeah, my point is that whether you're changing a manifest and doing an apply versus a patch isn't relevant, the relevant bit is whether or not you have a version history and in both cases you do via your CD pipeline and via k8s deployment revision history. You can also commit to git, but I don't think there's much value in committing every image version change to git.

Also, working in raw manifests isn't a panacea; if you delete a resource's manifest, Kubernetes won't delete its corresponding resource (maybe this is what you were referring to with your bit about `kubectl delete -f`). You need something that diffs the manifests against the current state of the world and applies the requisite patches (something like Terraform) but this isn't related to how you change image versions.


I agree with your points, but as I mentioned in another comment, if we take "imperative" and "declarative" as proxy terms for "I do or do not have a reference to the state I requested of the cluster OUTSIDE of the cluster", then my point is that updating the state in full, by modifying the manifests and committing them to be deployed by your CD pipeline or otherwise, is a better approach to ensuring you or someone else can rebuild your "empire" were it to unceremoniously burn to the ground.


I think you’re assuming that just because we’re updating the image via kubectl (which itself is invoked via CD pipeline) that the infrastructure isn’t codified and persisted, which isn’t the case. You can (and should) still have your infrastructure manifests saved in your infra repo/directory and version controlled; that’s orthogonal to how you update the current image version.


The same problems any imperative management within declarative config causes -- drift. If the tool you're using supports declarative configuration, all changes should be made exclusively via the declarative interface to prevent that drift. In this example, the new image should be added to the original manifest itself, not via a CLI update.


It depends on how you manage your changes. A lot of people don't have their infra-as-code manage the deployment's image field--rather, that's updated by the application's CD pipeline. There's no drift to worry about.


So upon deploy the CD pipeline calls kubectl with the proper deployment image and that's ok?


Yes.


Drift, as the other child comment mentions, but also loss of version control. I do not want to have to trawl through someone's shell history to figure out what they changed, nor do I want to have to redirect `kubectl get foo -o yaml` output into diff.

If everything is in code, and you have a reasonable branching strategy, it's much easier to control change, to rollback bad merges, to run pre-hooks like security checks and configuration validation tools, etc.


It's not drift, because the infra-as-code doesn't manage the image field (the application's CD pipeline does). You don't trawl through someone's shell history, you look at the CD pipeline history. Rollbacks are easy--you just deploy the prior version via your CD tool.

I think you're assuming that invoking kubectl means invoking it directly from a user's command line, but kubectl can also be called in a CD script.


If you're using ArgoCD or something then sure, but bear in mind the original statement you made was directed at someone who is new to K8s, and given a command that can be executed from their shell, they would likely assume that's what you meant.


It doesn’t have to be Argo, it can be Jenkins. Whether or not you use a CD pipeline is orthogonal to whether or not you use k8s. The best practice is to use a pipeline whether you’re targeting k8s or bare VMs or a higher level PaaS abstraction.


I'm not new to Kubernetes and I've been using containers since Solaris zones were introduced.


You know kubectl has a built in diff subcommand?


I do now! Thank you.


This is a declarative kubectl command.



They’re not being precise with their language. Applying a full manifest is no more “declarative” than applying a patch.


Spot on!


As an outsider, that command looks really easy to mess up.

- shell interactions with quotes
- double quotes
- interpolating into image namespace with no autocomplete
- easy to forget an argument
- do you get autocomplete against the deployment name?

Comparison: C# declaration of a complex type - it's less complex than the `kubectl` command above, but IDEs offer way more support to get it right.

    var x = new List<Dictionary<CustomType,int?>>();

This will light up warnings if you get anything wrong;

You get:

- go to definition of `CustomType`
- autocomplete on classnames
- highlighting on type mismatches
- warnings about unused vars
- if you initialize the var, the IDE will try hard to not let you do it wrong

So structurally,

1) in the code realm, for doing minor work we have really strong guarantees,

2) in deployment realm, even though it's likely going to hit customers harder, the guarantees are weaker.

I think this is behind the feeling that the k8s system is not ready yet.


About quoting rules: We desperately need static analysis or IDEs (Emacs plugins, whatever) that "explain" exactly what quotes do in a particular context. When I returned to C++ recently, I was blown away by Clang-Tidy (JetBrains CLion integration). Literally: Clang-Tidy seems to "know what you really want" and give intelligent suggestions. For someone who is a very average C++ programmer, it instantly leveled me up! I could see the same for someone writing /bin/sh, /bin/ksh, /bin/bash shell commands. If the IDE could give some hints, it would be incredibly helpful to many.


The quotes here have nothing to do with k8s, it's just how you encode JSON in shell.


apt-get install shellcheck

or

https://www.shellcheck.net/


You put it in a bash script that gets called from CI and move on with life. No one is typing this into their terminal every time they want to do a deploy.

> I think this is behind the feeling that the k8s system is not ready yet.

Whether you patch the deployment image from bash or from C# doesn't indict the k8s ecosystem.


Correct! Here's why: Kubernetes is an operating system (a workload/IO manager) and its lineage is 16 years old. So it's like Unix around 1979, or Windows/DOS around ~1996. Imagine it's 1996 and you can pick between Windows NT (representing 16 years of MS development) or OS/360 (~40 years out from IBM's first transistorized designs).

What I'm trying to say is that as an operating system, Kubernetes is now a young adult, and historically speaking, operating systems at this level of maturity have been adopted and ridden for decades, with much success. But, ya know, if you chose OS/360 in 1996 you would have a point.


1. You’ll use `kubectl set image deployment/gitlab gitlab=gitlab/gitlab-ce:14.9.2-ce.0` in production instead

2. kubernetes will abort the change if it doesn’t syntax check or the types don’t match.


Hah, I didn't even know about `set image`.


Don’t do that, use

    kubectl set image deployment/foo main=main:new-image
instead


It also accepts "*" if you don't know or care about the container name(s)

    kubectl set image deployment/foo "*=the/new/image"
and then, my other favorite (albeit horribly named, IMHO) "rollout status" to watch it progress:

    kubectl rollout status deployment/foo


TIL. Thanks for the advice.


It took me 4-5 hours to read and understand the first few chapters of Kubernetes in Action. I annotated a lot of text as notes on the O'Reilly site, but didn't have to go back to it. I didn't have to do any security and account management stuff; it was being done by someone else. After 3 years of working with plain Kubernetes and OpenShift, I still haven't had to go back to the book. The basic concepts in Kubernetes are easy to understand if you're working as a developer deploying your apps in it, and not in DevOps managing it.

https://www.oreilly.com/library/view/kubernetes-in-action/97...


You wrote: <<if you can find a higher-level abstraction>>

This is a stretch, but to me Kubernetes is like the C programming language for infra. If you look at the entire software stack today and drill down (all the turtles), you will eventually find C (everything goes back to libc or a C/C++ kernel). I assume any commercial (or non-!) "higher-level abstraction" for infra is already (or will soon be) built on top of Kubernetes. I am OK with it.

I write this post as someone who is uber-technical, but I know nothing about actually using Kubernetes. I can do vanilla "hand-coded/snowflake" infra just fine in my constrained, private cloud environments, but nothing that scales like Kubernetes.


> If you look at the entire software stack today, drill down (all the turtles), and you will eventually find C (everything goes back to libc or a C/C++ kernel).

I might be nitpicking, but I'm not sure that's necessarily true. You could in theory write a compiler for a new language in C (or even assembly!), and once you have a working language, re-write the compiler in that new language. Now that there is no C code involved in the stack anymore, would that still count as a C-turtle?

Haskell for example, has some "bits" written in C, but a lot is written in Haskell or assembly[1]. So if you look at the WHOLE stack you'll find C _somewhere_ (probably most places), but I don't think _everything_ boils down to C.

Granted, a LOT of stuff is written on top of C.

[1] https://stackoverflow.com/questions/23912375/what-language-i...


Great point and not a nitpick at all! I hope the next generation of languages is written on top of Python, Ruby, C++ (ultra modern), Java, or DotNet/C#. I wish more languages would figure out if they can host in the JVM/CLR, which would provide some crazy interactions!

After seeing so much great work done with JavaScript in the form of "transpilers", I think a lot can be done in that area. I feel Zig is a crazy good idea: A brand new language that produces binaries that are 100% compatible with C linkers. If all goes well, in a few years, why would anyone use C over Zig? It seems like the future.

Lots of people think C++ is bat sh-t crazy complex (me too, but I still like the mental gymnastics!). What if there were different C++ dialects supported by transpilers that intentionally restricted features? I think kernel and embedded programmers would be the primary audience.


Kubernetes may eventually underpin lots of higher level infrastructure, but most stuff currently isn't running atop Kubernetes. None of the higher level container-as-a-service offerings by the major cloud providers run on Kubernetes, for example. Nor does Heroku. And moreover a lot of people are still working with lower-level abstractions (VMs and auto-scaling groups) or no abstractions at all (pet servers).


I hear you about feeling dumb. I think some early decisions in the k8s ecosystem led to a lot of wasted time and effort, and this frustration.

YAML: significant whitespace is always unwelcome but YAML also introduces unexpected problems, like how it deals with booleans. For example, say you have this array:

    foo:
    - x
    - y
    - z
You might think this is an array of strings, but you'd be wrong.

It's also difficult to read through a YAML config and understand the parent of each key, especially in code reviews or just on GH.

I believe k8s life would have been easier with JSON configs, where it's impossible to confuse e.g. booleans for strings and where it's easier to understand the object's hierarchy.

Helm's use of gotpl: this choice exacerbates the problems with YAML. Now you're templating a structured language with a text templating library. You have to spend energy thinking about indentation levels and how the values will be interpreted by both the templater and k8s.
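A small sketch of the kind of juggling being described (the values key is hypothetical):

    # templates/deployment.yaml (excerpt; .Values.podAnnotations is hypothetical)
      template:
        metadata:
          annotations:
            {{- toYaml .Values.podAnnotations | nindent 8 }}
    # the "8" has to be kept in sync with the surrounding text indentation by
    # hand, because the templater only sees text, not YAML structure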

I think helm would be less frustrating if they chose some templating library that made objects first class citizens. Where you can inject values at specific locations with one-liners or simple blocks of code (e.g. `ingress.spec.rules[0].append(yadda yadda)`)

I'm sure there was debate about these choices early on and I don't have any unique ideas here, so I don't want to be too critical. These are just a couple of pain points I've personally experienced.


Technically YAML is a superset of JSON - all valid JSON is valid YAML. So you could write all your configs in JSON and they'd work just fine.


This discussion from a couple of weeks ago suggests it isn't that simple: not all valid JSON is always valid YAML if your definition of superset requires that JSON would be parsed the same with a YAML parser as with a JSON parser.

If you're going to use JSON for config, it's better to use an actual JSON parser.

https://john-millikin.com/json-is-not-a-yaml-subset https://news.ycombinator.com/item?id=31406473


Thanks! That's really interesting.

You could mitigate some of the issues and get JSONs "strictness as a feature" by passing the document through e.g. `jq . $file` as a CI step, but I don't think that'd resolve the 1e2 issue. TBH I didn't know you could write numbers in JSON like that, so I imagine it'd be an issue that doesn't come up often. But it's disappointing that it wouldn't just work.


> Technically YAML is a superset of JSON

That's false. http://p3rl.org/JSON::XS#JSON-and-YAML


I'd like to amend my statement:

Technically YAML is (supposed to be) a superset of JSON - (almost) all valid JSON is valid YAML. So you could write all your configs in JSON and they'd (probably) work just fine (assuming you keep things relatively simple).


Is that yaml not an array of strings?

["x","y","z"]


I don't happen to have a YAML parser at hand, but I believe it's...

["x", True, "z"]


It's not, y is treated as boolean true.


Is that a k8s thing? Because when I load it in this page[1] it comes out as a string...

[1] https://yaml-online-parser.appspot.com/?yaml=foo%3A+%0A-+x%0...


In YAML 1.2 they fixed it (and also the Norway-problem of "no" meaning false) but many places are still using an older version.
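For example, with a YAML 1.1 parser:

    countries:
      - se       # the string "se"
      - no       # boolean false, not the string "no"
      - "no"     # quoting forces the string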


For what it's worth, and if I'm not mistaken, there is some support for JSON, e.g. kubectl get deployment xyz -o json


JSON has a different set of problems, e.g. not having multi-line string support.


Similarly, we switched from self-managed k8s on EC2 to Fargate. Took about 2 months part-time with some consultants to get things squared away.

Once we deployed, we ran into all sorts of SRE-issues. Turns out AWS sets all those “sane limits” that our own folks never did. Still hunting ghosts from the rollout 6 months ago.

Makes for good resume fodder, and makes me laugh at the prestigious titles and positions the folks who built this system went on to receive at big name firms.

Guess it is someone else’s management problem now. :shrug:


Azure is generally the worst (their new container offering looks better though)

If you try digital ocean k8s offering it's fairly straightforward.

Google cloud was the first to offer a decent k8s as a service, if I recall correctly. We didn't have DevOps back in 2015 and we were on GCloud.

Personally I don't pick k8s just because it's heavy to run and I don't want to waste machines (plain docker is good enough for a large part of what people actually need). Sometimes in a project when I can't figure something out with just docker, I just bite the bullet and install k0s.


And you probably didn't even get to TLS and/or authenticated communication between containers, CI/CD, canary deployments, observability, monitoring, etc.

What we need is a Next.js for Kubernetes. Something that delivers a full stack solution on top of base Kubernetes.

The core system is great, but we need to replace these DevOps with a framework or platform.


> What we need is a Next.js for Kubernetes. Something that delivers a full stack solution on top of base Kubernetes.

Doesn't Rancher fit this description? It's pretty resource-heavy though.


Rancher seems to now be a "multi-cloud container management platform".

- Deploys kubernetes clusters
- includes Fleet for CI/CD
- install apps with helm
- ISTIO (awesome)

It feels like it is positioned for orgs with large needs. I'm looking for k8s for small nimble orgs.

I will look at Rancher more, thanks for reminding me of it!

Google's Anthos is also hard to describe now as they cover similar "everything" product features.

GCP now has Autopilot which lets you pay just for the cpu you use (no cluster management at all).

Anthos includes ISTIO which may at some point work on Autopilot. This would mean not having to fiddle with GKE Ingress (which I found unpleasant) and instead use the new standard Gateway.

I believe that eventually GKE Autopilot will offer running individual pods on GPU/TPU, pay as you go.

But when is this all as easy as using Next.js ?


I feel you. Took me 2 years of dabbling with k8s every once in a while on the side to finally "get" it. And if I stop looking for a few months, it has suddenly gained another set of features and deprecated some others.


Definitely a pain point I've seen in my past work as well. Even though spinning up a basic cluster has gotten easier with canned services like EKS, deploying and developing on the cluster is a major challenge for most developers without a higher-level framework.

My cofounders and I are working on a solution to this at https://www.jetpack.io/. If you're interested in early access, we'd love your feedback!


Container instances are pretty good really, and the app service for containers is pretty good too. I've been playing with k8s because that's what everyone thinks they want, and I need to be able to speak to it and use it when the time comes, but I've yet to run into a case where I really thought it was necessary for the platform I work on (millions of users, a big number of transactions per second).


Do you use ACI at that scale? It's interesting how Microsoft also promotes AKS rather than ACI. For Azure ML they even explicitly state that ACI should not be used for production purposes, which I find quite curious.


There are tools that let you convert a running docker/podman container to a Kubernetes deployment manifest, which you can then deploy to your cluster with a one-line command. Kubernetes is the new data center and I don't expect anyone with <10 years of enterprise experience to master it. Certainly one can use it and deploy things to it if you work at a place where there are mature pipelines, but deploying your own cluster and all associated services plus networking plus security is the domain of experienced/senior engineers. How quickly one achieves that "status" is up to the individual. It took me 15 years before I could build an enterprise network from scratch. It took me two years to understand how to do the same with K8s.


We have a team to manage our Azure k8s. Blue/green clusters, switching traffic between clusters to be able to upgrade k8s, etc

Definitely a lot of work.


Literally a one-liner for AWS EKS:

    eksctl create cluster --name mycluster --region us-west-1 --with-oidc --fargate --external-dns-access --asg-access --full-ecr-access --alb-ingress-access


That is a quick and short line.

Now the fun starts:"Kubernetes Failure Stories"

https://k8s.af/


I’ve made a comment below, but long story short we’ve moved to Kubernetes running on Fargate and we don’t have downtime anymore.

Sure, one can break anything, but our anecdotal experience is we’re now focused on actually delivering code rather than fretting about node failures.

https://news.ycombinator.com/item?id=31581372


You can't run all types of workloads on Fargate. At least not yet.


That’s fair, for example I couldn’t manage to run clustered Redis. Something with EFS file system that Fargate nodes use.


You mean like AWS' EKS? That spares you the kubeadm stuff in setup. Upgrades are arguably more complicated, because you're less familiar with what you have/need/rely on (their own upgrade docs point you to upstream changelogs/release notes etc.). You're still left with a Kubernetes cluster to deploy stuff to, to decide how stuff scales, etc., which is a lot of what is generally meant by 'DevOps' anyway?

The lower level infrastructure/platform/kubeadm type stuff isn't really 'Dev' related at all.


The reason they provide it is because everyone expects it. The reason everyone expects it is because it was "cool" and an ecosystem grew around it. I use AWS ECS and it's really, really good and easy to understand.


> 3 major clouds offer "canned" k8s services

...until you need to debug something somewhere in the enormous stack.


> Most people using Kubernetes should be using something simpler, but that's a separate problem.

This has been the biggest drain on my career. Everyone wants to be an "engineer" ready to handle every problem like its the next facebook. Like bruh, this service is going to get like 100 req/hour max and only when the sun is up - just **ing throw it on cloud run. We can tell the only thing you want to build is your resume.


It's a good point: tools made by giants for giants, such as Kubernetes, Bazel, etc, may not make sense for a smaller operation.

But what would you suggest in lieu of Kubernetes? What would save work for a shop which is not yet a giant but has already overgrown the capabilities of 2-3 manually managed boxes / instances?

I can think of several options. Management by Ansible / Saltstack / Chef can easily become a rabbit hole comparable to maintaining K8s's YAML. In your experience, does Nomad save SRE work at smaller scales? Does Terraform? CloudFormation?


If you only limit yourself to a subset of kubernetes's features, it's actually really great for small operations.

I run a small cluster for my side projects consisting of 3 nodes. One node is dedicated to running database containers and is also the control plane node (it's a worker node too, but strictly for database StatefulSets only), and the other 2 are worker nodes, one of which also runs an NFS server mounted as persistent volumes so every worker node can have access to it.

Self-healing? Internet-scale? I have no interest in them on my small cluster. I just want my apps to not go down while I'm updating them without writing complicated blue-green deployment scripts, the ability to move pods to other nodes when one gets overloaded, and the ability to add or remove nodes when needed without starting again from scratch. I basically treat it like docker-compose on steroids. So far it works really well.


A cloud service that can run containers directly, e.g. Amazon ACS, Google Cloud Run, etc.


> e.g. Amazon ACS

It's "Amazon ECS", and it works pretty well for standard fare CRUD web services, but more complex use cases quickly end up pulling in more and more AWS services (for example, if you need to run a cron job, you can use EventBridge to trigger an ECS task or just do it all with Lambda). This isn't dramatically worse--it's mostly just different. Kubernetes APIs are more consistent than in AWS, and Kubernetes tends to be more flexible and less limited than AWS. It's also much easier to extend Kubernetes or find a pre-existing extension than in AWS. But mostly it's not going to make or break your project one way or the other.

If you're just running CRUD web services, this is fine, but if you need to spin up a background job or do more complex orchestration, then it can quickly become advantageous to reach for Kubernetes instead.


We run Nomad with about 7 to 9 client instances in prod. Quite happy with it.


Anybody who watched "Kubernetes: The Documentary" knows the answer: https://youtu.be/BE77h7dmoQU

Kubernetes only exists, because Google lost the Cloud Wars, and this was their Hail Mary pass.


And I might cynically offer it was "invented" to solve the problem of ex-googlers not having any useful immediately transferable skills as the Google internal tech stack had nothing in common with industry.


In 2005 Google search got into a sticky spot, where they were completely unable to deploy new versions of the search engine from the master branch of the code for a while, because there were performance regressions that nobody could find. Deployment and related things like "deploy a copy of the production configuration to these spare 2000 machines" were manual slogs. I was the noob on the team doing such important yet unfulfilling tasks as "backport this Python performance testing script to Python 1.x because our only working test environment doesn't have the same version of Python as production does". This was before borg aka kubernetes, and let me tell you, a whole bunch of stuff was dysfunctional and broken.

All this is not to say that Kubernetes is the right solution to your problem, or to almost anyone's problem. It was just an improvement to Google's infrastructure, for the sort of problems that Google had. For some people it makes sense... for you it might not.


Borg is not K8S, K8S is not borg.

Borg is far more advanced in its scaling abilities.


That's not really been my experience. The number of people who knew how to deploy software at Google was much smaller than the number of people writing software there. I was certainly the only person on my team who ever dealt with the nitty gritty of running stuff in production. That seems about like the industry standard; I'd say the average software engineer in industry, inside or outside Google, considers their work done when their code is merged.

At my first job outside of Google we used something called Convox, which was very similar to running things in production at Google. You triggered a package build from your workstation (!) and then adjusted production to pick that up. Very similar to mpm packages and GCL files. (The difference is that Google had some machinery to say "while the build might have been triggered from a workstation, all this code has actually been checked in and reviewed". Convox did not have that part, so yeah, you could just edit some files and push them to production without checking them in, which wasn't great. But when you are a 4 person development team, not the end of the world by any means.)


But the internal tech stack doesn't have much in common with Kubernetes, either.


That's actually my point.

If Kubernetes does indeed provide the best solution to provide scalability and availability, one can argue that this would result in a decreased demand for dev ops engineers, as they "would just have to use Kubernetes" and it would decrease their workload.

In reality this does not seem to be the case, that's why I asked.


> If Kubernetes does indeed provide the best solution to provide scalability and availability, one can argue that this would result in a decreased demand for dev ops engineers, as they "would just have to use Kubernetes".

I'd say it would result in either:

- the same scalability and availability with fewer DevOps engineers
- better scalability and availability with a similar number or more DevOps engineers

In my experience, it's almost always the second case that happens. For example, a service would be moved from a few (virtualized or physical) servers that can only be scaled manually, to a k8s cluster with either autoscaling or at least scaling by changing a configuration file.
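For instance, the "scaling by changing a configuration file" case is typically a one-line change or a single command (names are hypothetical):

    # either bump "replicas: 5" in the Deployment manifest and re-apply it:
    kubectl apply -f deployment.yaml
    # or scale it imperatively:
    kubectl scale deployment/web --replicas=5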


Right. Most companies aren't content to settle for doing the same thing they were doing but with fewer engineers when they realize they could be using those other engineers to automate even more things (via Kubernetes operators).


It could have been designed better to be easier to set up and use. Especially for ad-hoc or more casual use. If it is easy to use casually, people will use it more and they will learn how to use it faster. It is also easier to see why you would invest in learning to create more complex configurations.

Pretty much every piece of software I've written in the past decade that tends to have configs in production can also work with little or no config. Including clustered software that just exploits things that are widely available like Zeroconf to get a decent low effort cluster up and running. No, you probably won't be able to (or want to) use those features in production, but that's beside the point. The point is to lower the thresholds. And then keep aiming at lowering them wherever you can because asking other people to care about your software is seriously uncool. Other people will never care as much as you do.

It is normal for programmers to become defensive about software. Be it their own or software they really like. But it is far more productive to assume that when users think something is awkward, perhaps it is because it is awkward. And perhaps it could have been done better.

Nobody actually gives a crap what someone thinks Kubernetes was built for -- and what kind of rubbish experience they think is deeply justified by the goals.

It is either needlessly awkward to use or it isn't. And guess what: most people think it is awkward. And I seriously doubt it needs to be this awkward.


It still comes back to right tool for the job. There's a perception in the market that k8s is the right tool for running all compute. As someone recently told me "I've been interviewing cloud people for 2 weeks and all I can find is people that want to run k8s all day."

k8s is not the right tool for every job. Most companies are not at the scale where they need to worry about the problems that it's trying to solve. But it's a cargo cult - they see the blog posts about how moving to k8s solves a bunch of problems that come up as you scale and decide they need to be solving all of those problems also even though there are simpler solutions at their current scale.

There's a bunch of other platforms out there that are way more opinionated and less "awkward" but they don't have the buzz that k8s has.


Actually, I don't see it as being that much about scale, but more about moving complexity out of applications and about solving robustness challenges. Kubernetes offers a lot of things that are useful even at smaller scales. Which is why I'm not sure I think "but it's for scale" is a valid excuse.

Kubernetes ought at least to be the right tool for a wider range of compute tasks, and I think it could have been.


When my system is running happily on a single VM with no real robustness issues then I don't really need to solve those robustness issues, and certainly not by bringing in a complex distributed platform to run it on. Or when my lack of robustness isn't costing me enough to make it worth it to spend the engineering cycles to adopt that platform. One way or another I have to be at a scale where it actually makes sense to make this investment vs just doing the simple/easy thing that's good enough for where I'm at.


There is a lot of ground between "runs on a single VM" to "runs on thousands of instances". In fact I would think most companies that deliver some online service fit into that category.

For instance, I don't regard one of our products that runs in three availability zones and has 3-4 instances per AZ as being "large scale". It is still a small system. And it doesn't run in multiple AZs for performance reasons but because we really need high availability.

We embedded discovery, failover and automatic cluster management in the server software itself. It isn't really how we'd like to do it, but it is still less of a hassle than running K8S. (It also means this still works when you license our software to run on-prem, on pretty much any runtime environment, and that has its value; but again, this isn't functionality you want, or should have to build, yourself.)


Agreed it's mostly undifferentiated heavy lifting still, and agreed it /should/ be easier. It previously took my team something like a year to get our infra all autoscaling - something I've found other teams aren't as willing to invest in if they're just running on a handful of instances.

At ~12 instances probably still in "pets aren't so bad" territory.


Most companies I've seen end up building some kind of framework around Kubernetes to make the developer experience tolerable. The threshold for getting started and getting a basic deployment up and running is way too high.

Of course, there's a huge cost to building your own framework as well... And it's easy to get wrong.

I started https://www.jetpack.io/ recently to try and build a better solution on top of Kubernetes. If you're interested in checking it out and giving us early feedback, you can signup on the website, or email us at `demo at jetpack.io`


Thanks, I will have a look!


My favorite part of this is that now K8S by itself isn't enough. People are laying "service mesh" and other things on top of it. And, yeah, I get the sales pitch for it, but at some point, it's too much.


The amount of work you put in is less than if you worked toward the same benefits sans Kubernetes, so it definitely saves a ton of work. But your latter point still stands: most of the time the benefits aren't as... beneficial at some stages of development/scale, so it's actually just adding work for the sake of it.


It does save work, consider what you would have to do to provide those benefits without a tool. You would have to monitor usage and spin up new instances manually for scalability. You would have to manually update instances when updates came out. You would have to manually handle failover and stand by instances for availability. Manually configure load balancers. etc.

Kubernetes is a great tool if you have scaling management problems.
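As one concrete illustration, the autoscaling piece that you'd otherwise script by hand is a short manifest in Kubernetes (a sketch; the target Deployment name is hypothetical):

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: web
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web
      minReplicas: 2
      maxReplicas: 10
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70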


That's an interesting second-order effect right? Before Kubernetes et al, it was so much work or so complex that few companies implemented those things. Kubernetes saves so much work, that now everyone is doing it and it feels like more work.

"Things" here mean different things for different people. For me, it's secrets/certs, better rolling deploys and infra as code.


Yeah, it is a premature optimization issue. Everyone is starting to implement scaling solutions before they have scaling problems.


The thread here full of people incorrecting each other about how to use Kubernetes seems pretty solid evidence for your thesis "it adds work".


> Kubernetes doesn't save work, it adds work.

It does save work, but only for certain heavy scenarios like High Availability, autoscaling, etc. I hope one day we can see a simplified Kubernetes that will fit 95% of SOHO users.


It also by design locks you into one of the cloud providers in a quite deceptive way. There is no singular interface for networking, so you cannot really port your infrastructure from cloud to bare metal, for instance. Bare metal support was also heavily underdeveloped, so setting it up was fragile, fiddly and quite limited.


This is exactly what the "automation destroys jobs" argument gets wrong. When aggregate output becomes cheaper, aggregate demand increases.


What is this "something simpler"?

I have to manage like half a dozen docker images. K8s seems like a massive overkill, but managing by hand is rather error-prone.


Just use AWS ECS Fargate - probably much easier to manage/setup than your own k8s cluster or even ECS k8s route. Azure/GCP may provide something similar. I think any Google product is always needlessly complex that only solves their own use case and probably many other companies need not blindly adopt those unless absolutely necessary.


> It was designed for scalability and availability more than anything.

Yet it's really not great at the first, and I'm strongly suspicious about the latter claim. It's very simple to make boneheaded decisions about networking that make things fragile.

In terms of scale, you have a limit of 5k nodes, and given how fucking chatty it is, it costs a lot to run loads of nodes.


Agreed. Kube makes the scaling out easier not the bootstrapping. An important difference.


I mean, obviously it was designed to reduce human work. A tool that adds work wouldn't be very useful, would it?


> A tool that adds work wouldn't be very useful, would it?

Sure it could, if it provides benefits that outweigh the additional work. Insurance companies will happily pay a couple more IT specialists if it reduces the number of cases they have to cover by an arbitrary percentage. Tools can very much serve other purposes than reducing the friction of human attention.

If you can containerize your infra and thereby mitigate threat vectors that may put your entire enterprise at risk if exploited, that's a good business call to make, even if the cost of your IT-department gets inflated by some margin.


A tool or technology that adds work is extremely useful.

More work means

- more employees needed

- more direct reports for managers

- more jobs for x technology

- more people learning x technology because of those jobs

- more StackOverflow questions for x

- more companies using x technology because it's "easy to hire for"

- more money to be made in teaching x

- more learning material for x


Seen before :-)

https://www-users.cs.york.ac.uk/susan/joke/cpp.htm

Warning to bashers...it's a fake interview...


I think the selling point here is that getting k8s' benefits using other tools would result in more work than using k8s.


It reduces work, but imposes a high setup and operating cost. Once you get to the right scale, all the economics look correct. Large teams and organizations can absorb that cost more easily than many who are attempting to use k8s. Of course, being open source, anyone can try k8s, but the mileage is going to vary.


The key question is what are you comparing to k8s in terms of complexity?

Does it add work compared to setting up a VM with Docker and Watchtower? - for sure..

But does it add work compared to setting up something that gives you all the same benefits of k8s without using k8s? - imho definitely not.


> all the same benefits without using Kubernetes

That's the catch: if running your app on a manually set up VM seems equivalent to running your app on Kubernetes, then you don't understand what Kubernetes is or provides.


Today, we see someone who believes that just because a tool's goal is to reduce complexity, that's what it accomplishes.


> Have you ever tried setting up a Kubernetes cluster and deploying apps in it? Kubernetes doesn't save work

This is wrong; deploying on Kubernetes is easy and quick for most apps: you have one Docker image and one Deployment spec, and that's it.

https://kubernetes.io/docs/concepts/workloads/controllers/de...
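
For what it's worth, a minimal sketch of what such a deployment spec looks like (the image name and port here are made up):

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: my-app
  spec:
    replicas: 2
    selector:
      matchLabels:
        app: my-app
    template:
      metadata:
        labels:
          app: my-app
      spec:
        containers:
          - name: my-app
            image: registry.example.com/my-app:1.0.0  # placeholder image
            ports:
              - containerPort: 8080

`kubectl apply -f deployment.yaml` and the cluster keeps two replicas of that image running.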


And a service configuration, ingress, other networking, persistent volumes, a mechanism for updating deployed applications, management of the nodes (even with a managed service like EKS or other cloud solutions), logging, roles, security, etc. If you're a developer and all you have to worry about is one deployment spec, thank your devops team for making your life easier. Kubernetes is great for making the dev team's life easier, but someone did a lot of work to make it that easy for you.
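
To illustrate that point: even the "easy" path usually needs at least a Service and an Ingress on top of the Deployment. A hedged sketch (hostname and names are placeholders):

  apiVersion: v1
  kind: Service
  metadata:
    name: my-app
  spec:
    selector:
      app: my-app        # must match the Deployment's pod labels
    ports:
      - port: 80
        targetPort: 8080
  ---
  apiVersion: networking.k8s.io/v1
  kind: Ingress
  metadata:
    name: my-app
  spec:
    rules:
      - host: my-app.example.com      # placeholder hostname
        http:
          paths:
            - path: /
              pathType: Prefix
              backend:
                service:
                  name: my-app
                  port:
                    number: 80

And that still assumes someone has already installed an ingress controller and sorted out DNS, TLS, and the rest.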


Couldn't agree with this more. At my last company a fair bit of time was put into making the deployment process for our microservices as simple as adding a new YAML file to the deployment repo. That file pulled in a custom chart, and as a dev you just needed to configure a few variables to get everything up and running. But if you were deploying something that couldn't use one of the pre-configured charts it was a bit more work, especially if you had never done it by hand before.

Probably 98% of the devs were blissfully unaware of the complexity that the charts abstracted away, and it let them focus on the services they were writing. I wasn't one of them, and I always made sure to thank the devops team for simplifying the day-to-day deployments whenever I had to deal with writing a custom one.


You can now do something similar with Bunnyshell.com

It handles the devops jobs for dev teams.

Full disclosure, I work for Bunnyshell.


In the absence of any description of what “something similar” or “handles the devops jobs” actually means, this comes across as spam, not informative.


Sorry about that, I should have been more informative.

Bunnyshell makes it easy to create and manage environments. (EaaS - environments as a service)

You connect your k8s cluster(s) and git accounts/repos, it reads the docker-compose files and creates deployments on the cluster.

You don’t need to know or write Kubernetes manifests, those are created for you.

You also get auto updates and ephemeral/preview environments (when a PR is created against the branch of your env, Bunnyshell deploys a new env with the proposed changes).

You are not restricted to creating resources only on the cluster, you can use Terraform for any resource that is external to the cluster ( like S3 buckets, RDS instances, anything Terraform can handle).

Hope this helps,


THIS. In my homelab, I spent roughly a day cutting services over from Docker-Compose to Kubernetes. That day included writing the Helm templates for everything, bootstrapping three bare metal nodes with a hypervisor (Proxmox), clustering said hypervisor for HA on its own isolated network, making images and then installing K3OS onto six VMs across the three nodes (3+3 control plane/worker), installing and configuring persistent distributed storage (Longhorn) with backups to a ZFS pool as an NFS target (Debian VM configured via Packer + Ansible), configuring MetalLB, and smoke testing everything.

A day's work for one person to accomplish that isn't bad, IMO, but what that doesn't capture is the literal weeks I spent poring over documentation, trying things, running tests, learning what didn't work (Rook + Ceph is a nightmare), and so on. I went so far the day before the cutover as to recreate my homelab in Digital Ocean and run through the entire installation process.

Having services that magically work is hard. Having a golden path so you can create a new one with a few clicks is even harder.


This is still easier with Kubernetes than with other tools. Installing e.g. fluentd and Kibana is again a configuration / chart which you apply to the cluster. For monitoring and visualization again you have an operator / chart that you can apply to the cluster. Yes there is a lot of complexity and learning involved but overall I was still pretty amazed at how quickly I was able to get a cluster up to speed with the available tools.


Still need to maintain the cluster though, and boy does it require some fun maintenance. Kubernetes is more than just a distributed docker image scheduler, you also need to install and maintain basic dependencies like the pod networking implementation (a whole virtual IP space!!), DNS, loadbalancing, persistent volumes, monitoring, etc etc. Maybe your cloud provider sets it all up for you, but you're not going to escape having to debug and fix issues with that stuff, whether they be show stopping bugs or performance-impacting inconveniences like higher latency.


  > the pod networking implementation (a whole virtual IP space!!)
That part, at least, can be made simple: https://john-millikin.com/stateless-kubernetes-overlay-netwo...

  > DNS, loadbalancing, persistent volumes, monitoring, etc etc
None of that is part of Kubernetes, and you'll need it (or not) regardless of how you choose to handle process scheduling.

There's a sort of common idea that a "Kubernetes cluster" is an entire self-contained PaaS, and that (for example) monitoring in Kubernetes is somehow fundamentally different from what came before. It's easy to fall into the trap of creating an internal clone of Heroku, but Kubernetes itself doesn't require you to do so and it can be a lot faster to just run Nagios (etc).


> None of that is part of Kubernetes

Well, load balancing is part of Kubernetes (except that it does not provide a sane implementation), and tbh having to debug strange failures caused by seemingly-innocent Service configuration is my least-favorite part of Kubernetes.
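
A classic example of the kind of seemingly-innocent mistake involved: a Service whose selector doesn't quite match the pod labels gets zero endpoints, so traffic simply goes nowhere and no error shows up anywhere (labels below are illustrative):

  # Pods are labeled "app: web", but the selector says "app: webapp",
  # so this Service silently has no endpoints behind it.
  apiVersion: v1
  kind: Service
  metadata:
    name: web
  spec:
    selector:
      app: webapp
    ports:
      - port: 80
        targetPort: 8080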

I agree with you on other points.


DNS too


You'd weep if you saw an actual deployment yaml. What the documentation provides is something minimal, and even that is very long for something minimal.

Also, self-healing can create interesting problems which are fun to trace, debug and understand.


"Before you begin, make sure your Kubernetes cluster is up and running."


"And if it's not, please roll-back your systems and install it from scratch."


I suspect you are getting downvoted because of your presumptuous and dismissive statement. Applying k8s to the type of deployment you are suggesting is overkill, as there is minimal to no orchestration involved.


Also because they are just outright missing the point of the post they were replying to. Yes, deploying things on kubernetes is pretty simple. Deploying kubernetes itself is definitely not very simple.


People are confused; you most likely should not operate k8s yourself if you don't know it or don't have a team for it. Nowadays managed k8s is easy to use and operate: with 3 clicks you get a cluster up and running with sane defaults.


Not to sound trite, but I feel like your continued pressing of this issue evokes the whole "Do you even lift, bro?" meme. There are PaaS offerings that can get you up in 3 clicks, but now you are in bed with k8s, which is critical infrastructure. I think it's quite necessary to understand the ins and outs of such a vital component of my infra.


Either you’re so good with k8s and are clueless about how rare that expertise is, or more likely, you’re so clueless of how complex and fickle k8s can be if configured wrong but you’re not even aware that you don’t know.

I’ve seen actually decent engineers (maybe they’re not decent?) bring down prod because they accidentally Kubectl deploy’ed from their command line.


Because many, many companies herd pets using kubernetes.

The number of single-server setups with kubernetes thrown in for added complexity and buzzwords I’ve found is way too dang high.


Single-server kubernetes on a managed platform like DigitalOcean is a lot like having a managed server, but you're more flexible about separating your services. I no longer run imagemagick by shelling out of my Ruby process in Rails; I use an external service that does just that, which I can scale very easily. I can add an image for Chrome with chromedriver instead of trying to build that into my Dockerfile.


> "If this old way of doing things is so error-prone, and it's easier to use declarative solutions like Kubernetes, why does the solution seem to need sooo much work that the role of DevOps seems to dominate IT related job boards? Shouldn't Kubernetes reduce the workload and need less men power?"

Because we're living in the stone age of DevOps. Feedback cycles take ages, languages are untyped and error-prone, pipelines cannot be tested locally, and the field is evolving as rapidly as FE javascript did for many years. Also, I have a suspicion that the mindset of the average DevOps person has some resistance to actually using code instead of yaml monstrosities.

There is light at the end of the tunnel though:

- Pulumi (Terraform but with Code)

- dagger.io (modern CI/CD pipelines)

Or maybe the future is something like ReplIt, where you don't have to care about any of that stuff (AWS Lambdas suck btw).


I agree with this 100%. We're in the infancy of DevOps.

Ironically, "DevOps" started as a philosophy that developers should be able to do operations, e.g. deploy, monitor their apps without relying on external people (previously called Sys Admins, etc). Yet, we're at a stage where the "DevOps" role has become the most prevalent one. IMO things have temporarily gotten slightly worse to get much better later.

From the productivity standpoint, it is not acceptable that a Machine Learning engineer or a Full Stack Developer are expected to know Kubernetes. Or that they need to interact with a Kubernetes person/team. It is an obstacle for them to produce value.

Kubernetes is not THE solution. It's just an intermediate step. IMO, in the long run there'll be very few people actually working with technologies like Kubernetes. They'll be building other, simpler tooling on top of it, to be used by developers.

You already named a few examples. I can name a few more:

  - Railway.app
  - Vercel
  - Render
  - fly.io
  - probably many more under way


> From the productivity standpoint, it is not acceptable that a Machine Learning engineer or a Full Stack Developer are expected to know Kubernetes. Or that they need to interact with a Kubernetes person/team.

I agree - these things should be abstracted from the developer - thats the goal of SRE/platform engineering - DevOps is [supposed to be] as you said, a philosophical and cultural stance around early productionization. While not mutually exclusive, they're not the same thing.

But back to your point re: orchestration-level concerns being foisted upon devs - at a shop of any size, there will be devs who feel they _need_ to touch kubernetes to get their job done (wrongly, IMHO) as well as devs who want nothing to do with it - so without engineering leadership throwing their support heavily behind a specific approach, it's hard for a small team to deliver value.


dagger.io: "Developed in the open by the creators of Docker"

Hard pass.


Kubernetes can really help bringing more scalability.

All you need is to rewrite your application (think microservices), reduce cold latency (get rid of anything VM-based such as plain Java, or rewrite with Spring Native or Quarkus), use asynchronous RPC, and decouple compute and storage.

Then you need an elastic platform, for instance Kubernetes, with all the glue around such as Istio, and Prometheus, and Fluentd, and Grafana, Jaeger, Harbor, Jenkins, maybe Vault and Spinnaker.

Then you can finally have your production finely elastic, which 90% of companies do not need. Microservices are less performant, costlier, and harder to develop than n-tier applications and monoliths, and way harder to debug. They're just better at handling surges and scaling fast.

If what you want is:

- automated, predictable deployments

- stateless, declarative workloads

- something easy to scale

Then Docker Compose and Terraform is all you need.
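
A hedged sketch of what that simpler path can look like: a single docker-compose.yml on a VM that Terraform provisioned (image names and credentials are placeholders):

  version: "3.8"
  services:
    web:
      image: registry.example.com/web:1.4.2   # placeholder image
      ports:
        - "80:8080"
      restart: unless-stopped                 # basic self-healing via the Docker daemon
      depends_on:
        - db
    db:
      image: postgres:14
      volumes:
        - db-data:/var/lib/postgresql/data
      environment:
        POSTGRES_PASSWORD: change-me          # use real secrets handling in production
  volumes:
    db-data: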

If you also need orchestration and containers are your goal, then first try Docker Swarm. If you need to orchestrate various loads and containers are a means and not a goal, then try Nomad.

Finally, if you will need most resources Kubernetes has to offer (kubectl api-resources), then yes, opt for it. Few companies actually have a need for the whole package, yet they have to support its full operational cost.

Most companies just pile up layers, then add yet a few more (Java VMs on top of containers on top of an orchestrator on top of x86 VMs on top of (...)), and barely notice the miserable efficiency of the whole stack. But hey, it's using Kubernetes, so it's now "modernized".


It honestly sounds like you've confused Kubernetes with some other platform.

Running a big Java monolith with 128GiB RAM footprint in Kubernetes works well. It's at its best when deployments are infrequent and traffic patterns are stable (or at least predictable).

If someone wants stateless microservices with immediate scale-up/scale-down, then that's more like a FaaS ("functions as a service") and they'll be better off with OpenFaaS instead of Kubernetes.


The true happy path: you have a big monolith hungry for resources, several instances of the monolith, and you want to be able to relatively quickly spin up new instances (modulo provisioning new nodes) without having to jump through 100 hoops.

For all people talk about autoscaling and whatnot, just hitting some buttons/sending a couple commands manually and getting some temporary scaling for reason X (or a temporary container for reason Y) without messing with a bunch of admin consoles is very nice.


You can do this type of analysis for most software, since building on existing solutions allows us to write powerful tools with less code. Listing out the existing solutions that allow developers to write less code doesn't necessarily mean the new solution is bad.

Every time I read a post like this about Kubernetes, I scratch my head. It takes me maybe half a day to deploy a CI/CD pipeline pushing into a new Kubernetes cluster with persistent DB's, configuration management, auto-renewing SSL certs and autoscaling API/web servers per environment. I'm by no means an expert, but I've been running 10+ sites this way for various clients over the past five years, with almost zero headache and downtime.

When I compare this solution to the mishmash of previous technologies I used prior to Kubernetes, it clearly comes out on top (and I use/d Terraform religiously). Setting up automatic server provisioning, rolling updates, rollbacks, auto-scaling, continuous deployment, SSL, load balancing, configuration management, etc... requires an incredible amount of work. Kubernetes either provides most of these out of the box, or makes them trivial to implement.

The only way I understand this argument is if you're building an extremely simple application. The nice thing about simple applications is that you can build them using any technology you want, because they're simple. Despite this, I often use Kubernetes anyway, because it's _so simple_ to take a Helm chart and update the image name.
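
For readers who haven't seen it, that "update the image name" step is often just a small values override like the following; the exact keys depend entirely on the chart, so treat this as a sketch:

  # values.yaml override for a typical chart layout
  replicaCount: 2
  image:
    repository: registry.example.com/simple-app   # placeholder
    tag: "2.3.1"
  ingress:
    enabled: true
    hosts:
      - host: simple-app.example.com               # placeholder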


What is this? An argument for actual decisions based on merit? That's not the cargo cult I signed up for!


> reduce cold latency

You don't even really need to do this, as you can tell k8s how to check whether the pod is healthy, what a reasonable timeframe for becoming healthy is, etc. I've got some services which can take up to 10 seconds before they're actually ready to serve workloads, and k8s can scale those services up and down without too much issue. It's definitely nice to reduce cold latency, but I wouldn't say you need to do it.
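
For context, the "tell k8s how to check" part is just a couple of probes on the container spec; a sketch (paths, ports, and timings are made up) for a service that needs roughly 10 seconds to warm up:

  # fragment of a pod/deployment spec
  containers:
    - name: slow-starter
      image: registry.example.com/slow-starter:1.0   # placeholder
      readinessProbe:            # don't route traffic until this passes
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 5
      livenessProbe:             # restart the container if this keeps failing
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 30
        periodSeconds: 10
        failureThreshold: 3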


There's no problem with running a Java monolith in k8s. Elasticity isn't the only benefit of using k8s. And you should probably be running Jenkins and Prometheus even if your whole infra was launched with `docker run`.


> If what you want is... automated, predictable deployments... then Docker Compose and Terraform is all you need.

Can you elaborate on the deployment story here a bit?


From my experience, Kubernetes drastically reduces the number of DevOps people required. My current place has a team of 5, compared to a similarly sized, vmware-centric place I worked at a decade ago with a team of 14.

But DevOps means many things because it's not clearly defined, which also makes it difficult to hire for. It's a "jack-of-all-trades" role that people somehow fell into and decided to do instead of more traditional software engineering.

Also, from what I've experienced from our internship program, CS programs are really bad at covering these fundamentals. Students aren't learning such basics as version control, ci/cd, cloud platforms, linux, etc.


> From my experience, Kubernetes drastically reduces the number of DevOps people required. My current place has a team of 5, compared to a similarly sized, vmware-centric place I worked at a decade ago with a team of 14.

Is that K8s, or is it that you've outsourced hosting and stateful storage to cloud services?

I suspect that things are much easier to automate effectively, and the knowledge for automating things is much more common.


> Also, from what I've experienced from our internship program, CS programs are really bad at covering these fundamentals. Students aren't learning such basics as version control, ci/cd, cloud platforms, linux, etc.

That's because those things are not part of Computer Science, or so I'm told. I got a degree in Software Engineering and regret nothing.


> CS programs are really bad at covering these fundamentals. Students aren't learning such basics as version control, ci/cd, cloud platforms, linux, etc.

Good. Those aren't 'fundamentals' of 'CS'.


Someone put it nicely when they said Kubernetes is like an operating system for containers. If you take Linux as an analogy, it's clearly a non-trivial investment to learn Linux well enough to be effective and efficient in it, and perhaps further time is needed to reach the productivity, functionality and performance you were used to on Mac or Windows.

Kubernetes definitely achieves this goal well, and in a relatively portable way. But just like any other engineering decision, you should evaluate the trade-offs of learning a completely new OS just to get a simple web site up, versus running an nginx instance with a bunch of cgi scripts.


Kubernetes is super linux.


DevOps is a philosophy, not a job role. It's the idea that developers deploy and operate their own code. An SRE is often someone who helps make that happen, by building the tools necessary for developers to operate their own code.

In a small organization, you can get away with a sysadmin running a Kubernetes cluster to enable that. In a larger org you'll need SREs as well as Operations Engineers to build and maintain the tools you need to enable the engineers.


This is an underrated comment right here. I think the entire industry is confused -- but they could have just read this comment.


Kubernetes is raw material, like concrete and lumber. It needs to be massaged/crafted/assembled into something that fits the use case. A 'devops' engineer would leverage Kube to build a system, the same way a builder/contractor would leverage raw materials, subcontractors, off the shelf components, etc to build a home or office.


By far this is the best explanation, thank you.

Just like there are a plethora of programming stacks, there is a ton of choices for implementing a software supply chain. Kubernetes is valuable to infrastructure engineers, who use it to create these systems in a maintainable and reliable way.


A few reasons:

- Kubernetes is very complex to setup

- It is not needed for many use cases

- It is (hopefully) not the de facto standard for devops

- Load balancing was a solved problem way before Kubernetes. For many use cases, you don't need the complexity. Even things like self-healing are kinda solved by AWS Auto Scaling, for example.

- Not every use case needs Kubernetes and its additional overhead/complexity

- Most importantly, devops is not a "one size fits all" problem that Kubernetes or any other magic-wand tool can solve. There are various nuances to consider, and hence you need DevOps as a role.


K8S is not easy.

  It helps standardize:
    - deployments of containers
    - health checks
    - cron jobs
    - load balancing
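
As one small illustration of that standardization, a scheduled job looks roughly the same in any cluster; a hedged sketch with a made-up image:

  apiVersion: batch/v1
  kind: CronJob
  metadata:
    name: nightly-report
  spec:
    schedule: "0 3 * * *"        # every night at 03:00
    jobTemplate:
      spec:
        template:
          spec:
            restartPolicy: OnFailure
            containers:
              - name: report
                image: registry.example.com/report-job:1.0   # placeholder
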
What is the "old way" of doing things?

Is it the same/similar across teams within and outside your organization?

If not, what would it cost to build consensus and shared understanding?

How would you build this consensus outside your organization?

For small organizations, one should do whatever makes them productive.

However, as soon as you need to standardize across teams and projects, you can either build your own standards and tooling or use something like K8S.

  Once you have K8S, the extensibility feature kicks in to address issues such as:
   - Encrypted comms between pods
   - Rotating short lived certificates
I don't love K8S.

However, if not K8S then, what alternative should we consider to build consensus and a shared understanding?


As one person with kubernetes, I can build and operate quite a big platform, more securely and better than ever before.

Our current platform is much more stable, has more features, and is bigger than what the previous team built.

There are plenty of things you can't see like security or backup or scalability.

Backups were done on an app-by-app basis. Now you can do snapshots in k8s.

Security still is a mess. But now you can at least isolate stuff.

Scalability meant installing your application x times manually and configuring a load balancer, etc. Now you set it up per cluster.

Additional features you get with k8s: autoscaling, high availability, health checks, self-healing, standardization.
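
As an example of how little glue the autoscaling part needs once you're in the cluster, a sketch of a HorizontalPodAutoscaler (the target name and thresholds are made up):

  apiVersion: autoscaling/v2
  kind: HorizontalPodAutoscaler
  metadata:
    name: my-app
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: my-app
    minReplicas: 2
    maxReplicas: 10
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70    # add replicas above ~70% average CPU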

A lot of things got invented that led to k8s, like containers or yaml.

Now with the operator pattern you can also replace admin work and embed operational knowledge into code.
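
The shape of that is usually a custom resource that an operator reconciles; a purely hypothetical example (the kind and fields depend entirely on whichever operator you install):

  # Hypothetical custom resource; an operator watching this kind would
  # create the StatefulSets, backups, and failover handling behind it.
  apiVersion: example.com/v1
  kind: PostgresCluster
  metadata:
    name: orders-db
  spec:
    replicas: 3
    version: "14"
    backup:
      schedule: "0 2 * * *"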

Infrastructure has never before been ready to be controlled by code like this.


It sure does reduce the amount of work. But there's a lot that remains or just gets shifted to another area/technology. Who's setting up the image build pipeline? Who's handling the scaling and capacity planning? Who's planning how the deployments actually happen? Who's setting up the system for monitoring everything? And tonnes of other things...

Kubernetes helps with: networking setup, consistent deployments, task distribution. That's about it. It's more standardised than plain VMs, but you still have to deal with the other 90% of work.


It depends. In some scenarios it might actually increase the amount of work compared with other technologies. K8s introduces way too much complexity, especially for small organizations. On top of K8s, there are helm charts adding more layers. And each layer has potential foot guns that even experienced developers cannot easily get right on the first try.


Because software engineers can’t help but make things more complicated than they need to be.


The easiest answer to your post is that you are looking at evidence which doesn't necessarily mean what you think it means.

If k8s is as amazing and time saving as you would imagine, you'd expect many companies to want to adopt it, so you'd expect there to be lots of job postings!

It's like saying "if computers are such time savers, why do so many companies hire people that have knowledge in computers". It's because this is a good tool that companies want to hire people with knowledge in that tool!


Simple: because before, the company had 2-3 beefy servers running some binaries and they handled all the load without problems.

Now, because of new possibilities and new developments, they want to switch to Kubernetes to get those new possibilities everyone is talking about, and now you have to build many new containers, configure k8s, autoscaling, etc... and the developers don't know it (yet) and don't have time to learn it.

So let's hire a DevOps (me) that will do it ;-)


There's a shift away from programs that run on actual computers to software that runs on clusters, a large conceptual computer. Whether Kubernetes-the-software is ultimately the answer or not I don't know, but I don't think we're going back to installing packages on individual machines; the benefits of the conceptual large computer are too great, and it's the most logical way to solve challenges with scale and availability.

A lot of what is called DevOps goes into adapting software to this new mindset. A lot of that software is not written with best practices in mind, and likewise lots of tools are still in their infancy and have rough edges. I think it's fair to say some time and resources go into learning new ways of doing things, and it might not be the best choice for everybody at this stage to spend those resources unless there's an obvious need.


As far as I can tell from doing years of contracting for non-FANG mid to large size companies -- they basically do their best to copy whatever trends they see coming out of FANG. There is no thought behind it.

Kubernetes and 'DEVOPS' are the new hotness at non-FANG companies as they are always 5-10 years behind the trends. Expect to see more of it before it goes out of fashion.

Also, DevOps is just a title. Nobody read the book, and nobody is trying to create what the original guy at Google or wherever had in mind. It is just an all-encompassing job doing the same activities that the sysadmin used to do. HR tells companies that they should rename their sysadmin departments to DevOps departments and everything else continues as normal.


Starting from "if Kubernetes is the solution," you aren't going to be able to get to the answer, because:

1. Kubernetes isn't the solution

2. Kubernetes is expensive and extremely maintenance prone

3. Most of the companies I've seen switch to Kube I've seen switch away afterwards

Every time I've seen someone bring up Kubernetes as a solution, everyone at the table with first hand experience has immediately said no, loudly

Remember, there was a time at which someone wouldn't have been laughed out of the room for suggesting Meteor stack, and right now people are taking GraphQL seriously

Kube doesn't make sense until you have hundreds of servers, and devops makes sense at server #2


Like cloud APIs, K8s replaces a ton of by-hand work that formerly made up the profession of systems administration (among others). I see my career progression from sysadmin to devops as a pretty natural development of "automate your (old) job away". So today, with cloud and k8s, I can be as productive as ten of my old selves fifteen years ago. Back then, it would have been almost unimaginable for a company like the one I'm with to be able to thrive and grow during its first 18-24 months with only a single "IT" staff member who can still maintain a great work/life balance.

TL; DR - K8s and cloud let me do the work of ten of my old selves.


Kubernetes and Angular are how Google burns resources to prevent any significant competition or innovation being created outside of their sphere of influence. Engineers who wank at sophistication usually receive the most attention and decisive power; btw, that's what keeps me from participating in interviews. Another reason is that IT engineers are such low plankton that someone up the hierarchy, with a massive indirect financial bonus for using this technology, decides to hire "Angular Developers" or "Kubernetes DevOps consultants", and this cascades down to HR and recruiters who simply filter by keywords.


This is like asking: if writing code in a high level language is the solution, then why are there so many software engineering jobs?


DevOps is not only k8s. Think of DevOps as managing the whole infrastructure: setting up CI/CD pipelines, implementing infrastructure-as-code, managing ML pipelines, implementing security policies... DevOps is very wide.


A few thoughts...

1) Kubernetes is an infra platform for the ops in DevOps. If developers need to spend a lot of time doing Kubernetes it takes away from their ability/time to do their dev. So, there are a lot of platform teams who pull together tools to simplify the experience for devs or DevOps specialists who handle operating the workloads.

2) Kubernetes is, as Kelsey Hightower puts it, a platform to build platforms. You need DevOps/SREs to do that.

3) Kubernetes is hard. The API is huge and it's complex. The docs are limited.


I've said it before, and I'll say it again: Kubernetes is hard, huge, complex because it solves hard, huge, complex problems. Or tries to anyway.


The corollary of course is that if you don't currently have hard, huge, complex scalability problems, well, you do now...


Yup. If the org in every aspect is not ready for scale, stick to simpler solutions.


  > 3) Kubernetes is hard. The API is huge and it's complex.
  > The docs are limited.
Eh, I wouldn't go that far. Kubernetes has a lot of API surface for features that are useless (config maps), attractive nuisances (CRDs, packet routing), or outright dangerous (secrets). If you strip it down to its core mission of distributed process scheduling, the API is reasonable and the docs are extensive.

The biggest challenge with learning Kubernetes is that third-party documentation is largely funded by ecosystem startups flogging their weird product. It can be very difficult to figure out things like network overlays when there's so few examples that don't involve third-party tools.


I'm not sure how config maps are "useless". It seems to be a pretty important element of the platform in general.
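
For anyone following along, this is the feature in question; a minimal sketch of a ConfigMap (keys and values are illustrative):

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: app-config
  data:
    LOG_LEVEL: info
    FEATURE_X: "true"

A pod can then pull the whole map in as environment variables via `envFrom`/`configMapRef`, or mount it as files in a volume.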


I don't think there are "so many devops" jobs. Everywhere I've worked in the last 15 years, the number of people managing everything from hardware up to developer tools and CI/CD was tiny compared with the number of developers. Some start-ups don't bother with those people at all to begin with and regret it later. Then they hire a tiny team after years of neglecting these areas, and then expect the "wings on the plane to be swapped out during flight".

Those devops people are casually expected to be experts in process (incident/problem management), cloud infra / infra as code, db config / replication, networking, security (IAM, SSO, network, OS), release / deployment, monitoring / metrics / alerting / tracing (not just deploying them but working with devs to implement observability in their code), dev tooling (code/artifact repos, every brand of CI/CD pipelines & runners)... basically anything that isn't software development. They're also expected to be on call for other teams in many companies.

Many years ago I worked at a large, old tech company where all of these areas had dedicated teams.

PS: how many people at your company "really" know Kubernetes inside out? And if it misbehaves, who do you expect to have the answer?


I think part of the problem in tech generally is that many developers don't understand they're being marketed to. I really have no opinion on whether K8s is a smart choice that will save you time and effort, requires more effort but provides benefits, or is a bad trade-off. But the crazy push for k8s as the one-size-fits-all solution for everything that you get from some corners of the web smells like a hype cycle.


There's an argument to be made in general (which I don't think applies here - but playing devil's advocate) that says:

When technology makes previously difficult things easy, this technology will be used to do more things that were previously impossible.

I personally haven't seen anyone use k8s to achieve things that were impossible before. They just use it because 1) They think everyone is using it 2) They don't know how to do it any other way


Container adoption by workload is still pretty low. Most workloads are difficult to containerize because they're commercial off-the-shelf software (COTS), Windows-based, etc. You need good people who can make the best of these situations and automate what they can with configuration management, image bakeries, CI/CD pipelines, and infra as code, and who can reverse engineer or bend that legacy app to run in a container.


For the past 250 years, new machines have replaced old ways of working to make us more productive. But each new machine is more complicated than the last, requiring more technical jobs to address the new complexity in the machine. Kubernetes is just a new machine, and because so many businesses now want to run that machine, they need more maintenance crews that are trained on said machine. In order to have fewer people, we'd need to leverage economies of scale and specialization so that companies don't need these new big complex machines. Even then, the appearance of fewer workers might just be jobs moved offshore where labor is cheaper.

It's true that moving away from mutable state is a sea change that is (very) slowly engulfing the industry. But the tide is still very far out. The cloud is a pile of mutable state, leading to the same system instability and maintenance headaches that existed before the cloud, requiring new skills and workers to deal with. Redesigning the cloud to be immutable will take decades of re-engineering by vendors, and even when it's done, we'll still need economies of scale to reduce headcount.


I find the meaning of DevOps confusing.

Originally I thought it was a methodology assisted by a set of tools to make it easier for devs and ops (sys admins) to work together, mostly by giving both sides the same environment, usually in the form of docker containers.

Admins configure servers, or server pools since physical machines tend to be abstracted these days, set up applications, including the CI/CD stuff, make sure everything is secure, up to date and in working order, etc... Devs write the code, and test it on the platform supplied by the admins.

And now I see all these DevOps jobs. I mean, what does it mean? You are dev or you are ops, so what does DevOps mean? It doesn't mean both; DevOps people usually don't write software, so DevOps is ops. Maybe we could call them container administrators, like we have database administrators.

I think that the confusion between the DevOps methodology and the DevOps job title gives the wrong idea. Someone needs to make all these servers work; calling it serverless just means there is another abstraction layer, and while abstraction has some benefits, it rarely lessens the workload. It may change job titles though, here sysadmin -> DevOps.


Because everyone and their dog insists on using way over complicated micro service architectures.

Wages do not count as cost of goods sold, so who cares about a couple of extra hires? Funding is easy.

Also you severely underestimate the amount of work that goes under DevOps. Everything from build servers to test and security infrastructure is usually handled by DevOps. It's a massive surface area and it would be way worse without kubernetes.


Well, if you did DevOps the way it is meant to be done, the idea is that the developers do the minimal extra effort of the ops part, and you no longer need that role itself. So, depending on what a company means by DevOps, it could mean developers willing to do that bit extra, and clearly we need ever more developers; or they could understand it as just a modern kind of ops, in which case they are NOT doing DevOps. Kubernetes has its complexities, but it certainly doesn't require more people than previous methods. What it does require is people with new skills, and lots of companies want to move away from their old setups to this new one and therefore need to fill those roles, while people are only beginning to re-educate. Add to that that more companies are doing more IT in more kinds of business segments, and that in many countries the bigger generations are retiring with fewer people coming after. DevOps, true or not, is hardly the only field with a lack of educated people. You will find the same in lots of engineering fields.


I think that K8s _could_ make the process of maintaining CI/build/deployment tasks much more hands-off and automated. But in practice we often use this new "power" as an opportunity to invent more fancy/smart things to do, taking advantage of it to the point where it requires as much if not more maintenance than whatever came before.


The perception of a "need for so much dev ops nowadays" stems, in my view, from a skills mismatch in the market.

Division of labor and specialization are two natural results of any rapidly evolving industry. The tech industry is no exception. The problem is that there are currently comparatively quite a lot of ways to learn the skills necessary to become a competent web developer, backend engineer, data scientist, etc. Compared to these titles, the ways to learn the skills involved in designing, operating, and maintaining scalable cloud infrastructure have not kept up with market demand.

Kubernetes is not "the" solution, but it is one of several solutions to the problem of standardizing a trade skill for the purposes of making it transferrable. Nobody wants to go to work at a place where the skills they need to do their job well are both difficult to acquire and completely useless once they get a new job.


> If Kubernetes is the solution...

Yeah, you lost me already. This is a bit like asking why there are other languages besides Java.


Let me answer your question with another question: if Microsoft Word is so good, why are there so many people whose job it is to produce documents?

Now, I’m not saying k8s is as transformative as the word processor, but if it was, you’d probably expect to see more ops people, not fewer. They’d just be doing different things.


Kubernetes is, amongst other things, a technical solution to the billing problem. Composable resources with unit costs permit organizations to charge profitably for such resources. There are also some advantages for organizations who need to purchase such resources, giving some clarity to the process.


It is a people problem. In a typical enterprise with multiple teams involved in every small thing, you can expect to see teams having their own agendas and justifying their place/time. Kubernetes is an elegant platform to do just that, because it is so extensible.

As part of my job, I see thousands of small and medium businesses happily embracing kubernetes. They do not have separate devops or security or infra folks. A few engineers do it all. Yes, there are challenges with so many moving parts and rapid releases with breaking APIs. But expect that to be fixed in coming years.

Why so many DevOps jobs? Are you noticing the jobs being eliminated, or are you just seeing more DevOps jobs being created?


If you're referring to the days when systems engineers would run named fleets, then yeah, those days were nightmarish, but for different reasons. Instead of debugging the Linux networking stack myself, I had to defer everything to a team that ran a help desk and hope they'd find time to investigate, because I was locked out of access and tooling. I don't miss those days one bit, though it was fun to name components of my fleet.

A lot of companies moved to the cloud because their old data centers (or hosts) were driven by ticket systems instead of APIs and access management. What used to take three weeks was now solved in seconds; the beginning was fascinating without a doubt. Then companies realized they owned none of this virtualized infrastructure and were at the behest of a very large corporation that could make sweeping changes with little to no notice. Kubernetes was pitched as providing extra grease to the gears of the enterprise, and that's not wrong, but it is not its total value to the enterprise.

The real value in Kubernetes is running your own platform on someone else's hardware, especially to the degree where you can eventually free yourself from the cloud-provider lock-in that the above incurred. As an example, if a company can spin up a team to create a database-as-a-service on its Kubernetes clusters, then its RDS costs can shift dramatically down, and it develops a new level of capability and understanding that the company never had before.

I'm an SRE-SE, but I mostly use the title "Distributed Systems Software Engineer" because I feel that really fits what I do. DevOps is just a catch-all title for non-application-software tasks and roles at this point, because it has consumed so many things like "release manager", "QA", "application operations", etc... Personally, I do not trust companies or teams that use this as some sort of distinguishable title.

To answer the last part of your question, "Why are DevOps everywhere?": because companies have diverse needs in terms of supporting software and software development, and DevOps is basically the catch-all email address of software engineering.


My glib response is that automation is a lot of work.


> one could argue that the role of sys admins just got more specialized

Read the introduction to the SRE book, available free online [1], and you'll see that SRE is defined _in contrast to_ systems administration. It's specifically defined as software engineering with the goal of managing operational complexity.

Modern shops' failure to understand this (most SREs haven't read any of the book, let alone stopped to think what SRE actually means) is IMHO a primary factor in the failure of most "devops transformations"

[1] https://sre.google/sre-book/part-I-introduction/


Kubernetes isn't a solution. It just pushes the problem down the line. The fundamental problem is that most software developed these days isn't packaged properly. The devs just ship the code to production, and the hacks they used to get development going stay in production.

For example, you don't need a docker container to deploy MySQL, although you can deploy MySQL inside one. But most development processes are so badly managed that one product has many conflicting libraries and dependencies, eventually requiring each component to be isolated within its own container, and finally leading to an unmanageable number of containers requiring Kubernetes to manage the mess.


From what I've seen, most developers either aren't systems thinkers or are too busy to take a step back and spot and eliminate redundancy. The best way I can explain this is that many software processes and pipelines within companies are usually complex Directed Acyclic Graphs and very often Transitive Reduction [1] is not applied to these processes.

At the end of transitive reduction, you end up with a graph with all redundant dependencies removed but functionally, it is still the same graph.

[1] https://en.wikipedia.org/wiki/Transitive_reduction


Because it solves 5 problems while creating 4 more in the most complicated way possible. Seriously. It makes some things better, no question, but it is very complicated for some (most) people and is very time-consuming.


I think it comes down to Jevons paradox: if your demand is unbounded, making production more efficient doesn't reduce inputs, it makes you produce more. In the case of k8s: more reliable, more scalable, etc.


To put it simply, anything that increases efficiency most likely increases the desire to scale. There is a ton of demand because everyone wants to scale to billions of users. Kubernetes is one way to get there until the next thing comes along.

Also, there is more demand than supply. Everyone wants to do Kubernetes and DevOps pipelines, but the number of folks experienced in those fields is small compared with demand.

It requires knowledge in many domains because it abstracts the entire data center. So you can’t just take a mid level sysadmin or developer and expect them to jump right in.


I can't speak for everyone, but Kubernetes lets my company do more with the same amount of manpower, so we use that manpower to do more stuff rather than reduce our SRE/sysadmin footprint.


Kubernetes is a bad approximation to the infrastructure solution at a place that has different problems than you have. It only complicates and makes everything worse and more expensive to maintain.


Something else to consider: what % of server workloads actually run on kubernetes?

I have no data to back this up, but my hypothesis is that if you zoom out, and look across the entire industry, the % is vanishingly small. It may seem like every company is running or adopting kubernetes within our bubbles but our perspective is biased.

(Note: I'm not espousing an opinion on kubernetes itself, just about its total adoption across the entire industry and how that affects the number of devops/sysadmin/SRE roles.)


My short hot take on DevOps and infrastructure as code: "Infrastructure as code has it backwards"

-------

Take the development of programming as an analogy:

* Punched cards

* Programming in assembler

* Goto

* Callable procedures

* Proper functions

* Compiled languages (There used to be companies just selling a big C compiler)

* Interpreters/JIT compilation/...

* ...

-------

And here's a similar progression:

* Servers in your basement

* Some server rented where you login via SSH

* Docker/Kubernetes/Clusters in the cloud

* Lambdas and other serverless solutions

* ...

As a sibling comment pointed out: We're still in the stone ages. Somewhere between punch cards and proper functions.

-------

To rephrase it in reversal: "Infrastructure as code has it backwards"

Right now, we manually partition our code, then provision infrastructure and then push our code to this infrastructure.

Instead, we should take a high level language that compiles to the cloud:

Just write your business logic and the compiler figures out what clusters/services/event-buses/databases/etc to use; it will automatically partition the code, package, build, provision, push, and update. And there's even room for something like JIT: based on the load that parts of your logic get, the compiler could switch databases. Also: automated data migrations based on your code updates. But I guess we'll end up with a big distributed virtual machine that scales infinitely and completely hides the existence of servers.

There's already some glimpses of this future: No-code, the magic pulumi does with lambdas, several language projects that get rid of the file system and just store the AST in a DB, smart contracts where you pay for single computation steps...

-------

But back to the question: Kubernetes/AWS/etc is a lot of work because it's not really THE SOLUTION.


I believe a real glimpse of the future you're describing can be seen with AWS CDK's L2 or higher constructs: https://github.com/aws/aws-cdk

We already have the tools to have a cloud-scale application dynamically redefining and re-allocating its infrastructure. Give those tools some time to mature, and I'm sure the capabilities will be awesome.

A good example is the common database backend. A lot of groups have some RDS instance provisioned some way to somehow serve some frontend. Over time the number of users grows and the RDS needs more resources provisioned for it. In my last 4 jobs, it was always someone's job to check the provisioning, or respond to an alert, and bump the provisioning up to whatever the next rung was. In the not-too-distant future, this type of task will be handled by a mildly intelligent CDK application.


That sounds a lot like the original Google App Engine. They'd provide a lot of power, but you had to restrict yourself to its design choices to use it. Then it could 100% manage your deployment.

I don't see people wanting to give up control to get a platform like that any time soon.


The problems weren’t simplified. The problems were collected together into a single large platform.

However, as with most large platforms, they require ceremonies and priests (devops engineers). Someone has to make the offerings.

Much as people would like to believe, you don’t reduce complexity, you just shuffle it around and there’s an exchange rate. Even with solutions like Fly.io, you’re not getting rid of complexity in aggregate, you’re paying them to manage it (I.e. the exchange rate).


The DevOps jobs are to configure and maintain Kubernetes. The problem is K8S is a general-purpose solution that's being used for managing an application comprised of container images. It's way overcomplicated for that task, but that's already been noted here in this thread. I know of proprietary solutions that greatly simplify DevOps compared to K8S, but they're proprietary.


Because they're also doing more things. 20 years ago you might have a single server in the broom closet handled by the sysadmin and developers running tests locally (if you were lucky), nowadays we want all those things you mentioned for production, and CI/CD for developing.

I'd wager providing all those things 20 years ago without k8s and CI tools would've required relatively more sysadmins


Kubernetes solves a subset of your usual deployment problems and replaces it with a set of its own. I'd call it a tradeoff, but it's such a leaky abstraction that unless your Kubernetes fu is really strong it's mostly going to make your life harder. It's a nice keyword to have in your CV though. Most jobs that "require" it don't actually use it.


The answer to your question is this: _complexity does not go away, we just move complexity to another layer_.

Kubernetes buys the org some things, but it is complex and you have to know how to write the app in a certain _way_ in order for the app to be "scalable, etc.".

There are no free meals or as someone smarter than me said long time ago "there is no royal road to learning".


When compared to a Linode VPS box that you provision and setup with Ansible yes, it's much more work (and much more cryptic at that) but also Kubernetes covers for a lot of failure scenarios that a simple Linux box would not be able to cope with while adding many other benefits.

The question is: do _you_ need this added complexity? That humble VPS can scale _a lot_ too.


DevOps' days are numbered. What you see is the wide adoption of DevOps in every industry. This trend is likely to plateau and decline in the next couple of years, after which DevOps practices will be taken for granted and become an expectation from customers. The DevOps problem only needs to be solved once, and public cloud providers are almost there.


Companies that say they’re doing DevOps are like countries with ‘Democratic’ in their name.


If you use Kubernetes, you need custom operators and controllers in order to have a feature-rich environment that can support your applications and all your CI/CD instrumentation.

Designing, implementing and maintaining all these extra elements is why you need a devops guy. Not to mention how extremely fast things are moving in the cloud era.


Indeed, a lot of things in k8s are deprecated every 6 months, and in fact your super solid, future-proof stack is completely useless after a year. It's like building on quicksand.


Clouds have multiple conflicts of interest in favor of:

1. Dethroning sysadmins by introducing devops in the middle ("devs" capable of deploying to the cloud but unable to control the OS).

2. Increasing CPU and other resource consumption (promoting heavy frameworks that are unable to pass the Doherty threshold in 2022).

For clouds, increasing complexity and costs almost always expands the business.


The cloud providers haven't closed the DevOps loop yet, that's why. I mostly have experience with Google's stuff, so I will take Cloud Build as an example. It provides the framework of CI/CD, but there isn't automatic build+deploy for every software and framework ecosystem.

What I'm trying to do at work is simplify the build ecosystem for all languages to the familiar `configure ; make ; make test ; make install` sequence that works well for OSS. If every ecosystem fit into that metaphor, then the loop could be closed pretty effectively by any cloud provider: let users add repositories to the CI/CD framework and it would do, e.g., a standard docker build (configure, make, test), a docker push (make install, part 1), and a kubectl rollout to k8s at the end (the remainder of make install).
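
A hedged sketch of what that closed loop can look like as a Cloud Build config; the image, cluster, and deployment names are placeholders:

  steps:
    # "configure; make; make test" happens inside the Dockerfile
    - name: 'gcr.io/cloud-builders/docker'
      args: ['build', '-t', 'gcr.io/$PROJECT_ID/my-app:$SHORT_SHA', '.']
    # "make install", part 1: push the image
    - name: 'gcr.io/cloud-builders/docker'
      args: ['push', 'gcr.io/$PROJECT_ID/my-app:$SHORT_SHA']
    # "make install", part 2: roll the new image out to the cluster
    - name: 'gcr.io/cloud-builders/kubectl'
      args: ['set', 'image', 'deployment/my-app', 'my-app=gcr.io/$PROJECT_ID/my-app:$SHORT_SHA']
      env:
        - 'CLOUDSDK_COMPUTE_ZONE=us-central1-a'
        - 'CLOUDSDK_CONTAINER_CLUSTER=my-cluster'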

Blockers:

Liveness and readiness checks are not automatic; they need to be part of each language/framework combination so that developers don't have to implement them by hand. At Google they just sort of came with the default HTTPServer class in your language of choice, with callbacks or promises you knew you had to invoke/complete when the instance was ready to serve. It helped that only 4 languages were officially supported.

Integration tests have no standard format, many deployments are combinations of artifacts built from multiple repositories, and configuration is not standardized (configmaps or ENVs? Both? An external source of feature flags?), so all integration tests are manual work.

Metrics and SLOs are manual work; only humans can decide what actual properties of the system are meaningful to measure for the overall health of the system beyond simply readiness/liveness checks. Without key metrics automatic rollouts are fragile. This also means autoscaling isn't truly automatic; you need quality load metrics to scale properly. Not all services are CPU or RAM limited, and sometimes the limit varies depending on traffic.

All that said, cloud functions (Google, AWS, or other versions) are beyond DevOps. If you don't need high-QPS services then use cloud functions. They bypass 90% of the headaches of having code running on https endpoints. Most people don't have high-QPS (10K requests per second per shard) services, and could probably get away with cloud functions (1000 RPS on GCP). Everyone else pays the DevOps or hopefully the SRE tax for now. But we're still trying to automate ourselves out of a job; we don't want to be doing DevOps day-to-day either.


Kubernetes just solves the long-standing problems at a certain level of infra.

But like everything else in tech, solution of the problems at a given level enables everyone to do much more and build more complex systems that do more complicated stuff on top of that level. We always push the frontier forward with every new solution.


In short - Kubernetes solves a lot of very complex problems. The problems are complex enough that the solutions are also complex and require specialized knowledge to implement well. Most teams using Kubernetes probably shouldn't be, but tech companies like to over-optimize for future scale.


This is the same thing when people say going to the cloud is not easier.

It’s not.

You still need Devops staff.

Cloud just provisions the hardware and OS. You still have to be responsible for the apps. You still have to be responsible for IO, memory, cpu and networking capacity.

You still need to make sure your apps are able to run on cloud - whether metal or k8s.


Kubernetes is not the solution to completely automating ops so that you don’t have to employ anyone

The solution to that is PaaS (Platform as a Service), and you can start a startup with almost no devops knowledge using things like Heroku and its myriad competitors, from startups to AWS offerings.


Kubernetes and co do not reduce the amount of work - it's just shifted to the next abstraction layer. Back when DevOps meant "you build it, you run it", we removed the dedicated ops teams to which code was thrown over the fence, to reduce the animosity and friction between dev and ops. This was great, but now all the hip companies have dedicated ops teams again, only now they are called "platform teams". Instead of code artefacts, it is now containers that are thrown over the fence, and the ops part has become so complex that separating dev and ops again seems reasonable.

Luckily for me, I managed to keep the good old DevOps way of working, developing code and running it on bare metal servers with FreeBSD and jails - even converting an existing Kubernetes setup back to bare metal. In my opinion the platformisation of the internet infrastructure isn't a desirable state, and neither are monocultures, and for the vast majority of projects kubernetes is overkill, as they won't ever reach the scale that would justify a kubernetes setup. It's like the mileage fear for EV cars - but I guess everyone wants to hit facebook or google scale, and that desire misinforms the early infrastructure architecture. That is just my 40-year-old greybeard view, which you can happily ignore whilst flying amongst the clouds :)


It’s the recognition on the part of companies that cloud providers don’t provide a turnkey solution.


Because K8s is the very end of a long road, and even when that is done and setup, cloud eng work, shifts to CI/CD, data eng, significant networking maintenance, and IAM/account wrangling will keep the devops'ers employed. SRE is a golden goose job IMO


> The platform of choice is mostly Kubernetes these days

Is it? These days I see SAM or Serverless Framework or other FaaS solutions all around me and it seems that everyone is migrating away from ECS/EKS/containers, it might be my own particular bubble though.


You're looking at only infrastructure costs, but not at benefits. Being able to autonomously deploy an application in production increases your team's velocity by orders of magnitude, e.g. faster time-to-market, faster feedback loops, etc.


DevOps is basically the "tool smith" from The Mythical Man-Month's surgical team model. Any sufficiently large (>10) team of engineers will benefit immensely from a specialist focused on improving internal developer efficiency.


Even though Kubernetes could reduce the workload and might require less manpower in some cases, it's still a beast that requires management. So DevOps has shifted from managing traditional infrastructure to managing Kubernetes configurations.


DevOps is supposed to be part of the software development team, NOT a separate department. That’s the difference between SysAdmins and DevOps. It’s in the name! Developers (on a team that run) Operations (of the team’s products). DevOps.


I think it was originally pushed as a way to get more people to use cloud platforms. And who better than Google to host that which they created?

Luckily it's from the functionally-less-evil Google days and is open source, so it is possible to use it anywhere.


Kubernetes is just a heavily overengineered and overmarketed thing. Let’s face the truth.


It's like saying that you won't need human workers because you'll have robots doing the work. Aha, sure, but who is going to program those robots?...



Because it primarily wasn't built for developers. It was built to keep sys admins relevant and give vendors a common place to sell their vaporware.


While I don't know who Kubernetes was built for, it's certainly developers who are pushing for Kubernetes.

The majority of SREs and sys admins I know don't really want to run Kubernetes. They'll do it, if that's what's called for, but it just adds complexity to trivial problems.

Developers want Kubernetes because it's quick to deploy containers, networks, load balancers, anything they want. That's fair; traditional hosting doesn't have much that provides that level of flexibility or that will allow you to get off the ground at that speed.

The issue is that, as a platform, Kubernetes isn't that easy to manage. It certainly has improved, but it can be difficult to debug when it breaks. Nobody wants a 3AM page that a Kubernetes cluster is down. We've even seen companies say that they'll just stand up a new cluster if something breaks, because it's faster. Add to that the complexity of the components deployed inside your cluster. As an SRE it absolutely sucks to get the "hey, the network's broken" from a developer, because that means that you now have to be an expert on WHATEVER load balancer or CNI they've decided to pull in.

As great as much of the stuff you can deploy in Kubernetes is, it's still harder to debug than an application running on a VM or bare metal server, with a "real" network.


  > Developers want Kubernetes because it's quick to deploy containers,
  > networks, load balancers, anything they want.
  > [...]
  >
  > As an SRE it absolutely sucks to get the "hey, the network's broken"
  > from a developer, because that means that you now have to be an
  > expert on WHATEVER load balancer or CNI they've decided to pull in.
This is why the "DevOps" approach exists.

If a developer wants root in the cluster so they can deploy some strange low-level plugin, then they need to be the ones carrying the pager.


I keep wondering when the systemd folks will come up with an orchestration layer over systemd-nspawn/systemd-machined to replace Kubernetes.


I'd love to see a movement where more engineers write in-house tooling to solve technical problems, rather than adopting existing, heavily promoted tools.


I've seen a general blurring of the lines between these roles. But a common theme is that if you have a dedicated "role" for something, they will prefer tools which cater to their "role". This is both a good thing for companies who benefit from further optimization within that "role", and a bad thing for companies who do not.

Kubernetes is a powerful tool for "DevOps" roles. It provides an immense array of configuration, and largely replaces many OpenStack, Xen, or VMware-type environments. You can build powerful internal workflows on it to colocate many services on one compute fleet while maintaining developer velocity, which can translate into large margin improvements for some firms. This comes at the cost that you are likely to need a Kubernetes team, and potentially a dev tooling team, to make it all work. In a large compute environment, the latter costs don't affect the big picture.

Now on the other hand, more teams than you would expect are just fine on Heroku/AppEngine/AppRunner/Lambda. These teams tend to pay the cost of not having a dedicated dev tooling team through more expensive compute, and sub-optimal tooling. The benefit here though is that "more expensive compute" may mean a fraction of a salary in many environments, and "sub-optimal" tooling may mean a production grade SaaS offering that has a few rough edges you talk to the vendor about.

IME it's much cheaper/lower risk to choose the latter in the long run. The apparent savings from option 1 eventually turn into tech debt as the shiny tools get old, and migrating to newer/cheaper compute options becomes more expensive. I once built a colo facility which resulted in a 4x reduction in monthly recurring expenses (including salaries) for the same compute footprint; 1 year into the lifetime of the facility, the former cloud provider reduced prices by ~30%. Around 6 months into the facility the DataScience team suffered attrition, resulting in fewer compute needs. At the 1.5 year mark the team begged for a flip to SSDs as they were having latency issues (a point of initial contention with the team, which felt SSDs should have been used in the first place). Over the 3 year expected lifespan of the facility there were about ~2.5 months of ramp up/migration work which impacted ROI.

Overall, in hindsight, I'd say at best we achieved a 1.5x reduction in compute expenses compared to the alternative of tooling improvements, cloud cost reductions, and compute optimization. I now seek the tool which provides the lowest friction abstraction as at the worst case I can simply migrate to something cheaper - investing in compute infra has a crazy level of depreciation.


Here's my thought on the current state of the industry. DevOps at some point was not a specialty that you hired for, it was a way of thinking about your team's responsibility. Your team would make an application and your team would run that in production. If you wanted to test things before deploying, you would do that. If you wanted automated deploys, you would set that up. No middleman with competing concerns between you and your users.

Eventually, people had a hard time finding well-rounded individuals that could design, develop, test, and deploy software. It seems to be a rare skillset, and people are resigned to not being able to hire for that kind of role. So, all of these ancillary concerns got split off into separate teams. You have a design team, a software engineering team, a test engineering team, operations, and so on. DevOps changed from "developers that operate their software" to "developer operations", which is just your 1990s operations team with a new name. You the developer want something, it goes on a backlog for some other team, you wait 6-8 years, you get your thing.

All the complexity of the devops world comes from having one team writing the software and one team running the software. Service meshes are an example: they are super popular right now, and everyone and their mother is writing one and selling it for tens of thousands of dollars per year. To the software engineer, having two applications communicate over TLS is pretty simple; you read the certificates and keys from disk or an environment variable, throw them into a tls.Config, and give that tls.Config to your internal servers and internal clients.

But what happens in the real world is that the organization says something like "all apps must use mTLS by January 2023". The software team says "meh, we don't care, we'll get to it when we get to it". So the poor devops team is stuck figuring out some way to make it work. The end result is a Kubernetes admission controller that injects sidecars into every deployment, which provision TLS keys from a central server at application startup time. The sidecars then adjust iptables rules so that all outgoing connections from the original application go through the proxy, and if some distributed policy says that the connection is supposed to be mTLS, the proxy makes that happen.

Basically, because nobody on the dev team was willing to spend 15 minutes learning how to make this all work, it got bolted on by $100k worth of consultants, all for a worse result than just typing a very small number of lines of code yourself. That's the state of devops: the people writing the software won't run it, so you have to add complexity to get what the organization wants.
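
To make the tls.Config point concrete, a rough Go sketch of doing mTLS in-process might look like the following (the file paths and port are placeholders, not a description of any real setup):

  // A rough sketch of mutual TLS wired up directly in the application,
  // instead of via an injected sidecar proxy. Paths are placeholders.
  package main

  import (
      "crypto/tls"
      "crypto/x509"
      "log"
      "net/http"
      "os"
  )

  func mustTLSConfig() *tls.Config {
      // The service's own certificate and key, e.g. issued by an internal CA.
      cert, err := tls.LoadX509KeyPair("/etc/certs/service.crt", "/etc/certs/service.key")
      if err != nil {
          log.Fatal(err)
      }
      // CA bundle used to verify peers: both the servers we dial and the clients that dial us.
      caPEM, err := os.ReadFile("/etc/certs/ca.crt")
      if err != nil {
          log.Fatal(err)
      }
      pool := x509.NewCertPool()
      pool.AppendCertsFromPEM(caPEM)

      return &tls.Config{
          Certificates: []tls.Certificate{cert},
          RootCAs:      pool,                           // verify servers we connect to
          ClientCAs:    pool,                           // verify clients connecting to us
          ClientAuth:   tls.RequireAndVerifyClientCert, // this is what makes it mutual
      }
  }

  func main() {
      cfg := mustTLSConfig()

      // Internal client: reuses the same config for outgoing calls.
      client := &http.Client{Transport: &http.Transport{TLSClientConfig: cfg}}
      _ = client

      // Internal server: requires and verifies client certificates.
      srv := &http.Server{Addr: ":8443", TLSConfig: cfg}
      log.Fatal(srv.ListenAndServeTLS("", "")) // cert and key already live in TLSConfig
  }

Certificate issuance and rotation still need some tooling, of course, but that is arguably a much smaller surface than proxying every request through an injected sidecar.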

I think it's terrible, but that's the fundamental disconnect. When you need to change how software works without being able to edit the code, the workarounds get increasingly complicated.

As always, what looks like a software complexity problem is actually an organizational complexity problem. The way I've managed this in the past is to organize a lot of knowledge sharing, and make a conscious effort to avoid hiring too many specialists. At my current job my team used to make a SaaS product, and our team was a combination of backend software engineers, frontend software engineers, and some engineers with experience with operations. We were one team; frontend engineers would write Go code, backend engineers would make React changes, and we all did operational drills ("game days") once a week. The result was a very well-rounded team. Everyone could deploy to production. Everyone could be on call. Everyone could fix problems outside of their official area of expertise. I wouldn't have it any other way. The industry, however, deeply disagrees with that approach. So you're going to have testing teams, devops teams, etc.


Why aren’t we working 3-day weeks when we have all this automation power today? Like all things in life, the bar just gets raised.


If x is the solution, why are there so many <x related> jobs?

Economics says that as something becomes cheaper, demand increases.


Because coding in Yamllang and Jsonlang is superior to old and archaic languages like Rust and Golang.


Wait a minute, but Rust and Golang are new!


"To Kubernetes! The cause of - and solution to - all of life's problems!"


Scale: we are using so much more IT infra now. Old sysadmin ways don’t scale well.


Yes, it is the solution for keeping DevOps employed and happily compensated.


So, I am old enough that when I started my career I was just a "system administrator" who happened (rather luckily) to work primarily with BSD and Linux servers. At that time, I was still learning a lot. I eventually learned enough and gained enough experience to become a "systems engineer", which meant that I could architect solutions for customers of my employer. I then became a senior systems engineer. Throughout this entire time, things like Chef, Puppet, Ansible, and Salt were not widely used, even after they were created. Red Hat pushed Ansible really, really hard once it came out, and config management became a thing.

The combination of config management systems with containers created two new roles: DevOps and SRE. Servers became VMs, which in turn became container platforms. Config managers took the place of version control and a bash script. CI/CD became weirder.

In times past, you would have something like HAProxy on FreeBSD, which would then send traffic to Apache/Nginx servers, which in turn sent traffic to PHP servers, which pulled data from database servers and an NFS cluster. Now, behind the scenes, you may still have HAProxy or other load balancers, but those are combined with something similar to OpenStack with an underlying storage system like Ceph. All of that may get partnered with geo-aware DNS if you're really fancy. Systems engineers and admins are still managing that stuff behind the scenes at Azure, AWS, Google, Rackspace, Cloudflare, DigitalOcean, and other places (or at least I imagine so). There are also engineers who specialize in OpenStack. Most, however, have transitioned to the new roles of DevOps or SRE, because the need for highly skilled SEs and SAs has waned.

Essentially, these roles have narrowed the focus of system administrators and systems engineers. In one, you are concerned with CI/CD; in the other, you are building and maintaining cloudy solutions for people. This is yet another layer of abstraction, but it also means that most people no longer know how to configure the underlying software. Because they lack that knowledge, they rely on automation frameworks; they no longer know how to automate their workflows with Bash, Ruby, Python, or anything else. They need the cloud system to do it for them, which means they get very vendor-locked.

EDIT: the plus side of a new abstraction layer is cheaper tech departments at non-tech companies (fewer and cheaper personnel), which also means that pretty much everyone wants to be a software developer now and very few people want to be SAs, SEs, DOEs, or SREs; you have to know every bit as much, but you get paid much less.

All of this may bust. Increasingly, people are becoming wary of monopolistic tech giants. The toll their datacenters take on the planet is increasing rapidly. The governments of the world are growing wary of their increasing power. For businesses, complete reliance on a third party that has vastly more power isn't as palatable as it used to be. We may see a resurgence of smaller DCs and bare metal deployment, but any such change would only happen if another massive tech bust occurs. The reality I see is that both models may live in tandem indefinitely, as there are differing use cases that make each more suitable.


Because Kubernetes automates away 4 jobs by creating the need for 5.


BTW, Kubernetes is just a scheduler: you give Kubernetes a definition and it will schedule things according to that definition. Everything else is basically just an add-on.
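
As a rough illustration of "give Kubernetes a definition", a sketch using client-go might look like this (the deployment name, image, replica count, and kubeconfig path are placeholders): you hand the API server a desired state, and the scheduler works out where to run it.

  // A sketch of handing Kubernetes a definition via client-go.
  // Everything here is illustrative; error handling is minimal.
  package main

  import (
      "context"
      "log"

      appsv1 "k8s.io/api/apps/v1"
      corev1 "k8s.io/api/core/v1"
      metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
      "k8s.io/client-go/kubernetes"
      "k8s.io/client-go/tools/clientcmd"
  )

  func int32Ptr(i int32) *int32 { return &i }

  func main() {
      // Build a client from a kubeconfig (path is a placeholder).
      cfg, err := clientcmd.BuildConfigFromFlags("", "/home/me/.kube/config")
      if err != nil {
          log.Fatal(err)
      }
      clientset, err := kubernetes.NewForConfig(cfg)
      if err != nil {
          log.Fatal(err)
      }

      // The "definition": three replicas of an nginx container.
      dep := &appsv1.Deployment{
          ObjectMeta: metav1.ObjectMeta{Name: "hello"},
          Spec: appsv1.DeploymentSpec{
              Replicas: int32Ptr(3),
              Selector: &metav1.LabelSelector{MatchLabels: map[string]string{"app": "hello"}},
              Template: corev1.PodTemplateSpec{
                  ObjectMeta: metav1.ObjectMeta{Labels: map[string]string{"app": "hello"}},
                  Spec: corev1.PodSpec{
                      Containers: []corev1.Container{{Name: "hello", Image: "nginx:1.25"}},
                  },
              },
          },
      }

      // Hand it to the API server; the scheduler decides where the pods land.
      _, err = clientset.AppsV1().Deployments("default").Create(context.TODO(), dep, metav1.CreateOptions{})
      if err != nil {
          log.Fatal(err)
      }
  }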


Kubernetes is insanely complex and modular. Just yesterday I was looking at the source code, and the part of the code I knew had been replaced by yet another pluggable system. Instead of consolidating into a well-understood set of features, Kubernetes is exploding with complexity, so it's almost impossible to "build it yourself" for a production environment.

However, there are plenty of companies that will sell you a system, including varying levels of support. You then, of course, have to hire your own DevOps engineers that will deal with the areas the support doesn't cover, which, given the complexity, is still an awful lot. Or you do everything in-house, which means hiring even more people.

TL;DR DevOps engineers won't be out of the job anytime soon. Same for Kubernetes developers.


Check out jetpack.io; they are trying to solve exactly that.


Because kubernetes is both the problem and the solution.


because it's an overengineered hype that does not reduce complexity, only shovels it around, turning simple problems into obscure ones


It isn't the solution :)


Tl;dr: Kubernetes is not "the platform of choice". There is no universal tool. That's why you need system architects, DevOps, etc.


Instead of 3-4 DevOps guys, you now only need 1-2 really good Kubernetes guys.


DevOps isn't a job. DevOps is a system to work with people directly and find out what they need and give them things that enable them to get their job done faster, while also getting enough information to make sure the product stays online and reliable. What people call "a DevOps role" today is just sysadmin or sysop or syseng or SRE.

Back in the day we cobbled together solutions out of different parts because it gave us a strategic advantage over monolithic commercial solutions. It was cheaper, but it was also easy to customize and fit to product & user needs. Yes, configuration management was a nightmare, and it came back from the dead as Terraform, because instead of an OS with mutable state we now have a Cloud with mutable state. Docker and Packer and a few other solutions have fixed a lot of the mutable state issues, but databases are still flawed and SaaS is still just a mucky mess of unversioned mutable state and nonstandard, uncomposable, poorly documented APIs.

With Kubernetes, we're back in the land of commercial monolithic products. Technically you can build it yourself and customize it all, but it's expensive and time consuming and difficult because of how many components there are tied together. It "gives you everything you need" the way the International Space Station does. Do you need a space station, or a barn?

People get so wrapped up in terminology. Declarative doesn't mean anything other than "not procedural"; it's not better than procedural, it's just different. Plenty of declarative things are a tire fire. Infrastructure as Code just means "there is some code that is committed to git that can manage my servers". A shell script calling AWS CLI is IaC. Doesn't make it a good solution.

You can't just install a piece of software and be done. That's the entire point of the DevOps movement, really. It's not about what you use, it's all about how you use it. Work with humans to figure out what will work for your specific situation. Use your brain. Don't just install some software because it's trendy and hope it will fix all your problems.


>DevOps isn't a job.

And Agile isn't Scrum, but once again a buzzword became the catalyst for a change that the buzzword isn't even "supposed to" represent.

It's our fault for never learning our lesson about buzzwords.



