Ask HN: If Kubernetes is the solution, why are there so many DevOps jobs?
437 points by picozeta on June 1, 2022 | 416 comments
Arguably, the goals of DevOps align partly with the goals of system administrators in former days: provide reliable compute infrastructure for

  1) internal users: mainly developers by providing CI/CD
  2) external users: end users
Nowadays we call people that do 1) DevOps and people that do 2) SREs (so one could argue that the role of sys admins just got more specialized).

The platform of choice is mostly Kubernetes these days, which promises among other things stuff like

  - load balancing
  - self-healing
  - rollbacks/rollouts
  - config management
Before the cloud days, this stuff was implemented using a conglomerate of different software and shell scripts, run on dedicated "pet" servers.

In particular, a main criticism is "state" and the possibility of changing that state by e.g. messing with config files via SSH, which makes running and maintaining these servers more error-prone.

However, my main question is:

"If this old way of doing things is so error-prone, and it's easier to use declarative solutions like Kubernetes, why does the solution seem to need sooo much work that the role of DevOps seems to dominate IT related job boards? Shouldn't Kubernetes reduce the workload and need less men power?"

Don't get me wrong, the old way does indeed look messy; I am just wondering why there is a need for so much DevOps nowadays ...

Thanks for your answers.




  >   1) internal users: mainly developers by providing CI/CD
  >   2) external users: end users
  >
  > Nowadays we call people that do 1) DevOps and people that do
  > 2) SREs (so one could argue that the role of sys admins just
  > got more specialized).
Both are called sysadmins.

SRE is a specialized software engineering role -- you'd hire SREs if you wanted to create something like Kubernetes in-house, or do extensive customization of an existing solution. If you hire an SRE to do sysadmin work, they'll be bored and you'll be drastically overpaying.

DevOps is the idea that there shouldn't be separate "dev" and "ops" organizations, but instead that operational load of running in-house software should be borne primarily by the developers of that software. DevOps can be considered in the same category as Scrum or Agile, a way of organizing the distribution and prioritization of tasks between members of an engineering org.

---

With this in mind, the question could be reframed as: if projects such as Kubernetes are changing the nature of sysadmin work, why has that caused more sysadmin jobs to exist?

I think a general answer is that it's reduced the cost associated with running distributed software, so there are more niches where hiring someone to babysit a few hundred VMs is profitable compared to a team of mainframe operators.


> so there are more niches where hiring someone to babysit a few hundred VMs is profitable

This makes a lot of sense. The same thing happened in the past with new technology, such as the electronic spreadsheet:

"since 1980, right around the time the electronic spreadsheet came out, 400,000 bookkeeping and accounting clerk jobs have gone away. But 600,000 accounting jobs have been added."

Episode 606: Spreadsheets!, May 17, 2017, Planet Money


In 1980 - there were 90M employees in the US. Now there's 151M.

Given that the US has transitioned out of manufacturing and into business services - I don't think much of this is explained by technology creating new jobs.

I think it's just explained by the workforce growing - and the shift in the US's role in the global economy.


The workforce grows as more goods and services are produced.

So the real question is - whether the ratio of different roles changed vs total population.


Basically, "why technological innovation creates and transforms jobs, instead of causing unemployment".


...except that 400k in 1980 translates to 608k in 2017, if you factor in overall labor force size. That means even though there were 600k jobs "created", it's still a net loss.

[1] https://fred.stlouisfed.org/series/CLF16OV


I think the point of the comment in discussion is:

"Though the spreadsheet was supposed to make clerks obsolete, it in fact just upgraded their job requirements to allow the operation of spreadsheets"

And probably their salaries too.

This is more in line with the discussion related to OP as well: DevOps employees exist, even though the work has apparently been automated.


This is a weak argument, since it ignores the productivity gains in all the industries that necessitated more accountants. Accountants don't only do each other's taxes, so more productivity among accountants is an indicator of a more productive economy.


ATMs caused a decrease in teller transactions, but led to an explosion of bank branches.


SREs don't normally write Kubernetes alternatives. They are the people who operate Kubernetes, write automation that interacts with it, and advise teams on how to run their software on it to solve business problems like ensuring availability.


> but instead that operational load of running in-house software should be borne primarily by the developers of that software

Go back and read a few DevOps books and blogs by the founders of it. We will always need separate disciplines for dev and ops, just like we need mechanical engineers and mechanics/racecar drivers. But we need them to work together and communicate better to solve problems better and not throw dead cats over walls.

You can of course give devs more powerful tools, more access, and more agency to enable them to develop the software better. Sentry.io is a great example of what is needed; it makes everyone's life easier, and devs can diagnose issues and fix bugs quickly without anyone in their way. That doesn't require operations work because it's just simplifying and speeding up the triage, debug, fix, and test phases. That's the fundamental point of DevOps.


My second job in large-scale software was at Google, which used the "DevOps model" since before DevOps was named. I have no need to read a blog on it.

You want the person who designs the car to know what a car is, and to be able to diagnose basic issues like "the fuel gauge says 'empty' and the engine won't start". And there's no analogy to an Indy car driver in software; every distributed system is self-driving.

The most popular alternative to "DevOps" is a team of developers who do not run the software, and may not even have the skills or capabilities needed to boot up the server process. They do their development in an IDE, run unit tests to verify functionality, and do not have permission to log in to the production environment.

Meanwhile the "ops" side consists of people who may know basic shell scripting, or some Python if they're a go-getter, but are unable to read stack traces or diagnose performance excursions.


throwaway787544 deleted their reply to this post. My response was as follows:

---

  > So you're familiar with Six Sigma then? Value stream mapping? TPS?
  > W.E. Deming? Martin Fowler? There's more to DevOps than deployments
  > and CI/CD.
None of those have any relationship to DevOps.

  > I haven't worked at Google, but I expect somebody gave you some tools
  > and some access to cloud infra and said "good luck".
I was on Borg-SRE. My job was to build the tools and maintain the cloud infra.

  > If you're lucky there was a small army of ops people behind the
  > scenes keeping things running for you
The point of having an SRE team is to avoid having a "small army" of ops people. Manual operation of compute infrastructure is uneconomical at scale.

  > if you weren't lucky you were expected to know how to correctly build
  > and run complex systems at scale by yourself
Google does expect its developers to understand basic principles of distributed systems, yes.

  > > every distributed system is self-driving
  >
  > I didn't know Google was in the bridge selling business.
What I mean is that when your service runs 24/7 and downtime is reported in the New York Times, there is no room for manual action during normal operation. The system must have the capacity to maintain itself and self-recover.


Responding to the inlined kubectl command mentioned above:

As an outsider, that command looks really easy to mess up.

  * shell interactions with quotes
  * double quotes
  * interpolating into image namespace with no autocomplete
  * easy to forget an argument
  * do you get autocomplete against the deployment name?
Comparison: C# declaration of a complex type - it's less complex than the `kubectl` command above, but IDEs offer way more support to get it right.

  * var x = new List<Dictionary<CustomType,int?>>()
This will light up warnings if you get anything wrong.

You get:

   * go to definition of `CustomType`
   * autocomplete on classnames
   * highlighting on type mismatches
   * warnings about unused vars.
   * if you initialize the var, the IDE will try hard to not let you do it wrong
So structurally,

1) in the code realm, for doing minor work we have really strong guarantees,

2) in the deployment realm, even though it's likely going to hit customers harder, the guarantees are weaker.

I think this is behind the feeling that the k8s system is not ready yet.


Normally you're never going to run a kubectl command in prod that changes anything except for kubectl apply -f. The YAML file may or may not be script- or template-generated, but it likely is. In a lot of shops they're just going to write a wrapper around the Kubernetes API in whatever language the shop uses, whether it be Python, Go, or whatever. And there are plenty of linters for the YAML and ways to test without impacting prod.
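
For illustration, a minimal sketch of the kind of manifest that `kubectl apply -f` takes (the name and image here are made up; the tag is the bit that CI usually templates in):

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: web                                    # hypothetical
  spec:
    replicas: 3
    selector:
      matchLabels:
        app: web
    template:
      metadata:
        labels:
          app: web
      spec:
        containers:
          - name: web
            image: registry.example.com/web:1.2.3   # tag typically filled in by CI
            ports:
              - containerPort: 8080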


  > So you're familiar with Six Sigma then? Value stream mapping? TPS?
  > W.E. Deming? Martin Fowler? There's more to DevOps than deployments
  > and CI/CD.

>> None of those have any relationship to DevOps.

Deming. I think maybe you mean that Deming has less of a relationship to SRE (Google style). The DevOps Cafe guy wears a Deming t-shirt and talks about Deming all the time. Maybe he has denounced Deming for some reason?

Value stream mapping. This has been brought up in at least one talk at DevOps Days.

Six Sigma. There's some obvious overlaps if you look it up.


Thank you. I didn't want to get into a whole "thing" because the parent just doesn't know what they're talking about and I can't teach them everything in a comment thread, but suffice to say that Gene Kim and Jez Humble would agree with me, and that anyone who wants to know more can look up books and blogs by them.


> My second job in large-scale software was at Google, which used the "DevOps model" since before DevOps was named. I have no need to read a blog on it.

Cool, you probably know my old colleague Hugo; he was one of the first SREs at Google, in a team of like 20 or so. (I wasn't there, could be mistaken.)

Anyway, what "Production" was, is different than what devops is. DevOps is different things to different people. In the beginning it was "What happens when systems administrators do Agile".

(Genuinely), confusion came about because of the "100 deploys a day" talk and the conference being called "dev ops days" (to include developers).

The two things got conflated and now everyone thinks devops are.. well, build engineers or sysadmins or developers who learned a bit of bash and terraform.


> build engineers or sysadmins or developers who learned a bit of bash and terraform.

From what I have seen, it's also getting familiar with the inner workings and process of the company, as well as current and past runtime tendencies for performance and uptime.


> it's also getting familiar with the inner workings and process of the company, as well as current and past runtime tendencies for performance and uptime.

That sounds like sysadmins.


Let's say there are 5+ leading definitions of DevOps. I'll add this one, which I think is quite unpopular but true:

DevOps is how Engineering respects the work of sysadmins.


> And there's no analogy to an Indy car driver in software, every distributed system is self-driving.

You typically build systems to sell a product. The driver is the customer that uses the app. You build the app to be as simple as possible to effectively operate while providing the best possible outcome for that customer. The driver doesn't care or need to know how the engine or the rest of the car works, just that the engine and car works well.


Did Google really use the “DevOps model”? Observing from the outside it looked like SRE was their alternative


The major point is that "mechanical sympathy" for how to operate software should be considered early in the development cycle. Fulfilling the business goals in operation should be a major design consideration that might warrant trade-offs in other areas. Traditionally, this has been seen as more of an afterthought, and the DevOps solution to the problem is that involving the people who design and maintain software in its operation automatically creates incentives to make operational improvements, because the people making decisions are the people feeling the consequences of those decisions.


Generalization:

- each role in a company tries to optimize/nudge the whole organization toward that role's convenience.

- specialization improves the local optimum (advances a certain role) at the cost of the global optimum (everybody has to dance around the new role's processes)

- joining several roles into one creates the opposite result: the optimum is searched for at a more global level (not necessarily found)

- Separation of responsibilities (aka creation of a new role) can generate a fractal (e.g. tester of the left winglet's blue stripe's thickness meter)

- complete joining of roles will create homogeneous chaos after n employees (everybody should do everything)

prediction: we will see constant experimentation; first roles will be split, then some will get joined, then split again, then joined again. (people can't search for a local optimum and the global optimum at the same time)


I agree in general. An extreme example I witnessed ~8 years ago: dev team is building a Linux server. Dev team is not allowed to touch production. Server is deployed via scp by the ops team. Ops team is on the other side of the planet and generally pretty unknowledgeable. Dev team deploys by creating a ticket and literally telling the ops team what to type in the terminal. Ops team fails at copy/pasting the instructions. Deployments fail, and the dev team has to fix things by telling the ops team what to type over the phone.

Lesson learned: kubernetes isn't that bad


If you look at construction, they have a system that's very similar to what DevOps espouses. Lots of trades and specializations, because building to code is complicated and important, and having the experience of doing one job well is important to be able to work around problems on the fly. But - this I think is the critical part - they must know about the other trades that are affected by their work, so as not to impede others' work or the project by accident. It's as much about compassion for the guy coming after you as it is reducing cost and speeding up construction. All of that is the aim of DevOps.


This is a very insightful comment.


> DevOps is the idea that there shouldn't be separate "dev" and "ops" organizations,

I agree with this definition of DevOps. However the vast, vast, vaaaast majority of real life uses of the term "DevOps" I've seen are just rebranded sysadmins. Sometimes it at least implies a more engineering approach to their coding. But in these institutions the Devs and Ops are very much separate groups of people, unfortunately.


Agreed. The DevOps that I'm familiar with is two job descriptions in one job, one headcount, one pay.


I agree. I think the grandparent is maybe describing an ideal state, but in practice most "DevOps" and "SRE" positions I see go by on job boards are for either sysadmins who can do some automation and infrastructure as code, or possibly devs who know how cloud infrastructure and networking work, depending on your perspective.

If you're lucky the hiring manager might have at least read the Phoenix Project or the Google SRE books.


essentially Jevons paradox



Kubernetes is a Google scale solution. Lots of teams said “hey if Google does it then it must be good!”…but forgot that they didn’t have the scale. It caught on so much that for whatever reason it’s now the horrendous default. I’ve worked on at least 3 consulting projects that incorporated K8s and it slowed everything down and took way too much time, and we got nothing in return - because those projects only needed several instances, and not dozens or hundreds.

If you need fewer than 8 instances to host your product, run far away anytime anyone mentions k8s


Exactly.

I am consulting with a startup right now that chose to go all-in on docker/k8s. The CTO is half-shocked/half-depressed by the complexity of our architecture meetings, although he used to be a banking software architect in his previous assignments. Every question I ask ends up in a long 15-minute monologue by the guy who architected all of it, even for the most simple questions. They are soon launching a mobile app (only a mobile app and its corresponding API, not even a website) and they already have more than 60 containers running and talking to each other across three k8s clusters, and half of them interact directly with third parties outside.

Even as I am being paid by the hour, I really feel sad for both the CTO and the developers attending the meeting.

k8s is definitely not for everyone. Google has thousands of hardware systems running the same hypervisor, same OS, same container engine, and highly specialized stacks of micro-services that need to run by the thousands. And even then, I am not sure that k8s would satisfy Google's actual needs tbh.

Ironically, there are some companies that highly benefit from this and they are not necessarily "large" companies. In my case, k8s and devops in general made my life infinitely easier for on-site trainings: those who come with a poorly configured or decade-old laptop can actually enjoy the labs at the same pace as every other attendee.


60 containers sounds like an architecture problem, not a Kubernetes problem. Kubernetes does not stop you from running 1 container in 1 pod receiving ingress and talking to a database.


Presuming too (it's hard to tell) that they mean 60 different types of containers. One of my clusters currently has ~311 containers, but that's mostly due to replication.

If I count actually different containers (like, unique PodSpecs, or so), that count drops to ≈30. Even that is "high", and from an architectural standpoint, it isn't really a number I'd use. E.g., we have a simple daemon, but it also has a cronjob associated with it. So it has "2" PodSpecs by that count. But architecturally I'd call it a single thing. How it implements itself, that's up to it.

A lot of our "unique PodSpec" count, too, comes from utility type things that do one thing, and do it well. Logging (which comes from our vendor) is 3 PodSpecs. Metrics is another 3. We have a network latency measurement (literally ping shoved into a container…): PodSpec. A thing that checks certs to ensure they don't expire: PodSpec. HTTP proxy (for SSRF avoidance): PodSpec. A tool that rotates out nodes so that their OSes can be patched: PodSpec. Let's Encrypt automation (a third party too): 3 PodSpecs … but hey, it does its job, and it's a third party tool, so what do I care, so long as it works and the API between me and it suffices (and honestly, its logs are pretty good. When it has had problems, I've usually been able to discern why from the logs). DB backup. But most of these don't really add much conceptual overhead; any one is maybe tied (conceptually) to our applications, but not really to all the other utilities. (E.g., there isn't really coupling between, say, the cert renewer and the logging tooling.) A confused/new dev might need to have it explained to them what any of those given tools do, ofc., but … many of them you can just Google.

… in previous jobs where we didn't use Kubernetes, we mostly just ignored a lot of the tasks that these tools handle. E.g., reboot a VM for patches? It was a custom procedure, depending on VM, and what is running on that VM. You needed to understand that, determine what the implications were … etc. And the end result was that reboots just didn't happen. K8s abstracts that (in the form of PDBs, and readiness checks) and can thus automate it. (And ensure that new loads don't need TLC that … an app dev realistically isn't going to be given the time to give.)
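
For reference, the PDB piece of that is tiny - a hedged sketch with made-up names, telling the cluster how many pods must stay up while nodes are drained for patching:

  apiVersion: policy/v1
  kind: PodDisruptionBudget
  metadata:
    name: web                  # hypothetical
  spec:
    minAvailable: 2            # keep at least 2 pods running during voluntary disruptions (e.g. node drains)
    selector:
      matchLabels:
        app: web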

If we needed a common thing on every node? That would be rough. We did finally get to having a common base VM image, but even then, all of the per-app VM images would need to be rebased on the newer one, and then all rolled out, and who/how would one even track that? And … in practice, it didn't happen.


Thank you for your comment. It's a mix of both what you described: different types of containers + what I have flagged as "utility containers" (stuff they just installed because it serves a purpose very well).

The problem I see with this approach is that it has become very difficult to evaluate system-wide topics such as accessibility (or security, or performance) as we constantly deal with a very diverse technological stack and increasingly complex attack surface.

In my opinion, this makes finding competent people who can actually evaluate or assess work almost impossible, unless you hire a Lemming who will run some third-party scanner he found on GitHub: if the scanner doesn't say something is awful or critical, then almost everyone at the table is instantly convinced the system is perfectly robust.

I try to warn my clients by asking them if they think that a judge will be satisfied if they answer "we ran the scanner the other day and the scanner said it was all good" after a customer sues them for failing to comply with a disabilities act.


Yep, that's microservices gone mad


It's the gateway drug


this is not the fault of kubernetes but of microservice architecture gone horribly wrong


Good god, there's a reason people worship monoliths. And here I think my company's app is over-engineered for using Sidekiq and Lambda for similar workloads.


> Kubernetes is a Google scale solution

The problem is that it's _not_ a Google scale solution. It's something that _looks_ like a Google scale solution, but is like a movie set compared to the real thing.

for example: https://kubernetes.io/docs/setup/best-practices/cluster-larg...

no more than 5k nodes.

It's extraordinarily chatty at that scale, which means it'll cost you on inter-VPC traffic. I also strongly suspect that the whole thing is fragile at that size.

Having run a 36k node cluster in 2014, I know that K8s is just not designed for high scale high turnover vaguely complicated job graphs.

I get the allure, but in practice K8s is designed for a specific usecase, and most people don't have that usecase.

for most people you will want either ECS (it's good enough, so long as you work around its fucking stupid service scheme) or something similar.


Right; and I don't feel that's a knock against K8s; it's just the trade-offs it decided to make. A true Google-scale solution would be far worse to use, you can be sure of that.

K8s is a "middle 80%" scale solution. It's not made to run Google (though Google uses it a ton internally). It's also not made for your average four-person startup (though, if you've got that experience internally, it's not a bad choice; it's not Heroku, but it's better than a lot of deployment options out there).

All I'd say is: I've worked in a "scale-up" B2B org under $1M in ARR. We were pretty monolithic; just a backend NodeJS app and a frontend SSR React app, basic. Five engineers by the time I left. We used K8s (EKS+Fargate). Maybe 50 pods total, across two environments. It was fantastic. We never had to say No to any weird customer, product, or engineering decision which would be difficult in either more managed, or more legacy, systems. Customer wants a custom domain and'll pay $50k for it? Like five lines of YAML and update Route 53, done. Datadog sidecar container so we can ingest some APM traces? Ten lines of YAML copy-pasted from their docs, done. Update the cluster? Click one button in the AWS UI. Every developer wants their own staging environment? Ok, a bit more work, but: create some namespaces, retool the CI a bit, and we can deploy separate databases in there as well since it's only staging; actually pretty straightforward.
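
To make the "five lines of YAML" concrete: the custom-domain case is roughly an extra host rule on an Ingress, something like this sketch (names and domain are made up):

  apiVersion: networking.k8s.io/v1
  kind: Ingress
  metadata:
    name: customer-acme                     # hypothetical
  spec:
    rules:
      - host: app.acme-customer.example     # the customer's vanity domain
        http:
          paths:
            - path: /
              pathType: Prefix
              backend:
                service:
                  name: web                 # existing Service, hypothetical
                  port:
                    number: 80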

Half the stuff we did with k8s would have taken three times as long with more native AWS solutions, and some of it probably would have been impossible on something like Heroku. K8s strikes a balance. It's not the simplest thing in the world. I wouldn't grab it on day 1 of a startup's engineering journey. But I wouldn't knock a startup that does.


I'm using k8s even for my personal 2 node cluster. It's just so convenient to be able to use all the automation tools.

I can leave the cluster alone for weeks, and it'll take care of itself, my CI will build new docker containers, tooling will start rolling them out across the cluster, if deployments fail they get rolled back, and I get an email, etc.

At some point I was hands-off with the cluster for 6 months and everything kept itself up to date and running just fine.


Google's Borg clusters don't have wildly larger sizes than that despite having more years of development and a lot of motivation at that scale. They instead have a lot of clusters and transparent inter-cluster network topologies (i.e. you can choose clusters with very high bandwidths to each other).

The fundamental design just isn't infinitely scalable, and at a certain point, you might rather have some bulkheads/autonomy or regional diversity.


Curious how you ran a 36k node cluster. Did it involve elastic scaling, etc.? What are the alternatives to k8s?


Home grown: https://www.semanticscholar.org/paper/Robust-large-scale-ren...

off the shelf: https://hradec.com/ebooks/CGI/RMS_1.0/rfm/User_Interface/Alf...

although that was with something like 6-10k nodes, because there was an upper limit to how many dispatches Alfred could do - it was single-threaded, from the early 90s, and not really designed to scale that high

https://renderman.pixar.com/tractor is probably what they use now, or https://www.opencue.io/

but any grid engine style dispatcher/manager will do what you want. It'll give you the primitives to manage wildly larger scale than k8s.

These clusters were on real steel, as elastic clusters were horrendously expensive, and the storage was/is nowhere near fast enough.

Nowadays, I'd use AWS batch, or at a push airflow.


Yeah, I don't know if it's because ECS was my first container orchestration experience but every time I look at teams trying to do k8s on AWS I think how much easier ECS would be.


The complexity difference between bog-standard ECS+Fargate and EKS+Fargate deployments rounds down pretty small. Biggest I've seen: ALB integration, IAM integration, and maybe certificate management. Most of that stuff is out-of-box on ECS, but on EKS you need some extra containers or configuration to watch the K8s API and provision stuff for you (if you want to use it; you can also just go pure-k8s) (edit: just to be clear, they provide all this for you; it's not out-of-box, but it's easy-to-add-box, e.g. [1]).

An argument could be made for something like CodeDeploy being better integrated on ECS, but that's more of a "k8s doesn't need CodeDeploy but ECS might" kind of thing. And even then, I wouldn't touch it.

An argument could also be made that upgrading ECS clusters is a bit easier, as the cluster itself, uh, doesn't have a "version". But on Fargate it's pretty painless on EKS, and Fargate ECS tasks do have a "platform version" that generally doesn't have to be worried about (version: LATEST), but it is nonzero nonetheless.

Which is really to say that both ECS and EKS puke complexity, because it's AWS, but the volume is pretty similar.

[1] https://docs.aws.amazon.com/eks/latest/userguide/aws-load-ba...


Agreed. I had the displeasure of doing CodeDeploy to ECS Fargate a few weeks ago for a side project and IMO it was overly complex.


Yup. You can get pretty dang far with just "the task definition is in cloudformation, so update the image in cloudformation and submit the template". For a bit more complexity, add ALB routing weights in, and the case for CodeDeploy is kind of weak. My statement that ECS needs it was probably overzealous; for most situations it doesn't, and if there's some crazy functionality it does I'm not aware of, you'd probably need something else in k8s as well (e.g. istio).


We’ve moved a small-scale business to Kubernetes and it made our lives much easier.

Anywhere I've worked, the business always prioritizes high availability and close-to-zero downtime. No one sees a randomly delivered feature. But if a node fails at night - everybody knows it. Clients first of all.

We’ve achieved it all almost out of the box with EKS. Setup with Fargate nodes was literally a one-liner of eksctl.

Multiple environments are separated with namespaces. Leader elections between replicas are also easy. Lens is a very simple to use k8s IDE.

If you know what you’re doing with Kubernetes (don’t use EC2 for nodes, they fail randomly), it’s a breeze.
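
For context, the eksctl one-liner being referred to is roughly `eksctl create cluster --fargate`; the equivalent config-file form looks something like this sketch (cluster name and region are placeholders):

  apiVersion: eksctl.io/v1alpha5
  kind: ClusterConfig
  metadata:
    name: demo                 # placeholder
    region: us-east-1          # placeholder
  fargateProfiles:
    - name: default
      selectors:
        - namespace: default
        - namespace: kube-system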


We don't have an issue with that last point; we run lots of EC2 EKS nodes and they don't fail randomly. Were you setting resource requests and limits correctly? EKS nodes can fall over randomly if you don't reserve resources on the nodes for system processes and your workloads eat up all the resources. That's probably not well documented either.


EC2 instances are inherently unreliable and that's not a knock on them, that's exactly the contract that you get using them and you're supposed to plan your architecture around the fact that at any moment an EC2 instance could die. We lose about 2-3 EC2 nodes per day (not like our app stops, like Amazon's own instance health goes red) and we couldn't care less.


What percentage of EC2 nodes is that?


Empirically around 0.1%


Setting limits is important, but it always has been. Kubernetes nodes typically don't have swap, so without setting container limits, some critical process can OOM. With swap enabled, memory grows => pathological swapping ensues => caches get dropped, making disk performance suck, and all the while your system is shuffling pages between memory and disk. So of course load hits 50+ and the machine turns into a 'black hole'. I've even seen a single VM do that and cause so much disk IO that it took out the whole hypervisor (which had a single RAID volume).
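
For anyone following along, the per-container knobs being discussed look like this (numbers are placeholders, not recommendations); reserving headroom for the node's system daemons is a separate kubelet setting (system-reserved/kube-reserved):

  apiVersion: v1
  kind: Pod
  metadata:
    name: worker               # hypothetical
  spec:
    containers:
      - name: worker
        image: registry.example.com/worker:1.0
        resources:
          requests:            # what the scheduler reserves for this container on the node
            cpu: 250m
            memory: 256Mi
          limits:
            memory: 512Mi      # exceeding this gets the container OOM-killed, not the whole node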


Kubernetes can't (currently) scale to Google sizes. It's designed for small- or medium-sized businesses, which might have 50,000 VMs or fewer.

There are entire SaaS industries that could fit into a single Google/Facebook/Amazon datacenter.


> small- or medium-sized businesses, which might have 50,000 VMs or fewer

Holy shit, is this considered small to medium enterprise now?


a single cluster supports approx 5,000 nodes and ~110 pods per node

The estimated maximum single cluster is 300,000 containers.

That's pretty medium; I've run more than a million processes before, and Nomad has 1 million containers as its challenge https://www.hashicorp.com/c1m

borg can handle this easily.


I don't know anyone running 5k node K8S clusters. That said, borg (as an end user) appears to just keep scaling, but it uses a different model and makes different assumptions than K8S.


Sure. Most people have more clusters before they hit 5k nodes on a single cluster.

But I've been in situations where it would have been worthwhile. I've been in situations with 30,000 machines that needed to be controlled. Splitting them out into very many clusters would mean a lot of wasted overhead in configuration and administration, and you lose nodes to masters.


I'm afraid this is why people pick Kubernetes. They believe a small business needs those tens of thousands of VMs distributed across thousands of nodes and so on.

With some exceptions, I believe that's a few orders of magnitude above what a small business can run on. Nowadays people just start their day by drinking some K2l-aid and spinning up a "basic" 6-node cluster for a development prototype.

Maybe I'm wrong, of course.


> There are entire SaaS industries that could fit into a single Google/Facebook/Amazon datacenter.

Forget a whole datacenter, even just one rack is an unimaginable amount of computing power, these days!


Fact: A well-equipped Raspberry Pi 4 has more memory, more compute power, more storage, and vastly faster networking than the Cray supercomputer I worked with at a major oil company in the 1990s! With the exception of the craziness around click-tracking (web ads and marketing have warped compute use even more than crypto), the data required to run even enterprise-scale businesses today is not really all that large.

For almost all purposes, we don't really need thousands of containers running on unimaginably fast computers, coordinated by AI-driven automation systems. What we need is software that is not morbidly obese.


I mean, a single V100 GPU (~= 100 TFLOPS) has FLOPS throughput similar to top-end mid-00s supercomputers like Blue Gene, at least superficially. And you can squeeze 4+ into a 1U if you have enough cooling and power.

https://en.wikipedia.org/wiki/History_of_supercomputing


Your scale (50,000 VMs) is way too high for small and medium sized businesses :-)

Is my observation correct that unicorns start to see that scale?


That’s a bit out of date. K8s can do 5,000 nodes and 300k VMs within its performance envelope: https://kubernetes.io/docs/setup/best-practices/cluster-larg...


A VM would be a node. A pod isn't a VM, it's a process tree.


Scaling is oversold and under criticized.

Folks, listen, if StackOverflow can run on this: https://nickcraver.com/blog/2016/02/17/stack-overflow-the-ar...

So can your doctor's appointment website, your little ML app or Notion clone.

"But...". No. You ain't gonna need it. Do some load testing, prove it to yourself. Now, multiply the load by 100x, reserve AWS resources and you're good to go.


We just moved our single instance web app with 50 users to K8. Not kidding. I totally bailed on that one. It started with moving it to the cloud (meh) and ended up with K8 in the cloud. A few of the guys wanted to pad the resume so we just turned our heads.


You don't end up spending time only on Kubernetes, because k8s is just part of the solution, a container scheduler. You have to bring logs, monitoring, and a container registry, as well as a CI system with custom jobs, and do the integration of everything.


This is true. But I had to bring all those things anyways, when I didn't run on k8s. I still needed some form of all of that. (Though "container registry" might be "package store", or something, depending on specifics of the implementation. Some form of artifact store.)

And with-k8s and without-k8s to me is pretty similar: we vendor or FOSS most of it. The major cloud vendors all have container registries (of … varying quality…); similarly, at a previous company we used S3+a small shim as a Python package store. (We later moved to a vendored solution.)

ELK for logs meant having a daemon set up per VM. Easier in k8s where I can push a DaemonSet to the entire cluster. With VMs … it's a per-app nightmare, really. Even then, that's really not perfect. In practice, in both situations, I feel like you end up having to integrate the apps with the metrics/logs providers. There's just not a common format. Sometimes, there are some libraries, e.g., there's some stuff for Prom's HTTP metrics APIs. Logs … eugh. Nothing amazing; getting structured logging requires per-app changes regardless of what you do. Sure, in either VM or k8s, you can just "suck up syslog/journald / docker logs", but what format are those in? They're not, is the answer, and I find most places do a "one text log per line" assumption (and then have stuff with multiline logs that just gets destroyed/corrupted/lost by the logging daemon) and it misses out on any sort of structured logs. jsonlines through those channels is a slight step up, but usually requires app changes.
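
A hedged sketch of the "one DaemonSet for the whole cluster" pattern mentioned above (the image is a stand-in for whatever log shipper you actually run):

  apiVersion: apps/v1
  kind: DaemonSet
  metadata:
    name: log-shipper          # hypothetical
  spec:
    selector:
      matchLabels:
        app: log-shipper
    template:
      metadata:
        labels:
          app: log-shipper
      spec:
        containers:
          - name: shipper
            image: registry.example.com/log-shipper:2.0   # stand-in for fluent-bit/vector/etc.
            volumeMounts:
              - name: varlog
                mountPath: /var/log
                readOnly: true
        volumes:
          - name: varlog
            hostPath:
              path: /var/log   # read node/container logs on every node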


For sure, I had all this stuff before k8s as well


> Kubernetes is a Google scale solution. Lots of teams said “hey if Google does it then it must be good!”…but forgot that they didn’t have the scale.

It's also a Google engineer caliber solution. Lots of teams said “hey if Google engineers do it then it must be good!”…but forgot that they didn’t have the same in-house talent as Google.


Asking as someone who has only dipped his toes into devops lately and is looking to learn K8s: what is considered a reasonable "lightweight" alternative to Kubernetes these days?


You may think you want an alternative but you don't.

The API is the main drawcard of k8s in the first place, if you are off in ECS land all you are doing is wasting a bunch of time on a dead-end.

I would instead focus on getting to understand the basics of the API by using a hosted k8s service like GKE or EKS. Stick with some basic manifests, i.e deployments, services, ingress.

Once you have some stuff running you can start learning how it really works and goes together, i.e. what are pods, why are pods immutable, what is a ReplicaSet, how does a Deployment orchestrate multiple ReplicaSets, what are Endpoints, and what is the difference between pod readiness and liveness.

Don't cheat yourself this early in the game, just learn things the right way from the start and save yourself a bunch of work.
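
As a taste of how small those basic manifests are, a Service that fronts a Deployment labelled `app: web` is just this (names are hypothetical):

  apiVersion: v1
  kind: Service
  metadata:
    name: web                  # hypothetical
  spec:
    selector:
      app: web                 # routes to pods carrying this label
    ports:
      - port: 80               # Service port
        targetPort: 8080       # container port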


I agree. I'd say the appeal of Kubernetes is the declarative configuration of your workloads. It's a language on its own that's incredibly versatile and exhaustive. This infrastructure abstraction layer is here to stay.


If you are on AWS you should check out ECS Fargate (serverless). It is really really good. Probably one of their more polished products.

If you want to stay on the Kubernetes route, check out k3s. Super easy to set up and usable for small production workloads.


As a security engineer, I always cringe when anything involving containers is referred to as "serverless".

I always thought that one of the advantages of going serverless was that you didn't have to worry about keeping the underlying operating system up to date. No need to do a monthly "sudo apt update && apt upgrade" or whatever. But containers throw that all away when container images enter the world.

Instead of updating your operating system, you're updating your images...and it's basically the same thing.


Is anyone's goal of 'serverless' that they no longer have to deal with updating the OS?

Most would say even a server-ful system (k8s, or whatever) should be considered 'cattle not pets' with immutable nodes replaced as needed anyway. No update, just replace. Just like building a new image and having new pods (or serverless whatevers) pull it.


The cattle not pets abstraction always struck me as wildly bizarre. Whoever came up with that phrase, did they grow up on a farm?

I’ve never cordoned off an individual head of cattle and lobotomized it, which is kinda what we do when debugging issues. We take the pod out of rotation, flip a bunch of configs, then give it some traffic to see the new debugging statements.


From a purely security standpoint, "updating your OS" and "updating your image" are equivalent. What matters to the security people are that you're running the latest OpenSSL that isn't vulnerable to the newest branded vulnerability.

If you're truly "serverless" by my interpretation of it, then you wouldn't care. Your cloud provider will have updated their infrastructure, and that's all that matters.


Yeah I see what you're saying, that's a fair enough interpretation of it I just don't think it's the only one.

In fact almost nothing is serverless (well, the truth comes out! ;)) by that definition, since even Lambda has runtime versions to choose/upgrade, Managed-Acme has Acme versions, etc.

SES, SNS, SQS, etc. sure, but I suppose no compute, since you need libraries, and libraries have versions, and you can't have them (significantly/major versions) changing under your feet. (Or if they don't have versions they're of course destined to have known security holes.)

(Or it's not even about libraries if you want to say no you don't need libraries - it's just about having to interface with anything.)


AppEngine was the original serverless platform


I second this. There are a few limitations in Fargate that are annoying but overall it's solid and easy to use.


How does k3s compare with MicroK8s, for the purpose of this topic?


> what is considered a reasonable "lightweight" alternative to Kubernetes these days

Before I used Kubernetes for my side projects (and only used it at work), I always thought it was hard to operate and very tricky. If you start with an empty "default" cluster and then just add bits when you need them, it's actually not that complicated and doesn't feel too heavyweight. I'd suggest just playing around with a simple example and then seeing how it goes.

There are things that are used in "production" clusters that you don't need at the beginning, like RBAC rules, Prometheus annotations, etc.


HashiCorp Nomad is a good alternative.


I've started running Nomad in my homelab, and it is a great piece of software. Although I feel like the question is sort of flawed, if you want to learn Kubernetes, you are going to need to run Kubernetes - or one of the downsized versions of it.

If you want to learn about containers, distributed workloads, etc, then Nomad is a great option that is easy to learn/adopt piecemeal.


Definitely check out all the AWS offerings under ECS.

There are now even on-prem ECS variants, which means not having to pay AWS very much and still getting the benefit of them running and maintaining the control plane.


Not an alternative, but if you want to get a feel for Kubernetes and managing various kinds of servers, check out KubeSail and their options for K8s/K3s

It's ridiculous overkill, but I'm looking at a NextCloud server on one of their PiBox hardware servers for the house. (You don't need a PiBox - their stuff will run fine on little instances from AWS/DigitalOcean/Hetzner, etc., or a spare PC you have lying around...)


K8s ate everyone else. The alternative is to use a Heroku-like PaaS.


In my particular market sector it seems everyone is using some form of cloud functions (SAM, Serverless Framework etc), and migrating away from K8S/containers.

Regarding PaaS stuff like Heroku, the only people I know that are still using that are solo hackers.


That's interesting. In the latest Who Is Hiring discussion, there seems to be more than 10 times as many references to Kubernetes compared to serverless. https://news.ycombinator.com/item?id=31582796


Interesting, you're right; I see 3 occurrences of "lambda" and 26 of "kubernetes". It's probably just the particular "bubble" I live in.


if you only use Services, Deployments and ConfigMaps then k8s can be simple too
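
E.g. a ConfigMap is about as simple as the API gets - a made-up example that a Deployment's pod template could pull in via envFrom/configMapRef:

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: web-config           # hypothetical
  data:
    LOG_LEVEL: info
    FEATURE_FLAGS: "new-checkout=false"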


Threads like the below are why DevOps jobs exist and why Kubernetes infrastructure skills pay so much and why there's such a large demand.

Yes, it's quite complicated.

No, an API to control a managed EKS/GKE cluster + terraform + Jenkins/Azure DevOps/etc. does not mean that magically the developer can 'just deploy' and infrastructure jobs are obsoleted. That's old AWS marketing nonsense predating Kubernetes.

There's a whole maintenance of the CI/CD factory and its ever demanding new requirements around performance, around Infosec requirements, around scale, and around whatever unique business requirements throw a wrench in the operation.

Sticking to ECS, I guess, is a valid point. What Kubernetes gives you is a more sophisticated, highly available environment built for integration (Helm charts and operators and setups that, when they work, give you more levers to control resource allocations, separation of app environments, etc.)

And as an aside, I've been doing this for 20 years and long before Kubernetes, before Docker, hell, before VMs were used widely in production, I observed the developer mindset: Oh but it's so easy, just do X. Here, let me do it. Fast forward a year of complexity later, you start hiring staff to manage the mess, the insane tech debt the developers made unwittingly, and you realize managing infrastructure is an art and a full time job.

A story that is visible with many startups that suddenly need to make their first DevOps hire, who in turn inherit a vast amount of tech debt and security nightmares.

Get out of here with, it's just API calls. DevOps jobs aren't going away. It's just the DevOps folks doing those API calls now.


Reading the comments here validates my experience. When K8s was pitched as a way to make this all run smoothly, I thought, "Great! I'll write my code, specify what gets deployed and how many times, and it'll Just Work(tm)." I built a service which had one driver node and three workers. Nothing big. It deployed Dask to parallelize some compute. The workload was typically ~30 seconds of burst compute with some pretty minor data transfer between pods. Really straightforward, IMO.

Holy smokes, did that thing blow up. A pod would go down, get stuck in some weird state (I don't recall what anymore), and K8s would spin a new one up. Okay, so it was running, but with ever-increasing zombie pods. Whatever. Then one pod would get in such a bad state that I had to nuke all pods. Fortunately, K8s was always able to re-create them once I deleted them. But I was literally deleting all my pods maybe six or seven times per day in order to keep the service up.

Ultimately, I rewrote the whole thing with a simplified architecture, and I vowed to keep clear of K8s for as long as possible. What a mess.


This can probably be chalked up to you're-doing-it-wrong (sorry), but not knowing your precise scenario, it's hard to know what went wrong. Maybe really old versions misbehaved (I only started a few years ago and it's been smooth sailing), but I've never seen your problem on any of our stuff, and we have dozens of different services on a bunch of languages/frameworks, and none of them just give up for no reason (though a lot often die for predictable and self-induced reasons).

I think there was some jank on AWS CNI drivers at one point that delayed pod init, but that's probably the most wtf that I've personally bumped into thankfully.


> This can probably be chalked up to youre-doing-it-wrong

Yes, and the unforgiving part of k8s is that there is a right way documented somewhere - you might just have spent 3 days sifting through docs and posts and community forums to find it.

It's sometimes worth it, sometimes not. My main gripe with k8s would just be that there are no "simple things", and it shouldn't be pitched as making things easier for small shops. Even if a small use case can be done elegantly, it will probably require a pretty comprehensive and up-to-date knowledge of the whole system to keep that elegance.


Yep very much so. Doing it wrong ™ applies to any deployment and shouldn’t be held against k8s. We have over a hundred services deployed in who knows how many pods in a dozen environments and it’s definitely not that unstable.


> Doing it wrong ™ applies to any deployment and shouldn’t be held against k8s

I think there's definitely a huge asterisk there if the tool makes it very easy to "do it wrong", hard to "do it right", etc.

Of course with k8s it's tough because it's capturing computation! Hard for it to "know" what one is trying to do inside the containers. And in the case of k8s the only thing I could think of that is ... kinda in that space is managing volumes, since it runs into the dilemma of adding persistence to ephemeral things.


I imagine it's akin to management expecting you to spend your off hours learning all this "great new tech" while they think working off hours is reading online articles on hacker news to "stay up to date".


> it’s definitely not that unstable

So how unstable is it?


Not at all unstable in my experience


> This can probably be chalked up to youre-doing-it-wrong (sorry)

I think you're absolutely right. I freely admit that I knew NOTHING about K8s before embarking on this project (and still pretty much know nothing about it now), and I was able to cobble together something that 'worked', but that doesn't mean it was right.

And as another commenter points out, "a huge asterisk there if the tool makes it very easy to 'do it wrong'". I would rather be very clearly told that I've got it wrong and be prevented from progressing further vs. making something that superficially seems right then crashes and burns in prod.

I'm sure there are folks that can wield Kubernetes with great effectiveness, and good on them, but I found it to be supremely frustrating and the wrong tool for the right job. Not that I have a better solution, so I'm admittedly just kind of complaining.


We've had great success running celery applications in k8s, so it's surprising to hear dask was a problem for you. Especially considering dask recommends k8s as a deployment option.


Love Dask. Very robust and therefore very easy to get wrong. When you need a longer term solution that uses Dask, it pays to architect things well, in advance vs on the fly in a sandbox.


First: DevOps is a culture, not a job. Most places have so many DevOps roles because they are doing it wrong.

In the olden days of 10 years ago, most operations teams worked around the clock to service the application. Like, every day there would be someone on my team doing something after hours, usually multiple somethings. Tools like Kubernetes and cloud (AWS, GCP, Azure) have added significant complexity but moved operations to more of a 9-to-5 gig. Less and less do I see after-hours deployments, weekend migrations, etc. Even alert fatigue goes way down because things are self-healing. This is on top of being able to move faster and safer, scale instantly, and everything else.

The operations side used to be a lot of generalist admin types and DBAs. With today's environment, you need a lot more experts. AWS alone has 1 trillion services and 2.4 billion of those are just different ways to deploy containers. So you see a lot more back-end roles because it's no longer "automate spinning up a couple of servers, install some software, deploy, monitor and update". It's a myriad of complex services working together in an ephemeral environment that no one person understands anymore.


The number of places that get the meaning of DevOps wrong is too high. So much so that it is often easier to use it wrong in order to express an idea.


I do think there is a space for a developer team model that is easy to maintain, hard to screw up and gives say 80% of the productivity gains.


New technology sometimes creates more work even though it makes the previous work easier. When the electronic spreadsheet was introduced in the 1980s, even though it made accountants more productive, the number of accountants GREW after the electronic spreadsheet was introduced. Sure, one accountant with an electronic spreadsheet could probably do the work of 10 or 100 accountants who didn't have the electronic spreadsheet, but accounting became so efficient that so many more firms wanted accountants.

"since 1980, right around the time the electronic spreadsheet came out, 400,000 bookkeeping and accounting clerk jobs have gone away. But 600,000 accounting jobs have been added." Planet Money, May 17, 2017, Episode 606: Spreadsheets!


Kubernetes in a sense is very similar to Linux back in the 2000s - it was nascent technology in a hot market that was still absolutely evolving. The difference now is that everyone knows the battle for the next tier of the platform is where people will be able to sell their value (look at RedHat selling to IBM, saddled with the legacy of maintaining an OS as a tough growth proposition). For a while people thought that Hadoop would be the platform, but it never grew to serve a big enough group's needs back in 2013-ish, and coupled with the headaches of configuration management, containerization hit, and it's now positioned at the intersection of OS, virtualization, CI, and every other thing people run applications on in general. It may be the most disruptive thing to our industry overall since the advent of Linux in this respect (people thought virtualization was it for a while, and it's shown to have been minor comparatively).

A lot of this stuff really is trying to address the core problem we've had for a long time that probably won't ever end - "works fine on my computer."


I have noticed a pattern that keeps popping up: I've seen many orgs invoking docker/k8s simply as an abstraction layer to allow mapping of commit hashes in a repo to objects in deploy environments.


Depending upon the nature of the artifacts, that's not necessarily the worst abstraction for modeling deployments (I still think that deployments are the big elephant in the room that k8s doesn't solve either, when it really needs to be better standardized as a profession IMO, but that's another topic). ArgoCD arguably makes this work more intuitively, and it's one of the most popular K8S ecosystem components today.


Works fine on my cluster. Marking as "can't reproduce".


In my opinion, the main benefit of Kubernetes for large companies is that it allows for a cleaner separation of roles. It's easier to have a network team that's fully separate from a storage team that's fully separate from a compute team that's fully separate from an application development team because they all work around the API boundaries that Kubernetes defines.

That's valuable because, on the scale of large companies, it's much easier to hire "a network expert" or "a storage expert" or even "a Gatekeeper policy writing expert" than to hire a jack of all trades that can do all of these things reasonably well.

The corollary from this observation is that Kubernetes makes much less sense when you're operating at a start-up scale where you need jacks of all trades anyway. If you have a team of, say, 5 people doing everything from OS level to database to web application at once, you won't gain much from the abstractions that Kubernetes introduces, and the little that you gain will probably be outweighed by the cost of the complexities that lurk behind these abstractions.


The other problem I see large enterprises dealing with when it comes to K8s is the lack of cultural change in developer and infrastructure processes.

Traditional deployment models also have ways to separate the network, from the security, from the storage, from the application stack. But it's typically done through strict change control processes.

I see many orgs choose to change to K8s because it offers improvements to operational tasks related to provisioning all of those changes, and speeds up the old change control gateways.

However, K8s is tuned to operate extremely profitably in organisations that need to make large numbers of changes to their infrastructure or software stack all of the time, and it begins to break down at scale (at least from a cost point of view) compared to alternative solutions if organisations are not meeting some minimum number of deployments a day.

IT orgs have chosen to adopt this huge piece of operations software that is itself fairly monolithic, and requires large amounts of upkeep and maintenance to keep it running smoothly and provide constant availability.

But even though they've adopted this massive new fixed cost in their IT operations, they continue to use old change control processes, often because restructuring old teams - teams that have traditionally made the company a lot of money - into new teams proves to be an incredibly risky exercise.

And so the net outcome is that their overall IT operations processes marginally improve at best, while they have simply absorbed a new fixed cost on top of all their existing ones. What's worse, I see in some orgs that the additional cost pressure is being noticed, but they attribute (IMHO incorrectly) this cost pressure to a lack of competence in the new technology (K8s), and begin a massive hiring spree to try and find specialists to better tune the technology stack. The solution, IMHO, should instead be to push hard on existing teams to simplify and downsize, and to create incentives to interact with the infrastructure APIs more aggressively, letting sysadmins/DevOps/SREs deal with faults, errors and failures the way they always have, but with the new fancy tools that let them work more efficiently.


High Availability, Scalability, Deployments, etc. are NOT the goal of Kubernetes; they are features that are not exclusive to Kubernetes, nor is Kubernetes necessarily better at them than others.

The goal of Kubernetes is to improve the portability of people by introducing abstraction layers at the infrastructure layer - These abstractions can seem overly complex, but they are essential to meet the needs of all users (developers, operators, cloud providers, etc)

Before Kubernetes, in order for a developer to deploy an application they would need to (send an email, create terraform/cloudformation, run some commands, create a ticket for the loadbalancer team, etc.) - these steps would rarely be the same between companies or even between different teams in the same company.

After Kubernetes you write a Deployment spec, and knowing how to write a Deployment spec is portable to the next job. Sure, there are many tools that introduce opinionated workflows over the essentially verbose configuration of base Kubernetes objects, and yes, your next job may not use them, but understanding the building blocks still makes it faster than if every new company / team did everything completely differently.
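For illustration, a minimal Deployment spec of the kind being described looks roughly like this (names and image are hypothetical):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web                                      # hypothetical name
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - name: web
            image: registry.example.com/web:1.0.0    # hypothetical image
            ports:
            - containerPort: 8080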

If you only have a single team/application with limited employee churn - then the benefits may not outweigh the increased complexity.


> these steps would rarely be the same between companies or even between different teams in the same company.

And the quality can differ a lot too. I used to think that k8s is not necessary, since our team has mastered both stateless and stateful app deployment on VM, with ansible calling aws/gcp API. Everything just works.

And then I joined another company, which has a hotchpotch of unorthodox terraform and ansible code and a homebrew service discovery layer, with frequent incidents in the early mornings, especially on weekends, when autoscaling of aws/gcp VM would fail due to a myriad of reasons.

With k8s, there is a minimum quality.


The thing you're noticing is the usual thing that happens when new labor saving technology is invented:

1. What people expect: less work needs to be done to get what you had before.

2. What people don't expect: more is expected because what used to be hard is now simple

So while it may have taken a few weeks to set up a pet server before, and as a stretch goal you may have made your app resilient to failures with backoff retry loops etc., now that's a trivial feature of the infrastructure, and you get monitoring with a quick helm deploy. The problems haven't disappeared; you're just operating on a different level of problems now. Now you have to worry about cascading failures and optimizing autoscaling to save money. You are optimizing your node groups to ensure your workloads have enough slack per machine to handle bursts of activity, but not so much slack that most of your capacity is wasted idling.

Meanwhile, your developers are building applications that are more complex because the capabilities are greater. They have worker queues that are designed to run on cheap spot instances. Your CI pipelines now do automatic rollouts, whereas before you used to hold back releases for 3 months because deploying was such a pain.

Fundamentally, what happens when your tools get better is you realize how badly things were being done before and your ambition increases.


Because everything has gotten bigger and more complicated.

It's like asking "if the computer saves us all so much work, why do we have more people building computers than we ever had building typewriters"?

Something can "save labor" and still consume more labor in aggregate due to growth.


Jevons Paradox


I think this Kelsey Hightower quote has summarized my experience working with Kubernetes:

> Kubernetes is a platform for building platforms. It's a better place to start; not the endgame.

https://twitter.com/kelseyhightower/status/93525292372179353...

Everywhere I've worked, having developers use and develop Kubernetes directly has been really challenging -- there's a lot of extra concepts, config files, and infrastructure you have to manage to do something basic, so Infra teams spend a lot of resources developing frameworks to reduce developer workloads.

The benefits of Kubernetes for scalability and fault tolerance are definitely worth the cost for growing companies, but it requires a lot of effort, and it's easy to get wrong.

Shameless plug: I recently cofounded https://www.jetpack.io/ to try and build a better platform on Kubernetes. If you're interested in trying it out, you can sign up on our website or email us at `demo [at] jetpack.io`.


The short answer is: Because of Kubernetes.

The longer answer is: When you switch to Kubernetes, you are introducing _a lot_ of complexity which, depending on your actual project, might not be inherent complexity. Yes, you get a shiny tool, but you also get a lot more things to think about and to manage in order to run that cluster, which in turn will require that you get more devops on board.

Sure, there might be projects out there, where Kubernetes is the right solution, but before you switch to it, have a real long hard thinking about that and definitely explore simpler alternatives. It is not like Kubernetes is the only game in town. It is also not like Google invents any wheels with Kubernetes.

Not everyone is Google or Facebook or whatever. We need to stop adopting solutions just because they get hyped and used at a big company. We need to look more at our real needs and avoid introducing unnecessary complexity.


I agree 100%. Keep things simpler, invest the $ that would have been spent on k8s staff on performance tuning courses & upskill your devs in squeezing every last millisecond out of a couple of dedicated servers. You can do lots with a couple of beefy machines if the software is tuned.

Everything has become so abstracted away these days that performance isn’t even a consideration because people don’t understand the full stack. And I don’t mean “full stack” as it’s slung around these days. EG: doing 20 round trips to the database looking up individual records (as each is an object code-side), when you could just do one. Things like that are opaque & many devs wouldn’t even care or know, but a little bit of education can make a huge difference.


This is all true enough. The benefit of k8s is that if you need all that complexity, or even most of it, then k8s gives it to you in a fairly standard way. Then everything you build on top of it has a well-documented platform you can hire people to help maintain rather than having to train people for six months on a home-grown solution before they're productive.


The premise of your question is invalid. Have you ever tried setting up a Kubernetes cluster and deploying apps in it? Kubernetes doesn't save work, it adds work. In return, you get a lot of benefits, but it wasn't designed to reduce human work, nor was it designed to eliminate devops jobs. It was designed for scalability and availability more than anything. Most people using Kubernetes should be using something simpler, but that's a separate problem.


I don't know my dude, all 3 major clouds offer "canned" k8s services that you can set up in a ridiculously short amount of time with Terraform and your CI platform of choice.

I agree with some other comments in this thread about a general fervor in the Enterprise space to "modernize" needlessly. This conversation usually lands on the company copying what everyone else is doing or what Gartner tells them to do. Cue "DevOps".

100 percent agree with your comments on something simpler. I can't tell you how many times I've debated with our Analytics teams to just use Docker Compose/Swarm.


> all 3 major clouds offer "canned" k8s services that you can set up in a ridiculously short amount of time with Terraform and your CI platform of choice

I don't agree. I spun up a Kubernetes cluster in Azure, which was indeed easy. But then I had to figure out how to write the correct deployment scripts to deploy my docker containers to it, and how to configure all the security stuff. After more than a week of trying to figure it out, I decided to ditch the whole solution and go for Azure Container Instances instead. It was too much for me to learn about all the concepts of Kubernetes, how you configure them, how to make it work for solutions that are not as simple as the example on the website, and how to navigate through the various different methods of deploying stuff.

Maybe I'm just too dumb. But I wasn't going to invest a month of my time into doing something that should be simple enough for an average developer to accomplish.


You're not dumb, you're just new to it, and it's fundamentally hard stuff anyways, and if you can find a higher-level abstraction that lets you get work done faster, then all the better. However, the question is comparing Kubernetes to traditional VM-based infrastructure (especially with pet nodes) whereas you're comparing Kubernetes to a higher-level abstraction.

For what it's worth, deploying in Kubernetes is pretty easy once you figure it out (and often finding the information is the hardest part). All you need to do is update the Deployment resource's "image" parameter. You can do that with `kubectl patch` like so:

    kubectl patch deployment foo -p '{"spec":{"template":{"spec":{"containers":[{"name":"main","image":"new-image"}]}}}}'
Kubernetes will handle bringing up a new replicaset, validating health checks, draining the old replicaset, etc.


Please, do not manage deployments with imperative kubectl commands, I beg of you.


The question is: what tool to use? I'm a solo developer running a very small kube cluster for my hobby project. I very much wanted to have a declarative version controlled state of my cluster. Every time I try googling solutions I get flooded with some enterprise Saas offerings that do nothing I want.

I managed my stateful sets/services for a while with terraform, but my experience was absolutely terrible and I have stopped that eventually. I now use "kubectl patch" and "kubectl apply" with handwritten yaml, but the workflow feels very clunky.

Intuitively it seems obvious to me that there must be a tool helping with that, but for some reason I was absolutely not able to find anything that would be even a little bit helpful. I am considering writing a couple python scripts that will automate it.


kubectl apply on a directory full of yaml works fairly well for small stuff. Check it into git and it's version controlled.
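A minimal sketch of that workflow (the directory name is hypothetical):

    # manifests live in a version-controlled directory
    git add k8s/ && git commit -m "bump image tag"
    # preview what would change, then apply the whole directory
    kubectl diff -f k8s/
    kubectl apply -f k8s/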

If you need something more flexible than that, try making your own helm chart. Helm will give you some text templating, pre and post hooks, some basic functions, and some versioning and rollback functionality.

You can start simple by just pasting in your existing k8s yaml, and then just pull out the pieces you need into variables in your values file. If you need to change an image version, then you just update the variable and `helm upgrade mychart ./mychart`
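A rough sketch of what that looks like (chart layout and names are hypothetical):

    # values.yaml
    image:
      tag: "1.0.0"
    # templates/deployment.yaml references it, e.g.:
    #   image: "registry.example.com/myapp:{{ .Values.image.tag }}"
    # bump the value and roll it out:
    #   helm upgrade mychart ./mychart
    # or override it on the command line:
    #   helm upgrade mychart ./mychart --set image.tag=1.0.1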


kubectl apply on a directory doesn’t work because deleting a resource manifest won’t delete the corresponding resources.


It does work, as long as you’re not deleting anything. :) That might be good enough for a “very small kube cluster for my hobby project”

I only use ‘kubectl apply’ for small stuff where I only have a couple resources. Anything more complicated and a tool like helm is much more useful.


'kubectl delete -f dir/' will delete all resources in the directory.


Right, but the only resource you want to delete is no longer in that directory, so you've now deleted every resource except the one you actually wanted to delete. :)


Ah, I slightly misunderstood. Still, you should always have your manifests under version control so this shouldn't ever be a problem :)


I'm a fan of emrichen, but simple text templating (Jinja) works almost as well. (j2cli means you can just provide a yaml file with per-stage configs.)


Fluxcd, or argo CD if you want a nice UI


I probably wouldn't do this, but what problem does this cause?


The declarative approach is a more sustainable way to run Kubernetes. If you define some desired state in manifests and apply them to a cluster, they can be applied again to new clusters or the same one and Kubernetes will attempt to maintain the desired state.

This state can be version controlled, written in stone, whatever you prefer and it can always be attained.

When administrators start issuing imperative commands to a cluster, state starts being changed and there is no record[0] of the state Kubernetes is being asked to maintain.

[0] Not entirely true, the state can always be retrieved from the cluster so long as it hasn't failed.


I find this to be a common misconception, stemming from a misunderstanding of what "declarative" means (especially common when people are discussing tools like Terraform).

Firstly as you point out, there is a record of the state Kubernetes is being asked to maintain: it's in the API server as the spec of each resource.

Secondly, using `kubectl` "patch" in the manner described is not making changes to the cluster state directly, it's making changes to the specification of what should be maintained, and the various controllers effect the state changes.

Fundamentally, the argument seems to come down to "you don't have a record of what you once asked the API server to do", and that's fair enough - you don't. But that has nothing to do with imperative or declarative models.

I'm not advocating actually doing this on a day-to-day basis, but the arguments against it are not ones of imperative vs declarative.


> Fundamentally, the argument seems to come down to "you don't have a record of what you once asked the API server to do", and that's fair enough - you don't. But that has nothing to do with imperative or declarative models.

You do actually have this record. First of all, because k8s has an audit log, and secondly because deployments maintain a revision history (so you can always roll back--kubectl even supports this via `kubectl rollout undo`).
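For instance (the deployment name is hypothetical):

    kubectl rollout history deployment/foo               # list recorded revisions
    kubectl rollout undo deployment/foo                  # back to the previous revision
    kubectl rollout undo deployment/foo --to-revision=2  # or to a specific one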


You're correct, but when approaching this from the human perspective, it's less about the technology and the reality and more about "do I have a record of the last state I asked the cluster to be in before I fucked it up?". Ultimately that's what matters. Auditability is great, but in practice, when the shit hits the fan, can I get the application that brings in our company's money, and that our customers rely on to breathe, back to the state it was in after 5 years and 73 updates?

Updating the manifests, pushing them to version control and having CD deploy them encourages humans to "do the right thing".

It would be quite easy, though, to just tweak that one environment variable while I patch in the new image version to update that one service; until the entire cluster dies and I can't retrieve the last definition and need an equivalent cluster up within the hour.

This is really more about the practice of writing down the state you want the cluster to be in (the spec) and showing it to the cluster, than just ordering it to do one thing without context.

In this sense, Declarative Vs Imperative is just a proxy term for "do I have a record of the state I asked the cluster to keep?"


Given that Kubernetes' docs[0] discuss using imperative commands, I think it's a fairly reasonable way to describe it.

[0] https://kubernetes.io/docs/tasks/manage-kubernetes-objects/i...


Why is applying a full manifest “declarative” but applying a patch is “imperative”? That’s clearly an error.


Applying a full manifest is "this is your state now".

Applying a patch is "make these changes to your existing state".

That dependency on existing state is a difference, and it seems to map reasonably well to what declarative/imperative seem to usually be used to mean in this context.


1. A strategic merge patch says "this is your state now" (the only difference is the scope of the state in question, with a full manifest including a bunch of extraneous stuff)

2. "make these changes to your existing state" is still declarative


> Applying a full manifest is "this is your state now".

It isn’t though. It’s “please make the state look like this eventually”. You do not patch state, just spec. Controllers effect the changes.


You're both correct, but you're talking about different states. The parent's "state" is the state of the etcd database and your state is the actual state of the resources that the controllers are managing.

That said, the parent is wrong that a full manifest is declaring the (etcd) state in a way that a (strategic merge) patch isn't--both are declaring etcd state, but a strategic merge patch is doing so in finer-grained increments. A strategic merge patch can declare zero state or many full manifests, while applying full manifests can only work in increments of complete resource manifests. But both are telling Kubernetes "this is your (etcd) state now".


Then open a PR.


This is a strange way to concede an argument, but I’ll take it.


People are incorrectly assuming that using kubectl implies invoking it from an administrator's laptop. Of course, you can and should invoke kubectl from your CD pipeline. The CD pipeline maintains its own record of runs, and Kubernetes deployments have a revision history.

Moreover, people in this thread also don't know what "declarative" means. The patch is declarative, and "declarative" doesn't provide the claimed benefits. For example, as with the patch command, I can create, apply, and delete a Kubernetes manifest file (indeed, I can apply directly from stdin!) and there is no additional record of the change beyond that which would exist if I had just applied the patch.
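A quick sketch of applying straight from stdin (the ConfigMap is just a hypothetical example):

    cat <<'EOF' | kubectl apply -f -
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: example-config
    data:
      greeting: hello
    EOF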


A patch is declarative in that it's an idempotent command which requires you to fully define the patch. It's still something that is being manually defined rather than contained in a file to be applied. Of course, if that file is just being locally held and modified on someone's laptop instead of committed to version control, it's a moot point.

I will grant you that `kubectl delete -f` breaks my argument, since it's imperatively modifying objects given a declarative manifest. Re: changes via stdin, I mean sure; you can also pipe a deployment manifest into `ex` or `sed` to make changes on the fly and then echo it back to `kubectl apply -` but I wouldn't recommend it.


Yeah, my point is that whether you're changing a manifest and doing an apply versus a patch isn't relevant, the relevant bit is whether or not you have a version history and in both cases you do via your CD pipeline and via k8s deployment revision history. You can also commit to git, but I don't think there's much value in committing every image version change to git.

Also, working in raw manifests isn't a panacea; if you delete a resource's manifest, Kubernetes won't delete its corresponding resource (maybe this is what you were referring to with your bit about `kubectl delete -f`). You need something that diffs the manifests against the current state of the world and applies the requisite patches (something like Terraform) but this isn't related to how you change image versions.


I agree with your points, but as I mentioned in another comment, if we take "imperative" and "declarative" as proxy terms for "I do or do not have a reference to the state I requested of the cluster OUTSIDE of the cluster", then my point is that updating the state in full, by modifying the manifests and committing them to be deployed by your CD pipeline or otherwise, is a better approach to ensuring you or someone else can rebuild your "empire" were it to unceremoniously burn to the ground.


I think you’re assuming that just because we’re updating the image via kubectl (which itself is invoked via CD pipeline) that the infrastructure isn’t codified and persisted, which isn’t the case. You can (and should) still have your infrastructure manifests saved in your infra repo/directory and version controlled; that’s orthogonal to how you update the current image version.


The same problems any imperative management within declarative config causes -- drift. If the tool you're using supports declarative configuration, all changes should be made exclusively via the declarative interface to prevent that drift. In this example, the new image should be added to the original manifest itself, not via a CLI update.


It depends on how you manage your changes. A lot of people don't have their infra-as-code manage the deployment's image field--rather, that's updated by the application's CD pipeline. There's no drift to worry about.


So upon deploy the CD pipeline calls kubectl with the proper deployment image and that's ok?


Yes.


Drift, as the other child comment mentions, but also loss of version control. I do not want to have to trawl through someone's shell history to figure out what they changed, nor do I want to have to redirect `kubectl get foo -o yaml` output into diff.

If everything is in code, and you have a reasonable branching strategy, it's much easier to control change, to rollback bad merges, to run pre-hooks like security checks and configuration validation tools, etc.


It's not drift, because the infra-as-code doesn't manage the image field (the application's CD pipeline does). You don't trawl through someone's shell history, you look at the CD pipeline history. Rollbacks are easy--you just deploy the prior version via your CD tool.

I think you're assuming that invoking kubectl means invoking it directly from a user's command line, but kubectl can also be called in a CD script.


If you're using ArgoCD or something then sure, but bear in mind the original statement you made was directed at someone who is new to K8s, and given a command that can be executed from their shell, they would likely assume that's what you meant.


It doesn’t have to be Argo, it can be Jenkins. Whether or not you use a CD pipeline is orthogonal to whether or not you use k8s. The best practice is to use a pipeline whether you’re targeting k8s or bare VMs or a higher level PaaS abstraction.


I'm not new to Kubernetes and I've been using containers since Solaris zones were introduced.


You know kubectl has a built in diff subcommand?


I do now! Thank you.


This is a declarative kubectl command.



They’re not being precise with their language. Applying a full manifest is no more “declarative” than applying a patch.


Spot on!


As an outsider, that command looks really easy to mess up.

- shell interactions with quotes
- double quotes
- interpolating into image namespace with no autocomplete
- easy to forget an argument
- do you get autocomplete against the deployment name?

Comparison: C# declaration of a complex type - it's less complex than the `kubectl` command above, but IDEs offer way more support to get it right.

    var x = new List<Dictionary<CustomType,int?>>();

This will light up warnings if you get anything wrong;

You get:

- go to definition of `CustomType`
- autocomplete on classnames
- highlighting on type mismatches
- warnings about unused vars
- if you initialize the var, the IDE will try hard to not let you do it wrong

So structurally,

1) in the code realm, for doing minor work we have really strong guarantees,

2) in deployment realm, even though it's likely going to hit customers harder, the guarantees are weaker.

I think this is behind the feeling that the k8s system is not ready yet.


About quoting rules: We desperately need static analysis or IDEs (Emacs plugins, whatever) that "explain" exactly what quotes do in a particular context. When I returned to C++ recently, I was blown away by Clang-Tidy (JetBrains CLion integration). Literally: Clang-Tidy seems to "know what you really want" and give intelligent suggestions. For someone who is a very average C++ programmer, it instantly leveled me up! I could see the same for someone writing /bin/sh, /bin/ksh, /bin/bash shell commands. If the IDE could give some hints, it would be incredibly helpful to many.


The quotes here have nothing to do with k8s, it's just how you encode JSON in shell.


apt-get install shellcheck

or

https://www.shellcheck.net/


You put it in a bash script that gets called from CI and move on with life. No one is typing this into their terminal every time they want to do a deploy.

> I think this is behind the feeling that the k8s system is not ready yet.

Whether you patch the deployment image from bash or from C# doesn't indict the k8s ecosystem.


Correct! Here's why: Kubernetes is an operating system (a workload/IO manager) and its lineage is 16 years old. So it's like Unix around 1979, or Windows/DOS around ~1996. Imagine it's 1996 and you can pick between Windows NT (representing 16 years of MS development) or OS/360 (~40 years out from IBM's first transistorized designs).

What I'm trying to say is that as an operating system, Kubernetes is now a young adult, and historically speaking, operating systems at this level of maturity have been adopted and ridden for decades, with much success. But, ya know, if you chose OS/360 in 1996 you would have a point.


1. You’ll use `kubectl set image deployment/gitlab gitlab=gitlab/gitlab-ce:14.9.2-ce.0` in production instead

2. kubernetes will abort the change if it doesn’t syntax check or the types don’t match.


Hah, I didn't even know about `set image`.


Don’t do that, use

    kubectl set image deployment/foo main=main:new-image
instead


It also accepts "*" if you don't know or care about the container name(s)

    kubectl set image deployment/foo "*=the/new/image"
and then, my other favorite (albeit horribly named, IMHO) "rollout status" to watch it progress:

    kubectl rollout status deployment/foo


TIL. Thanks for the advice.


It took me 4-5 hours to read and understand the first few chapters of Kubernetes in Action. I annotated a lot of text as notes on the O'Reilly site, but didn't have to go back to it. I didn't have to do any security and account management stuff; it was being done by someone else. After 3 years of working with plain Kubernetes and OpenShift, I still haven't had to go back to the book. The basic concepts in Kubernetes are easy to understand if you're working as a developer deploying your apps in it, and not in DevOps managing it.

https://www.oreilly.com/library/view/kubernetes-in-action/97...


You wrote: <<if you can find a higher-level abstraction>>

This is a stretch, but to me Kubernetes is like the C programming language for infra. If you look at the entire software stack today and drill down (all the turtles), you will eventually find C (everything goes back to libc or a C/C++ kernel). I assume any commercial (or non-!) "higher-level abstraction" for infra is already (or will soon be) built on top of Kubernetes. I am OK with it.

I write this post as someone who is uber-technical, but I know nothing about actually using Kubernetes. I can do vanilla "hand-coded/snowflake" infra just fine in my constrained, private cloud environments, but nothing that scales like Kubernetes.


> If you look at the entire software stack today, drill down (all the turtles), and you will eventually find C (everything goes back to libc or a C/C++ kernel).

I might be nitpicking, but I'm not sure that's necessarily true. You could in theory write a compiler for a new language in C (or even assembly!), and once you have a working language, re-write the compiler in that new language. Now that there is no C code involved in the stack anymore, would that still count as a C-turtle?

Haskell for example, has some "bits" written in C, but a lot is written in Haskell or assembly[1]. So if you look at the WHOLE stack you'll find C _somewhere_ (probably most places), but I don't think _everything_ boils down to C.

Granted, a LOT of stuff is written on top of C.

[1] https://stackoverflow.com/questions/23912375/what-language-i...


Great point and not a nitpick at all! I hope the next generation of languages is written on top of Python, Ruby, C++ (ultra modern), Java, or DotNet/C#. I wish more languages would figure out if they can host in the JVM/CLR, which would provide some crazy interactions!

After seeing so much great work done with JavaScript in the form of "transpilers", I think a lot can be done in that area. I feel Zig is a crazy good idea: A brand new language that produces binaries that are 100% compatible with C linkers. If all goes well, in a few years, why would anyone use C over Zig? It seems like the future.

Lots of people think C++ is bat sh-t crazy complex (me too, but I still like the mental gymnastics!). What if there were different C++ dialects supported by transpilers that intentionally restricted features? I think kernel and embedded programmers would be the primary audience.


Kubernetes may eventually underpin lots of higher level infrastructure, but most stuff currently isn't running atop Kubernetes. None of the higher level container-as-a-service offerings by the major cloud providers run on Kubernetes, for example. Nor does Heroku. And moreover a lot of people are still working with lower-level abstractions (VMs and auto-scaling groups) or no abstractions at all (pet servers).


I hear you about feeling dumb. I think some early decisions in the k8s ecosystem led to a lot of wasted time and effort, and this frustration.

YAML: significant whitespace is always unwelcome but YAML also introduces unexpected problems, like how it deals with booleans. For example, say you have this array:

    foo:
    - x
    - y
    - z
You might think this is an array of strings, but you'd be wrong.

It's also difficult to read through a YAML config and understand the parent of each key, especially in code reviews or just on GH.

I believe k8s life would have been easier with JSON configs, where it's impossible to confuse e.g. booleans for strings and where it's easier to understand the object's hierarchy.

Helm's use of gotpl: this choice exacerbates the problems with YAML. Now you're templating a structured language with a text templating library. You have to spend energy thinking about indentation levels and how the values will be interpreted by both the templater and k8s.
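A small sketch of the kind of juggling being described (the values key is hypothetical):

    # templates/deployment.yaml (excerpt; .Values.podAnnotations is hypothetical)
      template:
        metadata:
          annotations:
            {{- toYaml .Values.podAnnotations | nindent 8 }}
    # the "8" has to be kept in sync with the surrounding text indentation by
    # hand, because the templater only sees text, not YAML structure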

I think helm would be less frustrating if they chose some templating library that made objects first class citizens. Where you can inject values at specific locations with one-liners or simple blocks of code (e.g. `ingress.spec.rules[0].append(yadda yadda)`)

I'm sure there was debate about these choices early on and I don't have any unique ideas here, so I don't want to be too critical. These are just a couple of pain points I've personally experienced.


Technically YAML is a superset of JSON - all valid JSON is valid YAML. So you could write all your configs in JSON and they'd work just fine.


This discussion from a couple of weeks ago suggests it isn't that simple: not all valid JSON is always valid YAML if your definition of superset requires that JSON would be parsed the same with a YAML parser as with a JSON parser.

If you're going to use JSON for config, it's better to use an actual JSON parser.

https://john-millikin.com/json-is-not-a-yaml-subset https://news.ycombinator.com/item?id=31406473


Thanks! That's really interesting.

You could mitigate some of the issues and get JSONs "strictness as a feature" by passing the document through e.g. `jq . $file` as a CI step, but I don't think that'd resolve the 1e2 issue. TBH I didn't know you could write numbers in JSON like that, so I imagine it'd be an issue that doesn't come up often. But it's disappointing that it wouldn't just work.


> Technically YAML is a superset of JSON

That's false. http://p3rl.org/JSON::XS#JSON-and-YAML


I'd like to amend my statement:

Technically YAML is (supposed to be) a superset of JSON - (almost) all valid JSON is valid YAML. So you could write all your configs in JSON and they'd (probably) work just fine (assuming you keep things relatively simple).


Is that yaml not an array of strings?

["x","y","z"]


I don't happen to have a YAML parser at hand, but I believe it's...

["x", True, "z"]


It's not, y is treated as boolean true.


Is that a k8s thing? Because when I load it in this page[1] it comes out as a string...

[1] https://yaml-online-parser.appspot.com/?yaml=foo%3A+%0A-+x%0...


In YAML 1.2 they fixed it (and also the Norway-problem of "no" meaning false) but many places are still using an older version.
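For example, with a YAML 1.1 parser:

    countries:
      - se       # the string "se"
      - no       # boolean false, not the string "no"
      - "no"     # quoting forces the string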


For what it's worth, and if I'm not mistaken, there is some support for JSON, e.g. kubectl get deployment xyz -o json


JSON has a different set of problems, e.g. not having multi-line string support.


Similarly, we switched from self-managed k8s on EC2 to Fargate. Took about 2 months part-time with some consultants to get things squared away.

Once we deployed, we ran into all sorts of SRE-issues. Turns out AWS sets all those “sane limits” that our own folks never did. Still hunting ghosts from the rollout 6 months ago.

Makes for good resume fodder, and makes me laugh at the prestigious titles and positions the folks who built this system went on to receive at big name firms.

Guess it is someone else’s management problem now. :shrug:


Azure is generally the worst (their new container offering looks better though)

If you try digital ocean k8s offering it's fairly straightforward.

Google cloud was the first to offer a decent k8s as a service, if I recall correctly. We didn't have DevOps back in 2015 and we were on GCloud.

Personally I don't pick k8s just because it's heavy to run and I don't want to waste machines (plain docker is good enough for a large part of what people actually need). Sometimes in a project when I can't figure something out with just docker, I just bite the bullet and install k0s.


And you probably didn't even get to TLS and/or authenticated communication between containers, CI/CD, canary deployments, observability, monitoring, etc.

What we need is a Next.js for Kubernetes. Something that delivers a full stack solution on top of base Kubernetes.

The core system is great, but we need to replace these DevOps with a framework or platform.


> What we need is a Next.js for Kubernetes. Something that delivers a full stack solution on top of base Kubernetes.

Doesn't Rancher fit this description? It's pretty resource-heavy though.


Rancher seems to now be a "multi-cloud container management platform".

- Deploys kubernetes clusters
- includes Fleet for CI/CD
- install apps with helm
- ISTIO (awesome)

It feels like it is positioned for orgs with large needs. I'm looking for k8s for small nimble orgs.

I will look at Rancher more, thanks for reminding me of it!

Google's Anthos is also hard to describe now as they cover similar "everything" product features.

GCP now has Autopilot which lets you pay just for the cpu you use (no cluster management at all).

Anthos includes ISTIO which may at some point work on Autopilot. This would mean not having to fiddle with GKE Ingress (which I found unpleasant) and instead use the new standard Gateway.

I believe that eventually GKE Autopilot will offer running individual pods on GPU/TPU, pay as you go.

But when is this all as easy as using Next.js ?


I feel you. Took me 2 years of dabbling with k8s every once in a while on the side to finally "get" it. And if I stop looking for a few months, it has suddenly gained another set of features and deprecated some others.


Definitely a pain point I've seen in my past work as well. Even though spinning up a basic cluster has gotten easier with canned services like EKS, deploying and developing on the cluster is a major challenge for most developers without a higher-level framework.

My cofounders and I are working on a solution to this at https://www.jetpack.io/. If you're interested in early access, we'd love your feedback!


Container instances are pretty good really, and the app service for containers is pretty good too. I've been playing with k8s because that's what everyone thinks they want, and I need to be able to speak to it and use it when the time comes, but I've yet to run into a case where I really thought it was necessary for the platform I work on (millions of users, a big number of transactions per second).


Do you use ACI at that scale? It's interesting how Microsoft also promotes AKS rather than ACI. For Azure ML they even explicitly state that ACI should not be used for production purposes, which I find quite curious.


There are tools that let you convert a running docker/podman container to a Kubernetes deployment manifest, which you can then deploy to your cluster with a one-line command. Kubernetes is the new data center and I don't expect anyone with <10 years of enterprise experience to master it. Certainly one can use it and deploy things to it if you work at a place where there are mature pipelines, but deploying your own cluster and all associated services plus networking plus security is the domain of experienced/senior engineers. How quickly one achieves that "status" is up to the individual. It took me 15 years before I could build an enterprise network from scratch. It took me two years to understand how to do the same with K8s.


We have a team to manage our Azure k8s. Blue/green clusters, switching traffic between clusters to be able to upgrade k8s, etc

Definitely a lot of work.


Literally a one-liner for AWS EKS:

    eksctl create cluster --name mycluster --region us-west-1 --with-oidc --fargate --external-dns-access --asg-access --full-ecr-access --alb-ingress-access


That is a quick and short line.

Now the fun starts:"Kubernetes Failure Stories"

https://k8s.af/


I’ve made a comment below, but long story short we’ve moved to Kubernetes running on Fargate and we don’t have downtime anymore.

Sure, one can break anything, but our anecdotal experience is we’re now focused on actually delivering code rather than fretting about node failures.

https://news.ycombinator.com/item?id=31581372


You can't run all types of workloads on Fargate. At least not yet.


That’s fair, for example I couldn’t manage to run clustered Redis. Something with EFS file system that Fargate nodes use.


You mean like AWS' EKS? That spares you the kubeadm stuff in setup. Upgrades are arguably more complicated, because you're less familiar with what you have/need/rely on (their own upgrade docs point you to upstream changelogs/release notes etc.). You're still left with a Kubernetes cluster to deploy stuff to, to decide how stuff scales, etc., which is a lot of what is generally meant by 'DevOps' anyway?

The lower level infrastructure/platform/kubeadm type stuff isn't really 'Dev' related at all.


The reason they provide it is because everyone expects it. The reason everyone expects it is because it was "cool" and an ecosystem grew around it. I use AWS ECS and it's really, really good and easy to understand.


> 3 major clouds offer "canned" k8s services

...until you need to debug something somewhere in the enormous stack.


> Most people using Kubernetes should be using something simpler, but that's a separate problem.

This has been the biggest drain on my career. Everyone wants to be an "engineer" ready to handle every problem like its the next facebook. Like bruh, this service is going to get like 100 req/hour max and only when the sun is up - just **ing throw it on cloud run. We can tell the only thing you want to build is your resume.


It's a good point: tools made by giants for giants, such as Kubernetes, Bazel, etc, may not make sense for a smaller operation.

But what would you suggest in lieu of Kubernetes? What would save work for a shop which is not yet a giant but has already overgrown the capabilities of 2-3 manually managed boxes / instances?

I can think of several options. Management by Ansible / Saltstack / Chef can easily become a rabbit hole comparable to maintaining K8s's YAML. In your experience, does Nomad save SRE work at smaller scales? Does Terraform? CloudFormation?


If you only limit yourself to a subset of kubernetes's features, it's actually really great for small operations.

I run a small cluster for my side projects consisting of 3 nodes. One node is dedicated to running database containers and is also the control plane node (it's a worker node too, but strictly for database StatefulSets only), and the other 2 are worker nodes, one of which also runs an NFS server mounted as persistent volumes so every worker node can have access to it.

Self-healing? Internet-scale? I have no interest in them on my small cluster. I just want my apps to not go down while I'm updating them without writing complicated blue-green deployment scripts, the ability to move pods to other nodes when one gets overloaded, and the ability to add or remove nodes when needed without starting again from scratch. I basically treat it like docker-compose on steroids. So far it works really well.


A cloud service that can run containers directly, e.g. Amazon ACS, Google Cloud Run, etc.


> e.g. Amazon ACS

It's "Amazon ECS", and it works pretty well for standard fare CRUD web services, but more complex use cases quickly end up pulling in more and more AWS services (for example, if you need to run a cron job, you can use EventBridge to trigger an ECS task or just do it all with Lambda). This isn't dramatically worse--it's mostly just different. Kubernetes APIs are more consistent than in AWS, and Kubernetes tends to be more flexible and less limited than AWS. It's also much easier to extend Kubernetes or find a pre-existing extension than in AWS. But mostly it's not going to make or break your project one way or the other.

If you're just running CRUD web services, this is fine, but if you need to spin up a background job or do more complex orchestration, then it can quickly become advantageous to reach for Kubernetes instead.


We run Nomad with about 7 to 9 client instances in prod. Quite happy with it.


Anybody who watched "Kubernetes: The Documentary" knows the answer: https://youtu.be/BE77h7dmoQU

Kubernetes only exists, because Google lost the Cloud Wars, and this was their Hail Mary pass.


And I might cynically offer it was "invented" to solve the problem of ex-googlers not having any useful immediately transferable skills as the Google internal tech stack had nothing in common with industry.


In 2005 Google search got into a sticky spot, where they were completely unable to deploy new versions of the search engine from the master branch of the code for a while, because there were performance regressions that nobody could find. Deployment and related things like "deploy a copy of the production configuration to these spare 2000 machines" were manual slogs. I was the noob on the team doing such important yet unfulfilling tasks as "backport this Python performance testing script to Python 1.x because our only working test environment doesn't have the same version of Python as production does". This was before borg aka kubernetes, and let me tell you, a whole bunch of stuff was dysfunctional and broken.

All this is not to say that Kubernetes is the right solution to your problem, or to almost anyone's problem. It was just an improvement to Google's infrastructure, for the sort of problems that Google had. For some people it makes sense... for you it might not.


Borg is not K8S, K8S is not borg.

Borg is far more advanced in its scaling abilities.


That's not really been my experience. The number of people who knew how to deploy software at Google was much smaller than the number of people writing software there. I was certainly the only person on my team who ever dealt with the nitty gritty of running stuff in production. That seems about like the industry standard; I'd say the average software engineer in industry, inside or outside Google, considers their work done when their code is merged.

At my first job outside of Google we used something called Convox, which was very similar to running things in production at Google. You triggered a package build from your workstation (!) and then adjusted production to pick that up. Very similar to mpm packages and GCL files. (The difference is that Google had some machinery to say "while the build might have been triggered from a workstation, all this code has actually been checked in and reviewed". Convox did not have that part, so yeah, you could just edit some files and push them to production without checking them in, which wasn't great. But when you are a 4 person development team, not the end of the world by any means.)


But the internal tech stack doesn't have much in common with Kubernetes, either.


That's actually my point.

If Kubernetes does indeed provide the best solution to provide scalability and availability, one can argue that this would result in a decreased demand for dev ops engineers, as they "would just have to use Kubernetes" and it would decrease their workload.

In reality this does not seem to be the case, that's why I asked.


> If Kubernetes does indeed provide the best solution to provide scalability and availability, one can argue that this would result in a decreased demand for dev ops engineers, as they "would just have to use Kubernetes".

I'd say it would result in either:

- the same scalability and availability with fewer DevOps engineers
- better scalability and availability with a similar number or more DevOps engineers

In my experience, it's almost always the second case that happens. For example, a service would be moved from a few (virtualized or physical) servers that can only be scaled manually, to a k8s cluster with either autoscaling or at least scaling by changing a configuration file.
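For instance, the "scaling by changing a configuration file" case is typically a one-line change or a single command (names are hypothetical):

    # either bump "replicas: 5" in the Deployment manifest and re-apply it:
    kubectl apply -f deployment.yaml
    # or scale it imperatively:
    kubectl scale deployment/web --replicas=5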


Right. Most companies aren't content to settle for doing the same thing they were doing but with fewer engineers when they realize they could be using those other engineers to automate even more things (via Kubernetes operators).


It could have been designed better to be easier to set up and use. Especially for ad-hoc or more casual use. If it is easy to use casually, people will use it more and they will learn how to use it faster. It is also easier to see why you would invest in learning to create more complex configurations.

Pretty much every piece of software I've written in the past decade that tends to have configs in production can also work with little or no config. Including clustered software that just exploits things that are widely available like Zeroconf to get a decent low effort cluster up and running. No, you probably won't be able to (or want to) use those features in production, but that's beside the point. The point is to lower the thresholds. And then keep aiming at lowering them wherever you can because asking other people to care about your software is seriously uncool. Other people will never care as much as you do.

It is normal for programmers to become defensive about software. Be it their own or software they really like. But it is far more productive to assume that when users think something is awkward, perhaps it is because it is awkward. And perhaps it could have been done better.

Nobody actually gives a crap what someone thinks Kubernetes was built for -- and what kind of rubbish experience they think is deeply justified by the goals.

It is either needlessly awkward to use or it isn't. And guess what: most people think it is awkward. And I seriously doubt it needs to be this awkward.


It still comes back to right tool for the job. There's a perception in the market that k8s is the right tool for running all compute. As someone recently told me "I've been interviewing cloud people for 2 weeks and all I can find is people that want to run k8s all day."

k8s is not the right tool for every job. Most companies are not at the scale where they need to worry about the problems that it's trying to solve. But it's a cargo cult - they see the blog posts about how moving to k8s solves a bunch of problems that come up as you scale and decide they need to be solving all of those problems also even though there are simpler solutions at their current scale.

There's a bunch of other platforms out there that are way more opinionated and less "awkward" but they don't have the buzz that k8s has.


Actually, I don't see it as being that much about scale, but more about moving complexity out of applications and about solving robustness challenges. Kubernetes offers a lot of things that are useful even at smaller scales. Which is why I'm not sure I think "but it's for scale" is a valid excuse.

Kubernetes ought at least to be the right tool for a wider range of compute tasks, and I think it could have been.


When my system is running happily on a single VM with no real robustness issues then I don't really need to solve those robustness issues, and certainly not by bringing in a complex distributed platform to run it on. Or when my lack of robustness isn't costing me enough to make it worth it to spend the engineering cycles to adopt that platform. One way or another I have to be at a scale where it actually makes sense to make this investment vs just doing the simple/easy thing that's good enough for where I'm at.


There is a lot of ground between "runs on a single VM" to "runs on thousands of instances". In fact I would think most companies that deliver some online service fit into that category.

For instance, I don't regard one of our products that runs in three availability zones and has 3-4 instances per AZ as being "large scale". It is still a small system. And it doesn't run in multiple AZs for performance reasons but because we really need high availability.

We embedded discovery, failover and automatic cluster management in the server software itself. It isn't really how we'd like to do it, but it is still less of a hassle than running K8S. (It also means this still works when you license our software to run on-prem, on pretty much any runtime environment, and that has its value; but again, this isn't functionality you want, or should have to build, yourself.)


Agreed it's mostly undifferentiated heavy lifting still, and agreed it /should/ be easier. It previously took my team something like a year to get our infra all autoscaling - something I've found other teams aren't as willing to invest in if they're just running on a handful of instances.

At ~12 instances probably still in "pets aren't so bad" territory.


Most companies I've seen end up building some kind of framework around Kubernetes to make the developer experience tolerable. The threshold for getting started and getting a basic deployment up and running is way too high.

Of course, there's a huge cost to building your own framework as well... And it's easy to get wrong.

I started https://www.jetpack.io/ recently to try and build a better solution on top of Kubernetes. If you're interested in checking it out and giving us early feedback, you can signup on the website, or email us at `demo at jetpack.io`


Thanks, I will have a look!


My favorite part of this is that now K8S by itself isn't enough. People are laying "service mesh" and other things on top of it. And, yeah, I get the sales pitch for it, but at some point, it's too much.


The amount of work you put in is less than if you worked toward the same benefits sans Kubernetes, so it definitely saves a ton of work. But your latter point still stands: most of the time the benefits aren't as... beneficial at some stages of development/scale, so it's actually just adding work for the sake of it.


It does save work, consider what you would have to do to provide those benefits without a tool. You would have to monitor usage and spin up new instances manually for scalability. You would have to manually update instances when updates came out. You would have to manually handle failover and stand by instances for availability. Manually configure load balancers. etc.

Kubernetes is a great tool if you have scaling management problems.
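As one concrete illustration, the autoscaling piece that you'd otherwise script by hand is a short manifest in Kubernetes (a sketch; the target Deployment name is hypothetical):

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: web
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web
      minReplicas: 2
      maxReplicas: 10
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70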


That's an interesting second-order effect right? Before Kubernetes et al, it was so much work or so complex that few companies implemented those things. Kubernetes saves so much work, that now everyone is doing it and it feels like more work.

"Things" here mean different things for different people. For me, it's secrets/certs, better rolling deploys and infra as code.


Yeah, it is a premature optimization issue. Everyone is starting to implement scaling solutions before they have scaling problems.


The thread here full of people incorrecting each other about how to use Kubernetes seems pretty solid evidence for your thesis "it adds work".


> Kubernetes doesn't save work, it adds work.

It does save work, but only for certain heavy scenarios like High Availability, autoscaling, etc. I hope one day we can see a simplified Kubernetes that will fit 95% of SOHO users.


It also by design locks you into one of the cloud providers in a quite deceptive way. There is no singular interface for networking, so you cannot really port your infrastructure from cloud to bare metal, for instance. Bare metal support was also heavily underdeveloped, so setting it up was fragile, fiddly and quite limited.


This is exactly what the "automation destroys jobs" argument gets wrong. When aggregate output becomes cheaper, aggregate demand increases.


What is this "something simpler"?

I have to manage like half a dozen docker images. K8s seems like a massive overkill, but managing by hand is rather error-prone.


Just use AWS ECS Fargate - probably much easier to manage/setup than your own k8s cluster or even ECS k8s route. Azure/GCP may provide something similar. I think any Google product is always needlessly complex that only solves their own use case and probably many other companies need not blindly adopt those unless absolutely necessary.


> It was designed for scalability and availability more than anything.

Yet it's really not great at the first, and I'm strongly suspicious about the latter claim. It's very simple to make boneheaded decisions about networking that make things fragile.

In terms of scale, you have a limit of 5k nodes, and given how fucking chatty it is, it costs a lot to run loads of nodes.


Agreed. Kube makes the scaling out easier not the bootstrapping. An important difference.


I mean, obviously it was designed to reduce human work. A tool that adds work wouldn't be very useful, would it?


> A tool that adds work wouldn't be very useful, would it?

Sure it could, if it provides benefits that outweigh the additional work. Insurance companies will happily pay a couple more IT specialists if it reduces the number of cases they have to cover by an arbitrary percentage. Tools can very much serve other purposes than reducing the friction of human attention.

If you can containerize your infra and thereby mitigate threat vectors that may put your entire enterprise at risk if exploited, that's a good business call to make, even if the cost of your IT-department gets inflated by some margin.


A tool or technology that adds work is extremely useful.

More work means

- more employees needed

- more direct reports for managers

- more jobs for x technology

- more people learning x technology because of those jobs

- more StackOverflow questions for x

- more companies using x technology because it's "easy to hire for"

- more money to be made in teaching x

- more learning material for x


Seen before :-)

https://www-users.cs.york.ac.uk/susan/joke/cpp.htm

Warning to bashers...it's a fake interview...


I think the selling point here is that getting k8s' benefits using other tools would result in more work than using k8s.


It reduces work, but imposes a high setup and operating cost. Once you get to the right scale, all the economics look correct. Large teams and organizations can absorb that cost more easily than many who are attempting to use k8s. Of course, being open source, anyone can try k8s, but the mileage is going to vary.


The key question is what are you comparing to k8s in terms of complexity?

Does it add work compared to setting up a VM with Docker and Watchtower? - for sure..

But does it add work compared to setting up something that gives you all the same benefits of k8s without using k8s? - imho definitely not.


> all the same benefits without using Kubernetes

That's the catch: if running your app on a manually set up VM seems equivalent to running your app on Kubernetes, then you don't understand what Kubernetes is or provides.


Today, we see someone who believes that just because a tool's goal is to reduce complexity, that's what it accomplishes.


> Have you ever tried setting up a Kubernetes cluster and deploying apps in it? Kubernetes doesn't save work

This is wrong; deploying on Kubernetes is easy and quick for most apps: you have one Docker image and one Deployment spec, and that's it.

https://kubernetes.io/docs/concepts/workloads/controllers/de...
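
For what it's worth, a minimal sketch of what such a deployment spec looks like (the image name and port here are made up):

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: my-app
  spec:
    replicas: 2
    selector:
      matchLabels:
        app: my-app
    template:
      metadata:
        labels:
          app: my-app
      spec:
        containers:
          - name: my-app
            image: registry.example.com/my-app:1.0.0  # placeholder image
            ports:
              - containerPort: 8080

`kubectl apply -f deployment.yaml` and the cluster keeps two replicas of that image running.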


And a service configuration, ingress, other networking, persistent volumes, a mechanism for updating deployed applications, management of the nodes (even with a managed service like EKS or other cloud solutions), logging, roles, security, etc. If you're a developer and all you have to worry about is one deployment spec, thank your devops team for making your life easier. Kubernetes is great for making the dev team's life easier, but someone did a lot of work to make it that easy for you.
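
To illustrate that point: even the "easy" path usually needs at least a Service and an Ingress on top of the Deployment. A hedged sketch (hostname and names are placeholders):

  apiVersion: v1
  kind: Service
  metadata:
    name: my-app
  spec:
    selector:
      app: my-app        # must match the Deployment's pod labels
    ports:
      - port: 80
        targetPort: 8080
  ---
  apiVersion: networking.k8s.io/v1
  kind: Ingress
  metadata:
    name: my-app
  spec:
    rules:
      - host: my-app.example.com      # placeholder hostname
        http:
          paths:
            - path: /
              pathType: Prefix
              backend:
                service:
                  name: my-app
                  port:
                    number: 80

And that still assumes someone has already installed an ingress controller and sorted out DNS, TLS, and the rest.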


Couldn't agree with this more. At my last company a fair bit of time was put into making the deployment process for our microservices as simple as adding a new YAML file to the deployment repo. That file pulled in a custom chart, and as a dev you just needed to configure a few variables to get everything up and running. But if you were deploying something that couldn't use one of the pre-configured charts it was a bit more work, especially if you had never done it by hand before.

Probably 98% of the devs were blissfully unaware of the complexity that the charts abstracted away, and it let them focus on the services they were writing. I wasn't one of them, and I always made sure to thank the devops team for simplifying the day-to-day deployments whenever I had to deal with writing a custom one.


You can now do something similar with Bunnyshell.com

It handles the devops jobs for dev teams.

Full disclosure, I work for Bunnyshell.


In the absence of any description of what “something similar” or “handles the devops jobs” actually means, this comes across as spam, not informative.


Sorry about that, I should have been more informative.

Bunnyshell makes it easy to create and manage environments. (EaaS - environments as a service)

You connect your k8s cluster(s) and git accounts/repos, it reads the docker-compose files and creates deployments on the cluster.

You don’t need to know or write Kubernetes manifests, those are created for you.

You also get auto updates and ephemeral/preview environments (when a PR is created against the branch of your env, Bunnyshell deploys a new env with the proposed changes).

You are not restricted to creating resources only on the cluster, you can use Terraform for any resource that is external to the cluster ( like S3 buckets, RDS instances, anything Terraform can handle).

Hope this helps,


THIS. In my homelab, I spent roughly a day cutting services over from Docker-Compose to Kubernetes. That day included writing the Helm templates for everything, bootstrapping three bare metal nodes with a hypervisor (Proxmox), clustering said hypervisor for HA on its own isolated network, making images and then installing K3OS onto six VMs across the three nodes (3+3 control plane/worker), installing and configuring persistent distributed storage (Longhorn) with backups to a ZFS pool as an NFS target (Debian VM configured via Packer + Ansible), configuring MetalLB, and smoke testing everything.

A day's work for one person to accomplish that isn't bad, IMO, but what that doesn't capture is the literal weeks I spent poring over documentation, trying things, running tests, learning what didn't work (Rook + Ceph is a nightmare), and so on. I went so far the day before the cutover as to recreate my homelab in Digital Ocean and run through the entire installation process.

Having services that magically work is hard. Having a golden path so you can create a new one with a few clicks is even harder.


This is still easier with Kubernetes than with other tools. Installing e.g. fluentd and Kibana is again a configuration / chart which you apply to the cluster. For monitoring and visualization again you have an operator / chart that you can apply to the cluster. Yes there is a lot of complexity and learning involved but overall I was still pretty amazed at how quickly I was able to get a cluster up to speed with the available tools.


Still need to maintain the cluster though, and boy does it require some fun maintenance. Kubernetes is more than just a distributed docker image scheduler, you also need to install and maintain basic dependencies like the pod networking implementation (a whole virtual IP space!!), DNS, loadbalancing, persistent volumes, monitoring, etc etc. Maybe your cloud provider sets it all up for you, but you're not going to escape having to debug and fix issues with that stuff, whether they be show stopping bugs or performance-impacting inconveniences like higher latency.


  > the pod networking implementation (a whole virtual IP space!!)
That part, at least, can be made simple: https://john-millikin.com/stateless-kubernetes-overlay-netwo...

  > DNS, loadbalancing, persistent volumes, monitoring, etc etc
None of that is part of Kubernetes, and you'll need it (or not) regardless of how you choose to handle process scheduling.

There's a sort of common idea that a "Kubernetes cluster" is an entire self-contained PaaS, and that (for example) monitoring in Kubernetes is somehow fundamentally different from what came before. It's easy to fall into the trap of creating an internal clone of Heroku, but Kubernetes itself doesn't require you to do so and it can be a lot faster to just run Nagios (etc).


> None of that is part of Kubernetes

Well, load balancing is part of Kubernetes (except that it does not provide a sane implementation), and tbh having to debug strange failures caused by seemingly-innocent Service configuration is my least-favorite part of Kubernetes.
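
A classic example of the kind of seemingly-innocent mistake involved: a Service whose selector doesn't quite match the pod labels gets zero endpoints, so traffic simply goes nowhere and no error shows up anywhere (labels below are illustrative):

  # Pods are labeled "app: web", but the selector says "app: webapp",
  # so this Service silently has no endpoints behind it.
  apiVersion: v1
  kind: Service
  metadata:
    name: web
  spec:
    selector:
      app: webapp
    ports:
      - port: 80
        targetPort: 8080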

I agree with you on other points.


DNS too


You'd weep if you saw an actual deployment yaml. What the documentation provides is something minimal, and even that is very long for something minimal.

Also, self-healing can create interesting problems which are fun to trace, debug and understand.


"Before you begin, make sure your Kubernetes cluster is up and running."


"And if it's not, please roll-back your systems and install it from scratch."


I suspect you are getting downvoted because of your presumptuous and dismissive statement. Applying k8s to the type of deployment you are suggesting is overkill, as there is minimal to no orchestration involved.


Also because they are just outright missing the point of the post they were replying to. Yes, deploying things on kubernetes is pretty simple. Deploying kubernetes itself is definitely not very simple.


People are confused; you most likely should not operate k8s yourself if you don't know it or don't have a team for it. Nowadays managed k8s is easy to use and operate: with 3 clicks you get a cluster up and running with sane defaults.


Not to sound trite, but I feel like your continued pressing of this issue evokes the whole "Do you even lift, bro?" meme. There are PaaS offerings that can get you up in 3 clicks, but now you are in bed with k8s, which is critical infrastructure. I think it's quite necessary to understand the ins and outs of such a vital component of my infra.


Either you’re so good with k8s and are clueless about how rare that expertise is, or more likely, you’re so clueless of how complex and fickle k8s can be if configured wrong but you’re not even aware that you don’t know.

I’ve seen actually decent engineers (maybe they’re not decent?) bring down prod because they accidentally Kubectl deploy’ed from their command line.


Because many, many companies herd pets using kubernetes.

The number of single-server setups with kubernetes thrown in for added complexity and buzzwords I’ve found is way too dang high.


Single-server kubernetes on a managed platform like DigitalOcean is a lot like having a managed server, but you're more flexible about separating your services. I no longer run imagemagick by shelling out of my Ruby process in Rails; I use an external service that does just that, which I can scale very easily. I can add an image for Chrome with chromedriver instead of trying to build that into my Dockerfile.


> "If this old way of doing things is so error-prone, and it's easier to use declarative solutions like Kubernetes, why does the solution seem to need sooo much work that the role of DevOps seems to dominate IT related job boards? Shouldn't Kubernetes reduce the workload and need less men power?"

Because we're living in the stone age of DevOps. Feedback cycles take ages, languages are untyped and error-prone, pipelines cannot be tested locally, and the field is evolving as rapidly as FE javascript did for many years. Also, I have a suspicion that the mindset of the average DevOps person has some resistance to actually using code instead of yaml monstrosities.

There is light at the end of the tunnel though:

- Pulumi (Terraform but with Code)

- dagger.io (modern CI/CD pipelines)

Or maybe the future is something like ReplIt, where you don't have to care about any of that stuff (AWS Lambdas suck btw).


I agree with this 100%. We're in the infancy of DevOps.

Ironically, "DevOps" started as a philosophy that developers should be able to do operations, e.g. deploy, monitor their apps without relying on external people (previously called Sys Admins, etc). Yet, we're at a stage where the "DevOps" role has become the most prevalent one. IMO things have temporarily gotten slightly worse to get much better later.

From the productivity standpoint, it is not acceptable that a Machine Learning engineer or a Full Stack Developer are expected to know Kubernetes. Or that they need to interact with a Kubernetes person/team. It is an obstacle for them to produce value.

Kubernetes is not THE solution. It's just an intermediate step. IMO, in the long run there'll be very few people actually working with technologies like Kubernetes. They'll be building other, simpler tooling on top of it, to be used by developers.

You already named a few examples. I can name a few more:

  - Railway.app
  - Vercel
  - Render
  - fly.io
  - probably many more under way


> From the productivity standpoint, it is not acceptable that a Machine Learning engineer or a Full Stack Developer are expected to know Kubernetes. Or that they need to interact with a Kubernetes person/team.

I agree - these things should be abstracted from the developer - thats the goal of SRE/platform engineering - DevOps is [supposed to be] as you said, a philosophical and cultural stance around early productionization. While not mutually exclusive, they're not the same thing.

But back to your point re: orchestration-level concerns being foisted upon devs - at a shop of any size, there will be devs who feel they _need_ to touch kubernetes to get their job done (wrongly, IMHO) as well as devs who want nothing to do with it - so without engineering leadership throwing their support heavily behind a specific approach, it's hard for a small team to deliver value.


dagger.io: "Developed in the open by the creators of Docker"

Hard pass.


Kubernetes can really help bringing more scalability.

All you need is to rewrite your application (think microservices), reduce cold latency (get rid of anything VM-based such as plain Java, or rewrite with Spring Native or Quarkus), use asynchronous RPC, and decouple compute and storage.

Then you need an elastic platform, for instance Kubernetes, with all the glue around such as Istio, and Prometheus, and Fluentd, and Grafana, Jaeger, Harbor, Jenkins, maybe Vault and Spinnaker.

Then you can finally have your production finely elastic, which 90% of companies do not need. Microservices are less performant, costlier, and harder to develop than n-tier applications and monoliths, and way harder to debug. They're just better at handling surges and scaling fast.

If what you want is:

- automated, predictable deployments

- stateless, declarative workloads

- something easy to scale

Then Docker Compose and Terraform is all you need.
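
A hedged sketch of what that simpler path can look like: a single docker-compose.yml on a VM that Terraform provisioned (image names and credentials are placeholders):

  version: "3.8"
  services:
    web:
      image: registry.example.com/web:1.4.2   # placeholder image
      ports:
        - "80:8080"
      restart: unless-stopped                 # basic self-healing via the Docker daemon
      depends_on:
        - db
    db:
      image: postgres:14
      volumes:
        - db-data:/var/lib/postgresql/data
      environment:
        POSTGRES_PASSWORD: change-me          # use real secrets handling in production
  volumes:
    db-data: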

If you also need orchestration and containers are your goal, then first try Docker Swarm. If you need to orchestrate various loads and containers are a means and not a goal, then try Nomad.

Finally, if you will need most resources Kubernetes has to offer (kubectl api-resources), then yes, opt for it. Few companies actually have a need for the whole package, yet they have to support its full operational cost.

Most companies just pile up layers, then add yet a few more (Java VMs on top of containers on top of an orchestrator on top of x86 VMs on top of (...)), and barely notice the miserable efficiency of the whole stack. But hey, it's using Kubernetes, so it's now "modernized".


It honestly sounds like you've confused Kubernetes with some other platform.

Running a big Java monolith with 128GiB RAM footprint in Kubernetes works well. It's at its best when deployments are infrequent and traffic patterns are stable (or at least predictable).

If someone wants stateless microservices with immediate scale-up/scale-down, then that's more like a FaaS ("functions as a service") and they'll be better off with OpenFaaS instead of Kubernetes.


The true happy path: you have a big monolith hungry for resources, several instances of the monolith, and you want to be able to relatively quickly spin up new instances (modulo provisioning new nodes) without having to jump through 100 hoops.

For all people talk about autoscaling and whatnot, just hitting some buttons/sending a couple commands manually and getting some temporary scaling for reason X (or a temporary container for reason Y) without messing with a bunch of admin consoles is very nice.


You can do this type of analysis for most software, since building on existing solutions allows us to write powerful tools with less code. Listing out the existing solutions that allow developers to write less code doesn't necessarily mean the new solution is bad.

Every time I read a post like this about Kubernetes, I scratch my head. It takes me maybe half a day to deploy a CI/CD pipeline pushing into a new Kubernetes cluster with persistent DB's, configuration management, auto-renewing SSL certs and autoscaling API/web servers per environment. I'm by no means an expert, but I've been running 10+ sites this way for various clients over the past five years, with almost zero headache and downtime.

When I compare this solution to the mishmash of previous technologies I used prior to Kubernetes, it clearly comes out on top (and I use/d Terraform religiously). Setting up automatic server provisioning, rolling updates, rollbacks, auto-scaling, continuous deployment, SSL, load balancing, configuration management, etc... requires an incredible amount of work. Kubernetes either provides most of these out of the box, or makes them trivial to implement.

The only way I understand this argument is if you're building an extremely simple application. The nice thing about simple applications is that you can build them using any technology you want, because they're simple. Despite this, I often use Kubernetes anyway, because it's _so simple_ to take a Helm chart and update the image name.
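
For readers who haven't seen it, that "update the image name" step is often just a small values override like the following; the exact keys depend entirely on the chart, so treat this as a sketch:

  # values.yaml override for a typical chart layout
  replicaCount: 2
  image:
    repository: registry.example.com/simple-app   # placeholder
    tag: "2.3.1"
  ingress:
    enabled: true
    hosts:
      - host: simple-app.example.com               # placeholder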


What is this? An argument for actual decisions based on merit? That's not the cargo cult I signed up for!


> reduce cold latency

You don't even really need to do this, as you can tell k8s how to check whether the pod is healthy, what a reasonable timeframe for becoming healthy is, etc. I've got some services which can take up to 10 seconds before they're actually ready to serve workloads, and k8s can scale those services up and down without too much issue. It's definitely nice to reduce cold latency, but I wouldn't say you need to do it.
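
For context, the "tell k8s how to check" part is just a couple of probes on the container spec; a sketch (paths, ports, and timings are made up) for a service that needs roughly 10 seconds to warm up:

  # fragment of a pod/deployment spec
  containers:
    - name: slow-starter
      image: registry.example.com/slow-starter:1.0   # placeholder
      readinessProbe:            # don't route traffic until this passes
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 5
      livenessProbe:             # restart the container if this keeps failing
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 30
        periodSeconds: 10
        failureThreshold: 3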


There's no problem with running a Java monolith in k8s. Elasticity isn't the only benefit of using k8s. And you should probably be running Jenkins and Prometheus even if your whole infra was launched with `docker run`.


> If what you want is... automated, predictable deployments... then Docker Compose and Terraform is all you need.

Can you elaborate on the deployment story here a bit?


From my experience, Kubernetes drastically reduces the number of DevOps people required. My current place has a team of 5, compared to a similarly sized, vmware-centric place I worked at a decade ago with a team of 14.

But DevOps means many things because it's not clearly defined, which also makes it difficult to hire for. It's a "jack-of-all-trades" role that people somehow fell into and decided to do instead of more traditional software engineering.

Also, from what I've experienced from our internship program, CS programs are really bad at covering these fundamentals. Students aren't learning such basics as version control, ci/cd, cloud platforms, linux, etc.


> From my experience, Kubernetes drastically reduces the number of DevOps people required. My current place has a team of 5, compared to a similarly sized, vmware-centric place I worked at a decade ago with a team of 14.

Is that K8s, or is it that you've outsourced hosting and stateful storage to cloud services?

I suspect that things are much easier to automate effectively, and the knowledge for automating things is much more common.


> Also, from what I've experienced from our internship program, CS programs are really bad at covering these fundamentals. Students aren't learning such basics as version control, ci/cd, cloud platforms, linux, etc.

That's because those things are not part of Computer Science, or so I'm told. I got a degree in Software Engineering and regret nothing.


> CS programs are really bad at covering these fundamentals. Students aren't learning such basics as version control, ci/cd, cloud platforms, linux, etc.

Good. Those aren't 'fundamentals' of 'CS'.


Someone put it nicely when they said Kubernetes is like an operating system for containers. If you take Linux as an analogy, it's clearly a non-trivial investment to learn Linux well enough to be effective and efficient in it, and perhaps further time is needed to reach the productivity, functionality and performance you were used to on Mac or Windows.

Kubernetes definitely achieves this goal well, and in a relatively portable way. But just like any other engineering decision, you should evaluate the trade-offs of learning a completely new OS just to get a simple web site up, versus running an nginx instance with a bunch of cgi scripts.


Kubernetes is super linux.


DevOps is a philosophy, not a job role. It's the idea that developers deploy and operate their own code. An SRE is often someone who helps make that happen, by building the tools necessary for developers to operate their own code.

In a small organization, you can get away with a sysadmin running a Kubernetes cluster to enable that. In a larger org you'll need SREs as well as Operations Engineers to build and maintain the tools you need to enable the engineers.


This is an underrated comment right here. I think the entire industry is confused -- but they could have just read this comment.


Kubernetes is raw material, like concrete and lumber. It needs to be massaged/crafted/assembled into something that fits the use case. A 'devops' engineer would leverage Kube to build a system, the same way a builder/contractor would leverage raw materials, subcontractors, off the shelf components, etc to build a home or office.


By far this is the best explanation, thank you.

Just like there are a plethora of programming stacks, there is a ton of choices for implementing a software supply chain. Kubernetes is valuable to infrastructure engineers, who use it to create these systems in a maintainable and reliable way.


A few reasons:

- Kubernetes is very complex to setup

- It is not needed for many use cases

- It is (hopefully) not the de facto standard for devops

- Load balancing was a solved problem way before Kubernetes. For many use cases, you don't need the complexity. Even things like self-healing are kinda solved by AWS Auto Scaling, for example.

- Not every use case needs Kubernetes and its additional overhead/complexity

- Most importantly, devops is not a "one size fits all" problem that Kubernetes or any other magic-wand tool can solve. There are various nuances to consider, and hence you need DevOps as a role.


K8S is not easy.

  It helps standardize:
    - deployments of containers
    - health checks
    - cron jobs
    - load balancing
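
As one small illustration of that standardization, a scheduled job looks roughly the same in any cluster; a hedged sketch with a made-up image:

  apiVersion: batch/v1
  kind: CronJob
  metadata:
    name: nightly-report
  spec:
    schedule: "0 3 * * *"        # every night at 03:00
    jobTemplate:
      spec:
        template:
          spec:
            restartPolicy: OnFailure
            containers:
              - name: report
                image: registry.example.com/report-job:1.0   # placeholder
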
What is the "old way" of doing things?

Is it the same/similar across teams within and outside your organization?

If not, what would it cost to build consensus and shared understanding?

How would you build this consensus outside your organization?

For small organizations, one should do whatever makes them productive.

However, as soon as you need to standardize across teams and projects, you can either build your own standards and tooling or use something like K8S.

  Once you have K8S, the extensibility feature kicks in to address issues such as:
   - Encrypted comms between pods
   - Rotating short lived certificates
I don't love K8S.

However, if not K8S then, what alternative should we consider to build consensus and a shared understanding?


As one person with kubernetes, I can build and operate quite a big platform, more securely and better than ever before.

Our current platform is much more stable, has more features, and is bigger than what the previous team built.

There are plenty of things you can't see like security or backup or scalability.

Backups were done on an app-by-app basis. Now you can do snapshots in k8s.

Security still is a mess. But now you can at least isolate stuff.

Scalability meant installing your application x times manually and configuring a load balancer, etc. Now you set it up per cluster.

Additional features you get with k8s: autoscaling, high availability, health checks, self-healing, standardization.
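
As an example of how little glue the autoscaling part needs once you're in the cluster, a sketch of a HorizontalPodAutoscaler (the target name and thresholds are made up):

  apiVersion: autoscaling/v2
  kind: HorizontalPodAutoscaler
  metadata:
    name: my-app
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: my-app
    minReplicas: 2
    maxReplicas: 10
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70    # add replicas above ~70% average CPU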

A lot of things got invented that led to k8s, like containers or yaml.

Now with the operator pattern you can also replace admin work and embed operational knowledge into code.
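
The shape of that is usually a custom resource that an operator reconciles; a purely hypothetical example (the kind and fields depend entirely on whichever operator you install):

  # Hypothetical custom resource; an operator watching this kind would
  # create the StatefulSets, backups, and failover handling behind it.
  apiVersion: example.com/v1
  kind: PostgresCluster
  metadata:
    name: orders-db
  spec:
    replicas: 3
    version: "14"
    backup:
      schedule: "0 2 * * *"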

Infrastructure has never before been ready to be controlled by code like this.


It sure does reduce the amount of work. But there's a lot that remains or just gets shifted to another area/technology. Who's setting up the image build pipeline? Who's handling the scaling and capacity planning? Who's planning how the deployments actually happen? Who's setting up the system for monitoring everything? And tonnes of other things...

Kubernetes helps with: networking setup, consistent deployments, task distribution. That's about it. It's more standardised than plain VMs, but you still have to deal with the other 90% of work.


It depends. In some scenarios it might actually increase the amount of work compared with other technologies. K8s introduces way too much complexity, especially for small organizations. On top of K8s, there are helm charts adding more layers. And each layer has potential foot guns that even experienced developers cannot easily get right on the first try.


Because software engineers can’t help but make things more complicated than they need to be.


The easiest answer to your post is that you are looking at evidence which doesn't necessarily mean what you think it means.

If k8s is as amazing and time saving as you would imagine, you'd expect many companies to want to adopt it, so you'd expect there to be lots of job postings!

It's like saying "if computers are such time savers, why do so many companies hire people that have knowledge in computers". It's because this is a good tool that companies want to hire people with knowledge in that tool!


Simple: because before, the company had 2-3 beefy servers running some binaries and they handled all the load without problems.

Now, because of new possibilities and new developments, they want to switch to Kubernetes to get those new possibilities everyone is talking about, and now you have to build many new containers, configure k8s, autoscaling, etc... and the developers don't know it (yet) and don't have time to learn it.

So let's hire a DevOps (me) that will do it ;-)


There's a shift away from programs that run on actual computers to software that runs on clusters, a large conceptual computer. Whether Kubernetes-the-software is ultimately the answer or not I don't know, but I don't think we're going back to installing packages on individual machines; the benefits of the conceptual large computer are too great, and it's the most logical way to solve challenges with scale and availability.

A lot of what is called DevOps goes into adapting software to this new mindset. A lot of that software is not written with best practices in mind, and likewise lots of tools are still in their infancy and have rough edges. I think it's fair to say some time and resources go into learning new ways of doing things, and it might not be the best choice for everybody at this stage to spend those resources unless there's an obvious need.


As far as I can tell from doing years of contracting for non-FANG mid to large size companies -- they basically do their best to copy whatever trends they see coming out of FANG. There is no thought behind it.

Kubernetes and 'DEVOPS' are the new hotness at non-FANG companies as they are always 5-10 years behind the trends. Expect to see more of it before it goes out of fashion.

Also, DevOps is just a title. Nobody read the book, and nobody is trying to create what the original guy at Google or wherever had in mind. It is just an all-encompassing job doing the same activities that the sysadmin used to do. HR tells companies that they should rename their sysadmin departments to DevOps departments and everything else continues as normal.


Starting from "if Kubernetes is the solution," you aren't going to be able to get to the answer, because:

1. Kubernetes isn't the solution

2. Kubernetes is expensive and extremely maintenance prone

3. Most of the companies I've seen switch to Kube I've seen switch away afterwards

Every time I've seen someone bring up Kubernetes as a solution, everyone at the table with first hand experience has immediately said no, loudly

Remember, there was a time at which someone wouldn't have been laughed out of the room for suggesting Meteor stack, and right now people are taking GraphQL seriously

Kube doesn't make sense until you have hundreds of servers, and devops makes sense at server #2


Like cloud APIs, K8s replaces a ton of by-hand work that formerly made up the profession of systems administration (among others). I see my career progression from sysadmin to devops as a pretty natural development of "automate your (old) job away". So today, with cloud and k8s, I can be as productive as ten of my old selves fifteen years ago. Back then, it would have been almost unimaginable for a company like the one I'm with to be able to thrive and grow during its first 18-24 months with only a single "IT" staff member who can still maintain a great work/life balance.

TL; DR - K8s and cloud let me do the work of ten of my old selves.


Kubernetes and Angular are how Google burns resources to prevent any significant competition or innovation being created outside of their sphere of influence. Engineers who wank at sophistication usually receive the most attention and decisive power; btw, that's what keeps me from participating in interviews. Another reason is that IT engineers are such low plankton that someone up the hierarchy, with a massive indirect financial bonus for using this technology, decides to hire "Angular Developers" or "Kubernetes DevOps consultants", and this cascades down to HR and recruiters who simply filter by keywords.


This is like asking: if writing code in a high level language is the solution, then why are there so many software engineering jobs?


DevOps is not only k8s. Think of DevOps as managing the whole infrastructure: setting up CI/CD pipelines, implementing infrastructure-as-code, managing ML pipelines, implementing security policies... DevOps is very wide.


A few thoughts...

1) Kubernetes is an infra platform for the ops in DevOps. If developers need to spend a lot of time doing Kubernetes it takes away from their ability/time to do their dev. So, there are a lot of platform teams who pull together tools to simplify the experience for devs or DevOps specialists who handle operating the workloads.

2) Kubernetes is, as Kelsey Hightower puts it, a platform to build platforms. You need DevOps/SREs to do that.

3) Kubernetes is hard. The API is huge and it's complex. The docs are limited.


I've said it before, and I'll say it again: Kubernetes is hard, huge, complex because it solves hard, huge, complex problems. Or tries to anyway.


The corollary of course is that if you don't currently have hard, huge, complex scalability problems, well, you do now...


Yup. If the org in every aspect is not ready for scale, stick to simpler solutions.


  > 3) Kubernetes is hard. The API is huge and it's complex.
  > The docs are limited.
Eh, I wouldn't go that far. Kubernetes has a lot of API surface for features that are useless (config maps), attractive nuisances (CRDs, packet routing), or outright dangerous (secrets). If you strip it down to its core mission of distributed process scheduling, the API is reasonable and the docs are extensive.

The biggest challenge with learning Kubernetes is that third-party documentation is largely funded by ecosystem startups flogging their weird product. It can be very difficult to figure out things like network overlays when there's so few examples that don't involve third-party tools.


I'm not sure how config maps are "useless". It seems to be a pretty important element of the platform in general.
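
For anyone following along, this is the feature in question; a minimal sketch of a ConfigMap (keys and values are illustrative):

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: app-config
  data:
    LOG_LEVEL: info
    FEATURE_X: "true"

A pod can then pull the whole map in as environment variables via `envFrom`/`configMapRef`, or mount it as files in a volume.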


I don't think there are "so many devops" jobs. Everywhere I've worked in the last 15 years, the number of people managing everything from hardware up to developer tools and CI/CD was tiny compared with the number of developers. Some start-ups don't bother with those people at all to begin with and regret it later. Then they hire a tiny team after years of neglecting these areas, and then expect the "wings on the plane to be swapped out during flight".

Those devops people are casually expected to be experts in process (incident/problem management), cloud infra / infra as code, db config / replication, networking, security (IAM, SSO, network, OS), release / deployment, monitoring / metrics / alerting / tracing (not just deploying them but working with devs to implement observability in their code), dev tooling (code/artifact repos, every brand of CI/CD pipelines & runners)... basically anything that isn't software development. They're also expected to be on call for other teams in many companies.

Many years ago I worked at a large, old tech company where all of these areas had dedicated teams.

PS: how many people at your company "really" know Kubernetes inside out? And if it misbehaves, who do you expect to have the answer?


I think part of the problem in tech generally is that many developers don't understand they're being marketed to. I really have no opinion on whether K8s is a smart choice that will save you time and effort, requires more effort but provides benefits, or is a bad trade-off. But the crazy push for k8s as the one-size-fits-all solution for everything that you get from some corners of the web smells like a hype cycle.


There's an argument to be made in general (which I don't think applies here - but playing devil's advocate) that says:

When technology makes previously difficult things easy, this technology will be used to do more things that were previously impossible.

I personally haven't seen anyone use k8s to achieve things that were impossible before. They just use it because 1) They think everyone is using it 2) They don't know how to do it any other way


Container adoption by workload is still pretty low. Most workloads are difficult to containerize because they're commercial off-the-shelf software (COTS), Windows-based, etc. You need good people who can make the best of these situations and automate what they can with configuration management, image bakeries, CI/CD pipelines, and infra as code, and who can reverse engineer or bend that legacy app to run in a container.


For the past 250 years, new machines have replaced old ways of working to make us more productive. But each new machine is more complicated than the last, requiring more technical jobs to address the new complexity in the machine. Kubernetes is just a new machine, and because so many businesses now want to run that machine, they need more maintenance crews that are trained on said machine. In order to have fewer people, we'd need to leverage economies of scale and specialization so that companies don't need these new big complex machines. Even then, the appearance of fewer workers might just be jobs moved offshore where labor is cheaper.

It's true that moving away from mutable state is a sea change that is (very) slowly engulfing the industry. But the tide is still very far out. The cloud is a pile of mutable state, leading to the same system instability and maintenance headaches that existed before the cloud, requiring new skills and workers to deal with. Redesigning the cloud to be immutable will take decades of re-engineering by vendors, and even when it's done, we'll still need economies of scale to reduce headcount.


I find the meaning of DevOps confusing.

Originally I thought it was a methodology assisted by a set of tools to make it easier for devs and ops (sys admins) to work together, mostly by giving both sides the same environment, usually in the form of docker containers.

Admins configure servers, or server pools since physical machines tend to be abstracted these days, set up applications, including the CI/CD stuff, make sure everything is secure, up to date and in working order, etc... Devs write the code, and test it on the platform supplied by the admins.

And now I see all these DevOps jobs. I mean, what does it mean? You are dev or you are ops, so what does DevOps mean? It doesn't mean both; DevOps people usually don't write software, so DevOps is ops. Maybe we could call them container administrators, like we have database administrators.

I think that the confusion between the DevOps methodology and the DevOps job title gives the wrong idea. Someone needs to make all these servers work; calling it serverless just means there is another abstraction layer, and while abstraction has some benefits, it rarely lessens the workload. It may change job titles though, here sysadmin -> DevOps.


Because everyone and their dog insists on using way over complicated micro service architectures.

Wages do not count as cost of goods sold, so who cares about a couple of extra hires? Funding is easy.

Also you severely underestimate the amount of work that goes under DevOps. Everything from build servers to test and security infrastructure is usually handled by DevOps. It's a massive surface area and it would be way worse without kubernetes.


Well, if you did DevOps the way it is meant to be done, the idea is that the developers do the minimal extra effort of the ops part, and you no longer need that role itself. So, depending on what a company means by DevOps, it could mean developers willing to do that bit extra, and clearly we need ever more developers; or they could understand it as just a modern kind of ops, in which case they are NOT doing DevOps. Kubernetes has its complexities, but it certainly doesn't require more people than previous methods. What it does require is people with new skills, and lots of companies want to move away from their old setups to this new one and therefore need to fill those roles, while people are only beginning to re-educate. Add to that that more companies are doing more IT in more kinds of business segments, and that in many countries the bigger generations are retiring with fewer people coming after. DevOps, true or not, is hardly the only field with a lack of educated people. You will find the same in lots of engineering fields.


I think that K8s _could_ make the process of maintaining CI/build/deployment tasks much more hands-off and automated. But in practice we often use this new "power" as an opportunity to invent more fancy/smart things to do, taking advantage of it to the point where it requires as much if not more maintenance than whatever came before.


The perception of a "need for so much dev ops nowadays" stems, in my view, from a skills mismatch in the market.

Division of labor and specialization are two natural results of any rapidly evolving industry. The tech industry is no exception. The problem is that there are currently comparatively quite a lot of ways to learn the skills necessary to become a competent web developer, backend engineer, data scientist, etc. Compared to these titles, the ways to learn the skills involved in designing, operating, and maintaining scalable cloud infrastructure have not kept up with market demand.

Kubernetes is not "the" solution, but it is one of several solutions to the problem of standardizing a trade skill for the purposes of making it transferrable. Nobody wants to go to work at a place where the skills they need to do their job well are both difficult to acquire and completely useless once they get a new job.


> If Kubernetes is the solution...

Yeah, you lost me already. This is a bit like asking why there are other languages besides Java.


Let me answer your question with another question: if Microsoft Word is so good, why are there so many people whose job it is to produce documents?

Now, I’m not saying k8s is as transformative as the word processor, but if it was, you’d probably expect to see more ops people, not fewer. They’d just be doing different things.


Kubernetes is, amongst other things, a technical solution to the billing problem. Composable resources with unit costs permit organizations to charge profitably for such resources. There are also some advantages for organizations who need to purchase such resources, giving some clarity to the process.


It is a people problem. In a typical enterprise with multiple teams involved in every small thing, you can expect to see teams having their own agendas and justifying their place/time. Kubernetes is an elegant platform to do just that, because it is so extensible.

As part of my job, I see thousands of small and medium businesses happily embracing kubernetes. They do not have separate devops or security or infra folks. A few engineers do it all. Yes, there are challenges with so many moving parts and rapid releases with breaking APIs. But expect that to be fixed in coming years.

Why so many DevOps jobs? Are you noticing the jobs being eliminated, or are you just seeing more DevOps jobs being created?


If you're referring to the days when systems engineers would run named fleets, then yeah, those days were nightmarish, but for different reasons. Instead of debugging the Linux networking stack myself, I had to defer everything to a team that ran a help desk and hope they'd find time to investigate, because I was locked out of access and tooling. I don't miss those days one bit, though it was fun to name components of my fleet.

A lot of companies moved to the cloud because their old data centers (or hosts) were driven by ticket systems instead of APIs and access management. What used to take three weeks was now solved in seconds; the beginning was fascinating without a doubt. Then companies realized they owned none of this virtualized infrastructure and were at the behest of a very large corporation that could make sweeping changes with little to no notice. Kubernetes was pitched as providing extra grease to the gears of the enterprise, and that's not wrong, but it is not its total value to the enterprise.

The real value in Kubernetes is running your own platform on someone else's hardware, especially to the degree where you can eventually free yourself from the cloud-provider lock-in that the above incurred. As an example, if a company can spin up a team to create a database-as-a-service on its Kubernetes clusters, then its RDS costs can shift dramatically down, and it develops a new level of capability and understanding that the company never had before.

I'm an SRE-SE, but I mostly use the title "Distributed Systems Software Engineer" because I feel that really fits what I do. DevOps is just a catch-all title for non-application-software tasks and roles at this point, because it has consumed so many things like "release manager", "QA", "application operations", etc... Personally, I do not trust companies or teams that use this as some sort of distinguishable title.

To answer the last part of your question, "Why are DevOps everywhere?": because companies have diverse needs in terms of supporting software and software development, and DevOps is basically the catch-all email address of software engineering.


My glib response is that automation is a lot of work.


> one could argue that the role of sys admins just got more specialized

Read the introduction to the SRE book, available free online [1], and you'll see that SRE is defined _in contrast to_ systems administration. It's specifically defined as software engineering with the goal of managing operational complexity.

Modern shops' failure to understand this (most SREs haven't read any of the book, let alone stopped to think what SRE actually means) is IMHO a primary factor in the failure of most "devops transformations"

[1] https://sre.google/sre-book/part-I-introduction/


Kubernetes isn't a solution. It just pushes the problem down the line. The fundamental problem is that most software developed these days isn't packaged properly. The devs just ship the code to production, and the hacks they used to get development going stay in production.

For example, you don't need a docker container to deploy MySQL, although you can deploy MySQL inside one. But most development processes are so badly managed that one product has many conflicting libraries and dependencies, eventually requiring each component to be isolated within its own container, and finally leading to an unmanageable number of containers requiring Kubernetes to manage the mess.


From what I've seen, most developers either aren't systems thinkers or are too busy to take a step back and spot and eliminate redundancy. The best way I can explain this is that many software processes and pipelines within companies are usually complex Directed Acyclic Graphs and very often Transitive Reduction [1] is not applied to these processes.

At the end of transitive reduction, you end up with a graph with all redundant dependencies removed but functionally, it is still the same graph.

[1] https://en.wikipedia.org/wiki/Transitive_reduction


Because it solves 5 problems while creating 4 more in the most complicated way possible. Seriously. It makes some things better, no question, but it is very complicated for some (most) people and is very time-consuming.


I think it comes down to Jevons paradox: if your demand is unbounded, making production more efficient doesn't reduce inputs, it makes you produce more. In the case of k8s: more reliable, more scalable, etc.


To put it simply, anything that increases efficiency most likely increases the desire to scale. There is a ton of demand because everyone wants to scale to billions of users. Kubernetes is one way to get there until the next thing comes along.

Also, there is more demand than supply. Everyone wants to do Kubernetes and DevOps pipelines, but the number of folks experienced in those fields is small compared with demand.

It requires knowledge in many domains because it abstracts the entire data center. So you can’t just take a mid level sysadmin or developer and expect them to jump right in.


I can't speak for everyone, but Kubernetes lets my company do more with the same amount of manpower, so we use that manpower to do more stuff rather than reduce our SRE/sysadmin footprint.


Kubernetes is a bad approximation to the infrastructure solution at a place that has different problems than you have. It only complicates and makes everything worse and more expensive to maintain.


Something else to consider: what % of server workloads actually run on kubernetes?

I have no data to back this up, but my hypothesis is that if you zoom out, and look across the entire industry, the % is vanishingly small. It may seem like every company is running or adopting kubernetes within our bubbles but our perspective is biased.

(Note: I'm not espousing an opinion on kubernetes itself, just about its total adoption across the entire industry and how that affects the number of devops/sysadmin/SRE roles.)


My short hot take on DevOps and infrastructure as code: "Infrastructure as code has it backwards"

-------

Take the development of programming as an analogy:

* Punched cards

* Programming in assembler

* Goto

* Callable procedures

* Proper functions

* Compiled languages (There used to be companies just selling a big C compiler)

* Interpreters/JIT compilation/...

* ...

-------

And here's a similar progression:

* Servers in your basement

* Some server rented where you login via SSH

* Docker/Kubernetes/Clusters in the cloud

* Lambdas and other serverless solutions

* ...

As a sibling comment pointed out: We're still in the stone ages. Somewhere between punch cards and proper functions.

-------

To rephrase it in reversal: "Infrastructure as code has it backwards"

Right now, we manually partition our code, then provision infrastructure and then push our code to this infrastructure.

Instead, we should take a high level language that compiles to the cloud:

Just write your business logic and the compiler figures out what clusters/services/event-buses/databases/etc to use; it will automatically partition the code, package, build, provision, push, and update. And there's even room for something like JIT: based on the load that parts of your logic get, the compiler could switch databases. Also: automated data migrations based on your code updates. But I guess we'll end up with a big distributed virtual machine that scales infinitely and completely hides the existence of servers.

There's already some glimpses of this future: No-code, the magic pulumi does with lambdas, several language projects that get rid of the file system and just store the AST in a DB, smart contracts where you pay for single computation steps...

-------

But back to the question: Kubernetes/AWS/etc is a lot of work because it's not really THE SOLUTION.


I believe a real glimpse of the future you're describing can be seen with AWS CDK's L2 or higher constructs: https://github.com/aws/aws-cdk

We already have the tools to have a cloud-scale application dynamically redefining and re-allocating its infrastructure. Give those tools some time to mature, and I'm sure the capabilities will be awesome.

A good example is the common database backend. A lot of groups have some RDS instance provisioned some way to somehow serve some frontend. Over time the number of users grows and the RDS needs more resources provisioned for it. In my last 4 jobs, it was always someone's job to check the provisioning, or respond to an alert, and bump the provisioning up to whatever the next rung was. In the not-too-distant future, this type of task will be handled by a mildly intelligent CDK application.


That sounds a lot like the original Google App Engine. They'd provide a lot of power, but you had to restrict yourself to its design choices to use it. Then it could 100% manage your deployment.

I don't see people wanting to give up control to get a platform like that any time soon.


The problems weren’t simplified. The problems were collected together into a single large platform.

However, as with most large platforms, they require ceremonies and priests (devops engineers). Someone has to make the offerings.

Much as people would like to believe, you don’t reduce complexity, you just shuffle it around and there’s an exchange rate. Even with solutions like Fly.io, you’re not getting rid of complexity in aggregate, you’re paying them to manage it (I.e. the exchange rate).


The DevOps jobs are to configure and maintain Kubernetes. The problem is K8S is a general-purpose solution that's being used for managing an application comprised of container images. It's way overcomplicated for that task, but that's already been noted here in this thread. I know of proprietary solutions that greatly simplify DevOps compared to K8S, but they're proprietary.


Because they're also doing more things. 20 years ago you might have a single server in the broom closet handled by the sysadmin and developers running tests locally (if you were lucky), nowadays we want all those things you mentioned for production, and CI/CD for developing.

I'd wager providing all those things 20 years ago without k8s and CI tools would've required relatively more sysadmins


Kubernetes solves a subset of your usual deployment problems and replaces it with a set of its own. I'd call it a tradeoff, but it's such a leaky abstraction that unless your Kubernetes fu is really strong it's mostly going to make your life harder. It's a nice keyword to have in your CV though. Most jobs that "require" it don't actually use it.


The answer to your question is this: _complexity does not go away, we just move complexity to another layer_.

Kubernetes buys the org some things, but it is complex and you have to know how to write the app in a certain _way_ in order for the app to be "scalable, etc.".

There are no free meals or as someone smarter than me said long time ago "there is no royal road to learning".


When compared to a Linode VPS box that you provision and setup with Ansible yes, it's much more work (and much more cryptic at that) but also Kubernetes covers for a lot of failure scenarios that a simple Linux box would not be able to cope with while adding many other benefits.

The question is: do _you_ need this added complexity? That humble VPS can scale _a lot_ too.


DevOps' days are numbered. What you see is the wide adoption of DevOps in every industry. This trend is likely to plateau and decline in the next couple of years, after which DevOps practices will be taken for granted and become an expectation from customers. The DevOps problem only needs to be solved once, and public cloud providers are almost there.


Companies that say they’re doing DevOps are like countries with ‘Democratic’ in their name.


If you use Kubernetes, you need custom operators and controllers in order to have a feature-rich environment that can support your applications and all your CI/CD instrumentation.

Designing, implementing and maintaining all these extra elements is why you need a devops guy. Not to mention how extremely fast things are moving in the cloud era.


Indeed, a lot of things in k8s are deprecated every 6 months, and in fact your super solid, future-proof stack is completely useless after a year. It's like building on quicksand.


Clouds have multiple conflicts of interest in favor of:

1. Dethroning sysadmins by introducing devops in the middle ("devs" capable of deploying to the cloud but unable to control the OS).

2. Increasing CPU and other resource consumption (promoting heavy frameworks that are unable to pass the Doherty threshold in 2022).

For clouds, increasing complexity and costs almost always expands the business.


The cloud providers haven't closed the DevOps loop yet, that's why. I mostly have experience with Google's stuff, so I will take Cloud Build as an example. It provides the framework of CI/CD, but there isn't automatic build+deploy for every software and framework ecosystem.

What I'm trying to do at work is simplify the build ecosystem for all languages to the familiar `configure ; make ; make test ; make install` sequence that works well for OSS. If every ecosystem fit into that metaphor, then the loop could be closed pretty effectively by any cloud provider: let users add repositories to the CI/CD framework and it would do, e.g., a standard docker build (configure, make, test), a docker push (make install, part 1), and a kubectl rollout to k8s at the end (the remainder of make install).
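
A hedged sketch of what that closed loop can look like as a Cloud Build config; the image, cluster, and deployment names are placeholders:

  steps:
    # "configure; make; make test" happens inside the Dockerfile
    - name: 'gcr.io/cloud-builders/docker'
      args: ['build', '-t', 'gcr.io/$PROJECT_ID/my-app:$SHORT_SHA', '.']
    # "make install", part 1: push the image
    - name: 'gcr.io/cloud-builders/docker'
      args: ['push', 'gcr.io/$PROJECT_ID/my-app:$SHORT_SHA']
    # "make install", part 2: roll the new image out to the cluster
    - name: 'gcr.io/cloud-builders/kubectl'
      args: ['set', 'image', 'deployment/my-app', 'my-app=gcr.io/$PROJECT_ID/my-app:$SHORT_SHA']
      env:
        - 'CLOUDSDK_COMPUTE_ZONE=us-central1-a'
        - 'CLOUDSDK_CONTAINER_CLUSTER=my-cluster'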

Blockers:

Liveness and readiness checks are not automatic; they need to be part of each language/framework combination so that developers don't have to implement them by hand. At Google they just sort of came with the default HTTPServer class in your language of choice, with callbacks or promises you knew you had to invoke/complete when the instance was ready to serve. It helped that only 4 languages were officially supported.

Integration tests have no standard format, many deployments are combinations of artifacts built from multiple repositories, and configuration is not standardized (configmaps or ENVs? Both? An external source of feature flags?), so all integration tests are manual work.

Metrics and SLOs are manual work; only humans can decide what actual properties of the system are meaningful to measure for the overall health of the system beyond simply readiness/liveness checks. Without key metrics automatic rollouts are fragile. This also means autoscaling isn't truly automatic; you need quality load metrics to scale properly. Not all services are CPU or RAM limited, and sometimes the limit varies depending on traffic.

All that said, cloud functions (Google, AWS, or other versions) are beyond DevOps. If you don't need high-QPS services then use cloud functions. They bypass 90% of the headaches of having code running on https endpoints. Most people don't have high-QPS (10K requests per second per shard) services, and could probably get away with cloud functions (1000 RPS on GCP). Everyone else pays the DevOps or hopefully the SRE tax for now. But we're still trying to automate ourselves out of a job; we don't want to be doing DevOps day-to-day either.


Kubernetes just solves the long-standing problems at a certain level of infra.

But like everything else in tech, solution of the problems at a given level enables everyone to do much more and build more complex systems that do more complicated stuff on top of that level. We always push the frontier forward with every new solution.


In short - Kubernetes solves a lot of very complex problems. The problems are complex enough that the solutions are also complex and require specialized knowledge to implement well. Most teams using Kubernetes probably shouldn't be, but tech companies like to over-optimize for future scale.


This is the same thing when people say going to the cloud is not easier.

It’s not.

You still need Devops staff.

Cloud just provisions the hardware and OS. You still have to be responsible for the apps. You still have to be responsible for IO, memory, cpu and networking capacity.

You still need to make sure your apps are able to run on cloud - whether metal or k8s.


Kubernetes is not the solution to completely automating ops so that you don’t have to employ anyone

The solution to that is PaaS (Platform as a Service), and you can start a startup with almost no devops knowledge using things like Heroku and its myriad competitors, from startups to AWS offerings.


Kubernetes and co do not reduce the amount of work - it's just shifted to the next abstraction layer. Back when DevOps meant "you build it, you run it", we removed the dedicated ops teams to which code was thrown over the fence, to reduce the animosity and friction between dev and ops. This was great, but now all the hip companies have dedicated ops teams again, only now they are called "platform teams". Instead of code artefacts, it is now containers that are thrown over the fence, and the ops part has become so complex that separating dev and ops again seems reasonable.

Luckily for me, I managed to keep the good old DevOps way of working, developing code and running it on bare metal servers with FreeBSD and jails - even converting an existing Kubernetes setup back to bare metal. In my opinion the platformisation of the internet infrastructure isn't a desirable state, and neither are monocultures, and for the vast majority of projects kubernetes is overkill, as they won't ever reach the scale that would justify a kubernetes setup. It's like the mileage fear for EV cars - but I guess everyone wants to hit facebook or google scale, and that desire misinforms the early infrastructure architecture. That is just my 40-year-old greybeard view, which you can happily ignore whilst flying amongst the clouds :)


It’s the recognition on the part of companies that cloud providers don’t provide a turnkey solution.


Because K8s is the very end of a long road, and even when that is done and setup, cloud eng work, shifts to CI/CD, data eng, significant networking maintenance, and IAM/account wrangling will keep the devops'ers employed. SRE is a golden goose job IMO


> The platform of choice is mostly Kubernetes these days

Is it? These days I see SAM or Serverless Framework or other FaaS solutions all around me and it seems that everyone is migrating away from ECS/EKS/containers, it might be my own particular bubble though.


You're looking at only infrastructure costs, but not at benefits. Being able to autonomously deploy an application in production increases your team's velocity by orders of magnitude, e.g. faster time-to-market, faster feedback loops, etc.


DevOps is basically the "tool smith" from The Mythical Man-Month's surgical team model. Any sufficiently large (>10) team of engineers will benefit immensely from a specialist focused on improving internal developer efficiency.


Even though Kubernetes could reduce the workload and might require less manpower in some cases, it's still a beast that requires management. So DevOps has shifted from managing traditional infrastructure to managing Kubernetes configurations.


DevOps is supposed to be part of the software development team, NOT a separate department. That’s the difference between SysAdmins and DevOps. It’s in the name! Developers (on a team that run) Operations (of the team’s products). DevOps.


I think it was originally pushed as a way to get more people to use cloud platforms. And who better than Google to host that which they created?

Luckily it's from the functionally-less-evil Google days and is open source, so it is possible to use it anywhere.


Kubernetes is just a heavily overengineered and overmarketed thing. Let’s face the truth.


It's like saying that you won't need human workers because you'll have robots doing the work. Aha, sure, but who is going to program those robots?...



Because it primarily wasn't built for developers. It was built to keep sys admins relevant and give vendors a common place to sell their vaporware.


While I don't know who Kubernetes was built for, it's certainly developers who are pushing for Kubernetes.

The majority of SREs and sys admins I know don't really want to run Kubernetes. They'll do it, if that's what's called for, but it just adds complexity to trivial problems.

Developers want Kubernetes because it's quick to deploy containers, networks, load balancers, anything they want. That's fair; traditional hosting doesn't have much that provides that level of flexibility or that will allow you to get off the ground at that speed.

The issue is that, as a platform, Kubernetes isn't that easy to manage. It certainly has improved, but it can be difficult to debug when it breaks. Nobody wants a 3AM page that a Kubernetes cluster is down. We've even seen companies say that they'll just stand up a new cluster if something breaks, because it's faster. Add to that the complexity of the components deployed inside your cluster. As an SRE it absolutely sucks to get the "hey, the network's broken" from a developer, because that means that you now have to be an expert on WHATEVER load balancer or CNI they've decided to pull in.

As great as much of the stuff you can deploy in Kubernetes is, it's still harder to debug than an application running on a VM or bare metal server, with a "real" network.


  > Developers want Kubernetes because it's quick to deploy containers,
  > networks, load balancers, anything they want.
  > [...]
  >
  > As an SRE it absolutely sucks to get the "hey, the network's broken"
  > from a developer, because that means that you now have to be an
  > expert on WHATEVER load balancer or CNI they've decided to pull in.
This is why the "DevOps" approach exists.

If a developer wants root in the cluster so they can deploy some strange low-level plugin, then they need to be the ones carrying the pager.


I keep wondering when the systemd folks will come up with an orchestration layer over systemd-nspawn/systemd-machined to replace Kubernetes.


I'd love to see a movement where more engineers write in-house tooling to solve technical problems, rather than adopting existing, heavily promoted tools.


I've seen a general blurring of the lines between these roles. But a common theme is that if you have a dedicated "role" for something, they will prefer tools which cater to their "role". This is both a good thing for companies who benefit from further optimization within that "role", and a bad thing for companies who do not.

Kubernetes is a powerful tool for "DevOps" roles. It provides an immense array of configuration, and largely replaces many OpenStack, Xen, or VMware-type environments. You can build powerful internal workflows on it to colocate many services on one compute fleet while maintaining developer velocity, which can translate into large margin improvements for some firms. This comes at the cost that you are likely to need a Kubernetes team, and potentially a dev tooling team, to make it all work. In a large compute environment, the latter costs don't affect the big picture.

Now on the other hand, more teams than you would expect are just fine on Heroku/AppEngine/AppRunner/Lambda. These teams tend to pay the cost of not having a dedicated dev tooling team through more expensive compute, and sub-optimal tooling. The benefit here though is that "more expensive compute" may mean a fraction of a salary in many environments, and "sub-optimal" tooling may mean a production grade SaaS offering that has a few rough edges you talk to the vendor about.

IME it's much cheaper/lower risk to choose the latter in the long run. The apparent savings from option 1 eventually turn into tech debt as the shiny tools get old, and migrating to newer/cheaper compute options becomes more expensive. I once built a colo facility which resulted in a 4x reduction in monthly recurring expenses (including salaries) for the same compute footprint; 1 year into the lifetime of the facility, the former cloud provider reduced prices by ~30%. Around 6 months into the facility the DataScience team suffered attrition, resulting in fewer compute needs. At the 1.5 year mark the team begged for a flip to SSDs as they were having latency issues (a point of initial contention with the team, which felt SSDs should have been used in the first place). Over the 3 year expected lifespan of the facility there were about ~2.5 months of ramp up/migration work which impacted ROI.

Overall, in hindsight, I'd say at best we achieved a 1.5x reduction in compute expenses compared to the alternative of tooling improvements, cloud cost reductions, and compute optimization. I now seek the tool which provides the lowest friction abstraction as at the worst case I can simply migrate to something cheaper - investing in compute infra has a crazy level of depreciation.


Here's my thought on the current state of the industry. DevOps at some point was not a specialty that you hired for, it was a way of thinking about your team's responsibility. Your team would make an application and your team would run that in production. If you wanted to test things before deploying, you would do that. If you wanted automated deploys, you would set that up. No middleman with competing concerns between you and your users.

Eventually, people had a hard time finding well-rounded individuals that could design, develop, test, and deploy software. It seems to be a rare skillset, and people are resigned to not being able to hire for that kind of role. So, all of these ancillary concerns got split off into separate teams. You have a design team, a software engineering team, a test engineering team, operations, and so on. DevOps changed from "developers that operate their software" to "developer operations", which is just your 1990s operations team with a new name. You the developer want something, it goes on a backlog for some other team, you wait 6-8 years, you get your thing.

All the complexity of the devops world comes from having one team writing the software and one team running the software. Service meshes are an example: they are super popular right now, and everyone and their mother is writing one and selling it for tens of thousands of dollars per year. To the software engineer, having two applications communicate over TLS is pretty simple; you read the certificates and keys from disk or an environment variable, throw them into a tls.Config, and give that tls.Config to your internal servers and internal clients.

But what happens in the real world is that the organization says something like "all apps must use mTLS by January 2023". The software team says "meh, we don't care, we'll get to it when we get to it". So the poor devops team is stuck figuring out some way to make it work. The end result is a Kubernetes admission controller that injects sidecars into every deployment, which provision TLS keys from a central server at application startup time. The sidecars then adjust iptables rules so that all outgoing connections from the original application go through the proxy, and if some distributed policy says that the connection is supposed to be mTLS, the proxy makes that happen.

Basically, because nobody on the dev team was willing to spend 15 minutes learning how to make this all work, it got bolted on by $100k worth of consultants, all for a worse result than just typing a very small number of lines of code yourself. That's the state of devops: the people writing the software won't run it, so you have to add complexity to get what the organization wants.
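
To make the tls.Config point concrete, a rough Go sketch of doing mTLS in-process might look like the following (the file paths and port are placeholders, not a description of any real setup):

  // A rough sketch of mutual TLS wired up directly in the application,
  // instead of via an injected sidecar proxy. Paths are placeholders.
  package main

  import (
      "crypto/tls"
      "crypto/x509"
      "log"
      "net/http"
      "os"
  )

  func mustTLSConfig() *tls.Config {
      // The service's own certificate and key, e.g. issued by an internal CA.
      cert, err := tls.LoadX509KeyPair("/etc/certs/service.crt", "/etc/certs/service.key")
      if err != nil {
          log.Fatal(err)
      }
      // CA bundle used to verify peers: both the servers we dial and the clients that dial us.
      caPEM, err := os.ReadFile("/etc/certs/ca.crt")
      if err != nil {
          log.Fatal(err)
      }
      pool := x509.NewCertPool()
      pool.AppendCertsFromPEM(caPEM)

      return &tls.Config{
          Certificates: []tls.Certificate{cert},
          RootCAs:      pool,                           // verify servers we connect to
          ClientCAs:    pool,                           // verify clients connecting to us
          ClientAuth:   tls.RequireAndVerifyClientCert, // this is what makes it mutual
      }
  }

  func main() {
      cfg := mustTLSConfig()

      // Internal client: reuses the same config for outgoing calls.
      client := &http.Client{Transport: &http.Transport{TLSClientConfig: cfg}}
      _ = client

      // Internal server: requires and verifies client certificates.
      srv := &http.Server{Addr: ":8443", TLSConfig: cfg}
      log.Fatal(srv.ListenAndServeTLS("", "")) // cert and key already live in TLSConfig
  }

Certificate issuance and rotation still need some tooling, of course, but that is arguably a much smaller surface than proxying every request through an injected sidecar.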

I think it's terrible, but that's the fundamental disconnect. When you need to change how software works without being able to edit the code, the workarounds get increasingly complicated.

As always, what looks like a software complexity problem is actually an organizational complexity problem. The way I've managed this in the past is to organize a lot of knowledge sharing, and make a conscious effort to avoid hiring too many specialists. At my current job my team used to make a SaaS product, and our team was a combination of backend software engineers, frontend software engineers, and some engineers with experience with operations. We were one team; frontend engineers would write Go code, backend engineers would make React changes, and we all did operational drills ("game days") once a week. The result was a very well-rounded team. Everyone could deploy to production. Everyone could be on call. Everyone could fix problems outside of their official area of expertise. I wouldn't have it any other way. The industry, however, deeply disagrees with that approach. So you're going to have testing teams, devops teams, etc.


Why aren’t we working 3-day weeks when we have all this automation power today? Like all things in life, the bar just gets raised.


If x is the solution, why are there so many <x related> jobs?

Economics says that as something becomes cheaper, demand increases.


Because coding in Yamllang and Jsonlang is superior to old and archaic languages like Rust and Golang.


Wait a minute, but Rust and Golang are new!


"To Kubernetes! The cause of - and solution to - all of life's problems!"


Scale: we are using so much more IT infra now. Old sysadmin ways don’t scale well.


Yes, it is the solution for keeping DevOps employed and happily compensated.


So, I am old enough that when I started my career I was just a "system administrator" who happened (rather luckily) to work primarily with BSD and Linux servers. At that time, I was still learning a lot. I eventually learned enough and gained enough experience to become a "systems engineer", which meant that I could architect solutions for customers of my employer. I then became a senior systems engineer. Throughout this entire time, things like Chef, Puppet, Ansible, and Salt were not widely used, even after they were created. Red Hat pushed Ansible really, really hard once it came out, and config management became a thing.

The combination of config management systems with containers created two new roles: DevOps and SRE. Servers became VMs, which in turn became container platforms. Config managers took the place of version control and a bash script. CI/CD became weirder.

In times past, you would have something like HAProxy on FreeBSD, which would then send traffic to Apache/Nginx servers, which in turn sent traffic to PHP servers, which pulled data from database servers and an NFS cluster. Now, behind the scenes, you may still have HAProxy or other load balancers, but those are combined with something similar to OpenStack with an underlying storage system like Ceph. All of that may get partnered with geo-aware DNS if you're really fancy. Systems engineers and admins are still managing that stuff behind the scenes at Azure, AWS, Google, Rackspace, Cloudflare, DigitalOcean, and other places (or at least I imagine so). There are also engineers who specialize in OpenStack. Most, however, have transitioned to the new roles of DevOps or SRE, because the need for highly skilled SEs and SAs has waned.

Essentially, these roles have narrowed the focus of system administrators and systems engineers. In one, you are concerned with CI/CD; in the other, you are building and maintaining cloudy solutions for people. This is yet another layer of abstraction, but it also means that most people no longer know how to configure the underlying software. Because they lack that knowledge, they rely on automation frameworks; they no longer know how to automate their workflows with Bash, Ruby, Python, or anything else. They need the cloud system to do it for them, which means they get very vendor-locked.

EDIT: the plus side of a new abstraction layer is cheaper tech departments at non-tech companies (fewer and cheaper personnel), which also means that pretty much everyone wants to be a software developer now and very few people want to be SAs, SEs, DOEs, or SREs; you have to know every bit as much, but you get paid much less.

All of this may bust. Increasingly, people are becoming wary of monopolistic tech giants. The toll their datacenters take on the planet is increasing rapidly. The governments of the world are growing wary of their increasing power. For businesses, complete reliance on a third party that has vastly more power isn't as palatable as it used to be. We may see a resurgence of smaller DCs and bare metal deployment, but any such change would only happen if another massive tech bust occurs. The reality I see is that both models may live in tandem indefinitely, as there are differing use cases that make each more suitable.


Because Kubernetes automates away 4 jobs by creating the need for 5.


BTW, Kubernetes is just a scheduler: you give Kubernetes a definition and it will schedule things according to that definition. Everything else is basically just an add-on.
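
As a rough illustration of "give Kubernetes a definition", a sketch using client-go might look like this (the deployment name, image, replica count, and kubeconfig path are placeholders): you hand the API server a desired state, and the scheduler works out where to run it.

  // A sketch of handing Kubernetes a definition via client-go.
  // Everything here is illustrative; error handling is minimal.
  package main

  import (
      "context"
      "log"

      appsv1 "k8s.io/api/apps/v1"
      corev1 "k8s.io/api/core/v1"
      metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
      "k8s.io/client-go/kubernetes"
      "k8s.io/client-go/tools/clientcmd"
  )

  func int32Ptr(i int32) *int32 { return &i }

  func main() {
      // Build a client from a kubeconfig (path is a placeholder).
      cfg, err := clientcmd.BuildConfigFromFlags("", "/home/me/.kube/config")
      if err != nil {
          log.Fatal(err)
      }
      clientset, err := kubernetes.NewForConfig(cfg)
      if err != nil {
          log.Fatal(err)
      }

      // The "definition": three replicas of an nginx container.
      dep := &appsv1.Deployment{
          ObjectMeta: metav1.ObjectMeta{Name: "hello"},
          Spec: appsv1.DeploymentSpec{
              Replicas: int32Ptr(3),
              Selector: &metav1.LabelSelector{MatchLabels: map[string]string{"app": "hello"}},
              Template: corev1.PodTemplateSpec{
                  ObjectMeta: metav1.ObjectMeta{Labels: map[string]string{"app": "hello"}},
                  Spec: corev1.PodSpec{
                      Containers: []corev1.Container{{Name: "hello", Image: "nginx:1.25"}},
                  },
              },
          },
      }

      // Hand it to the API server; the scheduler decides where the pods land.
      _, err = clientset.AppsV1().Deployments("default").Create(context.TODO(), dep, metav1.CreateOptions{})
      if err != nil {
          log.Fatal(err)
      }
  }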


Kubernetes is insanely complex and modular. Just yesterday I was looking at the source code, and the part of the code I knew had been replaced by yet another pluggable system. Instead of consolidating into a well-understood set of features, Kubernetes is exploding with complexity, so it's almost impossible to "build it yourself" for a production environment.

However, there are plenty of companies that will sell you a system, including varying levels of support. You then, of course, have to hire your own DevOps engineers that will deal with the areas the support doesn't cover, which, given the complexity, is still an awful lot. Or you do everything in-house, which means hiring even more people.

TL;DR DevOps engineers won't be out of the job anytime soon. Same for Kubernetes developers.


Check out jetpack.io; they are trying to solve exactly that.


Because kubernetes is both the problem and the solution.


because it's an overengineered hype that does not reduce complexity, only shovels it around, turning simple problems into obscure ones


It isn't the solution :)


Tl;dr: Kubernetes is not "the platform of choice". There is no universal tool. That's why you need system architects, DevOps, etc.


Instead of 3-4 DevOps guys, you now only need 1-2 really good Kubernetes guys.


DevOps isn't a job. DevOps is a system to work with people directly and find out what they need and give them things that enable them to get their job done faster, while also getting enough information to make sure the product stays online and reliable. What people call "a DevOps role" today is just sysadmin or sysop or syseng or SRE.

Back in the day we cobbled together solutions out of different parts because it gave us a strategic advantage over monolithic commercial solutions. It was cheaper, but it was also easy to customize and fit to product & user needs. Yes, configuration management was a nightmare, and it came back from the dead as Terraform, because instead of an OS with mutable state we now have a Cloud with mutable state. Docker and Packer and a few other solutions have fixed a lot of the mutable state issues, but databases are still flawed and SaaS is still just a mucky mess of unversioned mutable state and nonstandard, uncomposable, poorly documented APIs.

With Kubernetes, we're back in the land of commercial monolithic products. Technically you can build it yourself and customize it all, but it's expensive and time consuming and difficult because of how many components there are tied together. It "gives you everything you need" the way the International Space Station does. Do you need a space station, or a barn?

People get so wrapped up in terminology. Declarative doesn't mean anything other than "not procedural"; it's not better than procedural, it's just different. Plenty of declarative things are a tire fire. Infrastructure as Code just means "there is some code that is committed to git that can manage my servers". A shell script calling AWS CLI is IaC. Doesn't make it a good solution.

You can't just install a piece of software and be done. That's the entire point of the DevOps movement, really. It's not about what you use, it's all about how you use it. Work with humans to figure out what will work for your specific situation. Use your brain. Don't just install some software because it's trendy and hope it will fix all your problems.


>DevOps isn't a job.

And Agile isn't Scrum, but once again a buzzword became the catalyst for a change that the buzzword isn't even "supposed to" represent.

It's our fault for never learning our lesson about buzzwords.



