A developer goes to a DevOps conference (darkcoding.net)
405 points by fanf2 on Sept 28, 2019 | 291 comments



This is sadly the state of the current mainstream DevOps "movement". It is no longer anything that could be described as a movement, only as an industry, a set of tools, and a thing a few years back that CTOs in laggard companies announce to the board that they need to be doing.

It was supposed to be about efficiency and repeatability, and approaching ops with a SWE mindset; but most importantly taking ownership of your shit, end-to-end; which is something that the best and most effective SysAdmins and Software Engineers always did.

We don't hire "DevOps Engineers", rather "Platform Engineers", for which we have a hard requirement that you are approaching competence as a software engineer in at least one language, and in at least one paradigm (e.g. you should be able to tell me about type systems, data structures, polymorphism, higher order functions, composition vs inheritance, referential transparency, TDD etc).

We also expect that all of our backend software engineers deploy and maintain their own infra (as code), using the guardrails/services/systems provided by our platform team. Deploying a new database cluster for a user-facing service is a pull request from a backend engineer, not a Jira ticket for a "DevOps Engineer"

There are companies out there "doing it right", but they are in the minority.


This was bound to happen though.

Everyone wants an engineer who can do everything competently, but few places are actually willing to compensate to recruit and keep the people who have those skills. Even worse, some of the places that can afford the right people - they have internal politics that prevent these people from being able to drive meaningful change once in the role. Good people will leave environments like that and they'll be left with people who are either under-utilized or who caused internal conflict in the first place.

Every executive loves to hear "ownership", "get rid of silos", "speed up release time", "repeatable deployments". They don't like hearing "increase salaries dramatically", "piss off some existing employees", and "hiring is going to be even harder".

So what happens instead? Middle-management types decide to train their existing teams to use tools that are associated with "DevOps", update some job titles, and tell the investors and executives they now have a DevOps team. The existing employees are now happy to learn new tools and feel appreciated, the migration to the new tools may or may not solve some old lingering pain points (while also introducing some new, un-predicted pain points that the new DevOps team will happily resolve and write RCAs about, allowing leadership to think that "dev culture" is taking hold).

The fact of the matter is, "doing it right" is very expensive and not necessarily going to save every company costs or increase their revenue in the long run. Sometimes just using better tools and adopting better processes is enough to see benefits, and that's ok. If running infrastructure is your company's core competency, then it makes sense to invest in extremely skilled people at all levels that touch infra. But those extremely skilled people are expensive, prone to turnover, and tend to be picky about where they work.

So I don't think we should shame companies and engineers for shallow adoption of DevOps tooling and imply they're subverting the DevOps "movement" or whatever. There is room for many roles under the DevOps umbrella, and just because some places aren't immediately restructuring everything doesn't mean they aren't learning or won't contribute back to the greater community at some point in the future.


I have been lurking HN for a couple of years, but this comment made me create an account. I am all too familiar with the problems you're stating here. It is quite frustrating, really. People like myself are hired to change the system, destroy silos - then, they (management, generally) see that we're talented at building infrastructure, or some other task that we do, and they throw us into a traditional sysadmin role, and are confused why they can't hold on to a DevOps/SRE/WhateverEngineer for very long. Then they tell their managers that they have/are doing DevOps because they had a guy build some CI/CD pipelines and build out servers, probably manually 1-by-1 because the tools for automating that aren't allowed and the "DevOps" guy doesn't have entitlements to automate it. Not that I'd know anything about that...


I'm 100% in favor of your view of DevOps, and its devolution from movement to dubious agglomeration of vendors and consultants is something I've seen before.

Like DevOps, the Agile movement started out as a bunch of smart, dedicated people seeking a new way to work. But once the excitement spread out of that passionate early-adopter group, it changed radically for the worse. I think that's because once you get to mainstream adopters, they're not interested in deep change. They want to keep doing what they're doing, but 10% better. Vendors and consultants retool to serve that market, inevitably watering things down and frequently missing the point entirely.

This really frustrated me when it happened in the Agile movement. [1] But I've come to accept that as long as our industry is structured the way it is, it's going to keep happening. It's honestly kind of depressing, but the good news is that anybody willing to build a culture of excellence and put in marginally more work can get much, much better results than their competitors.

[1] I wrote more about that here: http://williampietri.com/writing/2011/agiles-second-chasm-an...


This linked article was excellent and I encourage anyone here to read it.


This is something I've been wondering about. I recently interviewed for a full-stack dev position, and I felt I did quite well on all of the development questions. It wasn't a devops role - they had a whole separate devops position - but I got passed over for another candidate and when I got the news it was suggested that I should "get some devops experience" and maybe try again in the future. I thought that was weird. I know generally what docker/containers do and what purpose things like Kubernetes serve, I'd just never used them. I figured I'd be able to pick up whatever I needed for doing minimal devops tasks in the course of the job. Is it common to expect more than that from "developers"? This was neither a small nor a foolish company.


I can’t speak for how common it is, but in my opinion, it is not unreasonable to ask that developers have at least a reasonable understanding of how the systems that they build actually work, throughout at least the majority of the layers of abstraction that they run upon.

It is though quite unreasonable to expect experience with specific tools, unless you’ve asked for them on the job spec.

The title “full stack engineer” is a whole other can of worms, not a million miles away from this DevOps thread.


I agree with brundolf, these are tools and most people can get a handle on how they work in an afternoon or two. We aren't talking about expecting new hires to set this infrastructure up from scratch, just to use the tools that are already in place.


> full-stack dev position

How can you be a full-stack dev if you haven’t deployed (and kept running) anything in the past few years? Unless you just kept manually rsync-ing files to a server all that time.


You are making a subtle mixup between “I am” vs “I can”.

He didn’t write that he is a full stack dev, he wrote that he interviewed for a full stack dev role:

> “I recently interviewed for a full-stack dev position, and I felt I did quite well on all of the development questions.”


That is a fair point. My main idea was that the questions make sense in context. The ‘you’ in my message should be read as a generic you.


It's a poor excuse if they did not list those skills as required. People need a reason to reject and will use a variety of reasons that may not apply or matter for the job you are applying for, as long as it sounds good on paper.


Werner Vogels has a recent blog post [1] entitled "Modern Applications at AWS" where he says "To succeed in using application development to increase agility and innovation speed, organizations must adopt five elements, in any order: microservices; purpose-built databases; automated software release pipelines; a serverless operational model; and automated, continuous security."

I see the devops role at large cos evolving into dev-sec-ops for the "automated, continuous security" tooling and processes. (Besides the usual tooling for delivering and supporting a reliable network service, cloud engineering, system-level issue resolution in dev and test, load and performance and containerized test automation in the pipelines, failover fire-drills, etc.)

[1] https://www.allthingsdistributed.com/2019/08/modern-applicat...


> microservices; purpose-built databases; automated software release pipelines; a serverless operational model; and automated, continuous security.

Two of those things are absolutely irrelevant to succeed.


I’m guessing microservices and serverless? Curious as a developer trying to get a better handle on how systems should be built/designed.


Yes, that was my intent anyway. I’m sure different people read that differently.


I'm involved in a few different areas at my company, but when I get involved in hiring DevOps types, I try to look for the person you described as a "Platform Engineer." There are far too many times where I talk to people who know what the tools are, but to them it's just a black box that they plug things into, and they only know to use it because that's what someone at their local DevOps meetup group said was cool/useful. I want people who can help devs, not just deploy their code, or give them a system to run on. They shouldn't be afraid of learning a new language (or just looking at it), and should have competency in one already.


Funny, our company recently turned our wonderfully positioned Platform Engineers into Dev Ops Engineering concerned almost entirely with Chef.


>There are companies out there "doing it right", but they are in the minority

Probably true of absolutely everything in the end.


What makes you so sure your company is "doing it right"?


Over the last 10 years, I've become convinced the idealized notion of DevOps is not a thing. I have a couple reasons:

1. There is a huge sysadmin workforce. Many of them never learned to really code. Are these people supposed to go away? Are they supposed to learn to code? Coding is a skill that takes time to master, and if these people were sufficiently motivated to code, they probably would've become coders before now.

2. In general, software engineers don't think about or care about their CI/CD pipeline beyond it being easy to work with. They don't care about their infrastructure except insofar as what they need to know to make their stuff fast. There is a nice separation of concerns there that's nearly impossible to avoid. As a result, it's very easy to build higher order tools to handle most workflows and infrastructure. It's the whole reason AWS can exist and why most devs don't care about the details of ops.

I realize I'm speaking in very broad terms. However, broad terms are what define an industry. I've seen this play out enough times that I knew what this article was going to be about before I even clicked on it. I'm pretty sure most other people did, too.


I’m convinced it’s driven by business communication and objectives more than skill set.

I’ve _never_ worked with a “product manager” or “product owner” that could effectively tell me what an SLO was, or simply tell me what the business impact of bugs was. All I usually get is what we’ll lose if we don’t implement feature X or Y.

This impacts “DevOps” because it almost always puts focus and priority on anything feature-related. In order to do anything not feature-oriented, you literally have to be fighting massive quality problems on a daily basis. Thus... very few organizations understand how to have “devs and ops spend quality time together”. Almost all time not building features is wasted in their eyes.

In the end, this is why there’s a “dev and ops split”. The devs drive features, which PMs understand, and the ops makes things run, and the PO/PM types ignore them but sometimes like to make requests.

I think a precursor to any real DevOps culture is having SLOs that are understood and reported to executives. If that doesn’t exist, DevOps usually ends up being a BS term used to “remain relevant”. Without some kind of business quality objective, real DevOps time is pure cost, and counter to business development.

Just like “agile” BS, any real change usually starts at the top of the company.


I'm not convinced that it's only POs that struggle with quantifying the cost of a bug in production.

I can see the same when programming language advocates extol the benefits of more static analysis for software that does not even remotely need the level of reliability they are aiming for.

The guy who hacks it together in PHP will just get to market first and get such a critical mass of market share that 5-10% more reliability probably won't matter.

Of course a lot of this depends on market sector, self driving cars are a million miles from serving banner ads.


True, many POs/PMs drive a lot of prioritization of work, since (maybe unfortunately), they're usually the ones left to "understanding the business impact".

It kind of adds more credence to people not being so siloed in responsibility. We all just bring different skills to the party. Thus, we're all responsible for quality, for business impact and prioritization, etc.

Unfortunately, I've often had situations where, upon a reorganization, I'm left with managers who love to "guard others from distractions" and end up cutting team members off from broader discussions. Thus, leading to the siloing.

This is why I think "DevOps" ends up meaning nothing. It's a _lot_ of details that requires lots of communication and knowledge sharing that can be overwhelming to, ahem, certain PHB types. Sometimes you can implement a tool, other times you can just not build a particular feature, but the reasons are usually very complex and never easily get broken into a simple one-liner task definition with a story point cost.


The best way to have POs who understand the business value of bugs is to work in traditional industries, where software just has a support role.

Every bug or downtime that gets reported does have a visible business impact.

And unnecessary features seldom get implemented if not covered by project budget.


Whenever I go in to a new client and ask about what their SLIs are, I usually get blank stares from 90% of the audience and maybe _one_ timid reply.


> As a result, it's very easy to build higher order tools to handle most workflows and infrastructure. It's the whole reason AWS can exist and why most devs don't care about the details of ops.

I have 25 years of engineering experience. I'm now a CEO... I think the cloud infrastructure movement is really killing off a valuable skillset - actually understanding your software stack.

My company would die on AWS... our hosting costs would be 8x...


I work on cloud infra (mostly data, but also Linux infra and networking on Azure, since I’m a Solution Architect at Microsoft).

And I agree with you in that one critical aspect, and would like to contribute my viewpoint: Every single successful move to the cloud I’ve witnessed hinged on the people who did it _really_ understanding their software stack.

For them (usually clients and not contractors or integrators doing the move for them) the key aspect of the learning curve was understanding cost drivers in cloud infra (or PaaS), weighing them against their current situation, and _measuring_ alternatives.

I’m often hassled by peers and salespeople for spending “too long talking to the techs” rather than doing pretty PowerPoint pitches, but I’m proud of the engineering work that makes those migrations viable, because a) it is truly full-stack and b) I learn at least as much as the techs I work with as we drill down into their stack and figure out how best to move it to the cloud.

Alas, this kind of depth work happens too seldom (see my other comment).


Do you have some blog posts or something about how you achieve the 8x cost reduction over AWS? That seems like a lot to me.


I think some young people just know AWS because of their dominant marketing, and have no idea what it's like to just have some dedicated servers - which you need for your own security anyway, unless you're comfortable sharing RAM and CPU with others (I'm not https://media.ccc.de/v/33c3-8044-what_could_possibly_go_wron... )

$171.83/month of AWS EC2 (+ bw): 8 core CPU 16GB RAM 250 GB SSD https://calculator.s3.amazonaws.com/index.html

24.99€/month of dedicated: 8 core CPU 16GB DDR3 250 GB SSD Basic Unmetered 1 Gbit/sec https://www.online.net/fr/serveurs-dedies/start-2-m-ssd

$809.57/month 20 Core EC2 dedicated instance RAM? Disk? BW? https://aws.amazon.com/fr/ec2/dedicated-hosts/pricing/

234.99€/month dedicated: 2x Intel® Xeon® Silver 4114 20C/40T 128 GB DDR4 ECC 2x 1 TB NVMe Premium 1 Gbit/s https://www.online.net/fr/serveurs-dedies/core-5-s

I suppose the rest to 8x is on bandwidth, which is not charged on this kind of servers.
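A rough way to sanity-check the ratios from the figures above. This is only a sketch: the EUR to USD rate is an assumed placeholder, not a quoted figure, and the 20-core rows are not like-for-like, since the AWS listing leaves RAM, disk and bandwidth unspecified. Bandwidth charges are left out entirely.

```python
# Monthly prices are the ones quoted above; the exchange rate is an assumption.
EUR_TO_USD = 1.10  # hypothetical rate, adjust to whatever applies


def cost_ratio(aws_usd: float, dedicated_eur: float,
               eur_to_usd: float = EUR_TO_USD) -> float:
    """Return how many times more the AWS option costs per month."""
    return aws_usd / (dedicated_eur * eur_to_usd)


small = cost_ratio(171.83, 24.99)    # 8-core comparison
large = cost_ratio(809.57, 234.99)   # 20-core comparison
print(f"8-core box: {small:.1f}x, 20-core box: {large:.1f}x")
```

With that assumed rate, the compute-only gap comes out well below 8x, which is consistent with the remainder being bandwidth.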


I think it depends on your scale. In most cases the real hosting cost of a dedicated bare-metal server is administering and managing it. So if you need 100 dedicated 20-core servers it is most likely cheaper to do that yourself, but if you need just one it is much cheaper to use some cloud service.


I don't think you ever "just need one", for me one is always two : staging (and ci, by extension ..) and production.

Provisioning an EC2 instance or a dedicated server has the same cost in terms of man-hours: running some automation in a deployment pipeline is something that you'll have to set up and maintain either way.

Be it updating a docker-compose file over ssh with ansible or something else: you have to do it anyway. If you want to cut that cost perhaps try with a starter project from OpenShift or Heroku, but that will also have a bigger cost even than an EC2 instance, and you will still have to take time to ensure your deployment is properly automated, with automated backups as part of the process.


> I don't think you ever "just need one", for me one is always two : staging (and ci, by extension ..) and production.

It's always at least two in production if you ever need it to be HA.


Most projects never get to the point where they need more than 99.9%, which is perfectly feasible with a single server. Interesting remark nonetheless.
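For a sense of scale, here is a small sketch of what an availability target allows in downtime per year. The helper name is made up for illustration, and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(availability_pct: float) -> float:
    """Hours per year a service may be down while still meeting the target."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {allowed_downtime_hours(target):.2f} h/year")
```

At 99.9% that is a little under nine hours a year, which leaves room for the occasional reboot or hardware swap on one machine.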


Prod needs n + 2 if you want to survive losing one (machine, datacenter, whatever) while you have one out for maintenance. Buying three when you need one is pretty expensive; buying seven when you need five isn't as big a deal.

Staging just needs n > 1 to test that sharding works at all.
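The arithmetic behind that, as a small sketch. `spare_overhead` is a hypothetical helper name, and it only counts machines, not their price.

```python
def spare_overhead(needed: int, spares: int = 2) -> float:
    """Fraction of extra machines bought beyond what the load requires."""
    if needed <= 0:
        raise ValueError("needed must be positive")
    return spares / needed


for n in (1, 5):
    total = n + 2  # n + 2: one lost, one out for maintenance
    print(f"need {n}, buy {total}: {spare_overhead(n):.0%} overhead")
```

The two spares are a fixed cost, so the relative overhead shrinks as n grows.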


IMO bare compute isn't really the AWS value prop; if all you need is a VPS (or a non-virtual one) AWS is expensive overkill. The reason to use them is all the peripheral services, tied together with IAM[1][2].

[1] https://forrestbrazeal.com/2019/02/18/cloud-irregular-iam-is...

[2] https://news.ycombinator.com/item?id=19742038


From a purely operational perspective; this is also a reason to avoid them.

If it's throwaway then sure, but building your critical infrastructure around a single provider is really bad business hygiene for the sole reason that you can never control for your dependency increasing costs arbitrarily.

Maybe that's fine for most people, people seem to optimise (or value?) developer time over basically anything else so maybe it's something that's already taken into account.


How much would you have to pay a sysadmin to install, maintain and monitor an in-house hosted server? How much would you have to pay to have daily snapshots of your instance? And what do you do with the old server if there is a change in business strategy and you either need a much bigger server, or no longer need this server at all?


Are you suggesting that someone running in the cloud suddenly has no need of an ops team? This is the danger in drinking the DevOps Kool-aid, assuming that because a developer can manage to not completely fuck-up their dev box they are somehow proficient enough at system administration to manage cloud infra.

Daily snapshots? For the prices listed you could have an actual hot spare running 24x7 and still save money. This also makes surge-up easier and if you ask nicely a lot of decent dedicated hosting providers will let you shrink without holding you to contract terms.


Absolutely correct, if anybody wants to pass an AWS DevOps certification: https://aws.amazon.com/fr/certification/certified-devops-eng... (I won't)


The comparison above is not AWS vs an in house hosted server. It's dedicated bare metal. If something happens with the hardware on those dedicated servers, the hosting company handles it for you.

And even most dedicated servers can optionally come preinstalled with the OS.


Absolutely, it can even install Proxmox if you want something that supports snapshots (I prefer to "just do remote secure incremental data backups")


It's sad really. Every time I hear someone saying devops this, devops that I ask them to explain to me what devops means. You see... everyone has their own idiosyncratic definition and everyone that does not agree with them is an idiot.

to understand what devops should have been we have to step back and think about ownership. When you own the thing you build, from, you know, building it to deploying it, maintaining it and sunsetting it, that's when you can tell me you are doing devops. It's not about a super developer or a super sysadmin - it's about full ownership. In the process you'll do software development, you'll learn about the environment and the hardware the software runs on and most importantly you will learn to never ever ever again overengineer something.

to answer parent: sysadmins have to learn how to write code. if you want to stay relevant and still have a job in 5 years you have no choice. AWS & other big clouds are already eating your lunch. the best transition is to actually learn how to deploy infrastructure in the cloud - and if you're starting fresh, why not actually learn how to code? good developers know their CI/CD pipeline in depth. They care about infrastructure and details. Apart from junior developers, you really don't have the option to live in your box and not care.


Also there are many system admins who can code quick shell scripts in bash/powershell, setup build and CI pipelines, love doing performance analysis with Unix tools and know how to harden systems but have utterly no desire to learn 'industry' languages and large frameworks and spend day-after-day inside an IDE developing software products.

DevOps generally means developers doing sub-standard operations work.


> DevOps generally means developers doing sub-standard operations work.

Mainly because ops cannot be bothered to.

Telling me something is going to take a month when it should take only a few minutes if you’d done your job right the first time means I’m going to be doing substandard operations work.

But if that’s the case, is it really sub-standard?


Personal experience: no, it really is substandard.

It’s easy to pretend that, like say, HR, because you don’t see the work that’s being done, no work is actually being done, and you could do a better job if you’d had the job from the beginning.

However, how often do you get to do greenfield work as a developer?

Right, now imagine you’re not even writing code, just pushing configuration around. Ops is often a constraint satisfaction problem, not an engineering problem.

Developers can come along and whip up a quick ldap/logging/monitoring/pipeline, but when it gets to the long tail of the hundreds of hours of standard practice (ie. you have to write documentation for that thing you wrote) and additional tooling (yes, you do need to write custom integration to active directory because of [corporate policy here]...) it’s a big mess, badly done, by developers who didn’t really want to in the first place, and don’t want to maintain it (and sure as hell not out of hours when it goes down).

TLDR; don’t be ridiculous. Smart people do this stuff professionally and then you come along and think you’re some hot shot who can do it in a few minutes?

Get off your high horse. You have a domain skill set that doesn’t include competence in the domain you superficially think you could easily do.

Try listening to why people say it takes a long time, and work within the constraints of the domain, or you will produce substandard outcomes.

I’ve seen it in so many places, it makes me just sigh now. As a developer (which I am), my complaint with ”Devops” is that management has been given the same expectation you just articulated.

ie. that ops is basically people who were never good enough to be programmers, and basically, despite years of training and experience, are basically incompetent and not “agile”.

...and somehow, developers, with no training in the domain, will just do a better job of it.

They don’t. In any of the places I’ve worked and seen it tried, in like, the last ten years.

Ymmv.


The disconnect is that IT/Ops really have no power in the org. Often they are subsumed by some incompetent line manager who’s clueless about the day to day challenges developers face when shipping code that needs to run somewhere. When the non-technical dumb fucks sneak into management you end up making sacrifices, because as you point out, they want it done yesterday and in five minutes. That leads to a sloppy standard of product and engineering culture, including ops culture. Which is to say take shortcuts and not think about 10 years later when someone has to janitor your manually spun up infrastructure.


Sounds like you need OPSMAN, then. It's a new managerial philosophy where management is actually done by operations.


Could you share some article? It's hard to Google.


10 years? that’s absurd. the surface has shifted under your feet in 10 years. anyone building for 10 years from now is a luxury watchmaker and not thinking about the org.

only build for the next scale factor.


We just decommissioned applications written in 1986, last modified in 2012. They were running important production code until this year.


> TLDR; don’t be ridiculous. Smart people do this stuff professionally and then you come along and think you’re some hot shot who can do it in a few minutes?

Basically, yes. Mainly because I do this professionally too.

Just because I haven’t been assigned to do it this time does not mean I can’t.

How about you are generous and interpret my comment as if I know what I’m talking about, then I won’t be an ass and nuance my previous statement a bit.

Something can (and probably should) take a month the first time around. Then, if you’ve actually fulfilled your promise as ‘devops’ any second time you are doing this thing it should take significantly less time. Maybe you’ve not automated all parts, but certainly the most common ones.

I think we’re now at iteration 20+, and it still takes a month for every new environment (dev, sqa, prod) of the same application.

Now it’s definitely possible there is a reasonable explanation for that, but it’s starting to get a bit suspect. Enough so that I’ll make a frustrated comment about it on HN when the subject comes up.


Have you looked at their ticket queue?

It could be a matter of work in progress or internal bottlenecks.

[0] has a good summary of WIP after the heading titled "Why Do We Need To Visualize IT Work And Control WIP?".

TLDR - read the Phoenix Project.

[0] https://itrevolution.com/resource-guide-for-the-phoenix-proj...


> software engineers don't think about or care about their CI/CD pipeline beyond it being easy to work with

Then they're not great engineers. That's fine at the beginning, but in the long run, they need to care or they'll cause issues. The separation you mentioned and the possibility of creating higher order tools exist only if the software can support that kind of environment.

Whether the software can be set up with a single file / endpoint / environment, or instead requires, for example, interaction with a custom GUI for initial setup, defines whether CI/CD setup is trivial or a long project of its own. Unless devs work to make things easy to automate, higher order tools will not help them.


>Then they're not great engineers. That's fine at the beginning, but in the long run, they need to care or they'll cause issues.

What kinds of issues do you think will happen? Isn't the whole point of having devops/sysadmins so software engineers don't have to spend valuable time setting up and maintaining infrastructure and tools? If the answer to that is no, then we've come full circle because software engineers maintaining infrastructure and tools are just devops.


> There is a huge sysadmin workforce. Many of them never learned to really code. Are these people supposed to go away? Are they supposed to learn to code?

Yes, and when they do learn how to code, you get stuff like Puppet.


>Coding is a skill that takes time to master, and if these people were sufficiently motivated to code, they probably would've become coders before now.

Personally a DevOps/SRE type role appeals to me more because

1: I really liked ops but found coding and doing NOTHING but it kind of meh. A combo is fine.

2: I REALLY like approaching ops as a software problem

3: The work itself has more variety than straight ops or dev which I think is more suited to my adhd issues.

I think there is much political inertia and a lack of code savvy workforce keeping ops going but honestly the way I see it the writing is on the wall for traditional ops. I never see any ops positions in FAANG.

It may be true most coders don't think about infrastructure but that gives more value to this niche not less. There is still value to DIY when it comes to things like infrastructure.


Sysadmin folks can transition into security, they don’t need to go away


> Sysadmin folks can transition into security

No more easily than they can become developers. There are a hell of a lot of really crappy cybersec guys around who think the job is just the access control subset of sysadmin...


We use NixOS for everything and have no dedicated ops.


Can you clarify? What makes it so easy and painless?


No stupid state. What people think of as "inevitable" shit for ops to clean up is often (not claiming always!) entirely preventable shooting in the foot.


> The most common job title seemed to be SRE (Site Reliability Engineer), although there was a long tail, and they don’t care much about job titles.

Oh I like that one. "I kept the pile of shit running and put the fires out before they were noticed" is too long to put on a business card, anyway. "Site Reliability Engineer" sounds almost tame.


The name and job title churn in this space over the past five years is breathtaking, almost JavaScript-ian.

I've been at my current company for 6 years now. When I first started, those guys were "Operations", and the job title was "Admin". We used Google App Engine, and they mostly clicked buttons in the GCP web console.

One year in, they re-branded as "DevOps". They were all "DevOps Engineers". They wrote a lot of hacky Python scripts, using the "gcloud" utility, to manage our new Docker-based services on Google VM's.

A year or two later, they decided to become "Platform Services". They changed their titles to "Platform Engineers". They saw us developers using Jenkins for our CI/CD pipeline, and decided to re-implement all of their Python scripts as Jenkins jobs.

Earlier this year, they became "The SRE Group". They are "Site Reliability Engineers" now. We eliminated their Python scripts, by migrating our microservices to Kubernetes and using managed databases. So now they're back to clicking buttons in the Google Cloud Platform web console again.


You speak of them with such disdain. Is that a common attitude in your company?


I can understand such disdain. I am a software engineer currently working in a DevOps team (yes, team, you’ve heard it right). I’d say that 90% of DevOps engineers I’ve met don’t even know what a linked list is, and they themselves talk with disdain about developers.

So much for DevOps philosophy.


I’m a developer who runs an infrastructure team. Have for years. And most developers I know don’t know why you’d log in JSON unless I was the one to explain it to them (patiently, while remembering the times they acted frustrated at my guys for not doing all the magic and just some).

So much for DevOps philosophy, indeed.
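
For readers wondering about the JSON logging point: a log line that is one JSON object can be filtered and aggregated by field instead of by regex. A minimal sketch using only the Python standard library follows. The `JsonFormatter` class, the `build_logger` helper, and the `context` field name are assumptions made up for illustration, not something from this thread.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed via `extra=` become attributes on the record.
        # "context" is a hypothetical field name chosen for this sketch.
        context = getattr(record, "context", None)
        if context is not None:
            payload["context"] = context
        return json.dumps(payload)


def build_logger(name, stream=None):
    """Return a logger that writes JSON lines to `stream` (stderr if None)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Calling `build_logger("checkout").info("payment failed", extra={"context": {"order_id": 42}})` would emit a single line that a log pipeline can query on `context.order_id` without parsing free text.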


I wouldn’t mind so much if they at least knew what terraform was.


> if they at least knew what terraform was

Any real, experienced DevOps engineer will reply “a steaming pile of shit”.


Sure, it's got its problems, but please show me a better alternative.


On AWS, Boto. On Azure, DSC (PowerShell) and ARM templates.

You will need to maintain separate TF codebases for each one anyway, but with the native solutions you get easy access to all features of the platform and don't have to spend most of your time jumping through stupid hoops to try and pretend that one tool can do everything.

No-one would write code worrying if it was valid in both Java and C# but that is the same level as what TF claims to do. It's complete shit, ditch it and you will be 10x happier, I guarantee it.


Using a single tool instead of a bunch of separate ones with different functionality sounds like a win to me, but we are each entitled to our opinions.

Boto doesn’t seem comparable to terraform at all, unless you enjoy building state management yourself.

Terraform has been better than anything else I’ve tried so far.
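
To make "building state management yourself" concrete: at minimum it means recording what you created last time and diffing that against what you want now. The sketch below is a toy illustration in plain Python. The `plan` and `apply` names are hypothetical, it makes no Boto or Terraform calls, and it is not how either tool is implemented.

```python
def plan(desired, recorded):
    """Diff desired resources against the recorded state.

    Both arguments map a resource name to its configuration dict.
    """
    create = sorted(name for name in desired if name not in recorded)
    delete = sorted(name for name in recorded if name not in desired)
    update = sorted(
        name for name in desired
        if name in recorded and desired[name] != recorded[name]
    )
    return {"create": create, "update": update, "delete": delete}


def apply(desired, recorded):
    """Return the new recorded state after a (pretend) successful apply."""
    # A real tool would call the cloud API for each planned action here
    # and persist the result; this sketch only updates the in-memory record.
    return {name: dict(config) for name, config in desired.items()}
```

Everything this toy leaves out (locking, partial failures, drift between the record and reality, dependency ordering) is the work you take on when scripting directly against an SDK.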


They are worthy of disdain because they never took a CS class?


No one is worthy of disdain due to unfamiliarity. I have explained why a software engineer who cares about infrastructure might feel disdain: they read about the DevOps philosophy, agree with it, and switch to a DevOps role, only to find that the vast majority of their new teammates lack comprehension of the most basic software engineering principles and don’t find value in them. It is incredibly frustrating.

Now, it is also true that most developers are content with clicking around in their JetBrains IDE without bothering about resources or even knowing what a file descriptor or system call is. I have had to explain the basics of garbage collection to so-called senior Java engineers. They also make operations harder than it should be.

But that’s another story. We’re talking about the dissonance between what gets advertised as DevOps and what it truly is, which seems to be the norm in our industry and is leaving a significant number of people unhappy and dissatisfied.


> You speak of them with such disdain.

Nonsense. They're smart professionals, and probably the hardest workers in the company. Certainly the ones subject to the highest degree of pressure, which they always handle with grace.

I "speak" of the absurdity of our industry's superficial hype cycles, and forever-swinging pendulums. After you've been in the game for 20+ years, you will recognize a number of cycles and pendulums yourself. You'll go along with it, because that's part of being a professional. But you'll wonder why we're collectively unable to step back, and recognize more of these things, and not pointlessly churn so much.

However, I could write a similar comment about engineering managers and SDLC methodology trends. A similar comment about testers and quality assurance. And yes, probably a dozen similar comments about language fads and architecture patterns and trends related to developers.

I don't think any of those comments would come from a place of disdain for workers themselves. I just think that culture has shifted, and anything short of unqualified gushing praise registers as offense for many younger people. Maybe that pendulum will swing again too?


Have to agree with the comment below, DevOps folks/teams I've worked with seem to have both a superior attitude and an inferior knowledge of good practice and good habits when compared to developers.

They seem to have bought into their own myths.


Gee, I don’t know. When developers are constantly flinging half-tested and wholly unoperationalized things over the wall, perhaps it’s normal to develop local-maxima defensive practices.

I am a software developer and yet when I hear a developer open their mouth to complain about nearly any other aspect of a product team, be it QA or infra or UX, it has this weird tendency to resolve to Anybody But The Developer causing an issue.


Then perhaps you ought to sit with the folks I'm consulting for at the moment, who have a DevOps team who have failed to deliver a reproducible system for over two years and still have the attitude that they're the superior race.


I’m sorry to hear that, but have you considered the possibility that the team you work with is not emblematic of the entire industry?


Indeed, that's just my current experience.

That said, they are highly paid, London financial world folks, so you'd hope they'd be good.

I also hear from a developer I work with who has recently defected back from devops style roles to devs that this is not atypical. He seems to place blame at least partially on "The Phoenix Project" for the attitude!


I'm pretty sure SRE is a very old title at google.


I love the incredibly vague job title "Member, Technical Staff" I had at Sun. It could cover anything from kernel hacking to HVAC repair!

At least I had root access to my own workstation (and everybody else's in the company, thanks to the fact that NFS actually stood for No File Security).

[In the late 80's and early 90's, NFSv2 clients could change their hostname to anything they wanted before doing a mount ("hostname foobar; mount server:/foobar /mnt ; hostname original"), and that name would be sent in the mount request, and the server trusted the name the client claimed to be without checking it against the ip address, then looked it up in /etc/exports, and happily returned a file handle.

If the NFS server or any of its clients were on your local network, you could snoop file handles by putting your ethernet card into promiscuous mode.

And of course NFS servers often ran TFTP servers by default (for booting diskless clients), so you could usually read an NFS server's /etc/exports file to find out what client hostnames it allowed, then change your hostname to one of those before mounting any remote file system you wanted from the NFS server.

And yes, TFTP and NFS and this security hole you could drive the space shuttle through worked just fine over the internet, not just the local area network.]


Considering y'all ran sendmail, I had root on your workstation if I wanted it :) It was a different world then.


When the Morris worm went around, one of the ways it got in was through sendmail, using the "DEBUG" command. Right after it happened, some wise-ass sent around an email telling everybody to edit their sendmail binary (with Emacs of course), search for "DEBUG", and replace the "D" with a NULL, thus disabling the "DEBUG" command.

What that actually did was change the "DEBUG" command into the "" command.

At the time I was running a mailing list from the University of Maryland, and often had to check Sun email addresses by telnetting to sun.com port 25, pressing return a couple of times to flush the telnet negotiation characters, then going "EXPN some-email-address".

So the day after the Morris worm, I go "telnet sun.com 25", hit return a couple of times, then "EXPN foobar", and it dumps out a huge torrent of debugging information, because I had accidentally switched it into debug mode by entering a blank line!

I reported it to postmaster@sun.com, and they fixed it. But it's kinda silly that they would have applied such a ham-fisted patch to their sendmail daemon like that, based on an email that some dude on the internet sent around!

https://www.rapid7.com/db/modules/exploit/unix/smtp/morris_s...

http://www.cs.unc.edu/~jeffay/courses/nidsS05/attacks/seely-...

https://spaf.cerias.purdue.edu/tech-reps/823.pdf
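
Why the botched patch described above turned "DEBUG" into the "" command: C strings end at the first NUL byte, so overwriting the "D" leaves an empty string where the command name used to be. The sketch below is a toy model of that effect. The byte layout, offsets, and function names are invented for illustration and are not sendmail's actual code.

```python
def c_string(buf):
    """Return buf as C would read it: everything before the first NUL byte."""
    return buf.split(b"\x00", 1)[0]


def patch_first_byte(binary, needle):
    """Overwrite the first byte of the first occurrence of needle with NUL."""
    i = binary.index(needle)
    return binary[:i] + b"\x00" + binary[i + 1:]


def match_command(binary, offsets, line):
    """Return the name of the command whose in-binary string equals the line."""
    typed = line.strip().upper()
    for name, offset in offsets.items():
        # Each table entry is a pointer (offset) to a NUL-terminated string.
        if c_string(binary[offset:]) == typed:
            return name
    return None


# Hypothetical string table: three NUL-terminated command names.
BINARY = b"HELO\x00EXPN\x00DEBUG\x00"
OFFSETS = {"HELO": 0, "EXPN": 5, "DEBUG": 10}
PATCHED = patch_first_byte(BINARY, b"DEBUG")
```

With `PATCHED`, typing `DEBUG` no longer matches anything, but a blank line does, because the entry that used to read "DEBUG" now reads as the empty string.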


I've had to convince some folks that indeed, someone with the title of "system administrator," if they put more than one programming language on their resume, is probably "SRE," or "DevOps."

I have the job title of SRE currently and could care less what the actual title is. I've got work to do, and I'm going to accomplish it in the best way possible given whatever constraints exist at that time.


It does matter - in my experience, the future job prospects of "System Administrator" are much worse than "SRE"/"DevOps".

Switching my title from "System Administrator" to "SRE" within my last company resulted in a job family change and a 10% raise (I had to show I could code as well as a software engineer in order to make the switch).

When I left that role, having "SRE" on my resume instead of "sysadmin" was (probably, I don't have any strong evidence for this) instrumental in getting responses when I applied to "Software Engineer" roles at selective companies.

I think there is a bias against roles that don't code, and especially roles that sound outdated.

A lot of larger companies and managers in those companies don't understand what DevOps is, what SRE means or anything like that. They just know Ops and Dev, and your previous job title is probably the strongest hint they have to work off when they categorize you. Getting lumped in with Ops is (probably) a big hit to your earnings potential and limits your future options if/when you decide to move.


> Switching my title from "System Administrator" to "SRE" within my last company resulted in a job family change and a 10% raise (I had to show I could code as well as a software engineer in order to make the switch).

Perhaps we're in agreement, but this is critically important. When my company hires for DevOps, CloudOps, etc positions, we are inundated with applications from Ops/SysAdmin personas. We don't want people who will put out the fire and keep the system limping along--we want people that can't stand putting out fires _and who have the skill set_ to build systems that (1) aren't likely to catch fire and (2) are easy to troubleshoot/extinguish when they do. One way we're addressing the problem is to change titles from "DevOps" and "CloudOps" to "Cloud Engineer"--not sure yet how big of an impact that will have (if any at all), but it's worth a shot.


I am actively hiring for a DevOps role. I get hundreds of resumes from sysadmins who don't have AWS/cloud, Ansible/automation, CI/CD lifecycle, DB management, or Nginx/Apache experience.

The majority are sysadmins that did point-and-click setup.


You think maybe some of those folks are looking for an opportunity to LEARN some of that tech?

The "DevOps" fad is screwing over a large segment of senior level I.T professionals who are used to specializing. (Databases, Storage, OS, Security, etc.). I've also yet to see any startup that Jez Humble would actually call a DevOps shop.

Now, startups are hiring generalists who have 3-5 years of "hacking" experience or a popular project on GitHub.


A willingness to learn new tech is essential but not sufficient. Mostly it seems folks with a more traditional sysadmin background are looking to be Kubernetes sysadmins or AWS sysadmins, but we’re not looking for sysadmins, we’re looking for engineers. Learning the new tech isn’t sufficient—it’s not even about the tech—you need to be able to _do engineering_.


This is a critical point, and it's one the post to which you replied seems to miss.

"System administrators" in the traditional sense--and I have hired many of them and consulted on the obsolescence of others--often and generally exhibit that strong get-it-working,-damn-the-consequences tendency that is in opposition to--well, none of us in this industry are engineers, but some of us aspire to engineering. Rigorous, systemic, and repeatable are the watchwords, and to that end those system administrators aren't being "screwed over"--there's a different skillset being prioritized.


I think that is a gross mis-characterization because I see a ton of "get it done now, make it right later" bullshit among DevOps-y start ups.

Again, I think there's a large talent pool available, but startups who think they'll be the next FAANG act too big for their britches and actively discriminate against older tech workers who are likely experts in several pieces of the tech stack.

I routinely see these folks get passed over for younger, less experienced candidates (often for 1/2 to 3/4 the salary) who look good on paper because they wax eloquent about their pet project on GitHub, facial hair wax and kombucha.

Source: I make damn good money as a "fixer", and my primary customers are 5-30 person startups. I don't "code", and never will (useful scripts, and some automation/cloud API excepted).

I go in and practically beat the managers over the heads with the DevOps Handbook, and "engineers" with the NASA Systems Engineering Handbook. Most of my work is tearing out fucked k8s installs, and cutting AWS spending by 1/2 or more. (A few clients were billed based on how much I reduced their bill).

Have a standing job offer with one client, however it requires Azure certification pretty much immediately. Between not really using much MS stuff, and the exam focusing mostly on the Azure CLI, it might not be worth the trouble for a steady paycheck. They were nice enough to cover a training course though, so I'm willing to see where it goes.


Sure, there are stupid startups that think they do "devops". What of it?

It's great that you can make that money in the role you describe. Before I decided I wanted to stop doing sales work alongside dev work, I used to make very good money as a similar fixer. On the other hand, I do code. I'm very good at it. And I've learned that fixing the situations of companies whose operators don't code is to fix, or replace, those operators. Especially those expensive operators who you're holding up over folks who understand systems as code and as managed resources.

Sneering at kombucha and facial hair wax, though? Aren't you saying you're the adult in the room here? Frankly, you sound bitter. And that sucks. As somebody who has spent his entire career doing both dev and ops and getting to the point where melding them together is natural and the teaching thereof is likewise a basic part of work, I've had to recommend the replacement of people who act like you're acting in this thread. 'Cause I'm happy to teach, and I've never met a hands-first sysadmin who couldn't do what should be done. But I've met a lot who won't, and if they don't retire first it eventually catches up to them.


So you are part of the problem then. You've glommed onto a bastardized concept of "DevOps", and refuse to allow other people into your walled garden.


There is no authoritative definition of DevOps, so there’s really no point in arguing that one of our definitions is wrong. I’m telling you that it is more valuable to treat ops as an engineering problem rather than “duct-tape it and keep it chugging along”. So yes, if you want to treat it as an engineering problem, you must _employ engineers_. These engineers can be former sysadmins so long as they know how (or can be trained in a reasonable timeframe) to do engineering.

WRT your “walled garden” quip, employment is about qualifications. No one is entitled to jobs for which they are not qualified, not even sysadmins. If the employer is hiring engineers and the sysadmin candidate can’t or won’t learn how to engineer, then they are not qualified.


Hi! I wanted to say a couple small things in followup to your recent submission about coding, but that thread has now locked and is not accepting replies (<rant>my biggest dislike of HN</rant>). Really glad you're still posting comments and I can reply to this one! My email is in my profile (click my username).


What types of businesses — perhaps the industries — do the majority of your applicants come from? I'm asking because, from my limited experience, it seems that many small and medium sized businesses that are NOT in the IT field are afraid of testing newer tech.

The justifications I've been given for this stance are:

1. Newer tech introduces new problems, and increases the scope of working knowledge

2. Adopting new practices requires the business to attempt to hire for that skill in the future

3. Present managers, whose experience stemmed from working in a sysadmin role, do not have the working knowledge and capability to understand/learn new practices

With that, do a lot of applicants seem to come with a basic/old background of just Windows (or the like) experience?

It's hard to try and get any of your mentioned requirements running in these businesses. What my friends and I have encountered is a big resistance to the command line (PowerShell or Bash), learning how Linux systems are configured, or anything that doesn't come with a large support contract.

If you or anyone on HN has tips or anecdotes on how to introduce changes (gradually and slowly, AND given that it could help the business), I would LOVE to read them! My biggest goal is to reduce operating and capital costs for systems that are not accounted for as contributing directly to the company's revenue (at least when you don't control that calculation, anyway).


Yeah, I’m not complaining about the state of the industry - it makes sense why you want people who can code! My advice is more for people in DevOps who can code: because of this inundation of sysadmin applicants who can’t code, you need to make sure that you are distinguishable from the herd so that your application doesn’t get rejected immediately based on your title. That’s why the title really matters, and also why it’s important to actually write code in a DevOps role. Any role where you don’t get to code is career-limiting, IMO.


In that case, isn’t an excellent DevOps/sre engineer indistinguishable from an excellent software engineer? Why hold any distinction at all?


Probably because the role actually does still require someone to be able to put out fires while under stress. I've met many otherwise excellent software engineers who cannot or will not deal with high pressure oncall situations. The Operations aspect of DevOps requires a certain level of familiarity and comfort with the type of real-time communication and troubleshooting needed to deal with emergency situations. Some people are excellent with design and implementation work, but do not communicate well enough in high pressure situations to fit a DevOps/SRE role.


Maybe so, but my experience is that the "ops" people don't want to be "devs". As other posters have said they do want to be integrated with design choices that will affect them, and many times they do write non trivial code that keeps these complex systems on their feet.


Most software engineers simply hate being on call, and the software being developed can be pretty mundane unless you’re working on cloud native tooling perhaps. It is a rather narrow area of software engineering honestly, but IME software engineers passionate about their software in production are great SRE candidates. I say this not because I’m a former generalist software engineer but because I have had to hire for these positions.


Narrow hm? In my experience as sre-swe I had to debug and write patches for kernel issues, networking issues (l3,l4 and l7), various OS issues (related to fs, cgroups, memory management), then there’s orchestration (scheduling, upgrades), safety/reliability and various configuration tooling which I had to write in Python, C++ and Go (not to mention half a dozen or so DSLs). Then there’s incident response skills for oncall.

It is much more broad than when I was an embedded dev with only one job - to make some driver work on a different architecture.


If you have your infrastructure team on call instead of your developers, you are screwing up.

In almost every reasonably shaped organization the majority of bugs are shipped by developers, not infra/platform/SRE. Localize the pain to the agents who cause it or it will never go away.


Oh, not saying that’s how it should be. My current situation is such that infrastructure is the majority of the production issues and we’ll call developers on the rare occasion something serious happens relating to their code. Our platform goes through much more testing rigor than most SaaS companies our size tend to perform and I’m proud to be supporting these guys.


This isn’t it. DevOps aren’t the (exclusive) oncall engineers, the dev teams should be responsible for oncall as well—the people empowered to create or fix the operations problems should be responsible for operations. See my sibling comment for why DevOps is different than SE.


All true. At my current gig, the infra team is on call for pretty much everything. That's how it was when I started, and it's taken time to deal with stuff like alert fatigue and better surfacing of metrics and logs. But we're now in the process of moving to all first line pages going to the dev team (because they ship most of the bugs in the first place). If there's an infra problem, they can call us then.


From my experience, SRE is a person with developer mindset (and skillset, or the desire to have a developer skillset) who doesn't mind touching the infra.

A SWE typically won't want to touch infra. That's my experience of course, YMMV.


SWEs who don't touch infra show up because universities today pump out a lot of book-smart real-world-dumb graduates.


I don't think so. I have my areas of expertise and of course in a pinch I will pitch in and try to help any way I can, but I've always found my best work is done in a team with fairly well defined roles, a healthy respect for each others' specialisms, and an enthusiasm for short bursts of collab/pair coding and longer stretches of solo(ish) work.

I have fond memories of the team that worked across three time zones so I would get up to a set of well-described problems that I would solve in the morning (so satisfying), then a nice stretch of feature building after lunch, then a burst of pair coding with my newly arisen colleague, then maybe finishing with writing up any roadblocks or requests for the next person. I got very used to identifying blockers that were out of my area and would be better solved by the domain expert, and also a LOT better at ticket writing. It was a really, really productive and rewarding workflow and one of the key points was not getting bogged down in stuff outside my areas. We also all really appreciated each other because we all experienced each other as magical elves giving answers to hard problems in exchange for answers for easy problems! ;)


The distinction is mostly because a lot of developers lack the fundamental understanding of downstack problems in 2019. It’s not that they can’t do it, it’s that they’ve rarely had to and in so not doing have built themselves a mental Jenga tower that requires time and effort to stabilize and build a foundation beneath.

Companies hate that. Investing in people who will leave is bad, they think. Put them in their box and let them do what they already know.

Which is why they hire me and call me an SRE. (I don’t use the term. My current title is “principal engineer”. I’m not an engineer, though. Neither are most people here.) And I’m not saying downstack ignorance is great. Profitable for me, sure. But it’s a natural response to companies’ unwillingness to invest in their people. They want them pre-made. Hence the made-up titles for people with breadth.


Familiarity with different problem domains. Software engineering is essential to the DevOps skillset since we do a lot of automation, but also understanding the (constantly changing) ecosystem of tools, how to design a CI/CD pipeline, how to configure the developers’ dev environments, how to model your infrastructure as code, etc, etc. A good DevOps engineer is a good SE with an understanding of the DevOps problem space.


Because they don't want to pay software engineering wages.


SREs make 5-10% more than SWEs at Google.


That’s good to know, thanks for bringing that to my attention. Is it true that SREs are more likely to have oncall rotations at Google than SWEs? That is the impression I got from reading the Google SRE book - it talks about bringing SWEs into the oncall rotation when the SREs are swamped (i.e. more than 50% of their time is devoted to operational work) as a kind of pressure release mechanism to prevent the SREs from burning out or getting mired in endless toil. It makes it sound like outside of situations like that, SREs are doing most of the oncall work.

If this is the case, perhaps the pay discrepancy is explained by the greater oncall duties and morale issues for SREs?

It’s also worth mentioning that based on my reading of the Google SRE book, very few organizations approach operations like Google does. I personally think that Google has an enlightened approach to operations, but not all companies do.

Basically, I think SRE is a good title to have, but sysadmin is not, and I encourage anyone early in their career to rebrand themselves ASAP.


> I personally think that Google has an enlightened approach to operations, but not all companies do

Perhaps because the founders were doing operations (well) in the beginning, so they know a few things about it and its importance. They famously built a reliable system with about the cheapest possible hardware.


Are SWEs at Google expected to firefight especially outside of normal work hours? This is one thing I imagine SREs are expected to be doing that regular SWEs aren't. I think this is also why they require strong development background for the role.


I worked at Amazon, which has a similar view of on-call to Google. Usually one person is on-call for the team for a cycle of a week or a fortnight. There is also a "follow the sun" model wherein another team has your nights covered. I do recall that SWEs also had the responsibility of being on-call, since teams are usually a mix of SWEs and SREs.

Naturally a team with a high ops load was shit and made your life hell. After my time there, I vowed never to do on-call unless it was after hours and specifically for emergency response. My opinion is - if there's an issue during work hours, the leads or the entire team should be on it - and collectively fix it.

Anyway, Google has written a bit about this in their SRE book: https://landing.google.com/sre/sre-book/chapters/being-on-ca...


Which kind? There are SRE that are SWEs and then there are plain SREs.


It doesn't matter now. It'll matter next time.

Change your title to DevOps, SRE, or team lead. I also don't like titles, I'm sharing my experience with you.


"Computer Janitor"


Based on my last hotel stay, that would be Computer Sanitation Engineer.


Maybe some day Amazon will make a VR interface to AWS, like Viscera Cleanup Detail:

In Viscera Cleanup Detail, players are given the role of "Space-Station Janitors", tasked with cleaning and repairing facilities that have been the scene of bloody battles during an alien invasion or other form of disaster. Tasks include gathering and disposing of debris, including, dismembered bodies of aliens and humans, spent shell casings and broken glass, restocking of wall-mounted first-aid kits, repairing bullet holes in walls, and cleaning of blood splatter and soot marks from floors, walls and ceilings, as well as secondary bonus tasks. These include stacking items like crates and barrels in a designated stacking area, and filing disaster reports on the events and deaths that took place in a corresponding level.

https://en.wikipedia.org/wiki/Viscera_Cleanup_Detail

Writing for The A.V. Club, Chaz Evans called it a commentary on first-person shooters that focuses on the consequences of violence.

https://games.avclub.com/seeing-without-crosshairs-a-survey-...


This comment made my day. I laughed out loud. Well done.


It's funny because it's true! The amount of insanity is frustrating...


How do they keep a pile of shit afloat, though? Do they build log monitors and trigger restarts when shit dies?


The trick is to eat the right stuff so shit floats by itself.


Basically, yes. We use Kubernetes and it has that shit built in. If the software is too crappy you will be forced to switch it off and do it manually, tho. And when you are not doing anything else anymore you will have successfully transitioned from software engineer to SRE.
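
For anyone who hasn't seen the "restart it when it dies" pattern outside Kubernetes (which handles it per container through its restart policy and probes), here is a standalone sketch of the idea. The `supervise` function and its parameters are hypothetical, written for illustration. It is not how Kubernetes implements restarts.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, backoff_s=0.0):
    """Run cmd, restarting it after a non-zero exit, up to max_restarts times.

    Returns the list of exit codes observed, oldest first.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        code = subprocess.run(cmd).returncode
        exit_codes.append(code)
        if code == 0:
            break  # clean exit: nothing to restart
        if attempt < max_restarts:
            # Linear backoff so a crash loop does not spin the CPU.
            time.sleep(backoff_s * (attempt + 1))
    return exit_codes


if __name__ == "__main__":
    # Example: a command that always fails is retried, then given up on.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(failing, max_restarts=2))
```

The cap on restarts matters: without it, a process that can never start successfully is restarted forever and the underlying bug stays hidden.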


I recently transferred as a developer to a development operations role at a medium-sized company. This article exactly describes my experience - the main focus of the ops team seems to be on build, deployment and monitoring technologies focused on a migration towards containerization running on AWS.

Code and tooling is built on a "what works" basis and no particular attention seems to be paid to the overall design of the software or testing (likely due to a real or perceived lack of time). On-call rotations (and how to eliminate work for them) is the hot button discussion topic.

The #1 question I get asked by the developers is "Did you really want to transfer from development to ops?" which makes me think that a lot of developers look down on operations roles and see them as a demotion. I find that quite odd given that 1) ops keeps the product up and the money coming in with relatively little headcount and 2) most of the people working on our ops teams have a formal education in software engineering.


I think it’s because ops folk cobble together scripts and tools only to scratch the current itch, rather than think through the whole problem and design and write software to solve it. Poor testability is one of the biggest sins I’ve seen in ops. Tools like Ansible, for example, encourage changing systems at run time with complex logic tied to specific deployments. They don’t use IDEs, so there’s no way to jump through the YAML files, get inline help, or know which playbook is run when (it’s like a program with 100s of main()), and there’s no integrated debugger. It’s like the previous 50 years of computer science never happened and they’re starting back in the 1970s.


A little thought exercise: your comment from the POV of a ops person explaining why they look down on devs.

Dev folks often take weeks to make even a small change. It doesn't matter how urgent the need is or how badly the business needs some kind of workaround, they over-complicate every problem, and ops has to spend more time in meetings planning their next project than it would have taken for us to put a working fix in place. Maintainability is one of the biggest sins I've seen in devs, tools like npm encourage devs to use an overly-complex chain of 3rd-party dependencies for even small projects, and they rarely ever update their dependencies after initial deployment, so if a security update to an underlying package is required, or if the app has to be deployed to a new underlying system, or the container changes, things can break easily. They depend on IDEs for everything, and can't even use commonly installed system tools like awk or sed, or use netstat to debug simple networking issues. It's like the 1970s never even happened and they never even learned what an operating system is.


Just like developers cobble together apps that are barely operable. Hard coding IP addresses, opening connections with no timeouts, and lacking basic understanding what the difference is between DNS and HTTP.


at one point, I assumed hard coding IP addresses and paths to /home/user/whatever/ was just a web dev thing

spoilers: it is not just a web dev thing


Great post! You touched on something there that's been bugging me for the last few years: testing is non-existent in the devops/SRE world, so developers who hate testing and do not understand that it is core to good engineering seem to gravitate towards devops roles, as they can hack their way through their day, all while creating tons of tech debt.


Testing is basic in devops/SRE. How are you going to ensure reliability if you cannot test the shit out of it?

The issue is there’s no formal tool or practices for testing in ops. DSLs are limited, forcing you to use several for different kinds of scenarios. In the end you need to rely on a real programming language to parse files in different formats and ensure your variables are correct. I think developing software for operations is exciting and is going to mature with time. Kubernetes (and its CRDs) is a step forward.


Chef had great support for automated testing. We also generally did a lot of testing on other tools.

Sure, things like ansible lack proper testing support, but that doesn't mean that all of the profession doesn't test.


> The issue is there’s no formal tool or practices for testing in ops.

There is though, it’s called writing normal software. Kubernetes can be a framework to build on, with actual developers who design from high level logic down through to the implementation.


> The #1 question I get asked by the developers is "Did you really want to transfer from development to ops?"

I’m not sure what job listings they’re looking at, but as an ops person (kubernetes) I’m interviewing for jobs with close to half a million a year in total comp.


I think that people perceive being on call and having to work with messy legacy infrastructure as a restriction on their personal freedom and an anti-perk. We also don't compensate well for oncall shifts so that might be part of the reason that people see it as a lesser form of work and not as a special status that's only given to the best engineers. At other organizations it may be different... I hear FAANG companies compensate their on call shifts quite well (typically a percentage of base salary banded by SLA)


Second time I'm hearing (what's for me) a mile-high comp number for kube jobs and I'm now really tempted.

Working as a data scientist - software engineer in a midsize company, I constantly battle amateur ops folk and "backend" fullstack jockeys from introducing kubernetes into a saas product I mostly created by myself (makes money but there are maybe ten users per hour tops, why would I need kubernetes for that?).

My org has seen multi-day downtimes for the entire eng team workflow because the eks cluster went down and they couldn't figure out how. We have four people dedicated in the infra team for this! I'm not really an ops person but I see where the failings of these folks and stacks are, and feel like I might be able to learn to be half-decent if I put in the time. What advice would you give?


Kubernetes is a full time job. If you want to capture 90% of the benefits of containerization without wasting too much time on a complex solution then simply restrict yourself to only use docker with bash scripts and maybe a load balancer if you really want to have HA.

I've found nomad to have a lower complexity than kubernetes but if you cannot directly integrate service discovery into your application then you will need to use a service mesh which is an all or nothing thing but using something like traefik's support for consul means you will have to use regular service discovery alongside the service mesh. It's not a huge burden but there should be a better way.


> why would I need kubernetes for that?

They need k8s for their otherwise bleak CV. It's all hype and shiny today.


> (makes money but there are maybe ten users per hour tops, why would I need kubernetes for that?).

The idea is that you move your low traffic app onto servers with other low traffic apps and save money. If they’re just moving your app to a kubernetes cluster by itself it’s probably not worth it.


Whoa where are you seeing these job listings? I don't know if I've ever come across a public listing for a technical role that paid so much.


They don’t really exist. I’m mid career management and hire these types of roles. You go above $200k and you’re just wasting money. There may be a few random openings above this but nothing sustainable once folks realize the talent curve.


They don’t list the salary but the recruiters will tell you — it’s Silicon Valley companies mostly.


Can you share more? I am aware that this comp exists (just see levels.fyi) but haven’t found it for Kubernetes focused roles yet.

Are you talking to FAANG’s? Second tiers like Uber, Square, etc? Other?


Both.


After having done a rotation in the dev ops world, I find their lack of design and architecture really disturbing. Services are implemented without edge cases considered. Their tools are some hacked together monster using flavor of the month and some old tech stacks. Unit and system tests aren't written, or if they are, they don't test any edge cases whatsoever. It's hard to test anything locally and most testing is done in test or even prod (Seriously). Documentation is almost non-existent and the implementations of services such as Chef don't follow chef documentation or best practices. Logging is usually non-descriptive and applications written by the ops teams will fail with cryptic errors which they know by heart but make no sense.

"Oh that nil pointer exception on line 53? Yeah that means you don't have IAM permissions".

In short, the ops world lacks what most software engineers would consider basic engineering practices. It really feels like that entire world is just a hackathon project. I know there are some very talented engineers on ops teams. It's just the impression I got from 6ish month rotation.

Rant over, this just touched a nerve.


Strange, I'm burdened with a development team that behaves the same, except they usually don't know what the error messages that their own software produces mean, especially not "by heart".

But since the CEO has a development background, always sides with them and in general wants them to build "features", not fix their shit, they never have to take responsibility for anything.


> medium-sized company

this is mind boggling! For med size company to have a separate ops department. I can't imagine that responsibility for MY code in production (and delivery of the said code to production) lies on a different team/department that are possibly not even on the same floor/building I'm at.

Separate OPS department is where dev's happiness and job satisfaction go to die.

I'd imagine this outdated setup in some government agency but not in a med-size company.


> I can't imagine that responsibility for MY code in production (and delivery of the said code to production) lies on a different team/department that are possibly not even on the same floor/building I'm at.

I recently did some interviews with our devs to find out what features we could add to our platform that would provide them with value. The result was interesting and I found that people typically fall into one of two camps: 1) I want to know everything about my deployed service and have tools that alert me and allow me to intervene, or 2) I don't want to know anything about the deployed service runtime and expect operations to handle my issues and alert me when there's a problem.

It sounds like you might be in the former. =)


The initial intention (of DevOps) was to eliminate these discrepancies by eliminating the Ops department altogether. DevOps doesn't necessarily mean devs doing ops, but it means devs and people curious about ops sit together, as one team, as one department, right next to each other. This setup improves practically every aspect of the product development, support, and delivery, as well as collaboration, communication, and response times.

Initial intention aside, it feels that there is a general consensus that the VAST majority of the companies that have `devOps` in the job description somewhere are cargo culting and have no clue what they are doing.

If you slap `Dev` prefix to your Ops department your job postings will look "trendy" but nothing else will actually change.

DevOps is a culture, not a team or department.


Sounds like how companies or teams cargo-cult “Agile” and have no idea what it is or why.


You may deploy your application package (however that is packaged up), but what happens when a hard drive starts to die? It may not _die_, it just may have elevated write latency. What about a RAID controller firmware having issues above a certain IOPS threshold? What about a critical kernel security patch that has to go out, and your application runs on 1,000 servers?

None of those things are related to your code directly, but may interact with it at some level. At some point, you get so far removed from the work on your actual application that it makes sense to move that to another 'Infrastructure' group.


What do you consider a medium-sized company? Or a department even? I'm having trouble imagining a department being in an entirely different building. The companies I've worked for who I considered medium-sized had hundreds of employees and if they had multiple different offices, each department had a physical presence at each office.


Cargo cult doesn't care about the number of employees in a company.


Operations are always looked down on, nothing new here. However, it's funny when I see devops jobs with much higher salaries than the ones for dev roles. I manage an ops team in a non-tech company: while sales get much more attention, we have a better office environment, better equipment, salaries are better and there's no quartermaster using his whip on the deck...


> "Did you really want to transfer from development to ops?" which me think that a lot of developers look down on operations roles and see it as a demotion.

I would certainly see it as a departure from what I want to do - design, build and deliver systems. I'm not really in it for the continuity operations.


Then, and I don't know of a nice way to say this, the systems you design, build, and deliver are going to be unreliable and flawed.

Owning (as in caring about and as in business responsibility) the reliability and operation of systems you build, at least for a while after they stabilize, is critical if you want to produce quality products. After all, operational flakiness is a UX issue.


> Then, and I don't know of a nice way to say this, the systems you design, build, and deliver are going to be unreliable and flawed.

Sorry but that's utter bollocks. I'm not interested in being in ops, therefore my software is shit?

> Owning (as in caring about and as in business responsibility) the reliability and operation of systems you build ... is critical.

But that's not being in ops. That's about taking an interest in the running system.

You've just jumped on me saying I'm not interested in having a role in ops because I like to build software, and run off to some weird unsupported conclusion that I just write code and abandon it. I don't need to have a business responsibility for the running system in order to support it and be responsive to the ongoing needs of those that do.


> 2) most of the people working on our ops teams have a formal education in software engineering.

You are so lucky. This is incredibly rare. I'm also lucky that my team is the same way. But most teams are not.


Hey dude was wondering how this comment made you feel:

> Cloud monitoring is a saturated market.

It's like yeah, it's saturated, but it's saturated because the infrastructure we're hosting it on is still and forever changing. Take some serverless CloudFormation in AWS; there was no good solution for application monitoring until someone specifically started solving for it because no one in their right mind was going to use CloudWatch and none of the other existing monitoring solutions/tools could fit the bill either unless they started from scratch and solved for that specific new infrastructure.

<Insert shameless Epsagon plug here />

The Cloud monitoring market might seem saturated but that's because there is no "silver bullet" solution given how much infrastructure has been and continues to change.


CloudWatch works fine and Epsagon's sales tactics are dishonest and shitty in addition to being spammy--I'm still waiting to hear back from a "Cassie" who I don't think actually exists as to where they sourced my email from for their cold-email marketing blasts.

I'll never do business with a company that gross.


> DevOps means the veteran admins had to check in their personal scripts

Oh my, this is an epiphany.

I don't even know what DevOps means. But if the implication was that the "veteran admin" was making arbitrary state changes and is now forced by DevOps to document it in the commit history, I am firmly in favor of whatever DevOps is.


My understanding is that devops are the veteran admin, only now they check-in their code.


What it was supposed to be -

Software development, deployment and ops come under the same role. The DevOps engineer controls the horizontal and the vertical, the code, the environment, the build, the deployment, and as such is a multi-talented unicorn.

What it seems to have become - Sysadmins writing scripts around terraform, and formalising their work to the extent that it is at least usually reproducible.


> a multi-talented unicorn.

That's not necessary. Natural curiosity and willingness to treat dev/ops (whichever is not your team) as us, not them, is sufficient.


I'm not sure I've seen that be any more prevalent now than it was when we had developers and sysadmins.

To me it's become like 'agile': an interesting set of viewpoints and philosophies totally ruined by the industry that's grown around it.


Well, at least the DevOps industry brings actual value to the process (well, not when you are sold OpenStack, OpenShift, Vault and whatever else to run a single container image).


> Everyone hates YAML. Everyone writes a lot of YAML.

I don’t know a single engineer (ops or not) who enjoys writing YAML, yet it is utterly unavoidable.

I’ve lost count of the amount of bugs and broken deploys that have happened because of YAML, or because of a type error caused by it.


What people really hate is having to write complicated configuration. No matter what format you put it in, it is still complicated configuration. It's hard if not impossible to test, there's only one right way to do it, and it is likely interconnected with other complicated configuration.

Whatever format it happens to be written in (now YAML is the apparently trendy way) it is guilty by association.


Pure YAML will never DRY.

https://en.wikipedia.org/wiki/Don%27t_repeat_yourself

If you write procedural configurations with a real Turing Complete, but not Turing-Tarpit language like Python or JavaScript, then you don't need to repeat yourself ridiculous numbers of times, and manage thousands of lines of hand written eye-sore almost-but-not-quite-entirely-unlike-tea YAML. Plus you can implement validation, warnings and error checking, and support multiple input and output formats.

But so many DevOps dogmatically go out of their way to avoid writing any code, even when the alternative means writing a hell of a lot more brittle unmaintainable YAML where you have to carefully check every line for correctness, and make sure that you're actually repeating the same thing in every location required without any typos, and don't let any of those repetitions slip through the cracks when you make changes.

With real Turing complete code, macros, and templates, you can accomplish the same task as with pure un-DRY YAML data, much more easily and efficiently, with orders of magnitudes fewer lines of code, that you can actually understand, maintain, validate, and test, so you can be sure it actually works without meticulously checking every line of output by hand.

It's better to combine DRY combinations of different non-Turing-complete data formats like CSV, YAML, JSON, INI, XML, using the most appropriate formats depending on the type of data. Spreadsheets are much better than JSON or YAML or XML for many different kinds of data with repeating structure (so you don't repeat key names), and JSON and YAML are much better for irregular tree-shaped data (since you're not restricted to rigid structures), and XML for tree structured data and documents including text.

You end up needing different formats as input as well as output. So you need a real Turing Complete language to read them all in, combine and validate them, and shit out the various other formats required by your tools and environment.
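
A minimal sketch of that read-combine-validate-emit idea, using only the Python standard library. The service table, the INI defaults, and the field names (`port`, `replicas`, `log_level`) are all made up for illustration; the point is only that one function reads two input formats, validates them, and emits a third.

```python
import configparser
import csv
import io
import json

# Hypothetical inputs: tabular per-service data in CSV, shared defaults in INI.
SERVICES_CSV = """name,port,replicas
web,8080,3
api,8081,2
worker,,1
"""

DEFAULTS_INI = """[defaults]
port = 9000
log_level = info
"""


def build_config(services_csv: str, defaults_ini: str) -> dict:
    """Merge CSV rows over INI defaults and validate the result."""
    parser = configparser.ConfigParser()
    parser.read_string(defaults_ini)
    defaults = dict(parser["defaults"])

    services = {}
    for row in csv.DictReader(io.StringIO(services_csv)):
        # An empty CSV cell falls back to the shared default, so the
        # default is written once instead of being repeated per service.
        port = int(row["port"] or defaults["port"])
        replicas = int(row["replicas"])
        if not 1 <= port <= 65535:
            raise ValueError(f"{row['name']}: port {port} out of range")
        if replicas < 1:
            raise ValueError(f"{row['name']}: replicas must be >= 1")
        services[row["name"]] = {
            "port": port,
            "replicas": replicas,
            "log_level": defaults["log_level"],
        }
    return {"services": services}


if __name__ == "__main__":
    print(json.dumps(build_config(SERVICES_CSV, DEFAULTS_INI), indent=2))
```

A bad row (say, port 70000) fails at generation time with a message naming the service, instead of surfacing later as a broken deploy.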


Almost every piece of software running on Linux or Unix requires some kind of basic configuration file. Whether it's INI or YAML. You can't just 'code all the things'.

If you want an example of a dynamic configuration, look at RPM's .spec. That's a monstrosity. What you're asking for is more of that, and that's insane.

You could also use something like Python to do all your typing, build a dictionary, and just dump it to a yaml file if you think writing yaml by hand is too error prone (which I personally disagree with).
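
Roughly what I mean by that, as a sketch. The `AppConfig` fields are invented for the example, and I'm dumping with the standard library's `json` module rather than a YAML library, on the assumption that whatever reads the file accepts JSON (YAML 1.2 was designed with JSON as a subset, though not every parser in the wild behaves that way). Swap in your YAML library's dump call if you want block style.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class AppConfig:
    """Hypothetical settings; the hints document intent, __post_init__ enforces it."""

    listen_port: int
    workers: int = 4
    allowed_hosts: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(self.listen_port, bool) or not isinstance(self.listen_port, int):
            raise TypeError("listen_port must be an int")
        if self.workers < 1:
            raise ValueError("workers must be >= 1")


def render(config: AppConfig) -> str:
    """Turn the checked config into text; sorted keys keep diffs stable."""
    return json.dumps(asdict(config), indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render(AppConfig(listen_port=8080, allowed_hosts=["example.org"])))
```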


I think RPM .spec files are sunk way deep into the "Turing Tarpit" area I was referring to.

https://en.wikipedia.org/wiki/Turing_tarpit

Any time you take a language like bash that's ALREADY deeply muddled in the Turing Tarpit, and then try to make up for its weaknesses by wrapping it up in yet another arbitrary syntax that you just pulled out of your ass like .spec, you're even worse off than you were when you started. Why pick a terrible language like bash that's no good for reading and writing and manipulating structured data formats like CSV, JSON, YAML, INI, or XML, and then try to "fix" it, when there are so many much better well supported off-the-shelf alternatives that don't require re-inventing the square wheel, like Python, JavaScript, or Lua?


I'm sure when RPM first came about, it was probably less of a monstrosity. But like most things in the software world, people start bolting on new features, and you have a big mess.

For most programs though, I would prefer a YAML config file. It's easy to serialize/deserialize for many languages, and you can adjust your init scripts / systemd units to spit out a new config on startup if you so choose. Or you can use something like ansible and some templates to generate that config once when you deploy your application (we're all using immutable infrastructure now, right?), although trying to template YAML files in jinja2 is a real PITA; I'd probably just write an application specific ansible module to dump out my config and skip the yaml jinja template part.

That's the really nice thing about ansible, you can make it do all sorts of interesting stuff.


https://dhall-lang.org/ No tarpit here (no Turing Completeness or recursion), and as DRY as you want.


> But so many DevOps go out of their way to avoid writing code

Well, this is the real problem, is that DevOps people should be writing code in my opinion, especially code to automate deployments and handle configuration. But many times it's just a new job label for people with the same ol' ops and sysadmin skillset who don't want to write code.

It doesn't help that when they make unmaintainable piles of configuration that nobody understands, it typically adds to their job security.


I totally agree!

There are good DevOps and bad DevOps. Personally, I'm a Dev who necessarily knows how to do Ops, because nobody else is there to put out the fires and wipe my ass for me. Good DevOps should not have such disdain for writing code, and should not be so territorial and focused on job security, and should work more closely with developers and code.

And good developers should understand operations, and shouldn't be so helpless and ignorant when it comes to deploying and maintaining systems themselves, so they can design their code to be easily configurable and dovetail into operational systems.

For the same reasons, it's also important for programmers developing tools and content pipelines for use by artists to understand how the art tools and artists work, and how to use them (enough to create placeholder programmer art to test functionality), even if they don't have any artistic ability.

And for artists and designers to have some understanding of how computers and programming works, and how to use spreadsheets and outliners and databases to understand and create specifications and configurations, so they don't design things that are impossible or extremely inefficient to implement, and make intractable demands of computers and programmers.

https://en.wikipedia.org/wiki/Programmer_art


I'm with you on that. I come from a background of Dev and have been called DevOps (by others), although I just call myself a problem solver.

The realization that I really needed to understand what happens in operations for me came around '09, when the Xbox Operations Center called me and told me my code wasn't working, and we had such a wall between us that I couldn't see what was going on, and they couldn't describe it either.

I ended up writing automated publishing pipelines for them to take the most risky parts of their dozens of pages word doc and writing tools to do this for them automatically. Most people didn't even think this was a thing that could be done, let alone should be done. Problem solved!

I think people who are territorial are inherently insecure in their skills and therefore fear getting out of their comfort zone. Generalists are far better than specialists in my opinion. You want someone to go where the problems are, rather than people who invent new problems for others in their own little empire. I think a lot of big companies are so big they can have people silo'd all day, so people don't even think about the people and systems they are affecting.


I've used a lot of JSON. It's OK. No comments, not many data types. But the spec is only a couple pages, and I've never been in doubt about how something should be escaped, or parsed. I could probably write a bare-bones parser in an afternoon, if I needed to.

I've tried to work with YAML a few times. The tree structure and extra data types are great. Everything else is a huge pain. There's at least 3 versions of the spec, and the latest one is nearly 100 pages. The parts I need are always in some "extension", so there's even more that I need to support. It has a system for serializing native Objects, so you have to be careful with untrusted data because there are some interesting security issues. It's so complex, I have trouble knowing what to quote, or how. It's not feasible to write your own parser in any reasonable amount of time. Worst of all, every parsing library is slightly different, so (not unlike SOAP) you kind of have to know that it's going to be parsed with (say) PyYAML.

Complicated configuration is indeed a problem in any format, but YAML makes even simple things complex. From the beginning, I really wanted to like YAML. Unfortunately, I think their goals (human-readable text, language-agnostic, rich data types, efficient, extensible, easy to implement, easy to use) are impossible. You simply can't achieve all of them at once.


I'm launching a CI service [0] which instead of using YAML configs to run builds on a third party platform, will let you run the builds yourself on your own machines so you can just use a script or whatever you want to do your builds/deploys.

I share your frustration and was motivated by it to build this. Why should I spend ages writing up everything as config files when I have a script that already works, is easy to change and debug, and can handle any custom thing I need?

I think config files to describe devops processes are a good approach for huge companies with huge teams, lots of churn etc. The approach perhaps has simplicity & stability benefits - works for everyone everywhere without understanding any detail, changes are a bit easier to track, etc. But for small teams wanting control, speed and the flexibility of just writing code to do what you want it can often be an inefficient approach. At least in my experience.

You should check Box CI out. Launching very soon!

[0] https://boxci.dev


True, though YAML is the most "human readable/writable" of the usual suspects (YAML/JSON/XML)


I'd say that spreadsheets are vastly more readable/writable/editable/maintainable than YAML or JSON or XML (i.e. no punctuation and quoting nightmares), and they're easy to learn and use, so orders of magnitude more people know how to use them proficiently, plus tools to edit spreadsheets are free and widely available (i.e. Google Sheets), and they support real time multi user collaboration, version control, commenting, formatting, formulas, scripting, import/export, etc. They're much more compact for repetitive data, but they can also handle unstructured and tree structured data, too.

To illustrate that, here's something I developed and wrote about a while ago, and have used regularly with great success to collaborate with non-technical people who are comfortable with spreadsheets (but whose heads would explode if I asked them to read or write JSON, YAML or XML):

Representing and Editing JSON with Spreadsheets

I’ve been developing a convenient way of representing and editing JSON in spreadsheets, that I’m very happy with, and would love to share!

https://medium.com/@donhopkins/representing-and-editing-json...

Here is the question I’m trying to answer:

How can you conveniently and compactly represent, view and edit JSON in spreadsheets, using the grid instead of so much punctuation?

My goal is to be able to easily edit JSON data in any spreadsheet, conveniently copy and paste grids of JSON around as TSV files (the format that Google Sheets puts on your clipboard), and efficiently export and import those spreadsheets as JSON.

So I’ve come up with a simple format and convenient conventions for representing and editing JSON in spreadsheets, without any sigils, tabs, quoting, escaping or trailing comma problems, but with comments, rich formatting, formulas, and leveraging the full power of the spreadsheet.

It’s especially powerful with Google Sheets, since it can run JavaScript code to export, import and validate JSON, provide colorized syntax highlighting, error feedback, interactive wizard dialogs, and integrations with other services. Then other apps and services can easily retrieve those live spreadsheets as TSV files, which are super-easy to parse into 2D arrays of strings to convert to JSON.
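
As a rough illustration of that last step only (this is not the convention described in the article), here is a Python sketch that splits TSV text into a 2D array of strings and then applies one deliberately simple, assumed rule: the first row holds the keys and each later row becomes one object. The function names are made up for the example.

```python
import json


def tsv_to_grid(text: str) -> list:
    """Split TSV text into a 2D array of strings, one inner list per row."""
    return [line.split("\t") for line in text.splitlines() if line]


def grid_to_objects(grid: list) -> list:
    """Assumed convention: row 0 holds the keys, every later row is one object."""
    header, *rows = grid
    return [dict(zip(header, row)) for row in rows]


tsv = "name\tcolor\napple\tred\nlime\tgreen\n"
print(json.dumps(grid_to_objects(tsv_to_grid(tsv))))
# prints [{"name": "apple", "color": "red"}, {"name": "lime", "color": "green"}]
```

Every value comes out as a string here; deciding which cells are numbers, booleans, or nested structures is exactly what a real convention has to define.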


That is super cool, please don't over complicate it with utility features. I have been considering a project to manage a kubernetes cluster via Google spreadsheet. Google docs have great features relating to user authentication and permissions. The project would need to visualize the JSON state representation for the k8s cluster... your project is ideal.

e.g. calling another google service with the JSON using a token minted BY THE USER CURRENTLY USING THE SHEET


Thanks for the encouragement! I agree, I'd like to keep it from becoming complicated. My hope is to keep it simple and refine it into a clean well defined core syntax that's easy to implement in any language, with an optional extension mechanism (for defining new types and layouts), without falling into the trap of markdown's or yaml's almost-the-same-but-slightly-different dialects. (I wrote more about that at the end of the article, if you made it that far.)

The spreadsheet itself brings a lot of power to the table. (Pun not intended!)

There are some cool things you can do using spreadsheet expressions, like make random values that change every time you download the CSV sheet, which is great for testing. But expressions have their limitations: they can't add new rows and columns and structures, for example. However, named ranges are useful for pointing to data elsewhere in other sheets, and you can easily change their number of rows and columns.

For convenience and expressivity, I've defined ways of including other named sheets and named ranges by reference, and 2d arrays of uniformly typed values, and also define compact tables of identical nested JSON object/array structures by using declarative headers (one object per row, which I described in the article, but it's not so simple, and needs more examples and documentation).


Yeah, my eyebrows are fairly raised at the thought of embedding a templating language in it. For production use of a spreadsheet, I imagine pulling the source code out of the spreadsheet using https://github.com/google/clasp and synchronising with a repository using Terraform.

At which point Terraform has a weak templating engine already, but it's generally enough for building reusable infra. Additional features can be provided within the spreadsheet using reusable libraries. One pain point with embedding functional data processing in a spreadsheet for JSON data is a decent way of writing tree expressions, for which I would turn to the de facto JSON tooling jq for inspiration.

If you want to take this further, I am up for building some infra for continuous deployment of spreadsheets through terraform. tom <dot> larkworthy <at> futurice.com

But I would not embed stuff inline with the JSON. I would have a pure sheet dedicated to stuff going in, and a compute sheet for stuff going out. And the definition for stuff going out should basically be a JQ expression, that can "shell out" to sheets expressions https://github.com/sloanlance/jq/issues/1


TOML is a worthy contender. It is my favorite simple but powerful-enough data language.

Here is a side-by-side comparison of data in TOML versus YAML: https://gist.github.com/oconnor663/9aeb4ed56394cb013a20

And some comments that resonate with me:

  The yaml spec is overly complex and parsing it properly
  is a nightmare. I rather prefer TOML because of it's
  simplicity. Unless one really need the gazillion extra
  features which yaml provides (which one probably doesn't),
  I'd say sticking with TOML seems to be the saner choice.

  I've recently kind of changed my mind on unquoted strings.
  They're nice when you're editing config files by hand, but
  they run into parsing issues in simple cases like when the
  string looks like an int, or of course when the string
  contains quotation marks itself.


I disagree. Yes, it is human readable if you just want to read the words (like you would with Markdown), but with a configuration file you want to understand the structure. YAML makes that quite confusing IMO. It seems like a random array of dashes and indentation.

JSON is much more human readable in that respect because the structure is explicit so there's no ambiguity. I'd say TOML is somewhere in-between. But both are vastly preferable to YAML because they don't have Javascript-style type insanity.


I’ve yet to see an editor that works well with YAML, yet JSON and XML are simple. Add comments and trailing commas to JSON and it would be perfect to write. And XML isn’t that evil, really.


To this day I am baffled: why choose YAML? Personally, I think it's more error prone, less flexible, and harder to read than JSON. Not to say JSON is a perfect format, but it sure feels better than YAML.


Reason number 1: aliases and anchors. Reason number 2: allowing comments.

JSON is a terrible config format, and an ok data interchange format.


Agreed. It amazes me that we got out of the XML-for-everything era with a lot of people thinking the problem was XML, and not the for-everything part. JSON-for-everything is just as maddening to me.


Comments were the reason for me to move from JSON to YAML in config files. I remember perfectly fine what every option does, but adding comments to each option is a must when it comes to sharing my code with anyone else.


If it's a config file I'm not sure why you'd need comments. Comments / documentation should be in the system you'll do the config for. If your set options are that quirky / need documentation; why not explain them in a README?

But maybe I misunderstood. Can you give an example of a comment that makes sense / is required in a configuration JSON?


> If your set options are that quirky / need documentation; why not explain them in a README?

Mostly because I don't know whether the person customizing my code will read the README, but they'll definitely see the comments I wrote right before the configuration option itself.

With comments and some extra whitespace, they can go through the file line by line, read what a specific option does, configure it properly, and move on to the next one. No back and forth to the README required.


JSON's finicky about commas. Quotation marks everywhere are visual noise. Comments are accepted by many parsers, but not by all of them.

Something like http://www.relaxedjson.org/ is the JSON we need. An implicit root object would make it almost perfect for writing configuration.
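
To show how strict a spec-following parser is, here is a small sketch using Python's standard `json` module. The sample documents and labels are made up for illustration; the exact error messages vary between Python versions, so I only record whether each input is accepted.

```python
import json

# Hypothetical config snippets: one strict JSON document and three
# "relaxed" variants that people often wish were legal.
candidates = {
    "strict": '{"port": 8080, "debug": false}',
    "trailing comma": '{"port": 8080, "debug": false,}',
    "comment": '{"port": 8080 /* dev only */}',
    "unquoted key": '{port: 8080}',
}

results = {}
for label, text in candidates.items():
    try:
        json.loads(text)
        results[label] = "ok"
    except json.JSONDecodeError as exc:
        # exc.msg is the parser's own description of the problem
        results[label] = f"rejected: {exc.msg}"

for label, outcome in results.items():
    print(f"{label:15} {outcome}")
```

Only the first document parses; the other three are rejected, which is exactly the friction you feel when editing such a file by hand.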


Yeah, what I want is JSON with comments and nice multi-line string support. I don't like how much syntactic magic YAML does (I don't need or want the country code for Norway to be parsed as a boolean False value). I still don't know what the exclamation point does (e.g., !Ref).
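
To make the Norway case concrete, here is a rough Python sketch of YAML 1.1 style implicit typing. It is not a real parser: `resolve_scalar` is a made-up helper, the boolean word list follows the YAML 1.1 bool type, and integers are simplified to plain decimals.

```python
import re

# Plain (unquoted) scalars that YAML 1.1 resolves to booleans.
_BOOL_TRUE = re.compile(r"y|Y|yes|Yes|YES|true|True|TRUE|on|On|ON")
_BOOL_FALSE = re.compile(r"n|N|no|No|NO|false|False|FALSE|off|Off|OFF")
# Simplification: decimal integers only (real YAML knows more forms).
_DECIMAL_INT = re.compile(r"[-+]?(?:0|[1-9][0-9]*)")


def resolve_scalar(text: str):
    """Mimic implicit typing for one scalar (hypothetical helper)."""
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
        return text[1:-1]  # quoted scalars always stay strings
    if _BOOL_TRUE.fullmatch(text):
        return True
    if _BOOL_FALSE.fullmatch(text):
        return False
    if _DECIMAL_INT.fullmatch(text):
        return int(text)
    return text


countries = ["SE", "DK", "NO", '"NO"']
resolved = [resolve_scalar(c) for c in countries]
print(resolved)  # ['SE', 'DK', False, 'NO']
```

The only way to keep Norway a string is to quote it, which is easy to forget in a long list of otherwise unquoted codes.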

And clearly YAML is the wrong tool for infra-as-code since CloudFormation has to build a macro system, conditionality, referencing, and a couple different systems for defining and calling functions (templates being one and their implicit functions being another). We also see tools like Troposphere and CDK which are effectively different ways to generate CloudFormation YAML via programming languages (or more precisely programming languages that were designed for humans).

And it's not just limitations inherent to CloudFormation--Helm has long had templates for generating YAML, but those also weren't sufficiently powerful/expressive/ergonomic so Helm3 is supporting Lua as well. And as I understand it, Terraform is constantly adding more and more powerful features into HCL.

So what's the solution? It's pretty simple--we should keep the YAML around, but it should be the intermediate representation (IR), not the human interface. The human interface should be something like a functional language[^1] (or an imperative language that is written in a functional style) that evaluates to that YAML IR layer. The IR is then passed to something like Kubernetes or Terraform or CloudFormation which understand it, but it's not the human interface.

As for the high-level language, something like [Starlark][0] would work well. It's purpose-built for being an evaluated configuration language. However, I would argue that a static type system (at least an optional static type system) is important--it's easy enough to imagine someone extending Starlark with type annotations and building a static type checker (which is much easier for Starlark since it's a subset of Python which is intended to be amenable to static analysis).

This, I think, is the proper direction for infrastructure-as-code tooling.

[^1]: Functional in that it is declarative instead of imperative--not necessarily that the syntax should be as hard to read as OCaml or Haskell. Also, while YAML is also declarative, it doesn't have a notion of evaluation or variables.

[0]: https://docs.bazel.build/versions/master/skylark/language.ht...


I do not understand the need for all of these different new language implementations and data formats. GuixSD vs NixOS already showed that Scheme is a superior solution as a configuration language, scripting language, template language, and intermediate representation. A single language that has 30+ years of successful production use, tons of books and documentation. Why re-invent the wheel in four different, incompatible ways?


Is NixOS a scheme? Anyway, we moved away from Nix because of all its problems (the language being only a medium-sized one). Also, I detest CMake, but by your own standard (longevity, popularity), it is better than Nix or Guix. Frankly those tools haven’t shown themselves to be “superior” in any meaningful way. Yes, they have been around for a while, but having been around for a long time and not enjoying any significant adoption is not very compelling.


> Is NixOS a scheme?

That question does not make any sense. I am talking about Scheme the programming language: https://schemers.org/

NixOS is a GNU/Linux distribution built on top of the Nix package manager. The Nix package manager has its own custom configuration language. GuixSD is a GNU/Linux distribution built on top of the Guix package manager. GuixSD uses Guile Scheme as the package configuration language, system configuration language, scripting language, and implementation language for many of the system services (such as init). GuixSD does more things better with an existing standard programming language than NixOS does with its own custom programming language.

> Also, I detest CMake, but by your own standard (longevity, popularity), it is better than Nix or Guix.

What does CMake have to do with anything?

> Yes, they have been around for a while

What are you talking about? The first stable version of NixOS, 13.10, was released in 2013. GuixSD 1.0 was only released this May.

Your post is hard to make sense of.


Yeah, I misunderstood your post—sorry about that. I thought you said “Guix and Nix show us that scheme is the answer”. In any case, I don’t know how you conclude that Guix won over Nix nor how you conclude that the winner is superior to all other configuration languages. It’s especially counter-intuitive that it should be an intermediate representation—the IR should not be executable, it should only be descriptive. The IaC technology shouldn’t need to interpret anything, and the language you pass into it therefore shouldn’t support evaluation/execution.


> I don’t know how you conclude that Guix won over Nix

There is no contest that I am aware of. I used GuixSD vs NixOS as an example of how adapting existing standards can provide a lot more benefits with a lot less effort than coming up with incompatible new languages.

> how you conclude that the winner is superior to all other configuration languages

Here is a condensed list:

1. 60+ years of successful use of S-expressions in all of the needed roles (programming language, intermediate representation, configuration language, template language, network protocols).

2. Many proven Scheme implementations available for use in any scenario, from clusters to microcontrollers.

3. Easily amenable to formal analysis and verification. Excellent tools such as ACL2 available and in use for many decades.

> It’s especially counter-intuitive that it should be an intermediate representation—the IR should not be executable, it should only be descriptive.

S-expressions do both.

> The IaC technology shouldn’t need to interpret anything, and the language you pass into it therefore shouldn’t support evaluation/execution.

That is an unexpected thing to say about an acronym that stands for "Infrastructure as Code." You cannot get automation out of static data.


> Here is a condensed list:

I don’t find that list very compelling. Longevity in particular isn’t very interesting given that Scheme has gained very little ground in 60 years. That seems like an indicator that there is something wrong. I would sooner use Python or JavaScript, which are familiar to programmers in general and which have gained lots of traction in their respective lifetimes.

> S-expressions do both.

Right, that’s the problem. :)

> That is an unexpected thing to say about an acronym that stands for "Infrastructure as Code." You cannot get automation out of static data.

It’s a matter of architecture and separation of responsibilities. The IaC technology takes static data and automates the creation, mutation, and deletion of the resources specified therein. The client generates that flat data by evaluating a program over some inputs.
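
A minimal sketch of that split, in Python. Everything here is hypothetical: the `service` and `render` functions, the resource shape, and the `registry.example` image names are invented for illustration and match no real tool's schema. JSON stands in for the flat format only because it ships with the standard library.

```python
import json


def service(name: str, image: str, replicas: int) -> dict:
    """Hypothetical resource shape; not any real tool's schema."""
    return {
        "kind": "Service",
        "name": name,
        "image": image,
        "replicas": replicas,
    }


def render(env: str) -> list[dict]:
    """Evaluate the 'program' over its inputs and return flat data."""
    replicas = 3 if env == "prod" else 1
    return [
        service(f"{svc}-{env}", f"registry.example/{svc}:1.0", replicas)
        for svc in ("api", "worker")
    ]


# The client side ends here: what gets handed over is plain data
# with no conditionals, loops, or variables left in it.
ir = json.dumps(render("prod"), indent=2, sort_keys=True)
print(ir)
```

The tool that consumes `ir` never has to interpret anything; all the evaluation happened before the hand-off.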


I replied with this elsewhere in the thread, but maybe Dhall? https://dhall-lang.org/

Functional? Check. Type annotations? Check. Not Turing complete? Check. Compiles to multiple config formats? Check.


Yeah, I think Dhall is the right idea, but I think it’s going to have the same syntax/ergonomics issues that Haskell and OCaml suffer from. Our company invested heavily in Nix, but the developers really struggled with the expression language which seems quite similar to Dhall both syntactically and ergonomically. While Dhall might be great for Haskell/OCaml/F# shops, a configuration language isn’t the right place to push for new idioms or ways to think about programming.


Have you looked at jsonnet[0]?

It has multiline strings and comments among other features...

[0]: https://jsonnet.org/ref/spec.html


We’ve been using starlark for kubernetes configs at my prev gig and I quite liked it. Open-sourced some of that stuff as https://github.com/cruise-automation/isopod (which is based on https://github.com/stripe/skycfg). I hear stripe are also using their thing with terraform although not sure to which extent.


TeamCity actually lets you set job configurations with kotlin in source control. The config is actually a kotlin dsl, so not only can you import settings, you can also have TeamCity generate projects, jobs, and settings from your code.


Gradle also uses Kotlin these days.


I’ve done “devops“ at both large and small organizations. I don’t write YAML. I use Pulumi and TypeScript for system provisioning; it’s all syntactically and type-checked as I write it, and reusable besides. I refuse to deal with the maintainability disaster that is Ansible (I used to use Chef but now pretty much everything can be handled via Fargate) and I don’t have a need for Kubernetes (except in my home lab where I use it with Pulumi’s providers).

The one place I could write YAML is AWS CodeBuild buildspecs. I write them in JSON, when they’re not being assembled on the fly from Pulumi.

You can do it too—just pick better tools for it.


Seeing it mentioned here twice, I went and checked out their website/github. What I see is lots of 20-line examples setting up a docker-container/very basic vm/aws-lambda. While typescript might have better "testability" I'm not sure, where this goes if you just copy/execute bash-scripts or inject javascript code into aws-lambda. To me it seems like it just reinvents the "classic" sys-admin but instead of writing/copying/executing bash-scripts, dev(op)s are now churning out bash-scripts wrapped in tested wrapper code - well...

In contrast ansible has a lot of declarative building blocks (iirc all are unit-tested!), which either fail or do the specified job. And yeah, the specified job might be copying and templating a config-file for some oldschool service (and you might be shocked: people still use these!), which is inherently not really testable (except by integration tests. or maybe you write a test to get the file and to check whether you got to type the name right twice) - I'm not sure how pulumi can help you with that?

And yes, I would love a concise, descriptive DSL (compare for example spack, which does this nicely) over/as an alternative to the loops-crafted-on-YAML-mess of ansible but I take the latter any day over some "we-can-call-cloud-providers-apis-and-kubernetes-but-why-would-you-copy-files?"-stuff like pulumi.


I wouldn't be "surprised" that somebody wants better instance CM. I also don't care and I can't particularly fault the Pulumi folks for putting it on the back burner too; as each of the major instance CM providers is in the process of demonstrating, it doesn't make money and is being eaten by a container-centric and pervasively disposable approach. I've stopped caring about machines enough to write even a systemd unit file and it's a better world to live in.

You can write instance CM yourself, if you want. Pulumi isn't a "cloud thing", it's a lifecycle management tool. But it's typed, and that absolves it of many, many sins given the incredibly positive surface it presents to a developer.


Interesting, never heard of Pulumi, thanks.

You might be interested in https://www.jetbrains.com/teamcity/

It has a strongly typed Kotlin DSL for pipelines. I absolutely despise Jenkins and Groovy: so many bugs causing huge issues due to weak typing and lack of testing :(


I’ve used Teamcity. It’s totally fine. I don’t want to host anything, though, and I want them inside my VPC. We’ll use AWS until we can’t stand it anymore and then we’ll fire up a Jenkins (just because there’s more internal expertise).


Enjoying it might be an overstatement, but I prefer YAML over JSON, XML, INI files, bash scripts and most other config mechanisms.

Maybe the only comparable thing that comes to mind that I prefer is jsonnet.


I recommend taking a look at TOML [1] as an alternative to YAML for many situations.

  Objectives:

  TOML aims to be a minimal configuration file format
  that's easy to read due to obvious semantics. TOML is
  designed to map unambiguously to a hash table. TOML
  should be easy to parse into data structures in a wide
  variety of languages.

A reddit thread titled "YAML vs TOML" has some merit [2].

  YAML downsides:
  - Implicit typing causes surprise type changes. (e.g.
    put 3 where you previously had a string and it will
    magically turn into an int).
  - A bunch of nasty "hidden features" like node anchors
    and references that make it look unclear (although to
    be fair a lot of people don't use this).

  TOML downsides:
  - Noisier syntax (especially with multiline strings).
  - The way arrays/tables are done is confusing, especially
    arrays of tables.

Rust uses TOML to configure its build system, Cargo [3].

[1]: https://github.com/toml-lang/toml

[2]: https://www.reddit.com/r/devops/comments/6f82nu/yaml_vs_toml...

[3]: https://doc.rust-lang.org/cargo/getting-started/first-steps....


I find it interesting that it's always called "infrastructure as code" but then everything is basically "infrastructure as config files" where any control flow is application-specific constructs for a language not built for the task.

Why not just read Python, the way webpack uses JavaScript as its "config"?
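
A rough sketch of what I mean, using only the standard library. The file name `infra_config.py` and every variable in it are hypothetical; `runpy.run_path` simply executes the file and hands back its globals.

```python
import runpy
import tempfile
from pathlib import Path

# A hypothetical config file written as ordinary Python.
CONFIG_SOURCE = '''
REGIONS = ["eu-west", "us-east"]
BASE_REPLICAS = 2

# Control flow comes from the language, not from a templating layer.
SERVICES = {
    f"api-{region}": {"replicas": BASE_REPLICAS * (2 if region == "us-east" else 1)}
    for region in REGIONS
}
'''

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "infra_config.py"
    path.write_text(CONFIG_SOURCE)
    # Executes the file and returns its module globals as a dict.
    namespace = runpy.run_path(str(path))

services = namespace["SERVICES"]
print(services)
```

The obvious trade-off is that loading the config now means running arbitrary code, so this only makes sense for files you trust as much as the rest of the repo.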


We probably haven't met, but I think YAML is great. If you're doing a lot of work in python and ansible, then you just start using YAML everywhere.

I actually struggled with YAML at first when starting with Ansible because I didn't know it was YAML, and I didn't understand lists vs dictionaries in YAML (seems obvious in hindsight, but just didn't click for whatever reason).

> I’ve lost count of the amount of bugs and broken deploys that have happened because of YAML, or because of a type error caused by it.

YAML can be linted. Also, golang might be helpful to read in and deserialize your YAML files against a known type so you can catch these type errors. This should probably be done during CI when you create a pull request or merge request. This obviously requires doing things on your end to make it all fit together.
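
The same idea sketched in Python rather than Go, to keep it short. `DeployConfig` and `load_config` are hypothetical, and the parsed input is written as JSON here so the example needs no YAML library; it stands for whatever dict your YAML parser returns.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class DeployConfig:  # hypothetical schema for illustration
    service: str
    replicas: int
    country: str


def load_config(raw: dict) -> DeployConfig:
    """Reject missing, unknown, or wrongly typed fields."""
    expected = {f.name: f.type for f in fields(DeployConfig)}
    problems = []
    for name in sorted(expected.keys() - raw.keys()):
        problems.append(f"missing field: {name}")
    for name in sorted(raw.keys() - expected.keys()):
        problems.append(f"unknown field: {name}")
    for name, wanted in expected.items():
        # Exact type match, so a bool is not accepted where an int is expected.
        if name in raw and type(raw[name]) is not wanted:
            got = type(raw[name]).__name__
            problems.append(f"{name}: expected {wanted.__name__}, got {got}")
    if problems:
        raise ValueError("; ".join(problems))
    return DeployConfig(**raw)


# Stand-in for parser output where a quoted number and an unquoted
# country code came back with surprising types.
parsed = json.loads('{"service": "api", "replicas": "3", "country": false}')
error = None
try:
    load_config(parsed)
except ValueError as exc:
    error = str(exc)
print(error)
```

Run as a CI step, a check like this fails the pull request before a mistyped value ever reaches a deploy.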


I was wondering if anyone has started using cue[1] and has an opinion about it, seems like a super interesting project aiming to solve this

[1] https://cuelang.org/



Part of my job is introducing people to Ansible. For the people not intimately familiar with YAML, it's the biggest obstacle.


It's a hell of a lot better than JSON and I like that everything uses it.


I have hardly written a line of YAML in my career. It's hardly unavoidable, no one is forcing you to use bad tools. Especially considering that Kubernetes is massively over-engineered for most systems that aren't themselves over-engineered.


I am a solutions architect at Red Hat but I came from being an application architect. I recently gave a talk on CQRS and event sourcing at a Midwest user group. I am routinely surprised how few "DevOps" engineers care to grasp application architecture yet want to start dabbling in things like Kubernetes. Automation using Ansible, yes, no problems there. Creating microservice architectures using containers and Kubernetes tooling is yet a ways out for many. New tooling is absolutely necessary to implement the things we have all been talking about for at least 10 years. Developers can no longer just care about writing code. They have to think about how to manage breaking up their monolith, not just for performance; that's not typically the driving force behind adopting microservice architectures. It's about improving lead time and reducing the batch size of releases. You can't have microservice architectures without both sides (dev and ops) understanding what the other actually does. I have first hand knowledge how difficult it still is to break those silos down. Developers, if you still think writing code is the hard part; well, I've got news for you. Ok, writing good code is still hard but there's a lot more to think about to get the most out of that code when it goes to production. As a starting point to answer "why do any of this?" I recommend the book "Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations".


been in an ops role for 2.5 years, 2 years as software dev and currently i'm in a devops role for 2.5 years now. this article hits the nail on the head. this was nothing more than a rebranding of the word ops. it has very little to do with actual software development. not to say that i don't enjoy it, for the better part of the time i was a software dev, i really missed setting up machines and day to day network/system administrator work. i know my way around 4-5 programming languages so i'd say if the need arises i can use the right tool for the right job. however, kubernetes, aws, terraform, packer, ansible, bash, etc are everyday things. i'd say in the last years i've developed a love hate relationship with pretty much every tool out there (looking at you terraform and k8s). the pitfall with this position is that there's a lot of buzzwords flying around and 9/10 times, if you have some idiotic management, you end up implementing some overkill solution to fix your problems. the other pitfall, from an infrastructure perspective, is that the ecosystem around kubernetes is so big that there are now an insane number of moving parts that you need to maintain and monitor.


> Being a gathering of system administrators I’d expected more IRC, there was no mention of it.

We just don't tell you about it -- Slack is more visible to management, and thus infosec has made slackbots a fucking integration nightmare.


> As a result they also talk about mental health more than programmers do.

It really is. All of the lifelong DevOps/SRE people I know have spent time grappling with the consequences of this. On-call experiences also seem dramatically different in larger companies, where teams can properly load balance.


> DevOps is Ops with new tools... I did not meet or hear from any developers.

This is awful! It's not DevOps, it's an old infra gathering! Very disappointing.

I expected devops conference to have devs ONLY. Devs that are curious and passionate about the whole product life cycle, hardware, scalability, deployments and networks, devs who also pulled and transformed/transitioned their ops friends into devs.

Is it a general trend with DevOps or just devopsdays Portland conference thing? Is it cargo-culting?


I think there's still an attitude among developers that anything that looks like system administration is beneath them.

My theory, which might annoy some people, is that it probably has to do with operating systems collecting so much legacy cruft that a lot of coding to interact with the underlying system is just extremely gross. Like, shell scripts and command line apps have not gotten that much more elegant since the 1980s, but programming languages definitely have. The funny thing is that things like Kubernetes and containers and a lot of these devops technologies are actually really interesting and exciting, but at the end of the day the lingua franca is still bash. And if you're writing code in modern python or typescript or whatnot and all of a sudden you have to regularly use a language that lacks robust support for things like arrays, you're not going to be a happy camper.


> I expected devops conference to have devs ONLY.

In the past five years I've been doing this, I can emphatically say that the number of software engineers/developer types that I've met at DevOps events are so few that I can more or less count them with both of my hands.

In many ways, I consider DevOps an evolution of what systems administrators do. These days, there's so much demand on infrastructure teams to do more with less. Automation systems have improved our productivity so massively that the business types are happy to do so much more, which leads to a vicious cycle of compression and stress.

Don't get me wrong, I love automation and such. But it's terrifying that we're asked to put together systems that we can't comprehend.

I'd consider myself an oddball in the "DevOps scene". I was a bona-fide polyglot software engineer before I pivoted. I did embedded systems in C and C++, full stack Java work, Nodejs stuff, etc. I later did QA automation work in Python, and pivoted from that into DevOps. It wasn't that far of a jump.

But most people I know in the "DevOps scene" are nothing like that. They don't have years of experience in OSS allowing them to see the real gold in the pile of fool's gold. They don't have professional experience doing software development. Many of them only barely understand software architectures. Life cycles? Beyond systems maintenance, there's not much for them to learn about.

I would argue that proper DevOps teams wouldn't have things like on-call rotations that give periodic or constant on-job demands, as that's usually an SRE-type responsibility. But DevOps is just a fancy way to reduce your Ops teams for most places.


I also moved from development to ops and am now at a company with a large engineering organization.

Most developers don't care about ops. They expect everything to "just work" and they don't expect to have to help that process along with the way their code is written. It doesn't matter how junior or senior they are either.

I'd say more than 50% do things that are extremely irresponsible wrt resource utilization. Especially if it's I/O.


>> But most people I know in the "DevOps scene" are nothing like that.

This feels almost universally like the case (unless you "started out" in devops, I guess). It's a catch-all for everything infra. A "good devops" (IMO) will know the ins/outs of all components of the product/business and (more importantly) how it runs and reacts in the real world (and not in the ivory tower/developers' laptop).


There are really 2 types of companies when it comes to DevOps: those companies that engender a culture of DevOps with their Dev and Ops teams, and those companies which hire a dedicated DevOps team. In my experience people who go to these conferences are almost always from companies of the latter type. It is often argued that the only true DevOps way is that of the former type, but in my experience the companies that talk about DevOps are more often of the 2nd type, as that setup in a company is much more common.

It is like REST APIs. Most folks that write REST APIs are really writing HTTP RPC APIs, only a few are doing true REST, but nobody cares; it does the job.


On point, I will just add one thing, the split is not really binary. There is a continuum depending on leadership and teams.

Most companies start as the latter, either for historical reasons (i.e. they already had a dedicated systems/ops team) or because a significant majority of engineers are not interested in doing cross functional/vertical product engineering.

I have experienced two such efforts. One where the transition had varying degrees of buy in from whole eng org and happened slowly but successfully, another org where developers kept (indirectly) refusing to own up their systems past build phase and engaged in unproductive turf wars.

As a development engineer nothing is sadder than seeing a fellow dev engineer not giving a damn about either the larger picture (why our product exists/why our users' lives are going to be better because of the product) or about how the system behaves outside of their little bubble of local machine and jenkins.

PS: I do acknowledge that different people have different motivations depending on what's going on in their personal lives.


Devs just don’t care about infrastructure in my experience.


I started my career very early, at the age of 16, as a system admin. Heart and soul was on pure engineering and less on what people label these things on CVs.

A few years later I moved up the stack to become a full-time programmer and have been doing that ever since, 20 years down the line.

I have built systems from the ground up literally, loading the van with the equipment, installing the racks, the servers, the EMC’s ...

Having seen it from all perspectives, sys admin, DBA, Ops and all its variants, and programmer, I’d like to think of myself as an engineer who solves problems or builds things to solve problems.

The tools that you use do not define you. I have seen, and still see, loads of devs who don’t understand, or aren’t willing to understand, how ops works or how the crap they produce runs and gets maintained.

And ops who write hacky code in the name of infra as code.

I wish we could just be Engineers and stop labelling and defining people by the tools they use. Things would be better IMO.


"everyone hates yaml" hah. Shocking how we're stuck on these problems.

The old line about "the problem that it solves is not hard and it does not solve them well" about XML is kind of hilarious when we're still facing the same thing about its current popular replacements - JSON and YAML.


I'll take limited YAML over JSON any day of the week for things that might be edited by hand. The cleaner diffs when appending to a list alone are worth it.
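
A quick sketch of the diff point, using only the Python standard library. The YAML side is hand-written block-sequence lines rather than the output of a YAML library, and `changed_lines`, `as_json`, and `as_yaml_list` are made-up helper names.

```python
import difflib
import json


def changed_lines(before: list[str], after: list[str]) -> tuple[int, int]:
    """Return (removed, added) line counts between two versions of a file."""
    removed = added = 0
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed += i2 - i1
            added += j2 - j1
    return removed, added


def as_json(items: list[str]) -> list[str]:
    # One element per line, the way a pretty-printed config would look.
    return json.dumps(items, indent=2).splitlines()


def as_yaml_list(items: list[str]) -> list[str]:
    # Hand-written block-sequence lines; no YAML library involved.
    return [f"- {item}" for item in items]


old = ["alpha", "beta"]
new = old + ["gamma"]

json_diff = changed_lines(as_json(old), as_json(new))
yaml_diff = changed_lines(as_yaml_list(old), as_yaml_list(new))
print("JSON:", json_diff)  # (1, 2): the old last line gains a comma
print("YAML:", yaml_diff)  # (0, 1): only the new item is added
```

In JSON the previous last element has to gain a comma, so the diff touches a line nobody meant to change. Allowing trailing commas would remove that difference.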


True, but when the YAML file gets long and complicated, such as with a Kubernetes YAML spec, it becomes difficult to figure out what indentation different items need to be at and you end up needing to use tools other than just your laptop: https://twitter.com/Caged/status/1039937162769096704?s=19


Hah, I can see my own old response to that tweet.

Anyways, as horrifying as it is, that's actually an argument in favor of XML. After all, would being lost in {} be that much better than indents? You'd still be scrambling to figure out how deep you are in a deeply nested object.

The obnoxious repetition of xml's closing elements actually helps here since you can see what object you're closing.

...

But I hate XML.

At this point I'm fantasizing about using heavily-sandboxed Lua and going fully Turing-complete so when your files are getting horribly complicated you can start doing some minimal reuse.


I guess the temptation would be to say that it is in fact a hard problem. But then I look at something like HOCON which basically solves all the problems with YAML, you can understand it after about half an hour of reading, but somehow it's a fraction as popular as yaml or json. Maybe it just lacks marketing muscle or a really visible project.


YAML is all nice and dandy when the file you're writing is relatively simple. When what you're doing expands beyond a single key-value data structure, things rapidly take a turn for the tragic. JSON is fine, and I can live with XML, but save me from YAML.


For me the meaning of DevOps has shifted from "developers and operations working together" to "operations picking up developer tooling". As more and more administrators become familiar with the new workflow I expect them to transition from mostly relying on manual testing and scheduled drills to writing automated tests for infrastructure fail-over, backups, firewalls, etc.

I'm not a big fan of YAML the format but I'll gladly put up with it for the valuable content that's encoded in those files - the entire system described in a transparent accessible manner at an abstraction level which enables the developer, the network guy, the server guy, the storage guy, the architect and the security guy to have a meaningful conversation.


From the article:

DevOps is Ops with new tools. I went to DevOpsDays thinking DevOps means developers and operations merged into one team. I was wrong.

I did not meet or hear from any developers. It was a gathering of system administrators who use version control, and write a lot of YAML. Programming languages were not a topic...

DevOps means the veteran admins had to check in their personal scripts, and everyone is expected to automate more things. They relate to the software their business runs (“app”, “asset”) the way a merchant-navy captain relates to what’s in the containers on his ship.

Ah. This must be DevOps for big shops. Not "why are we paying operators when we could have the developers do it?"


I'm an organizer for one of the (many) DevOpsDays Conferences held around the world. Speaking only for myself, I struggle with making sure that DevOps is about Dev and Ops working together, not just Ops / SysAdmins using a different, more marketable job title. When Operations Engineers become DevOps Engineers, it subconsciously says to Developers that they don't belong.

I'd encourage all of you that feel excluded, or feel like these are just SysAdmin conferences in sheep's clothing, to go to your local DevOps Meetup and start there. Demand that DevOps continues to stand for a community and cultural way of making software development suck less and not just "the next Agile". /rant over


> When Operations Engineers become DevOps Engineers, it subconsciously says to Developers that they don't belong..

I think anyone who actually feels that way about their peers getting a title-change is being incredibly over-sensitive.

The fact is that for many years, some "System Administrator" types did already think like "developers" and use similar tooling. Those folks just didn't get recognized or paid for that, because the role was defined by what you did (keep things operational), instead of how you did it. So when "DevOps" became a thing, it was actually not a huge leap for many Operations types. So it's reasonable that many of them found positions with the "DevOps" title appealing due to the increased pay and prestige, and the fact that they already had the skills to do it.

Conversely - many developers don't actually have much experience running production systems, and it can feel demeaning to hear those people say that Operations-types do DevOps "wrong" because operations people don't want to use the latest bleeding-edge tool or the most efficient programming language. A preference for stability and maturity is something learned from experience, and stability happens to be very valuable when it comes to production situations.

All that said, I still love hearing from dev-types who share their experiences around DevOps. Both sides should continue learning, hearing each other out, and valuing each other's contributions without assuming ill-intent or inferiority due to their educational background or previous titles.


> The fact is that for many years, some "System Administrator" types did already think like "developers" and use similar tooling.

The funny thing is that it was the opposite for me. I became a DevOps Engineer because a lot of my work was being self-supported with my own infrastructure. I found that I had a knack for finding and putting together good systems to support my work. In many ways, I guess I was considered the quintessential DevOps person. In my QA role, I wound up being a developer who needed to put together infrastructure to support the needs of the QA group and to a lesser extent software engineers.

When this was recognized, I was moved into the sysadmin group as a founding member of a new team. That team grew and split out into its own group that specifically interfaced with both software engineers and infrastructure engineers. My team is not perfect, but it's been the best I could ask for. We've done amazing work transforming our engineering processes over the years, and software engineers are now more often part of the infrastructure bringup process than not. We've still got a ways to go, but we're getting there.


I find this article to be a little frustrating. It is kind of like a layperson going to a mechanic convention and saying "all I heard about was engines, brakes and tires. Everyone is interested in tires".

If you are not in the domain, then you will not understand why the 'devops' engineers are all looking at these things. Kudos to the developer for going to the conference and learning something new.

Devops are exactly the same as Software Engineers. There are those people that just slap code together and do not care about design, tests or the longevity of the product. Devops can be the same where they just push the button in AWS, create some bash scripts and get things running.

Then you have those Devops that approach building product and infrastructure in a repeatable, scalable, traceable fashion. They care about software engineering practices and building something to last.

I work as something in between a platform engineer and ops. I run 4 kubernetes clusters in GCP, had to build up all the underlying resources (dbs, service accounts, storage, clusters) and get the product running on it. I use terraform for this, including kubernetes deployments, and it works quite well.

Just like software engineers, devops/ops people are under time constraints and have to hack things together. Whatever it takes to keep the business running...


> The most common job title seemed to be SRE (Site Reliability Engineer)

This article (plus the comments) kind of scares me because I am a CS student who just received an opportunity to work as an SRE Intern at a pretty big company over the next summer break. I took it because I wanted to find out whether SRE is a role for me, but I am a bit worried that accepting a more permanent SRE role after the internship will hurt my chances of moving to a dev role later in my career.


My job is 50% dev, 50% devops/sre, I would say.

One thing I noticed about the whole SRE/devops scene lately is that there seems to be no grasp of the KISS principle. There are people building elaborate glass houses on top of k8s (though k8s+pipelines is very useful, it's easy to get carried away).

Another observation is that the people who are not good programmers but are good at appearing to get stuff done (but who create tons of tech debt) seem to fall naturally into SRE roles, as there is no testing culture in pipeline-driven development.

SRE currently is like the wild west; it will be many years before it starts to resemble engineering.


This. I run opsZero and the number of CI/CD implementations I see in different infrastructures is crazy. Most code is not tested or is overly complex, and every DevOps person seems to reinvent the wheel a hundred different ways.

Living in the world of Terraform, Kubernetes, and helm has been a godsend because it codifies deployments into a few buckets that remove a lot of this custom code. To be honest I’m thinking the days of DevOps may be numbered.


Depends on what kind of career you want to have, but I think for the most part, it could be good experience, depending on the position. Everyone's riding the big SV company bandwagon, so every company is going to be totally different. If you're being brought in to convert applications to run on AWS instead of on premise, or refactoring apps to run in containers instead of VMs, you're not an SRE, you're operations, and that's what we call 'lift and shift.' The AWS and container experience will be valuable, but you're probably not going to learn much about software development. You're going to be writing Puppet or Chef modules (or if the company has their act together, ansible), and it's going to mostly suck.

As with anything, you're going to get out what you put in. If you want to be a low-level C programmer, then spend time doing whatever is related to that.


You will have no problem in moving back if you are good at finding and solving business “problems”.

As long as there are people working (not monitoring) there is a job to do. Always read up on the company agreements with contractors and look for the money flow. This will quickly get you into the most lucrative Dev role within every company.


Can you elaborate on this? Does it mean to position yourself as a person who can solve the problems that your business is throwing lots of money at? Or to avoid the problems that are being tossed to contractors since the business might not see them as long-enough term issues to hire employees to solve them?


I did the opposite and went into a dev role right after university, then later moved on to devops jobs. Getting into DevOps is relatively easy but it does require a change of mindset (some of my current colleagues came from a dev background and still refuse to fully embrace devops ways).

I think if it's a startup, it'll be easy to get into either type of role. From my experience startups give you massive jumps in career progression in little time even if the company barely cares about career progression. For me this has been due to the tiny size of the teams vs giant teams (my current team is 50 man, in startups my teams have been no more than 5 people).

Having said all of that, in bigger companies there's more opportunity to pick and choose where you specifically want to focus.

Mileage will likely vary.


Don’t be. Doing an SRE internship will give you the most comprehensive introduction to all aspects of developing and operating software. For me, when reviewing candidates, that would be a huge plus.


This is my own experience, but at my current company we’ve had quite a few people move from SRE to development. That being said, we are a pretty collaborative company, so the SREs are already working pretty closely with our dev team and aren’t totally siloed off in terms of communication and projects.


I remember Google trying to swindle me into an SRE internship when I applied for SWE. I politely declined because I wasn't spending $60000 on a degree to do ops work.


It's hilarious to read Developers and Operations guys bash each other with DevOps.

DevOps is about getting those two groups in harmony where they complement each other.

Nothing else.


You have a monolith. The monolith is deployed by a grumpy old admin using a shell script. He monitors it with another shell script. This is the old, bad way of doing things. Monoliths are bad because they are the kind of thing only neolithic cavemen would care about. So you break up the monolith into micro-services. Of course, you use (other) services to provision and monitor your new micro-services. You now have dozens of tools and scripts and configuration files. This is, naturally, the modern way of doing things. If tomorrow morning, your 500 customers suddenly become 3 million customers, you can just flip a switch and scale on demand in the cloud! Soon you'll be so successful that you can even hire an entire team to look after your new non-monolithic system.


It's soon time to invent nano- and pico-services, so everything gets to be scalable by default. That way we can finally stop bothering about first examining the problem domain at hand, and pull out all the stops at once -- looking busy like bees in the process.

Excuse me while I construct this house from micro-bricks and molecular mortar first.


I’ve always “automated stuff away” and worried about making sure whatever I developed had a sane, reproducible way of a) being tested and b) being deployable at scale, so I’ve watched DevOps/SRE unfold as I hopped between both sides of the fence (I’m a Solution Architect at Microsoft now, and spent a few decades in telcos doing the above).

I’m going to be (intentionally) critical here, but please understand that these are examples taken from a biased sample (I work in Portugal with very traditional enterprise customers and early stage startups, either of which usually lack, or don’t care about, senior talent; juniors of all stripes tend to be over-enthusiastic about jargon and bleeding edge, and the blast radius is huge on both kinds of companies).

The way I see it (in both enterprise and the local startups I deal with) is that DevOps has created as much confusion and undue ceremony as Agile in many places, all of which share a common trait: developers there do not understand (or want to deal with) software architecture and just want to hit their sprint targets without reasoning about the architectural impact of implementation details.

Enterprise devs will usually not have any real control over architecture, internal endpoints or even dev environments (all the Ops is taken away from them) and startup devs are usually rushing things, setting up faster infra to fix code problems and building up technical debt _in architecture terms_ (the Ops stuff is usually haphazard and too bleeding edge).

There are some positive exceptions, though. I’m currently helping a customer do Kubernetes from a _governance_ perspective (i.e., figuring out all the steps from dev to prod including environments, namespaces, network policy groups, etc.), and even though this is being done alongside already existing deployments, _taking the time to plan and work things out inside your org_ makes a lot of difference.

We’re not calling it DevOps, SRE, or Agile. It’s not a buzzword-laden, Valley-anointed process. It’s just (senior) engineers (Devs and Ops) talking and systematically going through what will work and won’t...


It occurs to me that DevOps is Plato's Cave. So many people think they know what DevOps is, yet all they really know is what they've always been familiar with, and they can only interpret it based on that. It's kind of depressing.


Reminds me more of the blind men and the elephant: https://en.wikipedia.org/wiki/Blind_men_and_an_elephant


A question to ponder: is devops a niche created by developers who can't sysadmin, or sysadmins who can't code?

“A human being should be able to change a diaper, plan an invasion, butcher a hog, conn a ship, design a building, write a sonnet, balance accounts, build a wall, set a bone, comfort the dying, take orders, give orders, cooperate, act alone, solve equations, analyze a new problem, pitch manure, program a computer, cook a tasty meal, fight efficiently, die gallantly. Specialization is for insects.”


> DevOps is Ops with new tools. I went to DevOpsDays thinking DevOps means developers and operations merged into one team. I was wrong.

Nope, he was right. DevOps is all and only about removing the wall between Dev and Ops. That might go as far as merging the team, but not necessarily.

However, the conference is _dead wrong_ if they convey the feeling that "DevOps is Ops with new tools". That's akin to putting lipstick on a pig, and does DevOps as an organizational tool a lot of wrong (just like Agile, etc.).


DevOps is about collaboration between devs and ops towards building software and operating it efficiently to meet business needs. Traditionally, building and operating were looked at very differently, but experience has taught us that features and stability cannot be achieved without an understanding of how features were built and what needs to be done to run those features in production efficiently. In an ideal world, the same person should be able to build and operate systems in production; however, we don't live in an ideal world - traditionally there was a big divide between devs and ops, and in complex environments you can't learn to do everything yourself. Hence... DevOps.

There are multiple ways this can be achieved. This article is a good read to understand different models of implementation of DevOps culture - https://web.devopstopologies.com/

"DevOps is Ops with new tools. I went to DevOpsDays thinking DevOps means developers and operations merged into one team. I was wrong." - This is kind of right but not completely. DevOps is about bringing ops closer to dev teams by either merging ops or retraining devs to be better at ops (there are more ways, as indicated in the above article about DevOps topologies). The idea of doing ops with new tools is that these new tools help us do ops like we are building software. Many software development best practices, abstractions, and architectures can now be applied to ops as well. This shift to doing ops with these tools, which are a lot like other software development tools, enables ops to come closer to devs and vice versa. Here is a great talk by Mitchell Hashimoto (the creator of Terraform, Packer, Vagrant, Vault, etc.) that got me into DevOps - https://www.youtube.com/watch?v=UTQQggVx4sI. This talk explains the use of tools with the backdrop of DevOps.

Other than these two things, your observation was spot on. I was not at the event but that's the story at most DevOps events.

I think software architecture will soon become an area of focus in DevOps circles, as the right architecture is also essential for achieving CI/CD, agility and DevOps.

Nevertheless, welcome to the world of DevOps. :)


The DevOps movement petered out because it solved most of its technical problems with tools that have become industry standards.

As a sysadmin, you used to have to write scripts to solve EVERYTHING. Especially since tooling for ops people in the 2000s was a load of hot garbage. A lot of tools back then had GUIs, which weren't automation friendly.

Nowadays most ops problems have some easily automatable tool, which I think gives less incentive for sysadmins to reach for a programming language.


It seems more likely to me that those "devops tools" only solve a small slice of the big sysadmin cheese, and not in a flexible way (one useful skill of sysadmins is being able to react on the spot to what's coming to them at that moment, using basic tools and logical reasoning, much like a firefighter).

Being able to differentiate between what's worth automating and what is not is another one of those useful skills. Knowing a scripting language to glue those basic tools together is too. Being proactive (assisted by monitoring software and knowledge of the underlying systems so you can tell what consequences some event has) is another...

None of that is new or surprising to sysadmins. It might sound fresh to others, who are stuck reinventing a small slice of it (badly). Also, "devops" is just another funky management fad by now.


As a lazy developer I enjoy DevOps principles. First time I don't have infrastructure issues. My code builds and deploys automagically and sys-admins can focus on important stuff.

I'd also like to describe it as just another tool in my toolbox. Created by admins for developers. I don't mind YAML at all. It's better than JSON or XML and it's a first class citizen in Go, Python, JS and other languages.


Can anybody give me a reason why containers are popular for production deployments (NOT referring to K8 use-cases)? As a development workflow they're amazing - but cheap cloud based VPSs + Ansible/Puppet work really nice. Why would you add another layer of abstraction in there when all it does is hinder performance?


Because you are deploying a single artifact with minimal dependencies that behaves in a known fashion.

Whereas if you are deploying code blobs they start depending on the operating system to provide stuff. You have to have the right version of Ruby on each machine. This becomes a problem when the right version varies between products and you would also like to update the OS to something new.

With containers the application code is decoupled from the operating system.
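To make the Ruby example concrete, here is a toy sketch in Python. Everything in it is made up for illustration (the `Artifact` type, the app names, the version strings); it only models the one idea above, that a code blob uses whatever runtime the host provides while a container image carries its own.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Artifact:
    name: str
    needs_ruby: str                     # version the app was tested against
    bundled_ruby: Optional[str] = None  # set only for container images


def runs_on(host_ruby: str, artifact: Artifact) -> bool:
    # A code blob falls back to the host's Ruby; an image uses its own.
    runtime = artifact.bundled_ruby or host_ruby
    return runtime == artifact.needs_ruby


def broken_after_os_upgrade(new_host_ruby: str, artifacts: List[Artifact]) -> List[str]:
    # Names of the artifacts that stop working once the host ships a new Ruby.
    return [a.name for a in artifacts if not runs_on(new_host_ruby, a)]


# Hypothetical products that need different Ruby versions.
blobs = [Artifact("billing", "2.5"), Artifact("search", "2.6")]
images = [Artifact(a.name, a.needs_ruby, bundled_ruby=a.needs_ruby) for a in blobs]

print(broken_after_os_upgrade("2.6", blobs))   # the blob pinned to 2.5 breaks
print(broken_after_os_upgrade("2.6", images))  # images ignore the host's Ruby
```

Real images obviously bundle far more than a language runtime, but the decoupling works the same way for every dependency.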


Yea I get that. To me this scenario gets mitigated with due diligence checking when spinning up a virtual machine - but I recognise the benefit in having a single artifact.


For the same reason that pallets and containers are popular in shipping even though they make less efficient use of space than breakbulk packing would. The ease of management outweighs the potential performance penalties.


Very good analogy.


I've been on both sides of that fence for 13 years now (currently in a really nice dev job.).

To me it seems like things are getting better. Often devs and ops are the same kind of people only on different teams, or even embedded in the teams. Nobody is hiding anything and people are working together if stuff breaks.


>> Many were curious about service-meshes (basically a smart network proxy?) such as Istio or Linkerd, but almost none were using one.

Going to blame Ubuntu's psychological priming via ads in their MOTD for this one


What ads are in Ubuntu's MOTD and what do they have to do with Istio/Linkerd service meshes?


Ubuntu's MOTD displays dynamic ads and at least one of these ads was about Istio. E.g. as best I can tell, every Ubuntu installation that hadn't disabled dynamic MOTDs displayed this Istio ad for most of August: https://bazaar.launchpad.net/~ubuntu-motd/ubuntu-motd/trunk/... .

(Of course as a Linkerd person I think it's ironic to advertise Istio on MicroK8s because Istio is anything but micro. But Microk8s has great Linkerd support these days, so maybe we'll get a Linkerd ad one day, and harmony in the universe will be restored.)


Spot on. And totally depressing. Apps as black boxes. Ugh.


This has been my lived experience with DevOps since I was first subjected to it ~2014. In a word, depressing, yes.

I've been fighting to maintain full control over a few of my team's applications, because I refuse to let the juniors and intermediates, who weren't lucky enough to gain full-stack experience, live in a world where they don't understand how these applications and services, etc., _actually_ operate in production environments. But I'm losing ground all of the time. Management, emphatically, _does not care_.


I can imagine it the other way round: "hey have you tried that new javascript framework? It only came out last week, it's pre-alpha but I already have it running in production. What? No, not a problem, I just chuck it over to ops and they keep it running".


You are exaggerating to make a point, but... what else would you prefer? That your sysadmins approve your frameworks? How detailed would you get? Do you want a formal approval cycle before installing a new npm package?

I'd prefer an ops team that trusts the dev team is competent, and trusts that if the tests pass, the devs probably made acceptable decisions.


> I'd prefer an ops team that trusts the dev team is competent, and trusts that if the tests pass, the devs probably made acceptable decisions.

You don't need anyone to trust you as long as you're responsible for deploying, monitoring and responding to failures regarding the app, meaning: being on-call for the app(s).


Have you worked in a place that has a dedicated DevOps team, but makes the devs be on-call for the apps?


I work (and have worked for nine years) at a company with a dedicated DevOps team. As a developer, I have never been on call, but I do have to be there for deployment of my code to production (which, due to the nature of the code, happens around 2:00 am my time) to ensure everything is working as expected (and only once was a roll back required).


Not OP, but all of the companies I've worked for have done this.

The most basic was two different schedules, Dev & Ops, where if something went wrong then both responded. The most successful so far was one where there were multiple on-call rotations hooked up to different slices of the alerting tree.
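For anyone wondering what "slices of the alerting tree" might look like, a rough sketch in Python follows. The dotted alert paths, the rotation names and the `rotation_for` helper are all hypothetical, not taken from any particular alerting product; the idea is just that the most specific owner of a subtree gets paged.

```python
# Hypothetical mapping from alert subtrees to on-call rotations.
ROUTES = {
    "app": "dev-oncall",
    "app.payments": "payments-dev-oncall",
    "infra": "ops-oncall",
}
DEFAULT_ROTATION = "ops-oncall"  # assumed catch-all for unowned alerts


def rotation_for(alert_path: str) -> str:
    """Walk up the dotted alert path until some rotation owns that slice."""
    parts = alert_path.split(".")
    while parts:
        key = ".".join(parts)
        if key in ROUTES:
            return ROUTES[key]
        parts.pop()  # drop the most specific segment and try the parent
    return DEFAULT_ROTATION


for path in ("app.payments.latency", "app.search.errors", "infra.disk.full"):
    print(path, "->", rotation_for(path))
```

The longest matching prefix wins, so a team can take over its own subtree without touching anyone else's routing.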


Yes.


That's the attitude that gave Docker its popularity: it doesn't matter how much crap is in the app, as soon as they've wrapped it in Docker's greasy paper it magically stinks less and it becomes Ops' problem.


Really, I always looked at it the other way around: Docker makes devs take responsibility for their dependencies, they can't say "well, it works on my machine."

Add one of the many decent Docker analysis tools and you even have the devs worrying about outdated packages and security vulns.



