A developer goes to a DevOps conference

darkr · on Sept 28, 2019

This is sadly the state of the current mainstream DevOps "movement". It is no longer anything that could be described as a movement, only as an industry, a set of tools, and a thing a few years back that CTOs in laggard companies announce to the board that they need to be doing.

It was supposed to be about efficiency and repeatability, and approaching ops with a SWE mindset; but most importantly taking ownership of your shit, end-to-end; which is something that the best and most effective SysAdmins and Software Engineers always did.

We don't hire "DevOps Engineers", rather "Platform Engineers", for which we have a hard requirement that you are approaching competence as a software engineer in at least one language, and in at least one paradigm (e.g., you should be able to tell me about type systems, data structures, polymorphism, higher order functions, composition vs inheritance, referential transparency, TDD etc).

We also expect that all of our backend software engineers deploy and maintain their own infra (as code), using the guardrails/services/systems provided by our platform team. Deploying a new database cluster for a user-facing service is a pull request from a backend engineer, not a Jira ticket for a "DevOps Engineer"

There are companies out there "doing it right", but they are in the minority.

sudosteph · on Sept 28, 2019

This was bound to happen though.

Everyone wants an engineer who can do everything competently, but few places are actually willing to compensate to recruit and keep the people who have those skills. Even worse, some of the places that can afford the right people - they have internal politics that prevent these people from being able to drive meaningful change once in the role. Good people will leave environments like that and they'll be left with people who are either under-utilized or who caused internal conflict in the first place.

Every executive loves to hear "ownership", "get rid of silos", "speed up release time", "repeatable deployments". They don't like hearing "increase salaries dramatically", "piss off some existing employees", and "hiring is going to be even harder".

So what happens instead? Middle-management types decide to train their existing teams to use tools that are associated with "DevOps", update some job titles, and tell the investors and executives they now have a DevOps team. The existing employees are now happy to learn new tools and feel appreciated, the migration to the new tools may or may not solve some old lingering pain points (while also introducing some new, un-predicted pain points that the new DevOps team will happily resolve and write RCAs about, allowing leadership to think that "dev culture" is taking hold).

The fact of the matter is, "doing it right" is very expensive and not necessarily going to save every company costs or increase their revenue in the long run. Sometimes just using better tools and adopting better processes is enough to see benefits, and that's ok. If running infrastructure is your company's core competency, then it makes sense to invest in extremely skilled people at all levels that touch infra. But those extremely skilled people are expensive, prone to turnover, and tend to be picky about where they work.

So I don't think we should shame companies and engineers for shallow adoption of DevOps tooling and imply they're subverting the DevOps "movement" or whatever. There is room for many roles under the DevOps umbrella, and just because some places aren't immediately restructuring everything doesn't mean they aren't learning or won't contribute back to the greater community at some point in the future.

yee_hawps · on Sept 29, 2019

I have been lurking HN for a couple of years, but this comment made me create an account. I am all too familiar with the problems you're stating here. It is quite frustrating, really. People like myself are hired to change the system, destroy silos - then, they (management, generally) see that we're talented at building infrastructure, or some other task that we do, and they throw us into a traditional sysadmin role, and are confused why they can't hold on to a DevOps/SRE/WhateverEngineer for very long. Then they tell their managers that they have/are doing DevOps because they had a guy build some CI/CD pipelines and build out servers, probably manually 1-by-1 because the tools for automating that aren't allowed and the "DevOps" guy doesn't have entitlements to automate it. Not that I'd know anything about that...

wpietri · on Sept 28, 2019

I'm 100% in favor of your view of DevOps, and its devolution from movement to dubious agglomeration of vendors and consultants is something I've seen before.

Like DevOps, the Agile movement started out as a bunch of smart, dedicated people seeking a new way to work. But once the excitement spread out of that passionate early-adopter group, it changed radically for the worse. I think that's because once you get to mainstream adopters, they're not interested in deep change. They want to keep doing what they're doing, but 10% better. Vendors and consultants retool to serve that market, inevitably watering things down and frequently missing the point entirely.

This really frustrated me when it happened in the Agile movement. [1] But I've come to accept that as long as our industry is structured the way it is, it's going to keep happening. It's honestly kind of depressing, but the good news is that anybody willing to build a culture of excellence and put in marginally more work can get much, much better results than their competitors.

[1] I wrote more about that here: http://williampietri.com/writing/2011/agiles-second-chasm-an...

arwhatever · on Sept 29, 2019

This linked article was excellent and I encourage anyone here to read it.

_bxg1 · on Sept 28, 2019

This is something I've been wondering about. I recently interviewed for a full-stack dev position, and I felt I did quite well on all of the development questions. It wasn't a devops role - they had a whole separate devops position - but I got passed over for another candidate and when I got the news it was suggested that I should "get some devops experience" and maybe try again in the future. I thought that was weird. I know generally what docker/containers do and what purpose things like Kubernetes serve, I'd just never used them. I figured I'd be able to pick up whatever I needed for doing minimal devops tasks in the course of the job. Is it common to expect more than that from "developers"? This was neither a small nor a foolish company.

darkr · on Sept 28, 2019

I can’t speak for how common it is, but in my opinion, it is not unreasonable to ask that developers have at least a reasonable understanding of how the systems that they build actually work, throughout at least the majority of the layers of abstraction that they run upon.

It is though quite unreasonable to expect experience with specific tools, unless you’ve asked for them on the job spec.

The title “full stack engineer” is a whole other can of worms, not a million miles away from this DevOps thread.

cmiles74 · on Sept 28, 2019

I agree with brundolf, these are tools and most people can get a handle on how they work in an afternoon or two. We aren't talking about expecting new hires to set this infrastructure up from scratch, just to use the tools that are already in place.

Aeolun · on Sept 29, 2019

> full-stack dev position

How can you be a full-stack dev if you haven’t deployed (and kept running) anything in the past few years? Unless you just kept manually rsync-ing files to a server all that time.

sah2ed · on Sept 29, 2019

You are making a subtle mixup between “I am” vs “I can”.

He didn’t write that he is a full stack dev, he wrote that he interviewed for a full stack dev role:

> “I recently interviewed for a full-stack dev position, and I felt I did quite well on all of the development questions.”

Aeolun · on Sept 29, 2019

That is a fair point. My main idea was that the questions make sense in context. The ‘you’ in my message should be read as a generic you.

wolco · on Sept 29, 2019

It's a poor excuse if they did not list those skills as required. People need a reason to reject and will use a variety of reasons that may not apply or matter for the job you are applying for, as long as it sounds good on paper.

carlsborg · on Sept 28, 2019

Werner Vogels has a recent blog post [1] entitled "Modern Applications at AWS" where he says "To succeed in using application development to increase agility and innovation speed, organizations must adopt five elements, in any order: microservices; purpose-built databases; automated software release pipelines; a serverless operational model; and automated, continuous security."

I see the devops role at large cos evolving into dev-sec-ops for the "automated, continuous security" tooling and processes. (Besides the usual tooling for delivering and supporting a reliable network service, cloud engineering, system-level issue resolution in dev and test, load and performance and containerized test automation in the pipelines, failover fire-drills, etc.)

[1] https://www.allthingsdistributed.com/2019/08/modern-applicat...

Aeolun · on Sept 29, 2019

> microservices; purpose-built databases; automated software release pipelines; a serverless operational model; and automated, continuous security.

Two of those things are absolutely irrelevant to succeed.

caramel_ · on Sept 29, 2019

I’m guessing microservices and serverless? Curious as a developer trying to get a better handle on how systems should be built/designed.

Aeolun · on Sept 29, 2019

Yes, that was my intent anyway. I’m sure different people read that differently.

mukti · on Sept 29, 2019

I'm involved in a few different areas at my company, but when I get involved in hiring DevOps types, I try to look for the person you described as a "Platform Engineer." There are far too many times where I talk to people who know what the tools are, but to them each one is just a black box that they plug things into, and they only know to use it because that's what someone at their local DevOps meetup group said was cool/useful. I want people who can help devs, not just deploy their code, or give them a system to run on. They shouldn't be afraid of learning a new language (or just looking at it), and should have competency in one already.

swtrs · on Sept 29, 2019

Funny, our company recently turned our wonderfully positioned Platform Engineers into Dev Ops Engineering concerned almost entirely with Chef.

james_s_tayler · on Sept 29, 2019

>There are companies out there "doing it right", but they are in the minority

Probably true of absolutely everything in the end.

Kiro · on Sept 29, 2019

What makes you so sure your company is "doing it right"?

madrox · on Sept 29, 2019

Over the last 10 years, I've become convinced the idealized notion of DevOps is not a thing. I have a couple reasons:

1. There is a huge sysadmin workforce. Many of them never learned to really code. Are these people supposed to go away? Are they supposed to learn to code? Coding is a skill that takes time to master, and if these people were sufficiently motivated to code, they probably would've become coders before now.

2. In general, software engineers don't think about or care about their CI/CD pipeline beyond it being easy to work with. They don't care about their infrastructure except insofar as what they need to know to make their stuff fast. There is a nice separation of concerns there that's nearly impossible to avoid. As a result, it's very easy to build higher order tools to handle most workflows and infrastructure. It's the whole reason AWS can exist and why most devs don't care about the details of ops.

I realize I'm speaking in very broad terms. However, broad terms are what define an industry. I've seen this play out enough times that I knew what this article was going to be about before I even clicked on it. I'm pretty sure most other people did, too.

mr_tristan · on Sept 29, 2019

I’m convinced it’s driven by business communication and objectives more than skill set.

I’ve _never_ worked with a “product manager” or “product owner” that could effectively tell me what a SLO was, or simply tell me what the business impact of bugs were. All I usually get is what we’ll lose if we don’t implement feature X or Y.

This impacts “DevOps” because it almost always puts focus and priority on anything feature-related. In order to do anything not feature-oriented, you literally have to be fighting massive quality problems on a daily basis. Thus... very few organizations understand how to have “devs and ops spend quality time together”. Almost all time not building features is wasted in their eyes.

In the end, this is why there’s a “dev and ops split”. The devs drive features, which PMs understand, and the ops makes things run, and the PO/PM types ignore them but sometimes like to make requests.

I think a pre-cursor to any real DevOps culture are SLOs that are understood and reported to executives. If that doesn’t exist, DevOps usually ends up being a BS term used to “remain relevant”. Without some kind of business quality objective, real DevOps time is pure cost, and counter to business development.

Just like “agile” BS, any real change usually starts at the top of the company.

Guthur · on Sept 29, 2019

I'm not convinced that it's only POs that struggle with quantifying the cost of a bug in production.

I can see the same when programming language advocates extol the benefits of more static analysis for software that does not even remotely need the level of reliability they are aiming for.

The guy who hacks it together in PHP will just get to market first and get such a critical mass of market share that 5-10% more reliability probably won't matter.

Of course a lot of this depends on market sector, self driving cars are a million miles from serving banner ads.

mr_tristan · on Sept 29, 2019

True, many POs/PMs drive a lot of prioritization of work, since (maybe unfortunately), they're usually the ones left to "understanding the business impact".

It kind of adds more credence to people not being so siloed in responsibility. We all just bring different skills to the party. Thus, we're all responsible for quality, for business impact and prioritization, etc.

Unfortunately, I've often had situations where, upon a reorganization, I'm left with managers who love to "guard others from distractions" and end up cutting team members off from broader discussions. Thus, leading to the siloing.

This is why I think "DevOps" ends up meaning nothing. It's a _lot_ of details that requires lots of communication and knowledge sharing that can be overwhelming to, ahem, certain PHB types. Sometimes you can implement a tool, other times you can just not build a particular feature, but the reasons are usually very complex and never easily get broken into a simple one-liner task definition with a story point cost.

pjmlp · on Sept 29, 2019

The best way to have POs that understand the business value of bugs is to work in traditional industries, where software just has a support role.

Every bug or downtime that gets reported does have a visible business impact.

And unnecessary features seldom get implemented if not covered by project budget.

rcarmo · on Sept 29, 2019

Whenever I go in to a new client and ask about what their SLIs are, I usually get blank stares from 90% of the audience and maybe _one_ timid reply.

burtonator · on Sept 29, 2019

> As a result, it's very easy to build higher order tools to handle most workflows and infrastructure. It's the whole reason AWS can exist and why most devs don't care about the details of ops.

I have 25 years of engineering experience. I'm now a CEO... I think the cloud infrastructure movement is really killing off a valuable skillset - actually understanding your software stack.

My company would die on AWS... our hosting costs would be 8x...

rcarmo · on Sept 29, 2019

I work on cloud infra (mostly data, but also Linux infra and networking on Azure, since I’m a Solution Architect at Microsoft).

And I agree with you in that one critical aspect, and would like to contribute my viewpoint: Every single successful move to the cloud I’ve witnessed hinged on the people who did it _really_ understanding their software stack.

For them (usually clients and not contractors or integrators doing the move for them) the key aspect of the learning curve was understanding cost drivers in cloud infra (or PaaS), weighing them against their current situation, and _measuring_ alternatives.

I’m often hassled by peers and salespeople for spending “too long talking to the techs” rather than doing pretty PowerPoint pitches, but I’m proud of the engineering work that makes those migrations viable, because a) it is truly full-stack and b) I learn at least as much as the techs I work with as we drill down into their stack and figure out how best to move it to the cloud.

Alas, this kind of depth work happens too seldom (see my other comment).

adrianN · on Sept 29, 2019

Do you have some blog posts or something about how you achieve the 8x cost reduction over AWS? That seems like a lot to me.

1337shadow · on Sept 29, 2019

I think some young people just know AWS because of its dominant marketing, and have no idea what it's like to just have some dedicated servers - which you need for your own security anyway, unless you're comfortable sharing RAM and CPU with others (I'm not https://media.ccc.de/v/33c3-8044-what_could_possibly_go_wron... )

$171.83/month of AWS EC2 (+ bw): 8 core CPU 16GB RAM 250 GB SSD https://calculator.s3.amazonaws.com/index.html

24.99€/month of dedicated: 8 core CPU 16GB DDR3 250 GB SSD Basic Unmetered 1 Gbit/sec https://www.online.net/fr/serveurs-dedies/start-2-m-ssd

$809,57/month 20 Core EC2 dedicated instance RAM? Disk? BW? https://aws.amazon.com/fr/ec2/dedicated-hosts/pricing/

234,99€ month dedicated: 2x Intel® Xeon® Silver 4114 20C/40T 128 Go DDR4 ECC 2x 1 To NVMe Premium 1 Gbit/s https://www.online.net/fr/serveurs-dedies/core-5-s

I suppose the rest to 8x is on bandwidth, which is not charged on this kind of servers.
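
To put rough numbers on the comparison above, here is a minimal Python sketch of the ratio between the quoted prices. The figures are the ones listed in this comment; the `USD_PER_EUR` exchange rate and the `cost_ratio` helper are assumptions added for illustration, and bandwidth is left out entirely.

```python
# Prices are the ones quoted in the comment; the exchange rate is an
# assumption for illustration, not a figure from the thread.
USD_PER_EUR = 1.10  # hypothetical rate


def cost_ratio(aws_usd: float, dedicated_eur: float,
               usd_per_eur: float = USD_PER_EUR) -> float:
    """Return how many times the AWS price exceeds the dedicated price."""
    dedicated_usd = dedicated_eur * usd_per_eur
    return aws_usd / dedicated_usd


# 8-core / 16 GB / 250 GB SSD pair
small = cost_ratio(171.83, 24.99)
# 20-core pair (the AWS side has unspecified RAM, disk and bandwidth)
large = cost_ratio(809.57, 234.99)

print(f"8-core box:  AWS is {small:.1f}x the dedicated price")
print(f"20-core box: AWS is {large:.1f}x the dedicated price")
```

With the assumed rate this comes out at roughly 6x for the 8-core machine and roughly 3x for the 20-core one, before any bandwidth charges. The second pair is not a like-for-like comparison, since the RAM, disk and bandwidth of the EC2 dedicated instance are not given.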

mijamo · on Sept 29, 2019

I think it depends on your scale. In most cases the real hosting cost for a dedicated bare-metal server is administrating and managing it. So if you need 100 dedicated 20-core servers it is most likely cheaper to do that yourself, but if you need just one it is much, much cheaper to use some cloud service.

1337shadow · on Sept 29, 2019

I don't think you ever "just need one", for me one is always two : staging (and ci, by extension ..) and production.

Provisioning an EC2 instance or a dedicated server has the same cost in terms of man-hours: running some automation in a deployment pipeline is something that you'll have to set up and maintain either way.

Be it updating a docker-compose file over ssh with ansible or something else: you have to do it anyway. If you want to cut that cost perhaps try with a starter project from OpenShift or Heroku, but that will also have a bigger cost even than an EC2 instance, and you will still have to take time to ensure your deployment is properly automated, with automated backups as part of the process.

chupy · on Sept 29, 2019

> I don't think you ever "just need one", for me one is always two : staging (and ci, by extension ..) and production.

It's always at least two in production if you ever need it to be HA.

1337shadow · on Sept 29, 2019

Most projects never get to the point where they need more than 99.9% availability, which is perfectly feasible with a single server. Interesting remark nonetheless.
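
For a sense of scale, an availability target translates into a downtime budget with simple arithmetic. A minimal Python sketch, assuming a 365-day year; the `allowed_downtime_hours` helper is hypothetical and not from any library.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: ignore leap years


def allowed_downtime_hours(availability: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the hours of downtime permitted by an availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_hours


for target in (0.99, 0.999, 0.9999):
    hours = allowed_downtime_hours(target)
    print(f"{target:.2%} availability allows {hours:.2f} hours of downtime per year")
```

At 99.9% that works out to about 8.76 hours of downtime per year, which is the budget a single server would have to stay within.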

erik_seaberg · on Sept 29, 2019

Prod needs n + 2 if you want to survive losing one (machine, datacenter, whatever) while you have one out for maintenance. Buying three when you need one is pretty expensive; buying seven when you need five isn't as big a deal.

Staging just needs n > 1 to test that sharding works at all.
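
The n + 2 point is easiest to see as a ratio of machines bought to machines needed. A minimal Python sketch of that arithmetic; the `overprovision_factor` helper is hypothetical and only restates the numbers in this comment.

```python
def overprovision_factor(needed: int, spares: int = 2) -> float:
    """Return machines bought per machine needed under an n + spares policy."""
    if needed < 1:
        raise ValueError("needed must be at least 1")
    return (needed + spares) / needed


for n in (1, 5, 20):
    factor = overprovision_factor(n)
    print(f"need {n:>2}, buy {n + 2:>2}: {factor:.2f}x the baseline capacity")
```

Needing one machine means buying three (3x), while needing five means buying seven (1.4x), so the relative cost of the two spares shrinks as the fleet grows.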

andreareina · on Sept 29, 2019

IMO bare compute isn't really the AWS value prop; if all you need is a VPS (or a non-virtual one) AWS is expensive overkill. The reason to use them is all the peripheral services, tied together with IAM[1][2].

[1] https://forrestbrazeal.com/2019/02/18/cloud-irregular-iam-is...

[2] https://news.ycombinator.com/item?id=19742038

dijit · on Sept 29, 2019

From a purely operational perspective; this is also a reason to avoid them.

If it's throwaway then sure, but building your critical infrastructure around a single provider is really bad business hygiene for the sole reason that you can never control for your dependency increasing costs arbitrarily.

Maybe that's fine for most people, people seem to optimise (or value?) developer time over basically anything else so maybe it's something that's already taken into account.

oxfordmale · on Sept 29, 2019

How much would you have to pay a sysadmin to install, maintain and monitor an in-house hosted server? How much would you have to pay to have daily snapshots of your instance? And what do you do with the old server if there is a change in business strategy and you either need a much bigger server, or no longer need this server at all?

evgen · on Sept 29, 2019

Are you suggesting that someone running in the cloud suddenly has no need of an ops team? This is the danger in drinking the DevOps Kool-aid, assuming that because a developer can manage to not completely fuck-up their dev box they are somehow proficient enough at system administration to manage cloud infra.

Daily snapshots? For the prices listed you could have an actual hot spare running 24x7 and still save money. This also makes surge-up easier and if you ask nicely a lot of decent dedicated hosting providers will let you shrink without holding you to contract terms.

1337shadow · on Sept 29, 2019

Absolutely correct, if anybody wants to pass an AWS DevOps certification: https://aws.amazon.com/fr/certification/certified-devops-eng... (I won't)

jjeaff · on Sept 29, 2019

The comparison above is not AWS vs an in house hosted server. It's dedicated bare metal. If something happens with the hardware on those dedicated servers, the hosting company handles it for you.

And even most dedicated servers can optionally come preinstalled with the OS.

1337shadow · on Sept 29, 2019

Absolutely, it can even install Proxmox if you want something that supports snapshots (I prefer to "just do remote secure incremental data backups")

rantwasp · on Sept 29, 2019

It's sad really. Every time I hear someone saying devops this, devops that I ask them to explain to me what devops means. You see... everyone has their own idiosyncratic definition and everyone that does not agree with them is an idiot.

to understand what devops should have been we have to step back and think about ownership. When you own the thing you build, from, you know, building it to deploying it, maintaining it and sunsetting it, that's when you can tell me you are doing devops. It's not about a super developer or a super sysadmin - it's about full ownership. In the process you'll do software development, you'll learn about the environment and the hardware the software runs on and most importantly you will learn to never ever ever again overengineer something.

to answer parent: sysadmins have to learn how to write code. if you want to stay relevant and still have a job in 5 years you have no choice. AWS & other big clouds are already eating your lunch. the best transition is to actually learn how to deploy infrastructure in the cloud - and if you're starting fresh, why not actually learn how to code? good developers know their CI/CD pipeline in depth. They care about infrastructure and details. Apart from junior developers, you really don't have the option to live in your box and not care.

lenkite · on Sept 29, 2019

Also there are many system admins who can code quick shell scripts in bash/powershell, setup build and CI pipelines, love doing performance analysis with Unix tools and know how to harden systems but have utterly no desire to learn 'industry' languages and large frameworks and spend day-after-day inside an IDE developing software products.

DevOps generally means developers doing sub-standard operations work.

Aeolun · on Sept 29, 2019

> DevOps generally means developers doing sub-standard operations work.

Mainly because ops cannot be bothered to.

Telling me something is going to take a month when it should take only a few minutes if you’d done your job right the first time means I’m going to be doing substandard operations work.

But if that’s the case, is it really sub-standard?

wokwokwok · on Sept 29, 2019

Personal experience: no, it really is substandard.

It’s easy to pretend that, like say, HR, because you don’t see the work that’s being done, no work is actually being done, and you could do a better job if you’d had the job from the beginning.

However, how often do you get to do greenfield work as a developer?

Right, now imagine you’re not even writing code, just pushing configuration around. Ops is often a constraint satisfaction problem, not an engineering problem.

Developers can come along and whip up a quick ldap/logging/monitoring/pipeline, but when it gets to the long tail of the hundreds of hours of standard practice (ie. you have to write documentation for that thing you wrote) and additional tooling (yes, you do need to write custom integration to active directory because of [corporate policy here]...) it’s a big mess, badly done, by developers who didn’t really want to in the first place, and don’t want to maintain it (and sure as hell not out of hours when it goes down).

TLDR; don’t be ridiculous. Smart people do this stuff professionally and then you come along and think you’re some hot shot who can do it in a few minutes?

Get off your high horse. You have a domain skill set that doesn’t include competence in the domain you superficially think you could easily do.

Try listening to why people say it takes a long time, and work within the constraints of the domain, or you will produce substandard outcomes.

I’ve seen it in so many places, it makes me just sigh now. As a developer (which I am), my complaint with ”Devops” is that management has been given the same expectation you just articulated.

ie. that ops is basically people who were never good enough to be programmers, and basically, despite years of training and experience, are basically incompetent and not “agile”.

...and somehow, developers, with no training in the domain, will just do a better job of it.

They don’t. In any of the places I’ve worked and seen it tried, in like, the last ten years.

Ymmv.

0x8BADF00D · on Sept 29, 2019

The disconnect is that IT/Ops really have no power in the org. Often they are subsumed by some incompetent line manager who’s clueless about the day to day challenges developers face when shipping code that needs to run somewhere. When the non-technical dumb fucks sneak into management you end up making sacrifices, because as you point out, they want it done yesterday and in five minutes. That leads to a sloppy standard of product and engineering culture, including ops culture. Which is to say take shortcuts and not think about 10 years later when someone has to janitor your manually spun up infrastructure.

js8 · on Sept 29, 2019

Sounds like you need OPSMAN, then. It's a new managerial philosophy where management is actually done by operations.

aitchnyu · on Sept 30, 2019

Could you share some article? It's hard to Google.

techslave · on Sept 29, 2019

10 years? that’s absurd. the surface has shifted under your feet in 10 years. anyone building for 10 years from now is a luxury watchmaker and not thinking about the org.

only build for the next scale factor.

mtcoope · on Oct 2, 2019

We just decommissioned applications written in 1986, last modified in 2012. They were running important production code until this year.

Aeolun · on Sept 29, 2019

> TLDR; don’t be ridiculous. Smart people do this stuff professionally and then you come along and think you’re some hot shot who can do it in a few minutes?

Basically, yes. Mainly because I do this professionally too.

Just because I haven’t been assigned to do it this time does not mean I can’t.

How about you are generous and interpret my comment as if I know what I’m talking about, then I won’t be an ass and nuance my previous statement a bit.

Something can (and probably should) take a month the first time around. Then, if you’ve actually fulfilled your promise as ‘devops’ any second time you are doing this thing it should take significantly less time. Maybe you’ve not automated all parts, but certainly the most common ones.

I think we’re now at iteration 20+, and it still takes a month for every new environment (dev, sqa, prod) of the same application.

Now it’s definitely possible there is a reasonable explanation for that, but it’s starting to get a bit suspect. Enough so that I’ll make a frustrated comment about it on HN when the subject comes up.

tjalfi · on Sept 29, 2019

Have you looked at their ticket queue?

It could be a matter of work in progress or internal bottlenecks.

[0] has a good summary of WIP after the heading titled "Why Do We Need To Visualize IT Work And Control WIP?".

TLDR - read the Phoenix Project.

[0] https://itrevolution.com/resource-guide-for-the-phoenix-proj...

viraptor · on Sept 29, 2019

> software engineers aren't thinking about or care about their CI/CD pipeline beyond it being easy to work with

Then they're not great engineers. That's fine at the beginning, but in the long run, they need to care or they'll cause issues. The separation you mentioned and the possibility of creating higher order tools exist only if the software can support that kind of environment.

Whether the software can be set up with a single file / endpoint / environment, or requires interaction with a custom GUI for initial setup, for example, determines whether CI/CD setup is trivial or a long project of its own. Unless devs work to make things easy to automate, higher order tools will not help them.

tmh88j · on Sept 29, 2019

>Then they're not great engineers. That's fine at the beginning, but in the long run, they need to care or they'll cause issues.

What kinds of issues do you think will happen? Isn't the whole point of having devops/sysadmins so software engineers don't have to spend valuable time setting up and maintaining infrastructure and tools? If the answer to that is no, then we've come full circle because software engineers maintaining infrastructure and tools are just devops.

_dw7s · on Sept 29, 2019

> There is a huge sysadmin workforce. Many of them never learned to really code. Are these people supposed to go away? Are they supposed to learn to code?

Yes, and when they do learn how to code, you get stuff like Puppet.

TheOperator · on Sept 29, 2019

>Coding is a skill that takes time to master, and if these people were sufficiently motivated to code, they probably would've become coders before now.

Personally a DevOps/SRE type role appeals to me more because

1: I really liked ops but found coding and doing NOTHING but it kind of meh. A combo is fine.

2: I REALLY like approaching ops as a software problem

3: The work itself has more variety than straight ops or dev which I think is more suited to my adhd issues.

I think there is much political inertia and a lack of code savvy workforce keeping ops going but honestly the way I see it the writing is on the wall for traditional ops. I never see any ops positions in FAANG.

It may be true most coders don't think about infrastructure but that gives more value to this niche, not less. There is still value to DIY when it comes to things like infrastructure.

0xDEFC0DE · on Sept 29, 2019

Sysadmin folks can transition into security, they don’t need to go away

goatinaboat · on Sept 29, 2019

> Sysadmin folks can transition into security

No more easily than they can become developers. There are a hell of a lot of really crappy cybersec guys around who think the job is just the access control subset of sysadmin...

Ericson2314 · on Sept 29, 2019

We use NixOS for everything and have no dedicated ops.

pdimitar · on Sept 29, 2019

Can you clarify? What makes it so easy and painless?

Ericson2314 · on Oct 2, 2019

No stupid state. What people think of as "inevitable" shit for ops to clean up is often (not claiming always!) entirely preventable shooting in the foot.

h2odragon · on Sept 28, 2019

> The most common job title seemed to be SRE (Site Reliability Engineer), although there was a long tail, and they don’t care much about job titles.

Oh I like that one. "I kept the pile of shit running and put the fires out before they were noticed" is too long to put on a business card, anyway. "Site Reliability Engineer" sounds almost tame.

StevePerkins · on Sept 28, 2019

The name and job title churn in this space over the past five years is breathtaking, almost JavaScript-ian.

I've been at my current company for 6 years now. When I first started, those guys were "Operations", and the job title was "Admin". We used Google App Engine, and they mostly clicked buttons in the GCP web console.

One year in, they re-branded as "DevOps". They were all "DevOps Engineers". They wrote a lot of hacky Python scripts, using the "gcloud" utility, to manage our new Docker-based services on Google VM's.

A year or two later, they decided to become "Platform Services". They changed their titles to "Platform Engineers". They saw us developers using Jenkins for our CI/CD pipeline, and decided to re-implement all of their Python scripts as Jenkins jobs.

Earlier this year, they became "The SRE Group". They are "Site Reliability Engineers" now. We eliminated their Python scripts, by migrating our microservices to Kubernetes and using managed databases. So now they're back to clicking buttons in the Google Cloud Platform web console again.

sevagh · on Sept 28, 2019

You speak of them with such disdain. Is that a common attitude in your company?

TurboHaskal · on Sept 28, 2019

I can understand such disdain. I am a software engineer currently working in a DevOps team (yes, team, you’ve heard it right). I’d say that 90% of DevOps engineers I’ve met don’t even know what a linked list is, and they themselves talk with disdain about developers.

So much for DevOps philosophy.

eropple · on Sept 29, 2019

I’m a developer who runs an infrastructure team. Have for years. And most developers I know don’t know why you’d log in JSON unless I was the one to explain it to them (patiently, while remembering the times they acted frustrated at my guys for not doing all the magic and just some).

So much for DevOps philosophy, indeed.
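For anyone wondering what "log in JSON" looks like in practice, here is a minimal sketch using only Python's standard `logging` module. One commonly cited reason for doing it is that a log aggregator can then filter on fields instead of regex-parsing free text. The logger name `checkout`, the `fields` convention for passing structured data, and the example values are all assumptions made up for illustration, not anything from the thread.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: callers pass structured data
        # via extra={"fields": {...}}, which lands on record.fields.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


# Write to an in-memory stream so the output can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

log = logging.getLogger("checkout")  # hypothetical service name
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False  # keep the example's output out of the root logger

log.info("payment failed", extra={"fields": {"order_id": 1234, "retries": 2}})

# Each line is machine-parseable without any custom parsing rules.
parsed = json.loads(stream.getvalue())
```

With that in place, a query such as "all records where `order_id` is 1234" becomes a field lookup on `parsed` rather than a pattern match on a message string.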

Aeolun · on Sept 29, 2019

I wouldn’t mind so much if they at least knew what terraform was.

goatinaboat · on Sept 29, 2019

> if they at least knew what terraform was

Any real, experienced DevOps engineer will reply “a steaming pile of shit”.

watermelon0 · on Sept 29, 2019

Sure, it's got its problems, but please show me a better alternative.

goatinaboat · on Sept 29, 2019

On AWS, Boto. On Azure, DSC (Powershell) and ARM templates.

You will need to maintain separate TF codebases for each one anyway, but with the native solutions you get easy access to all features of the platform and don't have to spend most of your time jumping through stupid hoops to try and pretend that one tool can do everything.

No-one would write code worrying if it was valid in both Java and C# but that is the same level as what TF claims to do. It's complete shit, ditch it and you will be 10x happier, I guarantee it.

Aeolun · on Sept 30, 2019

Sounds to me like using a single tool instead of a bunch of separate ones with different functionality is a win, but we are each entitled to our opinions.

Boto doesn’t seem comparable to terraform at all, unless you enjoy building state management yourself.

Terraform has been better than anything else I’ve tried so far.

root_axis · on Sept 29, 2019

They are worthy of disdain because they never took a CS class?

TurboHaskal · on Sept 29, 2019

No one is worthy of disdain due to unfamiliarity. I have explained why a software engineer who cares about the infrastructure might feel disdain: they read about and agree with the DevOps philosophy, switch to a DevOps role, and then find out that the vast majority of their new teammates lack comprehension of the most basic software engineering principles and don’t find value in them. It is incredibly frustrating.

Now, it is also true that most developers are just content with clicking around in their JetBrains IDE without bothering about resources or even knowing what a file descriptor or system call is. Think of those so-called senior Java engineers to whom I had to explain the basics of garbage collection. Those also make operations harder than it should be.

But that’s another story. We’re talking about the dissonance between what gets advertised as DevOps and what it truly is, which seems to be the norm in our industry and is leaving a significant amount of people unhappy and dissatisfied.

StevePerkins · on Sept 29, 2019

> You speak of them with such disdain.

Nonsense. They're smart professionals, and probably the hardest workers in the company. Certainly the ones subject to the highest degree of pressure, which they always handle with grace.

I "speak" of the absurdity of our industry's superficial hype cycles, and forever-swinging pendulums. After you've been in the game for 20+ years, you will recognize a number of cycles and pendulums yourself. You'll go along with it, because that's part of being a professional. But you'll wonder why we're collectively unable to step back, and recognize more of these things, and not pointlessly churn so much.

However, I could write a similar comment about engineering managers and SDLC methodology trends. A similar comment about testers and quality assurance. And yes, probably a dozen similar comments about language fads and architecture patterns and trends related to developers.

I don't think any of those comments would come from a place of disdain for workers themselves. I just think that culture has shifted, and anything short of unqualified gushing praise registers as offense for many younger people. Maybe that pendulum will swing again too?

Nursie · on Sept 28, 2019

Have to agree with the comment below, DevOps folks/teams I've worked with seem to have both a superior attitude and an inferior knowledge of good practice and good habits when compared to developers.

They seem to have bought into their own myths.

eropple · on Sept 29, 2019

Gee, I don’t know. When developers are constantly flinging half-tested and wholly unoperationalized things over the wall, perhaps it’s normal to develop local-maxima defensive practices.

I am a software developer and yet when I hear a developer open their mouth to complain about nearly any other aspect of a product team, be it QA or infra or UX, it has this weird tendency to resolve to Anybody But The Developer causing an issue.

Nursie · on Sept 29, 2019

Then perhaps you ought to sit with the folks I'm consulting for at the moment, who have a DevOps team who have failed to deliver a reproducible system for over two years and still have the attitude that they're the superior race.

weberc2 · on Sept 29, 2019

I’m sorry to hear that, but have you considered the possibility that the team you work with is not emblematic of the entire industry?

Nursie · on Sept 29, 2019

Indeed, that's just my current experience.

That said, they are highly paid, London financial world folks, so you'd hope they'd be good.

I also hear from a developer I work with who has recently defected back from devops style roles to devs that this is not atypical. He seems to place blame at least partially on "The Phoenix Project" for the attitude!

rat9988 · on Sept 28, 2019

I'm pretty sure SRE is a very old title at google.

DonHopkins · on Sept 28, 2019

I love the incredibly vague job title "Member, Technical Staff" I had at Sun. It could cover anything from kernel hacking to HVAC repair!

At least I had root access to my own workstation (and everybody else's in the company, thanks to the fact that NFS actually stood for No File Security).

[In the late 80's and early 90's, NFSv2 clients could change their hostname to anything they wanted before doing a mount ("hostname foobar; mount server:/foobar /mnt ; hostname original"), and that name would be sent in the mount request, and the server trusted the name the client claimed to be without checking it against the ip address, then looked it up in /etc/exports, and happily returned a file handle.

If the NFS server or any of its clients were on your local network, you could snoop file handles by putting your ethernet card into promiscuous mode.

And of course NFS servers often ran TFTP servers by default (for booting diskless clients), so you could usually read an NFS server's /etc/exports file to find out what client hostnames it allowed, then change your hostname to one of those before mounting any remote file system you wanted from the NFS server.

And yes, TFTP and NFS and this security hole you could drive the space shuttle through worked just fine over the internet, not just the local area network.]

h2odragon · on Sept 29, 2019

Considering yall ran sendmail, I had root on your workstation if i wanted it :) it was a different world then.

DonHopkins · on Sept 29, 2019

When the Morris worm went around, one of the ways it got in was through sendmail, using the "DEBUG" command. Right after it happened, some wise-ass sent around an email telling everybody to edit their sendmail binary (with Emacs of course), search for "DEBUG", and replace the "D" with a NULL, thus disabling the "DEBUG" command.

What that actually did was change the "DEBUG" command into the "" command.

At the time I was running a mailing list from the University of Maryland, and often had to check Sun email addresses by telnetting to sun.com port 25, pressing return a couple of times to flush the telnet negotiation characters, then going "EXPN some-email-address".

So the day after the Morris worm, I go "telnet sun.com 25", hit return a couple of times, then "EXPN foobar", and it dumps out a huge torrent of debugging information, because I had accidentally switched it into debug mode by entering a blank line!

I reported it to postmaster@sun.com, and they fixed it. But it's kinda silly that they would have applied such a ham fisted patch to their sendmail demon like that, based on an email that some dude on the internet sent around!

https://www.rapid7.com/db/modules/exploit/unix/smtp/morris_s...

http://www.cs.unc.edu/~jeffay/courses/nidsS05/attacks/seely-...

https://spaf.cerias.purdue.edu/tech-reps/823.pdf
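The botched sendmail patch described above is easy to model. The sketch below is a toy, not sendmail's actual code: it assumes the command name sits in the binary as a NUL-terminated C string and that lookup compares the typed line against that string. The helper names (`c_string`, `patch_first_byte`, `matches`) are made up for illustration.

```python
def c_string(buf: bytes) -> bytes:
    """Return the bytes before the first NUL, i.e. what C string functions see."""
    return buf.split(b"\x00", 1)[0]


def patch_first_byte(binary: bytes, needle: bytes) -> bytes:
    """Overwrite the first byte of the first occurrence of needle with NUL."""
    i = binary.index(needle)
    return binary[:i] + b"\x00" + binary[i + 1:]


def matches(entry: bytes, typed_line: str) -> bool:
    """Toy command lookup: compare the typed line to the table entry."""
    return c_string(entry) == typed_line.strip().upper().encode()


# Assumed layout of the command name inside the binary: text plus terminator.
original = b"DEBUG\x00"
patched = patch_first_byte(original, b"DEBUG")

debug_before = matches(original, "DEBUG")  # the command exists
debug_after = matches(patched, "DEBUG")    # typing DEBUG no longer matches
blank_after = matches(patched, "")         # but an empty line now does
```

Under those assumptions the patch does not remove the command, it renames it to the empty string, which is why pressing return on a blank line was enough to trigger it.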

chucky_z · on Sept 28, 2019

I've had to convince some folks that indeed, someone with the title of "system administrator," if they put more than one programming language on their resume, is probably "SRE," or "DevOps."

I have the job title of SRE currently and could care less what the actual title is. I've got work to do, and I'm going to accomplish it in the best way possible given whatever constraints exist at that time.

_vertigo · on Sept 28, 2019

It does matter - in my experience, the future job prospects of "System Administrator" are much worse than "SRE"/"DevOps".

Switching my title from "System Administrator" to "SRE" within my last company resulted in a job family change and a 10% raise (I had to show I could code as well as a software engineer in order to make the switch).

When I left that role, having "SRE" on my resume instead of "sysadmin" was (probably, I don't have any strong evidence for this) instrumental in getting responses when I applied to "Software Engineer" roles at selective companies.

I think there is a bias against roles that don't code, and especially roles that sound outdated.

A lot of larger companies and managers in those companies don't understand what DevOps is, what SRE means or anything like that. They just know Ops and Dev, and your previous job title is probably the strongest hint they have to work off when they categorize you. Getting lumped in with Ops is (probably) a big hit to your earnings potential and limits your future options if/when you decide to move.

weberc2 · on Sept 28, 2019

> Switching my title from "System Administrator" to "SRE" within my last company resulted in a job family change and a 10% raise (I had to show I could code as well as a software engineer in order to make the switch).

Perhaps we're in agreement, but this is critically important. When my company hires for DevOps, CloudOps, etc positions, we are inundated with applications from Ops/SysAdmin personas. We don't want people who will put out the fire and keep the system limping along--we want people that can't stand putting out fires _and who have the skill set_ to build systems that (1) aren't likely to catch fire and (2) are easy to troubleshoot/extinguish when they do. One way we're addressing the problem is to change titles from "DevOps" and "CloudOps" to "Cloud Engineer"--not sure yet how big of an impact that will have (if any at all), but it's worth a shot.

xtracto · on Sept 28, 2019

I am actively hiring for a devops role. I get hundreds of resumes from sysadmins who don't have AWS/cloud, Ansible/automation, CI/CD lifecycle, DB management or Nginx/Apache experience.

The majority are sysadmins who did point-and-click setup.

HNUser34159 · on Sept 28, 2019

You think maybe some of those folks are looking for an opportunity to LEARN some of that tech?

The "DevOps" fad is screwing over a large segment of senior level I.T professionals who are used to specializing. (Databases, Storage, OS, Security, etc.). I've also yet to see any startup that Jez Humble would actually call a DevOps shop.

Now, startups are hiring generalists with 3-5 years of "hacking" experience, or have a popular project on GitHub.

weberc2 · on Sept 28, 2019

A willingness to learn new tech is essential but not sufficient. Mostly it seems folks with a more traditional sysadmin background are looking to be Kubernetes sysadmins or AWS sysadmins, but we’re not looking for sysadmins, we’re looking for engineers. Learning the new tech isn’t sufficient—it’s not even about the tech—you need to be able to _do engineering_.

eropple · on Sept 28, 2019

This is a critical point, and it's one the post to which you replied seems to miss.

"System administrators" in the traditional sense--and I have hired many of them and consulted on the obsolescence of others--often and generally exhibit that strong get-it-working,-damn-the-consequences tendency that is in opposition to--well, none of us in this industry are engineers, but some of us aspire to engineering. Rigorous, systemic, and repeatable are the watchwords, and to that end those system administrators aren't being "screwed over"--there's a different skillset being prioritized.

HNUser34159 · on Sept 29, 2019

I think that is a gross mis-characterization because I see a ton of "get it done now, make it right later" bullshit among DevOps-y start ups.

Again, I think there's a large talent pool available, but startups who think they'll be the next FAANG act too big for their britches and actively discriminate against older tech workers who are likely experts in several pieces of the tech stack.

I routinely see these folks get passed over for younger, less experienced candidates (often for 1/2 to 3/4 the salary) who look good on paper because they wax eloquent about their pet project on GitHub, facial hair wax and kombucha.

Source: I make damn good money as a "fixer", and my primary customers are 5-30 person startups. I don't "code", and never will (useful scripts, and some automation/cloud API excepted).

I go in and practically beat the managers over the heads with the DevOps Handbook, and "engineers" with the NASA Systems Engineering Handbook. Most of my work is tearing out fucked k8s installs, and cutting AWS spending by 1/2 or more. (A few clients were billed based on how much I reduced their bill).

Have a standing job offer with one client, however it requires Azure certification pretty much immediately. Between not really using much MS stuff, and the exam focusing mostly on the Azure CLI, it might not be worth the trouble for a steady paycheck. They were nice enough to cover a training course though, so I'm willing to see where it goes.

eropple · on Sept 29, 2019

Sure, there are stupid startups that think they do "devops". What of it?

It's great that you can make that money in the role you describe. Before I decided I wanted to stop doing sales work alongside dev work, I used to make very good money as a similar fixer. On the other hand, I do code. I'm very good at it. And I've learned that fixing the situations of companies whose operators don't code is to fix, or replace, those operators. Especially those expensive operators who you're holding up over folks who understand systems as code and as managed resources.

Sneering at kombucha and facial hair wax, though? Aren't you saying you're the adult in the room here? Frankly, you sound bitter. And that sucks. As somebody who has spent his entire career doing both dev and ops and getting to the point where melding them together is natural and the teaching thereof is likewise a basic part of work, I've had to recommend the replacement of people who act like you're acting in this thread. 'Cause I'm happy to teach, and I've never met a hands-first sysadmin who couldn't do what should be done. But I've met a lot who won't, and if they don't retire first it eventually catches up to them.

HNUser34159 · on Sept 29, 2019

So you are part of the problem then. You've glommed onto a bastardized concept of "DevOps", and refuse to allow other people into your walled garden.

weberc2 · on Sept 29, 2019

There is no authoritative definition of DevOps, so there’s really no point in arguing that one of our definitions is wrong. I’m telling you that it is more valuable to treat ops as an engineering problem rather than “duct-tape it and keep it chugging along”. So yes, if you want to treat it as an engineering problem, you must _employ engineers_. These engineers can be former sysadmins so long as they know how (or can be trained in a reasonable timeframe) to do engineering.

WRT your “walled garden” quip, employment is about qualifications. No one is entitled to jobs for which they are not qualified, not even sysadmins. If the employer is hiring engineers and the sysadmin candidate can’t or won’t learn how to engineer, then they are not qualified.

exikyut · on Oct 4, 2019

Hi! I wanted to say a couple small things in followup to your recent submission about coding, but that thread has now locked and is not accepting replies (<rant>my biggest dislike of HN</rant>). Really glad you're still posting comments and I can reply to this one! My email is in my profile (click my username).

dvaun · on Sept 28, 2019

What types of businesses — perhaps the industries — do the majority of your applicants come from? I'm asking because, from my limited experience, it seems that many small and medium sized businesses that are NOT in the IT field are afraid of testing newer tech.

The justifications I've been given for this stance are:

1. Newer tech introduces new problems, and increases the scope of working knowledge

2. Adopting new practices requires the business to attempt to hire for that skill in the future

3. Present managers, whose experience stemmed from working in a sysadmin role, do not have the working knowledge and capability to understand/learn new practices

With that, do a lot of applicants seem to come with a basic/old background of just Windows (or the like) experience?

It's hard to try and get any of your mentioned requirements running in these businesses. What my friends and I've encountered is a big resistance to the command-line (Powershell or Bash), learning how Linux systems are configured, or anything that doesn't come with a large support contract.

If yourself or anyone on HN has tips or anecdotes on how to introduce changes — gradually and slowly AND given that it could help the business — I would LOVE to read them! My biggest goal is to reduce operating and capital costs for systems that are not accounted for contributing directly to increasing the company's revenue (at least when you don't control that calculation, anyway).

_vertigo · on Sept 28, 2019

Yeah, I’m not complaining about the state of the industry - it makes sense why you want people who can code! My advice is more for people in DevOps who can code: because of this inundation of sysadmin applicants who can’t code, you need to make sure that you are distinguishable from the herd so that your application doesn’t get rejected immediately based on your title. That’s why the title really matters, and also why it’s important to actually write code as a DevOps. Any role where you don’t get to code is career-limiting, IMO.

dev_dull · on Sept 28, 2019

In that case, isn’t an excellent DevOps/sre engineer indistinguishable from an excellent software engineer? Why hold any distinction at all?

sudosteph · on Sept 28, 2019

Probably because the role actually does still require someone to be able to put out fires while under stress. I've met many otherwise excellent software engineers who cannot or will not deal with high pressure oncall situations. The Operations aspect of DevOps requires a certain level of familiarity and comfort with the type of real-time communication and troubleshooting needed to deal with emergency situations. Some people are excellent with design and implementation work, but do not communicate well enough in high pressure situations to fit a DevOps/SRE role.

bahmboo · on Sept 28, 2019

Maybe so, but my experience is that the "ops" people don't want to be "devs". As other posters have said they do want to be integrated with design choices that will affect them, and many times they do write non trivial code that keeps these complex systems on their feet.

devonkim · on Sept 28, 2019

Most software engineers simply hate being on call and the software being developed can be pretty mundane unless you’re working on cloud native tooling perhaps. It is a rather narrow area of software engineering honestly, but IME software engineers passionate about their software in production are great SRE candidates and I say this not because I’m a former generalist software engineer either but have had to hire for these positions.

dilyevsky · on Sept 28, 2019

Narrow hm? In my experience as sre-swe I had to debug and write patches for kernel issues, networking issues (l3,l4 and l7), various OS issues (related to fs, cgroups, memory management), then there’s orchestration (scheduling, upgrades), safety/reliability and various configuration tooling which I had to write in Python, C++ and Go (not to mention half a dozen or so DSLs). Then there’s incident response skills for oncall.

It is much more broad than when I was an embedded dev with only one job - to make some driver work on a different architecture.

eropple · on Sept 29, 2019

If you have your infrastructure team on call instead of your developers, you are screwing up.

In almost every reasonably shaped organization the majority of bugs are shipped by developers, not infra/platform/SRE. Localize the pain to the agents who cause it or it will never go away.

devonkim · on Sept 29, 2019

Oh, not saying that’s how it should be. My current situation is such that infrastructure is the majority of the production issues and we’ll call developers on the rare occasion something serious happens relating to their code. Our platform goes through much more testing rigor than most SaaS companies our size tend to perform and I’m proud to be supporting these guys.

weberc2 · on Sept 28, 2019

This isn’t it. DevOps aren’t the (exclusive) oncall engineers, the dev teams should be responsible for oncall as well—the people empowered to create or fix the operations problems should be responsible for operations. See my sibling comment for why DevOps is different than SE.

eropple · on Sept 29, 2019

All true. At my current gig, the infra team is on call for pretty much everything. That's how it was when I started, and it's taken time to deal with stuff like alert fatigue and better surfacing of metrics and logs. But we're now in the process of moving to all first line pages going to the dev team (because they ship most of the bugs in the first place). If there's an infra problem, they can call us then.

BossingAround · on Sept 28, 2019

From my experience, SRE is a person with developer mindset (and skillset, or the desire to have a developer skillset) who doesn't mind touching the infra.

A SWE typically won't want to touch infra. That's my experience of course, YMMV.

williamDafoe · on Sept 28, 2019

SWEs who don't touch infra show up because universities today pump out a lot of book-smart real-world-dumb graduates.

limograf · on Sept 29, 2019

I don't think so. I have my areas of expertise and of course in a pinch I will pitch in and try to help any way I can, but I've always found my best work is done in a team with fairly well defined roles, a healthy respect for each others' specialisms, and an enthusiasm for short bursts of collab/pair coding and longer stretches of solo(ish) work.

I have fond memories of the team that worked across three time zones so I would get up to a set of well-described problems that I would solve in the morning (so satisfying), then a nice stretch of feature building after lunch, then a burst of pair coding with my newly arisen colleague, then maybe finishing with writing up any roadblocks or requests for the next person. I got very used to identifying blockers that were out of my area and would be better solved by the domain expert, and also a LOT better at ticket writing. It was a really, really productive and rewarding workflow and one of the key points was not getting bogged down in stuff outside my areas. We also all really appreciated each other because we all experienced each other as magical elves giving answers to hard problems in exchange for answers for easy problems! ;)

eropple · on Sept 29, 2019

The distinction is mostly because a lot of developers lack the fundamental understanding of downstack problems in 2019. It’s not that they can’t do it, it’s that they’ve rarely had to and in so not doing have built themselves a mental Jenga tower that requires time and effort to stabilize and build a foundation beneath.

Companies hate that. Investing in people who will leave is bad, they think. Put them in their box and let them do what they already know.

Which is why they hire me and call me an SRE. (I don’t use the term. My current title is “principal engineer”. I’m not an engineer, though. Neither are most people here.) And I’m not saying downstack ignorance is great. Profitable for me, sure. But it’s a natural response to companies’ unwillingness to invest in their people. They want them pre-made. Hence the made-up titles for people with breadth.

weberc2 · on Sept 28, 2019

Familiarity with different problem domains. Software engineering is essential to the DevOps skillset since we do a lot of automation, but also understanding the (constantly changing) ecosystem of tools, how to design a CI/CD pipeline, how to configure the developers’ dev environments, how to model your infrastructure as code, etc, etc. A good DevOps engineer is a good SE with an understanding of the DevOps problem space.

megaremote · on Sept 28, 2019

Because they don't want to pay software engineering wages.

williamDafoe · on Sept 28, 2019

SREs make 5-10% more than SWEs at Google.

_vertigo · on Sept 28, 2019

That’s good to know, thanks for bringing that to my attention. Is it true that SREs are more likely to have oncall rotations at Google than SWEs? That is the impression I got from reading the Google SRE book - it talks about bringing SWEs into the oncall rotation when the SREs are swamped (i.e. more than 50% of their time is devoted to operational work) as a kind of pressure release mechanism to prevent the SREs from burning out or getting mired in endless toil. It makes it sound like outside of situations like that, SREs are doing most of the oncall work.

If this is the case, perhaps the pay discrepancy is explained by the greater oncall duties and morale issues for SREs?

It’s also worth mentioning that based on my reading of the Google SRE book, very few organizations approach operations like Google does. I personally think that Google has an enlightened approach to operations, but not all companies do.

Basically, I think SRE is a good title to have, but sysadmin is not, and I encourage anyone early in their career to rebrand themselves ASAP.

ahartmetz · on Sept 29, 2019

> I personally think that Google has an enlightened approach to operations, but not all companies do

Perhaps because the founders were doing operations (well) in the beginning, so they know a few things about it and its importance. They famously built a reliable system with about the cheapest possible hardware.

extra_rice · on Sept 28, 2019

Are SWEs at Google expected to firefight especially outside of normal work hours? This is one thing I imagine SREs are expected to be doing that regular SWEs aren't. I think this is also why they require strong development background for the role.

commandersaki · on Sept 28, 2019

I worked at Amazon, which has a similar view of on-call to Google's. Usually one person is on-call for the team, on a cycle of every week or fortnight. There is also a "follow the sun" model wherein another team has your nights covered. I do recall that SWEs also had the responsibility of being on-call, since teams are usually a mix of SWEs and SREs.

Naturally a team with a high ops load was shit and made your life hell. After my time there, I vowed never to do on-call unless it was after hours and specifically for emergency response. My opinion is - if there's an issue during work hours, the leads or the entire team should be on it - and collectively fix it.

Anyway, Google has written a bit about this in their SRE book: https://landing.google.com/sre/sre-book/chapters/being-on-ca...

commandersaki · on Sept 28, 2019

Which kind? There are SRE that are SWEs and then there are plain SREs.

avip · on Sept 28, 2019

It doesn't matter now. It'll matter next time.

Change your title to DevOps, SRE, or team lead. I also don't like titles, I'm sharing my experience with you.

DonHopkins · on Sept 28, 2019

"Computer Janitor"

mieseratte · on Sept 28, 2019

Based on my last hotel stay, that would be Computer Sanitation Engineer.

DonHopkins · on Sept 28, 2019

Maybe some day Amazon will make a VR interface to AWS, like Viscera Cleanup Detail:

In Viscera Cleanup Detail, players are given the role of "Space-Station Janitors", tasked with cleaning and repairing facilities that have been the scene of bloody battles during an alien invasion or other form of disaster. Tasks include gathering and disposing of debris, including, dismembered bodies of aliens and humans, spent shell casings and broken glass, restocking of wall-mounted first-aid kits, repairing bullet holes in walls, and cleaning of blood splatter and soot marks from floors, walls and ceilings, as well as secondary bonus tasks. These include stacking items like crates and barrels in a designated stacking area, and filing disaster reports on the events and deaths that took place in a corresponding level.

https://en.wikipedia.org/wiki/Viscera_Cleanup_Detail

Writing for The A.V. Club, Chaz Evans called it a commentary on first-person shooters that focuses on the consequences of violence.

https://games.avclub.com/seeing-without-crosshairs-a-survey-...

binwiederhier · on Sept 28, 2019

This comment made my day. I laughed out loud. Well done.

Conan_Kudo · on Sept 28, 2019

It's funny because it's true! The amount of insanity is frustrating...

nurettin · on Sept 28, 2019

How do they keep a pile of shit afloat, though? Do they build log monitors and trigger restarts when shit dies?

prepend · on Sept 28, 2019

The trick is to eat the right stuff so shit floats by itself.

tuco86 · on Sept 28, 2019

Basically, yes. We use Kubernetes and it has that shit built in. If the software is too crappy you will be forced to switch it off and do it manually, tho. And when you are not doing anything else anymore you will have successfully transitioned from software engineer to SRE.
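For anything not running under an orchestrator, the "trigger restarts when it dies" idea can be sketched in a few lines. This is a hypothetical minimal supervisor, not how Kubernetes implements restarts; the `supervise` name, the `max_restarts` cap and the linear backoff are assumptions made for illustration.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, backoff_seconds=0.0):
    """Run cmd, restarting it each time it exits non-zero.

    Returns the list of exit codes observed. Gives up after
    max_restarts restarts so a crash loop cannot run forever.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        result = subprocess.run(cmd)
        exit_codes.append(result.returncode)
        if result.returncode == 0:
            break  # clean exit, nothing to restart
        if attempt < max_restarts:
            # Linear backoff between restarts (hypothetical policy).
            time.sleep(backoff_seconds * (attempt + 1))
    return exit_codes


if __name__ == "__main__":
    # A command that always fails, to exercise the restart path.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(failing, max_restarts=2))
```

The cap matters: a supervisor that restarts forever hides exactly the kind of crappy software described above instead of surfacing it.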

porpoisemonkey · on Sept 28, 2019

I recently transferred as a developer to a development operations role at a medium-sized company. This article exactly describes my experience - the main focus of the ops team seems to be on build, deployment and monitoring technologies focused on a migration towards containerization running on AWS.

Code and tooling is built on a "what works" basis and no particular attention seems to be paid to the overall design of the software or testing (likely due to a real or perceived lack of time). On-call rotations (and how to eliminate work for them) is the hot button discussion topic.

The #1 question I get asked by the developers is "Did you really want to transfer from development to ops?" which makes me think that a lot of developers look down on operations roles and see it as a demotion. I find that quite odd given that 1) ops keeps the product up and the money coming in with relatively little headcount and 2) most of the people working on our ops teams have a formal education in software engineering.

sargram01 · on Sept 28, 2019

I think it’s because ops folk cobble together scripts and tools only to scratch the current itch, rather than thinking through the whole problem and designing and writing software to solve it. Lack of testability is one of the biggest sins I’ve seen in ops: tools like Ansible encourage changing systems at run time with complex logic tied to specific deployments, for example. They don’t use IDEs, so there’s no way to jump through the YAML files, get inline help, or know which playbook is run when (it’s like a program with 100s of main()), and there’s no integrated debugger. It’s like the previous 50 years of computer science never happened and they’re starting back in the 1970s.

sudosteph · on Sept 28, 2019

A little thought exercise: your comment from the POV of an ops person explaining why they look down on devs.

Dev folks often take weeks to make even a small change. It doesn't matter how urgent the need is or how badly the business needs some kind of workaround, they over-complicate every problem, and ops has to spend more time in meetings planning their next project than it would have taken for us to put a working fix in place. Maintainability is one of the biggest sins I've seen in devs: tools like npm encourage devs to use an overly-complex chain of 3rd-party dependencies for even small projects, and they rarely ever update their dependencies after initial deployment, so if a security update to an underlying package is required, or if the app has to be deployed to a new underlying system, or the container changes, things can break easily. They depend on IDEs for everything, and can't even use commonly installed system tools like awk or sed, or use netstat to debug simple networking issues. It's like the 1970s never even happened and they never even learned what an operating system is.

hedwall · on Sept 28, 2019

Just like developers cobble together apps that are barely operable. Hard coding IP addresses, opening connections with no timeouts, and lacking basic understanding what the difference is between DNS and HTTP.
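To make the two complaints concrete, here is a minimal sketch of the alternative: the address comes from the environment rather than from a hard coded IP, and the connection attempt has an explicit timeout. The variable names `BACKEND_HOST` and `BACKEND_PORT`, the defaults and both function names are hypothetical.

```python
import os
import socket


def backend_address(env=os.environ):
    """Read host and port from the environment instead of hard coding them.

    BACKEND_HOST and BACKEND_PORT are hypothetical variable names.
    """
    host = env.get("BACKEND_HOST", "localhost")
    port = int(env.get("BACKEND_PORT", "8080"))
    return host, port


def check_reachable(host, port, timeout_seconds=2.0):
    """Return True if a TCP connection succeeds within the timeout."""
    try:
        # create_connection resolves the hostname and applies the
        # timeout to the connection attempt.
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed name lookups.
        return False
```

Passing a hostname rather than an IP also means the lookup goes through DNS, so the service can move without a code change.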

snlnspc · on Sept 30, 2019

at one point, I assumed hard coding IP addresses and paths to /home/user/whatever/ was just a web dev thing

spoilers: it is not just a web dev thing

throwaway122379 · on Sept 28, 2019

Great post! You touched on something that's been bugging me for the last few years: testing is non-existent in the devops/SRE world, so developers who hate testing and do not understand that it is core to good engineering seem to gravitate towards devops roles, where they can hack their way through their day while creating tons of tech debt.

clvx · on Sept 28, 2019

Testing is basic in devops/SRE. How are you going to ensure reliability if you cannot test the shit out of it?

The issue is there’s no formal tool or practices for testing in ops. DSLs are limited, forcing you to use several for different kinds of scenarios. In the end you need to rely on a real programming language to parse the different file formats and ensure your variables are correct. I think developing software for operations is exciting, and it is going to mature with time. Kubernetes (and its CRDs) is a step forward.
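One reading of "rely on a real programming language to ensure your variables are correct" is a plain validation function that can be exercised like any other code. A minimal sketch follows; the keys `replicas`, `image` and `port` and the rules applied to them are hypothetical, not taken from any particular tool.

```python
def validate_service_config(config):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []

    replicas = config.get("replicas")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(replicas, int) or isinstance(replicas, bool) or replicas < 1:
        problems.append("replicas must be an integer >= 1")

    image = config.get("image")
    if not isinstance(image, str) or ":" not in image:
        problems.append("image must be a string with an explicit tag")

    port = config.get("port")
    if not isinstance(port, int) or isinstance(port, bool) or not 1 <= port <= 65535:
        problems.append("port must be an integer between 1 and 65535")

    return problems
```

Returning every problem at once, rather than raising on the first, keeps the feedback loop short when a config has several mistakes.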

p_l · on Sept 29, 2019

Chef had great support for automated testing. We also generally did a lot of testing on other tools.

Sure, things like ansible lack proper testing support, but that doesn't mean that all of the profession doesn't test.

sargram01 · on Sept 28, 2019

> The issue is there’s no formal tool or practices for testing in ops.

There is though, it’s called writing normal software. Kubernetes can be a framework to build on, with actual developers who design from high level logic down through to the implementation.

empath75 · on Sept 28, 2019

> The #1 question I get asked by the developers is "Did you really want to transfer from development to ops?"

I’m not sure what job listings they’re looking at, but as an ops person (kubernetes) I’m interviewing for jobs with close to half a million a year in total comp.

porpoisemonkey · on Sept 28, 2019

I think that people perceive being on call and having to work with messy legacy infrastructure as a restriction on their personal freedom and an anti-perk. We also don't compensate well for oncall shifts so that might be part of the reason that people see it as a lesser form of work and not as a special status that's only given to the best engineers. At other organizations it may be different... I hear FAANG companies compensate their on call shifts quite well (typically a percentage of base salary banded by SLA)

kubelust · on Sept 28, 2019

Second time I'm hearing (what's for me) a mile-high comp number for kube jobs and I'm now really tempted.

Working as a data scientist - software engineer in a midsize company, I constantly battle to keep amateur ops folk and "backend" fullstack jockeys from introducing kubernetes into a saas product I mostly created by myself (makes money but there are maybe ten users per hour tops, why would I need kubernetes for that?).

My org has seen multi-day downtimes for the entire eng team workflow because the eks cluster went down and they couldn't figure out how. We have four people dedicated in the infra team for this! I'm not really an ops person but I see where the failings of these folks and stacks are, and feel like I might be able to learn to be half-decent if I put in the time. What advice would you give?

imtringued · on Sept 29, 2019

Kubernetes is a full time job. If you want to capture 90% of the benefits of containerization without wasting too much time on a complex solution then simply restrict yourself to only use docker with bash scripts and maybe a load balancer if you really want to have HA.

I've found nomad to have a lower complexity than kubernetes but if you cannot directly integrate service discovery into your application then you will need to use a service mesh which is an all or nothing thing but using something like traefik's support for consul means you will have to use regular service discovery alongside the service mesh. It's not a huge burden but there should be a better way.

auslander · on Sept 29, 2019

> why would I need kubernetes for that?

They need k8s for their otherwise bleak CV. It's all hype and shiny today.

empath75 · on Sept 29, 2019

> (makes money but there are maybe ten users per hour tops, why would I need kubernetes for that?).

The idea is that you move your low traffic app onto servers with other low traffic apps and save money. If they’re just moving your app to a kubernetes cluster by itself it’s probably not worth it.

jamestimmins · on Sept 28, 2019

Whoa where are you seeing these job listings? I don't know if I've ever come across a public listing for a technical role that paid so much.

chevman · on Sept 28, 2019

They don’t really exist. I’m mid career management and hire these types of roles. You go above $200k and you’re just wasting money. There may be a few random openings above this but nothing sustainable once folks realize the talent curve.

empath75 · on Sept 28, 2019

They don’t list the salary but the recruiters will tell you — it’s Silicon Valley companies mostly.

wikibob · on Sept 28, 2019

Can you share more? I am aware that this comp exists (just see levels.fyi) but haven’t found it for Kubernetes focused roles yet.

Are you talking to FAANG’s? Second tiers like Uber, Square, etc? Other?

empath75 · on Sept 29, 2019

Both.

jorblumesea · on Sept 28, 2019

After having done a rotation in the dev ops world, I find their lack of design and architecture really disturbing. Services are implemented without edge cases considered. Their tools are some hacked together monster using flavor of the month and some old tech stacks. Unit and system tests aren't written, or if they are, they don't test any edge cases whatsoever. It's hard to test anything locally and most testing is done in test or even prod (Seriously). Documentation is almost non-existent and the implementations of services such as Chef don't follow chef documentation or best practices. Logging is usually non-descriptive and applications written by the ops teams will fail with cryptic errors which they know by heart but make no sense.

"Oh that nil pointer exception on line 53? Yeah that means you don't have IAM permissions".

In short, the ops world lacks what most software engineers would consider basic engineering practices. It really feels like that entire world is just a hackathon project. I know there are some very talented engineers on ops teams. It's just the impression I got from 6ish month rotation.
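The error-message complaint is cheap to address in code. A minimal sketch, assuming a hypothetical `credentials` dict with a "permissions" list and a made-up `reports:read` permission name: check for the real cause up front and say so, instead of letting a null dereference surface later.

```python
class MissingPermissionError(Exception):
    """Raised when the caller lacks a permission the operation needs."""


def fetch_report(credentials, required_permission="reports:read"):
    """Return a report, failing with an actionable message if access is missing.

    `credentials` is a hypothetical dict with a "permissions" list.
    """
    granted = (credentials or {}).get("permissions", [])
    if required_permission not in granted:
        raise MissingPermissionError(
            f"missing permission {required_permission!r}; "
            f"granted permissions: {sorted(granted)}"
        )
    return {"status": "ok"}
```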

Rant over, this just touched a nerve.

stormking · on Sept 29, 2019

Strange, I'm burdened with a development team that behaves the same, except they usually don't know what the error messages that their own software produces mean, especially not "by heart".

But since the CEO has a development background, always sides with them and in general wants them to build "features", not fix their shit, they never have to take responsibility for anything.

rinchik · on Sept 28, 2019

> medium-sized company

this is mind boggling! For med size company to have a separate ops department. I can't imagine that responsibility for MY code in production (and delivery of the said code to production) lies on a different team/department that are possibly not even on the same floor/building I'm at.

Separate OPS department is where dev's happiness and job satisfaction go to die.

I'd imagine this outdated setup in some government agency but not in a med-size company.

porpoisemonkey · on Sept 28, 2019

> I can't imagine that responsibility for MY code in production (and delivery of the said code to production) lies on a different team/department that are possibly not even on the same floor/building I'm at.

I recently did some interviews with our devs to find out what features we could add to our platform that would provide them with value. The result was interesting and I found that people typically fall into one of two camps: 1) I want to know everything about my deployed service and have tools that alert me and allow me to intervene, or 2) I don't want to know anything about the deployed service runtime and expect operations to handle my issues and alert me when there's a problem.

It sounds like you might be in the former. =)

rinchik · on Sept 28, 2019

The initial intention (of DevOps) was to eliminate these discrepancies by eliminating the Ops department altogether. DevOps doesn't necessarily mean devs doing ops, but it means devs and people curious about ops sit together, as one team, as one department, right next to each other. This setup improves practically every aspect of the product development, support, and delivery, as well as collaboration, communication, and response times.

Initial intention aside, it feels that there is a general consensus that VAST majority of the companies that have `devOps` in the job description somewhere are cargo culting and have no clue what they are doing.

If you slap `Dev` prefix to your Ops department your job postings will look "trendy" but nothing else will actually change.

DevOps is a culture, not a team or department.

ek750 · on Sept 29, 2019

Sounds like how companies or teams cargo-cult “Agile” and have no idea what it is or why.

diek · on Sept 28, 2019

You may deploy your application package (however that is packaged up), but what happens when a hard drive starts to die? It may not _die_, it just may have elevated write latency. What about a RAID controller firmware having issues above a certain IOPS threshold? What about a critical kernel security patch that has to go out, and your application runs on 1,000 servers?

None of those things are related to your code directly, but may interact with it at some level. At some point, you get so far removed from the work on your actual application that it makes sense to move that to another 'Infrastructure' group.

c3534l · on Sept 28, 2019

What do you consider a medium-sized company? Or a department even? I'm having trouble imagining a department being in an entirely different building. The companies I've worked for who I considered medium-sized had hundreds of employees and if they had multiple different offices, each department had a physical presence at each office.

scruple · on Sept 28, 2019

Cargo cult doesn't care about the number of employees in a company.

cosmodisk · on Sept 28, 2019

Operations are always looked down on, nothing new here. However, it's funny when I see devops jobs with much higher salaries than the ones for dev roles. I manage an ops team in a non-tech company: while sales get much more attention, we have a better office environment, better equipment, salaries are better and there's no quartermaster using his whip on the deck...

Nursie · on Sept 28, 2019

> "Did you really want to transfer from development to ops?" which me think that a lot of developers look down on operations roles and see it as a demotion.

I would certainly see it as a departure from what I want to do - design, build and deliver systems. I'm not really in it for the continuity operations.

zbentley · on Sept 29, 2019

Then, and I don't know of a nice way to say this, the systems you design, build, and deliver are going to be unreliable and flawed.

Owning (as in caring about and as in business responsibility) the reliability and operation of systems you build, at least for a while after they stabilize, is critical if you want to produce quality products. After all, operational flakiness is a UX issue.

Nursie · on Sept 29, 2019

> Then, and I don't know of a nice way to say this, the systems you design, build, and deliver are going to be unreliable and flawed.

Sorry but that's utter bollocks. I'm not interested in being in ops, therefore my software is shit?

> Owning (as in caring about and as in business responsibility) the reliability and operation of systems you build ... is critical.

But that's not being in ops. That's about taking an interest in the running system.

You've just jumped on me saying I'm not interested in having a role in ops because I like to build software, and run off to some weird unsupported conclusion that I just write code and abandon it. I don't need to have a business responsibility for the running system in order to support it and be responsive to the ongoing needs of those that do.

Conan_Kudo · on Sept 28, 2019

> 2) most of the people working on our ops teams have a formal education in software engineering.

You are so lucky. This is incredibly rare. I'm also lucky that my team is the same way. But most teams are not.

mlLK · on Sept 29, 2019

Hey dude was wondering how this comment made you feel:

> Cloud monitoring is a saturated market.

It's like yeah, it's saturated but it's saturated because the infrastructure we're hosting it on is still and forever changing. Take some serverless CloudFormation in AWS; there was no good solution for application monitoring until someone specifically started solving for it because no one in their right mind was going to use CloudWatch and none of the other existing monitoring solutions/tools could fit the bill either unless they started from scratch and solved for that specific new infrastructure.

The Cloud monitoring market might seem saturated but that's because there is no "silver bullet" solution given how much infrastructure has been and continues to change.

eropple · on Sept 29, 2019

CloudWatch works fine and Epsagon's sales tactics are dishonest and shitty in addition to being spammy--I'm still waiting to hear back from a "Cassie" who I don't think actually exists as to where they sourced my email from for their cold-email marketing blasts.

I'll never do business with a company that gross.

jancsika · on Sept 28, 2019

> DevOps means the veteran admins had to check in their personal scripts

Oh my, this is an epiphany.

I don't even know what DevOps means. But if the implication was that the "veteran admin" was making arbitrary state changes and is now forced by DevOps to document it in the commit history, I am firmly in favor of whatever DevOps is.

wodenokoto · on Sept 28, 2019

My understanding is that devops are the veteran admin, only now they check-in their code.

Nursie · on Sept 28, 2019

What it was supposed to be -

Software development, deployment and ops come under the same role. The DevOps engineer controls the horizontal and the vertical, the code, the environment, the build, the deployment, and as such is a multi-talented unicorn.

What it seems to have become - Sysadmins writing scripts around terraform, and formalising their work to the extent that it is at least usually reproducible.

orthoxerox · on Sept 28, 2019

> a multi-talented unicorn.

That's not necessary. Natural curiosity and willingness to treat dev/ops (whichever is not your team) as us, not them, is sufficient.

Nursie · on Sept 28, 2019

I'm not sure I've seen that be any more prevalent now than it was when we had developers and sysadmins.

To me it's become like 'agile': an interesting set of viewpoints and philosophies totally ruined by the industry that's grown around it.

orthoxerox · on Sept 29, 2019

Well, at least the DevOps industry brings actual value to the process (well, not when you are sold OpenStack, OpenShift, Vault and whatever else to run a single container image).

ljm · on Sept 28, 2019

> Everyone hates YAML. Everyone writes a lot of YAML.

I don’t know a single engineer (ops or not) who enjoys writing YAML, yet it is utterly unavoidable.

I’ve lost count of the amount of bugs and broken deploys that have happened because of YAML, or because of a type error caused by it.

cbanek · on Sept 28, 2019

What people really hate is having to write complicated configuration. No matter what format you put it in, it is still complicated configuration. It's hard if not impossible to test, there's only one right way to do it, and it is likely interconnected with other complicated configuration.

Whatever format it happens to be written in (now YAML is the apparently trendy way) it is guilty by association.

DonHopkins · on Sept 28, 2019

Pure YAML will never DRY.

https://en.wikipedia.org/wiki/Don%27t_repeat_yourself

If you write procedural configurations with a real Turing Complete, but not Turing-Tarpit language like Python or JavaScript, then you don't need to repeat yourself ridiculous numbers of times, and manage thousands of lines of hand written eye-sore almost-but-not-quite-entirely-unlike-tea YAML. Plus you can implement validation, warnings and error checking, and support multiple input and output formats.

But so many DevOps dogmatically go out of their way to avoid writing any code, even when the alternative means writing a hell of a lot more brittle unmaintainable YAML where you have to carefully check every line for correctness, and make sure that you're actually repeating the same thing in every location required without any typos, and don't let any of those repetitions slip through the cracks when you make changes.

With real Turing complete code, macros, and templates, you can accomplish the same task as with pure un-DRY YAML data, much more easily and efficiently, with orders of magnitude fewer lines of code, that you can actually understand, maintain, validate, and test, so you can be sure it actually works without meticulously checking every line of output by hand.

It's better to combine DRY combinations of different non-Turing-complete data formats like CSV, YAML, JSON, INI, XML, using the most appropriate formats depending on the type of data. Spreadsheets are much better than JSON or YAML or XML for many different kinds of data with repeating structure (so you don't repeat key names), and JSON and YAML are much better for irregular tree-shaped data (since you're not restricted to rigid structures), and XML for tree structured data and documents including text.

You end up needing different formats as input as well as output. So you need a real Turing Complete language to read them all in, combine and validate them, and shit out the various other formats required by your tools and environment.
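As a rough illustration of that approach, the sketch below keeps shared defaults in one place, writes down only the per-service differences, validates them, and emits JSON; other output formats could be produced the same way. The service names, keys and the `render` helper are all hypothetical.

```python
import json

DEFAULTS = {"replicas": 2, "memory_mb": 256, "log_level": "info"}

# Only the differences from DEFAULTS are written down per service.
OVERRIDES = {
    "api": {"replicas": 4},
    "worker": {"memory_mb": 1024},
    "cron": {"replicas": 1, "log_level": "debug"},
}


def render(defaults, overrides):
    """Expand per-service overrides into full configs, validating as we go."""
    rendered = {}
    for name, override in overrides.items():
        # A misspelled key is caught here instead of being silently ignored.
        unknown = set(override) - set(defaults)
        if unknown:
            raise ValueError(f"{name}: unknown keys {sorted(unknown)}")
        rendered[name] = {**defaults, **override}
    return rendered


if __name__ == "__main__":
    print(json.dumps(render(DEFAULTS, OVERRIDES), indent=2, sort_keys=True))
```

Changing a default is now a one-line edit, and a typo such as `replcas` fails loudly at generation time rather than at deploy time.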

linuxftw · on Sept 28, 2019

Almost every piece of software running on Linux or Unix requires some kind of basic configuration file. Whether it's INI or YAML. You can't just 'code all the things'.

If you want an example of a dynamic configuration, look at RPM's .spec. That's a monstrosity. What you're asking for is more of that, and that's insane.

You could also use something like Python to do all your typing, build a dictionary, and just dump it to a yaml file if you think writing yaml by hand is too error prone (which I personally disagree with).

DonHopkins · on Sept 28, 2019

I think RPM .spec files are sunk way deep into the "Turing Tarpit" area I was referring to.

https://en.wikipedia.org/wiki/Turing_tarpit

Any time you take a language like bash that's ALREADY deeply muddled in the Turing Tarpit, and then try to make up for its weaknesses by wrapping it up in yet another arbitrary syntax that you just pulled out of your ass like .spec, you're even worse off than you were when you started. Why pick a terrible language like bash that's no good for reading and writing and manipulating structured data formats like CSV, JSON, YAML, INI, or XML, and then try to "fix" it, when there are so many much better well supported off-the-shelf alternatives that don't require re-inventing the square wheel, like Python, JavaScript, or Lua?

linuxftw · on Sept 29, 2019

I'm sure when RPM first came about, it was probably less of a monstrosity. But like most things in the software world, people start bolting on new features, and you have a big mess.

For most programs though, I would prefer a YAML config file. It's easy to serialize/deserialize for many languages, and you can adjust your init scripts / systemd units to spit out a new config on startup if you so choose. Or you can use something like ansible and some templates to generate that config once when you deploy your application (we're all using immutable infrastructure now, right?), although trying to template YAML files in jinja2 is a real PITA; I'd probably just write an application specific ansible module to dump out my config and skip the yaml jinja template part.

That's the really nice thing about ansible, you can make it do all sorts of interesting stuff.

platz · on Sept 29, 2019

https://dhall-lang.org/ No tarpit here (no Turing Completeness or recursion), and as DRY as you want.

cbanek · on Sept 28, 2019

> But so many DevOps go out of their way to avoid writing code

Well, this is the real problem, is that DevOps people should be writing code in my opinion, especially code to automate deployments and handle configuration. But many times it's just a new job label for people with the same ol' ops and sysadmin skillset who don't want to write code.

It doesn't help that when they make unmaintainable piles of configuration that nobody understands, it typically adds to their job security.

DonHopkins · on Sept 28, 2019

I totally agree!

There are good DevOps and bad DevOps. Personally, I'm a Dev who necessarily knows how to do Ops, because nobody else is there to put out the fires and wipe my ass for me. Good DevOps should not have such disdain for writing code, and should not be so territorial and focused on job security, and should work more closely with developers and code.

And good developers should understand operations, and shouldn't be so helpless and ignorant when it comes to deploying and maintaining systems themselves, so they can design their code to be easily configurable and dovetail into operational systems.

For the same reasons, it's also important for programmers developing tools and content pipelines for use by artists to understand how the art tools and artists work, and how to use them (enough to create placeholder programmer art to test functionality), even if they don't have any artistic ability.

And for artists and designers to have some understanding of how computers and programming works, and how to use spreadsheets and outliners and databases to understand and create specifications and configurations, so they don't design things that are impossible or extremely inefficient to implement, and make intractable demands of computers and programmers.

https://en.wikipedia.org/wiki/Programmer_art

cbanek · on Sept 28, 2019

I'm with you on that. I come from a background of Dev and have been called DevOps (by others), although I just call myself a problem solver.

The realization that I really needed to understand what happens in operations for me came around '09, when the Xbox Operations Center called me and told me my code wasn't working, and we had such a wall between us that I couldn't see what was going on, and they couldn't describe it either.

I ended up writing automated publishing pipelines for them, taking the riskiest parts of their dozens-of-pages Word doc and writing tools to do those steps automatically. Most people didn't even think this was a thing that could be done, let alone should be done. Problem solved!

I think people who are territorial are inherently insecure in their skills and therefore fear getting out of their comfort zone. Generalists are far better than specialists in my opinion. You want someone to go where the problems are, rather than people who invent new problems for others in their own little empire. I think a lot of big companies are so big they can have people silo'd all day, so people don't even think about the people and systems they are affecting.

ken · on Sept 28, 2019

I've used a lot of JSON. It's OK. No comments, not many data types. But the spec is only a couple pages, and I've never been in doubt about how something should be escaped, or parsed. I could probably write a bare-bones parser in an afternoon, if I needed to.

I've tried to work with YAML a few times. The tree structure and extra data types are great. Everything else is a huge pain. There's at least 3 versions of the spec, and the latest one is nearly 100 pages. The parts I need are always in some "extension", so there's even more that I need to support. It has a system for serializing native Objects, so you have to be careful with untrusted data because there are some interesting security issues. It's so complex, I have trouble knowing what to quote, or how. It's not feasible to write your own parser in any reasonable amount of time. Worst of all, every parsing library is slightly different, so (not unlike SOAP) you kind of have to know that it's going to be parsed with (say) PyYAML.

Complicated configuration is indeed a problem in any format, but YAML makes even simple things complex. From the beginning, I really wanted to like YAML. Unfortunately, I think their goals (human-readable text, language-agnostic, rich data types, efficient, extensible, easy to implement, easy to use) are impossible. You simply can't achieve all of them at once.

davnicwil · on Sept 28, 2019

I'm launching a CI service [0] which instead of using YAML configs to run builds on a third party platform, will let you run the builds yourself on your own machines so you can just use a script or whatever you want to do your builds/deploys.

I share your frustration and was motivated by it to build this. Why should I spend ages writing up everything as config files when I have a script that already works, is easy to change and debug, and can handle any custom thing I need?

I think config files to describe devops processes are a good approach for huge companies with huge teams, lots of churn etc. The approach perhaps has simplicity & stability benefits - works for everyone everywhere without understanding any detail, changes are a bit easier to track, etc. But for small teams wanting control, speed and the flexibility of just writing code to do what you want it can often be an inefficient approach. At least in my experience.

You should check Box CI out. Launching very soon!

[0] https://boxci.dev

raverbashing · on Sept 28, 2019

True, though YAML is the most "human readable/writable" of the usual suspects (YAML/JSON/XML)

DonHopkins · on Sept 28, 2019

I'd say that spreadsheets are vastly more readable/writable/editable/maintainable than YAML or JSON or XML (i.e. no punctuation and quoting nightmares), and they're easy to learn and use, so orders of magnitude more people know how to use them proficiently, plus tools to edit spreadsheets are free and widely available (i.e. Google Sheets), and they support real time multi user collaboration, version control, commenting, formatting, formulas, scripting, import/export, etc. They're much more compact for repetitive data, but they can also handle unstructured and tree structured data, too.

To illustrate that, here's something I developed and wrote about a while ago, and have used regularly with great success to collaborate with non-technical people who are comfortable with spreadsheets (but whose heads would explode if I asked them to read or write JSON, YAML or XML):

Representing and Editing JSON with Spreadsheets

I’ve been developing a convenient way of representing and editing JSON in spreadsheets, that I’m very happy with, and would love to share!

https://medium.com/@donhopkins/representing-and-editing-json...

Here is the question I’m trying to answer:

How can you conveniently and compactly represent, view and edit JSON in spreadsheets, using the grid instead of so much punctuation?

My goal is to be able to easily edit JSON data in any spreadsheet, conveniently copy and paste grids of JSON around as TSV files (the format that Google Sheets puts on your clipboard), and efficiently export and import those spreadsheets as JSON.

So I’ve come up with a simple format and convenient conventions for representing and editing JSON in spreadsheets, without any sigils, tabs, quoting, escaping or trailing comma problems, but with comments, rich formatting, formulas, and leveraging the full power of the spreadsheet.

It’s especially powerful with Google Sheets, since it can run JavaScript code to export, import and validate JSON, provide colorized syntax highlighting, error feedback, interactive wizard dialogs, and integrations with other services. Then other apps and services can easily retrieve those live spreadsheets as TSV files, which are super-easy to parse into 2D arrays of strings to convert to JSON.
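The last step, parsing TSV into a 2D array of strings, really is short. The sketch below is not the spreadsheet format from the article; it assumes a much simpler hypothetical convention (the first row holds column names, each later row is one object, all values stay strings, and no cell contains a tab or newline) just to show the shape of the pipeline.

```python
import json


def parse_tsv(text):
    """Split TSV text into a 2D list of strings (rows of cells)."""
    return [line.split("\t") for line in text.splitlines() if line]


def rows_to_objects(grid):
    """Treat the first row as keys and each later row as one object."""
    header, *rows = grid
    return [dict(zip(header, row)) for row in rows]


if __name__ == "__main__":
    tsv = "name\tport\napi\t8080\nworker\t9090\n"
    print(json.dumps(rows_to_objects(parse_tsv(tsv))))
```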

tlarkworthy · on Sept 28, 2019

That is super cool, please don't overcomplicate it with utility features. I have been considering a project to manage a Kubernetes cluster via a Google spreadsheet. Google Docs has great features relating to user authentication and permissions. The project would need to visualize the JSON state representation for the k8s cluster... your project is ideal.

e.g. calling another google service with the JSON using a token minted BY THE USER CURRENTLY USING THE SHEET

DonHopkins · on Sept 29, 2019

Thanks for the encouragement! I agree, I'd like to keep it from becoming complicated. My hope is to keep it simple and refine it into a clean well defined core syntax that's easy to implement in any language, with an optional extension mechanism (for defining new types and layouts), without falling into the trap of markdown's or yaml's almost-the-same-but-slightly-different dialects. (I wrote more about that at the end of the article, if you made it that far.)

The spreadsheet itself brings a lot of power to the table. (Pun not intended!)

There are some cool things you can do using spreadsheet expressions, like make random values that change every time you download the CSV sheet, which is great for testing. But expressions have their limitations: they can't add new rows and columns and structures, for example. However, named ranges are useful for pointing to data elsewhere in other sheets, and you can easily change their number of rows and columns.

For convenience and expressivity, I've defined ways of including other named sheets and named ranges by reference, and 2d arrays of uniformly typed values, and also define compact tables of identical nested JSON object/array structures by using declarative headers (one object per row, which I described in the article, but it's not so simple, and needs more examples and documentation).

tlarkworthy · on Sept 29, 2019

Yeah, my eyebrows are fairly raised at the thought of embedding a templating language in it. For production use of a spreadsheet, I imagine pulling the source code out of the spreadsheet using https://github.com/google/clasp and synchronising with a repository using Terraform.

At which point Terraform has a weak templating engine already, but it's generally enough for building reusable infra. Additional features can be provided within the spreadsheet using reusable libraries. One pain point with embedding functional data processing in a spreadsheet for JSON data is the lack of a decent way of writing tree expressions, for which I would turn to the de facto JSON tooling jq for inspiration.

If you want to take this further, I am up for building some infra for continuously deploying spreadsheets through Terraform. tom <dot> larkworthy <at> futurice.com

But I would not embed stuff inline with the JSON. I would have a pure sheet dedicated to stuff going in, and a compute sheet for stuff going out. And the definition for stuff going out should basically be a jq expression that can "shell out" to Sheets expressions https://github.com/sloanlance/jq/issues/1
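For a rough idea of what a jq-style tree expression over the "stuff going in" could look like, here is a minimal sketch that evaluates a tiny subset of jq path syntax (`.key` and `[index]` only). It is not jq and does not call jq; `get_path` and the sample `state` are hypothetical.

```python
import re

# One path step: either ".key" or "[index]".
_TOKEN = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def get_path(data, path):
    """Evaluate a tiny subset of jq path syntax, e.g. '.items[0].name'."""
    pos = 0
    while pos < len(path):
        m = _TOKEN.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported path syntax at {path[pos:]!r}")
        key, index = m.groups()
        data = data[key] if key is not None else data[int(index)]
        pos = m.end()
    return data


# Hypothetical cluster state, as it might be exported from the input sheet.
state = {"items": [{"name": "web", "replicas": 3}]}
replicas = get_path(state, ".items[0].replicas")
print(replicas)
```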

xpe · on Sept 29, 2019

TOML is a worthy contender. It is my favorite simple but powerful-enough data language.

Here is a side-by-side comparison of data in TOML versus YAML: https://gist.github.com/oconnor663/9aeb4ed56394cb013a20

And some comments that resonate with me:

  The yaml spec is overly complex and parsing it properly
  is a nightmare. I rather prefer TOML because of it's
  simplicity. Unless one really need the gazillion extra
  features which yaml provides (which one probably doesn't),
  I'd say sticking with TOML seems to be the saner choice.

  I've recently kind of changed my mind on unquoted strings.
  They're nice when you're editing config files by hand, but
  they run into parsing issues in simple cases like when the
  string looks like an int, or of course when the string
  contains quotation marks itself.

IshKebab · on Sept 29, 2019

I disagree. Yes, it is human readable if you just want to read the words (like you would with Markdown), but with a configuration file you want to understand the structure. YAML makes that quite confusing IMO. It seems like a random array of dashes and indentation.

JSON is much more human readable in that respect because the structure is explicit so there's no ambiguity. I'd say TOML is somewhere in-between. But both are vastly preferable to YAML because they don't have Javascript-style type insanity.

Kwpolska · on Sept 28, 2019

I’ve yet to see an editor that works well with YAML, yet JSON and XML are simple. Add comments and trailing commas to JSON and it would be perfect to write. And XML isn’t that evil, really.

FlorianRappl · on Sept 28, 2019

To this day I am baffled: why choose YAML? Personally, I think it's more error prone, less flexible, and harder to read than JSON. Not to say JSON is a perfect format, but it sure feels better than YAML.

charrondev · on Sept 28, 2019

Reason number 1: aliases and anchors. Reason number 2: allowing comments.

JSON is a terrible config format, and an ok data interchange format.

wpietri · on Sept 28, 2019

Agreed. It amazes me that we got out of the XML-for-everything era with a lot of people thinking the problem was XML, and not the for-everything part. JSON-for-everything is just as maddening to me.

probably_wrong · on Sept 28, 2019

Comments were the reason for me to move from JSON to YAML in config files. I remember perfectly fine what every option does, but adding comments to each option is a must when it comes to sharing my code with anyone else.

FlorianRappl · on Sept 30, 2019

If it's a config file I'm not sure why you'll need comments. Comments / documentation should be in the system you'll do the config for. If your set options are that quirky / need documentation; why not explain them in a README?

But maybe I misunderstood. Can you give an example of a comment that makes sense / is required in a configuration JSON?

probably_wrong · on Oct 1, 2019

> If your set options are that quirky / need documentation; why not explain them in a README?

Mostly because I don't know whether the person customizing my code will read the README, but they'll definitely see the comments I wrote right before the configuration option itself.

With comments and some extra whitespace, they can go through the file line by line, read what a specific option does, configure it properly, and move on to the next one. No back and forth to the README required.

orthoxerox · on Sept 28, 2019

JSON's finicky about commas. Quotation marks everywhere are visual noise. Comments are accepted by many parsers, but not by all of them.
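As a quick illustration of how strict a standards-following parser can be, here is a minimal sketch with Python's built-in `json` module, which rejects both a trailing comma and a `//` comment. `try_parse` is a hypothetical helper written for this example.

```python
import json


def try_parse(text):
    """Return the parsed value, or None if the json module rejects the text."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


strict = try_parse('{"port": 8080, "debug": false}')
trailing_comma = try_parse('{"port": 8080, "debug": false,}')
with_comment = try_parse('{"port": 8080 // default port\n}')
print(strict, trailing_comma, with_comment)
```

Only the first document parses; the other two come back as `None`.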

Something like http://www.relaxedjson.org/ is the JSON we need. An implicit root object would make it almost perfect for writing configuration.

weberc2 · on Sept 28, 2019

Yeah, what I want is JSON with comments and nice multi-line string support. I don't like how much syntactic magic YAML does (I don't need or want the country code for Norway to be parsed as a boolean False value). I still don't know what the exclamation point does (e.g., !Ref).
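The Norway case comes from implicit typing of unquoted scalars. Here is a minimal sketch of that behavior, assuming the boolean spellings listed in the YAML 1.1 type repository; `resolve_plain_scalar` is a hypothetical helper for illustration, not any parser's API, and real parsers differ in which spellings they honor.

```python
import re

# Plain (unquoted) scalars that the YAML 1.1 type repository lists as booleans.
YAML11_TRUE = re.compile(r"^(?:y|Y|yes|Yes|YES|true|True|TRUE|on|On|ON)$")
YAML11_FALSE = re.compile(r"^(?:n|N|no|No|NO|false|False|FALSE|off|Off|OFF)$")


def resolve_plain_scalar(text):
    """Simplified illustration of implicit typing; not a YAML parser."""
    if YAML11_TRUE.match(text):
        return True
    if YAML11_FALSE.match(text):
        return False
    if re.fullmatch(r"[-+]?[0-9]+", text):
        return int(text)  # strings that merely look like ints stop being strings
    return text


countries = [resolve_plain_scalar(code) for code in ["SE", "DK", "NO"]]
print(countries)
```

Sweden and Denmark stay strings while Norway becomes a boolean, which is why the usual workaround is to quote the value.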

And clearly YAML is the wrong tool for infra-as-code since CloudFormation has had to build a macro system, conditionality, referencing, and a couple of different systems for defining and calling functions (templates being one and its intrinsic functions being another). We also see tools like Troposphere and CDK, which are effectively different ways to generate CloudFormation YAML via programming languages (or, more precisely, programming languages that were designed for humans).

And it's not just limitations inherent to CloudFormation--Helm has long had templates for generating YAML, but those also weren't sufficiently powerful/expressive/ergonomic so Helm3 is supporting Lua as well. And as I understand it, Terraform is constantly adding more and more powerful features into HCL.

So what's the solution? It's pretty simple--we should keep the YAML around, but it should be the intermediate representation (IR), not the human interface. The human interface should be something like a functional language[^1] (or an imperative language that is written in a functional style) that evaluates to that YAML IR layer. The IR is then passed to something like Kubernetes or Terraform or CloudFormation which understand it, but it's not the human interface.
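A minimal sketch of that split, written in Python in a functional style: plain functions return data, and only the last step serializes it. JSON stands in for the YAML IR here because the standard library has no YAML writer. The `container` and `deployment` helpers, the image names, and the field layout (modeled loosely on a Kubernetes Deployment) are illustrative assumptions.

```python
import json


def container(name, image, port):
    """Build one container entry as plain data."""
    return {"name": name, "image": image, "ports": [{"containerPort": port}]}


def deployment(name, image, replicas=1, port=8080):
    """Hypothetical helper: returns a Kubernetes-style Deployment as a dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container(name, image, port)]},
            },
        },
    }


# One definition, evaluated with different arguments, instead of copy-pasted documents.
manifests = [deployment("api", "example/api:1.0", replicas=n) for n in (1, 3)]
ir = json.dumps(manifests, indent=2, sort_keys=True)
print(ir)
```

The humans edit `deployment(...)` calls; the tool downstream only ever sees the serialized `ir`.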

As for the high-level language, something like [Starlark][0] would work well. It's purpose-built for being an evaluated configuration language. However, I would argue that a static type system (at least an optional static type system) is important--it's easy enough to imagine someone extending Starlark with type annotations and building a static type checker (which is much easier for Starlark since it's a subset of Python which is intended to be amenable to static analysis).
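To show what optional static typing buys in an evaluated config language, here is a sketch using Python type annotations as a stand-in for a hypothetical typed Starlark. The `Service` type and its fields are made up for the example. Python does not enforce the annotations at runtime; the point is that a separate checker such as mypy can flag something like `replicas="three"` before anything is evaluated or deployed.

```python
from dataclasses import asdict, dataclass, field, replace
from typing import List


@dataclass(frozen=True)
class Service:
    """Hypothetical config type; the annotations are what a checker reads."""

    name: str
    image: str
    replicas: int = 1
    ports: List[int] = field(default_factory=list)


def scaled(service: Service, replicas: int) -> Service:
    """Return a copy of the service with a different replica count."""
    return replace(service, replicas=replicas)


web = Service(name="web", image="example/web:1.0", ports=[80])
prod = scaled(web, 3)
config = asdict(prod)  # plain data, ready to serialize as the IR
print(config)
```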

This, I think, is the proper direction for infrastructure-as-code tooling.

[^1]: Functional in that it is declarative instead of imperative--not necessarily that the syntax should be as hard to read as OCaml or Haskell. Also, while YAML is also declarative, it doesn't have a notion of evaluation or variables.

[0]: https://docs.bazel.build/versions/master/skylark/language.ht...

sedachv · on Sept 28, 2019

I do not understand the need for all of these different new language implementations and data formats. GuixSD vs NixOS already showed that Scheme is a superior solution as a configuration language, scripting language, template language, and intermediate representation. A single language that has 30+ years of successful production use, tons of books and documentation. Why re-invent the wheel in four different, incompatible ways?

weberc2 · on Sept 28, 2019

Is NixOS a scheme? Anyway, we moved away from Nix because of all its problems (the language being only a medium-sized one). Also, I detest CMake, but by your own standard (longevity, popularity), it is better than Nix or Guix. Frankly those tools haven’t shown themselves to be “superior” in any meaningful way. Yes, they have been around for a while, but having been around for a long time and not enjoying any significant adoption is not very compelling.