Automation is the serialization of understanding (changelog.com)
173 points by zdw on April 20, 2022 | 23 comments



To paraphrase the maxim, working automated systems evolve from working manual systems. But only some manual systems work.

I start CI/CD by doing the whole process manually. For example, type the commands to build a Docker image, or spawn a VM, or obtain a secret. I encode all this in pseudocode, then put it in Bash (or Python). When a conditional branch appears (a different environment with different credentials), I treat it like any other code. Separate the bits that stay the same, and inject the bits that change.
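
A minimal sketch of that shape in Bash, with hypothetical image and variable names (the registry, app name, and tag logic are assumptions for illustration, not from the comment):

    #!/usr/bin/env bash
    set -euo pipefail

    # The bits that change are injected from the environment (per-environment credentials, tags).
    : "${ENVIRONMENT:?set ENVIRONMENT, e.g. staging or prod}"
    : "${REGISTRY:?set REGISTRY, e.g. registry.example.com}"
    IMAGE_TAG="${IMAGE_TAG:-$(git rev-parse --short HEAD)}"

    # The bits that stay the same are the plain, readable steps you first ran by hand.
    docker build -t "${REGISTRY}/myapp:${IMAGE_TAG}" .
    docker push "${REGISTRY}/myapp:${IMAGE_TAG}"
    echo "Built and pushed myapp:${IMAGE_TAG} for ${ENVIRONMENT}"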

The problem with most CI/CD systems is that people tightly couple themselves to the tool without really understanding it - the point the article is making. They over-complicate the solution because the documentation encourages them to. When they want to customise, debug, or even migrate away from the tool, it's very difficult.


People vastly underestimate manual processes.

A properly optimised, trimmed, practiced manual process can be executed much more cheaply than most people think.

What's the point? Flexibility. Humans are amazing at adapting to circumstances that invalidate portions of the process. In a fast-changing world, this is critical.

Really solid automation is notoriously expensive to set up, and about as expensive to change when the circumstances fundamentally change. When, not if. (You can usually still scrape by on an old automated process that's no longer relevant, but you're leaving oodles of value on the floor.)

Why do people underestimate manual processes? They usually don't know how to optimise them (hint: it involves restructuring things you think of as outside the process in question) so they associate manual with inefficient and expensive.

When you have a time-tested optimised manual process, it might not even make economic sense to automate it. But if it does, you don't have to do it all at once. You can start by automating the dumbest, simplest, most tedious, and most stable step of the process first.

Then you can continue to introduce automated steps only for as long as it makes economic sense to do so.

----

Another common complaint about manual processes is that they invite mistakes. I have three things to say on that:

- There are ways to mistake-proof manual processes. "Poka-yoke" is a good starting word for googling.

- Automated processes make mistakes too. The difference is that an automated process makes mistakes more consistently. (Which may or may not be a good thing, depending on context.)

- Manual processes can avoid mistakes that are about to happen. In fact, I'd argue this happens at about the same rate as mistakes are made.


I’ve made arguments both ways in the past, but after decades I’ve started leaning towards “80% automated”.

The idea is to get to the point that you can tear down everything and rebuild it in an hour or so. This doesn’t need 100% automation, which can be cost-prohibitive as you’ve mentioned.

The cost benefit of partial automation is that it enables rapid iteration of things like naming conventions, container hierarchies, build order, and demarcations of when things are done.

I’m working right now on an Azure template for two(!) VMs. I could deploy these by hand in an hour. Instead I’m reusing snippets from a previous project that in turn reused snippets from another.

This lets me get “for free” all of the lessons-learned from those projects: The fiddly monitoring setup. The safety locks. Purge protection. Log archival.

The customer isn’t even sure what they want! I’ll probably have to iterate and redeploy this several times before the solution congeals. With a template this takes mere minutes per iteration.
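
For concreteness, the iteration loop looks roughly like this, assuming the Azure CLI and a hypothetical resource group, template file, and parameters (names made up for illustration):

    # Create (or reuse) the resource group, then deploy the whole template in one go.
    az group create --name rg-demo --location westeurope
    az deployment group create \
        --resource-group rg-demo \
        --template-file main.bicep \
        --parameters environment=dev vmCount=2

    # Iterating is just editing the template and re-running the deployment;
    # tearing everything down again is a single resource-group delete.
    az group delete --name rg-demo --yes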

So the lesson here is that it’s not about the final scale, but about the scale of the workflow required to finalise the entire solution end-to-end.


Funny that you advocate for manual processes, while I have the feeling we do too much manually and I advocate for the opposite.

The error scenarios especially are bad: they cost you time that no one measures.

Manual processes are also difficult to sync across multiple team members, and you need tooling around them to make sure the manual things actually happen.

My mantra / priority looks more like this:

1. Try not to do it at all

2. Make it automated

3. Do it manually with a heartbeat system
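
A minimal sketch of option 3, assuming GNU stat, a working mail command, and a hypothetical heartbeat file that whoever performs the manual task touches when they're done:

    #!/usr/bin/env bash
    set -euo pipefail

    # Run from cron, e.g.: 0 * * * * /usr/local/bin/check-heartbeat.sh
    # The manual step ends with: touch /var/run/report-sent.heartbeat
    HEARTBEAT=/var/run/report-sent.heartbeat
    MAX_AGE_HOURS=72

    # Alert if the heartbeat file is missing or older than the allowed age.
    if [ ! -f "$HEARTBEAT" ] || \
       [ $(( ( $(date +%s) - $(stat -c %Y "$HEARTBEAT") ) / 3600 )) -gt "$MAX_AGE_HOURS" ]; then
        echo "Manual task overdue: no heartbeat in the last ${MAX_AGE_HOURS}h" \
            | mail -s "Heartbeat missing" ops@example.com
    fi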

I don't want to do things manually; I prefer to be able to go to a beer garden in the summer and stay flexible.

And as a final point: automation for me is the necessary base for adding additional value with high return. Only with an automation base can you extend it by fixing more and more issues automatically. While you fix the full-disk issue a hundred times, I fix it once.


I think you alluded to it wonderfully: a small team with fewer than 10 people will be fine sharing knowledge and doing parts of the pipeline manually. The overhead of creating and maintaining stable automation for edge cases quickly exceeds the time saved.

It's a different story altogether if there are multiple teams etc. that are supposed to utilize the same pipeline.


My practical experience is with small teams below 10 people.

As soon as you have a well-understood base system for automation (running code with cron, monitoring, and alerting), all further automation steps are easy to add to that system.

The initial effort was always worth it.

And the big issue is that quality is very flexible.

If you need to do something every few days, forget about it once, and get informed about it, did you hurt anyone?

Probably not, but your quality suffered.

We even had a process that was broken for three weeks, and a customer noticed the issue, not us.

Automation was missing, and so were monitoring and alerting.

One solution for a manual process was a Jira plugin that would create a ticket every Monday describing what to do. Half automated. Doing it purely manually would again lead to quality issues.
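
That half-automated pattern can also be reproduced without a plugin. A hedged sketch, assuming a Jira instance exposing the v2 REST API, plus a hypothetical project key, site URL, and credentials:

    #!/usr/bin/env bash
    set -euo pipefail

    # crontab entry: 0 8 * * 1  /usr/local/bin/create-weekly-ticket.sh
    curl -sS -u "bot@example.com:${JIRA_API_TOKEN}" \
        -H "Content-Type: application/json" \
        -X POST "https://example.atlassian.net/rest/api/2/issue" \
        -d '{
              "fields": {
                "project":   { "key": "OPS" },
                "summary":   "Weekly manual maintenance",
                "description": "Step-by-step instructions for the manual process go here.",
                "issuetype": { "name": "Task" }
              }
            }'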


I have gotten good at automating simply because I cannot do any manual process 3 times in a row identically, even with a checklist. A buddy-system checklist (someone else has the checklist and reads them off to me as I complete each step) will allow me to do it error free most of the time (to the point where repeating 3 times in a row is probable), but that doesn't make sense for most things I do on a regular basis; it's not like I'm flying a passenger airliner or operating in a surgical theater.

Here's an actual example of me trying to fill up the water glass on my office desk from the water filter in the kitchen. I have not developed a checklist for this; checklists reduce my mistakes, but do not come even close to eliminating them. This happened on Monday, but similar things happen all the time.

Attempt 1: Left the water glass on my desk

Attempt 2: Filled up the water glass, took a sip, placed it down on the counter, returned to my desk

Attempt 3: Picked up the water glass, took another sip, placed it back on the counter, returned to my desk

Attempt 4: Noticed that the water glass was now 1/3 empty (2/3 full?), so I topped it off and finally remembered to take it back to my desk.


Very related to the idea of “Do-nothing scripting”: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
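
For anyone unfamiliar: a do-nothing script keeps the human in the loop but moves the ordering and the checklist into code, so individual steps can be automated later one at a time. A minimal sketch with made-up steps:

    #!/usr/bin/env bash
    # Each step only tells the operator what to do and waits for confirmation.
    step() {
        echo
        echo "== $1"
        read -r -p "Press Enter when done... "
    }

    step "Create the user account in the admin panel"
    step "Add the user to the 'deploy' group"
    step "Send the welcome email with the VPN instructions"
    echo "All steps completed."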


A practice I've found valuable in the past is the script command.

SCRIPT(1): script - make typescript of terminal session

Manually run through a process over and over, checking the intermediate outcomes until it's right and tight. Then take the last iteration from script and make that into a shell script. After testing and tweaking a few more times, make that portable in a system language of your choice. A kludge I frequently used was just copying it into a Perl file as back-quoted script lines, then replacing each with a proper statement; the same thing works with Python, calling process/system/exec things that return proper exit values.
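
Roughly, with hypothetical file names:

    # Record the whole terminal session into a typescript file.
    script build-session.txt
    # ... run through the process by hand, checking each intermediate result ...
    exit

    # Then review the capture and lift the final, working command sequence into a script.
    less build-session.txt
    ${EDITOR:-vi} build.sh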

Dirty, mechanical, industrial programming. No elegance in sight. But it works reliably in most cases and leaves an understandable trail for debugging.


> Separate the bits that stay the same, and inject the bits that change.

Basically, treat your build/deployment scripts like you would any other code. Strive to have a single source of truth for any fact that can change etc.

That seems obvious, but in my experience was at first counterintuitive for many people who otherwise had no trouble applying these principles in application code. A lot of modern CI/CD principles boil down to “it’s just code”.
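
A tiny illustration of the single-source-of-truth point, with made-up file and variable names:

    # versions.env -- the one place a fact that can change is defined
    APP_VERSION=1.4.2
    BASE_IMAGE=debian:12-slim

    # build.sh and deploy.sh both source it instead of repeating the values
    # (assumes the Dockerfile declares ARG BASE_IMAGE)
    . ./versions.env
    docker build --build-arg BASE_IMAGE="$BASE_IMAGE" -t "myapp:${APP_VERSION}" .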


That includes the part where “wouldn’t it be great if we could run just this one piece of code by itself sometimes, instead of having it embedded in a big ball of mud with tons of shared state and invisible dependencies?”


> The problem with most CI/CD systems is that people tightly couple themselves to the tool without really understanding it - the point the article is making.

It's worth noting that tight coupling to CI/CD services doesn't happen by accident. Not only do CI/CD services encourage it as a way to hold on to customers, but adding abstraction layers also increases the cost of maintaining stuff. I mean, CI/CD services have their happy path, which implies tight coupling, and doing anything that steps out of this path won't lead to a smooth experience.


> The problem with most CI/CD systems is that people tightly couple themselves to the tool without really understanding it

Related to this, it can be a genuine benefit to adapt your process to the tools rather than the other way round, especially if they're really widely used tools, because then you know you're adopting a process that works for someone.

It does however require humility. The absolute worst excesses of "Enterprise" are "custom COTS": buy some software which is built around a standard process, then try to crowbar that to match a completely different one which is inflexible not for business reasons but for organisational-politics reasons.


I find that on most occasions "custom COTS" tends to mean "we don't actually know what we do, but off the shelf - top of the Gartner Quadrant in particular - is best practice and we'll figure it out from there", followed by the completely expected drawn-out suffering. Until the next person does it again with further abstraction from the actual needs.


While I generally agree with the article, IMHO there is more to automation than understanding, and I regularly encounter problems with automation that cannot be solved by just understanding things. Two examples just because I tripped over them recently, so I still remember:

1. In general, things that are easy for a human (especially a developer) but hard to formalize. The particular example is "fixing TypeScript imports", where WebStorm thinks the import should not have a filename extension but tsc thinks it should end in ".js" (yes, .js for a TypeScript import; that's a well-known TS quirk). As a human, I can easily spot these places, but writing a script that is guaranteed to recognize exactly the import lines that must be fixed is hard -- it's putting it into formal rules, not understanding, that is the hard part here.

2. Tools that make an explicit distinction between being invoked by a human vs. being invoked by a script. Particular example: Google Cloud, user account vs. service account authentication.


I think one key aspect is lost here. One thing automation does is compartmentalise, abstract, and remove the need for in-depth understanding for everyone involved.

It's the reason why container images are the latest form of packaging for certain types of applications: they streamline the interface towards applications even more. Configuration is usually done with environment variables, or abstracted further with something like Helm charts to deploy on an even more abstracted platform like Kubernetes.

Things like configuration, temporary and persistent storage, health checks, services/ports, ... are all presented with a unified interface.
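
A rough illustration of that unified interface, with a hypothetical image name and paths:

    # Configuration via environment, storage via a volume, a published port,
    # and a health check: the same knobs regardless of the application inside.
    docker run \
        -e APP_CONFIG=/etc/myapp/config.yaml \
        -v myapp-data:/var/lib/myapp \
        -p 8080:8080 \
        --health-cmd 'curl -fsS http://localhost:8080/healthz || exit 1' \
        myorg/myapp:1.4.2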


I read the title, and I thought I knew what the main ideas of the article would be. And then I read the article, and I have no idea what this is supposed to be about.

Anyone smarter than me mind fleshing out just a little bit of context?


If you build out a layered system with reusable components you're better off. But if you forget about that and create a precious gem of a script for your Docker deployment that can't be used anywhere else, you may be losing out (if you need to do anything besides use Docker). If Docker is all you need, you're fine. If you need to deploy anywhere else, you end up with multiple precious gem scripts that duplicate effort and maybe miss elements (creating subtle differences and incompatibilities between the multiple deployments).

Unix philosophy, DRY.


Thanks for the summary. I read the article and to me this is more about creating appropriate abstractions than automation.


There are some things that seem like obvious candidates to me, such as trading. However, I have been surprised at how frequently and intensely people fight this idea. Yet not everything automated is fought - why?

My auto crypto trading system: https://tradecast.one


I think I know what the article wanted to say but it did a really poor job of saying it.

The discussion is interesting though. As we pile ever more abstractions on top of existing ones, we forget what they are based on. Take young "devops" engineers, for example: there are plenty of them who don't understand what AWS has under the hood. A lot of them lack fundamental Linux knowledge and have never touched virtualization tools directly. And it can bite them hard sometimes.

On the other hand, I think it's always been like this. I grew up doing a lot of Linux networking, virtualization, packaging, etc. But I don't know much of the underlying tech for that; I can't even read C code and don't know the Linux system calls.

And of course there's an xkcd about this https://xkcd.com/435/


I started my DevOps career last November, though I had done Ops before.

Since DevOps these days should just mean "run everything in Kubernetes", I don't see why you need to know all the things: just bolt controllers onto your K8s cluster to integrate the features you need with your cloud provider, get some simple CI/CD running, and you're set to earn your salary for a while.

Make sure to assist your developers by making it easy to do things right.

Honestly the easiest career transition ever.


It's an excerpt from a podcast, not really an "article", so maybe that's why you're disappointed?



