The 300% Production Problem

riansanderson · on Oct 3, 2023

I've been doing DevOps for a while now and have been struggling to put my finger on why at scale it turns out to be so...complex/underestimated/hard. This blog distills it down to an essence that rings true to me:

>To successfully get an application into production, you need to be an expert in the application itself, the deployment target and the deployment methodology.

I disagree that all that expertise only needs to reside in one person, but you've got to have a well functioning team otherwise. And the more facility each team member has with each of those domains the better your odds of success.

It's a matter of when not if some issue is going to show up in production that is going to require more than a proverbial reboot. If the team lacks, or its dysfunction negates, expertise in any one of these areas then the service is living on borrowed time.

RetroTechie · on Oct 3, 2023

More generally:

Use of any abstraction (or software / management tool) = you exchange one set of problems for another.

This applies in daily life too:

You can mow the lawn, which takes your time / effort. Or you can hire someone to mow the lawn for you.

If you do, you exchange [spending time mowing the lawn] with [spend time to find a gardener + obtain money to pay him/her].

Which to choose? The set of problems that's easiest to handle. If you're short on cash but have the time, mow the lawn yourself. If you're a busy person with a 5-figure salary, pay someone to do it for you.

In programming (or more generally, IT projects) it works the same. But where it often goes wrong, is (usually under-) estimating the nature & size of a set of problems that another set of problems is exchanged with.

"Ooh we should move to the cloud! Flexible compute & storage, easy!". Result: "cloud" is yet another dependency, and turns out to bring complexity in itself. And problems (+costs) attached to it, may just be nastier ones than if things had been kept on-site.

So the art is not always to determine what's an optimal solution (for whatever measure of "optimal"), but to get an accurate picture of the sets of problems you're choosing between.

Sometimes it's wise to pick a sub-optimal solution, if it means picking a set of problems you understand. Versus a 'better' solution whose problems you don't understand (as well).

Or go with something that doesn't work great, but works now, and good enough. Vs one that works great, but only after you needed it.

hiAndrewQuinn · on Oct 3, 2023

I often joke with my wife that capitalism is just a series of solving out problems for the reward of more interesting problems in the future.

irrational · on Oct 3, 2023

> To successfully use an abstraction, you need to understand the problem the abstraction is trying to solve and also understand how the abstraction has solved the problem.

Not just abstractions, but just about everything we use. For example, why were arrow functions added to JavaScript? After all, everything you can do with arrow functions you could do previously without arrow functions. What was the problem that arrow functions were trying to solve and how do arrow functions solve that problem? I wish MDN was updated so that at the top of every page it first gave an overview of what problem this particular feature solves and how it solves it. Including how the problem used to be solved with "old" syntax and why this new syntax is preferable. And when it is not preferable.

takinola · on Oct 3, 2023

I thought arrow functions were just designed to reduce verbosity and make code easier to read. Is there another reason?

seangrogg · on Oct 3, 2023

Yes, arrow functions have a different (and more intuitive) lexical `this` than their more verbose counterpart. It is subtly more than just a syntax sugar.

takinola · on Oct 3, 2023

Ha, I finally figured out how "this" works and then they changed it with arrow functions.

AA-BA-94-2A-56 · on Oct 3, 2023

I could be wrong, but don’t JavaScript’s closures rely on arrow syntax?

irrational · on Oct 3, 2023

No, anything you can do with arrow functions can be done without them. Arrow functions are just a simpler way of writing code. For example, you can control "this" in non-arrow functions using tools like call, apply, and bind. It is just more verbose.

takinola · on Oct 3, 2023

I'm pretty sure closures are available to any function regardless of how they are defined.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Clos...

vsareto · on Oct 3, 2023

>If you’re unlucky, you might read this and think, “holy shit, no wonder I’m burned out”.

This is why full stack jobs are terribly compensated for workers. Even worse if you are doing infrastructure.

Companies are not paying a premium for the flexibility and likely burning you out at the same time.

irrational · on Oct 3, 2023

My previous job was a full stack developer, but I had built the entire product from front to back on my own and knew how it all worked. I found it quite easy and even enjoyable. Now I'm working a job that is not full stack and I'm finding it extremely stressful because I didn't build the thing and still haven't figured out how the different parts work, much less work together.

karmakurtisaani · on Oct 3, 2023

If it makes you feel any better, there's now a person in your old job being stressed the same way!

Terr_ · on Oct 3, 2023

My own litmus-test for "full stack" (which may be a lower bar than for some others) isn't that the person can whip up a full stack of anything anywhere, but that they know enough to trace problems up and down through the stack.

Ex: The visual bug caused by the CSS style applied by the JS code that operated off the data from the server from this call which came from that database query which was bad data because of that process which was triggered by this call that comes from that UI button, etc.

diarrhea · on Oct 3, 2023

Interesting. By that definition I’m full stack, despite never having written TypeScript from scratch (just some htmx stuff). But that’s only fair, seeing how there’s plenty people calling themselves full stack without ever having written a single Dockerfile line. They couldn’t deploy themselves out of a wet paper bag.

Maybe I shouldn’t what other people call themselves get to me so deeply, huh.

llamaimperative · on Oct 3, 2023

> To successfully use an abstraction, you need to understand the problem the abstraction is trying to solve and also understand how the abstraction has solved the problem.

What? This seems like the exact wrong way to view abstractions and in fact, you view it the right way even if you think you view it the (silly) way above.

The whole point of abstraction -- not just in software but in everything in the universe -- is that you do not have to know or care about the underlying facts.

The whole stack is abstractions all the way down to the sub-atomic level, so it's obviously untenable for "correct usage" to require knowing the lower levels. Just like you (correctly) don't need to understand structural engineering or wood's growth patterns to sit in a chair made of wood.

It's very useful to know a few layers below the layer at which you primarily operate, but that is a very different from the claim up top.

vhiremath4 · on Oct 3, 2023

This depends on responsibility. If you are working within a service you own, you might be working with a bunch of abstractions you need to know because it's your responsibility to. Layers that bifurcate responsibility (and usually team boundaries) should be readable without needing to intimately understand what problem the abstraction is trying to solve.

To say you never need to understand what problem the abstraction is trying to solve (or you always need to) doesn't quite strike the practicality of how teams work together in my experience.

That said, in my experience, there are a ton of leaky abstractions that really shouldn't be too.

michaelfeathers · on Oct 3, 2023

Obligatory reference to the Law of Leaky Abstractions:

https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-a...

Layers work to the degree that they are trivial, but we really only need them when they are non-trivial.

llamaimperative · on Oct 3, 2023

Layers work to the degree that you don't need to be a subatomic particle physicist to write this comment.

There are good abstractions and bad ones, all with varying degrees of leakiness and sharp corners. The good ones definitionally prevent you from needing to understand much about what they're abstracting.

gonational · on Oct 3, 2023

Abstraction is real. So is bad code that pretends to be an abstraction.

Small leaks in an abstraction are survivable. Small leaks in many layers of abstraction are ruinous.

solatic · on Oct 3, 2023

I agree with the author, but especially because it's so rare to find people with expertise (a) also in the application, (b) also in the deployment target, and (c) also in the deployment methodology, let alone in the specific instances your team uses (maybe you used GCP at a previous company, but that's of little help if your current company uses AWS), let alone getting multiple of such people on your team (because bus factor), really, the problem is a communication problem. Get separate developers, separate infrastructure engineers, separate release engineers - but focus first on the quality and quantity of their communication. Encourage pro-active communication, the earlier the better, and encourage open feedback.

jmmv · on Oct 3, 2023

“Classic” related reading: “The law of leaky abstractions” by Joel Spolsky: https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-a...

And also Hyrum’s Law: https://www.hyrumslaw.com/

avgcorrection · on Oct 3, 2023

It's a nice enough reminder about the limitations of abstractions but the thesis isn't as strong as it is classically regarded to be.

Something that is inherent to any abstraction is giving up control; this is simply necessary in order for the abstraction to have implementation space. Take the example of SQL queries that are slow. The abstraction itself isn't leaky unless the queries give the wrong results sometime. The abstraction isn't some guarantee about the run-time of a query.

An abstraction abstracts away N of M parameters (N < M). Joel would presumably say that N has “leaked”. But more straightforwardly you can say that these concerns have been abstracted away.

Therefore the Law—always a good sign when the original author calls it a “law”[1]—can be restated as: all abstractions abstract.

[1] cf. “Hyrums Law”—not the case

TerrifiedMouse · on Oct 3, 2023

If you see abstraction as allowing you to ignore the implementation details then having to know the implementation details to get stuff done can be seen as the abstraction failing and the implementation details that are supposed to be abstracted away are not and are leaking out of the abstraction.

Either way, whether you consider it a leak or not, the results are the same: the abstraction becomes less useful or even useless.

TremendousJudge · on Oct 3, 2023

The law states that it's unavoidable for the implementation details to show in some cases, since you can't abstract away everything without being as complicated as the thing you're abstracting -- a "trivial" abstraction.

For non-software examples look at the history of the laws of physics. Newtonian gravity worked great for many things, but at some point scientists noticed for example that the orbit of Mercury didn't drift as predicted. This was fixed with the theory of General Relativity, which could be said was "leaking through" with the orbit of Mercury.

However, this doesn't mean that the abstraction is useless. Newtonian gravity is much easier to work with and works just fine for many applications, which is why it's still widely taught and used. But at some point, you have to deal with the leaks, and for that you have to know relativity (or become a research physicist, since we now suspect relativity to have leaks of its own)

avgcorrection · on Oct 3, 2023

Those scientific theories are moreso models. Models that get progressively less wrong.

Garbage collection is wrong to the degree that it leads to memory unsafety or memory leaks that are not caused by humans. But using more memory is not a “leak” because it has nothing to do with the guarantees.

avgcorrection · on Oct 3, 2023

Again, Joel has pointed out a truism: abstractions abstract.

Nothing is leaking. Did someone guarantee that you would get as predictable performance with garbage collection compared to manual memory management? No, because that guarantee was never part of the abstraction.

TerrifiedMouse · on Oct 3, 2023

Either way it’s safe to the say the abstraction has failed and isn’t doing its job anymore if you have to know the implementation details.

To use your garbage collection example, regardless you consider it a leak or not, the result is the same, if you have to deep drive into the workings of the garbage collector and how it manages memory, the abstraction has failed to do its job: to free you from the burden of managing memory.

His point is that all nontrivial abstraction breakdown at some point / under certain circumstances.

ambicapter · on Oct 3, 2023

This neatly describes why I hate most things that are built by stacking layers upon layers of framework, and prefer to stick to building things in technologies that I know, even if it means re-writing something that someone else has already written a super-cool, super-fresh library for.

BWStearns · on Oct 3, 2023

This relates to something I've seen with product discussions, though if we think about IaC as a product for developers I guess it's the same discussion. Very often the developer/pm of a product wants to cover _every possible_ use case even if there's no user demand there yet.

This can lead the user experience to be painful since you now basically need to be a power user to get started, and the developer experience starts to get painful as well since the code to support the everything machines is not simple.

I worked on a project where the feature requirements became so convoluted I wanted to say let's just ship them a machine with a python interpreter on it and at least the docs will already be written. I was joking but there's something there in that we're either shipping an incomplete subsection of the capabilities of the tools we're using, or we're writing a 1:n mask from our tools to some new interface.

New tools need to be compared to our BATNA (in this context maybe Best Alternative to Nothing at All?) to determine their value. I feel like this is why Heroku was so great when it came out. Yeah, you lose all this fine grained control (subset of the capabilities), but goddamn was it easy to just ship something.

I feel like Excel is really the OG king of navigating this dilemma because the interface makes it easy to gradually go into the deep end. I honestly wish more dev tools were like that because we only have so much time in the day and most tools probably _do_ have an 80% happy path subset of features that could be foregrounded and made simpler, while still allowing access to the details when needed.

idosh · on Oct 8, 2023

Brilliant quotes made me laugh so hard:

> If you’re unlucky, you might read this and think, “holy shit, no wonder I’m burned out”.

> building an operator for everything which is the proposed solution of some people who really just want to watch the world burn.

But actually, the takeaway is this:

> You want simplicity where it benefits the user, which often requires increased complexity for the developer.

Professionals should make it easier for downstream users. In the context of this article, it means the platform engineers should abstract the complexities for application developers to be able to deploy safely their applications

jasonlotito · on Oct 3, 2023

> To successfully get an application into production, you need to be an expert in the application itself, the deployment target and the deployment methodology.

I've solved this. I've built this.

> My analogy for this is how complex modern cars are, but you push a button to “start” them.

Yep. Press a button, and it does the job. It takes effort and time to get there, but it works, and it works well. And it's well worth the effort to set things up. Absolutely loved when this was up and running the way it was.

> In addition, the desire to create Terraform modules that meet every single user’s possible use case means that often, the module will expose the entire surface area of the APIs the module is managing to the user.

"Often?" No. Not in my experience. There are very few times you need to expose everything, and most of the features you want to expose are fairly standard for your company. Yes, if you are building modules for the community at large, yes. But you don't need to do that.

csours · on Oct 3, 2023

> You want simplicity where it benefits the user, which often requires increased complexity for the developer. My analogy for this is how complex modern cars are, but you push a button to “start” them.

Developers are users too. Everyone needs good user experience.

Modern cars are complex, assembly lines are complex, engineering is complex, but most assembly jobs are super simple.

midasuni · on Oct 3, 2023

I had a hire car with a “push button” thing last week.

Push the button nothing happens

Read the manual

Press the clutch and push, stoop nothing

Then you have to push break, clutch and push the button. But you have to push the button for several seconds.

Same with the handbrake, it’s a button, sometimes on, sometimes off. Maybe it will stay on when you’re parked, who knows.

Give me a real key and real handbrake any day, that’s far simpler and gives me far more control.

bartread · on Oct 3, 2023

> Developers are users too.

But not generally the users of the systems that are being built. Those are the people you need to think about. Sometimes they will be other developers, but more often they'll be end users of some sort or another. And, even as a developer, I don't want your leaky abstractions and complexities bleeding through your service API endpoints all the time.

csours · on Oct 3, 2023

Developers are users of the API specification, etc

reidjs · on Oct 3, 2023

I had this same thought after reading that line. I started thinking about where the complexity should live, if not with the developers. My conclusion is that it should live with whoever has the most 'skin in the game,' or the largest stakeholder of the enterprise. So, the CTO, CEO, Founder, Director, etc should "own" the complexity?

bartread · on Oct 3, 2023

I don't think that scales: a lot of the complexity is in the detail. It's pretty hard for it to meaningfully live with someone whose job is all about higher level concerns and bigger, more forward-looking moves.

csours · on Oct 3, 2023

It's not complexity per se. Some things will always be complex. It's about continuous improvement of things like documentation and observability - "non functional" stuff.

RetroTechie · on Oct 4, 2023

Some complexity is artificially introduced - on purpose, by the abstraction itself.

Imho abstractions should be simple enough that whatever complexity its implementation introduces, is manageable by its users & developers.

Quite a few APIs exist which are complex enough that hardly anyone (developer or user) has a handle on existing implementation. Imho such an API overshoots its goal: it abstracts too much, and/or too high level, to the point where understanding what's between API & underlying hardware, becomes impossible for 1 human. Never mind understand underlying hardware (or write software that uses the API efficiently), because API creates a target that's alien compared to hw it runs on.

That doesn't make sense! Cut the abstraction down to a manageable degree of (added!) complexity.

destitude · on Oct 3, 2023

Complexity for the developer and simplicity for the user had been an underlying mantra in macOS development for decades. Only in the past 10 years or so has there been a lot of effort to try and also simplify the developer side.

hospitalJail · on Oct 3, 2023

I wonder what years your first sentence was true.

My grandparents got an iphone like 5 years ago, and its been a nightmare to deal with. I don't have one because of the security issues(Pegasus), so walking through my grandparents to download an app is brutal. 'Do you see anywhere it says download? Get? Install? Are there any colored buttons centered on the page'

I can't remember what the actual word to install was, but grandparents couldnt figure it out. Ended up having my teenage cousins install it.

jbandela1 · on Oct 3, 2023

I wonder how much we can learn from the mainframe era of computing?

Instead of each application managing distributed computing and networking, what if the operating system could handle that transparently?

betaby · on Oct 3, 2023

What exactly you mean by "operating system could handle that transparently"? What exactly mainframes handle transparently? Moving SNA connection from one box to another? Is even possible today?