It occurred to me that configuration hell is a consequence of the microservice-heavy approach: we reduced the complexity of cross-component interactions by compartmentalizing each component behind a hard boundary, and now we’re paying the price for that free lunch by trying to put those pieces back together and keep them that way.
Turns out the complexity didn’t go anywhere, it was just biding its time, waiting for the right moment to strike back.
If one component needs to know about the use-case it’s serving for another service in order to do its job, you didn’t factor your components correctly (i.e. either those components really are one component, or the component boundary should just be on a lower or higher abstraction layer than you put it.)
S3 is an example of a correctly-factored service. Object storage is a non-leaky abstraction. Object storage doesn’t need to know what an object “is” or why something is storing one. There’s no use-case-specific policy that can be applied to only specific “types” of objects. There’s just objects and buckets, and policies you can apply arbitrarily to any object or bucket because they’re policies about objects and buckets, rather than policies about some higher-level thing.
If your component’s API doesn’t create a clear, obvious, “self-contained” abstraction like object storage’s “objects and buckets” abstraction, then you don’t really have an extractable component; you just have a monolith that talks to itself using an extra layer of indirection.
Question then. If S3’s abstraction is so good, why do they keep deliberately poking holes in the abstraction, including but not limited to SQL access to the contents of the individual files? Not to mention each object is a file that can be accessed from the internet, requiring a lot of thought about how to properly protect, limit, and bill access to it.
I’m not trying to be snarky, I’m just pointing out that even an on-the-surface-ideal abstraction is leaky as hell. If S3, storing objects, can’t keep its abstraction clean, how can we be reasonably expected to keep our own abstractions clean?
Those features may look like they're part of the S3 "service", but they're actually separate services built on top of S3, grouped into the S3 API namespace but hitting separate higher-abstraction-layer microservices that themselves make S3 API requests to accomplish their task.
The fact that all of those services that seem to be "a part of" S3 can be implemented on top of S3, without touching the code of S3 at all, and without the higher-abstraction-layer code having to reimplement any of the S3-layer logic to do its job, is precisely what makes S3 a well-factored service.
I completely agree. This "looking for the non-leaky boundaries" approach favors simplicity and reliability. Microservices present huge wins; choreographing them poorly is really on you.
Nobody said making the leap to distributed systems architecture would be easy. In fact, quite the opposite has been asserted, over and over.
Do not complain about complexity and simultaneously reach for hard problems.
If you're thinking about implementing this or something like this, please stop.
Unless you're writing in a compiled language, use the same language your application is written in for your configuration. If you have a python application, have a python file for the configuration. Same for ruby or whatnot.
You don't need to use the entire language, but at least use the language's lexer/parser (cf. json/javascript). That way, all existing tooling for the language will work for the config files (ask me about how saltstack happily breaks their API because you're not "supposed" to use it, despite the fact that they have public docs for it). Additionally, people won't need to figure out all the stupid corner cases in your weird language that has no uses outside of a few projects.
Additionally, by making your configuration language an actual language, you also simplify a lot of the system design, because the configuration can act directly against your API. This means using your tool from other tools becomes much more straightforward, because the only interface you actually need is the API.
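Roughly, I mean something like this (a minimal sketch for a hypothetical Python application; all the names here are made up):

    # config.py -- hypothetical example; settings are plain Python objects,
    # so linters, IDEs, and the import system all work on them for free.
    import os
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class DatabaseConfig:
        host: str
        port: int = 5432
        user: str = "app"

    @dataclass(frozen=True)
    class AppConfig:
        debug: bool
        database: DatabaseConfig

    # Environment-specific values can use the full language: conditionals,
    # env vars, helper functions -- no templating layer required.
    CONFIG = AppConfig(
        debug=os.environ.get("APP_ENV", "dev") != "prod",
        database=DatabaseConfig(host=os.environ.get("DB_HOST", "localhost")),
    )

The app just imports CONFIG, and any other tool that needs the values imports the same module through the same API.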
The existence of "configuration language" is, itself, a mistake.
Every time an article on configuration languages gets posted here there's always a chorus of pooh-poohing. This is a solved problem, just use YAML/JSON/XML/whatever. Just use the same language as your app. Just use bash.
I don't understand why people hate on new languages so much. Learning a language is pretty easy, at most a week or two of concentrated effort. On the other hand, we put years of effort into software projects. If learning a new language would make us 10% more productive with a sizeable chunk of the work, that's a HUGE WIN.
Configuration at scale is something new and we don't have good ways of doing it yet. Inventing new languages and tools to make it easier will be hugely rewarding; WAY more than a 10% improvement. Some of the stuff we invent won't be perfect. So what? Let a thousand projects bloom, we'll see what works and what doesn't. That's how our profession advances.
"at most a week or two of concentrated effort" - that's actually becoming increasingly costly due to the ever-increasing opportunity cost attached to basically everything. And it consumes mental resources too. Things may end up becoming more unproductive. And then there is the risk of a new language/toolkit/library/framework becoming under-maintained over the long run.
I believe these are the reasons everyone ends up with ad hoc solutions, doing it their own way on a case-by-case basis, even though those don't generalise as well as, say, a new language/toolkit/library/framework, which is essentially an implementation of a specification designed to solve a more general/abstract problem in a particular way, as described in this article.
Nonetheless I'm glad this article has received a good amount of upvotes to appear on the front page of HN. I'm just wondering what the conversion rate is like, i.e. what percentage of people who clicked in will go through the entire article and learn about CUE lang in detail, what percentage of those will end up using CUE lang, and what percentage of those will stick with CUE lang over the long run (say, over the course of a year).
I just switched my pet project to Cue. It did take me about five days to get familiar with the language, but the result is much more elegant than the original Kustomize configuration. I'm looking forward to using Cue in future k8s projects and discovering new use cases for the language.
95% of you reading this and thinking "oh, that's neat, let me use this for the company" are going to waste time and resources. Why? Because the kind of complexity the article tries to address is not reached by the vast majority of the deployments out there. Typically the 'superstructure' is larger than the thing it supports. As soon as that's the case, and there is no credible path to a (near!) future where you will need that superstructure, you are better off with the simplest configuration that you can get away with. It will be more robust, easier to modify and easier to troubleshoot than any of these abstraction layers.
Question one for anything that you aim at production should be: "Do I really need this?", and only if the answer is a very clear yes, and you're not just trying to implement $COOLTECH because you are distracted by its shininess or because 'Google does it too', should you go ahead and implement it.
My #1 technique for improving installations is to rip out unnecessary superstructure which is obscuring why things are going wrong, and more often than not is actually part of the problem. Works every time.
The same goes, by the way, for modeling your development process on whatever Spotify does with your 6-person development team, and in fact for any other piece of tech that you bring on board. Each of those pieces has a cost of implementation, a cost of maintenance and a cost of cognitive overhead associated with it. The best shops out there use the least number of technologies they can get away with.
We don’t have a huge system, but already we are seeing bash scripts that sed values into Kubernetes manifests, or configuration stored as a dictionary inside a Python script that generates large YAML files as part of a deploy pipeline.
I think it is an inherent error in basically all orchestration tools (Kubernetes, cloud build/formation, etc), that they don’t support scripting.
For a long time I was frustrated by the lack of scripting tools for orchestration. My ideal situation would be to write the orchestration / deployment config using the same programming language the app is written in. Eventually I found Pulumi, which supports a limited number of programming languages but is basically what I was looking for, except I would like C# to be officially supported.
How has your experience been with Pulumi? I tried it back in its alpha days to compare to terraform but I didn’t make much progress due to lack of documentation and limited ecosystem.
I really like the concept. Here are the highlights:
- Helm v2 support didn't use Tiller; instead it rendered the YAML client side and pushed that to the server. Sort of how Helm v3 will be, but it didn't work very well in practice with the charts I wanted to deploy. It felt like Helm support was an afterthought and the best way to deploy was to port all your Helm chart dependencies to Pulumi.
- It was really intuitive to create functions that deployed the same thing repeatedly with slightly different settings. I used it for Office365 email server settings. I managed a couple of domain names that use Office365 email, so I had a single function that deployed the correct settings to the specified domain name, and I just called it twice with slightly different parameters. Way more intuitive than Terraform. (A rough sketch of the pattern follows below.)
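Rough sketch of that "one function, called per domain" pattern, using Pulumi's Python SDK with Route 53 standing in for my actual DNS setup; the record values are placeholders, not my real configuration:

    # Hypothetical sketch; requires pulumi and pulumi_aws, run via `pulumi up`.
    import pulumi_aws as aws

    def office365_mail(domain: str, zone_id: str):
        # MX record pointing the domain at the hosted mail service
        aws.route53.Record(
            f"{domain}-mx",
            zone_id=zone_id,
            name=domain,
            type="MX",
            ttl=3600,
            records=[f"0 {domain.replace('.', '-')}.mail.protection.outlook.com"],
        )
        # SPF record authorizing the mail service to send for the domain
        aws.route53.Record(
            f"{domain}-spf",
            zone_id=zone_id,
            name=domain,
            type="TXT",
            ttl=3600,
            records=["v=spf1 include:spf.protection.outlook.com -all"],
        )

    office365_mail("example.com", zone_id="Z111EXAMPLE")
    office365_mail("example.org", zone_id="Z222EXAMPLE")

It's just a function call, so adding a third domain is one more line.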
We're using Pulumi for a production system and couldn't be happier.
The ability to template Yaml in Typescript and create infrastructure is mind blowing, all the time checked by the compiler. Well, it's not so much templating as using Typescript's built-in JSON syntax.
Using VSCode we can refactor our infrastructure code, i.e. create functions for sub-levels of our Yaml.
So we effectively combine our Kubernetes Yaml and infrastructure. It's great. Try it.
From the readme examples it seems that the code is not provider-agnostic, i.e. you write specifically for e.g. AWS. How is it then better than using AWS' native SDK for your language?
Orchestration tools (which also include chef/puppet/etc.) inherently need to be able to notice when something is just a little out of whack, and respond by just calculating the smallest action that will restore things to the way they “should” be.
Unless your scripting language is both deterministic and essentially syntactic sugar for a dependency digraph that the orchestration system knows how to calculate reduced patches of, it’ll break the “convergence” abstraction of the orchestration system: that is, it won’t know for sure what state the system is in after the script runs, so it won’t know what needs to be done to put the system back into any other state later on.
Not that such scripting languages don’t exist. Apache BEAM, for example, is a set of SDKs for various languages that let you write code that compiles to dependency+data-flow digraphs. But it’s probably not the type of scripting you’re thinking of.
I should also mention, though, that systems like k8s do allow arbitrary programming (not necessarily “scripting”) through Operators (https://coreos.com/blog/introducing-operators.html). Rather than scripting the orchestration system itself, you write a new type of component for it to orchestrate, where that component is both created/managed by the orchestration system, and, through delegation, is responsible for implementing the method by which the orchestration creates/manages other specified types of resources.
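To make the “convergence” point concrete, here's a toy sketch (not any real tool's algorithm) of diffing desired state against observed state to derive the minimal set of actions:

    # Toy illustration of convergence: desired and actual state are plain
    # mappings; the "plan" is whatever minimal set of actions closes the gap.
    def plan(desired: dict, actual: dict) -> list:
        actions = []
        for name, spec in desired.items():
            if name not in actual:
                actions.append(("create", name, spec))
            elif actual[name] != spec:
                actions.append(("update", name, spec))
        for name in actual:
            if name not in desired:
                actions.append(("delete", name))
        return actions

    desired = {"web": {"replicas": 3}, "cache": {"replicas": 1}}
    actual = {"web": {"replicas": 2}, "worker": {"replicas": 1}}
    print(plan(desired, actual))
    # [('update', 'web', {'replicas': 3}), ('create', 'cache', {'replicas': 1}),
    #  ('delete', 'worker')]

An imperative script with arbitrary side effects can't be planned this way, which is exactly the problem.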
Of course, once you support scripting, why not just do everything in a deployment script?
I think a lot of these tools are based on the assumption that declarative configuration should be sufficient. The tools eschew scripting because introducing it would call their basic assumptions into question.
Kubernetes doesn't support simple scripting like passing variables and such? I know that Borg supports that via borgcfg; we basically have Borg templates for everything, so you rarely have to write a new one from scratch. Not sure why it wasn't ported to the outside world.
Author here. The closest thing to borgcfg in Kubernetes would be kubecfg [0]. It is based on jsonnet, the open source equivalent to GCL. The official blessed solution right now would be kustomize [1], since it is directly integrated into kubectl. I cover both in the article, along with my evaluation of them as solutions.
How about running scripts that compile into K8s configs as the common task? This has been done for a long time in the DNS world, where details are held in a database, some of which are then pulled out to create the new current zone file(s).
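As a rough sketch of that "compile data into configs" idea (assuming PyYAML and a made-up record format):

    # Hypothetical: render k8s ConfigMaps from a small "database" of records,
    # the way zone files get regenerated from DNS data. Requires PyYAML.
    import yaml

    services = [
        {"name": "api", "port": 8080},
        {"name": "worker", "port": 9090},
    ]

    manifests = [
        {
            "apiVersion": "v1",
            "kind": "ConfigMap",
            "metadata": {"name": f"{svc['name']}-config"},
            "data": {"PORT": str(svc["port"])},
        }
        for svc in services
    ]

    # One multi-document YAML file, regenerated whenever the source data changes
    with open("generated.yaml", "w") as f:
        yaml.safe_dump_all(manifests, f)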
I work at a small company - maybe 40 engineers. Our app is still a monolith, but we have secondary systems for processing data and communicating with external systems. It's not the most elegant thing ever engineered, but it's not bad given our security and compliance requirements.
I mention all this to illustrate that we are nowhere near NetAppAmaGooSoft scale. Nevertheless our deployment is complex. We can't just have a few hundred identical machines. There are many heterogeneous parts, and they have to be hooked up to each other. We currently do this with ~20K lines of HCL applied with Terraform. It's mostly written by some hella-smart infra engineers, and is very well factored.
Still, it's a BEAST. The Terraform Enterprise workflow is tedious, and writing configuration is a lot of work. We would love to replace it, with ...something better. There are alternatives out there, but nothing that's obviously much better. It wouldn't be worth the migration effort. As far as I can tell, this is the state of the art, and it sucks.
My coworkers are sick of hearing me say "We are not Google", so I'm sympathetic to the YAGNI argument, but there really is a problem here. Flat YAML files would be a nightmare for us. I bet there are a lot of companies out there that have worse solutions than we do. The default of "each team rolls their own ad hoc deployment tools" masks it somewhat because it's not obvious that the company has 17 solutions to the same problem, with varying levels of effectiveness and reliability, all of which are expensive to write and maintain, and none of which will be reused when a new team gets organized.
Terraform is a valiant attempt to solve the problem with middling results. We can do better, and we need more attempts! I'm excited that CUE exists. It's not ready for my use case yet, but it's very promising. The best thing about it is that it scales well, up and down. If you have simple needs, you can just write flat CUE to start. It's just JSON with some syntax sugar. The fancy stuff can come later, but it's there when you need it.
What if you are using already existing tooling that takes very verbose yaml files? I've seen Concourse CI pipelines that push past 5k lines of YAML, where 50% internally is repeated 10-line blocks, and there is tons of repetition across different YAML files.
> Duplication is far cheaper than the wrong abstraction.
I'm going to chew on this. For me, Don't Repeat Yourself has been one of my highest values. My former colleagues would copy and paste code everywhere. It was a pain to make changes to, and I relished refactoring it.
But I also regret some of the libraries I wrote in my early years. They aren't designed how I would today, but now several applications depend on them.
One thing I will say is that it's okay for your first draft to be ugly. It helps to see all the duplication before you design the abstraction.
I would second the parent. DRY was a false God. I would say RCL: reduce cognitive load.
Obviously if there is a manageable way to reduce repetition, take it, but I would not add a lot of complexity for the sake of brevity. That's turning your dev team into a data compression algorithm made of meat.
Cognitive load is also present from vastly duplicated code and config: having to remember to update or take into consideration 20 other places in your codebase any time you make a change.
This is so true. I say this all the time to my team. Given two choices, go with the one that produces less cognitive load.
I used to say "less complexity" but I've found that even for me, sometimes when I do something that seems simple and less complex, it ends up requiring more cognitive load (it's not descriptive enough, or it tries to do too much "magic", etc).
Perhaps DRY was a good shortcut to RCL and SST (single source of truth). Those two are really important. DRY is an approximation to them (a good one in most cases).
Are you speaking about DRY on a syntactic basis, or the version from The Pragmatic Programmer ("Every piece of knowledge must have a single, unambiguous, authoritative representation within a system")?
I would say "was intended to mean.". The acronym is catchy and seems sufficiently self explanatory that I think your initial interpretation may be more common.
I've been jokingly pushing for over-application of syntax-focused DRY to be called "Huffman coding"
One thing that's missed in the common conception of DRY is that the original formulation in The Pragmatic Programmer spoke in terms of pieces of knowledge.
I won't speak to which is "really" DRY, but I think the original formulation is more useful. The more typical focus on surface syntax misses out in two respects.
First, a single piece of knowledge might be represented in multiple places even when they look different. For instance, if I'm saying "there's a button here" in HTML and in JavaScript and in CSS, there won't be any syntactic repetition but that's not DRY per TPP.
Second, just because there is syntactic repetition doesn't mean you're encoding the same piece of knowledge. Sometimes two pieces of code happen to be the same, but each looks that way for different reasons and are more likely to change independently than together. In that case, unifying them isn't DRY (per TPP). I joke that you're not improving that code, simply compressing it.
I think that original formulation is pretty spot on, although even then I would note an exception for deliberately saying important things in different ways for error detection and clarity, but only if they are actually checked against each other.
The thing is, for me, configuration and code feels like two different things. It's like worrying more about the system components and their relationships than the "code" itself.
When you treat configuration as flat files your configuration becomes a 1:1 mapping of the thing you have running.
If you're throwing it all into a docker container as a single artifact you don't need a separate language to patch into a complicated config parser. Just use hardcoded objects/maps in whatever language you use.
This post is really timely. We're going through a wave of innovation in how we ship software but in that we're having to reason far more about the infrastructure. I think we might be reaching the end of that phase and the realisation that none of us really want to touch containers or kubernetes, we just want it to fade into the background. Because at the end of the day software development is still software development and not much has changed there despite the underlying platforms being completely rewritten.
I'd argue that we might once again be on the cusp of true serverless but in a way that might become ubiquitous. If we could unlock a shared platform like GitHub but for running software we'd be in a much better place.
Urges won't translate into practice so easily. The abstractions of servers won't be so easily reduced to stateless method calls. The design of a cloud-based service is still so relevant that there is no abstraction that can hide it at the current moment.
I’ll have to take a look at CUE; it might be worth using.
But I have to ask: isn’t there something simpler which can handle taking a declarative specification and adding imperative behaviour to it? I’m writing — as anyone who knows me might suspect — of S-expressions & Lisp.
They have an advantage over CUE in that one might well choose to write one’s entire program in Lisp & S-expressions. It doesn’t look like CUE is intended to be the whole-program language.
I remember that Tcl used to be commonly used for config-files-which-need-a-bit-of-scripting, but while it is an awesome language (really!) one probably doesn’t want to write an entire program with it, but rather use it to stitch functions written in C together.
I wrote PCCL for a use case like this: mostly config but with dynamic behavior. I've also used Lua in the same way. My favorite memory of this approach was when some vendor switched from a static configuration to a daily XML config on their FTP site. I modified our config file so instead of hard coding the vendor's old parameters, it would download their XML file and populate our config params such that the C++ host program did not know anything had changed.
Some people say a Turing complete config language is a step too far. With great power....
Jsonnet (and so CUE) has a lot of overlap with LISP.
I find its syntax more suited to constructing data structures than LISP's, if only because "{}" and "[]" are more concise ways to indicate "object" and "array" than an S-expression.
Don’t use yaml or json for (infrastructure) configuration. Use code. Checking configuration files in with source doesn’t make it “configuration as code”. If you don’t want to roll your own tools, there are tools like Pulumi for this.
Yaml, and every other configuration language, has gotten complex enough in its interpretation to be its own programming language. Helm charts, for example. Turing complete templating over the top of Turing complete (or damned close if not) k8s configuration objects.
We’re already programming in code, just by shoehorning one domain-specific language into another.
I don't know if it's the best way, but the way I've been managing kubernetes at work is with a custom typescript library to interface with kubernetes. That library has two frontends, a website and a CLI. Whenever I add some new feature to the cloud I prototype it in YAML first and then when I get it working I integrate it into the program.
It works mostly pretty well. It's a lot more organized and powerful than having a bunch of YAML. It does feel a bit "heavy" though. There's also parts of the kubernetes api that are hard to deal with (log streaming for instance..)
I don't put everything in the tool though. Anything that's a one-off I just check in YAML for. But anything that you might do repeatedly I add to the CLI, and the website is for more general non-ops use
Totally agree. K8s using yaml as a config format is a shit show. They should have used good ol’ JSON which has a lot smaller surface area. It has proper schema definitions and a bazillion languages interop with it.
Then we can use whatever tools we want to generate and customize as needed.
YAML is ugly and doesn’t really solve a problem.
Jsonnet, on the other hand, is a far more elegant templating language that solves a real need: generating json/yaml files.
Please please don’t use a text templating language. You’re in a world of hurt.
Years ago, someone showed me a python tool for build automation that was written in an imperative style although it was declarative underneath. Like a builder pattern. That looked pretty promising, but I have misremembered the name and so can’t find it again.
I’d rather have something like that for infrastructure management.
I have lived the first half of the article for the last 1.5+ years and have gone through that mental cycle. We've been moving our product from AWS to GCP. We have a custom deployment tool based on Capistrano, Chef and CloudFormation to set up our stack in AWS. There are a couple things that have helped me and may help others who have to do a similar thing (or any setup for that matter).
Tooling: I looked at them all. yaml, kustomize, helm, deployment manager, etc and in the end I went with Terraform. The reasoning was simple. We deploy Infrastructure, not kubernetes services and terraform lets you do the entire thing with one tool. It has the ability to do loops and clunky if-like statements but if you want real programming power then you can create your own terraform provider which is what we did. The pattern of how to create a provider already exists so you are not re-inventing the wheel. We even add our own services to our provider so that we have one tool for setup of GCP resources -> Kubernetes resources -> our own service(s) resources. 1 tool, 1 flow, 1 statefile and the same pattern when working with it all.
Establish patterns: "My service is a multi-tenant service and should do X". Don't allow people to get away with special cases when it comes to deploying infrastructure. Create patterns and stick to them. As an example, one pattern I have is that I have 4 variables that all of my terraform modules use: project, region, cluster and tenant. A combination of any of those variables is enough to create unique resources for everything you deploy... dns names, storage buckets, databases, service accounts, namespaces, clusters, etc. Those variables allow me to know where your git repos are, what GCP project your deployment lives in, what kubernetes cluster you are on and what namespace you are in. Patterns.
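To illustrate (a made-up helper, not our actual code; the Terraform modules interpolate the same way), every resource name is just a deterministic function of those four variables:

    # Hypothetical naming helper: unique, predictable names for every resource.
    def resource_name(project, region, cluster, tenant, kind):
        return "-".join([project, region, cluster, tenant, kind])

    print(resource_name("acme", "us-east1", "prod", "tenant42", "bucket"))
    # acme-us-east1-prod-tenant42-bucket
    print(resource_name("acme", "us-east1", "prod", "tenant42", "namespace"))
    # acme-us-east1-prod-tenant42-namespace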
Keep it simple: There will be pressure to have custom settings and provide as much flexibility as possible. Try and avoid this. Set defaults for values and try to reduce the number of configuration options available to others. In our case, there were hundreds of environment variables being set on a deploy, and most of them were the same. I took the list, standardized the environment variable names (DATABASE_USER, MYSQL_USER, MYSQL_USERNAME, DB_USERNAME... come on, guys!), and deployed them as a secret in Kubernetes so that all of our running services can access them. Reduce complexity.
>"kustomize would be the most well known tool now it that is it integrated into kubectl. This seems to work well enough and could be feasible for simple use cases. In comparison to data configuration languages though, this strategy breaks down when configurations grow in complexity and scale. Template writers lack abstraction and type validation that matters when things get complicated, as the semantics are locked into an opaque tool and not exposed as language features."
How exactly does this strategy break down? This sounds a bit hand-wavey to me. Isn't Kustomize essentially patching? And isn't the type validation done by the underlying API objects in the yaml that Kustomize is patching? Am I missing something obvious or a more subtle point?
Why do people look upon the hideous face of complexity and brittleness and configuration hell and decide that the solution is to introduce yet ANOTHER piece of technology? Don't they understand that this is the exact instinct that led us to hell in the first place?
I think it's a combination of two things. One is looking for minimum change, which is honestly a pretty good urge. People like to keep doing mostly what they're doing. Having too much of the opposite instinct leads to eternal thrash.
The other, less good piece is people not being up to the task of going down to the fundamentals and building back up from there. And not just the technical fundamentals, but also those of system purpose, user needs, and how value gets delivered.
Which honestly, I get as well. Having tried using Kubernetes, I'm unimpressed. But it has such enormous momentum that even if I were sure I had a better solution, I doubt I'd bother going my own way. Instead I'd just try to mitigate the pain. Ideally I'd find a way that might lead people out of the technological cul de sac they ended up in.
There is a quite mature configuration language, Nix, behind nixpkgs, NixOS, and many other projects. Worth checking out for others here who are interested in abstractable configuration languages. It is a functional, lazily evaluated language.
I've been wondering if CUE's graph unification idea could be used as a type system for Nix. It might be hard to square with the way nix does overlays and overrides. It's a very common thing to take some existing derivation and produce a tweaked version. That might not work with graph unification.
I'm working on devops-pipeline to make complicated infrastructure easier to understand and deploy.
https://devops-pipeline.com
It is configured by a dot graph file.
After having to configure Druid clusters and their many many service configuration files, I've started just immediately reaching for templating & code-generators to build all of the disparate configuration artifacts the moment that I feel annoyed by redundant information in multiple places.
I love the idea of Terraform, but find that the language only goes part way, and is itself pretty idiosyncratic. It's pretty nice, though, if you move up by another step of abstraction and write code to generate your HCL. If you get lispy about it, then you can have infrastructure defined as data, generated by code, that is also data...
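For example (a hypothetical sketch; the resources are placeholders), since Terraform also reads *.tf.json alongside *.tf, you can generate the configuration with ordinary code and let data drive it:

    # Hypothetical: generate Terraform JSON from data, so the infrastructure
    # definition is itself produced by ordinary code.
    import json

    buckets = ["logs", "backups", "artifacts"]

    config = {
        "resource": {
            "aws_s3_bucket": {
                name: {"bucket": f"example-{name}"} for name in buckets
            }
        }
    }

    with open("buckets.tf.json", "w") as f:
        json.dump(config, f, indent=2)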
I really do wish CUE all the best, but every senior dev can tell you that the problem of too many config languages isn't solved by yet another config language to rule them all (insert obligatory xkcd 927 reference here).
This is a really good idea. My take on this was to use Prolog but it turned out people did not like Prolog: https://github.com/davidk01/cwacop. The project is dead at the moment but being able to work with infrastructure using a hybrid model of logic + imperative drivers I think is fundamentally a good idea.
> Turns out the complexity didn’t go anywhere, it was just biding its time, waiting for the right moment to strike back.
Damn you, entropy!