I'm completely done with configs written in YAML. Easily the worst part of GitHub Actions, even worse than the reliability. When I see some cool tool require a YAML file for config, I immediately get hit with a wave of apprehension. These same feelings extend to other proprietary config languages like HCL for Terraform, ASL for AWS Step Functions, etc. It's fine that you want a declarative API, but let me generate my declaration programmatically.
Config declared in and generated by code has been a superior experience. It's one of the things that AWS CDK got absolutely right. My config and the declarative definition of my cloud infra are all written in a typesafe language with great IDE support, without the need for random plugins that some rando wrote and hasn't updated in two years.
At this point, I even prefer plain JSON to YAML. What pushed me over the edge is that "deno fmt" comes with a JSON formatter, but not a YAML formatter. It's a single binary that runs in milliseconds. For YAML auto-formatting you basically have to use Prettier, and Prettier depends on half of NPM and takes a good 2 seconds to start up and run. So I literally moved every YAML file in our repository at work that could be JSON to JSON, and I think everyone has been much happier. Or at least I have been, and nobody has complained to me about it.
Various editors also support a $schema tag in the JSON. I added this feature to our product (which has a flow that invokes your editor on a JSON file), and it works great. You can just press tab and make a config file without reading the docs. Truly wonderful.
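For anyone who hasn't seen it: the key is literally called "$schema", and editors like VS Code use it to drive validation and tab completion. A minimal sketch (the schema URL is made up):

  {
    "$schema": "https://example.com/schemas/myapp-config.schema.json",
    "name": "my-service",
    "port": 8080
  }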
YAML has this too with the YAML language server, but your tab key is already busy indenting stuff, so the completion ergonomics are pretty un-fun. JSON isn't perfect, but at least the text "no" is true.
At work we're currently expanding to another country, which means that many services now need a country label etc. That's fun when the country code you're adding to all our existing services is "no" (Norway). Luckily it's quick to catch, but man... why?
Yeah, I'm pretty sure there are exactly two substantive problems with JSON for (static) configuration file use cases: comments and multiline strings (especially with sane handling of indentation). YAML fixes these, but it adds so much complexity in the process, including such a predictable footgun as unquoted strings (the no/false problem is particularly glaring/absurd, but it's also easy to forget to quote other boolean-like values or numbers in a long list of other strings).
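A quick illustration of the footgun, as a YAML 1.1 parser sees it (keys made up):

  country: no            # parsed as the boolean false, not the string "no"
  version: 1.20          # parsed as the number 1.2, not the string "1.20"
  country_quoted: "no"   # quoting restores the string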
Can do as Crockford says, and write in the JSON subset of Javascript, but with comments, and convert it to JSON by running it through a JS minifier. I think you need parens around the object, though, or else it looks like a code block (boooo...):
  ({
    // This is very much like JSON
    "foo": [1, 2, "bar"]
  })
Python also has JSON-like syntax, so you could use config.py:
  {
    'foo': [1, 2, 'bar']
  }
That would require a wrapper script. Or, you can have the self-contained convention:
Yeah, I'm mostly just not sure I want to put a full programming language interpreter in my application, especially Python, which is not designed to be embeddable. Moreover, I would really want something that is typed, like TypeScript, but libraries for embedded TypeScript interpreters are even rarer :/.
I figure we'd leave it to the build/deploy/CI/development whatever system. I also don't want to extend or embed my application with a full-blown runtime if I don't have to.
"source" config --> convert to JSON config "on the fly" --> app that expects JSON
edit: I worked on a C++ team that used `std::system` to invoke the system python interpreter when loading a config file. My teammates weren't morons, either, it's just the simplest thing that worked and they knew that the config script and the surrounding file system were secure.
AFAIK Prettier has 0 dependencies and runs fast enough that triggering formatting on save was never noticeable (granted, I've never tried it with YAML specifically). Curious what kind of setup you had that pushed it to 2 seconds - maybe bulk formatting the whole repository in CI?
I prefer JSON to YAML as well. The lack of comments is a problem though. But I feel like this is a false dichotomy. Both kind of suck for this need, but I can accept that JSON is at least reasonable to work with if you need language agnostic config.
An often-heard benefit of YAML is that JSON does not have comments. What I don't understand is why we would switch to a whole new language over that. Just add a filter before loading the configuration, which can't be harder than switching to YAML, right?
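That filter really is small. A minimal sketch in TypeScript (hand-rolled; in practice you might grab a JSONC/JSON5 parser instead), stripping // and /* */ comments without touching string contents:

  function stripJsonComments(input: string): string {
    let out = "";
    let inString = false;
    for (let i = 0; i < input.length; i++) {
      const ch = input[i];
      if (inString) {
        out += ch;
        if (ch === "\\") out += input[++i];      // copy escaped chars, incl. \"
        else if (ch === '"') inString = false;
      } else if (ch === '"') {
        inString = true;
        out += ch;
      } else if (ch === "/" && input[i + 1] === "/") {
        while (i < input.length && input[i] !== "\n") i++;  // drop line comment
        out += "\n";
      } else if (ch === "/" && input[i + 1] === "*") {
        i += 2;                                             // drop block comment
        while (i < input.length && !(input[i] === "*" && input[i + 1] === "/")) i++;
        i++;
      } else {
        out += ch;
      }
    }
    return out;
  }

  const config = JSON.parse(stripJsonComments(`{
    // comments survive editing, not parsing
    "retries": 3 /* block comments too */
  }`));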
Another claimed reason for YAML is that it is easier to read. That I don't understand either. The endless pain of dealing with configuration doesn't seem to come from spending a few extra seconds parsing braces and brackets, but from not being able to easily figure out what went wrong, especially when what's wrong is a missing space or tab buried in hundreds of lines of configuration.
The best interpretation of weebull's comment is not that describing a program in a data structure is "bad" per se, but that doing that in a configuration language (or requiring configuration constructs to be programming constructs) might not be a hot idea.
Even Lisp software that uses Lisp for configuration does not necessarily allow programming in that configuration notation.
Yeah, I think describing a program in a data structure is fine. I honestly prefer it to any syntax that a "real" programming language has brought me. It's so consistent and you can really focus on what you care about. What is unhappy about GitHub Actions and similar is that your programming language has like 2 keywords: "download a container" and "run a shell script". I would have preferred starting with "func", "handle this error", and "retry this operation if the error is type Foo" ;)
Since this article is about helm, I'll point out that Go templates are very lispy. I often have things in them that look like {{ and (foo bar) (bar baz) }} and it only gets crazier as you add more parentheses ;)
The problem I have with GitHub Actions is that I usually want to metaprogram them. I have a monorepo and I want a particular action to run for each "project" subdirectory. I've written a program that generates GitHub Actions YAML files, but all of the ways to make sure the generator was run before each commit are fairly unsatisfying.
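The generator itself is the easy part. A sketch of the shape of mine (directory layout and job contents are hypothetical); since YAML 1.2 is a superset of JSON, the workflow files can even be written as JSON:

  import { readdirSync, writeFileSync } from "node:fs";

  // One workflow per "project" subdirectory of the monorepo.
  const projects = readdirSync("projects", { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => entry.name);

  for (const name of projects) {
    const workflow = {
      name: `ci-${name}`,
      on: { push: { paths: [`projects/${name}/**`] } },
      jobs: {
        test: {
          "runs-on": "ubuntu-latest",
          steps: [
            { uses: "actions/checkout@v4" },
            { run: "make test", "working-directory": `projects/${name}` },
          ],
        },
      },
    };
    // GitHub's YAML parser happily reads JSON out of a .yaml file.
    writeFileSync(`.github/workflows/ci-${name}.yaml`, JSON.stringify(workflow, null, 2));
  }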
The problem I have with infra-as-code tools is that what I really want is a pretty simple representation of "the state of the world" that some reconciliation loop can use, and then I want to generate that representation in a typesafe, expression-based language like TypeScript or Python (Dhall exists, but its Haskell-like syntax and conventions are too steep a learning curve to get mainstream adoption). Instead we get CloudFormation and Terraform, which shoehorn programming language constructs into a configuration language (which isn't strictly an objection to code-as-data generally), or things like Helm, which uses text templates to generate a "state of the world" description, or these CDKs, which all seem to depend on a full JavaScript engine for reasons that don't make sense to me (why do I need JavaScript to generate configuration?).
I often wonder if the only reason we haven't used lisp more as a society, and certainly in the devops world, is because our brains find it easier to parse nested indentation than nested parentheses.
But in doing so, we've thrown out the other important part of lisp, which is that you can use the same syntax for data that you do for control flow. And so we're stuck in this world where a "modern-looking" program is seen as a thing that must be evaluated to make sense, not a data structure in and of itself.
https://www.reddit.com/r/lisp/comments/1pyg07/why_not_use_in... is a fascinating 10 year old discussion. And of course, there's Smalltalk, which guided others to a treasure it could not possess. But most younger programmers have never even had these conversations.
The vast majority of Lisp code is assiduously written with nested indentation! So that can't be it.
Non-lisp languages have parentheses, brackets and braces, using indentation to clarify the structure. Nobody can reasonably work with minified Javascript, without reformatting it first to span multiple lines, with indentation.
Lisp has great support for indentation; reformatting Lisp nicely, though not entirely trivial, is easier than other languages.
Oh, have you seen parinfer? It's an editing mode that infers indentation from nesting, and nesting from indentation (both directions) in real-time. It also infers closing parentheses. You can just delete lines and it reshuffles the closers.
To me it seems a lot of the benefit of declarative programming is just that you can use less powerful tools that don't allow constructs you don't want to have to deal with.
LISP seems great for tinkerers and researchers, but not so much corporate devs who want extreme amounts of consistency and predictability, but don't need the absolute most elegant solution.
> you are trying to describe a program in a data structure
This describes 100% of software development, though! Programming is just designing data structures that represent some computation. Each language lends itself better to some computations than to others (and some, like YAML, are terrible for describing any kind of computation at all), but they're all just data structures describing programs.
The problem isn't that GitHub Actions tries to describe a program in a data structure, the problem is that the language that they chose to represent those programs (YAML and the meta language on top) is ill-suited to the task.
> Ansible makes the same mistake, as do countless other tools.
My favorite example of this is chown/chmod taking 4-5 lines, in yaml. Sure you can do it a bunch of different ways, sure it allows for repeatable commands. But, it just sucks.
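For the curious, here is roughly what a one-line chown/chmod becomes as an Ansible task (path and owner invented):

  - name: Fix ownership and permissions on /srv/app
    ansible.builtin.file:
      path: /srv/app
      owner: deploy
      group: deploy
      mode: "0755"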
The same reason I don't like AWS' Step Functions. The spec in JSON is horrible. On the other hand, Step Functions is pretty scalable and reliable and can take practically unlimited throughput. It's a good story for how a product can succeed by getting the primitives right and by removing just the key obstacle for users. Now that Step Functions has gained momentum, they can construct higher-level APIs and SDKs to translate user spec to the low-level JSON/YAML payload.
> These same feelings extend to other proprietary config languages like HCL for Terraform, ASL for AWS Step Functions, etc. It's fine that you want a declarative API, but let me generate my declaration programmatically.
Yeah, I've had the same sort of opinion since the bad old AWS CloudFormation days. I wrote an experimental CloudFormation generator 4 years ago where all of the resources and Python type hints were generated from a JSON file that AWS published and it worked really well (https://github.com/weberc2/nimbus/blob/master/examples/src/n...).
> Config declared in and generated by code has been a superior experience. It's one of the things that AWS CDK got absolutely right.
Is that how CDK works? I've only dabbled with it, but it was pretty far from the "generate cloudformation" experience that I had built; I guess I never "saw the light" for CDK. It felt like trading YAML/templating problems for inheritance/magic problems. I'd really like to hear from more people who have used AWS CDK, Terraform's CDK, and/or Pulumi.
It's an annoyingly OOP model with mutations and side-effects, but if you look past that, it's pretty nice. The core idea is you create an instance of a CDK "App" object. You create new instances of "Stack" objects that take an "App" instance as a context parameter. From there, resources are grouped into logical chunks called "Constructs" which take either a stack or another construct as their parent context param. The only things you should ever inherit from are the base Constructs for Stack, Stage, and Construct. Don't use inheritance anywhere else and you'll be okay.
The code then looks something like this (writing this straight in the comment box, probably has errors):
  // Entrypoint of CDK project like bin/app.ts or whatever
  import * as cdk from 'aws-cdk-lib'
  import { MyStack } from '../lib/my-stack'

  const app = new cdk.App()
  new MyStack(app, 'StackNameHere', { /* stack props */ })

  // lib/my-stack.ts
  import * as cdk from 'aws-cdk-lib'
  import * as s3 from 'aws-cdk-lib/aws-s3'
  import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs'
  import { Runtime, Tracing } from 'aws-cdk-lib/aws-lambda'
  import { Construct } from 'constructs'

  export class MyStack extends cdk.Stack {
    constructor(scope: Construct, id: string, props?: cdk.StackProps) {
      super(scope, id, props)

      const bucket = new s3.Bucket(this, 'MyBucket', {
        bucketName: 'example-bucket',
      })

      const fn = new NodejsFunction(this, 'MyLambdaFn', {
        functionName: 'My-Lambda-Fn',
        entry: 'my-handler.ts',
        memorySize: 1024,
        runtime: Runtime.NODEJS_20_X,
        tracing: Tracing.ACTIVE,
      })

      bucket.grantRead(fn)
    }
  }
The best part is the way CI/CD is managed. CDK supports self-mutating pipelines where the pipeline itself is a stack in your CDK app. After the pipeline is created, it will update itself as part of the pipeline before promoting other changes to the rest of your environments.
The equivalent CloudFormation for the above example would be ridiculously long. And that's putting aside all the complexity it would take for you to add on asset bundling for code deployed to things like Lambda.
> It's an annoyingly OOP model with mutations and side-effects, but if you look past that, it's pretty nice
I think I was getting hung up on the mutations and side-effects of it all. Thanks for putting words to that. I'll have to give it another try sometime. Have you used Terraform's CDK by chance? I assume it's heavily inspired by AWS's CDK, but my company has since moved to GCP/Terraform.
The mutations and side-effects only last until synthesis. You can imagine a CDK app as a pure function that runs a bunch of mutations on an App object and then serializes the state of that object in the end to static assets that can be deployed. The internals of it all are messy, but at a conceptual level, it's easy to think about.
CDKTF is really promising, IMO. When I last looked, it was still pretty new, but it's maturing, I think. One downside compared to regular AWS CDK is that the higher level constructs from the official AWS CDK can't be used in CDKTF. There is an adapter that exists, but it's one more layer between you and knowing what's going on: https://github.com/hashicorp/cdktf-aws-cdk
Even then, it gets messy. From a tooling standpoint, how will I load your schema? How will my editor respect it? How do I run a validator against it? I know XML kind of solves some of these problems, but it has its own thorns and despite what anyone says, it is not easy to work with. XSD, XSLT, etc. So much complexity that needs to be managed in a different way in every runtime. And then type safety goes out at the boundary where it connects to your code.
We're living in a dream state now where the creators of IDEs like Visual Studio (Code) or IntelliJ actively implement common languages and frameworks. It used to be 'find a half-baked community plugin so JSON works.'
If someone made a standard schema and people used it, I can assure you the magic you are expecting from your tooling would suddenly pop in just like how JSON support appeared one day. But they can't do nothin' if there is no community support for it.
XSD and XSLT are complicated because XML is complicated.
This sort of thing looks and sounds like the right thing to do. Till you do it, on a largeish project with multiple teams that have experienced attrition.
I quietly added a layer of yaml generating code to make it bearable.
I agree that YAML templating is kind of insane, but I will never understand why we don't stop using fake languages and simply use a real language.
If you need complex logic, use a programming language and generate the YAML/JSON/whatever with it. There you go. Fixed it for you.
Ruby, Python, or any other language really (I only favor scripting ones because they're generally easier to run), will give you all of that without some weird pseudo-language like Jsonnet or Go templates.
Write the freaking code already and you'll get bitten way less by obscure weird issues that these template engines have.
Seriously, use any real programming language and it'll be WAY better.
I once took a job that involved managing Ansible playbooks for an absolutely massive number of servers that would run them semi-regularly for things like bootstrapping and patching. I had used Chef before for a similar task, and I loved it because it's just ruby and I could easily define any logic I wanted while using loops and proper variables.
I understand that Ansible was designed for non-programmers, but there is no worse hell for someone who is actually familiar with basic programming than being confined to the hyper-verbose nonsense that is Jinja templating of Ansible playbooks when you need to have a lot of conditional tasks and loops.
Chef vs Ansible was the first example that popped into my mind. I had a very love/hate relationship with Chef when I used it, but writing cookbooks was definitely one of the good parts.
So why is there this massive ecosystem around not writing modules then? RedHat invented automation controller just so they didn't have to implement proper error handling with Ansible.
The 'not writing modules' approach is for people that aren't comfortable writing code. I think most capable users for non-trivial things should write custom modules a lot of the time.
I think language embedding is kind of a lost architecture in modern stacks. It used to be if you had a sufficiently complex application you'd code the guts in C/C++/Java/Whatever and then if you needed to script it, you'd embed something like a LISP/Lua/whatever on top.
But today, you have plenty of off-the-shelf JSON/TOML/YAML parsers you can just import into your app and a function called readConfig in place of where an embedded interpreter might be more appropriate.
It's just easier for developers to add complexity to a config format rather than provide a full language embedding and provide bindings into the application. So people have forgotten how to do it (or even that they can do it - I don't think it occurs to people anymore)
Pulumi is enticing because it allows you to write in your preferred language and abandon HCL, but in my opinion it is strictly worse. IaC should be declarative. That allows for greater predictability, reproducibility and maintainability. In general, I think wanting to use Python or Ruby or whatever language you'd get to use with Pulumi is not a good basis for choosing the tool.
There are many graveyards filled with places that tried to start writing logic into their IaC back in the Chef/Puppet era and made a huge mess that was impossible to upgrade or maintain (recall that Chef is more imperative/procedural, whereas in Puppet you describe the desired end state). The Chef/Pulumi approach can work, but it requires one person who is draconian about style and maintenance. Otherwise, it turns into a pile of garbage very quickly.
Terraform/Puppet's model is a lot more maintainable over the long term with bigger teams. It's just a better default for discouraging patterns that necessitate an outsized investment to maintain. Yes, HCL can be annoying and it feels freeing to use Python/TS/whatever, but purely declarative code prevents a lot of spaghetti.
Pulumi is declarative. The procedural code (Python, Go, etc) generates the declaration of the desired state, which Pulumi then effects on the providers.
HCL is not purely declarative either. It can invoke non-declarative functions and can do loops based on environment variables, so in that sense there is really no difference between Pulumi and Terraform. The only real difference is that HCL is a terrible language compared to, say, Python.
I'm actually fairly sure HCL is Turing complete; it has loops and variables. But even if it is not all the way Turing complete, it's pretty close.
Pulumi may be declarative, but you use imperative languages to define your end state. The language you're actually writing your Pulumi in is what's most relevant to the point I'm making about maintainability. HCL isn't Turing complete, but even if it were, the point is that doing the types of things you can do in Python or other "real" languages is a major pain in HCL, which effectively discourages you from doing them. I'm arguing that is actually a good thing for maintainability.
> recall that Chef is more imperative/procedural, whereas in Puppet you describe the desired end state
Chef's resources and resource collection and notifications scheme is entirely declarative. And after watching users beat their heads against Chef for a decade the thing that users really like is using declarative resources that other people wrote. The thing that they hate doing is trying to think declaratively themselves and write their own declarative resources or use the resource collection properly. People really want the glue code that they need to write to be imperative and simple.
The biggest issue that Chef had was the "two-pass parsing" design (build the entire resource collection, then execute the entire resource collection) along with the way that the resource collection and attributes were two enormous global variables which were mutable across the entire collection of recipe code which was being run, and then the design encouraged you to do that. And recipes were kind of a shit design since they weren't really like procedures or methods in a real programming language, but more like this gigantic concatenated 'main context' script. Local variables didn't bleed through so you got some isolation but attributes and the resource collection flowing through all of them as god-object global variables was horrible. Along with some people getting a bit too clever with Ruby and Chef internals.
I had dreams of freezing the entire node attribute tree after attribute file processing before executing resources to force the whole model into something more like a functional programming style of "here's all your immutable description of your data fed into your functional code of how to configure your system" but that would have been so much worse than Python 2.7-vs-3.0 and blown up the world.
Just looking at imperative-vs-declarative is way too simplistic of an analysis of what went wrong with Chef.
The fact that HCL has poor/nonexistent multi-language parsing support makes building tooling around terraform really annoying. I shouldn't have to install Python or a Go library to read my HCL.
I have never seen Pulumi or CDKTF stuff work well. At some point, aren't you simply writing a script and abandoning the advantages of a declarative approach?
> I agree that YAML templating is kind of insane, but I will never understand why we don't stop using fake languages and simply use a real language.
The problem is language nerds write languages for other language nerds.
They all want it to be whatever the current sexiness is in language design and want it to be self-hosting and be able to write fast multithreaded webservers in it and then it becomes conceptually complicated.
What we need is like a "Logo" for systems engineers / devops which is a simple toy language that can be described entirely in a book the size of the original K&R C book. It probably needs to be dynamically typed, have control structures that you can learn in a weekend, not have any threading or concurrency, not be object oriented or have inheritance and be functional/modular in design. And have a very easy to use FFI model so it can call out to / be called from other languages and frameworks.
The problem is that language nerds can't control themselves and would add stuff that would grow the language to be more complex, and then they'd use that in core libraries and style guides so that newbies would have to learn it all. I myself would tend towards adding "each/map" kinds of functions on arrays/hashmaps instead of just using for loops, and having first class functions and closures, which might be mistakes. There's that immutable FP language for configuration which already exists (I can't google this morning yet) which is exactly the kind of language that will never gain any traction, because >95% of the people using templated YAML don't want to learn to program that way.
> What we need is like a "Logo" for systems engineers / devops which is a simple toy language that can be described entirely in a book the size of the original K&R C book.
I would argue that Tcl is exactly that. It's hard to make things any simpler than "everything is a string, and then you get a bunch of commands to treat strings as code or data". The entire language definition boils down to 12 simple rules ("dodekalogue"); everything else is just commands from the standard library. Simple Tcl code looks pretty much exactly like a typical (pre-XML, pre-JSON, pre-YAML) config file, and then you have conditionals, loops, variables etc added seamlessly on top of that, all described in very simple terms.
> What we need is like a "Logo" for systems engineers / devops which is a simple toy language that can be described entirely in a book the size of the original K&R C book. It probably needs to be dynamically typed, have control structures that you can learn in a weekend, not have any threading or concurrency, not be object oriented or have inheritance and be functional/modular in design. And have a very easy to use FFI model so it can call out to / be called from other languages and frameworks.
I think Scheme would work, as long as you ban all uses of call/cc and user-defined macros. It's simple and dynamically typed, and doesn't have built-in classes or hash maps. Only problem is that it seems like most programmers dislike Lisp syntax, or at least aren't used to it.
There's also Awk, although it's oriented towards text, and doesn't have modules (the whole program has to be in one file).
It probably wouldn't be that hard to make this language yourself. Read the book Crafting Interpreters, which guides you through making a toy language called Lox. It's close to the toy language you describe.
There's plenty to choose from that support embedding: Python, Perl, Lua. Heck, even ECMAScript (JavaScript, etc) or VBA.
As another commenter rightfully stated, this used to be the norm.
I wouldn’t say LOGO is the right example though. It’s basically a LISP and is tailored for geometry (of course you can do a heck of a lot more with it but its strength is in geometry).
You're really missing the point. Logo was super simple and we learned it in elementary school as children, that's all that I'm talking about. And those other languages have accreted way too many features to be simple enough.
I got your point. I think it is you who is missing mine:
> You're really missing the point. Logo was super simple and we learned it in elementary school as children
You wouldn't have learned conditionals and other such things though. That stuff wasn't as easy to learn in LOGO, because LOGO is basically a LISP. E.g.:
  IFELSE :num = 1 [print [Number is 1]] [print [Number is 0]]
vs
  if { $num == 1 } then { print "number is 1" } else { print "number is 0" }
or
  if num == 1:
      print "number is 1"
  else:
      print "number is 0"
I'm not saying these modern languages don't have their baggage. But LOGO wasn't exactly a walk in the park for anything outside of its main domain either. Your memory of LOGO here is rose-tinted.
> And those other languages have accreted way too many features to be simple enough.
I agree (though less so with Lua) but you don't need to use those features. Sure, my preference would be "less is more" and thus my personal opinion of modern Python isn't particularly high. And Perl is rather old fashioned these days (though I think modern Perl gets more criticism than it deserves). But the fact is we don't need to reinvent the wheel here. Visual Basic could make raw DLL calls meaning you had unfettered access to Win32 APIs (et al) but that doesn't mean every VBScript out there was making DLL calls left right and centre. Heck, if you really want to distil things down then there's nothing even stopping someone implementing a "PythonScript" type language which is a subset of Python.
I just don't buy "simplicity of the language" as the reason languages aren't often embedded these days. I think it's the opposite problem: "simplicity of the implementation". It's far easier to load a JSON or YAML document into a C(++|#|Objective|whatever) struct than it is to add API hooks for an embedded scripting language. And that's precisely why software written in dynamic languages does often expose its language runtime for configuration. E.g. Ruby in Puppet and Chef, half of PHP applications having config written in PHP, XMPP servers written in Haskell, etc. In those kinds of languages it is easy to read config from source files (sometimes even importing via `eval`), so there often isn't any need to stick config in JSON documents.
I mean... Nix satisfies every single one of the things you mentioned, and people say it's too complicated. It's literally just the JSON data structure with lambdas, which really is basic knowledge for any computer scientist, and yet people complain about it.
It's fairly straightforward to 'embed' and as a bonus it generates json anyway (you can use the Nix command line to generate JSON). Me personally, I use it as my templating system (independent of nixpkgs) and it works great. It's a real language, but also restrictive enough that you don't do anything stupid (no IO really, and the IO it does have is declarative, functional and pure -- via hashing).
3. Turing complete and based on the lambda calculus so has access to the full suite of functional control structures. Also has basic if/then/else statements for the most common cases and for intuition.
4. no threading, no concurrency, no real IO
5. definitely not object-oriented and no inheritance
6. It is functional in design and has an extremely thin set of builtins
7. FFI model is either embed libnix directly (this does not require embedding the nix store stuff, which is a completely separate modular system), or use the command line to generate json (nix-instantiate --eval --json).
Note: do not confuse nixpkgs and NixOS with the nix language. The former is a system to build linux packages and entire linux distributions that use the latter as a configuration language. The nix language is completely independent and can be used for whatever.
Tried to use Nix as a homebrew replacement and failed to get it installed correctly with it blowing up with crazy error messages that I couldn't google. I didn't even get to the point of assessing the language. It really seems like the right kind of idea, but it doesn't seem particularly stable or easy enough to get to that initial payoff. If there's a nice language under there it is crippled by the fact that the average user is going to have a hard time getting to it.
You can use nix without using nixpkgs (you seemed to be trying to use nixpkgs). The nix language is accessible via several command line tools (nix repl, nix eval, nix-instantiate, etc.) and can emit json via several flags, as well as a builtin function.
I agree with the points in Nix's favor except for 2 (dynamically typed). Defining structs as part of the language would be nice. In fact, type checking is done ad hoc now by passing data through type-checking functions.
I think I'd rather just have logicless templates than use anything dynamically typed...
Jinja2 makes a lot of sense when you're trying to make it hard to add bugs, and you also don't want everyone to have to learn Rust or Elixir or something.
It would be interesting to extend a template language with a minimal FP language that could process data before the templates get it.
I agree, and I just want to highlight what you said about generating a config file. It's extremely useful to constrain the config itself to something that can go in a json file or whatever. It makes the config simpler, easier to consume, and easier to document. But when it comes to _writing_ the config file, we should all use a programming language, and preferably a statically typed language that can check for errors and give nice auto complete and inline documentation.
I think aws cdk is a good example of this. Writing plain cloudformation is a pain. CDK solves this not by extending cloudformation with programming capabilities, but by generating the cloudformation for you. And the cloudformation is still a fairly simple, stable input for aws to consume.
You shouldn't need the full complexity and power of a Turing complete programming language to do config. The point of config is to describe a state, it's just data. You don't need an application within an application to describe state.
Inevitably, the path of just using a programming language for config leads to your config becoming more and more complex until it inevitably needs its own config, etc. You wind up with a sprawling, Byzantine mess.
The complexity is already there. If you only need static state like you say, then YAML/JSON/whatever is fine. But that's not what happens as software grows.
You need data that is different depending on environments, clouds, teams, etc. This complexity will still exist if you use YAML, it'll just be a ridiculous mess where you can break your scripts because you have an extra space in the YAML or added an incorrect `True` somewhere.
Complexity growth is inevitable. What is definitely avoidable is shoving concepts that in fact describe a "business" rule (maybe operational rule is a better name?) in unreadable templates.
Rules like: a deployment needs add these things when in production, or change those when in staging, etc exist whether they are hidden behind shitty Go templates or they are structured inside of a class/struct, a method with a descriptive name, etc.
The only downside is that you need to understand some basics of programming. But for me that's not a downside at all, since it's a much more useful skill than only knowing how to stitch Go templates together.
Why are we writing software that needs so much configuration? Not all of it is needed. We could do things more like consumer software, which assumes nobody will even consider your app if they have to edit a config file.
> your config becoming more and more complex until it inevitably needs its own config, etc. You wind up with a sprawling, Byzantine mess.
We're already there with Helm.
People write YAML because it's "just data". Then they want to package it up so they put it in a helm chart. Then they add variable substitution so that the name of resources can be configured by the chart user. Then they want to do some control flow or repetitiveness, so they use ifs and loops in templates. Then it needs configuring, so they add a values.yaml configuration file to configure the YAML templating engine's behaviour. Then it gets complicated so they define helper functions in the templating language, which are saved in another template file.
So we have a YAML program being configured by a YAML configuration file, with functions written in a limited templating language.
But that's sometimes not enough, so sometimes variables are also defined in the values.yaml and referenced elsewhere in the values.yaml with templating.
This then gets passed to the templating system, which then evaluates that template-within-a-template, to produce YAML.
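If you haven't seen that last stage, it looks roughly like this (keys invented; tpl is Helm's own function for evaluating templates stored in values):

  # values.yaml: configuration that is itself a template
  baseDomain: example.com
  ingressHost: "svc.{{ .Values.baseDomain }}"

  # templates/ingress.yaml: evaluate the template stored in the configuration
  host: {{ tpl .Values.ingressHost . | quote }}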
At the end of the day, Helm's issues stem from two competing interests:
(1) I want to write something where I can visualize exactly what will be sent to Kubernetes, and visually compare it to the wealth of YAML-based documentation and tutorials out there
(2) I have a set of resources/runners/cronjobs that each require similar, but not identical, setups and environments, so I need looping control flow and/or best-in-class template inclusion utilities
--
People who have been working in k8s for years can dispense with (1), and thus can use various abstractions for generating YAML/JSON that don't require the user to think about {toYaml | indent 8}.
But for a team that's still skilling up on k8s, Helm is a very reasonable choice of technology in that it lets you preserve (1) even if (2) is very far from a best-in-class level.
I have a recent example of rolling out IPv6 in AWS:
1. Create a new VPC, get an auto-assigned /56 prefix from AWS.
2. Create subnets within the VPC. Each subnet needs an explicitly-specified /64 prefix. (Maybe it can be auto-assigned by AWS, but you may still want to follow a specific pattern for your subnets).
3. Add those subnet prefixes to security / firewall rules.
You can do this with a sufficiently-advanced config language - perhaps it has a built-in function to generate subnets from a given prefix. But in my experience, using a general-purpose programming language makes it really easy to do this kind of automation. For reference, I did this using Pulumi with TypeScript, which works really well for this.
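Roughly what that looks like (resource names and the hextet arithmetic are illustrative; assignGeneratedIpv6CidrBlock and the ipv6CidrBlock output are the real Pulumi AWS properties):

  import * as aws from "@pulumi/aws";

  // Carve the i-th /64 out of a /56 prefix such as "2600:1f16:abc:de00::/56".
  // Plain string math on the fourth hextet - a sketch, not robust IPv6 parsing.
  function ipv6Subnet(prefix56: string, index: number): string {
    const hextets = prefix56.split("/")[0].split("::")[0].split(":");
    hextets[3] = (parseInt(hextets[3] ?? "0", 16) + index).toString(16);
    return `${hextets.join(":")}::/64`;
  }

  const vpc = new aws.ec2.Vpc("main", {
    cidrBlock: "10.0.0.0/16",
    assignGeneratedIpv6CidrBlock: true, // AWS hands back an auto-assigned /56
  });

  const subnets = [0, 1, 2].map((i) =>
    new aws.ec2.Subnet(`subnet-${i}`, {
      vpcId: vpc.id,
      cidrBlock: `10.0.${i}.0/24`,
      // ipv6CidrBlock is an Output<string>, so derive each /64 from it
      ipv6CidrBlock: vpc.ipv6CidrBlock.apply((p) => ipv6Subnet(p, i)),
      assignIpv6AddressOnCreation: true,
    }),
  );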
That kind of ignores the entire pipeline involved in computing the correct config. Nobody wants to be manually writing config for dozens of services in multiple environments.
The number of configurations you need to create is multiplicative, take the number of applications, multiply by number of environments, multiply by number of complete deploys (i.e. multiple customers running multiple envs) and very quickly end up with an unmanageable number of unique configurations.
At that point you need something at least approaching Turing completeness to correctly compute all the unique configs. Whether you decide to achieve that by embedding that computation into your application, or into a separate system that produces pure static config, is kind of academic. The complexity exists either way, and tools are needed to make it manageable.
That's not my experience after using AWS CDK since 2020 in the same company.
Most of our code is plain boring declarative stuff.
However, tooling is lightyears ahead of YAML (we have types, methods, etc...), we can encapsulate best practices and distribute them as libs and, finally, escape hatches are possible when declarative code won't cut it.
We need turing completeness in the strangest of places. We can often limit these places to a smaller part of the code. But it's really hard to know beforehand where those places will occur. Whenever we think we have found a clear separation we invent a config language.
And then we realize that we need scripting, so we invent a templating language. Then everybody loses their minds and invents 5 more config languages that surely will make us not need the templating language.
Let's just call it code and use clever types to separate turing and non-turing completeness?
A really good solution here is to use a full programming language but run the config generator on every CI run and show the diff in review. This way you have a real language to make conditions as necessary but also can see the concrete results easily.
Unfortunately few review tools handle this well. Checked-in snapshot tests are the closest approximation that I have seen.
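In GitHub Actions terms, the check can be as small as this (script names hypothetical):

  - run: ./scripts/generate-config          # regenerate from the real language
  - run: git diff --exit-code generated/    # fail if the checked-in copy drifted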
It happens because config is dual-purpose: it's state, but it's also the text UI for your program. It spirals out of control because people want the best of it being "just text" and being a nice clean UI.
For JSON I'd stick with Typescript to be honest. You end up executing Javascript and producing Javascript-native objects, but the typing in Typescript to ensure the objects you produce are actually valid will save a lot of debugging.
I'm very happy using Typescript to templatize JSON. You can define a template as a class, compose them if needed, and when you are done, just write an object to a file.
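A sketch of that workflow, using plain functions where you might prefer classes (all names invented):

  import { writeFileSync } from "node:fs";

  // The config's shape is an ordinary type, so the compiler checks every field.
  interface ServiceConfig {
    name: string;
    replicas: number;
    env: Record<string, string>;
  }

  const base = (name: string): ServiceConfig => ({
    name,
    replicas: 2,
    env: { LOG_LEVEL: "info" },
  });

  // Composition is just a function from one config to another.
  const production = (cfg: ServiceConfig): ServiceConfig => ({
    ...cfg,
    replicas: 6,
    env: { ...cfg.env, LOG_LEVEL: "warn" },
  });

  // When you are done, just write the object to a file.
  writeFileSync("api.prod.json", JSON.stringify(production(base("api")), null, 2));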
Completely agree, my wish is that anything that risks getting complex uses a Ruby-based DSL.
For example, I like using Capistrano, which is a wrapper around rake, which is a Ruby-based DSL. That means that if things get tricky I can just drop down to using a programming language, split stuff into logical parts that I load where needed, and, for example, do something like YAML.load(..file..).dig('attribute name') or JSON.load from somewhere else.
Yes, you risk someone building spaghetti that way, but the flip side is that a good devops can build something much easier to maintain than dozens of YAML and JSON files, and you get all the power from your IDE and linters that are already available for the programming language, so silly syntax errors are caught without needing to run anything.
> I heard you liked configuration languages, so I made this configuration language for your configuration language generation scripts. It supports templates, of course.
Because the security surface of "any language" is tricky and most (all?) popular languages do not have nice data literal syntax better than JSON and YAML.
1. a full-blown language that can generate complex output
2. a declarative static data file
I hope I'm not just pulling my punches with #2
On the other hand, some complexity spirals out of control, especially when people use it without any need. Some great things come out of creating boundaries.
I argued that point in my article some time ago https://beepb00p.xyz/configs-suck.html
Also the HN discussion at the time: https://news.ycombinator.com/item?id=22787332
I just knew this would be about Kubernetes when I saw the title.
The Kubernetes API is fairly straightforward and has a well-defined (JSON) schema. People learning k8s should be spending the bulk of their time understanding how to use the API, but instead they spend it working out how to use a Helm chart.
I don't think Jsonnet, Ksonnet, Nu, or CUE ever gained that much traction. I'm convinced most people just use Kustomize, because it's fairly straightforward and built in to kubectl.
I'd like a tool that:
- Gives definition writers type checking against the k8s schemas - validation, version deprecations, etc.
- Gives users a single artefact that can be inspected easily and will fail (ACID) if deployed against a cluster that doesn't support any objects/versions.
- Is built into the default toolchain
---
I feel like writing a Bun or Deno TypeScript script that exports a function with arguments and returns a list of definitions would work well, esp. with `deno compile`, etc. but that violates the third point.
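Something like this is the shape I have in mind (hand-rolled placeholder types standing in for ones generated from the k8s schema):

  interface K8sObject {
    apiVersion: string;
    kind: string;
    metadata: { name: string; namespace?: string };
    [key: string]: unknown;
  }

  export function definitions(args: { name: string; replicas: number }): K8sObject[] {
    return [{
      apiVersion: "apps/v1",
      kind: "Deployment",
      metadata: { name: args.name },
      spec: {
        replicas: args.replicas,
        selector: { matchLabels: { app: args.name } },
        template: {
          metadata: { labels: { app: args.name } },
          spec: { containers: [{ name: args.name, image: `example/${args.name}` }] },
        },
      },
    }];
  }

  // kubectl already speaks JSON: deno run gen.ts | kubectl apply -f -
  if (import.meta.main) {
    console.log(JSON.stringify(definitions({ name: "demo", replicas: 2 })));
  }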
> The Kubernetes API is fairly straightforward and has a well-defined (JSON) schema. People learning k8s should be spending the bulk of their time understanding how to use the API, but instead they spend it working out how to use a Helm chart.
This is a general pattern in software. Instead of learning the primitives and fundamentals that your system is built on, which would be too hard, instead learn a bunch of abstractions over top of it. Sure, now you are insulated from the lower-level details of the system, but now you have to deal with a massive stack of abstractions that makes diagnosis and debugging difficult once something goes wrong. Now it's much harder to ascertain what exactly is happening in your system, since the details of what is actually going on have been abstracted away from you by design. Further, you are now dependent on that abstraction layer and must support and accommodate whatever updates may be released by the vendor, in addition to whatever else is lurking in your dependency graph.
We're using jsonnet for our systems and they have absolutely nothing to do with k8s. I'm not sure it's true to say it has ever gained much traction. It's just a niche case for complex configuration, and isn't the most publicised tool.
It does precisely what we need with zero fuss, cross platform and cross _language_ (we've embedded it in C++, .NET, and JVM executables).
We can use the resulting json config with a vast array of tools that simply don't exist for alternatives such as toml/yaml/hocon/ini, whatever. In fact we tried to get HOCON working for non-JVM languages but there was always some edge case.
The second requirement is actually probably the most important - for someone who has just set up ArgoCD, Flux, or their own GitOps pipeline, how much of a headache does using a new compile step present?
Lots of things are simple in isolation: want to use Cue? Just get your definitions and install the compiler and call it and boom, there are your k8s defs! Ok, but how do I integrate all of that into my existing toolchain? How do I pass config? Etc, etc.
The best, fastest tool won't win. The tool that has the most frictionless user story will.
I've begun thinking that if you start thinking about templating you might be better off building an operator. Operators aren't as well understood and documented. But in my mind an operator is just a pod or deployment that creates on demand resources using the k8s api.
oh yeah; operators are great and sometimes they are necessary.
On the other hand, most operators I've seen are just k8s manifest templates implemented in Go.
I often end up preferring using Jsonnet to deal with that instead of doing the same stuff in Go.
Jsonnet is much closer to the underlying data model (the k8s manifest Json/Yaml document) and comes with some useful functionality out of the box, such as "overlays".
It has downsides too! It's untyped, debugging tools are lacking, people are unfamiliar with it and don't care to learn it. So I totally get why one would entertain the possibility of writing your "templates" using a better language.
However, an operator is often too much freedom. It's not just using Go or Rust or Typescript to "generate" some Json manifests, but it also contains the code to interact with the API server, setup watches, and reactions etc.
I often wish there was a better way to separate those two concerns
I'm a fan of metacontroller [1], which is a tool that allows you to write operators without actually writing a lot of imperative code that interacts with the k8s API; instead you just provide a general JSON->JSON transformer, which you could write in any language (Go, Python, Rust, Javascript, ... and also Jsonnet if you want).
I recently implemented something similar but much more tailored to just "installing" stuff, called Kubit. An OCI artifact contains some arbitrary tarball (generally containing some template sources) and a reference to a docker image containing an "engine", and Kubit runs the engine with your provided tarball + some parameters passed in a CRD. The OCI artifact could contain a helm chart and the engine image could contain the helm binary, or the engine could be kubecfg and the OCI artifact could contain a bunch of jsonnet files. Or you could write your own stuff in python or typescript. The kubit operator then just runs your code, gathers the output and applies it with kubectl apply-set.
> On the other hand, most operators I've seen are just k8s manifest templates implemented in Go.
> I'm a fan of metacontroller [1], which is a tool that allows you to write operators without actually writing a lot of imperative code that interacts with the k8s API, but instead just provide a general JSON->JSON transformer,
That seems... surprising, to me. It's not clear to me how a JSON->JSON transformer (which is essentially a pure function from UTF-8 strings to UTF-8 strings, i.e. an operation without side effects) can actually modify the state of the world to bring your requested resources to life. If the only thing the Operator is being used for is pure computation, then I agree it's overkill.
An example use case for an Operator would be a Pod running on the cluster that is able to receive YAML documents/resource objects describing what kind of x509 certificate is desired, fulfill an ACME certificate order, and populate a Secret resource on the cluster containing the x509 certificate requested. It's not strictly JSON to JSON, from "certificate" custom resource to Secret resource - there's a bunch of side-effecting that needs to take place to, for instance, respond to DNS01 or HTTP01 challenges by actually creating a publicly accessible artifact somewhere. That's what Operators are for.
Metacontroller is actually quite easy to learn. It comes with good examples too. Including a re-implementation of the Stateful Set controller, all done with iterations of an otherwise pure computation. The trick is obviously that the state lives in the k8s api server, from which the inputs of the subsequent invocation of your pure function come.
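As a sketch of how small such a transformer can be (a Deno-style HTTP handler; I'm working from memory of Metacontroller's sync-hook shape, so treat the exact fields as an assumption):

  // Sync hook: the desired children are a pure function of the parent resource.
  Deno.serve(async (req) => {
    const { parent } = await req.json();
    const children = [{
      apiVersion: "v1",
      kind: "ConfigMap",
      metadata: { name: `${parent.metadata.name}-config` },
      data: { greeting: parent.spec?.greeting ?? "hello" },
    }];
    return Response.json({ children });
  });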
While that is true I'm a bit afraid that we might be overselling the concept of limiting freedom past a certain point. Limiting freedom has the upside of giving us some guarantees that makes a solution easier to reason about. But once we step out of dumb-yaml I don't see that making additional intermediate trade-offs is worth it. And there are apparently some downsides to introducing additional layers as well.
The main downside of limiting freedom seems to be the chaos of having so many different ways to do things. Imagine what could happen if we agreed that there are two ways of doing things; write yaml without templates or write an operator. Then maybe we could focus efforts on the problem of writing maintainable operators.
Things should be either dumb data or the kitchen sink I think.
The purpose of an Operator is to realize the resources desired/requested in a (custom) resource manifest, often as YAML or JSON.
You give the apiserver a document describing what resources you need. The Operator actually does the work of provisioning those resources in the "real world" and (should) update the status field on the API object to indicate if those resources are ready.
Helm is another can of hot garbage. Impossible to vendor without hitting name collisions, can configure only what’s templated.
Jsonnet is the way to go with generated helm manifests transformed later. Kustomize with its post-renderer hooks is another can of even hotter garbage.
> Impossible to vendor without hitting name collisions
What problem exactly are you facing? I can change the name of the chart itself in chart.yaml and if the name of the resources collide I change them with nameOverride/fullnameOverride in the values. All charts have these because they are autogenerated by `helm create`.
You just made a copy of a chart. You modified your chart. What I'm missing is helm having some notion of an org in the chart name, like docker does: repo/name:tag; helm only has name and version. Hence you have to modify your chart.yaml, when preferably you shouldn't have to modify anything.
This is really problematic when a chart pulls dependencies in.
It's funny how little developers think about how to do configuration right.
It's just a bunch of keys and values, stored in some file, or generated by some code.
But it's actually the whole ball game. It's what programming is.
Everything is configuration. Every function parameter is a kind of configuration. And all the configuration in external files inevitably ends up as a function parameter in some way.
The problem is the plain-text representation of code.
Declarative configuration files seem nice because you can see everything in one place.
If you do your configuration programmatically, it is hard to find the correct place to change something.
If our code ran in real-time to show us a representation of the final configuration, and we could trace how each final configuration value was generated, then it wouldn't be a problem.
But no systems are designed with this capability, even though it is quite trivial to do. Configuration is always an after-thought.
Now extend this concept to all of programming. Imagine being able to see every piece of code that depends upon a single configuration value, and any transformations of it.
Also, most configuration is probably better placed into a central database because it is relational/graph-like. Different configuration values relate to one another. So we should be looking at configuration in a database/graph editor.
Once you unchain yourself from plain-text, things start to become a lot simpler...of course the language capabilities I mentioned above still need to become a thing.
This is something I'm trying really hard to do with a client. They have a bunch of 1500+ line "config" files for products, which are then used to make technical drawings and production files. The configs attempt to use naming scheme to group related variables together.
I want to migrate to an actual nested data-structure using (maybe) JSON - and these engineers absolutely will not write code, so config-as-code is a no-go, in addition to the disadvantage you mentioned.
My next thought was that there should be a better way to show the configuration, and allow that configuration to be modified. I was thinking maybe some sort of visual UI which where the user can navigate a representation of the final product, select a part and modify a parameter that way.
Is that along the lines of your suggestion? If not will you please expand a little? Configuration is the absolute core of this application.
Sounds like you need an SQL database. You could use SQLite.
Then provide a GUI to modify that database. You could add a bunch of constraints in the database too to ensure the config is correct.
Usually when there is plain-text files though, it's because they want it that way. It's easier to edit a text file sometimes than rows in a database. Cut/copy/paste/duplicate files and text. Simple textual version control.
Sure, I agree - I'm proposing JSON as an intermediate step toward a well-defined data-model since the thousands of copied config files have evolved over time, so the data-model is a smear of backward-compatibility hacks.
What I was trying to do is get you to explain what you mean by this:
> If our code ran in real-time to show us a representation of the final configuration, and we could trace how each final configuration value was generated, then it wouldn't be a problem. [...] But no systems are designed with this capability, even though it is quite trivial to do. Configuration is always an after-thought.
This is only relevant if you allow code to define config.
If you use conditionals and loops to create config, and then view the final json, it quickly becomes annoying when you know the thing you want to change in the final json, but have to trace backwards through the code to figure out where to change it.
So programmatic configs only work if you have this "value tracing" capability. Which nothing really does.
Worse yet, in some places (CI/CD) YAML becomes nearly a programming language. A very verbose, unintuitive, badly specified and vendor-specific one as well.
It's pretty much repeating the mistake of early 2010s Java, where the entire application frequently was glued together by enormous ball of XML that configured all the dependency injection.
It had the familiar properties of (despite DTDs and XML validation) often blowing up late, and providing error messages that were difficult to interpret.
At the time a lot of the frustration was aimed at XML, but the mid 2020s YAML hell shows us that the problem was never the markup language.
You have a loosely coupled bundle of modules that you need to glue together with some configuration language. So you decide to use X. Now you have two problems.
Spot on. We use ytt[0], "a slightly modified version of the Starlark programming language which is a dialect of Python". Burying logic somewhere in a yaml template is one thing I dislike with passion.
TBH, ytt is the only yaml templating approach that I actually like.
The downside is that it is easy to do dumb things and put a lot of loops in your yaml.
The positive is that it is pretty easy to use it like an actual templating language with business logic in starlark files that look almost just like Python. In practice this works pretty well.
The syntax is still fairly clumsy, but I like it more than helm.
I've been there. Not YAML specifically, but basically just configuration (XML, JSON, properties, ...) for some proprietary systems without any good documentation or support available. "It's easy, just do/insert X"; half a year and dozens of meetings and experts later, it was indeed not just X. Meanwhile I could've built everything myself from scratch or with common open-source solutions.
YAML is the Bradford Pear of serialization formats. It looks good at first, but as your project ages and the YAML grows, it collapses under the weight of its own branches.
You should see what they look like after a 25kph breeze. Which isn't too far off from what templated YAML generates after someone commits a bad template.
My favorite pattern in HCL is the if-loop. Since there is no »only do this resource if P« in Terraform, the solution is »run this loop not at all or once«.
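For readers who haven't hit it, the idiom looks like this (resource and variable invented):

  resource "aws_s3_bucket" "maybe" {
    count  = var.enable_bucket ? 1 : 0   # the »loop« runs once or not at all
    bucket = "example-bucket"
  }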
Yeah … for CI files (like GitHub workflows & such), one of the best things I think I've done is just to immediately exec out to a script or program. That is, most of our CI steps look like this:
run: 'exec ci/some-program'
… and that's it. It really aids being able to run the (failing) CI step offline, too, since it's a single script.
Stuff like Ansible is another matter altogether. That really is programming in YAML, and it hurts.
In such places one frequently has to remind oneself and others not to start programming in that configuration language, if avoidable, so as not to create tons of headache and pain.
This criticism doesn't pass the sniff test, though: your average Haskeller loves to extol the virtues of using Haskell to implement a DSL for some system, which is ultimately just doing the same thing in practice (because they're still not going to write documentation for it, but hey, how hard can it be to figure out, it's just...)
YAML becomes a programming language because vendors need a DSL for their system, and they need to present it in a form which every other language can mostly handle the AST for, which means it's easiest if it just lives atop a data transfer format.
I don't know what this has to do with Haskell. I understand that they need a DSL for their system. I just don't agree that it is a good idea to use some general purpose serialization format. In the end they always evolve to a nearly full programming language with conditions and loops. Using a full programming language makes much more sense IMHO, for example like Zig build files or how we use Python to build neural networks. That way I can actually use existing tools to do what I need.
Yeah, I'm very sad that helm won. We do OSS k8s stuff at work, and 100% of users have asked for us to make a helm chart. So we had to. It is miserable to work on; your editor can't help you because the files are named like "foo.yaml" but they aren't YAML. You have to make sure you pipe all your data through "indent 4" so that things are lined up correctly in the YAML. What depresses me the most is that you have to re-expose every Kubernetes feature in your own way. Someone wants to add deployment.spec.template.spec.fooBars? Now you have to add deploymentFooBars to your values.yaml file and plumb it in. For every. single. feature.
It's truly "worse is better" gone wrong. I have definitely done some terrible things like "sed -e s/$FOO/foo/g" to implement templating... and that's probably how Helm started. The result is a mess.
I personally grew up on Kustomize before it was in kubectl, and was always exceedingly happy with it. (OK, it has a lot of quirks. But at least it saves you time because it actually understands the semantics of the objects you are creating.)
I like Jsonnet a lot better. As part of our k8s app, we ship an Envoy deployment to do all of our crazy traffic routing (basically... maintaining backwards compatibility with old releases). Envoy configs are... verbose..., but Jsonnet makes it really easy to work on. (The code in question: https://github.com/pachyderm/pachyderm/blob/master/etc/gener...)
I'm seriously considering transpiling jsonnet to the Go template language and just implementing everything with Jsonnet. At least that is slightly maintainable, and nobody will ever know because "helm install" will Just Work ;)
But yeah, I think Helm will be the death of Kubernetes. Some competing computer allocator container runner thingie will have some decent language for configuration, and it will just take over overnight. Mark my words!
> But yeah, I think Helm will be the death of Kubernetes. Some competing computer allocator container runner thingie will have some decent language for configuration, and it will just take over overnight. Mark my words!
I want to believe this.
Everywhere I've worked we're still rawdogging tf/hcl and helm though, because change is scary.
At least I get some relief in my personal projects. :')
I see a problem here. I'm not certain if the sort of person who would choose YAML as their configuration language sees a problem here.
There is a direct conflict between human-centred data representations and computer-centred. Computers love things that look like a bit like a Lisp. Humans like things that look a bit like Python. If you're the sort of person who wants to use a computer to manipulate their Kubernetes config then you'd be secretly annoyed that Kubernetes uses YAML. However, it appears the Kubernetes community are mainly YAML people, so why would they mind that their config files will be horrible to work with once programming logic gets involved? The downside of YAML is exactly this scenario, and I believe the people involved in K8s are generally cluey enough to see that coming.
> YAML is a superset of JSON
The spec writers can put whatever they want in their document, but I don't think this is true. If you go in and convert all the YAML config to JSON, the DevOps team is going to get upset. The two data formats have the same semantic representation, but so do all languages compiled to the same CPU arch. JSON and YAML are disjoint in practice. Mixing the two isn't a good idea.
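The spec-level half of the claim is at least mechanically checkable; a quick Python sketch (PyYAML, which follows the YAML 1.1 rules):

import json
import yaml

# Any JSON document parses as YAML and yields the same data...
text = '{"country": "no", "enabled": true}'
assert yaml.safe_load(text) == json.loads(text)

# ...but rewrite the same data in YAML's native syntax and the 1.1
# scalar rules kick in: "no" quietly becomes False.
print(yaml.safe_load("country: no"))  # {'country': False}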
The ironic thing is that, IIRC, k8s manifests were supposed to be machine-generated from k8s's inception; you weren't supposed to write them by hand... of course, people wrote them by hand anyway, until it became unbearable, at which point they started templating them, because that's how these things always seem to progress: manually-written text is almost never replaced by machine-generated config-serialized-to-text, it's replaced by templated-but-originally-still-manually-written text.
My personal philosophy is that string interpolation should not be used to generate machine-readable code, and template languages are just fancy string interpolation. We've all seen the consequences of SQL injection and cross-site scripting. That's the kind of thing that will keep happening as long as we keep putting arbitrary text into interpreters.
Yes, this means I don't think we should use template files to make HTML at all.
Alternatives to using template languages for HTML include Haml (for Ruby) and Pug (for JavaScript). These languages have defined ways to specify entire trees of tags, attributes, and text nodes.
If you don't like Python-style significant indentation, JavaScript has JSX. The HTML-looking parts of JSX compile down to a bunch of `createElement` expressions that create a web document tree. That tree can then be output as HTML if necessary.
Haml, Pug, and JSX are not template languages even though they can output HTML. Likewise, `JSON.stringify(myObj)` is not a template language for JSON. Generating machine-readable code should be done with a tool that understands and leverages the known structure of the target language when possible.
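To make the distinction concrete, here's a small Python sketch, with xml.etree standing in for the tree-building approach of Haml/Pug/JSX:

from xml.etree import ElementTree as ET

# String interpolation: the value is pasted in as raw markup.
user_input = '<script>alert("hi")</script>'
broken = "<p>%s</p>" % user_input  # injects a live <script> tag

# Tree building: the value becomes a text node; the serializer escapes it.
p = ET.Element("p")
p.text = user_input
print(ET.tostring(p, encoding="unicode"))
# <p>&lt;script&gt;alert("hi")&lt;/script&gt;</p>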
> Haml, Pug, and JSX are not template languages even though they can output HTML.
That's nonsense, unless we go by your idiosyncratic definition of what a template language is ("fancy string interpolation").
> Haml (HTML Abstraction Markup Language) is a templating system that is designed to avoid writing inline code in a web document and make the HTML cleaner.
> JSX is an XML-like syntax extension to ECMAScript without any defined semantics.
OK, I'd agree that JSX is not strictly a template language.
But in the end, all of these compile down to HTML. Not by string interpolation, but as a language that is parsed into a syntax tree, then rendered into HTML properly with an internal understanding of valid structure.
YAML with templating is fancy string interpolation; it's not a template language (or at best a poorly implemented one).
I am aware that Haml and Pug call themselves template languages, but they are not. In a template language, the source is a "template" that has some special syntax to fill in some bits. I don't think that's a very idiosyncratic definition. Pretty much any programming language can output a bunch of text, but most of them are not template languages. Java has XMLBuilder, but that doesn't make it a template language for outputting XML. But PHP is a template language, even though it's not recommended to use it that way anymore.
Sorry, reading over my comment, I sounded more antagonistic than I meant to be. After all, we're here to enjoy discussion and not to battle against each other.
As an aside, on another post yesterday, I had a pleasant surprise about "templating" in life itself.
> The familiar distinction between software and hardware loses its meaning in living cells. We propose new ways to study the phylogeny of metabolisms, new astronomical ways to search for life on exoplanets, new experiments to seek the emergence of the most rudimentary life, and the hint of a coherent testable pathway to prokaryotes with template replication and coding.
Well, it's true that Haml calls itself a "templating system", and Pug uses the term "template engine". That's 3 out of 3, you win. ;)
PHP is a scripting language that is also a template processor, but I wouldn't call it a template language. So we disagree on several points, but no big deal. A big disadvantage of PHP, in relation to your original point about "fancy string interpolation", is that it does not natively understand the target output HTML syntactically and structurally.
Not all template languages are string template languages, though. If you consider PHP a templating language for text, for example, then by the same logic XQuery is a templating language for XML.
This is the essence of the problem! Yaml and templates are just distractions. It just boils down to the fact that "string" is a very general type and we use it lazily.
My personal rule: Every time a value is inserted into a string it must be properly encoded.
I wrote a full blog post around this a while back https://kevincox.ca/2022/02/08/escape-everything/. But the TL;DR is that every string has a format which needs to be respected, whether that be HTML, SQL or human-readable terminal output. Every time you put some value into a string you should be properly encoding it into that format. But we rarely do.
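A minimal Python illustration of the rule, with one encoder per target format:

import html
import json
import shlex

value = 'Robert"; rm -rf / #'
# Each sink gets the value encoded for *its* format at the insertion point.
as_html = "<b>%s</b>" % html.escape(value)   # HTML-escaped
as_shell = "echo %s" % shlex.quote(value)    # shell-quoted
as_json = json.dumps({"name": value})        # JSON-encoded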
> My personal rule: Every time a value is inserted into a string it must be properly encoded.
This is how Django templates have done it for over a decade. You have to go out of your way to tell it not to escape the values if for some reason you need that.
We are switching to cuelang [1]. IMHO it is better designed than Jsonnet. Since Kubernetes already has state reconciliation, the only thing missing in this setup is deletion. But that can now be accomplished with the prune feature. [2]
I can second cuelang. We started using it at work and it's so nice. Some of the error messages are a little hard to decipher, but that's acceptable because it catches so many errors up front. The few times I have to write yaml directly, it now feels so tedious in comparison.
We have a pipeline that ingests very concise cuelang files.
Then it generates JSON files for each application for a tool that creates XML definitions, which are then applied to an XLS that the architects own, to spit out a YAML that we use to apply our Helm charts. The charts deploy a k8s client, which then interacts with the main cluster via JSON using the API.
took a while, but we are using the best tool for each job.
Dhall's lack of any form of type inference makes it very verbose and difficult to refactor, in my opinion. (Funnily enough, I'm the author of dhall-kubernetes and never ended up using it in production.) Dhall is also extremely slow. We had Kubernetes manifests that took _minutes_ to type-check. Cue is basically instant. This matters a lot to me.
I find Cue very ergonomic. Also, its treating both types and values as values is very neat: you write your types and your values in the same syntax and everything unifies neatly. But I sometimes miss functions, which it lacks.
Cue also being able to ingest protobuf definitions and OpenAPI schemas makes it very quick and easy to integrate with your project. Have a new Kubernetes CRD you want type-checked in Cue? No problem: just run `cue get go k8s.io/api/myapi/v1alpha1` and off you go, you have all your type definitions imported from Go to Cue!
Especially for k8s this makes for very fast development and iteration cycle.
I've wanted to take a look at https://nickel-lang.org/ which is a "what if Cue had functions" language, but to be honest Cue kind of serves my needs.
Speaking of Nickel, they've got a great document detailing the reasons for their design (for example, why they chose not to embed in a general-purpose language like Pulumi) and how Nickel compares to other config languages like Dhall and CUE: https://github.com/tweag/nickel/blob/master/RATIONALE.md
> Dhall is also extremely slow. We had kubernetes manifests that took _minutes_ to type-check. Cue is basically instant.
Everyone wants type-safety, but no one wants to wait for the type-checker :)
Maybe in this case Cue with type checks equivalent to Dhall's would be slower too, but I notice in many places people say "strong type-checking is valuable" while still expecting compile times similar to languages with weaker type systems.
People always undervalue the beauty of a short feedback loop until it's taken away from them.
And even then, they won't exactly pinpoint the problem, but rather express their general frustration, without realizing that the dynamic system they used had some great properties and was not popular for no reason.
I'm conflicted honestly. I find with dynamic languages it's easier to just spin your wheels and move quickly in the hole you are in.
With typed languages it's easy to feel you are making less progress because the feedback loop can be longer, but generally the pieces you build are more likely to work correctly.
For me Haskell and ghci repl gives good properties from both areas, especially with something like Rapid for keeping state over repl reloads.
- You lose locality of behavior, which is very useful in configuration.
Also, Nickel doesn't support injecting data into the Nickel file, so an external program can't set variables, query a database and pass the result to the config file, etc.
Cue was designed very much with k8s in mind and developed tutorials and integrations for it early on. Dhall was designed pre-k8s, and had to introduce a defaults feature: before that it was completely unusable for k8s. Dhall has functions, which are natural to programmers (particularly those from an FP background), so Dhall is trivial to start using, whereas it takes some getting used to Cue's unification. But there is enough documentation and integration for getting going with k8s to make up for it. Dhall has unique features for stably importing configurations from remote locations.
I love YAML and I curse it every single day that I'm working with Helm charts.
People ask me what I'd use to deploy apps on Kubernetes and I say I hate Helm and would still use it for a single reason: everybody is using it, I don't want to create a snowflake infrastructure that only I understand.
Still, back in the day I thought jsonnet would win this battle but here we are, cursing Helm and templates. That's the power of upstream decisions.
In my view, the presence of YAML templating is a red flag in any codebase or system.
YAML got its popularity with the advent of Ruby on Rails, largely due to the simplicity of the database.yml file as an aid in database connection string abstraction that felt extremely clean to Java programmers who were used to complicated XML files full of DSN names and connection string peculiarities.
The evolution of the database.yml file into something arguably as complex as the thing it was intended to replace is described in the article below:
The title of TFA was actually my reaction when I learned what Helm was actually doing. Initially I thought Helm would take an input file of YAML-with-template-bits, parse that YAML as an object, then use the provided template bits to fill in the parts of that object, then serialize the object back to YAML and write it out. Sounds reasonable, right? Nope, it's literal text substitution, so if you want to have a valid YAML as the output you better count your indentation on your fingers, and track where the newlines go or don't go.
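For contrast, a small Python sketch of the structural pipeline described above, with PyYAML standing in for the Helm I imagined:

import yaml

# Parse the chart source as a tree, fill in values on the tree, then let
# the serializer worry about indentation and newlines.
source = """
metadata:
  name: RELEASE_NAME
spec:
  replicas: 1
"""
doc = yaml.safe_load(source)
doc["metadata"]["name"] = "my-release"
doc["spec"]["replicas"] = 3
print(yaml.safe_dump(doc))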
I will tell you exactly why we template yaml.
It's the exact same reason every code base has ugly parts. And that's the evolution of complexity.
At first, you have a yaml file. No templates, no variables. Just a good old standard yaml.
Then, suddenly you need to introduce a single variable. Templating out the one variable is pretty easy, so you do it, and it's still mostly for humans to edit.
Well, now you have a yaml file and template engine already in place. So when one more thing pops up, you template it out.
8 features later, you wonder what you've done. Only, if we go back in time, each step was actually the most efficient. Introducing anything else at step 1 would be over-engineering. Introducing it anywhere else would lead to a large refactor and possible regressions.
To top it off, this is not business logic. Your devs are not touching this yaml all that much. So is it worth "fixing"? Probably not.
Ansible convinced me that doing programming tasks in YAML is insanity, so I started an experiment: what would Ansible be like if its syntax were more like Python than YAML? https://github.com/linsomniac/uplaybook
I spent around 3 months over the holidays exploring that by implementing a "micro Ansible", I have a pretty solid tool that implements it, but haven't had much "seat time" with it: working on it rather than in it. But what I've done has convinced me that there are some benefits.
Except you then have to censor that programming language severely. Maybe you can accept the odd endless loop, but you probably don't want the CI orchestrator to start mining Monero instead of bootstrapping and configuring servers and services.
A solution to that censorship might be a very limited WASM runtime: one that offers very few APIs and has severely limited resources, timeouts and such. So people can write their orchestration in Python, Javascript or Rust or even Brainfuck if they want, but what that orchestration can do, for how long it can do it, and how much memory and space it gets, are all very limited.
While that may work, it's far harder to think of than "let's make another {{templating|language}}" inside this YAML that we already have and everyone else uses.
I don't see any practical difference w.r.t. cybersecurity between "I blindly applied this pile of YAML to my production kubernetes clusters without looking at it" and "I blindly downloaded and ran this computer program on my CI runner without looking at it".
A supply chain attack on the former means that your environment is compromised. So does the latter.
GitHub Actions isn't going to run your Python code on its orchestration infra. Nor is DigitalOcean or Fly.io or CircleCI. They all converged on YAML because it's a very limited set of instructions.
I'm quite sure you cannot write a bitcoin miner (or something that opens a backdoor) in Liquid inside YAML in the DSL that Github Actions has. I am 100% sure you can write a bitcoin miner in Python, Javascript, Lua, or any programming language that Github would use to replace their YAML config.
What? GitHub Actions, at the very least, isn't strictly yaml. I run arbitrary code in whatever language I want all the time. I'm pretty sure third party workflows can, too.
We wrote a backend service at Lyft in Python and at some point needed to do some string interpolation for experimentation. In a rush someone implemented this in YAML (no new deps needed). This ended up being the bane of the team's existence. It was almost impossible to test if something was going to break at runtime; we could only verify it was valid YAML, many other checks were infeasible, and it was super hard to debug. It soured me on YAML for years.
Can someone help me understand what is the advantage of using jsonnet, cue, or something else vs a simple python script (or dialect, like starlark), when you have the need of dynamically creating some sort of config?
I've used jsonnet in the past to create k8s files, but I don't work in that space anymore. I don't remember it being better or easier than writing a python script that outputs JSON. Not even taking into account maintainability and such. Maybe I'm missing something?
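For comparison, the plain-script version really is short; a sketch (the manifest fields and values are just illustrative):

import json

def deployment(name, image, replicas=1):
    # A hand-rolled k8s Deployment; kubectl apply accepts JSON directly.
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }

print(json.dumps(deployment("web", "nginx:1.25", replicas=3), indent=2))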
To add to the sibling comments, after going from a jsonnet-based setup to a Typescript-based one (via pulumi), the biggest thing I missed from jsonnet was the native object merge operations which are very useful for this kind of work as it lets you say "I want one of these, but with these changes" even when the objects are highly nested, and you can specify whether to merge or override for each individual key.
But ultimately this was a minor issue and I think it's far more important that you use something like this (whether a DSL or a mainstream PL) and that you're not trying to do string templating of YAML.
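For anyone who hasn't used jsonnet: the merge it gives you natively looks roughly like this hand-rolled Python equivalent (a sketch, not jsonnet's exact semantics):

def merge(base, override):
    # Recursively merge dicts; scalars, lists and new keys are replaced.
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out

base = {"spec": {"replicas": 1, "paused": False}}
print(merge(base, {"spec": {"replicas": 10}}))
# {'spec': {'replicas': 10, 'paused': False}}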
They're various points along the spectrum from Turing-complete config generators to declarative config. Declarative config is ideal in lots of ways for mission-critical things, but hard to create lots of because of boilerplate.
A Turing-complete general-purpose language is entirely unconstrained in its ability to generate config, so it's difficult to understand all the possible configs it can generate. It's also difficult to write policy that forbids certain kinds of config from being generated by something like Python. And when you need to do an emergency rollback, it can be hard to debug a Python script that generates your config.
Starlark is a little better because it's deliberately constrained not to be as powerful as Python.
Jsonnet is, IIUC, basically an open source version of the borgcfg tool they've had at Google forever. My recollection is that Borgcfg had the reputation of being an unreadable nightmare that nobody understood. In practice, of course, people did understand it but I don't think anyone loved working with it.
I definitely wouldn't use Python because it isn't sandboxed, and users will end up doing crazy things like network calls in your config.
Starlark is a good option though.
People will talk about Jsonnet not being Turing complete, but IMO that is completely irrelevant. Turing completeness has zero practical significance for configs.
I am really sad that jsonnet / ksonnet never really took off. It’s a great way to template, but has a bit of a learning curve in my experience. I suspect that is why it’s niche.
If you like what is presented in this article, take a look at Grafana Tanka (https://tanka.dev).
Yeah, similarly I'm using Nix to generate K8s manifests and I've never looked back. Helm is great for deploying 3rd party applications easily, but I've never seen the appeal of using it for in-house services; templating YAML is gross indeed.
I was reading the description of Jsonnet and wondering why we don't just use JavaScript. Read a file, evaluate it, take the value of the last expression as the output, and blat it out as JSON.
The environment could be enriched with some handy functions for working with structures. They could just be normal JavaScript functions. For example, a version of Object.assign which understands that "key+" syntax in objects. Or a function which removes entries from arrays and objects if they have undefined values, making it easy to make entries conditional.
Those things are simple enough to write on demand that this might not even have to be a packaged tool. Just a thing you do with npm.
The fact that it's a purely functional programming language with lazy evaluation is really powerful but steepens the learning curve for devs who haven't worked with functional languages.
The stdlib is also pretty sparse, missing some commonly required functions.
> The fact that it's a purely functional programming language with lazy evaluation is really powerful but steepens the learning curve for devs who haven't worked with functional languages.
does it really though? what part do they struggle with?
> The stdlib is also pretty sparse, missing some commonly required functions.
This seems to be the general curse of template languages. For some reason, their authors have this near-religious belief in removing every "unneeded" feature, which in practice results in having to write 10 incomprehensible lines of code to do something that could be easily done in one line of readable code in a proper PL.
Jsonnet looks like a case of XKCD-927[0]. I fully agree with you that real programming languages are the way to go for generating anything more complex.
Indeed, why? However, the conclusion I have is not to use JSON but to use a type-safe configuration language that can express my intent much better, making illegal states impossible. One example of such a language is Dhall.
If I’m going to use a whole language to generate my config already, why would I use anything but the language my application is written in? Everything can export JSON after all.
You have complex enough logic to warrant a language, you should use a real language. You'll have more support, fewer obscure issues, a solid standard library and whatever else you want, because it's a REAL language.
If the argument is "someone in my team uses recursion to write the YAML files, so I'll disallow it", then the issue is not with the language, it's with the team.
What I have found in my career is that many Ops people sell themselves short and hesitate to dive into learning and fully using an actual language. I've yet to understand why, but I've seen it multiple times.
They then end up using pseudo-languages in configuration files to avoid this small step towards using an actual language, and then complain about how awful those pseudo-languages are.
> You have complex enough logic to warrant a language, you should use a real language.
Not sure what you mean. Dhall is a real language:
Dhall is not a Turing-complete programming language, which is why Dhall’s type system can provide safety guarantees on par with non-programmable configuration file formats. Specifically, Dhall is a “total” functional programming language, which means that:
- You can always type-check an expression in a finite amount of time
- If an expression type-checks, then evaluating that expression always succeeds in a finite amount of time
We're talking about templating and generating files, but it seems like everyone has just collectively forgotten about M4?
Yes, it can be unsafe if you're not careful, but if you need to bang out a quick prototype it's the best tool there is. It's part of POSIX, and so it will always be available, the language is dead simple, and you can generate any text you want with it.
I wouldn't use it with YAML, but I would probably never template YAML in the first place: just generate JSON and feed it through `yq -y` if you need a quick YAML generator.
There are two things on the horizon here for Kubernetes that give me hope: KCL, its own configuration language, and Timoni, which builds on CUE and corrects some of the shortcomings of Helm.
Though these days, OLM and the Quarkus operator SDK give you a completely viable alternative approach to Helm that enables you to express much more complex functionality and dependency relationships over the lifecycle of resources. An example would be doing a DB backup before upgrading to a new release etc. Obviously this power comes at a cost.
Yes, templating YAML is crazy. But is the answer jsonnet? That's even more batshit.
Why hasn't anyone opted for a "patch-based" approach? I.e. start with a base YAML/JSON file, apply a second file over it, apply this third one, and use the result as the config. How you generate these files is entirely up to you.
Yes. The answer is a "config.d" directory, this has been known to linux package managers for a long time. It is the only way for multiple packages to contribute to configuration without fighting over ownership of the one true config file.
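A minimal Python sketch of that layering (deep_merge here is an assumed helper: a recursive dict merge where later files win):

import glob
import json

def load_config(base_path, conf_dir):
    # Start from the base file, then overlay conf.d fragments in sorted order.
    with open(base_path) as f:
        config = json.load(f)
    for path in sorted(glob.glob(conf_dir + "/*.json")):
        with open(path) as f:
            config = deep_merge(config, json.load(f))  # assumed helper
    return config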
Meanwhile in JavaScript land, config is simply another js file, with all the Object and Array literal goodness that that gets us, and the full language environment backing it up.
If you're using Helm to deploy your own apps, I feel that's a code smell. I'll add jsonnet for your own apps to that list.
Just use dumb YAML, maybe kustomize if you really need, but if that's not sufficient, consider that a sign that you're not carving the wood the way it's telling you to.
Any form of templating for creating your own application manifest is another moving part that allows for new and fun errors, and the further away your source manifest is from the deployed result, the harder it is to debug.
If you really want to append a certain set of annotations to each and every pod in a cluster, instead of using shared templates (and enforcing their usage), there are other approaches in K8s for these kinds of use cases that you have a lot more control over.
(A) Why not use the YAML syntax that is not whitespace-sensitive? In the author's example, that could be: {name: Al, address: something}
(B) Do env variables not go a long way toward avoiding the need for a template? Instead of generating a complete YAML, put env variable placeholders in and set those values in the target environment. That way, the same YAML can generally be deployed anywhere. I've seen that style implemented several times; it works pretty well.
I do agree that generating config itself, and not just interpolating values, is potentially really gnarly. I do wonder: instead of interpolating variables at deploy time, why not use env variables and do the interpolation at runtime?
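A minimal Python sketch of the runtime-interpolation idea (the variable names are made up):

import json
import os
from string import Template

# One static config file with $VAR placeholders, filled from the
# environment at startup; safe_substitute leaves unknown $VARs untouched.
raw = '{"database_url": "$DATABASE_URL", "region": "$REGION"}'
config = json.loads(Template(raw).safe_substitute(os.environ))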
This article made me think it'd be nice to generate k8s JSON using TypeScript. Just a node script that runs console.log(JSON.stringify(config)), and you pipe that to a yaml file in your deploy script. The syntax seems more sane and has more broad appeal than jsonnet, and I'd wager that the dev tooling would be better given good enough typings.
By the way the answer to the question "why are we templating yaml?" is: people are just more familiar with it and don't want to have to translate examples to jsonnet that they copy and paste from the web. Do not underestimate this downside :) Same downside would probably apply to TypeScript-generated configs I bet.
Others have mentioned CDK, but I want to say that this is almost the exact approach I took on a project recently and it worked out fine. Node script that validates a few arguments and generates k8s manifests as JSON to be fed into `kubectl apply`.
IME, there's no need to involve anything more complicated if your deployment can be described solely as k8s manifests.
I would recommend implementing a similar API to Grafana Tanka: https://tanka.dev
When you "synthesise", the returned value should be an array or an object.
1. If it's an object, check if it has an `apiVersion` and `kind` key. If it does, yield that as a kubernetes object and do not recurse.
2. If it's an array or any other object, repeat this algorithm for all array elements and object values.
This gives a lot of flexibility to users and other engineers because they can use any data structures they want inside their own libraries. TypeScript's type system improves the ergonomics, too.
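That walk is only a few lines in any language; a Python sketch of the same algorithm (the TypeScript version would be analogous):

def find_manifests(value):
    # Yield anything that looks like a Kubernetes object; recurse into the rest.
    if isinstance(value, dict):
        if "apiVersion" in value and "kind" in value:
            yield value  # a manifest: emit it, don't descend further
        else:
            for child in value.values():
                yield from find_manifests(child)
    elif isinstance(value, list):
        for child in value:
            yield from find_manifests(child)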
You can convert YAML to JSON programmatically, and JSON is valid jsonnet, so you can pretty much copy-paste examples from the web into your jsonnet if you find yourself wanting to do that.
Hot take, this is a terrible idea, and is why so much cloud infra is monstrously expensive (and bad).
People need to stop making infra easy. It’s not supposed to be easy, because when you make a bad decision, you don’t get to revert a commit and carry on with life. You don’t understand IOPS and now your gp2 disk is causing CPU starvation from IOWAIT? Guess you’re gonna learn some things about operating within constraints while waiting for a faster disk to arrive at the DC! Buckle up, it’ll be good for you.
I’m fully aware that I sound like a grouchy gatekeeper here, and I’m fine with it. People making stupid infra decisions en masse cause me no end of headaches in my day job, and I’m tired of it.
Separate generated content from maintained content. Works for me. But on to the specifics here, from a very Python POV:
Strict YAML is easier to maintain than json if you have deeper than one or maybe two levels of nesting, multiline strings, or comments.
So, I build my config systems to _generate_ YAML instead of “templating YAML.”
PyYAML extensions and ruamel.yaml exist, though they're kind of out of date, and more new projects are using TOML. (From the project description: "ruamel.yaml is a YAML parser/emitter that supports roundtrip comment preservation.")
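Roughly, the generate-don't-template version looks like this (the data is made up):

import sys
from ruamel.yaml import YAML

# Build the config as plain Python data, then let the emitter produce the
# YAML; indentation and quoting are the serializer's job, not mine.
yaml = YAML()
yaml.dump({"service": {"name": "web", "replicas": 3}}, sys.stdout)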
Confession: but yeah, not when I use ansible. Ansible double-dog-dares you to “jinja2 all the things” without much in the way of structured semantics.
I can't remember how many times I've heard or seen the argument "but that is in YAML", which implies that the configuration (or, god forbid, the code) is simple and well designed. I find it hilarious.
And a worse offender is embedding a text template like Jinja in a YAML config and forcing everyone to use such an abomination to change production config via deployment. Yes, I'm talking about Terraform and the like. Why people think this kind of design is acceptable is beyond my comprehension.
I think YAML is a good pick for non-developers / content creators. The front matter section in Markdown files is a good example. Or is there a better, human-friendly alternative?
You just pinpointed my biggest peeve with YAML. It looks like it's "human friendly" because there are no scary curly braces. But you still need to get the syntax exactly right, so that benefit is very small. And now you have to keep your finger on the screen while scrolling in order to figure out what a bullet belongs to.
Note that I am not a content creator myself. I build solutions for web teams and on those teams, some people focus solely on content and Markdown. I want to offer them an easy editing experience. So far YAML has been the easiest format for them.
What is the best term to use for the people who are writing content on the web team? The ones who write blog entries, documentation, and marketing pages. The ones who mainly touch Markdown files.
>I think it's more that it's declarative that makes it simple
...it's no more or less declarative than other configuration languages?
And yes, I get that it looks simpler. I just think that it applies as long as your file can fit in about half a page. As it grows and becomes deeply nested, IMO, that simplicity disappears.
YAML is anything but human-friendly. It has far too many special features and edge cases for most people. Something simple like Java properties files would handle something like Markdown front matter perfectly fine.
Look at the documentation [0] or at the OpenJDK code. Both assume ISO-8859-1, unless you're dealing with a special case where resource bundles are involved.
Serendipity strikes as I'm implementing an emrichen interpreter in golang after getting too annoyed about templating YAML as a string.
The reason I like YAML is that I can see the tree structure directly, and to my Lisp brain it is extremely easy to read. Furthermore, in our age of LLMs, I find LLMs able to generate "correct" YAML more easily than JSON, since the tree depth is encoded in every line and doesn't require matching larger structures. It also uses significantly fewer tokens.
I find it extremely easy to have LLMs generate decent DSLs by asking them to use a YAML output format, and found it very robust to generate code out of these (or generate an interpreter for the newly created DSL).
I didn't know about !tags until Sunday, which is quite shameful, but I find that the emrichen solution is actually quite elegant, and really kind of feels like a Lisp macro expander.
Overall, YAML is just good enough for me to get shit done, I can read and skim it quickly, LLMs do well with it, and it's easy to work with. It has aliases, multiline strings and some other Quality of life features built in.
I totally agree with you on LLM usage. I have recently switched from JSON to YAML for requests and replies from LLMs (GPT-4 specifically) and I find it much better:
fewer tokens used; more readable if you're looking at the HTTP requests and responses; and you can parse it on the fly in streaming responses. The last point lets you do visual updates for the user, which is pretty important if you need to wait 1+ minutes for the full response.
I'd be very curious to know what kind of previews/streaming YAML applications you are building with LLMs. I have building a v0.dev kind of thing with streaming updates on my TODO list.
The cycle seems: Invent a new static information format (XML/JSON/HTML/...), reduce verbosity, add GUI, variables, comments, expressions, control flow, validation, transformation, static typing, compilers, IDE support, dependency management, and maybe a non-backwards-compatible major version etc. And you end up with yet another Java/C# clone, just inferior because it was never meant to support all these things.
Obviously biased but we at Kurtosis are trying to solve this problem through Starlark.
We took Starlark and added a few instructions of our own that make it container-native. A complex Starlark definition supports:
- Composable - you can import a remote definition and just use it
- Decomposable - you can break things apart
- Parametrizable - want one of one service and 10 of another? Just pass an argument
- Portable - it runs pretty much anywhere
Our runtime takes the Starlark and creates environments in both Docker and Kubernetes, from one definition.
I wish the industry would standardize on a solution like this. IMO you shouldn't use a "real" language unless you can lock it down to be deterministic. JSON is supposed to be human-readable but fails for lots of real-world data like multi-line strings or lists of records.
CSV is more readable but doesn't supported nested objects.
I wrote a monstrosity of a terraform module that takes pre-existing helm charts/templates, feeds some json into them via terraform, translates the results to HCL, and deploys them.
It's kind of a rube-goldberg machine that I made as a bespoke solution to a weird problem but it's been fairly pleasant to work with so far.
The worst part about this mess is that it's all fine and dandy as long as everything is first-party, but as soon as you want to use other software you're stuck reading and understanding the spaghetti that Helm inevitably becomes, hoping you can configure it the way you need to.
It's come to the point that I don't even think about using most charts and just build them myself. The issue with that is that software like Prometheus, Loki, Grafana or e.g. Postgres operators are so complex that it's almost impossible to "fix" them.
This really is like that dog sitting in a burning house saying "This is fine", because that's how most people go on with their day after hitting their head for a few hours and running dozens of pipelines.
From my vantage point it seems to have happened roughly like this
First we had arbitrary code that did something. Then we thought, "hey wouldn't it be nice if we could do this declaratively in a standard way using configuration instead?" Then we could reason about it more easily. But then came the realization "this declarative system isn't quite powerful enough, what if we could sprinkle some logic on top of it". "Hey wouldn't it be nice if we could go back to doing it declaratively? I guess we can just add the missing features to our not-turing-complete config language". "Wow now it can almost do everything I want.... but there is just this tiny little thing I want to do in addition, let's do templating!" etc
YAML is fine for human-maintained configuration. Yeah it has its footguns (like Norway) but if you're actually writing human-maintained configuration then you quickly pick these up with practice and they turn into a non-issue.
If your configuration is complicated enough that it needs to be generated, then use a real general-purpose language to generate it. Not crummy pseudo-imperative constructs bolted onto the YAML.
At the very least, systems that take YAML as configuration should also take JSON as configuration. GitOps-style systems should allow you to define, not just system.yaml config, not just system.json config, but also system.js config, that is evaluated in some kind of heavily-restricted sandbox.
I would love for someone to eviscerate the following idea:
Every deployed process receives exactly three environment variables: NONCE, TAGS and CONFIG_DB_PARAMS. Every process is bootstrapped in the same way:
1. Initialize config db client.
2. config = db.fetchConfig(tags)
3. Use NONCE to signal config changes if needed.
Of course there's some environments where this Just Won't Work. I'm wondering if there are some very serious issues with this approach in a "standard" web application environment. It seems so straightforward but I've literally never seen it done before, so I feel like I'm missing something.
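To make it concrete, a minimal Python sketch (ConfigDB and its methods are hypothetical, not a real library):

import json
import os

class ConfigDB:
    # A hypothetical config-store client.
    def __init__(self, params):
        self.params = json.loads(params)  # e.g. host/port/credentials
    def fetch_config(self, tags):
        ...  # query the store for entries matching these tags

db = ConfigDB(os.environ["CONFIG_DB_PARAMS"])
config = db.fetch_config(os.environ["TAGS"].split(","))
nonce = os.environ["NONCE"]  # changing this signals a config reload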
It's fashionable to hate YAML. And sometimes rightly so. But what are the alternatives? JSON, XML, INI, TOML, Dhall, Cue, Jsonnet, HCL, your programming language of choice. Also, let's agree on the target use case: YAML is largely used for configuration and operational tasks. If I were to rank-order the features necessary in a good configuration language, they would be 1) readability 2) data/schema validation 3) stackability/composability 4) language support 5) editor support 6) industry adoption. So let's do a comparison:
YAML is fairly easy to read, has schema validation with the right library, and is pretty ubiquitous. It can get unwieldy like JSON though.
XML is big, ugly, unreadable. No one likes XML despite its robust schema validation capabilities.
Your programming language of choice doesn't work because of the target use case unless you truly are a build-run group.
INI is too simplistic for many environments.
HCL is included because I'm a bit of a Terraform fanboy and it has great features like validation, readability and composability. However you're not going to find it in the wild as a general purpose configuration language - outside of Terraform it just hasn't taken hold.
Does anyone really use Dhall? (Serious question.)
JSON is nice because everyone understands JSON. JSON is not nice because all the brackets, braces, quotes, etc get in the way and make sufficiently large configurations hard to read. With the right library you can get schema validation.
Jsonnet suffers from the same problems that JSON does, but adds more operations, which makes sufficiently large things very hard to read.
TOML is nice and reminds me of INI in its simplicity.
Cue looks and smells like JSON, has schema validation, but is much more readable.
If I were to rank-order these options it would be 1) Cue 2) TOML 3) YAML 4) JSON 5) your programming language of choice 6) Jsonnet 7) INI 8) HCL 9) XML 10) Dhall (maybe?). My point here is that while YAML leaves a lot to be desired, it's still very useful for most implementations and is better than many of the alternatives.
I'd take an XML config over a YAML one any day. This isn't to say that XML is great, but its warts are well-known, and they are generally not of the kind that makes it easy to shoot yourself in the foot. Mostly the problem is that it's verbose, and to some extent, redundant.
JSON is also fine, esp. if it is JSON5 (with comments, unquoted keys, and other such minor improvements). I find that braces, brackets, and quotes don't get in the way - if anything, they make the structure clearer.
YAML, TOML and JSON can all be ingested to represent the same data structures internally; it's just a few lines of code to decide which load() function to use for a particular file. Why not support all three formats in your applications for configuration and just let users decide which one they want to use? Put a 'config.json' in '/etc/app/conf.d/' and you get the same data as with 'config.yml' or 'config.toml'. Then users can use whichever format they prefer for the input data.
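A minimal Python sketch of that dispatch (PyYAML for YAML; json and tomllib, the latter Python 3.11+, from the stdlib):

import json
import tomllib
import yaml

def load_config(path):
    # Hand back the same dict no matter which format the user picked.
    if path.endswith(".json"):
        with open(path) as f:
            return json.load(f)
    if path.endswith((".yml", ".yaml")):
        with open(path) as f:
            return yaml.safe_load(f)
    if path.endswith(".toml"):
        with open(path, "rb") as f:
            return tomllib.load(f)
    raise ValueError("unsupported config format: " + path)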
I tried this, it significantly complicated documentation and support after release. Lots more logic handling conflicting cases in two otherwise identical files, etc.
This article is not convincing me about yaml or json templating.
The only thing I can think of is that generating these files requires picking a language platform. I chose Ruby to generate the k8s manifests I need.
If you are picking something that is meant to be language-agnostic, or to have very little ramp-up time, then sure, templating. It just comes at a cost where the templating language itself approaches that of a full blown, Turing-complete language as more features gets added, often with shaky foundations (such as HCL).
Like many other things, Azure services can be deployed using JSON. Of course it's not just JSON, it's an entire language of deployment definitions and templating hidden within JSON markup.
But next to that, Microsoft came out with bicep, which is a domain specific language for defining resources. It comes with a full language server and is honestly quite nice to use (if only azure services had some sort of reasonable logic to them)
If you can write Python, Perl, Ruby, etc. (hell, even yq in the shell), then you have a full programming language that can output YAML or JSON in any way you want. No weird DSL, no twisting yourself into knots.
Just write normal code, make any data structure, print it as any data format. Call the code, output to temp file, use file, delete file.
Is it clunky? Yes. But it works, and you can't get any simpler.
Clearly lots of people have tried to replace YAML with something else and that hasn’t worked. What’s your wishlist on making YAML actually work for declarative systems? Or can it work?
Would things like a great LSP with semantic autocomplete, native cross-file imports, and conditionals based on environment make things feel different/better?
What should modern declarative systems be doing in your opinion?
I think we're barking up the wrong tree here (speaking about Kubernetes workloads).
This ugly mess of low-ish level details will never go away. Developers that are trying to focus on developing apps will never enjoy these things and that's fine.
Something like score.dev which abstracts things even further seems to be the way to go as the interface that is exposed to developers.
Is Kubernetes using YAML 1.1 still? Because some of the complaints I hear shouldn’t be an issue with 1.2.
Moreover the YAML spec allows for specifying the tags recognized per application.
So on two counts, if “on”, “true”, “false”, as well as “yes”, “no”, “y”, “n”, “off”, and all capitalized and uppercase variants, are all boolean literals, it is not YAML’s fault.
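This is easy to demonstrate with a YAML 1.1 resolver (PyYAML, for instance):

import yaml  # PyYAML implements the YAML 1.1 scalar rules

print(yaml.safe_load("a: no\nb: Off\nc: YES\nd: on"))
# {'a': False, 'b': False, 'c': True, 'd': True}
# A YAML 1.2 parser would return plain strings for all four values.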
It's always been relatively shocking to me that this is still relevant 4 years after I wrote it. Helm is as ubiquitous as ever, despite attempts to replace it with Jsonnet, Cue and programming languages.
I've personally moved on from Jsonnet and would recommend Pulumi to anyone experiencing this problem.
I know, I know — everyone raves about the power of separating your code and your data . . . But it's [not] what you really want, or all the creepy half-languages wouldn't all evolve towards being Turing-complete, would they?
Templating YAML is the same, but... Honestly, Jsonnet is too. If you're going to generate JSON — by god use a normal programming language. You have to teach your team one fewer thing (approximately no one knows Jsonnet); it already integrates with your existing build system; if you wrote a useful util function in your main codebase you can reuse it; if you have a typechecker or a linter you can use it; etc etc.
IMO yes, although sadly Lisp is a pretty unwieldy language to do it in — it truly needs a little more syntax sugar for readability's sake. (Also a decent type checker would help.)
If you need anything more complicated than simple $var substitution, it's time to use a general-purpose scripting language with appropriate libraries to generate your data structure. A half-baked template DSL will never work.
In my case, I'm templating YAML because the Obsidian Templater plugin can read YAML frontmatter it asks you for and then fill in a Markdown file with the Mad Libs you choose to populate it with.
Perhaps we need something along the lines of an infrastructure description language. Some of these yamls get pretty long. Using a real language (like Python) is probably not constrained enough however.
Yeass. I encountered jsonnet thanks to Ory Kratos, it’s great. Yaml is an awful hack that’s only still around in devops because of chance and circumstance. I hate IT.
Except when you need anything more complex than a string or an array of strings, when they become entirely useless.
There is not a single even slightly complex piece of software that uses exclusively env vars for configuration. Even bash or vim have config files, this is not some new idea.
I swear this is how we got Docker containers... some Ruby dev who abused env vars and an SA who was sick of his shit breaking on every rollout and hearing "but it works for me"...
And now installable software is a fucking unicorn!
(This week I keep running into Go apps that can be installed from source or as a straight download, with Docker as well. It's been a breath of fresh air.)
I'm reminded of DHH's article, "Rails is Omakase" [0] back during the time when "convention over configuration" [1] was a common refrain, meant to avoid the proliferation of (YAML or XML) configuration files by assuming sensible defaults and pre-selecting various parts of the solution stack or architecture, instead of letting it be freely specified by the developer.
You lose a few degrees of freedom and flexibility in your implementation this way, but at the same time you also don't need to wade through pages and pages of configuration documents.
Everything is cyclical. I'm waiting for the next "omakase" offering that provides a sane low-configuration platform for building "cloud native" apps. Right now it looks like we're in an analogue of the XML hell that prompted the design philosophy of Rails and "convention over configuration."
Honestly I find all these different config languages either too much to learn or they get too unwieldy quickly.
I have arrived at using TypeScript to generate JSON as the ultimate solution.
Easy JSON support, optional typing, you already know it, and adding reusable functions and libraries is understandable. Just prevent external node modules, and have a tool that takes a typescript file with a default export of some JSON and renders the JSON to a string on stdout.
YAML and its ecosystem is full of footguns and ergonomics problems, especially when the length of the document extends beyond the height of a user's editor or viewport. Loss of context with indentation, non-compliant or unsafe parsers, and strange boolean handling to name a few.
It becomes even worse when people decide that static YAML data files should have variable substitution or control flow via templating. "Stringly-typed programming", if you will. If we all started writing JSON text templates, I think a lot of people would rightly argue we should write small stdlib-only programs in Python, Typescript, or Ruby to emit this JSON instead of using templated text files. Then it becomes apparent that the YAML template isn't a static data file at all, but part of a program which emits YAML as output. We're already exposing people to basic programming if we're using YAML templates. People brew a special kind of YAML-templated devops hell using tools like Kustomize and Helm, each of which is "just YAML" but is full of idiosyncrasies and tool-specific behaviour which make the use of YAML almost coincidental rather than a necessity.
Yes, sometimes people would prefer to look at YAML instead of JSON, in which case I suggest you use a YAML serialization library, or pipe output into a tool like `yq` so you can view the pretty output. In a pinch you could even output JSON and then feed it through a YAML formatter.
The Kubernetes community seems to have this penetrating "oh, it's just YAML" philosophy which means we get mediocre DSLs in "just YAML" which actually encode a lot of nuanced and unintuitive behaviour which varies from tool to tool.
Look at kyverno, for example: it uses _parentheses_ in YAML key names to change the semantics of security policies! https://kyverno.io/docs/writing-policies/validate/ . This is different to (what I think are the much better ideas of) something like kubewarden, gatekeeper, or jspolicy, which allow engineers to write their policies in anything that compiles to WASM, OPA, and Typescript/Javascript respectively.
We engineers, as a discipline, have decades of know-how building and using general purpose programming languages with type checkers, linters, packaging systems, and other tools, but we throw them all away as soon as YAML comes along. It's time to put the stringified YAML templates away and engage in the ecosystem of mature tools we already know to perform one simple task they are already good at: dumping JSON on stdout.
Let's move the control flow back into the tool and out of the YAML.
To me YAML seems like the CoffeeScript of JSON, and unlike CoffeeScript I don’t understand why people are still using it.
I guess XML and JSON are too verbose. But YAML is so far in the opposite direction, we get the same surprise conversions we’ve had in Excel (https://ruudvanasseldonk.com/2023/01/11/the-yaml-document-fr...). Why is “on” a boolean literal (of course so are “true”, “false”, as well as “yes”, “no”, “y”, “n”, “off”, and all capitalized and uppercase variants)? And people are actually using this in production software?
Then when you add templating it’s no longer readable and concise anyways. So, why? In JSON, you can add templating super easily by turning it into regular JavaScript: use global variables, functions and the like. I don’t understand how anyone could prefer YAML with an ugly templating DSL over that.
And if you really care about conciseness, there’s TOML. Are there any advantages of YAML over TOML?
Dunno, to me YAML is the python of markup languages.
YAML is decent at handling things like nesting and arrays, while TOML sucks at it.
I don't dislike YAML that much.
That being said, we've known since the dawn of C macros that templating languages which are not aware of syntax are AWFUL.
Likewise, writing Helm charts (the place I encountered YAML templating) is just horrible, but it would be so much nicer if templates respected the YAML syntax tree and expanded at the right subnode, instead of being a text-replace botch-job.
The worst thing with Helm charts is not the YAML, or even the text replace botch-jobs, but that they seem to think that a Go stacktrace is reasonable error reporting. I don't think I've ever worked with a tool with such awfully useless error messages.
But I agree, it'd be better if the template expansion was actually structural and not just text. The huge amount of "| indent 8" etc. in Helm charts is such a stench that by about the second time people encountered it, they ought to have made a better template expansion mechanism top priority.
Unlikely it will ever get better. First to market with a prototype tool, gains market share and momentum. Eventually the enthusiasm fades off and people start hating it, for good and sometimes bad reasons. Yet users are stuck because change is expensive and risky. The team is stuck because any change risks becoming the straw that broke the camel's back, possibly cascading through the user population. Story of our young industry.
I think you're partly right, in that I don't think they will make any backwards-incompatible changes. But they could still make things a lot better in two simple ways:
* Fix error reporting. Nobody is doing anything that relies on the current error reporting anyway because it's near useless.
* Add a slight templating change that means "after this parses as valid YAML, expand this bit, and check that the expansion is itself valid YAML before merging it in", with options to either replace the node or merge in adjacent to it (the latter to insert into lists etc.). You can do that without backwards-incompatible changes by introducing a syntax that still uses the Go {{ ... }} blocks, but starts with a directive that simply expands to a new template-processing directive in the first pass. Then just add a second pass that operates on a parse tree. (I've just written a template expansion mechanism that works on JSON/YAML parse trees, in fact; if we didn't need Helm charts primarily for distribution to partners, whom I don't want to force onto a custom deployment tool, I'd be tempted to replace our Helm charts with an expansion of that.)
Better error reporting and being able to avoid the incessant "| indent .." blocks and ensuring the output either generates valid yaml or can "contain" the error report to the generated sub-block would make it so much easier to use.
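To make the structural idea concrete, here's a toy sketch (the "$expand" convention and the fragment names are invented for illustration): the placeholder node is replaced after parsing, so the output is valid by construction and nothing ever needs re-indenting.

    import json

    # Named fragments an author might share between documents (made up).
    FRAGMENTS = {"labels": {"app": "myapp", "tier": "backend"}}

    def expand(node):
        if isinstance(node, list):
            return [expand(n) for n in node]
        if isinstance(node, dict):
            if set(node) == {"$expand"}:
                # Replace the placeholder node with the named fragment.
                return expand(FRAGMENTS[node["$expand"]])
            return {k: expand(v) for k, v in node.items()}
        return node  # scalars pass through untouched

    doc = json.loads('{"metadata": {"labels": {"$expand": "labels"}}}')
    print(json.dumps(expand(doc), indent=2))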
The biggest issue I have with Yaml is that they forbid tabs.
Their argument is that tabs are shown differently in every editor, which is actually something I like. When you're looking for something deeply nested you can reduce the tab distance a bit; when that's not needed you can increase it to improve visibility of nesting levels.
And forbidding it makes a one-keystroke action a two- or four-keystroke one.
I really don't understand the python/Yaml hate for tabs, and as a result I don't really use either.
What is a problem is not picking one or the other. There's arguments for both sides but it is critical to just take a side. I'm sorry your side lost but it makes everything better to just go along with the consensus.
No, that's what the tabs hold-outs have morphed into. Which illustrates the problem with tabs: It's very difficult to get everyone on a team to care about tabs or not care about alignment.
Yeah, OP is not wrong. I also like neatly formatted code, and it's way easier to read.
I always reformat all my code before all commits. It's just good hygiene.
The funny part is the fussing and the answer they get.
I'd just autoformat the area of my patch and send in the patch that way, maybe plus some autoformatted blocks here and there, slowly fixing the stuff as I go.
If something is too bothersome, first try doing something, and figure out the rest of the process as you go.
Edit: blocks became blogs without my knowledge. Maybe I should write a blog post about it. Don't know.
Us old folks remember the days when reformatting was a computationally expensive action that required a special program to “pretty print” the code. And heaven forbid your code used some language feature your pretty printer didn’t understand and mangled the output making your code uncompilable.
Well, I'm not that young of a folk. I was playing with computers (programming, in fact) in the early 90s, and I remember when it was expensive.
However, Eclipse has been formatting C++ code with a simple hotkey, without breaking it, and with an understanding of the language, for the last 15 years as far as I can remember. It's instant, too.
Because of that I feel a bit surprised when younger people look at it like it's black magic. It's neither new, nor unsolved, in my conscious experience.
JSON formatting is less important because most apps that deal with it come with good “beautify”, “sort”, “remove all formatting white space” functions in the editor
Ouch. The only problem with the obvious sarcastic tone of that comment is that there are plenty of people that do say exactly the same thing and mean it.
For code I'd agree. However for configuration files, I find that I often need to edit them in places or environments where I don't have anything but the most bare-bones editor.
When this happens, I copy four spaces and then use Ctrl+V for Tab.
Yes, it’s not exactly the same due to alignment, and yes you have to repeat it after using the clipboard for other purposes, but it’s good enough for that occasional use.
When looking at the code, tab-containing files are the most inconsistent ones, especially when viewed via general tools (less, diff, even web viewers).
Sure, if people would only ever use tabs for indentation and spaces for alignment, things could be good. But this almost never happens, instead:
... some lines start with spaces, some with tabs. This looks fine in someone's IDE but the moment you use "diff" or "grep" which adds a prefix, things break and lines become jagged.
... one contributor uses tabs mid-line while others use spaces. It may look fine in their editor with 6-character tabs, but all the tables are misaligned when looking in an app with a different tab size.
Given how many corner cases tabs have, I always try to avoid them. Spaces have no corner cases whatsoever and always look nice, no matter what you use to look at the code.
(the only exceptions are formatters which enforce size-8 tabs consistently everywhere. But I have not seen those outside of golang)
> Sure, if people would only ever use tabs for indentation and spaces for alignment, things could be good. But this almost never happens, instead:
... some lines start with spaces, some with tabs.
People using tabs for alignment can happen when you've got a tab-camp-person who hasn't yet realized how they're terrible for alignment.
But "some lines start with spaces, some with tabs" happens for precisely two reasons:
* you have a codebase with contributors from both camps
* people thought in-editor tooling was the solution (now you have two problems)
> Spaces have no corner cases whatsoever
This is tooling and (as you realized) tab-stop preference dependent.
And I've been using vim exclusively for north of fifteen years with Tab replacement, never had a problem with the editor getting confused about what happens with spaces when I hit Tab.
Some detail about the corner cases you've run into would be great, if they're happening constantly I can see how it would be a bugbear.
For example with vim (debian) defaults, if you happen to have a 2-space indented Python (the first two spaces are for HN formatting, the first if should start at zero indent):
  if True:
    # Two space indent
And if you continue to add another if block in that, the autoindent will give you four spaces:
  if True:
    # Two space indent
    if True:
        # Four space autoindent
And if you make a new line after the last row there and hit backspace, it'll erase one space instead of four, giving an indentation of 3 (+2) spaces. And if you then hit tab on a new line after that, you'll get an indentation of 8 spaces in total. Ending up with:
  if True:
    # Two space indent
    if True:
        # Four space autoindent
       # Hitting backspace gives this
          # Hitting a tab gives this
This is just one case, but things like this tend to happen quite often when editing code, even if it's been originally PEP-8 indented. Usually it's not what the Tab does, but what the Backspace or Autoindent does. I'm not exactly sure what Tab/Backspace/Autoindent rules underlie the behavior, but I can imagine there having to be quite a bit of hackery to support soft tabs.
For me this kind of Tab/Autoindent/Backspace confusion is frequent enough that I'd be very surprised if others don't find themselves having to manually fix the number of spaces every now and then. And when watching over the shoulder I see others too occasionally having to micromanage space-indents (or accidentally ending up with three space indented blocks etc), also with other editors than vim.
  if !exists("g:python_recommended_style") || g:python_recommended_style != 0
    " As suggested by PEP8.
    setlocal expandtab tabstop=4 softtabstop=4 shiftwidth=4
  endif
So if you use "set sw=2" then it leaves tabstop and softtabstop at 4.
You can set that g:python_recommended_style to disable it.
Also sw=0 uses the tabstop value, and softtabstop=-1 uses the shiftwidth value.
I agree Vim's behaviour there is a bit annoying and confusing, but it doesn't really have anything to do with tabs vs. spaces. I strongly prefer tabs myself as well by the way.
Even when you DO use tabs Vim will use spaces if sw/ts/sts differ by the way. Try sw=2 and using >>, or sts=2 with noexpandtab.
As with most things in vim, it is definitely manageable in settings such as tw=2 (tab width) and sts=2 (soft tab stop). This is why a lot of older Python files, in particular, are littered with vim modelines with settings like these.
The nice modern twist is .editorconfig files and the plugins that support them, including for vim. You can use those to set such standard language-specific config concerns in a general way for an entire "workspace", for every editor that supports or has a plugin that supports .editorconfig.
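A minimal .editorconfig, for anyone who hasn't seen one:

    root = true

    [*.py]
    indent_style = space
    indent_size = 4

    [Makefile]
    indent_style = tab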
The defaults are either 4-space or 8-space soft tab stops. 8 spaces is the oldest soft tab behavior. 4-space soft tabs have been common for C code among other languages for nearly as many decades. It is only relatively recently that Python and JS and several Lisp-family derivatives have made 2-space tab stops much more common of a style choice. Unfortunately there is no "perfect" default, as these are as much aesthetic preferences as anything else.
(It is one of the arguments for using hard tabs instead of soft ones in the eternal tabs versus spaces debates because editors can show hard tabs as different space equivalents as a user "style choice" without affecting the underlying text format.)
The part where the user is on a line indented by 2, hits return, and gets a line indented by 2+4=6 doesn't sound like soft tabs at 4 to me. And I wouldn't expect hitting backspace to then only remove 1 space (if it actually removed 2 that makes more sense, but is inconsistent with what it just added). At that point, hitting return and getting a line indented by 8 might make sense but is weird.
Another comment suggests it's using 2 and 4 for different settings and that's causing problems.
2 is the base indent of the line where the : was added. Autoindent adds 4 spaces for the current tab stop. Autoindent isn't using some counts of indents, it's taking "spaces in previous line + tab stop".
Backspace doesn't unindent in vim by default; it removes spaces one at a time. The difference between ts=4 (tab stop) and sts=4 (soft tab stop) is that sts also applies to backspace. But the default is that it doesn't, because the out-of-the-box default believes that backspace operates on physical characters (spaces), not soft/fake ones (tabs expanded to spaces).
I don't know if that is the right default, and it is definitely a baroque exercise to get all the settings right for some languages, but there is a consistency to the defaults even if those defaults don't meet some modern expectations from newer code editors.
(Also, I just realized above I confused tw [text width] and ts [tab stop]; my vim skills are rusting a little.)
I don't want that though. Because then when editing I still have to mess around with spaces.
And the double nature of the spaces makes it hard to see when you have an odd number of spaces when you reach deep indenting levels, which counts as the lesser number of double spaces in Python.
IMO it would be ideal if tabs would be displayed as a block, and you could resize the width of that block on the fly <3
> And forbidding it makes a one-keystroke action a two- or four-keystroke one.
The majority of editors can be configured to use tab to insert the appropriate number of spaces. Many will automatically detect the correct configuration.
Your problem, and I mean this sincerely and respectfully, is that you're not using your text editor / IDE correctly. Adding two or four spaces of indentation is done by pressing TAB! Once. Most editors will know how to do this out of the box, but if yours doesn't you need to change it.
You still have to mess around with a bunch of spaces when you're editing or copy/pasting, and not having exact even numbers makes for ambiguous situations.
Especially if something is 5 levels deep, it's really hard to see if you have 12 or 11 spaces (so 5 levels + 1 space or 6 levels) indentation.
My editor, I press tab once and it inserts the correct number of spaces (on a new line it also starts at the previous indentation level as appropriate). I press backspace once and it deletes the correct number of spaces.
Any editor used for programming needs to be capable of this.
I agree with you about YAML's treatment of tabs. I still use YAML because there's often no other choice.
Python is actually flexible in its acceptance of both spaces and tabs for indentation.
Maybe you were thinking of Nim or Zig? Nim apparently supports an unsightly "magic" line for this (`#? replace(sub = "\t", by = " ")`), and Zig now appears to tolerate tabs as long as you don't use `zig fmt`. I haven't used either yet because of the prejudice against tabs, but Zig is starting to look more palatable.
> I agree with you about YAML's treatment of tabs. I still use YAML because there's often no other choice.
True, I'm using it too when I have no other choice.
> Python is actually flexible in its acceptance of both spaces and tabs for indentation.
True but it does give constant warnings then which is annoying. And I was worried about it dropping support in the future so I didn't want to waste time learning it.
> forbidding it makes a one-keystroke action a two- or four-keystroke one.
Not if your editor can be configured to interpret a Tab keypress as the appropriate number of spaces. AFAIK all common text editors, at least in the Unix world, do this.
> I still have to mess around with spaces when editing.
Not if your editor automatically indents and dedents with spaces. I find that to work just fine when editing Python code, for example. Tab is interpreted as "indent" and Backspace if you're at an indent stop is interpreted as "dedent".
It does not. You still have to mess around with a bunch of spaces when you're editing or copy/pasting, and not having exact even numbers makes for ambiguous situations.
I use tab and shift tab on intellij and vim on insert mode. Outside of insert mode I use "<<" and ">>". I am on vim mode on intellij too. What editor are you using?
My personal favorite was when my company switched to configuring Jenkins in YAML, with some of the config being in YAML proper and other config being in Groovy embedded inside of multiline strings. Since it's Jenkins, the Groovy itself embeds multiline strings for scripts that need to run, so the languages end up nested three levels deep!
The only thing that saves me is IntelliJ's inject-language-in-string feature.
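Sketched from memory, so the YAML keys are approximate (this varies by plugin), but the three layers look something like:

    jobs:
      - script: |
          pipelineJob('build') {
            definition {
              cps {
                script('''
                  node { sh "make build" }
                ''')
              }
            }
          }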
TOML has the inline table syntax with curlies, like JSON, and inline array syntax with brackets, also like JSON. It could support nesting pretty well.
Sadly, it doesn't support line breaks in the inline table syntax, so using inline tables for nesting is a PITA; inline tables are pretty much unusable for anything which doesn't fit within like 80-100 characters. Inline arrays can contain newlines however, so deeply nested arrays works well.
Newlines in inline tables will be coming in TOML 1.1, which will make TOML much better for deeply nested structures. Unfortunately, there will probably be many years until 1.1 is both actually released and well supported across the ecosystem.
And of course, inline tables can't be at the top level of the document, so TOML might still not be the best way to represent a single deeply nested structure.
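For illustration, the same nesting in both spellings (these are alternatives, not one document, since both define server):

    # Inline: must all fit on one line until TOML 1.1
    server = { host = "a.example.com", tls = { cert = "/etc/a.pem" } }

    # Standard tables: fine with depth, but verbose
    [server]
    host = "a.example.com"

    [server.tls]
    cert = "/etc/a.pem"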
Yeah, that's why I prefer ytt over helm syntax. It isn't great syntax, but at least it is aware of what it is doing.
Having said that, yaml has some pretty obvious mistakes. It should have been a lot more prescriptive about data types. Not doing that creates a lot of unneeded confusion and weird bugs.
> I don’t understand why people are still using it
It's a good comparator, there are indeed a lot of similarities, but I never understood why anyone ever used Coffeescript whereas I do think I have a solid understanding of why people use YAML.
It's more like Python than Coffeescript really: it's not just about simplicity & brevity, it's about terminators.
Whitespace-dependent languages are often a pain to format / parse / read in many ways - Python has survived this by the skin of its teeth by being extremely strict about indentation, both in terms of the parser & also community convention. YAML hasn't had this - it remains a mess.
However, both have that very attractive property of not requiring terminators, which can't really be overstated.
> if you really care about conciseness, there’s TOML. Are there any serious advantages of YAML over TOML?
TOML's got some good properties but its handling of structures with a depth > 1 is far from concise, and pretty terrible if I'm honest.
> I never understood why anyone ever used Coffeescript whereas I do think I have a solid understanding of why people use YAML.
When Coffeescript was invented, it was an advancement on top of the awful Javascript standards at the time. It never went anywhere because Javascript caught up, but Coffeescript had a good reason for existing.
Today, Coffeescript is a remnant of old frontends that nobody has bothered transpiling into Javascript yet, but back in the day it was a promising new development.
Coffeescript came with spreads and destructuring, and added string interpolation, just to name a few things. It also added classes and inheritance, the ?. operator, and more.
I suppose you could argue those are just syntactic sugar because they compiled down to ES5, in the same way you can argue that any programming language is syntactic sugar over raw machine code.
I may disagree (_heavily_) with the Pythonesque syntax Coffeescript chose, but it took a while for ES6 to be widely available, and Coffeescript made ES6 features work on most browsers without any additional effort. It's easy to take today's Javascript for granted, but the web was very different back in 2009.
In addition to this: ruby-like classes and "sane"/expected handling of this using fat arrow functions. I've worked with a few developers at the time that considered themselves pure backend/rails developers and didn't (bother to) grok the details around the way this worked in JS.
I distinctly remember lots of var that = this; in JS code back then, which wasn't required anymore when using CoffeeScript.
Class sanity was the major reason I chose it for a project in the early 2010s. I was interacting with the classes in OpenLayers and being able to do so without all those footguns was very welcome.
javascript was never designed to be used like a classic OOP language; that's why jquery won - it was functional, which meant it didn't fight you the way the other libraries did.
javascript is first and foremost functional no matter how hard MS and others have tried to hammer it into a more typical OOP language.
I'm not sure what you mean. You can put functions into objects, you have "this" when you call the functions, you even have prototypes. It seems to me like the language is designed to let you do OOP just fine, and the only thing that was awkward was organizing the code where you define all those functions and the constructor. So they added a sugar keyword for it.
right, it's awkward, so don't do that, be functional instead.
jquery vs mootools/scriptaculous/etc.
jquery won for a reason, it's just flat out a better experience in terms of code specifically because it uses a functional approach in its api rather than an OOP approach.
> right, it's awkward, so don't do that, be functional instead.
I feel like you're just saying that because you like functional code. I'm sure that when they've added syntax to make certain functional things easier to type, you didn't respond "it's awkward, so don't do that, write it in an entirely different way instead".
Regardless of what is "better", which tends to differ based on situation, there was no need for the awkwardness. Classes weren't bad to use, it was just that declaring them had some pointless busywork.
I've used it a moderate amount. But I'm not here to argue about how fluid functional code is, I'm here say that OOP works fine, and making slight changes to improve that experience is good. We don't need to actively discourage OOP by making it awkward.
Especially when you're not dealing with the DOM, sometimes objects work quite well.
The original awkwardness does not show that javascript "was never designed to be used like a classic OOP language".
Nor is it why jquery worked well.
And adding these slight changes is not trying to "hammer" javascript into being "more typical OOP".
saying the words "I'm not sure what you mean" doesn't give you a pass to speak with authority about the effort involved in getting the class keyword into javascript when you're ignorant of the history.
----
edit: But also, let me point something out.
what you're calling "awkwardness with classes" is incorrect. they were _functions_ that you could attach state to, some of that state could, itself, be callable functions. That's a large part of _why_ javascript has prototype inheritance.
javascript was primarily functional with some features that allowed a bit of OOP sprinkled in.
I'm not interested in the effort to get that particular change in, I'm asking for you to elaborate in this broad effort you're implying beyond that. If I misread you, and you're not implying something broader and that's the only change they fought for, then yes it is quite small.
To be extra direct there: I didn't say the effort was small, I said that change was small. You can have a big effort for a small change. So you definitely misread me there.
But when you talked about "hammering" it into a more OOP language, I thought you were talking about big changes or many changes.
> what you're calling "awkwardness with classes" is incorrect. they were _functions_ that you could attach state to, some of that state could, itself, be callable functions. That's a large part of _why_ javascript has prototype inheritance.
Does it matter if the "class" itself is a function or an object or something else entirely? It makes thingies that have the prototype applied and you can do .foo on.
But classes you make with the keyword are still functions, aren't they? So what's the big betrayal?
stop trying to weasel-word your way to being right, people fought MS and largely ignored them for years. There was a time when you didn't use the class keyword because it was non-portable because MS wasn't collaborating with anyone.
But more importantly, this all started because I pointed out that javascript is a functional language.
This remains true, which is why writing functional code in javascript ends up with a better experience, and that's a large part of why jquery won.
Brendan Eich, the creator of javascript, was heavily influenced by Scheme (and Self). Scheme is functional, so I'm not saying anything outlandish here.
> I've never used Self myself, but I believe that JavaScript's extensive use of prototypes came from Self.
> As for Scheme's influence, you need look no further than JS's first-class functions and lexical scoping (okay, so JS doesn't implement full lexical scoping in the way Scheme does, it implements function-level scoping, but still, it's close).
Asking what you meant is not weasel wording, goddamn.
(Some of the distinctions you're making still make no sense to me because you think they're so evident you won't elaborate, but at this point it's definitely not worth the effort.)
I would argue that fat arrow functions really are nothing more than syntactic sugar. I don't know of any place where (x,y) => {} couldn't be replaced by function(x,y){}. I prefer arrow functions myself, but it's a very minor addition.
When you didn't know how this worked, CoffeeScript's fat arrow functions became a life saver when attaching callbacks from inside some object you were writing that probably had an init() method to set up the handlers.
You only needed a .bind(this) in the plain JS version, but it felt like surprisingly few people knew this back then.
Interestingly enough, the current version of CoffeeScript compiles this code into an ES6 arrow function itself, but I think back then they used bind() in the transpiled JS.
>by being extremely strict about indentation, both in terms of the parser & also community convention. YAML hasn't
This is why I created StrictYAML. A lot of the pain of changing YAML goes away if you strictly type it with a schema but you keep the readability.
Counterintuitively that also includes most indentation errors - it's much easier to zero in on the problem if the error was "expecting status code or content on line 334, got response", for instance.
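A minimal sketch of what that looks like in practice; the schema types everything, so "no" stays a string:

    from strictyaml import Map, Seq, Str, load

    schema = Map({"name": Str(), "countries": Seq(Str())})
    config = load("name: demo\ncountries:\n- no\n", schema)
    print(config.data)  # {'name': 'demo', 'countries': ['no']}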
StrictYAML is a great initiative. On the other side of the fence I also love JSON5, for opposite reasons - it's essentially "UnstrictJSON".
JSON5 has achieved a reasonably high level of adoption (though I think it's plateaued & I don't see it ever becoming the standard way people do JSON). Would be great to at least see StrictYAML hit a similar level of adoption though - the network effect is so hard to overcome.
TOML's sections remind me of the directory part of a filename, with the keys as the files.
For the content that belongs in a typical configuration file, this or the INI-style roots are probably the most human-approachable formats. For anything more complex, maybe a database (such as SQLite?) is preferable past application bootstrap?
Reading YAML has the enjoyment of reading a love letter, whereas JSON has the detrimental feeling of a solicitor's email. For writing, YAML is like putting out a draft: you focus only on the meaning, with no care for the form. JSON is like finishing up your thesis against a hard, predefined structure.
People balk at XML, but its verbosity plus DTD allows it to pull tricks which you can't do on other things.
Well, everything has its place, but XML is, I think, very well suited where you need to serialize complex things to a readable file, and verify it while it's being written and read back.
Indeed. I get a lot of value out of my strongly typed XML documents. I generally have code that validates them during writing and after reading. Those who don’t understand XML end up learning why it is verbose when they eventually add all of the features they need to whatever half-baked format they are using.
The 'XML is verbose' argument is exactly analogous to the 'static typing is verbose' argument. JSON is decent, but it quickly breaks down if you want to have any sort of static sanitisation on input data, and the weird `"$schema"` attribute is quite strange. YAML makes no sense whatsoever to me.
XML is by far the most bulletproof human-readable serialisation-deserialisation language there is.
> The 'XML is verbose' argument is exactly analogous to the 'static typing is verbose' argument.
It’s two things: the static typing analog is definitely there but I’d extend the comparison to something like the J2EE framework fetish & user-hostile tools, too. There were so many cases where understanding an XML document required understanding a dozen semi-documented “standards” and since few of the tools actually had competent implementations you were often forced to write long-form namespace references in things like selectors or repeat the same code.
I worked with multiple people who were pretty gung ho about static typing everything, but the constant friction of that self-inflicted toil wore thin over time. I sometimes wonder whether something more in the Rust spirit where the tools are smart enough not to waste your time might be more successful.
I agree. Here in 2024, I hope everyone agrees that types are great.
Static types aren't just verbose, they're clunky. They only work in a perfect world - dynamic types provide the functionality to actually thrive.
> I sometimes wonder whether something more in the Rust spirit where the tools are smart enough not to waste your time might be more successful.
That could help, the problem being XML. You mention the J2EE framework and semi-documented "standards" - the world is rife with bad xml implementations, buggy xml implementations, and bad programmers reading 1 GB xml documents into memory (or programs needing to be re-worked to support a SAX parser).
There's too much baggage at the feet of XML, and the tools that maybe could have helped were always difficult to use/locked behind (absurdly expensive) proprietary paywalls.
JSON started to achieve popularity because as a format, it was relatively un-encumbered. Its biggest tie was to Javascript - if certain tools hadn't been brain-dead about rejecting JSON that wasn't strictly just JSON, it might have achieved same level of type safety as schema-validated XML, without much of the cruft. But that's not what the tools did, and so JSON became a (sort-of) human-readable data-interchange format, with no validation.
So in 2024 we have no good data-interchange formats, just random tools in little niches that make life better in your chosen poison format. We await a Rust of data formats - one with speed, reliability, interoperability, extensibility, and easy-to-use tools/libraries built in.
I think PDML hits a sweet spot. The author didn't set out to recreate XML in a less verbose, more human readable syntax, but pretty much ended up doing so. I'd like to see it mature and gain more widespread adoption.
Agreed. XML is clunky, no doubt, but it's partly that the tools were just clunky.
Having said that, I do like that you can flip between YAML and JSON. If we could do that with XML (attributes vs sub-elements a problem here) it would be much more useful I think.
An XML document without a schema is strictly worse than JSON without a schema. JSON with a schema is strictly better than XML with a schema. XML structure does not map neatly into the data types you actually want to use. You do not want to use a tree of things with string attributes, all over your code. If you do have a schema, the first thing you will want to do is turn your data into native language data types. After that point, the serialization method does not matter anymore, and XML would have just been slower. Designing a schema for XML is also more tedious than for JSON.
> XML structure does not map neatly into the data types you actually want to use.
> After that point, the serialization method does not matter anymore, and XML would have just been slower.
Considering I have mapped 3D objects to (a lot of) C++ objects containing thousands of facets under 12ms incl. parsing, sanity checking, object creation, initialization and cross linking of said objects on last decade's hardware, I disagree with that sentiment.
Regarding your first point, even without a schema, an XML document shows its structure and what it expects. So JSON feels like it's hacked together when compared to XML in terms of structure and expressiveness.
It's fine for serializing dark data where people won't see it, but if eyes need to inspect it, XML is way, way more expressive by nature.
Heck, you even need to hack JSON for comments. C'mon :)
I enjoy JSON for internal stuff and where it does not matter that JSON is not very expressive. JSON Schema is a poor substitute for a proper schema. For anything where I am interfacing with another person or team, I send them a DTD or XSD, which documents the attributes and does not have nonsense like confusing integers and floating point values.
For quick and dirty, I agree about JSON. For serious data interchange, I use XML.
Not to me. I have lots of data exchanging going on where the format is expressed well in XSD and in JSON Schema it is expressed through documentation, code, and a history of angry emails.
JSX stands for JSX. Your definition is something that people just imagine to be true. The React docs do not mention the word XML at all. The “templating” syntax is not XML. It has no defined semantics and does not generally support crucial XML features like namespaces.
Indeed, XML is a decent document language because of the quality of tools available and its power/flexibility. I hate when people use it for config files and other things that are usually human edited where readability is paramount though.
When I first encountered XSLT I seriously thought it was the most ridiculous thing I had ever seen. A frickin' programming language whose syntax was XML.
But then I learned it and I don't think I've ever seen another language that could do what XSLT could do in such a small amount of code. The trick was to treat it like a functional language (I got this advice from someone else and they were absolutely correct). Where most people got into trouble was thinking of it as an imperative language.
Pattern matching expressions is the kool kid on the block, but XSLT had that to the nth degree 20 years ago.
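For anyone who hasn't seen it, a minimal taste of that pattern matching (the element names are invented):

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- Fires for every English-language book, wherever it appears -->
      <xsl:template match="book[@lang='en']">
        <xsl:value-of select="title"/>
      </xsl:template>
    </xsl:stylesheet>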
The problem goes deeper. I can't remember who coined the term, but all "implerative" (imperative declarative) languages share the same issue. I don't care if it's JSON, XML, TOML, or YAML, we shouldn't be interpreting markup/data languages. GitHub actions are a good example of everything wrong with implerative languages.
Use a real programming language, you can always read in JSON/YAML/whatever as configuration. Google zx is a good example of this done right, as is Pulumi.
Kris Nóva said it best: "All config drifts towards Turing completion."
Oh man, i have a similar issue with NixLang. Though i know it's not "implerative". Many days i just want to write Nix in my preferred language. I wish Nix had made a simple JSON-based IO for configuration, because then i could see what the output of something is - and generate the input state from some other language.
Really frustrating. Nix works.. but i just don't see the value, personally. And this is after living on NixOS for ~3 years now, with 4 active Nix deploys in my house.. i just don't like the language.
I'm currently building this (plus more) - the happy path of what you're talking about is almost complete. There are fundamental issues preventing what you're talking about being used as a complete replacement for NixLang: you'd need every possible language installed/available on the builder machine in order to build packages, and lazy evaluation would completely break (merely evaluating all of nixpkgs takes hours). So you do ultimately need a primary language. That being said, for devops-like stuff there is no reason to have that limitation.
I wanted to use Nickel, but it turns out that it can't do everything you'd need it to do to completely replace NixLang. So right now I'm bikeshedding on what to use instead (and desperately trying not to invent something), in other words it's definitely being renamed. Either way there's a bash script in the `test` dir that shows the general concept.
I see Nix as a powerful way to write config files. It is purely functional, so the only thing it does is create a build recipe. That build recipe is then run by other Nix tooling.
A .nix file is either a config file itself or a function that returns a config file or a function. By passing in enough parameters, you get the configuration. I've not seen as clean a way of doing this anywhere else. Guix uses Guile which is a full programming language and can probably have side effects. They use something called G-Expressions which is not quite clear to me.
The problem is (to me) it's entirely obtuse. I can't call the function and get back some configuration - which is insane to me. You have to pass in all sorts of state, and you have a lot of difficulty producing the exact same state as your config in question would see in a real execution. Or at least i do. I even asked on several forums and the answer kept boiling down to "Well, it's just not easy. Sometimes not possible." What's the point of it being functional?
Ie yea, i can load up the Eval and call my config func - but what about the params? Well now i have to generate the params. Some of them might be easy, but some are difficult as hell - and if they differ now executing my func in the Eval is not producing the same output (or failing entirely) as it does when i run it "for real".
Nix in practice felt like all of the problems of imperative languages but wrapped in a nice functional wrapper. It was functional without any of the real benefits of functional - to me.
Eg i can't easily get the same input and pass it into a function to produce the same output. To be able to view a function as a simple slice of functionality that i can inspect, debug, etc. They get access to the entire universe (nixpkgs/etc), a huge stdlib, etc - and you need to recreate all of that if you want to use the function.
The parameters you pass in define your dependencies. For a program to compile it needs the compiler and that is a complicated dependency. One might think that only passing the paths to the dependencies would be enough. That way the inputs could be much simpler indeed. I guess there's room for a simpler Nix.
While i will instantly switch to Nickel for the type system once Nickel is available, i do think Nix could get a lot further by just having better tooling.
Notably error reporting is atrocious, but an interactive debugger would be amazing too. Ie to set a breakpoint and hop into an eval at your breakpoint. Would help immensely.
Still i just can't get behind a dynamic typing for anything remotely complex.. which i would describe Nix as. I have been counting down the days for Nickel.. it's been a long wait.
Nix can read JSON, there's a deserializer as one of the builtins you can call. So you can make a bridge where Nix reads your JSON and does something with it, and you can generate the JSON externally like you want. It's how things like poetry2nix work.
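Roughly like this, if I remember the builtins right (the option path is made up):

    let cfg = builtins.fromJSON (builtins.readFile ./config.json);
    in { services.myapp.settings = cfg; }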
Completely opposite experience for me. I think Nixlang is exceptionally well designed and makes sense for the usecases it wants to cover, and it is exactly what I would expect from a DSL tackling the problems it tackles.
"Implerative" - thank you for this, this is the term I've been searching for to describe the weird blending of the two things.. I immediately Googled it and saw that it has previous uses as well, I would love to know who originated the concept. I see so many times, confusion and arguing about what is imperative and declarative, to the point where I question the value of the terms any longer.
FWIW, I have flirted with my own DSL implementations in a few cases. Certainly, language design is much more complex, but I also felt that once you understand enough of EBNF/parser generators (and some of the simpler alternatives), this is a very powerful option as well.
I'm also pretty against DSLs, although they do rarely have use cases. For an example of why DSLs can be bad, look at Dockerfiles contrasted with Buildah. The former makes tons of assumptions, especially about when to perform layer checkpoints. The latter is just a script in Bash or whatever your language of choice.
For the curious, this might be it:
"I've cracked our marketing code, y'all!
Pulumi: Implerative Appfrastructure" [1]
@funcOfJoe, Joe Duffy: CEO of Pulumi
I've always wondered why we seem to have implemented a whole programming language in yaml or json for so many CI/CD systems rather than just writing quick python scripts to describe the logic of a particular build step, then MAYBE using a JSON or XML file to enumerate the build steps and their order, like:
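(A rough sketch; the step and script names are invented:)

    {
      "steps": [
        { "name": "lint",   "run": "ci/lint.py" },
        { "name": "test",   "run": "ci/test.py" },
        { "name": "deploy", "run": "ci/deploy.py" }
      ]
    }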
Sure, that's orchestration, though. The problem with GHA is the sheer amount of expressive power that it has. If you need to do dynamic stuff then that should be in a "pre-workflow" step, written however/in whatever you please, that emits the actual workflow.
Why shouldn't the python script be the discrete workflow step? It could be mounted on some file system which has checked out the git at a particular commit with a particular tag, then runs whatever tasks are required to validate or deploy the project
For tools that allow configuration in either JSON or Javascript (like eslint), I prefer the JS version. The syntax is similar but has much more flexibility, like being able to use environment variables or add comments.
Pulumi was also a good tool when I was doing kubernetes deployments.
> Are there any serious advantages of YAML over TOML?
Probably not, but you forget YAML came out in 2001 whereas TOML came out in 2013. Neither are spring chickens, but inertia is a hell of a thing. For example, Symfony supports YAML, XML and PHP definitions -- but not TOML. Symfony v2 simply predates TOML, and they never got around to ditching YAML for TOML because it's not worth the bother.
  [section]
  option=value is the way you want it.
  ; And these are comments. That's all.
I don't argue. I use TOML too, but it doesn't change that it's an ini++. You can treat an .ini file as a TOML file (well, maybe comments need some changing, but eh); they're not different things.
Even though TOML has an official spec, I don't think all parsers are up to it, and they may have disagreements between them. It's the same for INI.
You can have "native types" in .ini as well. The difference is you'll be handling them explicitly yourself, and you should do that in defensive programming anway. A config file is a stream of input to your code, and if you don't guard it yourself, you agree what that entails.
If you look at the failure details then most of them are either minor issues about where things like escape characters are/aren't allowed, or about overriding existing tables (previously the spec was ambiguous on that, and I expect that will clear up over time). Note that overview is not entirely fair because it uses the latest (unreleased) version of toml-test where I added quite a few tests.
These kind of imperfections in implementations are of course true for any language, see e.g. YAML: https://matrix.yaml.info – I have no reason to believe it's worse in TOML vs. YAML, XML, JSON, or anywhere else. If anything, it's probably a bit better because it's fairly simple and has a pretty decent test suite.
The problem isn't with the small configuration files, those are just argv put into a file.
Here's an experiment actually worth doing: ask ten people to write an ini file for configuring between 3 and 6 servers where some properties are the same for several servers.
One may write a single value containing a CSV, another may use a convention of namespaced keys, whatever. One may base64, one may urlencode, whatever.
The differences don't change the fact that they will all have the same things in common.
Even without a formal spec, we all know what we are free to change and not free to change, and free to assume and not free to assume. The unwritten spec specifies very little, so what? That means maybe it isn't a good choice for some particular task that wants more structure, but that was not what you said and not what I'm ridiculing.
Or was that all you meant in the first place? That without some more to it to define standardized ways to do things, it's not good for these kinds of jobs? I confess I am focusing on the literal text of the comment as though you were trying to say that the term is not meaningful because it is not defined in a recognized and ratified paper.
My point is indeed that it is not meaningful to speak of the INI culture as something directly comparable to a standardised format.
> One may write a single value containing a CSV, another may use a convention of namespaced keys, whatever. One may base64, one may urlencode, whatever.
> The differences don't change the fact that they will all have the same things in common.
I think this is the first time I've seen this sort of neo-romantic argument, where the representation of information is claimed to be irrelevant because, for some unspecified reason, we all know in our hearts what is being said.
Is this a mystical theory you've built on extensively, or something that came to you from the aether just now?
That's all any communication is: two or more parties using symbols whose meanings a majority agrees on. It does not require a dictionary.
I refer back to the simple fact that the original commenter felt it reasonable to speak the words, believing that others had the same idea what the words meant as they themselves did, and to the fact that I and others did in fact have that same understanding. That means it is utterly silly to be trying to say that the term has no meaning. Does everyone else have telepathy and only poor you are left out of the club? It's even silly to claim that merely you individually just don't know what the term meant, if you would claim to work in any remotely IT related field.
It basically looks like an attempt to look smart backfiring badly.
The reason the things the term doesn't define don't matter is the same reason as for all other terms or symbols. No term is a complete description of anything. It defines what it defines and does not define anything else.
When you say "XML", you still have not said an infinity of things. XML merely defines more than INI. INI defines a certain structure, and you are free to do whatever you want within that structure, exactly like XML and all other formats & protocols.
If they defined everything, then they wouldn't be general purpose frameworks for packaging data, they would be snapshots of specific particular data. In fact they would not even be snapshots but one specific physical instance taking one specific form as it exist in one place at one time somewhere.
There is no way you don't already know all of this, I absolutely credit you with having this much understanding of how symbols work, which makes your argument disingenuous.
If you didn't and your argument was sincere, then you are embarrassingly illiterate for trying to partake in a conversation in this area. Not a crime to be that ignorant, and if so then I apologize for ridiculing a 6 year old who somehow found their way onto HN, but consider yourself now better informed than you were: a ratified rfc or iso for INI, or any other term, is not required for a term to be valid communication. All that's required is for speaker and listener to both understand it, and such definitions are merely one of many ways for a term to have meaning and for all involved parties to have that mutual understanding.
Wait, I suppose I have to explain what rfc and iso and ieee all mean in this context. Anyone who did not know what .ini meant surely can not recognize any of those either.
How the software operates is of course dependent on the expressiveness of the configuration format, so it is clearly false in most practical senses to claim that the flat key-value format of INI and BICF will generate the same set of problems as formats that allows for list and nesting.
If we accept the assertion that the complexity of a configuration file for the stated scenario is constant across all configuration formats, we will next be asserting that there's no difference in complexity between solutions in x86 assembly and LISP.
You stated a problem: Configure ~6 servers where they share variables.
I can implement it in a plethora of ways. The most sensible one for me is to have a general or globals or defaults area where every server overrides some part of these defaults. The file format has nothing to do with the sectional organization of a configuration file, because none of the formats force you into a distinct section organization.
e.g.: Nesting is just a tool, I don't care about its availability. I don't guarantee that I'll be using it even if it's available.
I can write equally backwards and esoteric configuration file in any syntax. Their ultimate expressiveness doesn't change at the end of the day.
I don't care. They can all do whatever I want and need; it only changes how you parse and map. It's hashmaps, parsing and string matching at the end of the day.
If you know both languages equally well, LISP becomes as complex as x86 assembly and x86 assembly becomes as easy as LISP. Depends on your perspective and priorities.
If you don't know how to use the tool you have at hand, even though it's the simplest possible, you blow your foot off.
> Why is “on” a boolean literal (of course so are “true”, “false”, as well as “yes”, “no”, “y”, “n”, “off”, and all capitalized and uppercase variants)?
The YAML 1.2 spec removed “no” as a synonym for false. That arguably just made that entire problem worse, and even though it’s been almost 15 years, YAML 1.1 is still the commonly used variant.
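Which leaves the same document meaning different things depending on which spec your parser implements:

    debug: no    # YAML 1.1 parsers: the boolean false; YAML 1.2 parsers: the string "no"
    power: off   # same split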
It wasn’t obvious to me. I read it as the literal string “Norway” being parsed as false, which didn’t sound believable but I didn’t make the connection to NO at all.
In general I am always confused that it lets you use strings unquoted, which is what allows for all these issues with ambiguity of the interpreted data type, Norway problem and all that.
It also just looks odd to me, I don't see why it's necessary to allow this.
It’s great for end users who don’t understand what a string is or don’t have to play the game of finding the hanging single quote when they write the file by hand in a textarea.
On the opposite end of UX, there’s hand written JSON which is just too meticulous in some scenarios when people are writing config without editor support.
That’s probably a good thing for end users but if it’s running on something that affects the live service I’d rather not have people edit the config who don’t know what a string is
Dealing with inline quotes is annoying, but if you care about users writing things by hand, and especially in a textarea, you should not be using a format that depends on indentation.
YAML is older and more well supported. I'll explain to you why I ended up choosing YAML for the config files for a CLI utility written in Python that I maintain.
I initially chose TOML for many of the reasons mentioned here but before my first release I ended up switching to YAML. Python added support for reading TOML to the standard library in version 3.11, however it still requires you use an external library for writing. Do I use the built in library for reading and an external library for writing? A chunk of my users are on versions of Python older than 3.11 (generally Windows users who installed Python manually at some point), do I import a separate library for THEM to read the files but use the standard library if ver >= 3.11?
Now that I look at the state of things today I probably would add the tomlkit library to my setup file, but that wasn't very mature at the time, so I just used pyyaml. Changing it now would break compatibility with my older versions that use yaml config files, unless I maintained both paths... which I could do but it's just another source of complexity to worry about. These are relatively simple config files the user has to interact with manually so yaml works fine and I don't see any reason to change at this point.
I have seen this post on HN before, and it wasn't received very well AFAIR.
But I can't help agreeing with its main point: so much complexity to support a few basic data types that are not sufficient for anything complex anyway.
If you haven't checked it out, NestedText is a great format that offers no handling of types beyond string/list/dict, leaving all that to the application reading in the values.
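From memory it looks roughly like this; every leaf stays a literal string ("no" is "no", "80" is "80") until the application says otherwise:

    debug: no
    ports:
        - 80
        - 443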
I'm still using CoffeeScript whenever I can. It has one of the nicest syntaxes out there, a lot of code fits to one screenful, the logic of the code is easier to see without the clutter of unnecessary syntax and it's a joy to write too.
YAML is probably used for similar reasons.
I don't understand why people want redundant verbose syntax that makes reading and writing code harder. And sadly don't anymore expect anyone to really explain it based on anything tangible.
I'm glad I'm not the only one. I prototyped an SPA recently with mithril.js and CoffeeScript and I think there's really something magical about that combo.
Oddly enough I can't stand writing python or js. I do almost all of my actual programming in Rust, because I adore the type system.
YAML is an amazing config language for simple to mildly complex configs. It's easier to read and write than JSON, and it only really breaks apart when you're heavily deviating from nested lists/dictionaries with string values. People use it everywhere because by the time it becomes painful you're already so invested it's not really worth the hassle of switching.
It’s aesthetically pleasing for simple configs. I’m so used to writing JSON by hand by now I don’t find it much easier. At least I never have to think about how a value is going to be interpreted from a JSON since it has a decent subset of types and I can visually tell what it is
I, on the other hand, find it much harder to read and write even in very simple configs. I never know what the indent is supposed to be, I just press my spacebar until my editor stops complaining. I find it really hard to tell if a line is a new entry or a subset of the parent entry.
I'm sure if I used it more it'd become easier, but my whole team doesn't understand it either. Luckily we only need it for GitHub configurations.
YAML is (vaguely) a superset of JSON, so you can just use JSON (without tabs) and get your life back.
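Concretely, this is a valid .yaml file, and the quoting sidesteps the no/false trap:

    {"countries": ["se", "dk", "no"], "enabled": true}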
I don’t need a config language with no fewer than 6 subtly different ways of decoding a string to remember, and certainly not one with a spec longer than C’s. Compare to JSON’s, which (famously) fits on a postcard.
Until you find a snippet of config you want to copy into your `application.yml` in Spring or Quarkus (Java frameworks). If it doesn't paste in cleanly (and it rarely ever does) you'll need to go research the schema and find out where to put things. Meanwhile, if you're using a normal `application.properties` file, after you've finished pasting, you can go on with your life.
I can't find any JSON5 parser that isn't for JavaScript. I've started writing one in C that can then bind to other languages, but it takes time to write!
I think YAML is for code what Markdown is for text: it is easy to read and _can_ produce the same or equivalent output as more strict and extensive languages. Easy readability makes this tradeoff acceptable for most.
HJson https://hjson.github.io seems a nice 'in-between' between YAML and JSON without the indentation-based syntax, so closer to the JSON side but with comments and less quotes.
What I don't really get is why the cloud providers / tooling implementors have never drafted up a "YAML-light" that just throws out the rarely-used headache-inducing syntax elements.
Right, I also don't understand why it's considered a feature of many of these languages to introduce so many ways of doing the same thing. Like the boolean example, but also having three different ways to express a list or dictionary? It's the classic Robustness principle which makes it less robust, making reading and parsing more complicated. How about just allowing one syntax and error if it's not according to spec.
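For instance, all three of these spell a mapping entry of the same shape:

    limits:             # block style
      cpu: 2
    limits2: {cpu: 2}   # flow style
    ? limits3           # explicit-key style
    : {cpu: 2}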
I've had few to no issues when using YAML for docker-compose.yml files. This isn't to say that use of YAML can't be problematic, but I don't believe it's necessarily bad at all for configuration.
> So, why? In JSON, you can add templating super easily by turning it into regular JavaScript: use global variables, functions and the like. I don’t understand how anyone could prefer YAML with an ugly templating DSL over that.
That's a valid use case when the target user is the software developer themself, but access to the language runtime is not something that should be accessible to a technical but non-maintainer user. Granted, it's plausible that a "template" JSON can be defined, which would be spread over a JSON-formatted configuration, but what YAML allows the user to do is define "templates" within the configuration itself and control where those template structures are extended.
When the user is a developer maintaining a software project, they should probably just use JavaScript for configuration, and not JSON files, except when there's a possibility that the configuration can be intercepted.
> Why is “on” a boolean literal (of course so are “true”, “false”, as well as “yes”, “no”, “y”, “n”, “off”, and all capitalized and uppercase variants)?
”on”, ”off”, ”yes”, ”no”, “y”, and ”n”, and case variants thereof, are not boolean literals in YAML since YAML 1.2 (2009).
I guess the real mystery is why so many tech types speak like an infant having a tantrum about some esoteric trivia, and then have hordes of their kind come and vigorously head-nod it, and all involved think virtue is being done.
People started using things like YAML, obviously, because it reads closer to natural language. It's like a nested bullet list, which everyone can easily read. Readability is important to people. It's why we don't all still write C and Perl.
So it's one thing to say "I think people should be careful about prioritizing readability over precision especially for production systems". It's another to do this narcissistic dramatic faux-incomprehension implying the markup language gained the popularity it did because everyone's stupider than you.
> I guess the real mystery is why so many tech types speak like an infant having a tantrum about some esoteric trivia, and then have hordes of their kind come and vigorously head-nod it, and all involved think virtue is being done.
Ha, great line. And you caught me mid-tantrum and mid-head nod. :)
Having on and no both be Boolean literals, but of opposite values, sounds like a horrible decision: a typo doesn't result in a syntax error, but instead in a completely wrong semantic misconfiguration.
XML also has some other issues (no typing, too many ways to have maps but none seems to be the correct way, etc.).
JSON just isn't meant to be written by humans (no comments).
But YAML is just horrible; accidental mistyping issues like NO => false are just not acceptable IMHO. That it's a pretty complex thing doesn't help either.
I honestly don't understand why we (e.g. github actions) still use YAML for new thinks even knowing all the issues especially if we, there are many other well suited decent but less wide spread alternatives.
> But YAML is so far in the opposite direction, we get the same surprise conversions we’ve had in Excel
This is optional. Besides using a better parser that implements the newer spec, which long ago fixed a lot of the issues listed in the article, another way to avoid the problem is adding a bit more verbosity, i.e. quoting your strings (which would still be less verbose than XML or JSON).
You don't have this option in XML/JSON: you can't strip out all that useless markup and keep it only where it's useful.
Strongly agree. I came to that conclusion before k8s even existed, because I myself thought about using it as a configuration file format, and the second I started noticing some of the ambiguity in its syntax I walked away from it.
The only thing I disagree with is the idea that CoffeeScript is still useful. I had the same reaction to CoffeeScript that I had to YAML: CoffeeScript _never_ had any real point beyond a segment of people preferring to write JavaScript in Ruby syntax. Its biggest issue was that debugging meant reading through the generated JavaScript anyway, so you never really got away from JavaScript.
I'm a fan of either using a full-blown programming language or INI files. Yes, I realize that seems insane to many people, but at the end of the day INI files are stupidly easy to edit, and if you can get away with not needing a full-blown Turing-complete language, then convention-based INI files are vastly easier on the human than YAML or JSON.
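For what it's worth, Python's standard library already covers the simple case (a minimal sketch; the section and key names are made up):

import configparser

cp = configparser.ConfigParser()
cp.read_string("""
[server]
port = 8080
host = localhost

[logging]
level = debug
""")

print(cp.getint("server", "port"))  # 8080 -- typed accessors come for free
print(cp.get("logging", "level"))   # debug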
I'm either a greybeard that never got with the times or I'm a rebel, probably depends on who you talk to.
> I'm a fan of either using a full-blown programming language or INI files
How do you persist complex multi-object state? Think nested lists of objects with references to one another.
If your answer is still "INI files", I'm sure it can be done, but only with a lot of custom-rolled code... XML/JSON (even YAML), for all their issues, provide a code-free way of persisting all this, whether through marshalling (XML) or json/yaml.load().
you cut off the part of my statement that answers your question
> if you can get away with not needing a full-blown Turing-complete language, then convention-based INI files are vastly easier on the human than YAML or JSON
My claim isn't that INI files solve every use case; it's that if your needs are simple enough, INI files are superior to JSON/YAML, and that full-blown Turing-complete languages are superior to everything else.
Also, if you're saving complex object state you don't have a configuration format but a serialization format, and INI is definitely not good for that.
> you don't have a configuration format but a serialization format
While I better appreciate what you are saying now (you don't have a solution), the only appreciable difference between "config" and "serialization" is that of write frequency - config is seldom updated, serialization is often updated.
Otherwise, they are the same problem with the same solution - you might provision resources differently based upon "dynamic" vs "static" data, but that's an operational perspective. From the perspective of the application maintainer, there is no difference.
I'm going to submit that if you think configuration and serialization are the same problem, it's time to step back and re-evaluate what you're doing, which is really the author's point.
As Joel Spolsky said years ago, if you abstract far enough up, everything starts to look the same, but that doesn't make it so.
At the end of the day you could claim that all data exchange is exactly the same, and indeed Claude Shannon showed all information is just data, but that misses the point entirely. All humans are exactly the same and yet sex between them can look vastly different based upon such details as genitals.
> if you think configuration and serialization are the same problem
Except that's precisely not the point... rather, the formats they are written in are the same; they are indistinguishable.
Reductio ad absurdum: if all data exchange is the same, then there is no benefit to any format; just write binary strings with null terminators. Except there are many downsides to that approach, so it turns out they are not the same...
And nevertheless, if all configuration were not serialization, there would not be any need to generate config via a different language per the OP's post...
So we find the similarities between configuration and serialization more pertinent than their dissimilarities with respect to format.
INI is absurd for any complex configuration, and "just use a Turing-complete language" is as good an answer as deciding to write random binary data...
oh look, the internet denizen was able to weave their way through a rationalization, that's certainly never been done before!
What makes it even more absurd is that we do, in fact, have binary serialization protocols and they're very popular especially amongst companies dealing with scale.
> Values in Cap'n Proto messages are represented in binary, as opposed to text encoding used by "human-readable" formats such as JSON or XML. Cap'n Proto tries to make the storage/network protocol appropriate as an in-memory format, so that no translation step is needed when reading data into memory or writing data out of memory.
---
But that's actually the fucking point: serialization only looks the same as configuration if you've gone too far up the abstraction ladder and lost your perspective, and that _is_ the point of TFA. At some point you need to stop and ask whether what you're doing is really the right approach.
You've destroyed your _own_ point with your long-winded, weaving, rationalization.
And to top it all off, you've strawmanned a point I've already clarified, which makes you look like an asshole. I've never claimed INI works well for complex configuration; I said the opposite, in fact.
> oh look, the internet denizen was able to weave their way through a rationalization, that's certainly never been done before!
Yes, I gathered a couple replies ago you weren't interested in meaningful discussion...and probably hadn't even read anything I'd said.
> You've destroyed your _own_ point
What point? I asked you a question. You continually divert and misdirect.
Now the only point of contention I have left with you: configuration and serialization are the same thing at the format layer. You mumble some nonsense about an abstraction ladder, but the truth is you're the one climbing it. The difference between them only appears at higher levels of abstraction.
> I've never claimed ini works well for complex configuration
And yet you never made a claim about what does work well. This is precisely why JSON/YAML are popular and most people ditched INI: people don't care about your higher-order abstractions, they just want a format that gets the job done and doesn't get in their way.
> I guess XML and JSON are too verbose. But YAML is so far in the opposite direction, (...)
YAML is a far better format in terms of being human-readable and editable, and it supports features such as node labels and repeated nodes (anchors and aliases) that turn into killer features when onboarding YAML parsers into applications.
YAML is fine if you don't do weird stuff with it (and avoid some stupidity like the Norway problem). A good example is OpenAPI schemas, which are quite legible in YAML.
TOML has some nasty edge cases too, like top-level arrays, arrays of objects under a key, etc.
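For instance, a bare top-level array can't be written in TOML at all, and an array of objects under a key requires the array-of-tables syntax (a sketch using Python 3.11's stdlib tomllib; the names are made up):

import tomllib  # stdlib since Python 3.11

doc = """
# A bare top-level array like JSON's [1, 2, 3] cannot be expressed;
# everything must hang off a key. Arrays of objects need [[...]] tables:
[[servers]]
name = "alpha"
port = 8001

[[servers]]
name = "beta"
port = 8002
"""

data = tomllib.loads(doc)
assert data["servers"][1]["port"] == 8002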
StrictYAML is great (and the author is in these comments!), but ultimately it's one specific library, not a format spec, so to depend on it for a project you need every person/tool doing the writing/parsing to commit to use that library (and the programming language it was written for).
Again, it's a great project, but I wanted something similar that is a language-agnostic format specification, so moved on to using NestedText wherever I can.
I've asked you similar in the past in comments on this site, but: what do you find lacking in the NestedText spec that a new YAML-like format might do better? Why not just embrace NestedText for the task?
We're still using the CoffeeScript of JSON because YAML's UX improvements haven't been brought into the upstream JSON spec the way CoffeeScript's UX improvements were brought into JavaScript.
CoffeeScript is the worst thing that ever happened to the software industry.
CoffeeScript fooled developers into thinking that transpilation was free and had absolutely no downsides whatsoever. The advantages of CoffeeScript over JavaScript were so incredibly marginal. I've never heard a single good argument about why it was worth adding a transpilation step and all the complexity that came with it.
I think even TypeScript isn't worth transpilation step and bundling complexity these days, especially not when modern browsers allow you to efficiently preload scripts as modules and bypass bundling entirely.
As for YAML: it's also not worth it, though it's not quite as infuriating as CoffeeScript. The advantage of JSON is that it's equally human-friendly and software-friendly. YAML leans more towards human-friendliness and sacrifices software-friendliness. For instance, you can't cleanly express YAML on a single line to pass to a bash command the way you can with JSON. It's just one additional format to learn and think about, and it doesn't add much value. Its utility does not justify its existence.
The problem is very specifically the fact that YAML, as a config language, sucks.
I have no idea why people started using it. "bUt jSOn dOeSn'T HaVe cOmMenTS" ... oh gimme a break! You want a comment in JSON?
{
"//": "This is a comment explaining key1.",
"key1": "value1",
"//": "This is a comment explaining key2.",
"key2": "value2"
}
There. Not so hard. Writing a config parser that just ignores all keys starting with "//" is trivially easy... if it's even necessary to ignore them, because most config parsers I have seen couldn't care less about unknown keys, let alone repeated ones.
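Something like this, say (a minimal Python sketch; note that standard parsers keep only the last of the duplicated "//" keys above, which is fine since they get dropped anyway):

import json

raw = """{
    "//": "This is a comment explaining key1.",
    "key1": "value1",
    "//": "This is a comment explaining key2.",
    "key2": "value2"
}"""

# json.loads silently keeps only the last duplicate "//" key, and the
# comprehension then drops every comment key:
config = {k: v for k, v in json.loads(raw).items() if not k.startswith("//")}
assert config == {"key1": "value1", "key2": "value2"}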
So what other "reasons" were there for YAML?
Oh, the human readability thing. Yeah. Because syntactically relevant whitespace is such a joy in a data serialization format. It's bad enough when a programming language does that (and I am saying this as someone who likes Python), but a serialization format? Who thought that would make things easier?
And then of course there are other things filled with joy and happiness...like the multiple ways to write "true" and "false", because that's absolutely necessary for some reason.
"Oh but what about strict-yaml?!" I hear the apologies coming...great, so now I have the ambiguitiy of what parser is used on top of the difficulties introduced by the language itself. Amazing stuff. If that's the solution, then I'd rather not have the problem (aka. the language).
But despite all[1] these[2] problems[3] and more, YAML somehow became the go-to language for configuring pretty much everything in DevOps, first in containerization, then in cloud, and everything in between. And as a result, we now have to make sure our config template parsers get whitespace right. Great.
So bottom line: the problem here is maybe 1/3 the complexity of config files and 2/3 the fact that YAML should never have been used as a configuration format in the first place. Its benefits are too small, and its quirks cause too many problems for that role, outside of really trivial stuff like a throwaway docker-compose file.
Want config? Use JSON. And if you need something more "human friendly", use TOML.
That's fine, you can pick whatever XML ugliness you like. I was just pointing out that you can't solve JSON's basic failure to support comments by making them data.
No, you were pointing out that some people won't like that solution. Which is completely fine. And yes, that solution is ugly as hell, and it is a dirty hack, and I don't recommend actually doing that if there is a better way (like TOML).
> ugly as hell, and it is a dirty hack, and I don't recommend actually doing that
You're just paraphrasing the obvious - it's not a solution. And YAML is a better way: it addresses the very problem you choose to ignore - ugliness.
> And YAML is a better way and addresses the problem you choose to ignore
I haven't ignored it, as shown by us discussing it here.
And no, YAML isn't a better way. It does commenting better, true, and at the same time, it does so many things wrong, that are absolutely no problem in JSON, that it becomes a treatment worse than the disease.
You've argued that an ugly thing is a solution while the problem that needs solving is ugliness. You can't have it both ways: either it's a solution, in which case don't call it ugly as hell, or it's ugly, in which case it's not a solution (or it is, but then you're ignoring that ugliness is the problem).
This specific brand of cleanliness does have a bunch of issues, but that's a trade-off depending on a use case
I think I speak for basically the entire history of programming when I say that ugly solutions exist and are widely used throughout our profession.
> while the problem that needs to be solved is ugliness
That's your opinion. I never argued that this hack solves ugliness. I argued that it solves the problem of not having a way to comment in JSON...which it does. I never claimed that it does so in a non-ugly or even good way.
> This specific brand of cleanliness does have a bunch of issues
Issues so massive they, IMHO, make it not worthwhile to use. As in, I'd rather miss the ability to write comments, or write them with ugly hacks, than put up with YAML's nonsense.
The fact that better solutions exist (again, like TOML) makes this worse.
You just persist in failing to understand the issue
> I think I speak for basically the entire history of programming when I say that ugly solutions exist and are widely used throughout our profession.
I can raise it to the level of the entire history of humanity when I say that people yearn for cleanliness and beauty
> I never argued that this hack solves ugliness.
Sure, because "you choose to ignore [the problem of ugliness] as such". People who share "my opinion" don't ignore it; they use cleaner formats instead. So YAML is a solution that addresses the problem; your suggestion doesn't.
So true. I abhor YAML. It's impossible to know the correct indentation without a plugin that shows it, which isn't available in many of the places where you edit YAML. It's whitespace-sensitive. The data types are not obvious. It's just all-around bad.
I love JSON. It's explicit and easy to read. We should just be using JSON for everything that needs to be human-readable.
Preferably written in assembler, to avoid the extra complexity of a compiler, right?
Configuration files have been a common feature of software basically for as long as OSes have existed. They serve a clear and useful purpose, even though they create some problems of their own.
For complex environments like those discussed in the article, there’s unavoidably complicated logic.
Code is a good place for logic to live.
Compared to YAML, code is more testable, readable, and expressive.
I should’ve restricted my original comment to the kind of situation in the article where different configs are created for various regions and test environments with optional values. Totally agree configs are useful for defining more static values.
Restricting config to static values removes quite a bit of the value of config, in my opinion.
Yes, logic should live in code, but very often that logic needs to behave differently depending on some piece of (inherently variable, not static) configuration.
Random examples (written from the perspective of personified code; a small sketch follows the list):
- How many threads should I use?
- On which port should I serve metrics?
- Which retry strategy should I use?
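A minimal sketch of that split (Python; the file name, keys, and defaults are all made up): the logic stays in code, the variable bits come from config.

import json
import os

# Hypothetical config.json holds only the variable bits; defaults live in code.
defaults = {
    "threads": os.cpu_count(),
    "metrics_port": 9090,
    "retry_strategy": "exponential",
}

with open("config.json") as f:
    config = {**defaults, **json.load(f)}

print(config["threads"], config["metrics_port"], config["retry_strategy"])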
By "more static", I meant items with only a handful of variations.
If you're using one port for dev & another for prod I reckon it's best to have it in config.
But if your port varies by image, region, and dev/test/prod status, and has exceptions for customers running your app on-prem, then keeping all that logic in code may be easier.