Why are we templating YAML? (2019) (leebriggs.co.uk)
433 points by olestr 11 months ago | 654 comments



I'm completely done with configs written in YAML. Easily the worst part of GitHub Actions, even worse than the reliability. When I see some cool tool require a YAML file for config, I immediately get hit with a wave of apprehension. These same feelings extend to other proprietary config languages like HCL for Terraform, ASL for AWS Step Functions, etc. It's fine that you want a declarative API, but let me generate my declaration programmatically.

Config declared in and generated by code has been a superior experience. It's one of the things that AWS CDK got absolutely right. My config and declarative definition of my cloud infra is all written in a typesafe language with great IDE support, without the need for random plugins that some rando wrote and hasn't updated in two years.


At this point, I even prefer plain JSON to YAML. What pushed me over the edge is that "deno fmt" comes with a JSON formatter, but not a YAML formatter. It's a single binary that runs in milliseconds. For YAML auto-formatting you basically have to use Prettier, and Prettier depends on half of NPM and takes a good 2 seconds to start up and run. So I literally converted every YAML file in our repository at work that could be JSON to JSON, and I think everyone has been much happier. Or at least I have been, and nobody has complained to me about it.

Various editors also support a $schema tag in the JSON. I added this feature to our product (which has a flow that invokes your editor on a JSON file), and it works great. You can just press tab and make a config file without reading the docs. Truly wonderful.
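
For example, here's generating a config with a top-level "$schema" key from Python (a minimal sketch; the schema URL is made up, and editors like VS Code pick the key up automatically):

    import json
    import sys

    # Hypothetical schema URL; editors read the top-level "$schema" key
    # and offer tab completion and validation based on it.
    json.dump(
        {
            "$schema": "https://example.com/schemas/myapp.config.schema.json",
            "region": "us-east-1",
            "retries": 3,
        },
        sys.stdout,
        indent=2,
    )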

YAML has this too with the YAML language server, but you need your tab key for indentation, so the ergonomics are pretty un-fun. JSON isn't perfect, but at least the text "no" stays a string.


At work we're currently expanding to another country, which means that many services now need a country label etc. That's fun when the label you're adding to all our existing services is "no". Luckily it's quick to catch, but man... why?


Yeah, I'm pretty sure there are exactly two substantive problems with JSON for (static) configuration file use cases: comments and multiline strings (especially with sane handling of indentation). YAML fixes these, but it adds so much complexity in the process, including the predictable footgun of unquoted strings (the no/false problem is particularly glaring/absurd, but it's also easy to forget to quote boolean-looking values or numbers in a long list of other strings).
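
To make the footgun concrete, here's what PyYAML (which implements YAML 1.1 scalar resolution; YAML 1.2 parsers keep "no" as a string) does with a plausible-looking document:

    import yaml  # PyYAML; implements YAML 1.1 resolution rules

    doc = """
    countries:
      - se
      - no       # parsed as the boolean False, not the string "no"
      - "dk"
    version: 1.20  # parsed as the float 1.2; the trailing zero is gone
    """
    print(yaml.safe_load(doc))
    # {'countries': ['se', False, 'dk'], 'version': 1.2}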


Solutions abound, but one option is to use either JavaScript (config.js):

    // comments!
    ({
       no_quotes: [1, 2, (() => 1 /* code! */)()],
       ...
     })

Or, let the whole thing be a function. Then your config can have parameters, maybe mapped from environment variables or something.

    ({foo, bar, ...kwargs}) => ({
      datacenter: foo === 'old' ? 'useast1' : 'uswest',
      ...
     })
You can do as Crockford says and write in the JSON subset of JavaScript, but with comments, and convert it to JSON by running it through a JS minifier. I think you need parens around the object, though, or else it looks like a code block (boooo...):

    ({
      // This is very much like JSON
      "foo": [1, 2, "bar"]
     })
Python also has JSON-like syntax, so you could use config.py:

    {
      'foo': [1, 2, 'bar']
    }
That would require a wrapper script. Or, you can have the self-contained convention:

    import json
    import sys
    
    # Yay, comments!
    json.dump({
      # more comments!
      'foo': [1, 2, 'bar']
    }, sys.stdout)


Yeah, I'm mostly just not sure I want to put a full programming language interpreter in my application, especially Python, which is not designed to be embeddable. Moreover, I would really want something that is typed, like TypeScript, but libraries for embedded TypeScript interpreters are even rarer :/.


I figure we'd leave it to the build/deploy/CI/development whatever system. I also don't want to extend or embed my application with a full-blown runtime if I don't have to.

"source" config --> convert to JSON config "on the fly" --> app that expects JSON

edit: I worked on a C++ team that used `std::system` to invoke the system Python interpreter when loading a config file. My teammates weren't morons, either; it's just the simplest thing that worked, and they knew that the config script and the surrounding file system were secure.


Or you can produce configuration with Nix.


Can I add "trailing commas are invalid" to the list?


Please do.


json5 is pretty good, if you can use it


AFAIK Prettier has 0 dependencies and runs fast enough that triggering formatting on save was never noticeable (granted, I've never tried it with YAML specifically). Curious what kind of setup you had that pushed it to 2 seconds - maybe bulk formatting the whole repository in CI?


I prefer JSON to YAML as well. The lack of comments is a problem though. But I feel like this is a false dichotomy. Both kind of suck for this need, but I can accept that JSON is at least reasonable to work with if you need language agnostic config.


An often-heard argument for using YAML is that JSON does not have comments. What I don't understand is why that justifies switching to a whole new language. Just add a filter that strips comments before loading the configuration, which can't be harder than switching to YAML, right?
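
A minimal sketch of such a filter (hand-rolled, not a full JSONC parser; it strips // and /* */ comments while leaving string contents alone):

    import json

    def strip_json_comments(text: str) -> str:
        """Remove // line comments and /* block */ comments outside strings."""
        out, i, n = [], 0, len(text)
        in_string = False
        while i < n:
            c = text[i]
            if in_string:
                out.append(c)
                if c == "\\" and i + 1 < n:  # keep escaped characters intact
                    out.append(text[i + 1])
                    i += 1
                elif c == '"':
                    in_string = False
            elif c == '"':
                in_string = True
                out.append(c)
            elif text[i:i + 2] == "//":      # line comment: skip to end of line
                while i < n and text[i] != "\n":
                    i += 1
                continue
            elif text[i:i + 2] == "/*":      # block comment: skip past "*/"
                end = text.find("*/", i + 2)
                i = n if end < 0 else end + 2
                continue
            else:
                out.append(c)
            i += 1
        return "".join(out)

    config = json.loads(strip_json_comments('{"port": 8080 /* dev */} // done'))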

Another argument for YAML is that it is easier to read. That I don't understand either. The endless pain of dealing with configuration doesn't come from spending a few extra seconds parsing braces and brackets; it comes from not being able to easily figure out what went wrong, especially when what's wrong is a missing space or tab buried in hundreds of lines of configuration.


I like json a lot.

That said, I think json would benefit from only two things:

1) comments

2) allow extra commas, like ["a", "b", "c",] or {"a":"b", "c":"d", }

or more properly:

  {
    "a":"b",
    "c":"d",
  },
EDIT: and json5 does both, plus a few more niceties. (hmm. too much?)
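
For what it's worth, the third-party Python json5 package (pip install json5; not in the stdlib) accepts exactly those extensions:

    import json5  # third-party package, not in the stdlib

    cfg = json5.loads("""
    {
      // comments!
      "a": "b",
      "c": ["d", "e",],  // trailing commas are fine too
    }
    """)
    print(cfg)  # {'a': 'b', 'c': ['d', 'e']}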



Just add another key called "comment". Problem solved.


This is not always an option when JSON is propagated as is, nor does it allow for comments on specific object properties.


Not sure why people don't just settle on TOML.


It has atrocious arrays. Example: https://youtu.be/n9mGk8_tQtM?t=367


GitHub Actions would suck whatever you "configured" them in, because you are trying to describe a program in a data structure.

Ansible makes the same mistake, as do countless other tools.


"because you are trying to describe a program in a data structure"

(cries in lisp)


The best interpretation of weebull's comment is not that describing a program in a data structure is "bad" per se, but that doing that in a configuration language (or requiring configuration constructs to be programming constructs) might not be a hot idea.

Even Lisp software that uses Lisp for configuration does not necessarily allow programming in that configuration notation.


Yeah, I think describing a program in a data structure is fine. I honestly prefer it to any syntax that a "real" programming language has brought me. It's so consistent, and you can really focus on what you care about. What is unhappy about GitHub Actions and similar is that your programming language has like 2 keywords: "download a container" and "run a shell script". I would have preferred starting with "func", "handle this error", and "retry this operation if the error is type Foo" ;)

Since this article is about Helm, I'll point out that Go templates are very lispy. I often have things in them that look like {{ and (foo bar) (bar baz) }}, and it only gets crazier as you add more parentheses ;)


The problem I have with GitHub Actions is that I usually want to metaprogram them. I have a monorepo and I want a particular action to run for each "project" subdirectory. I've written a program that generates GitHub Actions YAML files, but all of the ways to make sure the generator was run before each commit are fairly unsatisfying.
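
The generator half of that is pleasantly small, for what it's worth. A hedged sketch with PyYAML, assuming a layout where each project lives under projects/<name> and gets its own workflow file (the paths and job contents are made up):

    from pathlib import Path

    import yaml  # PyYAML

    out_dir = Path(".github/workflows")
    out_dir.mkdir(parents=True, exist_ok=True)

    for project in sorted(Path("projects").iterdir()):
        if not project.is_dir():
            continue
        workflow = {
            "name": f"ci-{project.name}",
            "on": {"push": {"paths": [f"projects/{project.name}/**"]}},
            "jobs": {
                "test": {
                    "runs-on": "ubuntu-latest",
                    "steps": [
                        {"uses": "actions/checkout@v4"},
                        {"run": "make test", "working-directory": str(project)},
                    ],
                }
            },
        }
        path = out_dir / f"ci-{project.name}.yml"
        path.write_text(yaml.safe_dump(workflow, sort_keys=False))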

The problem I have with infra-as-code tools is that what I really want is a pretty simple representation for "the state of the world" that some reconciliation can use, and then I want to generate that stuff in a typesafe, expression-based language like TypeScript or Python (Dhall exists, but its Haskell-like syntax and conventions are too steep a learning curve to get mainstream adoption). Instead we get CloudFormation and Terraform which shoehorn programming language constructs into a configuration language (which isn't strictly an objection to code-as-data generally) or things like Helm which uses text templates to generate a "state of the world" description or these CDKs which all seem to depend on a full JavaScript engine for reasons that don't make sense to me (why do I need JavaScript to generate configuration?).


I often wonder if the only reason we haven't used lisp more as a society, and certainly in the devops world, is because our brains find it easier to parse nested indentation than nested parentheses.

But in doing so, we've thrown out the other important part of lisp, which is that you can use the same syntax for data that you do for control flow. And so we're stuck in this world where a "modern-looking" program is seen as a thing that must be evaluated to make sense, not a data structure in and of itself.

https://www.reddit.com/r/lisp/comments/1pyg07/why_not_use_in... is a fascinating 10 year old discussion. And of course, there's Smalltalk, which guided others to a treasure it could not possess. But most younger programmers have never even had these conversations.


The vast majority of Lisp code is assiduously written with nested indentation! So that can't be it.

Non-lisp languages have parentheses, brackets and braces, using indentation to clarify the structure. Nobody can reasonably work with minified Javascript, without reformatting it first to span multiple lines, with indentation.

Lisp has great support for indentation; reformatting Lisp nicely, though not entirely trivial, is easier than other languages.

Oh, have you seen parinfer? It's an editing mode that infers indentation from nesting, and nesting from indentation (both directions) in real-time. It also infers closing parentheses. You can just delete lines and it reshuffles the closers.

The github.io site has animations:

https://shaunlebron.github.io/parinfer/


To me it seems a lot of the benefit of declarative programming is just that you can use less powerful tools that don't allow constructs you don't want to have to deal with.

LISP seems great for tinkerers and researchers, but not so much corporate devs who want extreme amounts of consistency and predictability, but don't need the absolute most elegant solution.


> you are trying to describe a program in a data structure

This describes 100% of software development, though! Programming is just designing data structures that represent some computation. Each language lends itself better to some computations than to others (and some, like YAML, are terrible for describing any kind of computation at all), but they're all just data structures describing programs.

The problem isn't that GitHub Actions tries to describe a program in a data structure, the problem is that the language that they chose to represent those programs (YAML and the meta language on top) is ill-suited to the task.


> Ansible makes the same mistake, as do countless other tools.

My favorite example of this is chown/chmod taking 4-5 lines of YAML. Sure, you can do it a bunch of different ways, and sure, it allows for repeatable commands. But it just sucks.


The same reason I don't like AWS' Step Functions. The spec in JSON is horrible. On the other hand, Step Functions is pretty scalable and reliable and can take practically unlimited throughput. It's a good story for how a product can succeed by getting the primitives right and by removing just the key obstacle for users. Now that Step Functions has gained momentum, they can construct higher-level APIs and SDKs to translate user spec to the low-level JSON/YAML payload.


> These same feelings extend to other proprietary config languages like HCL for Terraform, ASL for AWS Step Functions, etc. It's fine that you want a declarative API, but let me generate my declaration programatically.

Yeah, I've had the same sort of opinion since the bad old AWS CloudFormation days. I wrote an experimental CloudFormation generator 4 years ago where all of the resources and Python type hints were generated from a JSON file that AWS published and it worked really well (https://github.com/weberc2/nimbus/blob/master/examples/src/n...).

> Config declared in and generated by code has been a superior experience. It's one of the things that AWS CDK got absolutely right.

Is that how CDK works? I've only dabbled with it, but it was pretty far from the "generate cloudformation" experience that I had built; I guess I never "saw the light" for CDK. It felt like trading YAML/templating problems for inheritance/magic problems. I'd really like to hear from more people who have used AWS CDK, Terraform's CDK, and/or Pulumi.


It's an annoyingly OOP model with mutations and side-effects, but if you look past that, it's pretty nice. The core idea is you create an instance of a CDK "App" object. You create new instances of "Stack" objects that take an "App" instance as a context parameter. From there, resources are grouped into logical chunks called "Constructs" which take either a stack or another construct as their parent context param. The only things you should ever inherit from are the base Constructs for Stack, Stage, and Construct. Don't use inheritance anywhere else and you'll be okay.

The code then looks something like this (writing this straight in the comment box, probably has errors):

    // Entrypoint of CDK project like bin/app.ts or whatever
    import * as cdk from 'aws-cdk-lib'
    import { MyStack } from '../lib/my-stack'
    const app = new cdk.App()
    const stack = new MyStack(app, 'StackNameHere', someProps)
    
    // lib/my-stack.ts
    // Imports go here
    export class MyStack extends cdk.Stack {
      constructor(scope: Construct, id: string, props: MyStackProps) {
        super(scope, id, props)
        const bucket = new s3.Bucket(this, 'MyBucket', {
          bucketName: 'example-bucket',
        })
        const lambda = new NodejsFunction(this, 'MyLambdaFn', {
          functionName: 'My-Lambda-Fn',
          entry: 'my-handler.ts', // NodejsFunction's prop for the source file is `entry`
          memorySize: 1024,
          runtime: Runtime.NODEJS_20_X,
          tracing: Tracing.ACTIVE, // moved into the props object where it belongs
        })
        bucket.grantRead(lambda)
      }
    }

The best part is the way CI/CD is managed. CDK supports self-mutating pipelines where the pipeline itself is a stack in your CDK app. After the pipeline is created, it will update itself as part of the pipeline before promoting other changes to the rest of your environments.

The equivalent CloudFormation for the above example would be ridiculously long. And that's putting aside all the complexity it would take for you to add on asset bundling for code deployed to things like Lambda.

TL;DR: Infrastructure-as-code-as-code


> It's an annoyingly OOP model with mutations and side-effects, but if you look past that, it's pretty nice

I think I was getting hung up on the mutations and side-effects of it all. Thanks for putting words to that. I'll have to give it another try sometime. Have you used Terraform's CDK by chance? I assume it's heavily inspired by AWS's CDK, but my company has since moved to GCP/Terraform.


The mutations and side-effects only last until synthesis. You can imagine a CDK app as a pure function that runs a bunch of mutations on an App object and then serializes the state of that object in the end to static assets that can be deployed. The internals of it all are messy, but at a conceptual level, it's easy to think about.

CDKTF is really promising, IMO. When I last looked, it was still pretty new, but it's maturing, I think. One downside compared to regular AWS CDK is that the higher level constructs from the official AWS CDK can't be used in CDKTF. There is an adapter that exists, but it's one more layer between you and knowing what's going on: https://github.com/hashicorp/cdktf-aws-cdk


In the case of GitHub Actions, it's made more painful by the lack of support for YAML anchors, which would provide a bare minimum of composability.

https://github.com/actions/runner/issues/1182
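
For reference, this is the kind of reuse anchors buy you in plain YAML; PyYAML resolves it at load time (the job fields here are invented):

    import yaml  # PyYAML resolves anchors/aliases and merge keys on load

    doc = """
    defaults: &defaults        # anchor the shared mapping
      runs-on: ubuntu-latest
      timeout-minutes: 10

    job-a:
      <<: *defaults            # merge the anchored mapping into this one
      script: make test
    """
    print(yaml.safe_load(doc)["job-a"])
    # {'runs-on': 'ubuntu-latest', 'timeout-minutes': 10, 'script': 'make test'}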


For real, I want a real language (Lua/JS/Lisp) for configuration but without 3rd party imports so that it's secure and predictable.


If configs had well-adopted schema support, it wouldn't be so bad.


Even then, it gets messy. From a tooling standpoint, how will I load your schema? How will my editor respect it? How do I run a validator against it? I know XML kind of solves some of these problems, but it has its own thorns, and despite what anyone says, it is not easy to work with: XSD, XSLT, etc. So much complexity that needs to be managed in a different way in every runtime. And then type safety goes out the window at the boundary where it connects to your code.


That's how it used to be for your suggestion too.

We're living in a dream state now where the creators of IDEs like Visual Studio (Code) or IntelliJ actively implement common languages and frameworks. It used to be 'find a half-baked community plugin so JSON works.'

If someone made a standard schema and people used it, I can assure you the magic you are expecting from your tooling would suddenly pop in just like how JSON support appeared one day. But they can't do nothin' if there is no community support for it.

XSD and XSLT are complicated because XML is complicated.


> let me generate my declaration programatically

this sort of thing looks and sounds like the right thing to do. Till you do it, on a largeish project with multiple teams that have experienced attrition.

I quietly added a layer of yaml generating code to make it bearable.


> Config declared in and generated by code has been a superior experience.

And here we are at the point in time when people have plainly forgotten about compiled programming languages.


living in a yaml-world; i honestly hate it.


I agree that YAML templating is kind of insane, but I will never understand why we don't stop using fake languages and simply use a real language.

If you need complex logic, use a programming language and generate the YAML/JSON/whatever with it. There you go. Fixed it for you.

Ruby, Python, or really any other language (I only favor scripting ones because they're generally easier to run) will give you all of that without some weird pseudo-language like Jsonnet or Go templates.

Write the freaking code already and you'll get bitten way less by obscure weird issues that these template engines have.

Seriously, use any real programming language and it'll be WAY better.


I once took a job that involved managing Ansible playbooks for an absolutely massive number of servers that would run them semi-regularly for things like bootstrapping and patching. I had used Chef before for a similar task, and I loved it because it's just ruby and I could easily define any logic I wanted while using loops and proper variables.

I understand that Ansible was designed for non-programmers, but there is no worse hell for someone who is actually familiar with basic programming than being confined to the hyper-verbose nonsense that is Jinja templating of Ansible playbooks when you need to have a lot of conditional tasks and loops.


I agree. And to make matters worse, the DSL on top of YAML has grown so large in features that it may as well be a programming language now.


https://yamlscript.org/ was posted here a while back: https://news.ycombinator.com/item?id=38726370

I thought I remembered more comments on that thread, but I guess nothing more than what's there needs to be said.


It technically is. Long ago, as a junior sysadmin, I created Turing-complete nightmares in Jinja.


Chef vs Ansible was the first example that popped into my mind. I had a very love/hate relationship with Chef when I used it, but writing cookbooks was definitely one of the good parts.


Ansible has a great module/plugin system. It's trivial to handle complex tasks or computations in a custom module or action.
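
For the unfamiliar, a custom module really is just a small Python file; a minimal sketch (the module name and its computed result are hypothetical):

    # library/subnet_math.py - a minimal custom Ansible module (hypothetical example)
    import ipaddress

    from ansible.module_utils.basic import AnsibleModule

    def main():
        module = AnsibleModule(
            argument_spec=dict(
                cidr=dict(type="str", required=True),
                count=dict(type="int", default=2),
            )
        )
        net = ipaddress.ip_network(module.params["cidr"])
        subnets = [str(s) for s in list(net.subnets())[:module.params["count"]]]
        # Report a computed value back to the play; nothing changed on the host.
        module.exit_json(changed=False, subnets=subnets)

    if __name__ == "__main__":
        main()

Drop it in a role's library/ directory and it's callable like any other task.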


So why is there this massive ecosystem around not writing modules, then? Red Hat invented automation controller just so they didn't have to implement proper error handling in Ansible.


The 'not writing modules' approach is for people who aren't comfortable writing code. For non-trivial things, I think most capable users should be writing custom modules a lot of the time.


That's not how Ansible is meant to be used by default though. Modules are, in general, meant to be generic.

I bet you if I started writing modules for everything in most companies, people would complain. Unfortunately defaults matter.


I think language embedding is kind of a lost architecture in modern stacks. It used to be if you had a sufficiently complex application you'd code the guts in C/C++/Java/Whatever and then if you needed to script it, you'd embed something like a LISP/Lua/whatever on top.

But today, you have plenty of off-the-shelf JSON/TOML/YAML parsers you can just import into your app and a function called readConfig in place of where an embedded interpreter might be more appropriate.

It's just easier for developers to add complexity to a config format than to provide a full language embedding with bindings into the application. So people have forgotten how to do it (or even that they can do it - I don't think it occurs to people anymore).


Pulumi is enticing because it allows you to write in your preferred language and abandon HCL, but it is strictly worse in my opinion. IaC should be declarative. That allows for greater predictability, reproducibility, and maintainability. In general, I think wanting to use Python or Ruby or whatever language you'd get to use with Pulumi is not a good basis for choosing the tool.

There are many graveyards filled with places that tried to start writing logic into their IaC back in the Chef/Puppet era and made a huge mess that was impossible to upgrade or maintain (recall that Chef is more imperative/procedural, whereas in Puppet you describe the desired end state). The Chef/Pulumi approach can work, but it requires one person who is draconian about style and maintenance. Otherwise, it turns into a pile of garbage very quickly.

Terraform/Puppet's model is a lot more maintainable for longer terms with bigger teams. It's just a better default for discouraging patterns that necessitate an outsized investment to maintain. Yes HCL can be annoying and it feels freeing to use Python/TS/whatever, but pure declarative code prevents a lot of spaghetti.


Pulumi is declarative. The procedural code (Python, Go, etc) generates the declaration of the desired state, which Pulumi then effects on the providers.

HCL is not purely declarative either. It can invoke non-declarative functions and can do loops based on environment variables, so in that sense there is really no difference between Pulumi and Terraform. The only real difference is that HCL is a terrible language compared to, say, Python.

I'm actually fairly sure HCL is Turing complete; it has loops and variables. But even if it is not all the way Turing complete, it's pretty close.


Pulumi may be declarative, but you use imperative languages to define your end state. The language you're actually writing your Pulumi in is what's most relevant to the point I'm making about maintainability. HCL isn't Turing complete, but even if it were, the point is that doing the types of things you can do in Python or other "real" languages is a major pain in HCL, which effectively discourages you from doing them. I'm arguing that is actually a good thing for maintainability.


> recall that Chef is more imperative/procedural, whereas in Puppet you describe the desired end state

Chef's resources and resource collection and notifications scheme is entirely declarative. And after watching users beat their heads against Chef for a decade the thing that users really like is using declarative resources that other people wrote. The thing that they hate doing is trying to think declaratively themselves and write their own declarative resources or use the resource collection properly. People really want the glue code that they need to write to be imperative and simple.

The biggest issue that Chef had was the "two-pass parsing" design (build the entire resource collection, then execute the entire resource collection) along with the way that the resource collection and attributes were two enormous global variables which were mutable across the entire collection of recipe code which was being run, and then the design encouraged you to do that. And recipes were kind of a shit design since they weren't really like procedures or methods in a real programming language, but more like this gigantic concatenated 'main context' script. Local variables didn't bleed through so you got some isolation but attributes and the resource collection flowing through all of them as god-object global variables was horrible. Along with some people getting a bit too clever with Ruby and Chef internals.

I had dreams of freezing the entire node attribute tree after attribute file processing before executing resources to force the whole model into something more like a functional programming style of "here's all your immutable description of your data fed into your functional code of how to configure your system" but that would have been so much worse than Python 2.7-vs-3.0 and blown up the world.

Just looking at imperative-vs-declarative is way too simplistic of an analysis of what went wrong with Chef.


The fact that HCL has poor/nonexistent multi-language parsing support makes building tooling around terraform really annoying. I shouldn't have to install Python or a Go library to read my HCL.


The limitations of HCL are actually a good thing!

I have never seen Pulumi or CDKTF stuff work well. At some point, aren't you simply writing a script and abandoning the advantages of a declarative approach?


Right. That's what I'm arguing.


The existence of the YAML language for Pulumi and the CDK for TF both confound this explanation; it's just not grounded in reality.


> I agree that YAML templating is kind of insane, but I will never understand why we don't stop using fake languages and simply use a real language.

The problem is language nerds write languages for other language nerds.

They all want it to be whatever the current sexiness is in language design, want it to be self-hosting and able to write fast multithreaded webservers, and then it becomes conceptually complicated.

What we need is like a "Logo" for systems engineers / devops which is a simple toy language that can be described entirely in a book the size of the original K&R C book. It probably needs to be dynamically typed, have control structures that you can learn in a weekend, not have any threading or concurrency, not be object oriented or have inheritance and be functional/modular in design. And have a very easy to use FFI model so it can call out to / be called from other languages and frameworks.

The problem is that language nerds can't control themselves and would add stuff that would grow the language to be more complex, and then they'd use that in core libraries and style guides so that newbies would have to learn it all. I myself would tend towards adding "each/map" kinds of functions on arrays/hashmaps instead of just using for loops, and having first-class functions and closures, which might be mistakes. There's that immutable FP language for configuration which already exists (I can't google this morning yet) which is exactly the kind of language that will never gain any traction, because >95% of the people using templated YAML don't want to learn to program that way.


> What we need is like a "Logo" for systems engineers / devops which is a simple toy language that can be described entirely in a book the size of the original K&R C book.

I would argue that Tcl is exactly that. It's hard to make things any simpler than "everything is a string, and then you get a bunch of commands to treat strings as code or data". The entire language definition boils down to 12 simple rules ("dodekalogue"); everything else is just commands from the standard library. Simple Tcl code looks pretty much exactly like a typical (pre-XML, pre-JSON, pre-YAML) config file, and then you have conditionals, loops, variables etc added seamlessly on top of that, all described in very simple terms.


> What we need is like a "Logo" for systems engineers / devops which is a simple toy language that can be described entirely in a book the size of the original K&R C book. It probably needs to be dynamically typed, have control structures that you can learn in a weekend, not have any threading or concurrency, not be object oriented or have inheritance and be functional/modular in design. And have a very easy to use FFI model so it can call out to / be called from other languages and frameworks.

I think Scheme would work, as long as you ban all uses of call/cc and user-defined macros. It's simple and dynamically typed, and doesn't have built-in classes or hash maps. Only problem is that it seems like most programmers dislike Lisp syntax, or at least aren't used to it.

There's also Awk, although it's oriented towards text, and doesn't have modules (the whole program has to be in one file).

It probably wouldn't be that hard to make this language yourself. Read the book Crafting Interpreters, which guides you through making a toy language called Lox. It's close to the toy language you describe.


If you combine Awk with the C preprocessor, you have a way for an Awk program to load modules, relative to where that file is located.

There is such a combination project: cppawk.

https://www.kylheku.com/cgit/cppawk/about/


Thanks for the link! It seems interesting.


There’s plenty to choose from that support embedding: Python, Perl, Lua. Heck, even ECMAScript (JavaScript, VBA, etc).

As another commenter rightfully stated, this used to be the norm.

I wouldn’t say LOGO is the right example though. It’s basically a LISP and is tailored for geometry (of course you can do a heck of a lot more with it but its strength is in geometry).


You're really missing the point. Logo was super simple and we learned it in elementary school as children, that's all that I'm talking about. And those other languages have accreted way too many features to be simple enough.


> You're really missing the point.

I got your point. I think it is you who is missing mine:

> You're really missing the point. Logo was super simple and we learned it in elementary school as children

You wouldn't have learned conditionals and other such things though. That stuff wasn't as easy to learn in LOGO because LOGO is basically a LISP. eg

    IFELSE :num = 1 [print [Number is 1]] [print [Number is 0]]
vs

    if { $num == 1 } then { print "number is 1" } else { print "number is 0" }
or

    if num == 1:
        print "number is 1"
    else:
        print "number is 0"
I'm not saying these modern languages don't have their baggage. But LOGO wasn't exactly a walk in the park for anything outside of its main domain either. Your memory of LOGO here is rose-tinted.

> And those other languages have accreted way too many features to be simple enough.

I agree (though less so with Lua) but you don't need to use those features. Sure, my preference would be "less is more" and thus my personal opinion of modern Python isn't particularly high. And Perl is rather old fashioned these days (though I think modern Perl gets more criticism than it deserves). But the fact is we don't need to reinvent the wheel here. Visual Basic could make raw DLL calls meaning you had unfettered access to Win32 APIs (et al) but that doesn't mean every VBScript out there was making DLL calls left right and centre. Heck, if you really want to distil things down then there's nothing even stopping someone implementing a "PythonScript" type language which is a subset of Python.

I just don't buy "simplicity of the language" as the reason languages aren't often embedded these days. I think it's the opposite problem: "simplicity of the implementation". It's far easier to load a JSON or YAML document into a C(++|#|Objective|whatever) struct than it is to add API hooks for an embedded scripting language. And that's precisely why software written in dynamic languages does often expose its language runtime for configuration. Eg Ruby in Puppet and Chef, half of PHP applications having config written in PHP, XMPP servers written in Haskell, etc. In those kinds of languages, it is easy to read config from source files (sometimes even importing via `eval`), so there often isn't any need to stick config in JSON documents.


I'm deeply uninterested in continuing to have this discussion with you.


I mean... Nix satisfies every single one of the requirements you mentioned, and people say it's too complicated. It's literally just the JSON data structure with lambdas, which really is basic knowledge for any computer scientist, and yet people complain about it.

It's fairly straightforward to 'embed' and as a bonus it generates json anyway (you can use the Nix command line to generate JSON). Me personally, I use it as my templating system (independent of nixpkgs) and it works great. It's a real language, but also restrictive enough that you don't do anything stupid (no IO really, and the IO it does have is declarative, functional and pure -- via hashing).

In Nix's favor:

1. Can be described in a one page flier. An in-depth exhaustive explanation of the language's features is a few pages (https://nixos.org/manual/nix/stable/language/)

2. dynamically typed

3. Turing complete and based on the lambda calculus so has access to the full suite of functional control structures. Also has basic if/then/else statements for the most common cases and for intuition.

4. no threading, no concurrency, no real IO

5. definitely not object-oriented and no inheritance

6. It is functional in design and has an extremely thin set of builtins

7. FFI model is either embed libnix directly (this does not require embedding the nix store stuff, which is a completely separate modular system), or use the command line to generate json (nix-instantiate --eval --json).

Note: do not confuse nixpkgs and NixOS with the nix language. The former is a system to build linux packages and entire linux distributions that use the latter as a configuration language. The nix language is completely independent and can be used for whatever.


Tried to use Nix as a Homebrew replacement and failed to get it installed correctly; it blew up with crazy error messages that I couldn't google. I didn't even get to the point of assessing the language. It really seems like the right kind of idea, but it doesn't seem particularly stable or easy enough to deliver that initial payoff. If there's a nice language under there, it is crippled by the fact that the average user is going to have a hard time getting to it.


You can use Nix without using nixpkgs (you seemed to be trying to use nixpkgs). The Nix language is accessible via several command line tools (nix repl, nix eval, nix-instantiate, etc.) and can emit JSON via several flags, as well as a builtin function.


I agree with the points in Nix's favor, except for 2, dynamically typed. Defining structs as part of the language would be nice. In fact, type checking is done ad hoc now by passing data through type-checking functions.


What are your thoughts on:

- https://dhall-lang.org/

- https://toml.io/en/


I think I'd rather just have logicless templates than use anything dynamically typed...

Jinja2 makes a lot of sense when you're trying to make it hard to add bugs, and you also don't want everyone to have to learn Rust or Elixir or something.

It would be interesting to extend a template language with a minimal FP language that could process data before the templates get it.


Dhall is the FP config language you're thinking of, I think.


I agree, and I just want to highlight what you said about generating a config file. It's extremely useful to constrain the config itself to something that can go in a json file or whatever. It makes the config simpler, easier to consume, and easier to document. But when it comes to _writing_ the config file, we should all use a programming language, and preferably a statically typed language that can check for errors and give nice auto complete and inline documentation.

I think AWS CDK is a good example of this. Writing plain CloudFormation is a pain. CDK solves this not by extending CloudFormation with programming capabilities, but by generating the CloudFormation for you. And the CloudFormation is still a fairly simple, stable input for AWS to consume.



You shouldn't need the full complexity and power of a Turing-complete programming language to do config. The point of config is to describe a state; it's just data. You don't need an application within an application to describe state.

The path of just using a programming language for config leads to your config becoming more and more complex until it inevitably needs its own config, etc. You wind up with a sprawling, Byzantine mess.


The complexity is already there. If you only need static state like you say, then YAML/JSON/whatever is fine. But that's not what happens as software grows.

You need data that is different depending on environments, clouds, teams, etc. This complexity will still exist if you use YAML; it'll just be a ridiculous mess where you can break your scripts because you have an extra space in the YAML or added an incorrect `True` somewhere.

Complexity growth is inevitable. What is definitely avoidable is shoving concepts that in fact describe a "business" rule (maybe operational rule is a better name?) into unreadable templates.

Rules like "a deployment needs to add these things when in production, or change those when in staging" exist whether they are hidden behind shitty Go templates or structured inside a class/struct, a method with a descriptive name, etc.

The only downside is that you need to understand some basics of programming. But for me that's not a downside at all, since it's a much more useful skill than only knowing how to stitch Go templates together.
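
To make that concrete, a tiny sketch (the rule itself is invented): the per-environment "business rule" becomes an ordinary, testable function instead of template conditionals:

    import json
    from dataclasses import asdict, dataclass

    @dataclass
    class Deployment:
        env: str
        replicas: int
        debug: bool

    def deployment(env: str) -> Deployment:
        # The operational rule lives in readable code, not in template logic.
        return Deployment(
            env=env,
            replicas=6 if env == "production" else 1,
            debug=env != "production",
        )

    for env in ("staging", "production"):
        print(json.dumps(asdict(deployment(env))))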


Why are we writing software that needs so much configuration? Not all of it is needed. We could do things more like consumer software, which assumes nobody will even consider your app if they have to edit a config file.


> your config becoming more and more complex until it inevitably needs its own config, etc. You wind up with a sprawling, Byzantine mess.

We're already there with Helm.

People write YAML because it's "just data". Then they want to package it up so they put it in a helm chart. Then they add variable substitution so that the name of resources can be configured by the chart user. Then they want to do some control flow or repetitiveness, so they use ifs and loops in templates. Then it needs configuring, so they add a values.yaml configuration file to configure the YAML templating engine's behaviour. Then it gets complicated so they define helper functions in the templating language, which are saved in another template file.

So we have a YAML program being configured by a YAML configuration file, with functions written in a limited templating language.

But that's sometimes not enough, so sometimes variables are also defined in the values.yaml and referenced elsewhere in the values.yaml with templating. This then gets passed to the templating system, which then evaluates that template-within-a-template, to produce YAML.


At the end of the day, Helm's issues stem from two competing interests:

(1) I want to write something where I can visualize exactly what will be sent to Kubernetes, and visually compare it to the wealth of YAML-based documentation and tutorials out there

(2) I have a set of resources/runners/cronjobs that each require similar, but not identical, setups and environments, so I need looping control flow and/or best-in-class template inclusion utilities

--

People who have been working in k8s for years can dispense with (1), and thus can use various abstractions for generating YAML/JSON that don't require the user to think about {toYaml | indent 8}.

But for a team that's still skilling up on k8s, Helm is a very reasonable choice of technology in that it lets you preserve (1) even if (2) is very far from a best-in-class level.


I have a recent example of rolling out IPv6 in AWS:

1. Create a new VPC, get an auto-assigned /56 prefix from AWS.

2. Create subnets within the VPC. Each subnet needs an explicitly-specified /64 prefix. (Maybe it can be auto-assigned by AWS, but you may still want to follow a specific pattern for your subnets).

3. Add those subnet prefixes to security/firewall rules.

You can do this with a sufficiently-advanced config language - perhaps it has a built-in function to generate subnets from a given prefix. But in my experience, using a general-purpose programming language makes it really easy to do this kind of automation. For reference, I did this using Pulumi with TypeScript, which works really well for this.
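
The subnet math in step 2 is exactly where a general-purpose language shines; Python's stdlib ipaddress module does it in a few lines (the /56 prefix below is made up):

    import ipaddress

    # Hypothetical AWS-assigned /56 prefix for the VPC
    vpc = ipaddress.ip_network("2600:1f18:1234:5600::/56")

    # Carve explicit /64 prefixes out of it, one per subnet
    for i, subnet in enumerate(list(vpc.subnets(new_prefix=64))[:3]):
        print(f"subnet-az{i}: {subnet}")
    # subnet-az0: 2600:1f18:1234:5600::/64
    # subnet-az1: 2600:1f18:1234:5601::/64
    # subnet-az2: 2600:1f18:1234:5602::/64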


That kind of ignores the entire pipeline involved in computing the correct config. Nobody wants to be manually writing config for dozens of services in multiple environments.

The number of configurations you need to create is multiplicative: take the number of applications, multiply by the number of environments, multiply by the number of complete deploys (i.e. multiple customers running multiple envs), and you very quickly end up with an unmanageable number of unique configurations.

At that point you need something at least approaching Turing completeness to correctly compute all the unique configs. Whether you decide to achieve that by embedding that computation into your application, or into a separate system that produces pure static config, is kind of academic. The complexity exists either way, and tools are needed to make it manageable.


That's not my experience after using AWS CDK since 2020 in the same company.

Most of our code is plain boring declarative stuff.

However, tooling is lightyears ahead of YAML (we have types, methods, etc.), we can encapsulate best practices and distribute them as libs, and, finally, escape hatches are possible when declarative code won't cut it.


We need Turing completeness in the strangest of places. We can often limit these places to a smaller part of the code, but it's really hard to know beforehand where those places will occur. Whenever we think we have found a clear separation, we invent a config language.

And then we realize that we need scripting, so we invent a templating language. Then everybody loses their minds and invents 5 more config languages that surely will make us not need the templating language.

Let's just call it code and use clever types to separate Turing from non-Turing completeness?


A really good solution here is to use a full programming language but run the config generator on every CI run and show the diff in review. This way you have a real language to make conditions as necessary but also can see the concrete results easily.

Unfortunately few review tools handle this well. Checked-in snapshot tests are the closest approximation that I have seen.
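
A sketch of the snapshot approach (generate_config.py is a hypothetical generator that prints the config to stdout):

    import pathlib
    import subprocess
    import sys

    # Regenerate the config and fail CI if the checked-in copy is stale.
    generated = subprocess.run(
        [sys.executable, "generate_config.py"],
        check=True, capture_output=True, text=True,
    ).stdout
    if generated != pathlib.Path("config.json").read_text():
        sys.exit("config.json is stale; re-run generate_config.py and commit the diff")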


> You don't need an application within an application to describe state.

As shown in the article, you apparently do.


It happens because config is dual-purpose: it's state, but it's also the text UI for your program. It spirals out of control because people want the best of it being "just text" and being a nice, clean UI.


I agree, I think a language like dhall (https://dhall-lang.org/) strikes a good balance.


Yeah, YAML is good at declarative things. It's when you start using it imperatively, e.g. for CI/CD, that it really starts to get ugly.


Agreed, and I almost feel silly for pointing this out, but for writing JSON (JavaScript Object Notation), I'd recommend using JavaScript...


For JSON I'd stick with TypeScript, to be honest. You end up executing JavaScript and producing JavaScript-native objects, but the typing in TypeScript, ensuring the objects you produce are actually valid, will save a lot of debugging.


JS is actually not that great for this IMO. You probably need an NPM package to even deal with YAML because JS has a shitty standard library.

Sticking to a scripting language with a strong standard library is way better.

Any unix system can get Ruby/Python and read/write YAML/JSON immediately without caring too much about versions.

Of course in today's upside down world most developers seem to only know JS, so it would at least be "familiar". Still a bad choice in my view.

The way this industry is going, give it a few years and we'll have React-Kubernetes for generating templates. And I wish I was joking.


Parent is talking specifically about writing JSON, not YAML.


Yeah, but the article is about YAML and my original comment was about configuration in multiple formats.

So, to clarify: for JSON, JS is definitely not the worst option. For me though, even for JSON, you have much better options.


I'm very happy using Typescript to templatize JSON. You can define a template as a class, compose them if needed, and when you are done, just write an object to a file.


The problem with imperative languages in configs is that they become harder to read. Webpack configs always devolve into this.

We need better tooling to allow tracing how final configuration values are generated.

And a _live programming_ environment so we can see the final generated configuration in one view.


Completely agree, my wish is that anything that risks getting complex uses a Ruby-based DSL.

For example, I like using Capistrano, which is a wrapper around Rake, which is a Ruby-based DSL. That means that if things get tricky I can just drop down to using a programming language. I split stuff into logical parts that I load where needed and, for example, I can do something like YAML.load(..file..).dig('attribute name') or JSON.load from somewhere else.

Yes, you risk someone building spaghetti that way, but the flip side is that a good devops engineer can build something much easier to maintain than dozens of YAML and JSON files, and you get all the power of the IDE and linters that already exist for the programming language, so silly syntax errors are caught without needing to run anything.


This. It's why things like Cloud Development Kit and Pulumi are quite interesting to me.


Throwing in a plug for https://dhall-lang.org/

> Dhall is a programmable configuration language that you can think of as: JSON + functions + types + imports


> I heard you liked configuration languages, so I made this configuration language for your configuration language generation scripts. It supports templates, of course.


Because the security surface of "any language" is tricky and most (all?) popular languages do not have nice data literal syntax better than JSON and YAML.


Helm would probably benefit from something like JSX for YAML/JSON. Just being able to script a chart instead of this templating hell.


I wonder if there isn't a place for both:

1. a full-blown language that can generate complex output

2. a declarative static data file

I hope I'm not just pulling my punches with #2

On the other hand, some complexity spirals out of control, especially when people use it without any need. Some great things come out of creating boundaries.


I argued that point in my article some time ago: https://beepb00p.xyz/configs-suck.html (HN discussion at the time: news.ycombinator.com/item?id=22787332)


This is how config actually works in Scala.


I just knew this would be about Kubernetes when I saw the title.

The Kubernetes API is fairly straightforward and has a well-defined (JSON) schema. People should be spending the bulk of their k8s learning time on understanding how to use the API, but instead they spend it working out how to use a Helm chart.

I don't think Jsonnet, Ksonnet, Nu, or CUE ever gained that much traction. I'm convinced most people just use Kustomize, because it's fairly straightforward and built into kubectl.

I'd like a tool that:

- Gives definition writers type checking against the k8s schemas - validation, version deprecations, etc.

- Gives users a single artefact that can be inspected easily and will fail (ACID) if deployed against a cluster that doesn't support any objects/versions.

- Is built into the default toolchain

---

I feel like writing a Bun or Deno TypeScript script that exports a function with arguments and returns a list of definitions would work well, esp. with `deno compile`, etc. but that violates the third point.


> The Kubernetes API is fairly straightforward, and has a well-defined (JSON) schema, people should be spending a bulk of their time learning k8s understanding how to use the API, but instead they spend it working out how to use a Helm chart.

This is a general pattern in software. Instead of learning the primitives and fundamentals that your system is built on, which would be too hard, instead learn a bunch of abstractions over top of it. Sure, now you are insulated from the lower-level details of the system, but now you have to deal with a massive stack of abstractions that makes diagnosis and debugging difficult once something goes wrong. Now it's much harder to ascertain what exactly is happening in your system, since the details of what is actually going on have been abstracted away from you by design. Further, you are now dependent on that abstraction layer and must support and accommodate whatever updates may be released by the vendor, in addition to whatever else is lurking in your dependency graph.


We're using jsonnet for our systems and they have absolutely nothing to do with k8s. I'm not sure it's true to say it has ever gained much traction. It's just a niche case for complex configuration, and isn't the most publicised tool.

It does precisely what we need with zero fuss, cross platform and cross _language_ (we've embedded it in C++, .NET, and JVM executables).

We can use the resulting JSON config with a vast array of tools that simply don't exist for the alternatives such as TOML/YAML/HOCON/INI/whatever. In fact, we tried to get HOCON working for non-JVM languages, but there was always some edge case.


probably doesn't meet the 2nd requirement, most definitely doesn't meet the third, but:

https://cdk8s.io/docs/latest/


The second requirement is actually probably the most important: for someone who has just set up ArgoCD, Flux, or their own GitOps pipeline, how much of a headache does a new compile step present?

Lots of things are simple in isolation: want to use Cue? Just get your definitions and install the compiler and call it and boom, there are your k8s defs! Ok, but how do I integrate all of that into my existing toolchain? How do I pass config? Etc, etc.

The best, fastest tool won't win. The tool that has the most frictionless user story will.


I was able to get CDK8s working easily by simply committing the built template along with my TypeScript. Then, I just pointed ArgoCD to my repo.


We do the same thing but commit to a second git repo that we treat like the "k8s yaml release database".


I love the idea of keeping it simple, and I do try to use Kustomize or even plain YAML as an installation method as much as possible.

But in practice, when managing large systems, you inevitably end up benefiting from templating.


I've begun thinking that if you start thinking about templating, you might be better off building an operator. Operators aren't as well understood and documented, but in my mind an operator is just a pod or deployment that creates on-demand resources using the k8s API.


oh yeah; operators are great and sometimes they are necessary.

On the other hand, most operators I've seen are just k8s manifest templates implemented in Go.

I often end up preferring using Jsonnet to deal with that instead of doing the same stuff in Go.

Jsonnet is much closer to the underlying data model (the k8s manifest JSON/YAML document) and comes with some useful functionality out of the box, such as "overlays".

It has downsides too! It's untyped, debugging tools are lacking, people are unfamiliar with it and don't care to learn it. So I totally get why one would entertain the possibility of writing your "templates" using a better language.

However, an operator is often too much freedom. It's not just using Go or Rust or TypeScript to "generate" some JSON manifests; it also contains the code to interact with the API server, set up watches and reactions, etc.

I often wish there were a better way to separate those two concerns.

I'm a fan of metacontroller [1], which is a tool that allows you to write operators without actually writing a lot of imperative code that interacts with the k8s API; instead you just provide a general JSON->JSON transformer, which you could write in any language (Go, Python, Rust, JavaScript, ... and also Jsonnet if you want).

I recently implemented something similar but much more tailored to just "installing" stuff, called Kubit [2]. An OCI artifact contains some arbitrary tarball (generally containing some template sources) and a reference to a docker image containing an "engine", and Kubit runs the engine with your provided tarball + some parameters passed in a CRD. The OCI artifact could contain a helm chart and the template engine could contain the helm binary, or the template engine could be kubecfg and the OCI artifact could contain a bunch of jsonnet files. Or you could write your own stuff in Python or TypeScript. The Kubit operator then just runs your code, gathers the output, and applies it with kubectl apply-set.

1. https://metacontroller.github.io/metacontroller/intro.html

2. https://github.com/kubecfg/kubit


> On the other hand, most operators I've seen are just k8s manifest templates implemented in Go.

> I'm a fan of metacontroller [1], which is a tool that allows you to write operators without actually writing a lot of imperative code that interacts with the k8s API, but instead just provide a general JSON->JSON transformer,

That seems... surprising, to me. It's not clear to me how a JSON->JSON transformer (which is essentially a pure function from UTF-8 strings to UTF-8 strings, i.e. an operation without side effects) can actually modify the state of the world to bring your requested resources to life. If the only thing the Operator is being used for is pure computation, then I agree it's overkill.

An example use case for an Operator would be a Pod running on the cluster that is able to receive YAML documents/resource objects describing what kind of x509 certificate is desired, fulfill an ACME certificate order, and populate a Secret resource on the cluster containing the x509 certificate requested. It's not strictly JSON to JSON, from "certificate" custom resource to Secret resource - there's a bunch of side-effecting that needs to take place to, for instance, respond to DNS01 or HTTP01 challenges by actually creating a publicly accessible artifact somewhere. That's what Operators are for.


Metacontroller is actually quite easy to learn. It comes with good examples too, including a re-implementation of the StatefulSet controller, all done with iterations of an otherwise pure computation. The trick is obviously that the state lives in the k8s API server, from which the inputs of the subsequent invocation of your pure function come.


> an operator is often too much freedom

While that is true I'm a bit afraid that we might be overselling the concept of limiting freedom past a certain point. Limiting freedom has the upside of giving us some guarantees that makes a solution easier to reason about. But once we step out of dumb-yaml I don't see that making additional intermediate trade-offs is worth it. And there are apparently some downsides to introducing additional layers as well.

The main downside of limiting freedom seems to be the chaos of having so many different ways to do things. Imagine what could happen if we agreed that there are two ways of doing things; write yaml without templates or write an operator. Then maybe we could focus efforts on the problem of writing maintainable operators.

Things should be either dumb data or the kitchen sink I think.


I'm not against having actual controllers with powerful logic.

But often it is possible to separate the custom logic from the bulk of the parameterized boilerplate.


The purpose of an Operator is to realize the resources desired/requested in a (custom) resource manifest, often as YAML or JSON.

You give the apiserver a document describing what resources you need. The Operator actually does the work of provisioning those resources in the "real world" and (should) update the status field on the API object to indicate if those resources are ready.


Helm is a low budget operator.


No... no, no, no. No kidding; Operators are indeed poorly understood. They are not just glorified XSLT for YAML/JSON.

https://kubernetes.io/docs/concepts/extend-kubernetes/operat...


Yep, I find kustomize and (especially) helm so confusing, while finding kubernetes yaml files very easy to use and understand.


For Helm, the value is that it is not a configuration management solution but a package manager. The rest are just methods of writing JSON/YAML.

I understand the "hate" against YAML, but I don't think it deserves it that much.

Perhaps Timoni will take over with its usage of CUE. At least it's a package management solution.


How would one use the json api without ending up writing a bunch of custom code?


I think custom code is to be expected, and making it maintainable is what's important.

> everything should be made as simple as possible, but no simpler.

Helm et al made it simpler than it was, IMO.


Everyone hand-rolling code does not seem like an improvement over tools like Helm, even if it's YAML.


No, obviously not, and that's not what I've suggested.


Helm is another can of hot garbage. Impossible to vendor without hitting name collisions, can configure only what’s templated.

Jsonnet is the way to go with generated helm manifests transformed later. Kustomize with its post-renderer hooks is another can of even hotter garbage.


> Impossible to vendor without hitting name collisions

What problem exactly are you facing? I can change the name of the chart itself in chart.yaml and if the name of the resources collide I change them with nameOverride/fullnameOverride in the values. All charts have these because they are autogenerated by `helm create`.

I vendor all charts and never had this problem.


You just made a copy of a chart. You modified your chart. What I'm missing is Helm having some notion of an org in the chart name, like Docker does: repo/name:tag; Helm only has name and version. Hence you have to modify your chart.yaml, whereas it would be preferable not to have to modify anything.

This is really problematic when a chart pulls dependencies in.


k8s makes me miss XML


It's funny how little developers think about how to do configuration right.

It's just a bunch of keys and values, stored in some file, or generated by some code.

But it's actually the whole ball game. It's what programming is.

Everything is configuration. Every function parameter is a kind of configuration. And all the configuration in external files inevitably ends up as a function parameter in some way.

The problem is the plain-text representation of code.

Declarative configuration files seem nice because you can see everything in one place.

If you do your configuration programmatically, it is hard to find the correct place to change something.

If our code ran in real-time to show us a representation of the final configuration, and we could trace how each final configuration value was generated, then it wouldn't be a problem.

But no systems are designed with this capability, even though it is quite trivial to do. Configuration is always an after-thought.

Now extend this concept to all of programming. Imagine being able to see every piece of code that depends upon a single configuration value, and any transformations of it.

Also, most configuration is probably better placed into a central database because it is relational/graph-like. Different configuration values relate to one another. So we should be looking at configuration in a database/graph editor.

Once you unchain yourself from plain-text, things start to become a lot simpler...of course the language capabilities I mentioned above still need to become a thing.


This is something I'm trying really hard to do with a client. They have a bunch of 1500+ line "config" files for products, which are then used to make technical drawings and production files. The configs attempt to use naming scheme to group related variables together.

I want to migrate to an actual nested data-structure using (maybe) JSON - and these engineers absolutely will not write code, so config-as-code is a no-go, in addition to the disadvantage you mentioned.

My next thought was that there should be a better way to show the configuration, and to allow that configuration to be modified. I was thinking maybe some sort of visual UI where the user can navigate a representation of the final product, select a part and modify a parameter that way.

Is that along the lines of your suggestion? If not will you please expand a little? Configuration is the absolute core of this application.


Sounds like you need an SQL database. You could use SQLite.

Then provide a GUI to modify that database. You could add a bunch of constraints in the database too to ensure the config is correct.

Usually when there is plain-text files though, it's because they want it that way. It's easier to edit a text file sometimes than rows in a database. Cut/copy/paste/duplicate files and text. Simple textual version control.


Sure, I agree - I'm proposing JSON as an intermediate step toward a well-defined data-model since the thousands of copied config files have evolved over time, so the data-model is a smear of backward-compatibility hacks.

What I was trying to do is get you to explain what you mean by this:

> If our code ran in real-time to show us a representation of the final configuration, and we could trace how each final configuration value was generated, then it wouldn't be a problem. [...] But no systems are designed with this capability, even though it is quite trivial to do. Configuration is always an after-thought.


This is only relevant if you allow code to define config.

If you use conditionals and loops to create config, and then view the final json, it quickly becomes annoying when you know the thing you want to change in the final json, but have to trace backwards through the code to figure out where to change it.

So programmatic configs only work if you have this "value tracing" capability. Which nothing really does.


Worse yet, in some places (CI/CD) YAML becomes nearly a programming language. A very verbose, unintuitive, badly specified and vendor-specific one as well.


It's pretty much repeating the mistake of early 2010s Java, where the entire application frequently was glued together by enormous ball of XML that configured all the dependency injection.

It had the familiar properties of (despite DTDs and XML validation) often blowing up late, and providing error messages that were difficult to interpret.

At the time a lot of the frustration was aimed at XML, but the mid 2020s YAML hell shows us that the problem was never the markup language.


You have a loosely coupled bundle of modules that you need to glue together with some configuration language. So you decide to use X. Now you have two problems.


Spot on. We use ytt [0], "a slightly modified version of the Starlark programming language which is a dialect of Python". Burying logic somewhere in a YAML template is one thing I dislike with a passion.

[0] https://tanzu.vmware.com/developer/guides/ytt-gs/


TBH, ytt is the only yaml templating approach that I actually like.

The downside is that it is easy to do dumb things and put a lot of loops in your yaml.

The positive is that it is pretty easy to use it like an actual templating language with business logic in starlark files that look almost just like Python. In practice this works pretty well.

The syntax is still fairly clumsy, but I like it more than helm.


In some places working with Kubernetes, people unironically use the term "YAML engineer".


I've seen memes where SREs complain they have just become YAML engineers. :(


I've been there. Not YAML specifically, but basically just configuration (XML, JSON, properties, ...) for some proprietary systems without any good documentation or support available. "It's easy, just do/insert X", yet half a year and dozens of meetings and experts later, it was indeed not just X. Meanwhile I could've built everything myself from scratch or with common open-source solutions.


I mean...building a data centre / PaaS with YAML is pretty cool

We used to have to shove servers into racks! Kids these days :D


I *loved* shoving servers in racks!


I dream of a day there's a physical component of my job, not just the staring at a screen bit.


yamlops is a real thing :)


YAML is the Bradford Pear of serialization formats. It looks good at first, but as your project ages and the YAML grows, it collapses under the weight of its own branches.


I had to look up that tree. Invasive, offensive odour, cyanide-rich fruit. That's a good insult!


YAML is also just as bad as the Linden tree.

https://www.youtube.com/watch?v=aoqlYGuZGVM


You should see what they look like after a 25kph breeze. Which isn't too far off from what templated YAML generates after someone commits a bad template.


Even worse, every generation repeats this mistake. I'm not sure S-expressions are the answer, but Terraform HCL should never have been invented.


I was just telling a colleague today that HCL is great until you need to do a loop. A lot of parallels to this YAML discussion


My favorite pattern in HCL is the if-loop. Since there is no »only do this resource if P« in Terraform, the solution is »run this loop not at all or once«.
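For readers who haven't met it, the idiom looks like this (resource type and variable are made up for illustration):

  resource "aws_s3_bucket" "logs" {
    # "run this loop not at all or once": conditionally create the resource
    count  = var.enable_logging ? 1 : 0
    bucket = "example-log-bucket"
  }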


I'll take HCL over YAML templating any day. At least it is working with real data structures not bashing strings together.

That being said, yes, it is also an awful language.


Yeah … for CI files (like Github workflows & such), one of the best things I think I've done is just to immediately exec out to a script or program. That is, most of our CI steps look like this:

  run: 'exec ci/some-program'
… and that's it. It really aids being able to run the (failing) CI step offline, too, since it's a single script.

Stuff like Ansible is another matter altogether. That really is programming in YAML, and it hurts.


In such places one frequently has to remind oneself and others not to start programming in the configuration language, if avoidable, so as not to create tons of headache and pain.


This criticism doesn't pass the sniff test though: your average Haskeller loves to extoll the virtues of using Haskell to implement a DSL for some system which is ultimately just doing the same thing in practice (because they're still not going to write documentation for it, but hey, how hard can it be to figure out it's just...)

YAML becomes a programming language because vendors need a DSL for their system, and they need to present it in a form which every other language can mostly handle the AST for, which means it's easiest if it just lives atop a data transfer format.


I don't know what this has to do with Haskell. I understand that they need a DSL for their system. I just don't agree that it is a good idea to use some general purpose serialization format. In the end they always evolve to a nearly full programming language with conditions and loops. Using a full programming language makes much more sense IMHO, for example like Zig build files or how we use Python to build neural networks. That way I can actually use existing tools to do what I need.


Hey now. Your average Haskeller would simply recommend you replace YAML with Dhall.

https://dhall-lang.org/


Why not "just" use an embedded DSL?


maybe yaml should standardise hygienic macros. and a repl.


The lengths people go to avoid using s-expressions never ceases to amaze me.

We're talking countless centuries and great many minds pushed to brink of madness, just to keep the configs looking like Python or JavaScript.


I'd say it's even worse: it's a collective hallucination that complex configs are not code.


Yeah, I'm very sad that helm won. We do OSS k8s stuff at work, and 100% of users have asked for us to make a helm chart. So we had to. It is miserable to work on; your editor can't help you because the files are named like "foo.yaml" but they aren't YAML. You have to make sure you pipe all your data through "indent 4" so that things are lined up correctly in the YAML. What depresses me the most is that you have to re-expose every Kubernetes feature in your own way. Someone wants to add deployment.spec.template.spec.fooBars? Now you have to add deploymentFooBars to your values.yaml file and plumb it in. For every. single. feature.
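For the unfamiliar, this is roughly the kind of thing you end up writing (a snippet along the lines of the `helm create` scaffolding; you manage all the whitespace yourself with `indent`/`nindent`):

  spec:
    template:
      metadata:
        {{- with .Values.podAnnotations }}
        annotations:
          {{- toYaml . | nindent 8 }}
        {{- end }}

If the `8` is wrong, you don't get a type error, you get subtly broken YAML.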

It's truly "worse is better" gone wrong. I have definitely done some terrible things like "sed -e s/$FOO/foo/g" to implement templating... and that's probably how Helm started. The result is a mess.

I personally grew up on Kustomize before it was in kubectl, and was always exceedingly happy with it. (OK, it has a lot of quirks. But at least it saves you time because it actually understands the semantics of the objects you are creating.)

I like Jsonnet a lot better. As part of our k8s app, we ship an Envoy deployment to do all of our crazy traffic routing (basically... maintaining backwards compatibility with old releases). Envoy configs are... verbose..., but Jsonnet makes it really easy to work on. (The code in question: https://github.com/pachyderm/pachyderm/blob/master/etc/gener...)

I'm seriously considering transpiling jsonnet to the Go template language and just implementing everything with Jsonnet. At least that is slightly maintainable, and nobody will ever know because "helm install" will Just Work ;)

But yeah, I think Helm will be the death of Kubernetes. Some competing computer allocator container runner thingie will have some decent language for configuration, and it will just take over overnight. Mark my words!


> I have definitely done some terrible things like "sed -e s/$FOO/foo/g" to implement templating

Next time you reach for this, check out envsubst for a slightly improved solution that’s somewhat standard (at least common).
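A quick sketch of how that looks (file name and variable are made up; the optional SHELL-FORMAT argument restricts which variables get substituted):

  $ cat deploy.tmpl.yaml
  image: registry.example.com/app:${IMAGE_TAG}
  $ IMAGE_TAG=v1.2.3 envsubst '${IMAGE_TAG}' < deploy.tmpl.yaml > deploy.yaml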

On the topic of templating or modifying helm charts using jsonnet, you might find Tanka helpful:

https://tanka.dev/helm


> But yeah, I think Helm will be the death of Kubernetes. Some competing computer allocator container runner thingie will have some decent language for configuration, and it will just take over overnight. Mark my words!

I want to believe this.

Everywhere I've worked we're still rawdogging tf/hcl and helm though, because change is scary.

At least I get some relief in my personal projects. :')


I see a problem here. I'm not certain if the sort of person who would choose YAML as their configuration language sees a problem here.

There is a direct conflict between human-centred data representations and computer-centred. Computers love things that look like a bit like a Lisp. Humans like things that look a bit like Python. If you're the sort of person who wants to use a computer to manipulate their Kubernetes config then you'd be secretly annoyed that Kubernetes uses YAML. However, it appears the Kubernetes community are mainly YAML people, so why would they mind that their config files will be horrible to work with once programming logic gets involved? The downside of YAML is exactly this scenario, and I believe the people involved in K8s are generally cluey enough to see that coming.

> YAML is a superset of JSON

The spec writers can put whatever they want in their document, but I don't think this is true. If you go in and convert all the YAML config to JSON, the DevOps team is going to get upset. The two data formats have the same semantic representation, but so do all languages compiled to the same CPU arch. JSON and YAML are disjoint in practice. Mixing the two isn't a good idea.


The ironic thing is that, IIRC, k8s manifests were supposed to be machine-generated from k8s's inception; you weren't supposed to write them by hand. Of course, people wrote them by hand anyway, until it became unbearable, at which point they started templating them, because that's how these things always seem to progress: manually-written text is almost never replaced by machine-generated config-serialized-to-text; it's replaced by templated-but-originally-still-manually-written text.


> k8s manifests were supposed to be machine-generated from k8s's inception

They failed spectacularly by not being inconvenient enough for their intended purpose.

one of those cases where unreadable by design would be a most welcome feature.


"YAML is a superset of JSON" only means that any JSON document is a valid YAML document. It does not mean YAML is equal to JSON.


My personal philosophy is that string interpolation should not be used to generate machine-readable code, and template languages are just fancy string interpolation. We've all seen the consequences of SQL injection and cross-site scripting. That's the kind of thing that will keep happening as long as we keep putting arbitrary text into interpreters.

Yes, this means I don't think we should use template files to make HTML at all.

Alternatives to using template languages for HTML include Haml (for Ruby) and Pug (for JavaScript). These languages have defined ways to specify entire trees of tags, attributes, and text nodes.

If you don't like Python-style significant indentation, JavaScript has JSX. The HTML-looking parts of JSX compile down to a bunch of `createElement` expressions that create a web document tree. That tree can then be output as HTML if necessary.

Haml, Pug, and JSX are not template languages even though they can output HTML. Likewise, `JSON.stringify(myObj)` is not a template language for JSON. Generating machine-readable code should be done with a tool that understands and leverages the known structure of the target language when possible.
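To make the contrast concrete, a minimal sketch (`userInput` stands in for a hypothetical untrusted string):

  // String interpolation: the template can't distinguish markup from data,
  // so userInput can smuggle tags into the document (the XSS failure mode).
  const risky = `<p>${userInput}</p>`;

  // Tree construction (what JSX compiles to): userInput becomes a text
  // node, and is escaped when the tree is serialized to HTML.
  const safe = React.createElement("p", null, userInput);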


> Haml, Pug, and JSX are not template languages even though they can output HTML.

That's nonsense, unless we go by your idiosyncratic definition of what a template language is ("fancy string interpolation").

> Haml (HTML Abstraction Markup Language) is a templating system that is designed to avoid writing inline code in a web document and make the HTML cleaner.

> Pug – robust, elegant, feature rich template engine for Node.js

> JSX is an XML-like syntax extension to ECMAScript without any defined semantics.

OK, I'd agree that JSX is not strictly a template language.

But in the end, all of these compile down to HTML. Not by string interpolation, but as a language that is parsed into a syntax tree, then rendered into HTML properly with an internal understanding of valid structure.

YAML with templating is fancy string interpolation; it's not a template language (or at least it's a poorly implemented one).


I am aware that Haml and Pug call themselves template languages, but they are not. In a template language, the source is a "template" that has some special syntax to fill in some bits. I don't think that's a very idiosyncratic definition. Pretty much any programming language can output a bunch of text, but most of them are not template languages. Java has XMLBuilder, but that doesn't make it a template language for outputting XML. But PHP is a template language, even though it's not recommended to use it that way anymore.


Sorry, reading over my comment, I sounded more antagonistic than I meant to be. After all, we're here to enjoy discussion and not to battle against each other.

As an aside, on another post yesterday, I had a pleasant surprise about "templating" in life itself.

> The familiar distinction between software and hardware loses its meaning in living cells. We propose new ways to study the phylogeny of metabolisms, new astronomical ways to search for life on exoplanets, new experiments to seek the emergence of the most rudimentary life, and the hint of a coherent testable pathway to prokaryotes with template replication and coding.

https://arxiv.org/abs/2401.09514

Maybe DNA is the original templating language. (Hopefully with more sophistication than fancy string interpolation.)


Well, it's true that Haml calls itself a "templating system", and Pug uses the term "template engine". That's 3 out of 3, you win. ;)

PHP is a scripting language that is also a template processor, but I wouldn't call it a template language. So we disagree on several points, but no big deal. A big disadvantage of PHP, in relation to your original point about "fancy string interpolation", is that it does not natively understand the target output HTML syntactically and structurally.


Not all template languages are string template languages, though. If you consider PHP a templating language for text, for example, then by the same logic XQuery is a templating language for XML.


This is the essence of the problem! Yaml and templates are just distractions. It just boils down to the fact that "string" is a very general type and we use it lazily.

My personal rule: Every time a value is inserted into a string it must be properly encoded.

I wrote a full blog post about this a while back: https://kevincox.ca/2022/02/08/escape-everything/. But the TL;DR is that every string has a format which needs to be respected, whether that be HTML, SQL or human-readable terminal output. Every time you put some value into a string you should be properly encoding it into that format. But we rarely do.
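As a rough illustration of the rule in Python (stdlib only; the inputs are made up):

  import html, json, shlex

  user = "Bobby <script>alert(1)</script> O'Brien"

  # Each target format gets its own encoder at the insertion point:
  html_safe = "<p>%s</p>" % html.escape(user)            # HTML document
  shell_safe = "grep -- %s app.log" % shlex.quote(user)  # shell command line
  json_safe = json.dumps({"name": user})                 # JSON payload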


> My personal rule: Every time a value is inserted into a string it must be properly encoded.

This is how Django templates have done it for over a decade. You have to go out of your way to tell it not to escape the values if for some reason you need that.


We are switching to cuelang [1]. IMHO it is better designed than Jsonnet. Since Kubernetes already has state reconciliation, the only thing missing in this setup is deletion. But that can now be accomplished with the prune feature. [2]

[1] https://cuelang.org/docs/integrations/k8s/

[2] https://kubernetes.io/blog/2023/05/09/introducing-kubectl-ap...


I can second cuelang. We started using it at work and it's so nice. Some of the error messages are a little hard to decipher, but that's acceptable because it catches so many errors up front. The few times I have to write yaml directly, it now feels so tedious in comparison.


This is where I usually pitch in with "Have your heard of CUELang, our lord and savior?": https://cuelang.org/

- Not turing complete yet sufficiently expressive to DRY

- Define schema and data with the same language, in a separate or same file. With union types.

- Generate YAML or JSON. Can validate itself, or a YAML or JSON file.

The biggest drawback being that the only implementation is currently in Go, meaning you may have to subprocess or FFI.
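For a taste of the "schema and data in the same language" point, a minimal sketch (names and constraints made up):

  #Service: {
    name:     string
    replicas: int & >=1 | *1    // constrained, with a default of 1
    port:     int & >0 & <65536
  }

  service: #Service & {
    name: "api"
    port: 8080                  // replicas falls back to the default
  }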


we have a pipeline that ingests very concise cuelang files.

then it generates json files for each application for a tool that will create xml definitions, which then are applied to an xls which the architects own, to spit out a yaml that we use to apply our helm charts. the charts deploy a k8s client which then interacts with the main cluster via json using the api.

took a while, but we are using the best tool for each job.


just throw in a kafka cluster so you can pipe each step through an event bus and you'll have an enterprise-grade deployment setup


You used JSON twice, how casual.

Your API should clearly be using protobuf.


How does it compare to dhall?


Dhall's lack of any form of type inference makes it very verbose and difficult to refactor, in my opinion. (I'm the author of dhall-kubernetes and, funnily enough, never ended up using it in production.) Dhall is also extremely slow. We had kubernetes manifests that took _minutes_ to type-check. Cue is basically instant. This matters a lot to me.

I find cue very ergonomic. Its treating both types and values as values is also very neat: you write your types and your values in the same syntax and everything unifies neatly. But I sometimes miss functions, which it lacks.

Cue also being able to ingest protobuf definitions and openapi schemas makes it very quick and easy to integrate with your project. Have a new Kubernetes CRD you want to have type-checked in cue? No problem, just run `cue get go k8s.io/api/myapi/v1alpha1` and off you go: you have all your type definitions imported from Go into Cue!

Especially for k8s this makes for very fast development and iteration cycle.

I've wanted to take a look at https://nickel-lang.org/ which is a "what if cue had functions" language. but to be honest Cue kind of serves my needs.


Speaking of Nickel, they've got a great document detailing the reasons for their design (for example why they chose not embed in a general-purpose language like Pulumi) and how Nickel compares to other config languages like Dhall and CUE: https://github.com/tweag/nickel/blob/master/RATIONALE.md


> Dhall is also extremely slow. We had kubernetes manifests that took _minutes_ to type-check. Cue is basically instant.

Everyone wants type-safety, but no one wants to wait for the type-checker :)

Maybe in this case Cue with type checks equivalent to Dhall's would be slower, but I notice in many places people say "strong type-checking is valuable" while still expecting compile times similar to languages with weaker type systems.


People always undervalue the beauty of a short feedback loop until it's taken away from them.

And even then, they won't exactly pinpoint the problem, rather express their general frustration, without realizing that the dynamic system they used did indeed have some great properties and was not popular for no reason.


I don't disagree with that either :)

I'm conflicted honestly. I find with dynamic languages it's easier to just spin your wheels and move quickly in the hole you are in.

With typed languages its easy to feel you are making less progress because the feedback loop can be longer, but generally the pieces you build are more likely to work correctly.

For me Haskell and ghci repl gives good properties from both areas, especially with something like Rapid for keeping state over repl reloads.


Functions are a nice to have, but:

- It tends to make things less declarative.

- You lose locality of behavior, which is very useful in configuration.

Also, Nickel doesn't support injecting data into the Nickel file, so an external program can't set variables, query a database and pass the result to the conf file, etc.


Cue was designed very much with k8s in mind and developed tutorials and integrations for it early on. Dhall was designed pre-k8s; it had to introduce a defaults feature, and before that it was completely unusable for k8s. Dhall has functions, which are natural to programmers; particularly for those from an FP background, Dhall would be trivial to start using, whereas Cue's unification takes some getting used to, but there is enough documentation and integration for getting going with k8s to make up for it. Dhall has unique features for stably importing configurations from remote locations.


I love YAML and I curse it every single day that I'm working with Helm charts.

People ask me what I'd use to deploy apps on Kubernetes and I say I hate Helm and would still use it for a single reason: everybody is using it, I don't want to create a snowflake infrastructure that only I understand.

Still, back in the day I thought jsonnet would win this battle but here we are, cursing Helm and templates. That's the power of upstream decisions.


In my view, the presence of YAML templating is a red flag in any codebase or system.

YAML got its popularity with the advent of Ruby on Rails, largely due to the simplicity of the database.yml file as an aid in database connection string abstraction that felt extremely clean to Java programmers who were used to complicated XML files full of DSN names and connection string peculiarities.

The evolution of the database.yml file into something arguably as complex as the thing it was intended to replace is described in the article below:

https://dev.to/andreimaxim/the-rails-databaseyml-file-4dm9


The title of TFA was actually my reaction when I learned what Helm was actually doing. Initially I thought Helm would take an input file of YAML-with-template-bits, parse that YAML as an object, use the provided template bits to fill in the parts of that object, then serialize the object back to YAML and write it out. Sounds reasonable, right? Nope, it's literal text substitution, so if you want valid YAML as the output, you'd better count your indentation on your fingers and track where the newlines do or don't go.
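In other words, the (hypothetical) parse-then-patch flow I expected, sketched with PyYAML; this is emphatically not what Helm does:

  import yaml  # PyYAML

  with open("deployment.yaml") as f:
      doc = yaml.safe_load(f)       # parse to an object...

  doc["spec"]["replicas"] = 3       # ...fill in values on the object...

  print(yaml.safe_dump(doc, sort_keys=False))  # ...serialize back out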


I will tell you exactly why we template YAML. It's the exact same reason every code base has ugly parts: the evolution of complexity.

At first, you have a yaml file. No templates, no variables. Just a good old standard yaml. Then, suddenly you need to introduce a single variable. Templating out the one variable is pretty easy, so you do it, and it's still mostly for humans to edit.

Well, now you have a yaml file and template engine already in place. So when one more thing pops up, you template it out.

8 features later, you wonder what you've done. Only, if we go back in time, each step was actually the most efficient. Introducing anything else at step 1 would be over-engineering. Introducing it anywhere else would lead to a large refactor and possible regressions.

To top it off, this is not business logic. Your devs are not touching this yaml all that much. So is it worth "fixing"? Probably not.


Ansible convinced me that doing programming tasks in YAML is insanity, so I started an experiment: what would Ansible be like if its syntax were more like Python than YAML? https://github.com/linsomniac/uplaybook

I spent around 3 months over the holidays exploring that by implementing a "micro Ansible", I have a pretty solid tool that implements it, but haven't had much "seat time" with it: working on it rather than in it. But what I've done has convinced me that there are some benefits.


Wouldn't it be a better idea to use an existing programming language instead of cooking up numerous half baked templating languages?


Yes.

Except you then have to censor that programming language severely. Maybe you can accept some endless loops, but you probably don't want the CI orchestrator to start mining Monero instead of bootstrapping and configuring servers and services.

A solution to that censorship might be a very limited WASM runtime: one that offers very few APIs and has severely limited resources, timeouts and such. So people can write their orchestration in Python, Javascript or Rust or even Brainfuck if they want, but what that orchestration can do, for how long it can do it, and how much memory, space and so on it gets, is all very limited.

While that may work, it's far harder to think of than "let's make another {{templating|language}}" inside this YAML that we already have and everyone else uses.


I don't see any practical difference w.r.t. cybersecurity between "I blindly applied this pile of YAML to my production kubernetes clusters without looking at it" and "I blindly downloaded and ran this computer program on my CI runner without looking at it".

A supply chain attack on the former means that your environment is compromised. So does the latter.


I do see a difference.

GitHub actions isn't going to run your Python code on its orchestration infra. Nor is DigitalOcean or Fly.io or CircleCI. They all convened around "YAML" because it's a very limited set of instructions.

I'm quite sure you cannot write a bitcoin miner (or something that opens a backdoor) in Liquid inside YAML in the DSL that Github Actions has. I am 100% sure you can write a bitcoin miner in Python, Javascript, Lua, or any programming language that Github would use to replace their YAML config.


you can still have JSON output from the Python code and compare it, in a similar way to how Atlantis works.


What? GitHub Actions, at the very least, isn't strictly yaml. I run arbitrary code in whatever language I want all the time. I'm pretty sure third party workflows can, too.


We wrote a backend service at Lyft in Python and at some point needed to do some string interpolation for experimentation. In a rush, someone implemented this in YAML (no new deps needed). This ended up being the bane of the team's existence. It was almost impossible to test if something was going to break at runtime; we could only verify it was valid YAML, many other things were infeasible, and it was super hard to debug. It soured me on YAML for years.


Can someone help me understand what is the advantage of using jsonnet, cue, or something else vs a simple python script (or dialect, like starlark), when you have the need of dynamically creating some sort of config?

I've used jsonnet in the past to create k8s files, but I don't work in that space anymore. I don't remember it being better or easier than writing a python script that outputs JSON. Not even taking into account maintainability and such. Maybe I'm missing something?


To add to the sibling comments, after going from a jsonnet-based setup to a Typescript-based one (via pulumi), the biggest thing I missed from jsonnet was the native object merge operations which are very useful for this kind of work as it lets you say "I want one of these, but with these changes" even when the objects are highly nested, and you can specify whether to merge or override for each individual key.
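For the unfamiliar, a sketch of what those merge operations look like in Jsonnet (`+:` merges a nested field instead of replacing it; the objects are made up):

  local base = {
    spec: {
      replicas: 2,
      template: { metadata: { labels: { app: "api" } } },
    },
  };

  base + {
    spec+: {
      replicas: 5,  // override just this key
      template+: { metadata+: { labels+: { tier: "web" } } },  // deep merge
    },
  }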

But ultimately this was a minor issue and I think it's far more important that you use something like this (whether a DSL or a mainstream PL) and that you're not trying to do string templating of YAML.


They're various points along the Turing complete config generator vs declarative config spectrum. Declarative config is ideal in lots of ways for mission critical things, but hard to create lots of because of boiler plate.

A turing-complete general purpose language is entirely unconstrained in its ability to generate config, so it's difficult to understand all the possible configs it can generate. And it's difficult to write policy that forbids certain kinds of config to be generated by something like Python. And when you need to do an emergency-rollback, it can be hard to debug a Python script that generates your config.

Starlark is a little better because it's deliberately constrained not to be as powerful as Python.

Jsonnet is, IIUC, basically an open source version of the borgcfg tool they've had at Google forever. My recollection is that Borgcfg had the reputation of being an unreadable nightmare that nobody understood. In practice, of course, people did understand it but I don't think anyone loved working with it.

Brian Grant, creator of Kubernetes, wrote up his thoughts on various config approaches in this Google doc: https://docs.google.com/document/d/1cLPGweVEYrVqQvBLJg6sxV-T....


I definitely wouldn't use Python because it isn't sandboxed, and users will end up doing crazy things like network calls in your config.

Starlark is a good option though.

People will talk about Jsonnet not being Turing complete, but IMO that is completely irrelevant. Turing completeness has zero practical significance for configs.


I am really sad that jsonnet / ksonnet never really took off. It’s a great way to template, but has a bit of a learning curve in my experience. I suspect that is why it’s niche.

If you like what is presented in this article, take a look at Grafana Tanka (https://tanka.dev).


Yeah similarly I'm using Nix to template K8s templates and I've never looked back. Helm is great for deploying 3rd party applications easily but I've never seen the appeal for using it for in house services, templating YAML is gross indeed.


I was reading the description of Jsonnet and wondering why we don't just use JavaScript. Read a file, evaluate it, take the value of the last expression as the output, and blat it out as JSON.

The environment could be enriched with some handy functions for working with structures. They could just be normal JavaScript functions. For example, a version of Object.assign which understands that "key+" syntax in objects. Or a function which removes entries from arrays and objects if they have undefined values, making it easy to make entries conditional.

Those things are simple enough to write on demand that this might not even have to be a packaged tool. Just a thing you do with npm.


The fact that it's a purely functional programming language with lazy evaluation is really powerful but steepens the learning curve for devs who haven't worked with functional languages.

The stdlib is also pretty sparse, missing some commonly required functions.


> The fact that it's a purely functional programming language with lazy evaluation is really powerful but steepens the learning curve for devs who haven't worked with functional languages.

does it really though? what part do they struggle with?


IME engineers struggle with folds most.


> The stdlib is also pretty sparse, missing some commonly required functions.

This seems to be the general curse of template languages. For some reason, their authors have this near-religious belief in removing every "unneeded" feature, which in practice results in having to write 10 incomprehensible lines of code to do something that could be easily done in one line of readable code in a proper PL.


In my experience there is a near zero uptake of jsonnet or similar amongst "regular" i.e less ops inclined developers.

gotmpl is a lot easier to grok if you are coming in cold. Yes it sucks for anything mildly complex, but the barrier to entry is significantly lower.

Generation via real programming languages is the future I am hoping for.


Jsonnet looks like a case of XKCD-927[0]. I fully agree with you that real programing languages are the way to go for generating anything more complex.

[0] https://xkcd.com/927/


Indeed, why? However, the conclusion I have is not to use JSON but to use a type-safe configuration language that can express my intent much better, making illegal states impossible. One example of such a language is Dhall.

https://dhall-lang.org/


If I’m going to use a whole language to generate my config already, why would I use anything but the language my application is written in? Everything can export JSON after all.


Because your language might not have a nice type system. For example, Python -> JSON is going to produce worse guarantees than Dhall.


Different requirements, different guarantees. Principle of least power. Have a look at https://docs.dhall-lang.org/discussions/Safety-guarantees.ht....


This makes no sense to me.

You have complex enough logic to warrant a language, you should use a real language. You'll have more support, less obscure issues, a solid standard library and whatever else you want, because it's a REAL language.

If the argument is "someone in my team uses recursion to write the YAML files, so I'll disallow it", then the issue is not with the language, it's with the team.

What I have found in my career is that many Ops people sell themselves short and hesitate to dive into learning and fully using an actual language. I've yet to understand why, but I've seen it multiple times.

They then end up using pseudo-languages in configuration files to avoid this small step towards using an actual language, and then complain about how awful those pseudo-languages are.


> You have complex enough logic to warrant a language, you should use a real language.

Not sure what you mean. Dhall is a real language:

    Dhall is not a Turing-complete programming language, 
    which is why Dhall’s type system can provide safety 
    guarantees on par with non-programmable configuration 
    file formats. Specifically, Dhall is a “total” 
    functional programming language, which means that:

    You can always type-check an expression in a finite 
    amount of time

    If an expression type-checks then evaluating that 
    expression always succeeds in a finite amount of time


Pulumi over Terraform.

CDK over Cloudformation.

Don't hand craft configuration files, these aren't new lessons. I remember being first introduced to Troposphere, which was pretty awesome.


We're talking about templating and generating files, but it seems like everyone has just collectively forgotten about M4?

Yes, it can be unsafe if you're not careful, but if you need to bang out a quick prototype it's the best tool there is. It's part of POSIX, and so it will always be available, the language is dead simple, and you can generate any text you want with it.

I wouldn't use it with YAML, but I would probably never template YAML in the first case: just generate JSON and feed it through `yq -y` if you need a quick YAML generator.
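A tiny sketch of that flow (assuming the Python-based yq, whose -y flag emits YAML):

  $ cat config.m4
  define(`REPLICAS', `3')dnl
  { "name": "api", "replicas": REPLICAS }
  $ m4 config.m4 | yq -y .
  name: api
  replicas: 3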


There are two things on the horizon here for Kubernetes that give me hope: KCL, its own configuration language, and Timoni, which builds on CUE and corrects some of the shortcomings of Helm.

Though these days, OLM and the Quarkus operator SDK give you a completely viable alternative approach to Helm that enables you to express much more complex functionality and dependency relationships over the lifecycle of resources. An example would be doing a DB backup before upgrading to a new release etc. Obviously this power comes at a cost.


Yes, templating YAML is crazy. But is the answer jsonnet? That's even more batshit.

Why hasn't anyone opted for a "patch-based" approach? I.e. start with a base YAML/JSON file, apply a second file over it, apply this third one, and use the result as the config. How you generate these files is entirely up to you.
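For what it's worth, that approach exists in standardized form as JSON Merge Patch (RFC 7386); here's a rough Python sketch of its semantics:

  def merge_patch(target, patch):
      """Apply `patch` over `target`, roughly per RFC 7386."""
      if not isinstance(patch, dict):
          return patch                  # non-objects replace wholesale
      result = dict(target) if isinstance(target, dict) else {}
      for key, value in patch.items():
          if value is None:
              result.pop(key, None)     # null deletes the key
          else:
              result[key] = merge_patch(result.get(key), value)
      return result

  base = {"spec": {"replicas": 1, "image": "app:v1"}}
  final = merge_patch(base, {"spec": {"replicas": 3}})
  # {'spec': {'replicas': 3, 'image': 'app:v1'}}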


Yes. The answer is a "config.d" directory; this has been known to Linux package managers for a long time. It is the only way for multiple packages to contribute to configuration without fighting over ownership of the one true config file.


Meanwhile in JavaScript land, config is simply another js file, with all the Object and Array literal goodness that that gets us, and the full language environment backing it up.


For those who do not know it yet, the now classic noyaml site: https://noyaml.com/


If you're using Helm to deploy your own apps, I feel that's a code smell. I'll add jsonnet for your own apps to that list.

Just use dumb YAML, maybe kustomize if you really need it, but if that's not sufficient, consider that a sign that you're not carving the wood the way it's telling you to.

Any form of templating for creating your own application manifest is another moving part that allows for new and fun errors, and the further away your source manifest is from the deployed result, the harder it is to debug.

If you really want to append a certain set of annotations to each and every pod in a cluster, instead of using shared templates (and enforcing their usage), there are other approaches in K8s for these kinds of use cases that you have a lot more control over.


I'm confused on two points:

(A) why not use the YAML syntax that is not whitespace-sensitive? In the author's example, that could be: {name: Al, address: something}

(B) don't env variables go a long way toward avoiding the need for a template? Instead of generating a complete YAML, put env variable placeholders in and set those values in the target environment. At this rate, the same YAML can generally be deployed anywhere. I've seen that style implemented several times; it works pretty well.

I do agree that generating config itself - and not just interpolating values - is potentially really gnarly. I do wonder, instead of interpolating variables at deploy time, why not use env variables and do the interpolation at runtime?


This article made me think it'd be nice to generate k8s JSON using TypeScript. Just a node script that runs console.log(JSON.stringify(config)), and you pipe that to a yaml file in your deploy script. The syntax seems more sane and has more broad appeal than jsonnet, and I'd wager that the dev tooling would be better given good enough typings.

By the way the answer to the question "why are we templating yaml?" is: people are just more familiar with it and don't want to have to translate examples to jsonnet that they copy and paste from the web. Do not underestimate this downside :) Same downside would probably apply to TypeScript-generated configs I bet.
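A minimal sketch of that idea (names and image are made up; `kubectl apply -f -` accepts JSON directly):

  // deploy.ts: run with `node deploy.js | kubectl apply -f -`
  const app = "my-api";

  const config = {
    apiVersion: "apps/v1",
    kind: "Deployment",
    metadata: { name: app, labels: { app } },
    spec: {
      replicas: 3,
      selector: { matchLabels: { app } },
      template: {
        metadata: { labels: { app } },
        spec: {
          containers: [{ name: app, image: `registry.example.com/${app}:v1` }],
        },
      },
    },
  };

  console.log(JSON.stringify(config, null, 2));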


Others have mentioned CDK, but I want to say that this is almost the exact approach I took on a project recently and it worked out fine. Node script that validates a few arguments and generates k8s manifests as JSON to be fed into `kubectl apply`.

IME, there's no need to involve anything more complicated if your deployment can be described solely as k8s manifests.


I would recommend implementing a similar API to Grafana Tanka: https://tanka.dev

When you "synthesise", the returned value should be an array or an object.

1. If it's an object, check if it has an `apiVersion` and `kind` key. If it does, yield that as a kubernetes object and do not recurse.

2. If it's an array or any other object, repeat this algorithm for all array elements and object values.

This gives a lot of flexibility to users and other engineers because they can use any data structures they want inside their own libraries. TypeScript's type system improves the ergonomics, too.
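A rough TypeScript sketch of that recursion:

  // Walk a synthesized value and yield anything that looks like a k8s object.
  function* findManifests(node: unknown): Generator<Record<string, unknown>> {
    if (Array.isArray(node)) {
      for (const item of node) yield* findManifests(item);
    } else if (node !== null && typeof node === "object") {
      const obj = node as Record<string, unknown>;
      if ("apiVersion" in obj && "kind" in obj) {
        yield obj; // a Kubernetes object: emit it and don't recurse
      } else {
        for (const value of Object.values(obj)) yield* findManifests(value);
      }
    }
  }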



You can convert YAML to JSON programmatically, and JSON is valid jsonnet, so you can pretty much copy paste examples from the web into your jsonnet if you find yourself wanting to do that


That's sort of what https://cdks.io does, except the final output is YAML for better readability.



> copy and paste from the web

Hot take: this is a terrible idea, and it's why so much cloud infra is monstrously expensive (and bad).

People need to stop making infra easy. It’s not supposed to be easy, because when you make a bad decision, you don’t get to revert a commit and carry on with life. You don’t understand IOPS and now your gp2 disk is causing CPU starvation from IOWAIT? Guess you’re gonna learn some things about operating within constraints while waiting for a faster disk to arrive at the DC! Buckle up, it’ll be good for you.

I’m fully aware that I sound like a grouchy gatekeeper here, and I’m fine with it. People making stupid infra decisions en masse cause me no end of headaches in my day job, and I’m tired of it.


Separate generated content from maintained content. Works for me. But on the specifics here, from a very python POV.

Strict YAML is easier to maintain than JSON if you have deeper than one or maybe two levels of nesting, multiline strings, or comments.

So, I build my config systems to _generate_ YAML instead of “templating YAML.”

PyYAML extensions and ruamel.yaml exist, though they're kind of out of date, and more new projects are using TOML. (From the project description: "ruamel.yaml is a YAML parser/emitter that supports roundtrip comment preservation".)

Confession: but yeah, not when I use ansible. Ansible double-dog-dares you to “jinja2 all the things” without much in the way of structured semantics.
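For example, a minimal sketch of generating (rather than templating) YAML with PyYAML (the config dict is made up):

  import sys
  import yaml  # PyYAML; ruamel.yaml works similarly and keeps comments

  config = {
      "service": {"name": "api", "port": 8080},
      "features": ["metrics", "tracing"],
  }

  # Emit well-formed YAML; no string templating, no indentation counting.
  yaml.safe_dump(config, sys.stdout, sort_keys=False)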


I need a restraining order against YAML DSLs.


I can't remember how many times I've heard or seen the argument "but that is in YAML", which implies that the configuration (or, god forbid, the code) is simple and well designed. I find it hilarious.

And the worst contender is embedding a text template like Jinja in a YAML config and forcing everyone to use such an abomination to change production config via deployment. Yes, I'm talking about Terraform or the like. Why people think this kind of design is acceptable is beyond my comprehension.


I think YAML is a good pick for non-developers / content creators. The front matter section in Markdown files is a good example. Or is there a better, human-friendly alternative?


You just pinpointed my biggest peeve with YAML. It looks like it's "human friendly" because there are no scary curly braces. But you still need to get the syntax exactly right, so that benefit is very small. And now you have to keep your finger on the screen while scrolling in order to figure out what a bullet belongs to.


Then what alternative do you recommend for content creators? Do you use the alternative in Markdown front matter?


You should make what you do / don't do less of your identity. You're limiting yourself because you identify as "not the kind of person who does that".


Note that I am not a content creator myself. I build solutions for web teams and on those teams, some people focus solely on content and Markdown. I want to offer them an easy editing experience. So far YAML has been the easiest format for them.


I don't think they have a need for configuration files while filming their tiktoks.


What is the best term to use for the people who are writing content on the web team? The ones who write blog entries, documentation, and marketing pages. The ones who mainly touch Markdown files.


TOML is pretty easy to grok and forgiving at the same time


Learn JavaScript. Get the fuck out of your "content creator" pigeonhole. JavaScript is content.


I don't think anyone writes their blog entries with JavaScript here.


> But you still need to get the syntax exactly right

I think it's more that it's declarative that makes it simple. Also you just have to remember simpler rules compared to JSON.

E.g.

  - Apple
  - Orange
  - Strawberry
  - Mango
Is simpler than

  [
    "Apple",
    "Orange",
    "Strawberry",
    "Mango"
  ]
Don't forget to skip that last comma! But not all of the others!


>I think it's more that it's declarative that makes it simple

..it's no more or less declarative than other configuration languages?

And yes, I get that it looks simpler. I just think that it applies as long as your file can fit in about half a page. As it grows and becomes deeply nested, IMO, that simplicity disappears.


This is the least of my worries - just use VS Code with a plugin that red-underlines and formats YAML, and use yamllint in your CI.


Basically you're saying YAML is unreadable without an IDE or a text editor with advanced highlighting functionality.


YAML is anything but human-friendly. It has far too many special features and edge cases for most people. Something simple like Java properties files would solve something like Markdown front matter perfectly fine.


Java properties files are a mess. They still require Windows encoding (ISO-8859-1), which is incompatible with UTF-8.


ISO-8859-1 is Latin-1, there's nothing specifically "Windows" about it.


You're correct.


You were probably thinking of Windows-1252, which is an extension of ISO-8859-1 that supports more European character sets.


Only if you're still using java 8.


That's a common misconception.

Look at the documentation [0] or at the OpenJDK code. Both assume ISO-8859-1, unless you're dealing with a special case where resource bundles are involved.

[0]: https://docs.oracle.com/en/java/javase/21/docs/api/java.base...


I'd personally go with TOML over YAML for that


Why something so complex for front matter? Isn't it typically just a few key/value pairs?


Serendipity strikes as I'm implementing an emrichen interpreter in golang after getting too annoyed about templating YAML as a string.

The reason I like YAML is that I can see the tree structure directly, and to my lisp brain it is extremely easy to read. Furthermore, in our age of LLMs, I find that LLMs are able to generate "correct" YAML more easily than JSON, since the tree depth is encoded in every line and doesn't require matching larger structures. It also uses a significantly smaller number of tokens.

I find it extremely easy to have LLMs generate decent DSLs by asking them to use a YAML output format, and found it very robust to generate code out of these (or generate an interpreter for the newly created DSL).

I didn't know about !tags until Sunday, which is quite shameful, but I find that the emrichen solution is actually quite elegant, and really kind of feels like a lisp macro expander.

Overall, YAML is just good enough for me to get shit done, I can read and skim it quickly, LLMs do well with it, and it's easy to work with. It has aliases, multiline strings and some other Quality of life features built in.

https://github.com/con2/emrichen


I totally agree with you on LLM usage. I have recently switched from JSON to YAML for requests and replies from LLMs (GPT-4 specifically) and I find it much better: fewer tokens used, more readable if you are looking at the HTTP requests and responses, and you can parse it on the fly in streaming responses. The last point lets you do visual updates for the user, which is pretty important if you need to wait 1+ minutes for the full response.


I'd be very curious to know what kind of previews/streaming YAML applications you are building with LLMs. I have building a v0.dev kind of thing with streaming update on my TODO list.


The cycle seems: Invent a new static information format (XML/JSON/HTML/...), reduce verbosity, add GUI, variables, comments, expressions, control flow, validation, transformation, static typing, compilers, IDE support, dependency management, and maybe a non-backwards-compatible major version etc. And you end up with yet another Java/C# clone, just inferior because it was never meant to support all these things.


Obviously biased but we at Kurtosis are trying to solve this problem through Starlark.

We took Starlark and added a few instructions of our own that make our Starlark container-native. The resulting Starlark definitions support:

- Composition - you can import a remote definition and just use it

- Decomposable - you can break things apart

- Parametrizability - want one of a service and 10 of the other, just pass an argument

- Portable - It runs pretty much anywhere

Our runtime takes the Starlark and creates environments in both Docker and Kubernetes, from one definition.

Our CTO wrote this - https://docs.kurtosis.com/advanced-concepts/why-kurtosis-sta...

We are source available https://github.com/kurtosis-tech/kurtosis

Here is a popular environment definition - https://github.com/kurtosis-tech/ethereum-package that protocol developers use to setup custom Ethereum test environments


For anyone struggling with Helm YAML syntax errors in their day job, I shamelessly advertise my browser-based debug tool Helm Playground:

https://helm-playground.com/ - https://github.com/shipmight/helm-playground


Hey, I used this the other day. Thank you.


I'm designing a simple dev environment from scratch.

My solution for this is a sandboxed Lua for programmatic configuration:

https://github.com/civboot/civlua/tree/main/lib/luck

I can't stand JSON (for many reasons) so I created a serialization format that combines it and CSV for nested objects

https://github.com/civboot/civlua/tree/main/lib/tso

I wish the industry would standardize on a solution like this. IMO you shouldn't use a "real" language unless you can lock it down to be deterministic. JSON is supposed to be human-readable but fails for lots of real-world data like multi-line strings or lists of records.

CSV is more readable but doesn't support nested objects.


I wrote a monstrosity of a terraform module that takes pre-existing helm charts/templates, feeds some json into them via terraform, translates the results to HCL, and deploys them.

It's kind of a Rube Goldberg machine that I made as a bespoke solution to a weird problem, but it's been fairly pleasant to work with so far.


I actually wrote about my thoughts on the tradeoffs between purely-config and purely-code:

https://willhbr.net/2024/01/18/the-code-config-continuum/


The worst part about this mess is that it's all fine and dandy as long as it's your own first-party tooling, but as soon as you want to use someone else's software you're stuck reading and understanding the spaghetti that Helm inevitably becomes, and hoping that you can configure it the way you need to.

It's come to the point that I don't even think about using most charts and just build them myself. The issue with that is that software like Prometheus, Loki, Grafana or e.g. Postgres operators are so complex that it's almost impossible to "fix" them.

This really is like that dog sitting in a burning house and saying "This is fine" because that's how most go on with their day after hitting their head for a few hours and running dozens of pipelines.


I'm no fan of YAML but for an example of templating YAML that is tolerable, take a look at esphome[1] (and I suppose also home assistant).

In your main yaml file you have something like this:

  packages:
    left_garage_door: !include
      file: garage-door.yaml
      vars:
        door_name: Left
        door_location: left
        open_switch_gpio: 25
        close_switch_gpio: 26
Then in garage-door.yaml you can reference the vars directly with ${door_name} syntax.

It's the best version of templating YAML that I have experienced.

1. https://esphome.io/guides/configuration-types#packages-as-te...


From my vantage point it seems to have happened roughly like this

First we had arbitrary code that did something. Then we thought, "hey wouldn't it be nice if we could do this declaratively in a standard way using configuration instead?" Then we could reason about it more easily. But then came the realization "this declarative system isn't quite powerful enough, what if we could sprinkle some logic on top of it". "Hey wouldn't it be nice if we could go back to doing it declaratively? I guess we can just add the missing features to our not-turing-complete config language". "Wow now it can almost do everything I want.... but there is just this tiny little thing I want to do in addition, let's do templating!" etc


YAML is fine for human-maintained configuration. Yeah it has its footguns (like Norway) but if you're actually writing human-maintained configuration then you quickly pick these up with practice and they turn into a non-issue.

If your configuration is complicated enough that it needs to be generated, then use a real general-purpose language to generate it. Not crummy pseudo-imperative constructs bolted onto the YAML.

At the very least, systems that take YAML as configuration should also take JSON as configuration. GitOps-style systems should allow you to define, not just system.yaml config, not just system.json config, but also system.js config, that is evaluated in some kind of heavily-restricted sandbox.


I would love for someone to eviscerate the following idea:

Every deployed process receives exactly 3 environment variables, NONCE, TAGS and CONFIG_DB_PARAMS. Every process is bootstrapped in the same way:

  1. Initialize config db client.
  2. config = db.fetchConfig(tags)
  3. Use NONCE to signal config changes if needed.
Of course there's some environments where this Just Won't Work. I'm wondering if there are some very serious issues with this approach in a "standard" web application environment. It seems so straightforward but I've literally never seen it done before, so I feel like I'm missing something.
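
To make it concrete, the bootstrap I'm imagining is roughly this Python sketch (the client class and its fetch method are hypothetical stand-ins, not a real library):

  import os

  # Hypothetical config-db client; any KV store queryable by tags would do.
  class ConfigClient:
      def __init__(self, params: str):
          self.params = params  # e.g. "host=cfg.internal port=5432"

      def fetch_config(self, tags: list[str]) -> dict:
          # Query the config db for entries matching the tags (stubbed out here).
          return {}

  nonce = os.environ["NONCE"]            # bumped to signal config changes
  tags = os.environ["TAGS"].split(",")   # e.g. "web,prod,eu-west-1"
  db = ConfigClient(os.environ["CONFIG_DB_PARAMS"])
  config = db.fetch_config(tags)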


It's fashionable to hate YAML, and sometimes rightly so. But what are the alternatives? JSON, XML, INI, TOML, Dhall, Cue, Jsonnet, HCL, your programming language of choice. Also, let's agree on the target use case: YAML is largely used for configuration and operational tasks. If I were to rank-order the features necessary in a good configuration language, they would be 1) readability 2) data/schema validation 3) stackability/composability 4) language support 5) editor support 6) industry adoption. So let's do a comparison:

YAML is fairly easy to read, has schema validation with the right library, and is pretty ubiquitous. It can get unwieldy like JSON though.

XML is big, ugly, and unreadable. No one likes XML, despite its robust schema-validation capabilities.

Your programming language of choice doesn't work because of the target use case unless you truly are a build-run group.

INI is too simplistic for many environments.

HCL is included because I'm a bit of a Terraform fanboy, and it has great features like validation, readability, and composability. However, you're not going to find it in the wild as a general-purpose configuration language - outside of Terraform it just hasn't taken hold.

Does anyone really use Dhall? (Serious question.)

JSON is nice because everyone understands JSON. JSON is not nice because all the brackets, braces, quotes, etc. get in the way and make sufficiently large configurations hard to read. With the right library you can get schema validation.
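
For example, a rough sketch of that with Python's third-party jsonschema package (the schema here is made up):

  import json
  from jsonschema import validate, ValidationError  # pip install jsonschema

  schema = {
      "type": "object",
      "properties": {"replicas": {"type": "integer", "minimum": 1}},
      "required": ["replicas"],
  }

  try:
      validate(instance=json.loads('{"replicas": 0}'), schema=schema)
  except ValidationError as e:
      print("invalid config:", e.message)  # "0 is less than the minimum of 1"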

Jsonnet suffers from the same problems that JSON does, but adds more operations, which makes sufficiently large things very hard to read.

TOML is nice and reminds me of INI in its simplicity.

Cue looks and smells like JSON, has schema validation, but is much more readable.

If I were to rank-order these options it would be 1) Cue 2) TOML 3) YAML 4) JSON 5) your programming language of choice 6) Jsonnet 7) INI 8) HCL 9) XML 10) Dhall (maybe?). My point here is that while YAML leaves a lot to be desired, it's still very useful for most implementations and is better than many of the alternatives.


I'd take an XML config over a YAML one any day. This isn't to say that XML is great, but its warts are well known, and they are generally not of the kind that makes it easy to shoot yourself in the foot. Mostly the problem is that it's verbose and, to some extent, redundant.

JSON is also fine, esp. if it is JSON5 (with comments, unquoted keys, and other such minor improvements). I find that braces, brackets, and quotes don't get in the way - if anything, they make the structure clearer.


You guys are missing the big reason.

Most devops folks are former sysadmins with no developer experience.

The tools needed to be able to be picked up quickly without too much pain.

This is why our devops tooling in popular usage is not as robust as developer tools.

Know your audience.


YAML, TOML, and JSON can be ingested to represent the same data structures internally; it's just a few lines of code to decide which load() function to use for a particular file. Why not support all three formats for configuration in your applications and just let users decide which one they want to use? Put a 'config.json' in '/etc/app/conf.d/' and you get the same data as with 'config.yml' or 'config.toml'. Then users can use whichever format they prefer for the input data.
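
Something like this, roughly (a Python sketch; tomllib needs 3.11+, PyYAML is third-party):

  import json, pathlib, tomllib
  import yaml  # pip install pyyaml

  # One parser per extension; all of them produce the same plain dicts/lists.
  LOADERS = {
      ".json": json.loads,
      ".yaml": yaml.safe_load,
      ".yml": yaml.safe_load,
      ".toml": tomllib.loads,
  }

  def load_config(path: str) -> dict:
      p = pathlib.Path(path)
      return LOADERS[p.suffix](p.read_text())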


I tried this; it significantly complicated documentation and support after release. Lots more logic handling conflicting cases in two otherwise-identical files, etc.


clarification: YAML and TOML can represent JSON, but the opposite isn't true.


Yelling At My Laptop


This article is not convincing me about yaml or json templating.

The only thing I can think of is that generating these files requires picking a language platform. I chose Ruby to generate the k8s manifests I need.

If you are picking something that is meant to be language-agnostic, or to have very little ramp-up time, then sure, templating. It just comes at a cost: the templating language itself approaches a full-blown, Turing-complete language as more features get added, often on shaky foundations (such as HCL).


Like many other things, Azure services can be deployed using JSON. Of course it's not just JSON; it's an entire language of deployment definitions and a templating language hidden within JSON markup.

But next to that, Microsoft came out with bicep, which is a domain specific language for defining resources. It comes with a full language server and is honestly quite nice to use (if only azure services had some sort of reasonable logic to them)

I think k8s and friends need their own Bicep.


If you can write Python, Perl, Ruby, etc. - hell, even yq in the shell - then you have a full programming language that can output YAML or JSON in any way you want. No weird DSL, no twisting yourself into knots.

Just write normal code, make any data structure, print it as any data format. Call the code, output to temp file, use file, delete file.

Is it clunky? Yes. But it works, and you can't get any simpler.
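
For example, a whole "generator" can be this small (a Python sketch; the tool and file names are arbitrary):

  import yaml  # pip install pyyaml

  config = {
      "service": {"name": "web", "port": 8080},
      "hosts": [f"node-{i}.example.com" for i in range(3)],
  }
  print(yaml.safe_dump(config, sort_keys=False))
  # python gen.py > /tmp/conf.yaml && some-tool -f /tmp/conf.yaml && rm /tmp/conf.yaml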

Keep it simple, silly.


I have no idea why people are so against code generation. If I want a hundred different kinds of YAML or code, I just fully generate them.


Why do we need a special purpose language for this? Just for the slightly nicer syntax? Is that worth learning a completely new language?


I'm a big fan of Python as configuration for my own projects, but:

- It requires discipline for devs to keep the conf declarative. Discipline is not automatically enforceable, so it's prone to failure.

- No guarantee of reproducibility.

- You need a Python VM (or a Starlark interpreter, if that's what you like). It's a big constraint.

- If you are a SaaS provider, accepting Python as input is really hard to secure.


a Dhall configuration file will never:

- throw an exception

- crash or segfault

- accept malformed input

- produce malformed output

- hang or time out

in https://docs.dhall-lang.org/discussions/Safety-guarantees.ht...

Still a fan of Python for configuration?


Yes, because engineering is about context.


Clearly lots of people have tried to replace YAML with something else and that hasn’t worked. What’s your wishlist on making YAML actually work for declarative systems? Or can it work?

Would things like..

a great LSP with semantic autocomplete, native cross-file imports, and conditionals based on the environment make things feel different/better?

What should modern declarative systems be doing in your opinion?


I think we're barking up the wrong tree here (speaking about Kubernetes workloads).

This ugly mess of low-ish level details will never go away. Developers that are trying to focus on developing apps will never enjoy these things and that's fine.

Something like score.dev which abstracts things even further seems to be the way to go as the interface that is exposed to developers.


Is Kubernetes using YAML 1.1 still? Because some of the complaints I hear shouldn’t be an issue with 1.2.

Moreover the YAML spec allows for specifying the tags recognized per application.

So on two counts, if “on”, “true”, “false”, as well as “yes”, “no”, “y”, “n”, “off”, and all capitalized and uppercase variants, are all boolean literals, it is not YAML’s fault.


Unfortunately, almost everyone is still using YAML 1.1

PyYAML still doesn't support 1.2
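
PyYAML's 1.1-style resolution is easy to demonstrate:

  import yaml  # pip install pyyaml; resolves YAML 1.1 booleans

  print(yaml.safe_load("country: no"))    # {'country': False}
  print(yaml.safe_load("country: 'no'"))  # {'country': 'no'}, quoting opts out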


You’re going to hate absolutely everything involved in configuring Kubernetes because you’re configuring Kubernetes.


Author of the post here.

It's always been relatively shocking to me that this is still relevant 4 years after I wrote it. Helm is as ubiquitous as ever, despite attempts to replace it with Jsonnet, Cue, and programming languages.

I've personally moved on from Jsonnet and would recommend Pulumi to anyone experiencing this problem.


Yeah I knew that answer would be Helm Charts.

I love Helm. Helm is magic. Helm makes my life tolerable.

But I've templated way too many YAML files.

If it were up to me, XML would be the default and we would use XSLT for templates.

I'm pretty sure you can use XSLT on JSON, which is logically identical to YAML. So something could be worked out.

Maybe that should be my next open source project.


Two things need to die and die quickly:

1) "Blind configuration", such as a CI pipeline config that you commit but can't validate before you do.

2) Complex markup for config. Anything that can’t fit into a single screen of flat .toml or .ini style config should be code.


I think Steve Yegge got it right when he wrote:

I know, I know — everyone raves about the power of separating your code and your data . . . But it's [not] what you really want, or all the creepy half-languages wouldn't all evolve towards being Turing-complete, would they?

https://sites.google.com/site/steveyegge2/the-emacs-problem

Templating YAML is the same, but... Honestly, Jsonnet is too. If you're going to generate JSON — by god use a normal programming language. You have to teach your team one fewer thing (approximately no one knows Jsonnet); it already integrates with your existing build system; if you wrote a useful util function in your main codebase you can reuse it; if you have a typechecker or a linter you can use it; etc etc.
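
For instance, a minimal Python sketch (the helper is invented for illustration, not any particular library's API):

  import json

  def deployment(name: str, image: str, replicas: int = 1) -> dict:
      # A plain, reusable, type-annotated function your linter already understands.
      return {
          "apiVersion": "apps/v1",
          "kind": "Deployment",
          "metadata": {"name": name},
          "spec": {
              "replicas": replicas,
              "template": {"spec": {"containers": [{"name": name, "image": image}]}},
          },
      }

  print(json.dumps(deployment("web", "nginx:1.25", replicas=3), indent=2))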


Does it mean that Lisp's 'code is data' is the right idea in the end?


IMO yes, although sadly Lisp is a pretty unwieldy language to do it in — it truly needs a little more syntax sugar for readability's sake. (Also a decent type checker would help.)


If you need anything more complicated than simple $var substitution, it's time to use a general-purpose scripting language with appropriate libraries to generate your data structure. A half-baked template DSL will never work.


In my case, I'm templating YAML because the Obsidian Templater plugin can read YAML frontmatter, prompt you for values, and then fill in a Markdown file with the Mad Libs you choose to populate it with.

I understand this is a niche use case.


Perhaps we need something along the lines of an infrastructure description language. Some of these yamls get pretty long. Using a real language (like Python) is probably not constrained enough however.


A lot of people don't know that JSON is valid YAML. You can mix JSON into your YAML wherever you need it.

So if you have issues expressing something complex in YAML (like nested arrays) just write JSON instead.
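
A quick check with PyYAML (a minimal sketch):

  import yaml  # pip install pyyaml

  doc = 'name: demo\nmatrix: [[1, 2], [3, 4]]\nlabels: {"app": "web"}\n'
  print(yaml.safe_load(doc))
  # {'name': 'demo', 'matrix': [[1, 2], [3, 4]], 'labels': {'app': 'web'}}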


If you start templating yaml then you might want to give jsonnet a try


I keep coming back to tanka (https://tanka.dev/) and hoping it had more traction in the industry.


I'd like to quote a snarky sig I once read on Slashdot:

> You know what's good about YAML? No, me neither.

Personally, I love using a Perl hash for my config files. But maybe I'm just mad.


Yeass. I encountered jsonnet thanks to Ory Kratos, it’s great. Yaml is an awful hack that’s only still around in devops because of chance and circumstance. I hate IT.


It’s funny that the post ends up recommending Jsonnet, which is a very close relative of the Google-internal “GCL,” which everyone at Google hates

Pick your poison, I guess.


The Sidekiq configuration file is a great example of when you'd want to template YAML.

In that instance the queue name is derived dynamically from the hostname.


Give it a couple of months: Why the f*ck are we templating JSON

This article feels very ironic to me; hence the sarcasm.

Give me code as configuration and I'll be interested



I guess templating your configuration is just an instance of the old wisdom that any problem is solvable by adding a layer of indirection.


YAMLScript now exists: https://yamlscript.org/


Config-as-code is this strange nightmare made for non-programmers, to trick them into writing extractable formal logic.


For me, environment variables are super simple and remove all the complexity of these configuration files.


The article is mostly talking about things like Helm charts for kubernetes, which aren't possible to define as env vars.


Except when you need anything more complex than a string or an array of strings, when they become entirely useless.

There is not a single even slightly complex piece of software that uses exclusively env vars for configuration. Even bash and vim have config files; this is not some new idea.


Hang on there, array of strings? Environment variables can't handle that either without quirks.


Oops, you're right, even that is too advanced...


How do you make sure all the right variants of all the environment variables are in the right place(s)?


You can't really check them into source control though, right?


I swear this is how we got Docker containers... some Ruby dev who abused env vars, and an SA who was sick of his shit breaking on every rollout and hearing "but it works for me"...

And now installable software is a fucking unicorn!

(This week I keep running into Go apps that can be installed from source or as a straight download, with Docker as well. It's been a breath of fresh air.)


There are people out there who, for various reasons, like to make things more complex than they have to be.


Helm becoming a de facto tool is up there with null among the biggest mistakes our industry has committed.


The fundamental mistake being overreliance on huge piles of idiosyncratic middleware to accomplish simple tasks (like config parsing).


I feel like I wrote something on Usenet with a similar title about imake in the early '90s...


I guess I'll ask the dumb question - why not use JavaScript directly instead of Jsonnet?


Integrating a modern JS engine into your program is complicated and probably pretty huge.


A modern JS engine suitable for config generation and much more is one C file: https://duktape.org/

There is no reason you need an optimizing JIT compiler to parse a configuration file.


I'm reminded of DHH's article, "Rails is Omakase" [0] back during the time when "convention over configuration" [1] was a common refrain, meant to avoid the proliferation of (YAML or XML) configuration files by assuming sensible defaults and pre-selecting various parts of the solution stack or architecture, instead of letting it be freely specified by the developer.

You lose a few degrees of freedom and flexibility in your implementation this way, but at the same time you also don't need to wade through pages and pages of configuration documents.

Everything is cyclical. I'm waiting for the next "omakase" offering that provides a sane low-configuration platform for building "cloud native" apps. Right now it looks like we're in an analogue of the XML hell that prompted the design philosophy of Rails and "convention over configuration."

[0] https://dhh.dk/2012/rails-is-omakase.html

[1] https://wiki.c2.com/?ConventionOverConfiguration


Having XSLT flashbacks. Send help.


any1 want to invent jsonc with me? It will be json but with comments


YAML templating is horrible, but Jsonnet also hurts my feelings :/


JSON not including comments has caused so much labor for the world.


Honestly I find all these different config languages either too much to learn or they get too unwieldy quickly.

I have arrived at using TypeScript to generate JSON as the ultimate solution.

Easy JSON support, optional typing, you already know it, and adding reusable functions and libraries is understandable. Just prevent external node modules, and have a tool that takes a TypeScript file with a default export of some JSON and renders the JSON to a string on stdout.


I forgo Kubernetes because of my hate for YAML.


Templated yaml is readable if you do it right.


I, too, loved hiera and puppet.


Relevant: https://noyaml.com/

YAML and its ecosystem are full of footguns and ergonomics problems, especially when the length of the document extends beyond the height of a user's editor or viewport. Loss of context with indentation, non-compliant or unsafe parsers, and strange boolean handling, to name a few.

It becomes even worse when people decide that static YAML data files should have variable substitution or control flow via templating. "Stringly-typed programming", if you will. If we all started writing JSON text templates, I think a lot of people would rightly argue we should write small stdlib-only programs in Python, Typescript, or Ruby to emit this JSON instead of using templated text files. Then it becomes apparent that the YAML template isn't a static data file at all, but part of a program which emits YAML as output. We're already exposing people to basic programming if we're using YAML templates. People brew a special kind of YAML-templated devops hell using tools like Kustomize and Helm, each of which is "just YAML" but full of idiosyncrasies and tool-specific behaviour which make the use of YAML almost coincidental rather than a necessity.

Yes, sometimes people would prefer to look at YAML instead of JSON, in which case I suggest you use a YAML serialization library, or pipe output into a tool like `yq` so you can view the pretty output. In a pinch you could even output JSON and then feed it through a YAML formatter.
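
That last pipeline is only a couple of lines; a sketch in Python:

  import json, sys
  import yaml  # pip install pyyaml

  # Read JSON on stdin, print YAML for human eyes: my-tool --json | python json2yaml.py
  print(yaml.safe_dump(json.load(sys.stdin), sort_keys=False))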

The Kubernetes community seems to have this penetrating "oh, it's just YAML" philosophy which means we get mediocre DSLs in "just YAML" which actually encode a lot of nuanced and unintuitive behaviour which varies from tool to tool.

Look at kyverno, for example: it uses _parentheses_ in YAML key names to change the semantics of security policies! https://kyverno.io/docs/writing-policies/validate/ . This is different to (what I think are the much better ideas of) something like kubewarden, gatekeeper, or jspolicy, which allow engineers to write their policies in anything that compiles to WASM, OPA, and Typescript/Javascript respectively.

We engineers, as a discipline, have decades of know-how building and using general purpose programming languages with type checkers, linters, packaging systems, and other tools, but we throw them all away as soon as YAML comes along. It's time to put the stringified YAML templates away and engage in the ecosystem of mature tools we already know to perform one simple task they are already good at: dumping JSON on stdout.

Let's move the control flow back into the tool and out of the YAML.


now I play a bit with nix and its config language.

i like yaml now.


The real question here is: why the fuck are you not using a restricted Lisp dialect, generating S-expressions with macros?


To me YAML seems like the CoffeeScript of JSON, and unlike CoffeeScript I don’t understand why people are still using it.

I guess XML and JSON are too verbose. But YAML is so far in the opposite direction, we get the same surprise conversions we’ve had in Excel (https://ruudvanasseldonk.com/2023/01/11/the-yaml-document-fr...). Why is “on” a boolean literal (of course so are “true”, “false”, as well as “yes”, “no”, “y”, “n”, “off”, and all capitalized and uppercase variants)? And people are actually using this in production software?

Then when you add templating it’s no longer readable and concise anyways. So, why? In JSON, you can add templating super easily by turning it into regular JavaScript: use global variables, functions and the like. I don’t understand how anyone could prefer YAML with an ugly templating DSL over that.

And if you really care about conciseness, there’s TOML. Are there any advantages of YAML over TOML?


Dunno, to me YAML is the python of markup languages.

YAML is decent at handling things like nesting and arrays, while TOML sucks at it.

I don't dislike YAML that much.

That being said, we knew since the dawn of C macros that templating languages which are not aware of syntax, are AWFUL.

Likewise, writing Helm charts (where I first encountered YAML templating) is just horrible, but it would be so much nicer if templates respected the YAML syntax tree and expanded at the right subnode, instead of being text-replace botch-jobs.


The worst thing with Helm charts is not the YAML, or even the text-replace botch-jobs, but that they seem to think a Go stack trace is reasonable error reporting. I don't think I've ever worked with a tool with such awfully useless error messages.

But I agree, it'd be better if the template expansion were actually structural and not just textual. The huge amount of "| indent 8" etc. in Helm charts is such a stench that by about the second time people encountered it, they ought to have made a better template expansion mechanism top priority.


You have an error on line one. Good luck


Ah, this had me laughing.


Unlikely it will ever get better. First to market with a prototype tool gains market share and momentum. Eventually the enthusiasm fades and people start hating it, for good and sometimes bad reasons. Yet users are stuck because change is expensive and risky. The team is stuck because any change risks becoming the straw that breaks the camel's back, possibly cascading through the user population. Story of our young industry.


I think you're partly right, in that I don't think they will make any backwards-incompatible changes. But they could still make things a lot better in two simple ways:

* Fix error reporting. Nobody is doing anything that relies on the current error reporting anyway because it's near useless.

* Add a slight templating change that means "after this parses as valid YAML, expand this bit, and check that the expansion is itself valid YAML before merging it in", with options to either replace the node or merge in adjacent to it (the latter to insert into lists etc.). You can do that without backwards-incompatible changes by making a syntax change that still uses the Go {{ ... }} blocks, but starts with a directive they can simply expand to a new template-processing directive in the first pass. Then just add a second pass that operates on a parse tree. (I've just written a template expansion mechanism that works on JSON/YAML parse trees, in fact; if we didn't need Helm charts primarily for distribution to partners, whom I don't want to force onto a custom deployment tool, I'd be tempted to replace our Helm charts with it.)

Better error reporting, plus being able to avoid the incessant "| indent .." blocks and ensuring the output either is valid YAML or "contains" the error report to the generated sub-block, would make it so much easier to use.
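
To sketch the structural idea (a toy illustration with an invented $var marker, not Helm's actual mechanism):

  import yaml  # pip install pyyaml

  def expand(node, variables):
      # Walk the parsed tree, replacing {'$var': name} nodes structurally.
      if isinstance(node, dict):
          if set(node) == {"$var"}:
              return variables[node["$var"]]
          return {k: expand(v, variables) for k, v in node.items()}
      if isinstance(node, list):
          return [expand(v, variables) for v in node]
      return node

  tree = yaml.safe_load("spec: {replicas: {$var: count}}")
  print(yaml.safe_dump(expand(tree, {"count": 3})))  # no "| indent 8" anywhere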


The biggest issue I have with Yaml is that they forbid tabs.

Their argument is that tabs are shown differently in every editor, which is actually something I like. When you're looking for something deeply nested you can reduce the tab width a bit; when that's not needed you can increase it to improve the visibility of nesting levels.

And forbidding it makes a one-keystroke action a two or four one.

I really don't understand the Python/YAML hate for tabs, and as a result I don't really use either.


Tabs aren't a problem.

Spaces aren't a problem.

What is a problem is not picking one or the other. There's arguments for both sides but it is critical to just take a side. I'm sorry your side lost but it makes everything better to just go along with the consensus.


At one of my internships in the 90's, a developer I worked with solved the problem by never indenting. Every single line of code started at column 1.


It sounds like this is the origin story of Python. Whoever worked with this person made it their life’s mission to enforce proper indentation.


Sounds like you were an indent-ured labourer.


Why even use more than one line?


Why not just make everything whitespace? Give both tabs and spaces their rightful place! https://en.wikipedia.org/wiki/Whitespace_%28programming_lang...


True, that's how real coders work!

10 IF A=1 OR Z=2 GOTO 30

20 GOTO 50

30 PRINT "HELLO WORLD"

40 GOTO 10

50 GOTO 30


In my PhD in the '90s, this is how people were forced to choose with FORTRAN. I chose to code in C and it was painful as well, but less so.

I would have given a lot to have the environment people have today to do science.


Why did you leave an empty column at the start? :]


I thought the consensus was tabs for block indentation, spaces for alignment.


No, that's what the tabs hold-outs have morphed into. Which illustrates the problem with tabs: It's very difficult to get everyone on a team to care about tabs or not care about alignment.


i don't take sides, I use tabs AND spaces


"Tabs for indentation, spaces for alignment" is something I wish had caught on.


No, significant whitespace is the problem.


So - you're saying that mixing tabs and spaces in the same file is entirely unproblematic outside of languages with significant whitespace?

Are you sure about that?


Mixing tabs and spaces just makes code look bad visually. But when whitespace is significant, it can change behaviour.



This e-mail enters my "favorite quotes from the internet" list straight at the top.


It is a funny quip, but I wish they'd consider the reformatting. I find using an autoformatter reduces cognitive load while reading and writing.


Yeah, OP is not wrong. I also like neatly formatted code; it's way easier to read.

I always reformat all my code before all commits. It's just good hygiene.

The funny part is the fussing and the answer they get.

I'd just autoformat the area of my patch and send the patch in that way, maybe plus some autoformatted blocks here and there, slowly fixing the stuff as I go.

If something is too bothersome, first try doing something, and figure out the rest of the process as you go.

Edit: blocks became blogs without my knowledge. Maybe I should write a blog post about it. Don't know.


Us old folks remember the days when reformatting was a computationally expensive action that required a special program to “pretty print” the code. And heaven forbid your code used some language feature your pretty printer didn’t understand and mangled the output making your code uncompilable.


Well, I'm not that young of a folk. I was playing with computers (programming, in fact) in the early '90s, and I remember when it was expensive.

However, Eclipse has been formatting C++ code with a simple hotkey, without breaking it and while understanding the language, for the last 15 years as far as I can remember. It's instant, too.

Because of that, I feel a bit surprised when younger people look at it like it's black magic. It's neither new nor unsolved in my conscious experience.


Reformat-on-change is also a valid strategy!

I think I've even seen this employed on C++ codebases with clang-format. Conceptually, it's like `git diff | clang-format`, but there are more flags and scripts involved: https://clang.llvm.org/docs/ClangFormat.html#script-for-patc...


There's deep wisdom there.


> And forbidding it makes a one-keystroke action a two or four one.

You can’t be serious


I prefer to keep my JSON on one line without whitespace; saves on disk space.


I prefer to write to my disk manually with a magnetized needle in a clean room.


Clean room's for tryhards. Dust adds flavor.


JSON formatting is less important because most apps that deal with it come with good “beautify”, “sort”, “remove all formatting white space” functions in the editor


Ouch. The only problem with the obvious sarcastic tone of that comment is that there are plenty of people that do say exactly the same thing and mean it.


I recommend you use smaller fonts as well.


For code I'd agree. However for configuration files, I find that I often need to edit them in places or environments where I don't have anything but the most bare-bones editor.


A quick search shows that even nano can be configured to use whatever number of spaces you want when you hit tab:

https://askubuntu.com/questions/40732/how-do-i-get-spaces-in...


I consider Nano a fully fledged editor. I'm talking Notepad or an HTML text box.


Oh. Windows with no ability to install anything didn't even occur to me! I'm truly sorry.


When this happens, I copy four spaces and then use Ctrl+V for Tab.

Yes, it’s not exactly the same due to alignment, and yes you have to repeat it after using the clipboard for other purposes, but it’s good enough for that occasional use.


Not everyone wants a bloated and buggy IDE to write their code for them.


Like vim, for example? Which supports replacing tab inputs with spaces...


I code practically exclusively with vim. The replacement is buggy and has many corner cases that come up constantly. As in all editors.

Tab indentation has no bugs or corner cases.


When looking at the code, tab-containing files are the most inconsistent ones, especially when viewed via general tools (less, diff, even web viewers).

Sure, if people would only ever use tabs for indentation and spaces for alignment, things could be good. But this almost never happens, instead:

... some lines start with spaces, some with tabs. This looks fine in someone's IDE but the moment you use "diff" or "grep" which adds a prefix, things break and lines become jagged.

... one contributor uses tabs mid-line while others use spaces. It may look fine in their editor with 6-character tabs, but all the tables are misaligned when viewed in an app with a different tab size.

Given how many corner cases tabs have, I always try to avoid them. Spaces have no corner cases whatsoever and always look nice, no matter what you use to look at the code.

(the only exceptions are formatters which enforce size-8 tabs consistently everywhere. But I have not seen those outside of golang)


> Sure, if people would only ever use tabs for indentation and spaces for alignment, things could be good. But this almost never happens, instead: ... some lines start with spaces, some with tabs.

People using tabs for alignment can happen when you've got a tab-camp person who hasn't yet realized how terrible they are for alignment.

But "some lines start with spaces, some with tabs" happens for precisely two reasons:

* you have a codebase with contributors from both camps

* people thought in-editor tooling was the solution (now you have two problems)

> Spaces have no corner cases whatsoever

This is tooling- and (as you realized) tab-stop-preference dependent.


And I've been using vim exclusively, with tab replacement, for north of fifteen years, and I've never had a problem with the editor getting confused about what happens with spaces when I hit Tab.

Some detail about the corner cases you've run into would be great, if they're happening constantly I can see how it would be a bugbear.


For example, with vim's (Debian) defaults, if you happen to have 2-space-indented Python (the first two spaces are for HN formatting; the first if should start at zero indent):

  if True:
    # Two space indent
And if you continue by adding another if block inside it, the autoindent will give you four spaces:

  if True:
    # Two space indent
    if True:
        # Four space autoindent
And if you make a new line after the last row there and hit a backspace, it'll erase one space instead of four, giving an indentation of 3 (+2) spaces. And if you start a new line after that, you'll get an indentation of 8 spaces in total. Ending up with:

  if True:
    # Two space indent
    if True:
        # Four space autoindent
      # Hitting backspace gives this
          # Hitting a tab gives this
This is just one case, but things like this tend to happen quite often when editing code. Even if it was originally PEP-8 indented. Usually it's not what Tab does, but what Backspace or Autoindent does. I'm not exactly sure what exact Tab/Backspace/Autoindent rules underlie the behavior, but I can imagine there being quite a bit of hackery needed to support soft tabs.

For me this kind of Tab/Autoindent/Backspace confusion is frequent enough that I'd be very surprised if others don't find themselves having to manually fix the number of spaces every now and then. And when watching over someone's shoulder I see others, too, occasionally having to micromanage space indents (or accidentally ending up with three-space-indented blocks, etc.), also with editors other than vim.


This is because ftplugin/python.vim does:

  if !exists("g:python_recommended_style") || g:python_recommended_style != 0
    " As suggested by PEP8.
    setlocal expandtab tabstop=4 softtabstop=4 shiftwidth=4
  endif
So if you use "set sw=2" then it leaves tabstop and softtabstop at 4.

You can set that g:python_recommended_style to disable it.

Also sw=0 uses the tabstop value, and softtabstop=-1 uses the shiftwidth value.

I agree Vim's behaviour there is a bit annoying and confusing, but it doesn't really have anything to do with tabs vs. spaces. I strongly prefer tabs myself as well by the way.

Even when you DO use tabs Vim will use spaces if sw/ts/sts differ by the way. Try sw=2 and using >>, or sts=2 with noexpandtab.


As with most things in vim, it is definitely manageable in settings such as tw=2 (tab width) and sts=2 (soft tab stop). This is why a lot of older Python files, in particular, are littered with vim modelines with settings like these.

The nice modern twist is .editorconfig files and the plugins that support them including for vim. You can use those to set such standard language-specific config concerns in a general way for an entire "workspace" for every editor that supports or has a plugin that supports .editorconfig.


Well, yes. But that's one more small thing to configure and manage. Not a big deal in isolation, but such small things add up to significant jank.

With tabs we wouldn't have yet another papercut to toil over.


Of course you can override it, but is there any excuse for that default behavior? It sounds ridiculous.


The defaults are either 4-space or 8-space soft tab stops. 8 spaces is the oldest soft-tab behavior; 4-space soft tabs have been common for C code, among other languages, for nearly as many decades. It is only relatively recently that Python, JS, and several Lisp-family derivatives have made 2-space tab stops a much more common style choice. Unfortunately there is no "perfect" default, as these are aesthetic preferences as much as anything else.

(It is one of the arguments for using hard tabs instead of soft ones in the eternal tabs versus spaces debates because editors can show hard tabs as different space equivalents as a user "style choice" without affecting the underlying text format.)


Soft tabs at 4 would be fine, though worse than autodetect. But that is not the behavior described in the above post.


The behavior described above seems to me to be exactly soft tabs at 4 in a 2-space tab document with autoindent turned on (often the default).

Vim has no autodetect by default. (I'm sure there's a plugin somewhere.)


The part where the user is on a line indented by 2, hits return, and gets a line indented by 2+4=6 doesn't sound like soft tabs at 4 to me. And I wouldn't expect hitting backspace to then only remove 1 space (if it actually removed 2 that would make more sense, but is inconsistent with what it just added). At that point, hitting return and getting a line indented by 8 might make sense, but is weird.

Another comment suggests it's using 2 and 4 for different settings and that's causing problems.


2 is the base indent of the line where the : was added. Autoindent adds 4 spaces for the current tab stop. Autoindent isn't counting indent levels; it's taking "spaces in previous line + tab stop".

Backspace doesn't unindent in vim by default; it removes spaces one at a time. That's a difference between ts=4 (tab stop) and sts=4 (soft tab stop): sts also applies to backspace. But the default is that it doesn't, because the out-of-the-box default believes that backspace operates on physical characters (spaces), not soft/fake ones (tabs expanded to spaces).

I don't know if that is the right default, and it is definitely a baroque exercise to get all the settings right for some languages, but there is a consistency to the defaults even if those defaults don't meet some modern expectations from newer code editors.

(Also, I just realized above I confused tw [text width] and ts [tab stop]; my vim skills are rusting a little.)


That doesn't explain how it then goes to 8 total spaces.


Almost every text editor has support for tabs-as-spaces.

I haven't used an IDE in years.


I don't want that though. Because then when editing I still have to mess around with spaces.

And the doubled-up nature of the spaces makes it hard to see when you have an odd number of spaces at deep indentation levels, which Python counts as the lesser number of two-space indents.

IMO it would be ideal if a tab were displayed as a block whose width you could resize on the fly <3


Is there any editor in 2024 that can't replace tabs with spaces? Is it just Notepad?


> And forbidding it makes a one-keystroke action a two or four one.

The majority of editors can be configured to use tab to insert the appropriate number of spaces. Many will automatically detect the correct configuration.


The majority isn't all, and in my experience you always end up having to use one in some random situation that doesn't have that. tap tap tap tap

Literally 100% of editors support tabs.


Which editor have you run into that doesn't? Even nano supports configuring it with both nanorc or a command line flag.


HTML textareas don’t support entering tabs.

There are rich-text editors that increase the margin on Tab rather than inserting a tab.


The horror


The attempt on my life has left me scarred and deformed.


Your problem, and I mean this sincerely and respectfully, is that you're not using your text editor / IDE correctly. Adding two or four spaces of indentation is done by pressing TAB! Once. Most editors know how to do this out of the box, but if yours doesn't, you need to change it.


You still have to mess around with a bunch of spaces when you're editing or copy/pasting, and not having exact even numbers makes for ambiguous situations.

Especially if something is 5 levels deep, it's really hard to see if you have 12 or 11 spaces (so 5 levels + 1 space or 6 levels) indentation.


This is not true in a correctly configured editor.


How so?


My editor, I press tab once and it inserts the correct number of spaces (on a new line it also starts at the previous indentation level as appropriate). I press backspace once and it deletes the correct number of spaces.

Any editor used for programming needs to be capable of this.


I agree with you about YAML's treatment of tabs. I still use YAML because there's often no other choice.

Python is actually flexible in its acceptance of both spaces and tabs for indentation.

Maybe you were thinking of Nim or Zig? Nim apparently supports an unsightly "magic" line for this (`#? replace(sub = "\t", by = " ")`), and Zig now appears to tolerate tabs as long as you don't use `zig fmt`. I haven't used either yet because of the prejudice against tabs, but Zig is starting to look more palatable.


> I agree with you about YAML's treatment of tabs. I still use YAML because there's often no other choice.

True, I'm using it too when I have no other choice.

> Python is actually flexible in its acceptance of both spaces and tabs for indentation.

True, but it does give constant warnings then, which is annoying. And I was worried about it dropping support in the future, so I didn't want to waste time learning it.


I'm pretty sure most of the "spaces" people have their editor set up to convert the 'tab' key into multiple spaces.

Now excuse me while I duck under this table.


You're safe because you're right.


> forbidding it makes a one-keystroke action a two or four one.

Not if your editor can be configured to interpret a Tab keypress as the appropriate number of spaces. AFAIK all common text editors, at least in the Unix world, do this.


I don't want that though, because then I still have to mess around with spaces when editing.

I actually like tabs for indenting levels especially because I can configure how far they indent on the fly.


> I still have to mess around with spaces when editing.

Not if your editor automatically indents and dedents with spaces. I find that to work just fine when editing Python code, for example. Tab is interpreted as "indent" and Backspace if you're at an indent stop is interpreted as "dedent".


My editor can also change the width of blocks of spaces on the fly, as well as navigate them in arbitrary chunks.


> And forbidding it makes a one-keystroke action a two or four one.

This isn't a tabs or spaces issue. This is a "your editor is bad or configured wrong" issue.


Every editor I know can enter n spaces when pressing tab. That might solve your concern.


It does not. You still have to mess around with a bunch of spaces when you're editing or copy/pasting, and not having exact even numbers makes for ambiguous situations.


I use Tab and Shift-Tab in IntelliJ, and in vim's insert mode. Outside of insert mode I use "<<" and ">>". I'm in vim mode in IntelliJ, too. What editor are you using?


Like with Python, any competent text editor will take care of this for you, I've never encountered this issue before.


And then the second layer of hell, a DSL inside the YAML for something like an Azure DevOps pipeline. Truly awful.


My personal favorite was when my company switched to configuring Jenkins in YAML, with some of the config being in YAML proper and other config being in Groovy embedded inside of multiline strings. Since it's Jenkins, the Groovy itself embeds multiline strings for scripts that need to run, so the languages end up nested three levels deep!

The only thing that saves me is IntelliJ's inject-language-in-string feature.


TOML has the inline table syntax with curlies, like JSON, and inline array syntax with brackets, also like JSON. It could support nesting pretty well.

Sadly, it doesn't support line breaks in the inline table syntax, so using inline tables for nesting is a PITA; inline tables are pretty much unusable for anything which doesn't fit within 80-100 characters. Inline arrays can contain newlines, however, so deeply nested arrays work well.
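
You can see the constraint directly with Python's stdlib tomllib (3.11+), which implements TOML 1.0:

  import tomllib  # Python 3.11+

  print(tomllib.loads("point = { x = 1, y = 2 }"))  # {'point': {'x': 1, 'y': 2}}
  try:
      tomllib.loads("point = { x = 1,\n y = 2 }")   # newline inside an inline table
  except tomllib.TOMLDecodeError as e:
      print("rejected:", e)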

Newlines in inline tables will be coming in TOML 1.1, which will make TOML much better for deeply nested structures. Unfortunately, there will probably be many years until 1.1 is both actually released and well supported across the ecosystem.

And of course, inline tables can't be at the top level of the document, so TOML might still not be the best way to represent a single deeply nested structure.


Yeah, that's why I prefer ytt over helm syntax. It isn't great syntax, but at least it is aware of what it is doing.

Having said that, yaml has some pretty obvious mistakes. It should have been a lot more prescriptive about data types. Not doing that creates a lot of unneeded confusion and weird bugs.


> I don’t understand why people are still using it

It's a good comparator; there are indeed a lot of similarities. But I never understood why anyone ever used CoffeeScript, whereas I do think I have a solid understanding of why people use YAML.

It's more like Python than Coffeescript really: it's not just about simplicity & brevity, it's about terminators.

Whitespace-dependent languages are often a pain to format / parse / read in many ways - Python has survived this by the skin of its teeth by being extremely strict about indentation, both in terms of the parser & also community convention. YAML hasn't had this - it remains a mess.

However, both have that very attractive property of not requiring terminators, which can't really be overstated.

> if you really care about conciseness, there’s TOML. Are there any serious advantages of YAML over TOML?

TOML's got some good properties, but its handling of structures with depth > 1 is far from concise, and pretty terrible if I'm honest.


> I never understood why anyone ever used Coffeescript whereas I do think I have a solid understanding of why people use YAML.

When Coffeescript was invented, it was an advancement on top of the awful Javascript standards at the time. It never went anywhere because Javascript caught up, but Coffeescript had a good reason for existing.

Today, Coffeescript is a remnant of old frontends that nobody has bothered transpiling into Javascript yet, but back in the day it was a promising new development.


> it was an advancement

That was certainly the selling point. I never saw any advancements in it - the features were aesthetic syntactic sugar.


Fat arrow functions were adopted from CoffeeScript.

IIRC it also had some different scoping rules so you didn’t need to sprinkle `bind` all over.


CoffeeScript came with spreads and destructuring, and added string interpolation, just to name a few things. It also added classes and inheritance, and the ?. operator.

I suppose you could argue those are just syntactic sugar because they compiled down to ES5, in the same way you can argue that any programming language is syntactic sugar over raw machine code.

I may disagree (_heavily_) with the Pythonesque syntax Coffeescript chose, but it took a while for ES6 to be widely available, and Coffeescript made ES6 features work on most browsers without any additional effort. It's easy to take today's Javascript for granted, but the web was very different back in 2009.


In addition to this: Ruby-like classes and "sane"/expected handling of this using fat arrow functions. I worked with a few developers at the time who considered themselves pure backend/Rails developers and didn't (bother to) grok the details of how this worked in JS.

I distinctly remember lots of var that = this; in JS code back then, which wasn't required anymore when using CoffeeScript.


Class sanity was the major reason I chose it for a project in the early 2010s. I was interacting with the classes in OpenLayers and being able to do so without all those footguns was very welcome.


JavaScript was never designed to be used like a classic OOP language; that's why jQuery won: it was functional, which meant it didn't fight you the way the other libraries did.

JavaScript is first and foremost functional, no matter how hard MS and others have tried to hammer it into a more typical OOP language.


I'm not sure what you mean. You can put functions into objects, you have "this" when you call the functions, you even have prototypes. It seems to me like the language is designed to let you do OOP just fine, and the only thing that was awkward was organizing the code where you define all those functions and the constructor. So they added a sugar keyword for it.


right, it's awkward, so don't do that, be functional instead.

jquery vs mootools/scriptaculous/etc.

jquery won for a reason, it's just flat out a better experience in terms of code specifically because it uses a functional approach in its api rather than an OOP approach.


> right, it's awkward, so don't do that, be functional instead.

I feel like you're just saying that because you like functional code. I'm sure that when they've added syntax to make certain functional things easier to type, you didn't respond "it's awkward, so don't do that, write it in an entirely different way instead".

Regardless of what is "better", which tends to differ based on situation, there was no need for the awkwardness. Classes weren't bad to use, it was just that declaring them had some pointless busywork.


implicit in my responses is an assumption that you've worked extensively with jquery so you understand the syntax and how it's functional.

If you don't the only response I can have is to go learn it.

This isn't about functional being better, it's about functional being more fluid to use in javascript.


I've used it a moderate amount. But I'm not here to argue about how fluid functional code is, I'm here say that OOP works fine, and making slight changes to improve that experience is good. We don't need to actively discourage OOP by making it awkward.

Especially when you're not dealing with the DOM, sometimes objects work quite well.

The original awkwardness does not show that javascript "was never designed to be used like a classic OOP language".

Nor is it why jquery worked well.

And adding these slight changes is not trying to "hammer" javascript into being "more typical OOP".


> And adding these slight changes is not trying to "hammer" javascript into being "more typical OOP".

this is disingenuous and I'm ending it here. MS spent years trying, and initially failing, to get javascript to work in a more traditionally OOP way.

describing that as slight is something else.


I started off with "I'm not sure what you mean" and you have given zero examples.

I'm not being disingenuous. Other people can't read your mind. Other people aren't experts on the same things you are.

And specifically, I described the class keyword as slight.

Don't be an asshole by accusing people of things they're not doing.


do not speak with authority if you do not know.

do not try and downplay something just because you want it to be true.


"I'm not sure what you mean." isn't speaking with authority.

And I stand by saying there was no need for the awkwardness with classes.

What did I downplay? You still haven't said what they did. I can't be downplaying a thing that isn't part of the conversation.


saying the words "I'm not sure what you mean" doesn't give you a pass to speak with authority about the effort involved in getting the class keyword into javascript when you're ignorant of the history.

----

edit: But also, let me point something out.

what you're calling "awkwardness with classes" is incorrect. they were _functions_ that you could attach state to, some of that state could, itself, be callable functions. That's a large part of _why_ javascript has prototype inheritance.

javascript was primarily functional with some features that allowed a bit of OOP sprinkled in.


I'm not interested in the effort to get that particular change in, I'm asking for you to elaborate in this broad effort you're implying beyond that. If I misread you, and you're not implying something broader and that's the only change they fought for, then yes it is quite small.

To be extra direct there: I didn't say the effort was small, I said that change was small. You can have a big effort for a small change. So you definitely misread me there.

But when you talked about "hammering" it into a more OOP language, I thought you were talking about big changes or many changes.

> what you're calling "awkwardness with classes" is incorrect. they were _functions_ that you could attach state to, some of that state could, itself, be callable functions. That's a large part of _why_ javascript has prototype inheritance.

Does it matter if the "class" itself is a function or an object or something else entirely? It makes thingies that have the prototype applied and you can do .foo on.

But classes you make with the keyword are still functions, aren't they? So what's the big betrayal?


stop trying to weasel-word your way to being right, people fought MS and largely ignored them for years. There was a time when you didn't use the class keyword because it was non-portable because MS wasn't collaborating with anyone.

But more importantly, this all started because I pointed out that javascript is a functional language.

This remains true, which is why writing functional code in javascript ends up with a better experience, and that's a large part of why jquery won.

Brendan Eich, the creator of JavaScript, was heavily influenced by Scheme. Scheme is functional, so I'm not saying anything outlandish here.

https://softwareengineering.stackexchange.com/questions/1941...

> I've never used Self myself, but I believe that JavaScript's extensive use of prototypes came from Self.

> As for Scheme's influence, you need look no further than JS's first-class functions and lexical scoping (okay, so JS doesn't implement full lexical scoping in the way Scheme does, it implements function-level scoping, but still, it's close).


Asking what you meant is not weasel wording, goddamn.

(Some of the distinctions you're making still make no sense to me because you think they're so evident you won't elaborate, but at this point it's definitely not worth the effort.)


I would argue that fat arrow functions really are nothing more than syntactic sugar. I don't know of any place where (x,y) => {} couldn't be replaced by function(x,y){}. I prefer arrow functions myself, but it's a very minor addition.

Fixing _this_ is a good point, though.


When you didn't know how this worked, CoffeeScript's fat arrow functions became a lifesaver when attaching callbacks from inside some object you were writing, which probably had an init() method to set up the handlers:

  // Doesn't work: <this> here is the element the listener is attached to, not our object.
  document.body.addEventListener("click", function(event) { this.handleClick(event) })
vs.

  document.body.addEventListener "click", (event) => @handleClick(event)
You only needed a .bind(this) in the plain JS version, but it felt like surprisingly few people knew this back then.

Interestingly enough, the current version of CoffeeScript compiles this code into a ES6 arrow function itself, but I think back then they used bind() in the transpiled JS.


>by being extremely strict about indentation, both in terms of the parser & also community convention. YAML hasn't

This is why I created StrictYAML. A lot of the pain of YAML goes away if you strictly type it with a schema but keep the readability.

Counterintuitively, that also covers most indentation errors - it's much easier to zero in on the problem if the error is "expecting status code or content on line 334, got response", for instance.


StrictYAML is a great initiative. On the other side of the fence I also love JSON5, for opposite reasons - it's essentially "UnstrictJSON".

JSON5 has achieved a reasonably high level of adoption (though I think it's plateaued & I don't see it ever becoming the standard way people do JSON). Would be great to at least see StrictYAML hit a similar level of adoption though - the network effect is so hard to overcome.


That makes a lot of sense, though I'd guess that a lot of yaml-ops types wouldn't want to have to write schemas.


> why anyone ever used Coffeescript

CoffeeScript was the front runner for 'Compile to JavaScript' technology. It was the first time we could write some sane frontend code.

Of course things like TypeScript came along and now we cannot unsee what we have already seen.


TOML's sections remind me of the directory part of a filename, with the keys as files.

For the content that belongs in a typical configuration file, this or its INI-style roots are probably the most human-approachable formats. For anything more complex, maybe a database (such as SQLite?) is preferable past application bootstrap?


Reading YAML has the enjoyment of reading a love letter, whereas JSON has the detrimental feeling of a solicitor's email. For writing, YAML is like putting out a draft: you focus only on the meaning, not the form. JSON is like finishing up your thesis within a hard-defined structure.


People balk at XML, but its verbosity plus DTDs let it pull tricks which you can't do with other formats.

Well, everything has its place, but XML is, I think, very well suited where you need to serialize complex things to a readable file and verify it while it's being written and read back.


Indeed. I get a lot of value out of my strongly typed XML documents. I generally have code that validates them during writing and after reading. Those who don’t understand XML end up learning why it is verbose when they eventually add all of the features they need to whatever half-baked format they are using.


The 'XML is verbose' argument is exactly analogous to the 'static typing is verbose' argument. JSON is decent, but it quickly breaks down if you want any sort of static sanitisation of input data, and the weird `"$schema"` attribute is a strange bolt-on. YAML makes no sense whatsoever to me.

XML is by far the most bulletproof human-readable serialisation-deserialisation language there is.


> The 'XML is verbose' argument is exactly analogous to the 'static typing is verbose' argument.

It’s two things: the static typing analog is definitely there but I’d extend the comparison to something like the J2EE framework fetish & user-hostile tools, too. There were so many cases where understanding an XML document required understanding a dozen semi-documented “standards” and since few of the tools actually had competent implementations you were often forced to write long-form namespace references in things like selectors or repeat the same code.

I worked with multiple people who were pretty gung ho about statically typing everything, but the constant friction of that self-inflicted toil wore them down over time. I sometimes wonder whether something more in the Rust spirit, where the tools are smart enough not to waste your time, might be more successful.


I agree. Here in 2024, I hope everyone agrees that types are great.

Static types aren't just verbose, they're clunky. They only work in a perfect world; dynamic types provide the flexibility to actually thrive.

> I sometimes wonder whether something more in the Rust spirit where the tools are smart enough not to waste your time might be more successful.

That could help, the problem being XML. You mention the J2EE framework and semi-documented "standards" - the world is rife with bad xml implementations, buggy xml implementations, and bad programmers reading 1 GB xml documents into memory (or programs needing to be re-worked to support a SAX parser).

There's too much baggage at the feet of XML, and the tools that maybe could have helped were always difficult to use/locked behind (absurdly expensive) proprietary paywalls.

JSON started to achieve popularity because, as a format, it was relatively un-encumbered. Its biggest tie was to Javascript - if certain tools hadn't been brain-dead about rejecting JSON that wasn't strictly just JSON, it might have achieved the same level of type safety as schema-validated XML, without much of the cruft. But that's not what the tools did, and so JSON became a (sort-of) human-readable data-interchange format, with no validation.

So in 2024 we have no good data-exchange formats, just random tools in little niches that make life better in your chosen poison format. We await a Rust of data formats: something with speed, reliability, interoperability, extensibility, and easy-to-use tools/libraries built in.


I think PDML hits a sweet spot. The author didn't set out to recreate XML in a less verbose, more human readable syntax, but pretty much ended up doing so. I'd like to see it mature and gain more widespread adoption.


Agreed. XML is clunky, no doubt, but it's partly that the tools were just clunky.

Having said that, I do like that you can flip between YAML and JSON. If we could do that with XML (attributes vs sub-elements a problem here) it would be much more useful I think.


An XML document without a schema is strictly worse than JSON without a schema. JSON with a schema is strictly better than XML with a schema. XML structure does not map neatly into the data types you actually want to use. You do not want to use a tree of things with string attributes all over your code. If you do have a schema, the first thing you will want to do is turn your data into native language data types. After that point, the serialization method does not matter anymore, and XML would have just been slower. Designing a schema for XML is also more tedious than for JSON.


> XML structure does not map neatly into the data types you actually want to use.

> After that point, the serialization method does not matter anymore, and XML would have just been slower.

Considering I have mapped 3D objects to (a lot of) C++ objects containing thousands of facets in under 12ms (incl. parsing, sanity checking, object creation, initialization and cross-linking of said objects) on last decade's hardware, I disagree with that sentiment.

Regarding your first point: even without a schema, an XML document shows its structure and what it expects, so JSON feels hacked together when compared to XML in terms of structure and expressiveness.

It's fine for serializing dark data where people won't see it, but if eyes need to inspect it, XML is way more expressive by nature.

Heck, you even need to hack JSON for comments. C'mon :)


I enjoy JSON for internal stuff and where it does not matter that JSON is not very expressive. JSON Schema is a poor substitute for a proper schema. For anything where I am interfacing with another person or team, I send them a DTD or XSD, which documents the attributes and does not have nonsense like confusing integers and floating point values.

For quick and dirty, I agree about JSON. For serious data interchange, I use XML.


> JSON with a schema is strictly better than XML with a schema.

I am baffled by this assertion. XML Schema (XSD) is much more expressive than JSON Schema.


That is a bug, not a feature.


Not to me. I have lots of data exchanging going on where the format is expressed well in XSD and in JSON Schema it is expressed through documentation, code, and a history of angry emails.


If it can't be expressed as a JSON schema, it's a bad idea. If it can be expressed by a JSON schema, it may be a good idea.


This does not match my experience.

Can you provide some clear explanation or examples of why having less power to express a schema is desirable?


Paralysis of choice.


This remains an unconvincing argument.


Not everybody needs convincing. JSON already won.


It won in places where one does not need to express an integer larger than 2^53. Lots of us have harder requirements.


XML + DTD + XML Schema had things we're still figuring out how to do with YAML and JSON.

You could easily generate a UI based on just the DTD and Schema that could be used to fill out a perfectly valid XML file.

Validating incoming XML was a breeze, just give it to the validator class along with the DTD and Schema and boom, done.


> Validating incoming XML was a breeze, just give it to the validator class along with the DTD and Schema and boom, done.

See the boom? It's boomer tech. We can't have old, boomer tech in 2024.

Jokes aside, I wish people spent the time to understand the technologies before disliking them and blindly implementing a different, inferior one.


XML is more popular today than it's ever been. It's just called JSX now.


Besides being aesthetically similar to SGML, because it maps to HTML, JSX has nothing to do with XML. It is Javascript.


It just looks like a JavaScript version of JSP, to be honest.


It's literally shorthand for "Javascript XML" and its templating syntax is the same as XML. It has a lot to do with XML.


JSX stands for JSX. Your definition is something that people just imagine to be true. The React docs do not mention the word XML at all. The “templating” syntax is not XML. It has no defined semantics and does not generally support crucial XML features like namespaces.


you obviously didn't spend two seconds researching this

https://en.wikipedia.org/wiki/JSX_(JavaScript)

https://facebook.github.io/jsx/


All of that is doable with JSON Schema, though, not something that we're still figuring out how to do.


> It is specified in an Internet Draft at the IETF, currently in 2020-12 draft, which was released on January 28, 2021

It took them a few decades to catch up to XML. Support for Schema is still a bit dodgy in most big libraries.


Indeed, XML is a decent document language because of the quality of tools available and its power/flexibility. I hate when people use it for config files and other things that are usually human edited where readability is paramount though.


absolutely, 100%.

When I first encountered XSLT I seriously thought it was the most ridiculous thing I had ever seen. A frickin' programming language whose syntax was XML.

But then I learned it and I don't think I've ever seen another language that could do what XSLT could do in such a small amount of code. The trick was to treat it like a functional language (I got this advice from someone else and they were absolutely correct). Where most people got into trouble was thinking of it as an imperative language.

Pattern-matching expressions are the kool kid on the block, but XSLT had them to the nth degree 20 years ago.


The problem goes deeper. I can't remember who coined the term, but all "implerative" (imperative-declarative) languages share the same issue. I don't care if it's JSON, XML, TOML, or YAML: we shouldn't be interpreting markup/data languages. GitHub Actions is a good example of everything wrong with implerative languages.

Use a real programming language, you can always read in JSON/YAML/whatever as configuration. Google zx is a good example of this done right, as is Pulumi.

Kris Nóva said it best: "All config drifts towards Turing completion."
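For a sketch of what "real language, config as plain data" looks like with zx (hypothetical config.json and image names; zx provides `fs` and the `$` command runner as globals):

    #!/usr/bin/env zx
    // The logic lives in a real language; JSON is just inert data.
    const config = JSON.parse(await fs.readFile('./config.json', 'utf8'))
    for (const svc of config.services) {
      // Skip disabled services; no DSL conditionals needed.
      if (svc.enabled) await $`docker build -t ${svc.tag} ${svc.path}`
    }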


Oh man, I have a similar issue with NixLang, though I know it's not "implerative". Many days I just want to write Nix in my preferred language. I wish Nix had made a simple JSON-based IO for configuration, because then I could see what the output of something is and generate the input state from some other language.

Really frustrating. Nix works... but I just don't see the value, personally. And this is after living on NixOS for ~3 years now, with 4 active Nix deploys in my house... I just don't like the language.


I'm currently building this (plus more) - the happy path of what you're talking about is almost complete. There are fundamental issues preventing what you're talking about being used as a complete replacement for NixLang: you'd need every possible language installed/available on the builder machine in order to build packages, and lazy evaluation would completely break (merely evaluating all of nixpkgs takes hours). So you do ultimately need a primary language. That being said, for devops-like stuff there is no reason to have that limitation.


Would you be able to link the repo? I'm curious on your impl


https://github.com/nickelpack/nck

I wanted to use Nickel, but it turns out that it can't do everything you'd need it to do to completely replace NixLang. So right now I'm bikeshedding on what to use instead (and desperately trying not to invent something), in other words it's definitely being renamed. Either way there's a bash script in the `test` dir that shows the general concept.


Hah, I had the same desires a while back. Didn't end up trying anything, though. Still hoping Nickel manages to integrate fully; it looks nice.


I see Nix as a powerful way to write config files. It is purely functional, so the only thing it does is create a build recipe. That build recipe is then run by other Nix tooling.

A .nix file is either a config file itself or a function that returns a config file or a function. By passing in enough parameters, you get the configuration. I've not seen as clean a way of doing this anywhere else. Guix uses Guile which is a full programming language and can probably have side effects. They use something called G-Expressions which is not quite clear to me.


The problem (to me) is that it's entirely obtuse. I can't call the function and get back some configuration, which is insane to me. You have to pass in all sorts of state, and you have a lot of difficulty producing the exact same state your config in question would see in a real execution. Or at least I do. I even asked on several forums and the answer kept boiling down to "Well, it's just not easy. Sometimes not possible." What's the point of it being functional?

I.e. yea, I can load up the eval and call my config func, but what about the params? Well, now I have to generate the params. Some of them might be easy, but some are difficult as hell, and if they differ, executing my func in the eval doesn't produce the same output (or fails entirely) as it does when I run it "for real".

Nix in practice felt like all of the problems of imperative languages wrapped in a nice functional wrapper. It was functional without any of the real benefits of functional, to me.

E.g. I can't easily take the same input and pass it into a function to produce the same output; I can't view a function as a simple slice of functionality that I can inspect, debug, etc. They get access to the entire universe (nixpkgs/etc), a huge stdlib, etc., and you need to recreate all of that if you want to use the function.


The parameters you pass in define your dependencies. For a program to compile it needs the compiler and that is a complicated dependency. One might think that only passing the paths to the dependencies would be enough. That way the inputs could be much simpler indeed. I guess there's room for a simpler Nix.


While I will instantly switch to Nickel for the type system once it's available, I do think Nix could get a lot further by just having better tooling.

Notably, error reporting is atrocious, but an interactive debugger would be amazing too. I.e. to set a breakpoint and hop into an eval at your breakpoint; it would help immensely.

Still, I just can't get behind dynamic typing for anything remotely complex... which is how I would describe Nix. I have been counting down the days for Nickel; it's been a long wait.


Nix can read JSON; there's a deserializer (builtins.fromJSON) among the builtins you can call. So you can make a bridge where Nix reads your JSON and does something with it, and you can generate the JSON externally like you want. It's how things like poetry2nix work.


Completely opposite experience for me. I think Nixlang is exceptionally well designed and makes sense for the usecases it wants to cover, and it is exactly what I would expect from a DSL tackling the problems it tackles.


"Implerative" - thank you for this, this is the term I've been searching for to describe the weird blending of the two things.. I immediately Googled it and saw that it has previous uses as well, I would love to know who originated the concept. I see so many times, confusion and arguing about what is imperative and declarative, to the point where I question the value of the terms any longer.

FWIW, I have flirted with my own DSL implementations in a few cases. Certainly, language design is much more complex, but I also felt that once you understand enough of EBNF/parser generators (and some of the simpler alternatives), this is a very powerful option as well.


I'm also pretty against DSLs, although they do occasionally have use cases. For an example of why DSLs can be bad, look at Dockerfiles contrasted with Buildah. The former makes tons of assumptions, especially about when to perform layer checkpoints. The latter is just a script in Bash or whatever your language of choice is.


For the curious, this might be it: "I've cracked our marketing code, y'all! Pulumi: Implerative Appfrastructure" [1] @funcOfJoe, Joe Duffy: CEO of Pulumi

[1] https://twitter.com/funcOfJoe/status/1319667607214067712


Also an interesting post referencing the term in a previous comment on HN: https://news.ycombinator.com/item?id=31182790


I agree with all of this.

If we take it one step further though and think about portability of configuration, I think that is one of the reasons we end up with operators.


I've always wondered why we seem to have implemented a whole programming language in yaml or json for so many CI/CD systems rather than just writing quick python scripts to describe the logic of a particular build step, then MAYBE using a JSON or XML file to enumerate the build steps and their order, like:

    build: src/build.py
    test: src/test.py


Sure, that's orchestration, though. The problem with GHA is the sheer amount of expressive power that it has. If you need to do dynamic stuff then that should be in a "pre-workflow" step, written however/in whatever you please, that emits the actual workflow.
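A hypothetical pre-workflow generator along those lines (file names made up; uses js-yaml's dump()):

    // generate-workflow.mjs: build the pipeline as plain data, emit the YAML GitHub expects.
    import { writeFileSync } from 'node:fs';
    import yaml from 'js-yaml';

    const jobs = {
      build: { 'runs-on': 'ubuntu-latest', steps: [{ run: 'make build' }] },
    };
    writeFileSync('.github/workflows/ci.yml', yaml.dump({ name: 'CI', on: 'push', jobs }));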


Why shouldn't the Python script be the discrete workflow step? It could be mounted on some file system which has the repo checked out at a particular commit with a particular tag, and then run whatever tasks are required to validate or deploy the project.


100%, though I think that the step visualization during the build does hold some value.


For tools that allow configuration in either JSON or Javascript (like eslint), I prefer the JS version. The syntax is similar but has much more flexibility, like being able to use environment variables or add comments.

Pulumi was also a good tool when I was doing kubernetes deployments.


> Are there any serious advantages of YAML over TOML?

Probably not but you forget YAML came out in 2001 where TOML came out in 2013. Neither are spring chickens but inertia is a hell of a thing. For example, Symfony supports YAML, XML and PHP definitions -- but not TOML. Symfony v2 simply predates TOML and they never got around to ditch YAML for TOML because it's not worth the bothering.


TOML is just an .ini file plus some syntactic and computing sugar. I can argue that TOML is actually way older than it is.


1. I am unaware of a standardized .ini format

2. The native types in TOML are useful.


This is an .ini:

    [section]
    option=value, written the way you want it.
    ; And these are comments. That's all.
I don't argue; I use TOML too, but it doesn't change that it's an ini++. You can treat an .ini file as a TOML file (well, maybe the comments need some changing, but eh); they're not different things.

I don't think, even though TOML has some official spec, all parsers are up to it, and they may have disagreements between them. It's the same for INI.

You can have "native types" in .ini as well. The difference is you'll be handling them explicitly yourself, and you should do that as defensive programming anyway. A config file is a stream of input to your code, and if you don't guard it yourself, you know what that entails.
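That explicit handling can be as small as this (a sketch; `ini` is a hypothetical object produced by whatever INI parser you use):

    // Every INI value arrives as a string; coerce and validate it yourself.
    const port = Number(ini.server.port);
    if (!Number.isInteger(port) || port < 1 || port > 65535) {
      throw new Error(`invalid port: ${ini.server.port}`);
    }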


> I don't think, even though TOML has some official spec, all parsers are up to it, and may have disagreements between them.

Overall it's not that bad, see e.g. https://arp242.github.io/toml-test-matrix/

If you look at the failure details then most of them are either minor issues about where things like escape characters are/aren't allowed, or about overriding existing tables (previously the spec was ambiguous on that, and I expect that will clear up over time). Note that overview is not entirely fair because it uses the latest (unreleased) version of toml-test where I added quite a few tests.

These kind of imperfections in implementations are of course true for any language, see e.g. YAML: https://matrix.yaml.info – I have no reason to believe it's worse in TOML vs. YAML, XML, JSON, or anywhere else. If anything, it's probably a bit better because it's fairly simple and has a pretty decent test suite.


> I don't think even though TOML has some official spec

Read it on https://toml.io/ (full spec on the upper right… with its evolutions up to the final 1.0.0 version).


Oh sorry, I missed a comma. It should read: "I don't think, even though TOML has some official spec, ..."

Fixed the comment too.

I know TOML has an official spec.


Zomg how did you magically read my brain to produce a perfect example of what I was thinking even though there is no IEEE spec? It's unpossible!


I used Windows 3.1 and 3.11.

That's all I'll say.


Shall we bet on what would happen if we asked 10 random people of any IT stripe to write a small sample INI file?

Come on.


The problem isn't with the small configuration files, those are just argv put into a file.

Here's an experiment actually worth doing: ask ten people to write an INI file for configuring between 3 and 6 servers, where some properties are the same for several servers.


However they want to.

One may write a single value containing a CSV, another may use a convention of namespaced keys, whatever. One may base64, one may urlencode, whatever.

The differences don't change the fact that they will all have the same things in common.

Even without a formal spec, we all know what we are free to change and not free to change, and free to assume and not free to assume. The unwritten spec specifies very little, so what? That means maybe it isn't a good choice for some particular task that wants more structure, but that was not what you said and not what I'm ridiculing.

Or was that all you meant in the first place? That without some more to it to define standardized ways to do things, it's not good for these kinds of jobs? I confess I am focusing on the literal text of the comment as though you were trying to say that the term is not meaningful because it is not defined in a recognized and ratified paper.


My point is indeed that it is not meaningful to speak of the INI culture as something directly comparable to a standardised format.

> One may write a single value containing a CSV, another may use a convention of namespaced keys, whatever. One may base64, one may urlencode, whatever.

> The differences don't change the fact that they will all have the same things in common.

I think this is the first time I've seen this sort of neo-romantic argument, where the representation of information is claimed to be irrelevant because, for some unspecified reason, we all know in our hearts what is being said.

Is this a mystical theory you've built on extensively, or something that came to you from the aether just now?


That's all any communication is: two or more parties using symbols whose meanings a majority agrees on. It does not require a dictionary.

I refer back to the simple fact that the original commenter felt it reasonable to speak the words, believing that others had the same idea of what the words meant as they themselves did, and to the fact that I and others did in fact have that same understanding. That means it is utterly silly to try to say that the term has no meaning. Does everyone else have telepathy and only poor you are left out of the club? It's even silly to claim that merely you individually don't know what the term means, if you claim to work in any remotely IT-related field.

It basically looks like an attempt to look smart backfiring badly.

The reason the things the term doesn't define don't matter is the same reason as for all other terms or symbols. No term is a complete description of anything. It defines what it defines and does not define anything else.

When you say "XML", you still have not said an infinity of things. XML merely defines more than INI. INI defines a certain structure, and you are free to do whatever you want within that structure, exactly like XML and all other formats & protocols.

If they defined everything, then they wouldn't be general-purpose frameworks for packaging data; they would be snapshots of specific particular data. In fact they would not even be snapshots, but one specific physical instance taking one specific form as it exists in one place at one time somewhere.

There is no way you don't already know all of this, I absolutely credit you with having this much understanding of how symbols work, which makes your argument disingenuous.

If you didn't, and your argument was sincere, then you are embarrassingly illiterate for trying to partake in a conversation in this area. It's not a crime to be that ignorant, and if so then I apologize for ridiculing a 6-year-old who somehow found their way onto HN, but consider yourself now better informed than you were: a ratified RFC or ISO for INI, or any other term, is not required for a term to be valid communication. All that's required is for speaker and listener to both understand it, and such definitions are merely one of many ways for a term to have meaning and for all involved parties to have that mutual understanding.

Wait, I suppose I have to explain what rfc and iso and ieee all mean in this context. Anyone who did not know what .ini meant surely can not recognize any of those either.


It'd generate the same set of problems in INI, YAML, TOML, XML, JSON, or BICF (bayindirh's imaginary configuration format).

Because these are not related to how you write the file, but to how your software operates in your mind.


How the software operates is of course dependent on the expressiveness of the configuration format, so it is clearly false in most practical senses to claim that the flat key-value format of INI and BICF will generate the same set of problems as formats that allow for lists and nesting.

If we accept the assertion that the complexity of a configuration file for the stated scenario is constant across all configuration formats, we will next be asserting that there's no difference in complexity between solutions in x86 assembly and LISP.


We're approaching from different sides.

You stated a problem: Configure ~6 servers where they share variables.

I can implement it in a plethora of ways. The most sensible one for me is to have a general/globals/defaults area where every server overrides some part of these defaults. The file format has nothing to do with the sectional organization of a configuration file, because none of the formats force you into a distinct section organization.

E.g.: nesting is just a tool; I don't care about its availability, and I don't guarantee that I'll use it if it's available.

I can write an equally backwards and esoteric configuration file in any syntax. Their ultimate expressiveness doesn't change at the end of the day.

It can be

    <network iface="eno1"><ipv4_address>192.168.1.1</ipv4_address></network>
or

   iface_eno1_ipv4_address = 192.168.1.1
or

   iface.eno1.ipv4.address = 192.168.1.1
I don't care; all can do whatever I want and need. It only changes how you parse and map. It's hashmaps, parsing and string matching at the end of the day.

If you know both languages equally well, LISP becomes as complex as x86 assembly and x86 assembly becomes as easy as LISP. Depends on your perspective and priorities.

If you don't know how to use the tool you have at hand, even though it's the simplest possible, you blow your foot off.


We should ask them instead to modify the existing INI file. I bet most would do just fine.


> Why is “on” a boolean literal (of course so are “true”, “false”, as well as “yes”, “no”, “y”, “n”, “off”, and all capitalized and uppercase variants)?

Norway is also "False".


Or more precisely, its country code 'NO' is false. I don't think there are any YAML parsers that parse the literal string 'Norway' as false.
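I.e. under YAML 1.1 rules, something like this bites you:

    countries:
      - FR   # the string "FR"
      - DE   # the string "DE"
      - NO   # parsed as the boolean false by YAML 1.1 parsers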


The YAML 1.2 spec removed “no” as a synonym for false. That arguably just made that entire problem worse, and even though it’s been almost 15 years YAML 1.1 is still the commonly used variant.


Ah, that explains why I couldn’t find any online YAML->JSON converters that would demonstrate this flaw when it came up a few weeks ago.

So now we have the same language that parses the same document subtly differently depending on what version you use. Hooray?


Be the change you wish to see in the world.


I would support a move for YAML to standardize on both "NO" and "Norway" evaluating to false. It seems an obvious win for consistency.


Surely it should accept either "Norway", "Norge" or "Noreg" depending on the locale setting.


Hmmmmmm. In that case the "nodding head" emoji should evaluate to false when the locale is set to Bulgarian...


Maybe one written by a Geordie?


It’s very obvious that’s what he means.


It wasn’t obvious to me. I read it as the literal string “Norway” being parsed as false, which didn’t sound believable but I didn’t make the connection to NO at all.


*shrug* It wasn't obvious to me. I'm glad someone explained.


And yet it doesn't recognize that the UK is false[0].

[0] https://en.wikipedia.org/wiki/Perfidious_Albion


In general I am always confused that it lets you use strings unquoted, which is what allows for all these issues with ambiguity of the interpreted data type, the Norway problem and all that.

It also just looks odd to me; I don't see why it's necessary to allow this.


It’s great for end users who don’t understand what a string is or don’t have to play the game of finding the hanging single quote when they write the file by hand in a textarea.

On the opposite end of UX, there's hand-written JSON, which is just too meticulous in some scenarios where people are writing config without editor support.


That's probably a good thing for end users, but if it's running on something that affects the live service, I'd rather not have people who don't know what a string is editing the config.


Dealing with inline quotes is annoying, but if you care about users writing things by hand, and especially in a textarea, you should not be using a format that depends on indentation.


It’s because YAML is designed first for readability.


> Are there any advantages of YAML over TOML

YAML is older and better supported. I'll explain why I ended up choosing YAML for the config files of a CLI utility written in Python that I maintain.

I initially chose TOML for many of the reasons mentioned here, but before my first release I ended up switching to YAML. Python added support for reading TOML to the standard library (tomllib) in version 3.11; however, it still requires an external library for writing. Do I use the built-in library for reading and an external library for writing? A chunk of my users are on versions of Python older than 3.11 (generally Windows users who installed Python manually at some point); do I import a separate library for THEM to read the files but use the standard library if ver >= 3.11?

Now that I look at the state of things today, I probably would add the tomlkit library to my setup file, but that wasn't very mature at the time, so I just used PyYAML. Changing it now would break compatibility with my older versions that use YAML config files, unless I maintained both paths... which I could do, but it's just another source of complexity to worry about. These are relatively simple config files the user has to interact with manually, so YAML works fine and I don't see any reason to change at this point.


Personally I prefer INI over nearly all configuration formats.

https://github.com/madmurphy/libconfini/wiki/An-INI-critique...


I have seen this post on HN before and it wasn't received very well, AFAIR.

But I can't help agreeing with its main point: so much complexity to support a few basic data types that are not sufficient for anything complex anyway.


If you haven't checked it out, NestedText is a great format that offers no handling of types beyond string/list/dict, leaving all that to the application reading in the values.

No character needs escaping.


Thanks, that might be just what I'm looking for!


I'm still using CoffeeScript whenever I can. It has one of the nicest syntaxes out there: a lot of code fits in one screenful, the logic of the code is easier to see without the clutter of unnecessary syntax, and it's a joy to write too.

YAML is probably used for similar reasons.

I don't understand why people want redundant, verbose syntax that makes reading and writing code harder. And sadly, I no longer expect anyone to really explain it based on anything tangible.


I'm glad I'm not the only one. I prototyped an SPA recently with mithril.js and CoffeeScript and I think there's really something magical about that combo.

Oddly enough I can't stand writing python or js. I do almost all of my actual programming in Rust, because I adore the type system.


YAML is an amazing config language for simple to mildly complex configs. It's easier to read and write than JSON, and it only really breaks apart when you're heavily deviating from nested lists/dictionaries with string values. People use it everywhere because by the time it becomes painful you're already so invested it's not really worth the hassle of switching.


It's aesthetically pleasing for simple configs. I'm so used to writing JSON by hand by now that I don't find YAML much easier. At least I never have to think about how a value is going to be interpreted in JSON, since it has a decent subset of types and I can visually tell what each value is.


I, on the other hand, find it much harder to read and write even in very simple configs. I never know what the indent is supposed to be, I just press my spacebar until my editor stops complaining. I find it really hard to tell if a line is a new entry or a subset of the parent entry.

I'm sure if I used it more it'd become easier, but my whole team doesn't understand it either. Luckily we only need it for GitHub configurations.


YAML is (vaguely) a superset of JSON, so you can just use JSON (without tabs) and get your life back.
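Since JSON's flow-style braces and brackets are valid YAML, a spec-compliant YAML parser will happily read e.g.:

    {"replicas": 3, "labels": {"app": "web"}, "ports": [80, 443]}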

I don't need a config language with no fewer than 6 subtly different ways of decoding a string to remember, and certainly not one with a spec longer than C's. Compare to JSON's, which (famously) fits on a postcard.

https://yaml.org/spec/1.2.2/

https://yaml-multiline.info/

https://www.json.org/json-en.html


Until you find a snippet of config you want to copy into your `application.yml` in Spring or Quarkus (Java frameworks). If it doesn't paste in cleanly (and it rarely ever does) you'll need to go research the schema and find out where to put things. Meanwhile, if you're using a normal `application.properties` file, after you've finished pasting, you can go on with your life.


I wonder if we would even be using YAML or TOML to the degree we are now if JSON had support for trailing commas and comments.


JSON5 has both of these
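For instance, this is valid JSON5:

    {
      // comments are allowed,
      unquotedKey: 'and single quotes',
      list: [1, 2, 3,],  // and trailing commas
    }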


I can't find any JSON5 parser that isn't for JavaScript. I've started writing one in C that can then bind to other languages, but it takes time to write!


Is the list on their website not good? https://github.com/json5/json5/wiki/In-the-Wild

And it shouldn't take much to modify an existing JSON parser.


> why people are still using it.

If support for JSON with comments was more widely available / in use, we'd use that. But it's not, so we don't.


I think YAML is for code what Markdown is for text: it is easy to read and _can_ produce the same or equivalent output as stricter and more extensive languages. Easy readability makes this tradeoff acceptable for most.


HJson https://hjson.github.io seems a nice 'in-between' between YAML and JSON without the indentation-based syntax, so closer to the JSON side but with comments and less quotes.
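A rough taste of Hjson (comments, optional quotes and commas):

    {
      # comments work
      key: a string without quotes
      list: [1, 2, 3]
    }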

What I don't really get is why the cloud providers / tooling implementors have never drafted up a "YAML-light" that just throws out the rarely-used headache-inducing syntax elements.


Hjson is pretty nice.

Two YAML-light style projects are StrictYAML (a Python library), and NestedText (an alternative spec with only string, list, and dict).


Right, I also don't understand why it's considered a feature of many of these languages to introduce so many ways of doing the same thing. Like the boolean example, but also having three different ways to express a list or dictionary? It's the classic robustness principle, which in practice makes things less robust by making reading and parsing more complicated. How about just allowing one syntax and erroring if the input isn't according to spec?


I've had few to no issues when using YAML for docker-compose.yml files. This isn't to say that use of YAML can't be problematic, but I don't believe it's necessarily bad at all for configuration.

> So, why? In JSON, you can add templating super easily by turning it into regular JavaScript: use global variables, functions and the like. I don’t understand how anyone could prefer YAML with an ugly templating DSL over that.

That's a valid use case when the target user is the software developer themself, but the language runtime should not be accessible to a technical but non-maintainer user. Granted, it's plausible that a "template" JSON could be defined and spread over a JSON-formatted configuration, but what YAML allows the user to do is define "templates" within the configuration itself and control where those template structures are extended.

When the user is a developer maintaining a software project, they should probably just use JavaScript for configuration, and not JSON files, except when there's a possibility that the configuration can be intercepted.


> unlike CoffeeScript I don’t understand why people are still using it

Gasp! Does this mean you know why people are still using CoffeeScript?


> Why is “on” a boolean literal (of course so are “true”, “false”, as well as “yes”, “no”, “y”, “n”, “off”, and all capitalized and uppercase variants)?

"on", "off", "yes", "no", "y", and "n", and case variants thereof, have not been boolean literals in YAML since YAML 1.2 (2009).


As far as I know, not even libyaml supports 1.2. What YAML parsing libraries support 1.2?



I guess the real mystery is why so many tech types speak like an infant having a tantrum about some esoteric trivia, and then have hordes of their kind come and vigorously head-nod it, and all involved think virtue is being done.

People started using things like YAML, obviously, because it reads closer to natural language. It's like a nested bullet list, which everyone can easily read. Readability is important to people. It's why we don't all still write C and Perl.

So it's one thing to say "I think people should be careful about prioritizing readability over precision, especially for production systems". It's another to do this narcissistic, dramatic faux-incomprehension implying the markup language gained the popularity it did because everyone's stupider than you.


> I guess the real mystery is why so many tech types speak like an infant having a tantrum about some esoteric trivia, and then have hordes of their kind come and vigorously head-nod it, and all involved think virtue is being done.

Ha, great line. And you caught me mid-tantrum and mid-head nod. :)


Having on and no both be boolean literals, but of opposite values, sounds like a horrible decision: a typo doesn't result in a syntax error but in a completely wrong semantic misconfiguration.


1 vs 0 is another typo-apart pair of boolean values with opposite meanings, in quite a few languages.


Worse, the ability in some languages to typo 9 for 0 and flip your boolean, when it should be a type error, just seems like misdesign.


Let's hear it for true/false, and triply enforced booleans (variables represented by three or more memory locations: https://en.wikipedia.org/wiki/Triple_modular_redundancy)!


XML also has some other issues (no typing, too many ways to represent maps but none seems to be the correct one, etc.).

JSON just isn't meant to be written by humans (no comments).

But YAML is just horrible; the whole class of accidental-mistyping issues (NO => false) is just not acceptable IMHO. That it's a pretty complex thing doesn't help either.

I honestly don't understand why we (e.g. GitHub Actions) still use YAML for new things even knowing all the issues, especially when there are many other well-suited, decent, but less widespread alternatives.


> But YAML is so far in the opposite direction, we get the same surprise conversions we’ve had in Excel

This is optional. Besides using a better parser that follows the spec that long ago fixed a lot of the issues listed in the article, another way to avoid the problem is adding more verbosity (which would still not match XML or JSON).

You don't have this option in XML/JSON; you can't remove all that useless markup (and leave it only where it's useful).

> Why is “on” a boolean literal

Because that's what humans use to denote booleans


Strongly agree. I came to that conclusion before k8s even existed, because I myself thought to use it as a configuration file format, and the second I started realizing some of the ambiguity in its syntax I walked away from it.

The only thing I disagree with is that CoffeeScript is still useful. I had the same reaction to CoffeeScript that I had with YAML; CoffeeScript _never_ had any real point outside of a segment of people preferring to write JavaScript in Ruby syntax. The biggest issue CoffeeScript had is that debugging meant reading through the JavaScript anyway, so you never really got away from JavaScript.

I'm a fan of either using a full-blown programming language or ini files, and yes I realize that seems insane to many people, but at the end of the day ini files are stupidly easy to edit and if you can get away with not needing a full-blown turing complete language then convention based ini files are vastly easier on the human than yaml or json.

I'm either a greybeard who never got with the times or I'm a rebel; probably depends on who you talk to.


> I'm a fan of either using a full-blown programming language or ini files

How do you persist complex multi-object state? Think nested lists of objects with references to one another.

If your answer is still "ini files", I'm sure it can be done, but only with a lot of custom-rolled code... XML/JSON (even YAML), for all their issues, provide a code-free way of persisting all this, either through marshalling (XML) or json/yaml.load().


you cut off the part of my statement that answers your question

> if you can get away with not needing a full-blown turing complete language then convention based ini files are vastly easier on the human than yaml or json.

My claim isn't that ini files solve for every use case, it's that if your needs are simple enough ini files are superior to json/yaml, but that full-blown turing complete languages are superior to everything else.

Also, if you're saving complex object state you don't have a configuration format but a serialization format and definitely ini isn't good for that.


> you don't have a configuration format but a serialization format

While I better appreciate what you are saying now (you don't have a solution), the only appreciable difference between "config" and "serialization" is write frequency: config is seldom updated, serialization is updated often.

Otherwise, they are the same problem with the same solution - you might provision resources differently based upon "dynamic" vs "static" data, but that's an operational perspective. From the perspective of the application maintainer, there is no difference.


I'm going to submit that if you think configuration and serialization are the same problem, it's time to step back and re-evaluate what you're doing, which is really the author's point.

As Joel Spolsky said years ago, if you abstract far enough up, everything starts to look the same, but that doesn't make it so.

https://www.joelonsoftware.com/2008/05/01/architecture-astro...

At the end of the day you could claim that all data exchange is exactly the same, and indeed Claude Shannon showed all information is just data, but that misses the point entirely. All humans are exactly the same and yet sex between them can look vastly different based upon such details as genitals.


> if you think configuration and serialization are the same problem

Except that's precisely not the point...rather the formats they are written in are the same, they are indistinguishable.

Reductio ad absurdum: if all data exchange is the same, then there is no benefit to any format; just write binary strings with null terminator characters. Except there are many downsides to that approach, so it turns out they are not the same...

And nevertheless, if all configuration were not serialization, there would be no need to generate config via a different language, per the OP's post...

So we find the similarities between configuration and serialization to be more pertinent than their dissimilarities with respect to format.

INI is absurd for any complex configuration; "just use a Turing-complete language" is as good an answer as deciding to write binary data randomly...


oh look, the internet denizen was able to weave their way through a rationalization, that's certainly never been done before!

What makes it even more absurd is that we do, in fact, have binary serialization protocols, and they're very popular, especially amongst companies dealing with scale.

https://en.wikipedia.org/wiki/Cap%27n_Proto

> Values in Cap'n Proto messages are represented in binary, as opposed to text encoding used by "human-readable" formats such as JSON or XML. Cap'n Proto tries to make the storage/network protocol appropriate as an in-memory format, so that no translation step is needed when reading data into memory or writing data out of memory.

---

But that's actually the fucking point: serialization only looks the same as configuration if you've gone too high up the abstraction ladder and lost your perspective, and that _is_ the point of TFA. At some point you need to stop and ask if what you're doing is really the right approach.

You've destroyed your _own_ point with your long-winded, weaving, rationalization.

And to top it all off, you've strawmanned a point I've already clarified. It makes you look like an asshole. I've never claimed ini works well for complex configuration; I said the opposite, in fact.


> oh look, the internet denizen was able to weave their way through a rationalization, that's certainly never been done before!

Yes, I gathered a couple replies ago you weren't interested in meaningful discussion...and probably hadn't even read anything I'd said.

> You've destroyed your _own_ point

What point? I asked you a question. You continually divert and misdirect.

Now, the only point of contention I have left with you: configuration and serialization are the same thing at the format layer. You mumble some nonsense about an abstraction ladder, but the truth is you're climbing it; the difference between them only appears at higher levels of abstraction.

> I've never claimed ini works well for complex configuration

And yet you never made a claim about what works well. This is precisely the reason JSON/YAML are popular and most people ditched INI: people don't care about your higher-order abstraction, they just want a format that gets the job done, and doesn't get in your way.


> And yet you never made a claim about what works well.

I certainly did and I also clarified it a second time. I'm not doing it a 3rd time, you can re-read this chain.

> Configuration and Serialization are the same thing, at the format layer.

CSAM and the text of the bible are the same thing at the storage layer.

My cats and I are the same thing at the atom layer.

And you say this inane thing unironically. Let me quote an earlier comment I made

> As Joel Spolsky said years ago, if you abstract far enough up, everything starts to look the same, but that doesn't make it so.

stop being an architecture astronaut.

But this just takes the cake.

> they [people] just want a format that gets the job done, and doesn't get in your way.

TFA is very clearly stating that if you're having to template it, it's not getting the job done.


You provided a nonsensical answer.

And have since insisted on spouting nonsense.

I have nothing further to say on the matter, clearly I'm conversing with a brick.


Oh, TOML is atrocious; it's a nightmare trying to understand nesting with all those repeated keys and double brackets.


> I guess XML and JSON are too verbose. But YAML is so far in the opposite direction, (...)

YAML is a far better format in terms of being human-readable and editable, and it supports features such as node labels and repeated nodes (anchors and aliases) that turn into killer features when onboarding YAML parsers into applications.


YAML is fine if you don't do weird stuff with it. (And some stupidity like the Norway problem) A good example is OpenAPI schemas, which are quite legible in the YAML.

TOML has some nasty edge cases like top level arrays, arrays of objects under a key, etc.


StrictYAML solves most if not all of these problems https://hitchdev.com/strictyaml/features-removed/


StrictYAML is great (and the author is in these comments!), but ultimately it's one specific library, not a format spec, so to depend on it for a project you need every person/tool doing the writing/parsing to commit to use that library (and the programming language it was written for).

Again, it's a great project, but I wanted something similar that is a language-agnostic format specification, so moved on to using NestedText wherever I can.


It is on my to-do list to make a spec and a language-independent test suite, but both of those things take time.


I've asked you similar in the past in comments on this site, but: what do you find lacking in the NestedText spec that a new YAML-like format might do better? Why not just embrace NestedText for the task?


I don't know much about NestedText, I'm afraid.


Does anyone know which format Git uses? Is it YAML? Or TOML? Or something in between?


Git uses its own ini-style conf format which diverges from TOML.


TOML wasn't even invented/specified when Git came into being.


But it looks like YAML was.


We're still using the CoffeeScript of JSON because YAML's UX improvements haven't been brought into the upstream JSON spec the way CoffeeScript's UX improvements were brought into JavaScript.


I wish json5.org were adopted more widely; it's JSON, safe because it cannot be executed, but with comments and trailing commas!

Note: YAML is a superset of JSON, which means that any YAML reader can read JSON.


For everyone who hates YAML, we keep extending YAML to yet another use.

It's like without rules none of you show any common sense. Who cares what the spec says? Obviously you shouldn't use "oN" as boolean true.


The answer is simple: JSON doesn’t have comments, XML isn’t human writable, and TOML isn’t well-supported by common tooling.


Well, you can replace YAML with JSON and JS templating without changing the parser. So I guess that’s an advantage over TOML?


YAML, TOML, or JSON: I think the more problematic issue is using templating at all instead of generating them.


This is all because people refuse to use JSON parsers that allow comments


YAML has comments and is easier to read and write for humans than JSON.


CoffeeScript is the worst thing that ever happened to the software industry.

CoffeeScript fooled developers into thinking that transpilation was free and had absolutely no downsides whatsoever. The advantages of CoffeeScript over JavaScript were so incredibly marginal. I've never heard a single good argument about why it was worth adding a transpilation step and all the complexity that came with it.

I think even TypeScript isn't worth the transpilation step and bundling complexity these days, especially not when modern browsers allow you to efficiently preload scripts as modules and bypass bundling entirely.

About YAML. It's also not worth it though it's not quite as infuriating as CoffeeScript. The advantage of JSON is that it's equally as human-friendly as it is software-friendly. YAML leans more towards human-friendliness and sacrifices software friendliness. For instance, you can't cleanly express YAML on a single line to pass to a bash command as you can with JSON. It's just one additional format to learn and think about which doesn't add much value. Its utility does not justify its existence.



YAML is for humans. JSON is for computers. I love YAML.


But we are no longer human, we are bots.


If only we had yet another system to solve this problem. Oh look, the author has one.

xkcd 927.


The problem is very specifically the fact that YAML, as a config language, sucks.

I have no idea why people started using it. "bUt jSOn dOeSn'T HaVe cOmMenTS" ... oh gimme a break! You want a comment in JSON?

    {
      "//": "This is a comment explaining key1.",
      "key1": "value1",
      "//": "This is a comment explaining key2.",
      "key2": "value2"
    }
There. Not so hard. Writing a config parser that just ignores all keys starting with "//" is trivially easy...if it's necessary to ignore them at all that is, because most config parsers I have seen couldn't care less about unknown keys, let alone repeated keys.
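A sketch of such a wrapper in Node (recursively drops any key starting with "//"):

    const stripComments = (v) =>
      Array.isArray(v) ? v.map(stripComments)
      : v && typeof v === 'object'
        ? Object.fromEntries(Object.entries(v)
            .filter(([k]) => !k.startsWith('//'))
            .map(([k, val]) => [k, stripComments(val)]))
        : v;

    const config = stripComments(JSON.parse(require('fs').readFileSync('config.json', 'utf8')));
One caveat: JSON.parse keeps only the last of the duplicated "//" keys above, which is harmless here since they're being thrown away anyway.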

So what other "reasons" were there for YAML?

Oh, the human readability thing. Yeah. Because syntactically relevant whitespace is such a joy in a data serialization format. It's bad enough when a programming language does that (and I am saying this as someone who likes Python), but a serialization format? Who thought that would make things easier?

And then of course there are other things filled with joy and happiness...like the multiple ways to write "true" and "false", because that's absolutely necessary for some reason.

"Oh but what about strict-yaml?!" I hear the apologies coming...great, so now I have the ambiguitiy of what parser is used on top of the difficulties introduced by the language itself. Amazing stuff. If that's the solution, then I'd rather not have the problem (aka. the language).

But despite all[1] these[2] problems[3] and more, YAML somehow became the go-to language for configuring pretty much everything in DevOps, first in containerization, then in cloud, and everything in between. And as a result, we now have to make sure our config template parsers get whitespace right. Great.

So bottom line: the problem here is maybe 1/3 the complexity of config files and 2/3 the fact that YAML should never have been used as a configuration format in the first place. Its benefits are too small, and its quirks cause too many problems for that role, outside of really trivial stuff like a throwaway docker-compose file.

Want config? Use JSON. And if you need something more "human friendly", use TOML.

[1]: https://github.com/cblp/yaml-sucks

[2]: https://changelog.com/posts/xml-better-than-yaml

[3]: https://noyaml.com/


That ugly child doesn't cut it, these are the comments people want:

    key1=value1 # this is a proper key1 comment, if you move a line, it stays with key1
    key2=value2 # it also doesn't break the table

    # and you don't need to write a config parser
    # nor modify a syntax highlighter
    # nor make sure other people use your comment style


If people absolutely want those, they can use TOML, which supports single-line and inline comments.

Given a choice, I'd even opt for XML over YAML.


That's fine, you can pick whatever XML ugliness you like, I was just pointing out that you can't solve the basic fail of JSON with comments by making them data


> I was just pointing out that you can't

No, you were pointing out that some people won't like that solution. Which is completely fine. And yes, that solution is ugly as hell, and it is a dirty hack, and I don't recommend actually doing that if there is a better way (like TOML).

But it does work.

https://www.youtube.com/watch?v=C5kGCwJ25Yc


> ugly as hell, and it is a dirty hack, and I don't recommend actually doing that

You're just paraphrasing the obvious: it's not a solution. And YAML is a better way; it addresses the problem you choose to ignore as such: ugliness.


> And YAML is a better way and addresses the problem you choose to ignore

I haven't ignored it, as shown by us discussing it here.

And no, YAML isn't a better way. It does commenting better, true, and at the same time it does so many things wrong that are absolutely no problem in JSON that it becomes a treatment worse than the disease.


You've argued that ugliness is a solution while the problem that needs to be solved is ugliness. You can't have it both ways: either it's a solution, in which case don't call it ugly as hell, or it's ugly, in which case it's not a solution (or it is, but that means you're ignoring that ugliness is the problem).

This specific brand of cleanliness does have a bunch of issues, but that's a trade-off that depends on the use case.


> You've argued that ugliness is a solution

I think I speak for basically the entire history of programming when I say that ugly solutions exist and are widely used throughout our profession.

> while the problem that needs to be solved is ugliness

That's your opinion. I never argued that this hack solves ugliness. I argued that it solves the problem of not having a way to comment in JSON...which it does. I never claimed that it does so in a non-ugly or even good way.

> This specific brand of cleanliness does have a bunch of issues

Issues so massive that they, imho, make it not worthwhile to use. As in, I'd rather miss the ability to write comments, or write them with ugly hacks, than put up with YAML's nonsense.

The fact that better solutions exist (again, like TOML) only makes this worse.


You just persist in failing to understand the issue

> I think I speak for basically the entire history of programming when I say that ugly solutions exist and are widely used throughout our profession.

I can raise it to the level of the entire history of humanity when I say that people yearn for cleanliness and beauty.

> I never argued that this hack solves ugliness.

Sure, because "you choose to ignore [the problem of ugliness] as such". People who share "my opinion" don't ignore that and use cleaner formats, so YAML is a solution that addresses the problem; your suggestion doesn't.


> You just persist in failing to understand the issue

And you just persist in making a "counter" based on your opinion of what my argument is, as opposed to what my argument is actually about.


Not "actually", but as opposed to your opinion of what your argument is about


So true. I abhor YAML. It's impossible to know the correct indentation without a plugin that shows it, and such a plugin isn't available in many of the places where you end up editing YAML. It's whitespace-sensitive. The data types are not obvious. It's just all-around bad.

I love JSON. It's explicit and easy to read. We should just be using JSON for everything that needs to be human-readable.


I wonder whether doing away with config altogether is a solution.

Just build an application that calls AWS APIs directly when you want to deploy or update an environment.
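Roughly, something like this (a hedged sketch using boto3; the bucket name and environment handling are invented, and AWS credentials are assumed to be configured):

    import boto3  # AWS SDK for Python, assumed to be installed

    def deploy(env: str) -> None:
        """Create or update environment resources by calling AWS directly."""
        s3 = boto3.client("s3")
        # Hypothetical resource; a real deployment would create more than a bucket.
        s3.create_bucket(Bucket=f"myapp-assets-{env}")

    deploy("staging")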


Preferably written in assembler, to avoid the extra complexity of a compiler, right?

Configuration files have been a common feature of software for about as long as operating systems have existed. They serve a clear and useful purpose, even though they create some problems of their own.


For complex environments like those discussed in the article, there’s unavoidably complicated logic.

Code is a good place for logic to live.

Compared to YAML, code is more testable, readable, and expressive.

I should’ve restricted my original comment to the kind of situation in the article where different configs are created for various regions and test environments with optional values. Totally agree configs are useful for defining more static values.


Restricting config to static values removes quite a bit of the value of config, in my opinion.

Yes, logic should live in code, but very often that logic needs to behave differently depending on some piece of (inherently variable, not static) configuration.

Random examples (written from the perspective of personified code):

- How many threads should I use?
- On which port should I serve metrics?
- Which retry strategy should I use?


By "more static", I meant items with only a handful of variations.

If you're using one port for dev and another for prod, I reckon it's best to have it in config.

But if your port varies by image, region, and dev/test/prod status, and has exceptions for customers running your app on-prem, then keeping all that logic in code may be easier.
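For example (a sketch; all the values and conditions are invented):

    def metrics_port(env: str, region: str, on_prem: bool) -> int:
        # Branching like this is easy to unit-test in code
        # and painful to template in YAML.
        if on_prem:
            return 9100  # e.g. the only port customer firewalls allow
        if env == "prod" and region == "us-east-1":
            return 9090
        return 9091      # default for dev/test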



