What people really hate is having to write complicated configuration. No matter ...

DonHopkins · on Sept 28, 2019

Pure YAML will never DRY.

https://en.wikipedia.org/wiki/Don%27t_repeat_yourself

If you write procedural configurations with a real Turing Complete, but not Turing-Tarpit language like Python or JavaScript, then you don't need to repeat yourself ridiculous numbers of times, and manage thousands of lines of hand written eye-sore almost-but-not-quite-entirely-unlike-tea YAML. Plus you can implement validation, warnings and error checking, and support multiple input and output formats.

But so many DevOps dogmatically go out of their way to avoid writing any code, even when the alternative means writing a hell of a lot more brittle unmaintainable YAML where you have to carefully check every line for correctness, and make sure that you're actually repeating the same thing in every location required without any typos, and don't let any of those repetitions slip through the cracks when you make changes.

With real Turing complete code, macros, and templates, you can accomplish the same task as with pure un-DRY YAML data, much more easily and efficiently, with orders of magnitude fewer lines of code, that you can actually understand, maintain, validate, and test, so you can be sure it actually works without meticulously checking every line of output by hand.

It's better to use DRY combinations of different non-Turing-complete data formats like CSV, YAML, JSON, INI, XML, choosing the most appropriate format for each type of data. Spreadsheets are much better than JSON or YAML or XML for many different kinds of data with repeating structure (so you don't repeat key names), JSON and YAML are much better for irregular tree-shaped data (since you're not restricted to rigid structures), and XML is better for tree structured data and documents including text.

You end up needing different formats as input as well as output. So you need a real Turing Complete language to read them all in, combine and validate them, and shit out the various other formats required by your tools and environment.
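
A minimal sketch of that read-combine-validate-emit idea, using only the Python standard library. The service table, the INI defaults, the field names and the `registry.example.com` host are all made up for illustration; the point is only that the repeated parts are written once and every row is checked before anything is emitted.

```python
import configparser
import csv
import io
import json

# Hypothetical inputs: a CSV table for the repetitive data and an INI file
# for shared defaults. Neither comes from a real system.
SERVICES_CSV = """name,port,replicas
web,8080,3
api,9090,2
worker,7070,
"""

DEFAULTS_INI = """[defaults]
replicas = 1
image_registry = registry.example.com
"""


def load_defaults(ini_text):
    """Read the [defaults] section of the INI text into a plain dict of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return dict(parser["defaults"])


def build_config(csv_text, ini_text):
    """Combine both inputs into one dict, collecting validation errors as we go."""
    defaults = load_defaults(ini_text)
    services, errors = {}, []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["name"]
        try:
            port = int(row["port"])
            # An empty replicas cell falls back to the shared default.
            replicas = int(row["replicas"] or defaults["replicas"])
        except ValueError:
            errors.append(f"{name}: port and replicas must be integers")
            continue
        if name in services:
            errors.append(f"{name}: duplicate service name")
        elif not 1 <= port <= 65535:
            errors.append(f"{name}: port {port} out of range")
        else:
            services[name] = {
                "image": f"{defaults['image_registry']}/{name}",
                "port": port,
                "replicas": replicas,
            }
    if errors:
        raise ValueError("; ".join(errors))
    return {"services": services}


config = build_config(SERVICES_CSV, DEFAULTS_INI)
print(json.dumps(config, indent=2, sort_keys=True))
```

The same `config` dict could be handed to whatever serializer each downstream tool expects; JSON is used here only because it ships with Python.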

linuxftw · on Sept 28, 2019

Almost every piece of software running on Linux or Unix requires some kind of basic configuration file. Whether it's INI or YAML. You can't just 'code all the things'.

If you want an example of a dynamic configuration, look at RPM's .spec. That's a monstrosity. What you're asking for is more of that, and that's insane.

You could also use something like Python to do all your typing, build a dictionary, and just dump it to a yaml file if you think writing yaml by hand is too error prone (which I personally disagree with).
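
Roughly, that approach could look like the sketch below. The app name, environments and host names are invented. It emits JSON via the standard library, on the assumption that the consumer uses a YAML 1.2 parser (YAML 1.2 is meant to accept JSON documents); with PyYAML installed, `yaml.safe_dump(config)` would give block-style YAML instead.

```python
import json

# Hypothetical environments; the per-environment differences live in code,
# so the shared structure is written exactly once.
ENVIRONMENTS = ["dev", "staging", "prod"]


def make_config(env):
    """Build the config dictionary for one environment."""
    return {
        "app": "example-app",
        "environment": env,
        "log_level": "debug" if env == "dev" else "info",
        "database": {"host": f"db.{env}.example.internal", "port": 5432},
    }


def render(config):
    """Serialize the dictionary; sorted keys keep diffs between runs stable."""
    return json.dumps(config, indent=2, sort_keys=True) + "\n"


# One rendered document per environment, keyed by environment name.
rendered = {env: render(make_config(env)) for env in ENVIRONMENTS}
print(rendered["prod"])
```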

DonHopkins · on Sept 28, 2019

I think RPM .spec files are sunk way deep into the "Turing Tarpit" area I was referring to.

https://en.wikipedia.org/wiki/Turing_tarpit

Any time you take a language like bash that's ALREADY deeply muddled in the Turing Tarpit, and then try to make up for its weaknesses by wrapping it up in yet another arbitrary syntax that you just pulled out of your ass like .spec, you're even worse off than you were when you started. Why pick a terrible language like bash that's no good for reading and writing and manipulating structured data formats like CSV, JSON, YAML, INI, or XML, and then try to "fix" it, when there are so many much better well supported off-the-shelf alternatives that don't require re-inventing the square wheel, like Python, JavaScript, or Lua?

linuxftw · on Sept 29, 2019

I'm sure when RPM first came about, it was probably less of a monstrosity. But like most things in the software world, people start bolting on new features, and you have a big mess.

For most programs though, I would prefer a YAML config file. It's easy to serialize/deserialize for many languages, and you can adjust your init scripts / systemd units to spit out a new config on startup if you so choose. Or you can use something like ansible and some templates to generate that config once when you deploy your application (we're all using immutable infrastructure now, right?), although trying to template YAML files in jinja2 is a real PITA; I'd probably just write an application specific ansible module to dump out my config and skip the yaml jinja template part.

That's the really nice thing about ansible, you can make it do all sorts of interesting stuff.

platz · on Sept 29, 2019

https://dhall-lang.org/ No tarpit here (no Turing Completeness or recursion), and as DRY as you want.

cbanek · on Sept 28, 2019

> But so many DevOps go out of their way to avoid writing code

Well, the real problem is that DevOps people should be writing code, in my opinion, especially code to automate deployments and handle configuration. But many times it's just a new job label for people with the same ol' ops and sysadmin skillset who don't want to write code.

It doesn't help that when they make unmaintainable piles of configuration that nobody understands, it typically adds to their job security.

DonHopkins · on Sept 28, 2019

I totally agree!

There are good DevOps and bad DevOps. Personally, I'm a Dev who necessarily knows how to do Ops, because nobody else is there to put out the fires and wipe my ass for me. Good DevOps should not have such disdain for writing code, and should not be so territorial and focused on job security, and should work more closely with developers and code.

And good developers should understand operations, and shouldn't be so helpless and ignorant when it comes to deploying and maintaining systems themselves, so they can design their code to be easily configurable and dovetail into operational systems.

For the same reasons, it's also important for programmers developing tools and content pipelines for use by artists to understand how the art tools and artists work, and how to use them (enough to create placeholder programmer art to test functionality), even if they don't have any artistic ability.

And for artists and designers to have some understanding of how computers and programming work, and how to use spreadsheets and outliners and databases to understand and create specifications and configurations, so they don't design things that are impossible or extremely inefficient to implement, and make intractable demands of computers and programmers.

https://en.wikipedia.org/wiki/Programmer_art

cbanek · on Sept 28, 2019

I'm with you on that. I come from a background of Dev and have been called DevOps (by others), although I just call myself a problem solver.

The realization that I really needed to understand what happens in operations for me came around '09, when the Xbox Operations Center called me and told me my code wasn't working, and we had such a wall between us that I couldn't see what was going on, and they couldn't describe it either.

I ended up writing automated publishing pipelines for them: I took the riskiest parts of their dozens-of-pages Word doc and wrote tools to do those steps automatically. Most people didn't even think this was a thing that could be done, let alone should be done. Problem solved!

I think people who are territorial are inherently insecure in their skills and therefore fear getting out of their comfort zone. Generalists are far better than specialists in my opinion. You want someone to go where the problems are, rather than people who invent new problems for others in their own little empire. I think a lot of big companies are so big they can have people silo'd all day, so people don't even think about the people and systems they are affecting.

ken · on Sept 28, 2019

I've used a lot of JSON. It's OK. No comments, not many data types. But the spec is only a couple pages, and I've never been in doubt about how something should be escaped, or parsed. I could probably write a bare-bones parser in an afternoon, if I needed to.

I've tried to work with YAML a few times. The tree structure and extra data types are great. Everything else is a huge pain. There's at least 3 versions of the spec, and the latest one is nearly 100 pages. The parts I need are always in some "extension", so there's even more that I need to support. It has a system for serializing native Objects, so you have to be careful with untrusted data because there are some interesting security issues. It's so complex, I have trouble knowing what to quote, or how. It's not feasible to write your own parser in any reasonable amount of time. Worst of all, every parsing library is slightly different, so (not unlike SOAP) you kind of have to know that it's going to be parsed with (say) PyYAML.

Complicated configuration is indeed a problem in any format, but YAML makes even simple things complex. From the beginning, I really wanted to like YAML. Unfortunately, I think their goals (human-readable text, language-agnostic, rich data types, efficient, extensible, easy to implement, easy to use) are impossible. You simply can't achieve all of them at once.

davnicwil · on Sept 28, 2019

I'm launching a CI service [0] which instead of using YAML configs to run builds on a third party platform, will let you run the builds yourself on your own machines so you can just use a script or whatever you want to do your builds/deploys.

I share your frustration and was motivated by it to build this. Why should I spend ages writing up everything as config files when I have a script that already works, is easy to change and debug, and can handle any custom thing I need?

I think config files to describe devops processes are a good approach for huge companies with huge teams, lots of churn etc. The approach perhaps has simplicity & stability benefits - works for everyone everywhere without understanding any detail, changes are a bit easier to track, etc. But for small teams wanting control, speed and the flexibility of just writing code to do what you want it can often be an inefficient approach. At least in my experience.

You should check Box CI out. Launching very soon!

[0] https://boxci.dev

raverbashing · on Sept 28, 2019

True, though YAML is the most "human readable/writable" of the usual suspects (YAML/JSON/XML)

DonHopkins · on Sept 28, 2019

I'd say that spreadsheets are vastly more readable/writable/editable/maintainable than YAML or JSON or XML (i.e. no punctuation and quoting nightmares), and they're easy to learn and use, so orders of magnitude more people know how to use them proficiently, plus tools to edit spreadsheets are free and widely available (e.g. Google Sheets), and they support real time multi user collaboration, version control, commenting, formatting, formulas, scripting, import/export, etc. They're much more compact for repetitive data, but they can also handle unstructured and tree structured data, too.

To illustrate that, here's something I developed and wrote about a while ago, and have used regularly with great success to collaborate with non-technical people who are comfortable with spreadsheets (but whose heads would explode if I asked them to read or write JSON, YAML or XML):

Representing and Editing JSON with Spreadsheets

I’ve been developing a convenient way of representing and editing JSON in spreadsheets, that I’m very happy with, and would love to share!

https://medium.com/@donhopkins/representing-and-editing-json...

Here is the question I’m trying to answer:

How can you conveniently and compactly represent, view and edit JSON in spreadsheets, using the grid instead of so much punctuation?

My goal is to be able to easily edit JSON data in any spreadsheet, conveniently copy and paste grids of JSON around as TSV files (the format that Google Sheets puts on your clipboard), and efficiently export and import those spreadsheets as JSON.

So I’ve come up with a simple format and convenient conventions for representing and editing JSON in spreadsheets, without any sigils, tabs, quoting, escaping or trailing comma problems, but with comments, rich formatting, formulas, and leveraging the full power of the spreadsheet.

It’s especially powerful with Google Sheets, since it can run JavaScript code to export, import and validate JSON, provide colorized syntax highlighting, error feedback, interactive wizard dialogs, and integrations with other services. Then other apps and services can easily retrieve those live spreadsheets as TSV files, which are super-easy to parse into 2D arrays of strings to convert to JSON.
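
To give a feel for the "2D arrays of strings" step, here is a small Python sketch. It is not the format from the article: it assumes the simplest possible convention (first row is a header, one object per row), and the sample data is invented.

```python
import csv
import io
import json

# Hypothetical TSV, e.g. what a spreadsheet might put on the clipboard.
TSV_TEXT = "name\tport\tenabled\nweb\t8080\ttrue\napi\t9090\tfalse\n"


def tsv_to_grid(text):
    """Parse TSV text into a 2D list of strings (rows of cells)."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


def grid_to_objects(grid):
    """Treat the first row as keys and every other row as one object."""
    header, *rows = grid
    return [dict(zip(header, row)) for row in rows]


grid = tsv_to_grid(TSV_TEXT)
objects = grid_to_objects(grid)
print(json.dumps(objects, indent=2))
```

Every value stays a string here; deciding which cells become numbers, booleans or nested structures is exactly the part a real convention has to define.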

tlarkworthy · on Sept 28, 2019

That is super cool, please don't overcomplicate it with utility features. I have been considering a project to manage a kubernetes cluster via Google spreadsheet. Google docs have great features relating to user authentication and permissions. The project would need to visualize the JSON state representation for the k8s cluster... your project is ideal.

e.g. calling another google service with the JSON using a token minted BY THE USER CURRENTLY USING THE SHEET

DonHopkins · on Sept 29, 2019

Thanks for the encouragement! I agree, I'd like to keep it from becoming complicated. My hope is to keep it simple and refine it into a clean well defined core syntax that's easy to implement in any language, with an optional extension mechanism (for defining new types and layouts), without falling into the trap of markdown's or yaml's almost-the-same-but-slightly-different dialects. (I wrote more about that at the end of the article, if you made it that far.)

The spreadsheet itself brings a lot of power to the table. (Pun not intended!)

There are some cool things you can do using spreadsheet expressions, like make random values that change every time you download the CSV sheet, which is great for testing. But expressions have their limitations: they can't add new rows and columns and structures, for example. However, named ranges are useful for pointing to data elsewhere in other sheets, and you can easily change their number of rows and columns.

For convenience and expressivity, I've defined ways of including other named sheets and named ranges by reference, and 2d arrays of uniformly typed values, and also define compact tables of identical nested JSON object/array structures by using declarative headers (one object per row, which I described in the article, but it's not so simple, and needs more examples and documentation).

tlarkworthy · on Sept 29, 2019

yeah my eyebrows are fairly raised at the thought of embedding a templating language in it. For production use of a spreadsheet, I imagine pulling the source code out of the spreadsheet using https://github.com/google/clasp and synchronising with a repository using Terraform.

At which point Terraform has a weak templating engine already, but it's generally enough for building reusable infra. Additional features can be provided within the spreadsheet using reusable libraries. One pain point with embedding functional data processing in a spreadsheet for JSON data is a decent way of writing tree expressions, for which I would turn to the de facto JSON tooling jq for inspiration.

if you want to take this further, I am up for building some infra for continuous deployment spreadsheets through terraform. tom <dot> larkworthy <at> futurice.com

But I would not embed stuff inline with the JSON. I would have a pure sheet dedicated to stuff going in, and a compute sheet for stuff going out. And the definition for stuff going out should basically be a JQ expression, that can "shell out" to sheets expressions https://github.com/sloanlance/jq/issues/1

xpe · on Sept 29, 2019

TOML is a worthy contender. It is my favorite simple but powerful-enough data language.

Here is a side-by-side comparison of data in TOML versus YAML: https://gist.github.com/oconnor663/9aeb4ed56394cb013a20

And some comments that resonate with me:

  The yaml spec is overly complex and parsing it properly
  is a nightmare. I rather prefer TOML because of its
  simplicity. Unless one really needs the gazillion extra
  features which yaml provides (which one probably doesn't),
  I'd say sticking with TOML seems to be the saner choice.

  I've recently kind of changed my mind on unquoted strings.
  They're nice when you're editing config files by hand, but
  they run into parsing issues in simple cases like when the
  string looks like an int, or of course when the string
  contains quotation marks itself.
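
The unquoted-string problem in that second comment can be sketched with a toy resolver. This is an illustration only: it imitates the spirit of YAML 1.1-style implicit typing and is not the behaviour of any particular parsing library, and `resolve_unquoted` is a made-up name.

```python
import re

# Toy rules for illustration; real parsers have longer and subtly different lists.
_BOOLS = {"yes": True, "no": False, "on": True, "off": False,
          "true": True, "false": False}
_INT = re.compile(r"[-+]?[0-9]+")
_FLOAT = re.compile(r"[-+]?[0-9]+\.[0-9]+")


def resolve_unquoted(text):
    """Guess a type for an unquoted scalar, the way implicit typing does."""
    lowered = text.lower()
    if lowered in _BOOLS:
        return _BOOLS[lowered]
    if _INT.fullmatch(text):
        return int(text)
    if _FLOAT.fullmatch(text):
        return float(text)
    return text  # anything unrecognised stays a string


# A port, a version string, a country code and a host name, all written unquoted.
for raw in ["8080", "1.10", "no", "web-01"]:
    value = resolve_unquoted(raw)
    print(f"{raw!r:10} -> {value!r} ({type(value).__name__})")
```

A version string like `1.10` silently becoming a float and a country code like `no` becoming a boolean are the kinds of surprises that mandatory quoting avoids.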

IshKebab · on Sept 29, 2019

I disagree. Yes it is human readable if you just want to read the words (like you would with Markdown), but with a configuration file you want to understand the structure. YAML makes that quite confusing IMO. It seems like a random array of dashes and indentation.

JSON is much more human readable in that respect because the structure is explicit so there's no ambiguity. I'd say TOML is somewhere in-between. But both are vastly preferable to YAML because they don't have Javascript-style type insanity.

Kwpolska · on Sept 28, 2019

I’ve yet to see an editor that works well with YAML, yet JSON and XML are simple. Add comments and trailing commas to JSON and it would be perfect to write. And XML isn’t that evil, really.