
YAML Norway problem strikes again!

I wish YAML wasn't so common in the Python ecosystem...




Solution: quote values. Like in JSON.


It's not really a solution though because you only notice you need to do it after it triggers a bug.

You need a format that doesn't let you make this mistake in the first place, like [everything except YAML].


If the yaml is generated by code it should be quoted automatically, given that you have a sane yaml library.

Try this:

    import sys, yaml
    yaml.dump({"gameServerVersion": "556474378"}, sys.stdout)
It will correctly print the quoted version:

    gameServerVersion: '556474378'
At least with Python, strings are strings, and there is minimal risk that you store a git hash coming from the output of a git subprocess command as a number. The only way this bug could have happened is if doing manual string templating. If you were doing that, equally bad or worse bugs are waiting for you in any serialization format.
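
To illustrate the difference between templating and a serializer, here is a minimal sketch in plain Python. It does not use PyYAML: `resolve_plain`, `emit_string`, `template` and `serialize` are hypothetical helpers, and the resolution rules are a deliberately simplified stand-in for what a YAML 1.1-style loader does with unquoted scalars (only a few booleans and decimal integers are modelled).

```python
import re

# Simplified model of how a YAML 1.1-style loader types an unquoted scalar.
# Illustration only: floats, null, dates, octal and many other forms are omitted.
_BOOL = re.compile(r"^(?:yes|no|true|false|on|off)$", re.IGNORECASE)
_INT = re.compile(r"^[-+]?[0-9]+$")


def resolve_plain(text):
    """Return the Python value a plain (unquoted) scalar would become."""
    if _BOOL.match(text):
        return text.lower() in ("yes", "true", "on")
    if _INT.match(text):
        return int(text)
    return text


def emit_string(value):
    """Emit a string, single-quoting it when the plain form would change type.

    Assumption: other characters that need quoting (': ', '#', ...) are ignored.
    """
    if value == "" or not isinstance(resolve_plain(value), str):
        return "'" + value.replace("'", "''") + "'"
    return value


def template(key, value):
    """Naive string templating: the value is pasted in with no quoting."""
    return f"{key}: {value}"


def serialize(key, value):
    """What a serializer does instead: decide on quoting from the value."""
    return f"{key}: {emit_string(value)}"
```

With these helpers, `template("gameServerVersion", "556474378")` produces the unquoted line that a loader would read back as an integer, while `serialize("gameServerVersion", "556474378")` produces the quoted form. A hash that happens to contain a letter, such as `abc123f`, comes out unquoted from both, which is why the bug only shows up occasionally.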


> The only way this bug could have happened is if doing manual string templating.

I agree you should not do that, but it is common in the YAML world - if you're using YAML you've already decided you don't care if things are reliable.

Given that they were using string templating, this would have been caught earlier using a different format.

Of course there are other string templating mistakes that other formats would not catch (e.g. forgetting to escape strings), but they are still better than YAML.


The problem obviously is using string templating without proper escaping

Any format would fail if used like that, be it YAML, JSON, XML...
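
A small sketch of both sides of this argument, using only Python's standard `json` module. The function names and the `message` key are made up for illustration.

```python
import json


def render_by_template(message):
    # Naive templating: the value is pasted between quotes without escaping.
    return '{"message": "' + message + '"}'


def render_by_serializer(message):
    # The serializer escapes whatever needs escaping.
    return json.dumps({"message": message})


def parses(text):
    """Return True if text is valid JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


risky = 'say "hi"'
templated_ok = parses(render_by_template(risky))      # unescaped quotes break the document
serialized_ok = parses(render_by_serializer(risky))   # escaped quotes parse fine

# A digits-only value inside the template's quotes still loads as a string.
digits = json.loads(render_by_template("556474378"))["message"]
```

Here the unescaped template fails loudly at parse time, and the digits-only value stays a string because the template had to include the quotes. An unquoted YAML template would load the same value as a number without complaint.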


This problem is specific to YAML because it doesn't require quoting.


> It's not really a solution though because you only notice you need to do it after it triggers a bug.

… the first couple of times.

After causing the same bug multiple times it's the user at fault, not the tool.

YAML does a decent enough job.

I think these are the relevant YAML spec sections from a quick glance. If someone wants to correct me, feel free.

YAML spec on double quotes

> The double-quoted style is specified by surrounding “"” indicators. This is the only style capable of expressing arbitrary strings, by using “\” escape sequences. This comes at the cost of having to escape the “\” and “"” characters.

YAML spec on plain style

> The plain (unquoted) style has no identifying indicators and provides no form of escaping. It is therefore the most readable, most limited and most context sensitive style. In addition to a restricted character set, a plain scalar must not be empty or contain leading or trailing white space characters.


> After causing the same bug multiple times it's the user at fault, not the tool.

Absolutely not. If the tool is so bad that users commonly make a mistake then the tool should prevent that mistake.

This is a basic UX tenet that unfortunately many people do not know.


It’s a trade-off in how the parsing works. No tool is perfect. :shrugs:

The accepted safe default is to use double quotes. If folks don’t know that then that’s on them, not the tool. It’s in the spec.

Good workmen don’t blame their tools.

Edit — I know the UX tenet. I’ve worked in places where they thought using YAML for end users was a good idea. It’s not. It never will be. It’s a backend tool for engineers. So I agree with you in that specific UX case.

But this isn’t an end-user UX case. It’s backend platform configuration.

Use the right tool for the job, and use it the way it’s been designed —> Double quote strings in YAML.


> Good workmen don’t blame their tools.

Good workmen that are given shoddy tools absolutely blame them.


> Good workmen don’t blame their tools.

Good workmen talk shit of particular tools and brands all the time, they just avoid having them in their toolbox. They don’t blame their tools, not all tools.


> the first couple of times

Times every yaml user in the multiverse.


It’s a rite of passage.

Just like convincing your boss to switch the backend over to K8s and then realising a year later it was a mistake.

Everyone does it.


What's a better alternative? This problem is annoying but to me not more annoying than having to read or write JSON, and TOML is terrible with nested configs. If there were an option with the structure of YAML minus the ridiculous string handling I'd switch to it for sure.


Looked at the sibling comments. Has everybody collectively forgotten that XML exists? This stuff was solved decades ago, XSDs can check types.

TOML nesting is indeed a joke. YAML & JSON "look clean", but try to match up nesting in a large document without help from text editor highlights, then see how easy it is in XML.

XML has universal support everywhere, has first-class tooling for everything that's being re-invented in these other languages, but it's not "cool" any more.


Maybe you have worked with a different XML than I have, but I have only terrible memories of working with XML, starting with the parser inconsistencies (which even led to security vulnerabilities at Apple, see https://blog.siguza.net/psychicpaper/).

On a higher level, the fact that so many different kinds of information could exist at each level was nothing but headaches. In YAML, or in JSON, it's pretty straightforward. You have an object, it has children, children have types/values etc.

In XML, you have to keep in mind what the tag of the element is, what its attributes are, and then what the child elements are, and then whatever the heck CDATA is.

I think my fellow posters are looking at the past through nostalgic rose tinted glasses. XML was terrible and I am glad it's not used as widely anymore.


I too really don't get the hate for XML

Features like schema feel very natural in XML, whereas JSON Schema feels otherworldly and is seldom used

Being able to clearly distinguish between multiple semantically different types of strings is a blessing

And coming back to your example, how is the knowledge in JSON of which attribute is an array, which is an object, which is a string and when order of attributes as well as duplicates matter and when not any easier than XML?


My two biggest issues:

1. No consistency in whether a value is an attribute or a child element, they seem completely interchangeable and redundant. I'm sure there's some nuance I'm missing here but I've worked with my fair share of XML and it has yet to be relevant to me at least. This adds a lot of mostly arbitrary decisions to the design phase, makes it difficult to clearly refer to specific values, and means it's impossible to directly parse XML into an object structure in most languages like you can with JSON.

2. It's awful to read and write. Everything is just a mess of tags and any structure is quickly lost. Line splitting is awkward and often not even attempted (seriously I had coworkers push back when I applied basic formatting to a config file because they claimed it was easier to read with lines 3x the width of anyone's screen). Needing to name the object in both the opening and closing tag wastes space and is absolutely ridiculous when writing by hand (sure editors can kinda handle this now but not perfectly and sometimes I'm using vi over SSH because that's all there is).

I really don't understand why so many people are still so attached to it myself.


XML is a great generic framework for markup languages. When you aren't doing markup, XML is terrible. However, JSON fits the role.

The opposite problem exists in Minecraft chat message formatting - they used JSON for markup, when it should have been XML.


As someone who previously worked in an XML-heavy environment, I would rather have an NFL linebacker dropkick me in the head than deal with XML again. Tim Bray himself has had doubts [1] at one point.

XML is too big of a hammer for the space it fills.

[1] https://www.tbray.org/ongoing/When/200x/2003/03/16/XML-Prog


It's not "cool" anymore because it's painful to work with despite all the tooling and support


> It's not "cool" anymore because it's painful to work with

That's a kinda weird way to phrase it; being painful to work with has nothing to do with "coolness"... that's a legitimate complaint.

Despite all the tooling and support, XML is painful to work with.


I think they agree with you and meant that sarcastically.


Pkl (from Apple land) and Dhall (from Haskell land) both solve some of these pain points as well as some others, especially being more seamless about integrating schema with config.

Jsonnet, I haven't used personally but I know people who have raved about it.

Ones I know less about include KCL, CUE, and Nickel.


I don't believe that executable configuration languages are a good fit as a primary configuration source; I would prefer to have them spit out static config before use. From your list KCL fits that bill (and is a really nice config language).


I liked what I saw of Pkl and wanted to use it when it was released, but it seemed the only parser was JVM-based and it was intended more to be transpiled into other config languages. If that's changing, it's definitely worth revisiting. Dhall I had to look up. It seems nice as long as the formatting used in the website examples is not enforced, because to me that looks like an absolute nightmare, but my problems are with the whitespace and not the structure itself.


I like HOCON, although it's a bit obscure. It's a JSON syntax superset with the same data model, designed for human written config.

It doesn't have schema support; however, you arguably don't need it, because the software that reads the config specifies the type of each key when the value is read, and casting takes place then. If the software expects a string, it reads the value as a string; if it expects a number, the value is parsed as a number, and so on.
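
A rough sketch of that "the reader picks the type" idea in Python. This is not the HOCON library's API: `TypedConfig` and its method names are made up for illustration, and the raw values are assumed to arrive as plain text.

```python
class TypedConfig:
    """Hypothetical wrapper: every raw value is text until a caller asks for a type."""

    _TRUE = {"true", "yes", "on"}
    _FALSE = {"false", "no", "off"}

    def __init__(self, raw):
        self._raw = dict(raw)

    def get_string(self, key):
        # The caller wants text, so the text is returned untouched.
        return self._raw[key]

    def get_int(self, key):
        # Casting happens only here, at the point of use.
        return int(self._raw[key])

    def get_bool(self, key):
        text = self._raw[key].strip().lower()
        if text in self._TRUE:
            return True
        if text in self._FALSE:
            return False
        raise ValueError(f"{key}: {self._raw[key]!r} is not a boolean")


config = TypedConfig({"gameServerVersion": "556474378", "country": "no", "port": "8080"})
version = config.get_string("gameServerVersion")  # stays a string
country = config.get_string("country")            # stays "no"
port = config.get_int("port")                     # parsed because the caller asked
```

Because the type comes from the call site rather than from how the value happens to look, a digits-only version string or a country code of `no` is never reinterpreted behind the caller's back.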

HOCON has hierarchical merging, include files, a more convenient syntax, ability to read environment variables, substitutions, comments and a few other convenience features. In Conveyor, a tool for packaging desktop apps I wrote that uses it, it's also extended so you have "hash-bang includes". Those are includes that specify a program to run instead of a file, the output is then included and parsed at that point. This lets you escape from declarative config to a fully dynamic computation if you need to. You can disable this feature with a command line flag if you don't trust the config you're parsing (and also env var substitution).

You can also render the whole config to regular JSON if you need to.

I find that set of features to nicely balance config complexity with read/writeability. The main issue is that the main library is not well maintained, and the best implementation is for the JVM. You could give it a C API these days with Native Image but nobody has.

Main downside vs yaml is that IDEs and editors can use YAML schemas to give auto-completion whereas they don't do that for HOCON.


Does the HOCON parser complain and refuse to proceed if it encounters a comma right before a closing curly brace like the JSON parsers with which I've interacted?


No no. The syntax is designed for usability, trailing commas are fine.

There's a little tutorial here, with a slider that shows how you can start with json and transform it. The right hand side is valid HOCON at every step:

https://conveyor.hydraulic.dev/15.0/configs/hocon/


JSONC. https://onury.io/jsonc/

It can be hard to find an implementation of this if you're working in a not-so-popular language (looking at you Swift) but JSONC has had the best developer experience of anything I've tried so far.

It's basically JSON with single and multi line comments and trailing commas.


JSON5 has a heck of a lot broader support AFAICT. It's basically JavaScript's notation: https://json5.org/


JSON5 is fine. Of the boring config options, it's the best.


Better alternatives to YAML are JSON (particularly JSON with comments), properties files and XML files.

Basically any mainstream configuration syntax.

The really big problem with XML wasn't actually the verbosity of XML itself but the fact that it was popular in a time before Rails popularized "convention over configuration".



I'm not a fan of that import function.


Protobuf/thrift?


YAML is alright when you create it by hand. Problems start when YAML is templated, and frankly it's just dumb, though common. Templating the source without proper escaping could bite even with JSON.


At my last big enterprise job, we did a lot of YAML templating and of course ran into issues like this all the time. Eventually we solved it by requiring defined schemas for all configs and validating those schemas in our pipeline. It was more overhead, but that validation also caught lots of issues aside from the YAML gotchas, and it was a decent setup to work with in the end.
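
As a rough idea of what such a pipeline check can look like, here is a minimal sketch in plain Python. It is not the setup described above: the `validate` helper, the schema shape (key mapped to expected Python type) and the sample keys are all assumptions, and a real pipeline would more likely use a dedicated schema tool.

```python
def validate(config, schema):
    """Return a list of human-readable errors; an empty list means the config passes."""
    errors = []
    for key, expected in schema.items():
        if key not in config:
            errors.append(f"{key}: missing")
        elif type(config[key]) is not expected:
            # Exact type match, so a bool is not accepted where an int is expected.
            actual = type(config[key]).__name__
            errors.append(f"{key}: expected {expected.__name__}, got {actual}")
    for key in config:
        if key not in schema:
            errors.append(f"{key}: unexpected key")
    return errors


schema = {"gameServerVersion": str, "country": str}

# What a loader might hand back after templating left the values unquoted.
loaded = {"gameServerVersion": 556474378, "country": False}
problems = validate(loaded, schema)

# The same config with the values quoted at the source.
clean = validate({"gameServerVersion": "556474378", "country": "NO"}, schema)
```

Run before deployment, a check like this turns a silent type change into a failed pipeline step, which is the point at which someone can still add the missing quotes.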


or anywhere



