I finally ran into the "on" bug just last month with GitHub workflows: the default .yml file GitHub writes has an unquoted `on` as a dictionary key, and PyYAML "correctly" interprets it as `True`.
I can imagine this would have been baffling if I hadn't read about it before. There wasn't actually an elegant solution, but replacing `\n on:` with `\n "on":` was an acceptable hack that took two minutes.
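A minimal Python sketch of that kind of two-minute hack. It assumes the key sits at column 0, as it does in GitHub's generated workflow files, and `quote_top_level_on` is a hypothetical name. It only touches a top-level `on:` key, so nested keys or values spelled `on`/`off` would still need quoting by hand.

```python
import re

# Match `on:` only at the start of a line (a top-level key) and only when
# the colon is followed by whitespace or end of line, as a block mapping key needs.
TOP_LEVEL_ON = re.compile(r"^on:(?=\s|$)", re.MULTILINE)


def quote_top_level_on(text: str) -> str:
    """Quote a top-level `on` key so YAML 1.1 loaders keep it as a string."""
    return TOP_LEVEL_ON.sub('"on":', text)


workflow = "name: CI\non:\n  push:\n    branches: [main]\n"
print(quote_top_level_on(workflow))
```

Quoting works because boolean resolution only applies to plain (unquoted) scalars; a quoted `"on"` is always a string.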
> Section 10.3.2 of [YAML] specifies that only the scalars matching the regular expression `true|True|TRUE|false|False|FALSE` are interpreted as booleans. Older YAML versions were more tolerant (e.g., interpreting `NO` and `N` as `False` and interpreting `YES` and `Y` as `True`). When the older syntax is used, a YAML implementation could then interpret `{insecure: n}` as `{insecure: "n"}` instead of `{insecure: false}`. Using the syntax defined in Section 10.3.2 of [YAML] prevents these issues.
> [YAML] Ben-Kiki, O., Evans, C., dot Net, I., Müller, T., Antoniou, P., Aro, E., and T. Smith, "YAML Ain't Markup Language Version 1.2", 1 October 2021, <https://yaml.org/spec/1.2.2/>.
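For comparison, a small Python sketch of the two boolean rules: the YAML 1.2 core-schema pattern quoted above, and the broader YAML 1.1 `bool` type (which also accepts `y`/`n`, `yes`/`no` and `on`/`off`) that GitHub's `on` key runs into. The function names are hypothetical, and real loaders resolve many more types (ints, floats, nulls, timestamps) in the same step.

```python
from __future__ import annotations

import re

# YAML 1.2 core schema: only these six spellings are booleans.
YAML12_TRUE = re.compile(r"true|True|TRUE")
YAML12_FALSE = re.compile(r"false|False|FALSE")
# YAML 1.1 `bool` type: also y/n, yes/no and on/off in several casings.
YAML11_TRUE = re.compile(r"y|Y|yes|Yes|YES|true|True|TRUE|on|On|ON")
YAML11_FALSE = re.compile(r"n|N|no|No|NO|false|False|FALSE|off|Off|OFF")


def _resolve(scalar: str, true_re: re.Pattern, false_re: re.Pattern) -> bool | str:
    # Applies to plain (unquoted) scalars only; the whole scalar must match,
    # anything else stays a string.
    if true_re.fullmatch(scalar):
        return True
    if false_re.fullmatch(scalar):
        return False
    return scalar


def resolve_yaml12(scalar: str) -> bool | str:
    return _resolve(scalar, YAML12_TRUE, YAML12_FALSE)


def resolve_yaml11(scalar: str) -> bool | str:
    return _resolve(scalar, YAML11_TRUE, YAML11_FALSE)


for s in ["on", "n", "TRUE", "tRUE"]:
    print(f"{s!r}: 1.1 -> {resolve_yaml11(s)!r}, 1.2 -> {resolve_yaml12(s)!r}")
```

Under these rules `{insecure: n}` yields `False` with the 1.1 set and the string `"n"` with the 1.2 set, which is exactly the ambiguity the quoted paragraph describes.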
- The failsafe schema gives you, for free, no implicit typing and no direct representation of objects (the former explicitly, the latter by restricting which tags are allowed). Schemata in general make it easier to do the "right thing" for your domain.
I went through this recently trying to heavily use yaml anchoring to determine various k/v's from a .yml config file, and yeah, it was awful.
But I think the biggest problem for me was that every single YAML library I tried in various languages (Go, Rust, Python, Ruby) was just not good. IIRC only one of them (Ruby's Syck, I think) even supported anchors, which are part of the YAML standard, and NONE of them could read anchors to tell where in the YAML file the scan was happening.
So I wound up having to literally crawl the tabs/indents, record where each `&` anchor and `*` alias sat, store that, and rebuild everything.
I THOUGHT I could just say "goto &anchor, copy &data to anchor, anchor, anchor, *anchor..." but nothing knew what &anchor was! What's the point of having a spec with &anchor if your library does literally nothing with it?
I'm sure someone's dealt with this, probably in a more elegant way than me.
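A toy Python sketch of the bookkeeping involved: while reading, remember the value each `&name` is attached to, then substitute it wherever `*name` appears. It is deliberately limited to flat `key: value` mappings of scalars, and `load_flat` is a hypothetical helper, not a library function. Real loaders work on nodes rather than strings; the plain `yaml.safe_load()` result in PyYAML, for instance, keeps no anchor names, although its lower-level `yaml.parse()` event stream reports them along with their positions (`start_mark`).

```python
from __future__ import annotations

import re

# One flat mapping entry: `key: value`, `key: &name value`, or `key: *name`.
ENTRY = re.compile(
    r"^(?P<key>[^:#\s][^:]*):\s*"
    r"(?:&(?P<anchor>\S+)(?:\s+|$))?"
    r"(?:\*(?P<alias>\S+)|(?P<value>.*))$"
)


def load_flat(text: str) -> dict[str, str]:
    """Toy loader for flat scalar mappings with anchors and aliases."""
    anchors: dict[str, str] = {}  # anchor name -> value it was attached to
    result: dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        m = ENTRY.match(line)
        if m is None:
            raise ValueError(f"line {lineno}: unsupported syntax: {raw!r}")
        if m["alias"] is not None:
            # `*name` must refer to an anchor defined earlier in the document.
            if m["alias"] not in anchors:
                raise KeyError(f"line {lineno}: undefined alias *{m['alias']}")
            value = anchors[m["alias"]]
        else:
            value = m["value"].strip()
        if m["anchor"] is not None:
            anchors[m["anchor"]] = value  # remember `&name` for later aliases
        result[m["key"].strip()] = value
    return result


doc = "defaults: &base https://example.com\nstaging: *base\nname: demo\n"
print(load_flat(doc))
```

Anchors must be defined before the aliases that use them, which is why a single top-to-bottom pass is enough here.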
Even though I adopted YAML as an easy format for managing append-only tabular data (all the tables on my site are written in YAML), I have to agree with this and would second the vote for that media type.
I feel like for multi-line strings, I almost always use `|`, and only `|`. It takes the indented block as-is, which I feel like is going to be what you'd want … always?
I think maybe once, ever, I used `>` to fold some whitespace…?
I know the chomping & indent indicators exist, but I feel like I've only ever seen those in programmatically generated YAML. I have trouble envisioning a real-world use case that would make me pull them out. They're a facet of the language that feels as if it exists solely for a processor that might need to emit any possible string, which plain `|` would not permit, and for which the inline string forms might be uglier.
That said … there is a post like yours in every YAML thread. But I think those of us that understand YAML … I'm having a hard time grokking what you're doing that causes consternation.
> It takes the indented block as-is, which I feel like is going to be what you'd want … always?
What if you are writing primarily paragraphs?
What if you want extra blank lines at the end of a block? Or no trailing newline at all?
In my (unscientific) reading, I see the `>` style most often, then `|` almost as often, then `>-` occasionally, and `>+` just once.
That's a big what-if, I guess. If you're okay with line wrapping … do line-wrapping & |, if you're not … don't do line wrapping & |?
I'm not usually putting prose into YAML, outside of comments. I think most of my blocks are scripts, small data files, long queries, the like. That's sort of why I was looking for a (perhaps non-hypothetical) use case.
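To make the block scalar indicators discussed above concrete, here is a simplified Python sketch of how a block scalar's value comes out under `|` (literal) and `>` (folded), combined with strip (`-`), clip (the default) and keep (`+`) chomping. It assumes indentation has already been removed, and for brevity it ignores YAML's rule that more-indented lines in a folded scalar are not folded; `block_scalar` is a hypothetical helper, not a library function.

```python
from __future__ import annotations


def block_scalar(lines: list[str], style: str = "|", chomp: str = "clip") -> str:
    """Approximate the value of a YAML block scalar.

    `lines` are the content lines with indentation removed; trailing empty
    lines are passed as "". Simplification: more-indented lines are folded
    like any other line in `>` style.
    """
    if style not in ("|", ">") or chomp not in ("strip", "clip", "keep"):
        raise ValueError("style must be '|' or '>'; chomp 'strip', 'clip' or 'keep'")
    # Split off trailing empty lines: only the chomping indicator decides their fate.
    end = len(lines)
    while end > 0 and lines[end - 1] == "":
        end -= 1
    body, trailing = lines[:end], len(lines) - end

    if style == "|":
        text = "\n".join(body)  # literal: every line break is kept
    else:
        # Folded: a single break between text lines becomes a space;
        # each empty line in between contributes one "\n".
        parts: list[str] = []
        empties = 0
        for line in body:
            if line == "":
                empties += 1
                continue
            if parts:
                parts.append("\n" * empties if empties else " ")
            else:
                parts.append("\n" * empties)  # leading empty lines are kept
            parts.append(line)
            empties = 0
        text = "".join(parts)

    if not body:  # no content at all: only `+` keeps the empty lines
        return "\n" * trailing if chomp == "keep" else ""
    if chomp == "strip":  # `-`: no final line break
        return text
    if chomp == "clip":  # default: exactly one final line break
        return text + "\n"
    return text + "\n" * (1 + trailing)  # `+`: final break plus trailing empty lines


content = ["first", "second", "", "third", "", ""]
for style in ("|", ">"):
    for chomp in ("strip", "clip", "keep"):
        print(style, chomp, repr(block_scalar(content, style, chomp)))
```

Laid out this way, `|` with clip covers the "scripts, small data files, long queries" case, while `-` and `+` only matter when the exact number of trailing newlines is significant.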
The authors come from the Italian government, Axway, and Mozilla. I wonder if this is being standardized like this because of a government requirement for standardization before use in official work, as is often the case. Axway's flagship seems to be an API marketplace platform.
Does anybody else have insight into what interest these parties might have in a YAML RFC?
Roberto Polli and Erik Wilde are frequent IETF contributors. I'm not following their work much these days, but I don't think they have a specific interest here. They're active in httpapi, the working group that produced this RFC. I've worked a bit with both, and they're great, thoughtful folks.
Eemeli Aro maintains the 'yaml' npm package. Seems like a good person to have aboard to make sure things stay practical.
I've always used .yaml and have never understood why someone would choose .yml. Three letter extensions haven't been a requirement since like DOS 5 and I can't imagine that many developers are both using YAML and still using FAT16 but can't afford VFAT long filename entries.
> I've always used .yaml and have never understood why someone would choose .yml.
Because it's what the documentation says. For instance, the "Getting Started" example at https://docs.flatpak.org/en/latest/first-build.html says "call it org.flatpak.Hello.yml" - and that's Flatpak, which has always been a Unix-specific thing.
Old DOS "8.3" restri~1 are still followed by some people. Force of habit I suppose? Also the docume~1 or examples of some projects use .yml and I can't be bothered to experi~1 if both work, because it doesn't really matter.
The current acronym is old enough to drink. The original acronym, AFAICT looking through The Wayback Machine, lasted less than a year. (The original language was pretty wildly different, and much closer to a markup language.)
For those that don't want to wait for the time capsule, here is the earliest example of it (I've shortened it from the original for brevity; best I can gather this should also have been valid):
Oddly it kept the "Another" well into the evolution of not being a markup language anymore, but on the grand scale of things, didn't really keep it that long.
I'd just be glad for the community to standardize on one or the other. I don't care either way but right now we sometimes have both .yml and .yaml in the same directory! For GitHub Actions (the only place where I use yaml) there doesn't seem to be a clear consensus yet.
The two replies I received (one saying .yml is the GitHub standard, the other saying .yaml is the RFC standard) neatly summarize the situation we're in. I don't think this extension mess is ever getting fixed.
It could just be following the analogy of application/json, which in turn raises the same question about JSON... Here, FWIW, is Crockford's theory :)
> But overall, it was a really painful process. And in the end, I didn’t even get the mime type I wanted. I wanted Text/JSON. They gave me Application/JSON, which was weird because JSON is not an application, it’s a text format. It’s a way of representing data in text. And I think maybe that… Well, I don’t know why they’ve forced that on me. I’d like to think it’s because there were some XML fans who were resentful about what I’d been doing and decided that I didn’t deserve to have text that was reserved for XML. So they got text XML, but I got Application/JSON, which didn’t affect me personally at all, but for the people who use it, it’s not a big deal. It’s a little ugliness in the header.
RFC 6838 is clear about the `text` top-level type [1]:
> The "text" top-level type is intended for sending material that is principally textual in form. [...] Beyond plain text, there are many formats for representing what might be known as "rich text". An interesting characteristic of many such representations is that they are to some extent readable even without the software that interprets them. It is useful to distinguish them, at the highest level, from such unreadable data as images, audio, or text represented in an unreadable form.
By contrast, if `text/*` were used for any format that encodes something as text, as Crockford wanted, then in principle you could put every file format into `text/*` via base64 encoding. Limiting `text/*` to data that is textual in nature avoids this issue by setting a reasonable expectation [2]. It is even possible to register `text/vnd.foobar+json` if you want, because a structured syntax suffix like `+json` doesn't affect the top-level type. But JSON in general doesn't guarantee that its content is textual, even after parsing.
[2] Therefore `text/rtf` is possible when it doesn't embed any additional objects; otherwise `application/rtf` should be used. It is debatable, however, whether source code is text, especially when it can be executed in place. The transition from `application/javascript` to `text/javascript` [3] is one example.
We have an integration at work that sends copies of the same message to two different servers running the same software, and one requires text/xml and the other requires application/xml.
There's one issue with `text/*`: it defaults to ASCII.
> For example, for any MIME type whose main type is text, you can add the optional charset parameter to specify the character set used for the characters in the data. If no charset is specified, the default is ASCII (US-ASCII) unless overridden by the user agent's settings. To specify a UTF-8 text file, the MIME type text/plain;charset=UTF-8 is used.
If you don't have a content type, you're screwed no matter what you do. Web browsers use a goofy detection system that sometimes works and sometimes fails, because it only reads the beginning of the file.
That's fine, no? Not like application/anything can default to the correct character set. This is why everyone typically uses application/json; charset=UTF-8.
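A small Python sketch of how a receiver might work out the effective charset from a `Content-Type` value, using the standard library's `email.message.Message`. The US-ASCII fallback for `text/*` follows the passage quoted above; returning `None` ("nothing declared") for other types is an assumption of this sketch, and `effective_charset` is a hypothetical name. For what it's worth, RFC 8259 defines no `charset` parameter for `application/json` at all and requires UTF-8 for JSON exchanged outside a closed ecosystem.

```python
from __future__ import annotations

from email.message import Message


def effective_charset(content_type: str) -> str | None:
    """Charset a receiver would assume for a Content-Type header value.

    text/* without a charset parameter falls back to US-ASCII; for other
    types, None means no charset was declared.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    default = "us-ascii" if msg.get_content_maintype() == "text" else None
    # get_content_charset() returns the parameter lowercased, or the fallback.
    return msg.get_content_charset(default)


for ct in ["text/plain;charset=UTF-8", "text/yaml", "application/yaml"]:
    print(ct, "->", effective_charset(ct))
```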
RFC 6838: Expected uses for the "application" type name include but are not limited to file transfer, spreadsheets, presentations, scheduling data, and languages for "active" (computational) material.
"File transfer" suggests to me that the distinction is: if you would expect a browser to display it directly, use audio/text/image/...; otherwise it's application.
This sounds OK for yaml. You _can_ still serve it as text/plain if you want to show it as text, but as yaml it's probably for download purposes.
Btw, the RFC discussions are open. You can join an IETF mailing list and start discussing such topics :-)
> While I strongly support defining the application/yaml media type, I do not think that the text/yaml should be defined at this time. [...] Defining more than one media type is superfluous, and creates unnecessary uncertainty. The bare-text presentation of YAML content to a human is already now achievable by presenting YAML as text/plain
One reason might be that `text/*` files follow the POSIX standard for text files [1], where no lines can exceed `{LINE_MAX}` bytes in length (and `LINE_MAX` depends on your OS).
I don't believe the YAML spec has any rules on how long lines can be, so some YAML files won't technically be text files (and some line-based UNIX tools might not work correctly on them).
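A rough Python check for the POSIX notion of a text file: besides the `{LINE_MAX}` limit (which counts the terminating newline), POSIX also requires that there be no NUL bytes and that every line end in a newline. `line_max()` queries `os.sysconf("SC_LINE_MAX")` where available and otherwise falls back to 2048, the POSIX minimum (`_POSIX2_LINE_MAX`); both helper names are hypothetical, and the check ignores the locale-dependent requirement that bytes form valid characters.

```python
import os

POSIX2_LINE_MAX = 2048  # smallest LINE_MAX a conforming system may report


def line_max() -> int:
    """LINE_MAX for this system, or the POSIX minimum if it can't be queried."""
    try:
        value = os.sysconf("SC_LINE_MAX")
    except (AttributeError, ValueError, OSError):  # e.g. Windows, unknown name
        return POSIX2_LINE_MAX
    return value if value > 0 else POSIX2_LINE_MAX


def is_posix_text(data: bytes, limit: int) -> bool:
    """Rough check of the POSIX "text file" rules (ignores locale/encoding)."""
    if not data:
        return True  # zero lines is still a text file
    if b"\0" in data or not data.endswith(b"\n"):
        return False  # NUL byte, or an incomplete last line
    # Each line's length includes its terminating newline, hence the + 1.
    return all(len(line) + 1 <= limit for line in data.split(b"\n")[:-1])


limit = line_max()
long_value = b"key: " + b"x" * limit + b"\n"  # fine for YAML, too long for POSIX
print(limit, is_posix_text(b"key: value\n", limit), is_posix_text(long_value, limit))
```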
Agree this should be addressed. They explicitly deprecate text/yaml, but don't justify the choice.
I could speculate that the YAML spec assumes full Unicode repertoire, and the only choices are UTF-8/16/32 BE/LE. But text/* allows (encourages) middle-boxes to guess and translate encodings to local standards (e.g. sniff Latin-1 and "helpfully" rewrite to 8859-9) to allow legacy clients to read text. A yaml file is more data than text, so text/* isn't appropriate. Similar to application/json.
Not sure I agree, but my smallest complaint about yaml.