RFC 9512: YAML Media Type

giantrobot · on Feb 21, 2024

Having fought with YAML just recently I suggest the media type application/garbage. The format is filled with strange footguns.

TomSwirly · on Feb 21, 2024

I finally ran into the "on" bug just last month with github workflows: the default .yml file github writes has an unquoted `on` as a key to a dictionary, and pyyaml "correctly" interprets that as `True`.

I can imagine this would have been baffling if I hadn't read about it before. There wasn't actually an elegant solution, but replacing `\n on:` with `\n "on":` was an acceptable hack that took two minutes.
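
For anyone who wants to script that hack, here is a minimal sketch using only the standard library. The helper name `quote_on_key` is made up for illustration and is not part of PyYAML or GitHub's tooling. It assumes the bare `on:` key sits at the start of a line (optionally indented), and it would also rewrite a matching line inside a block scalar, so treat it as a quick fix rather than a real YAML rewrite.

```python
import re

# Matches an unquoted `on:` key at the start of a line, keeping its indentation.
# Keys that merely end in "on" (e.g. `location:`) are not matched, because only
# spaces or tabs may precede `on:`.
_BARE_ON_KEY = re.compile(r"^([ \t]*)on:", re.MULTILINE)


def quote_on_key(yaml_text: str) -> str:
    """Hypothetical helper: turn a bare `on:` key into `"on":`.

    Quoting the key makes it a plain string, so a YAML 1.1-style loader
    no longer resolves it to a boolean.
    """
    return _BARE_ON_KEY.sub(r'\1"on":', yaml_text)


workflow = "name: ci\non:\n  push:\n    branches: [main]\n"
fixed = quote_on_key(workflow)
```

An already-quoted `"on":` key is left alone, so running the helper twice is harmless.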

Karellen · on Feb 21, 2024

FTA:

> Section 10.3.2 of [YAML] specifies that only the scalars matching the regular expression `true|True|TRUE|false|False|FALSE` are interpreted as booleans. Older YAML versions were more tolerant (e.g., interpreting `NO` and `N` as `False` and interpreting `YES` and `Y` as `True`). When the older syntax is used, a YAML implementation could then interpret `{insecure: n}` as `{insecure: "n"}` instead of `{insecure: false}`. Using the syntax defined in Section 10.3.2 of [YAML] prevents these issues.

> [YAML] Ben-Kiki, O., Evans, C., dot Net, I., Müller, T., Antoniou, P., Aro, E., and T. Smith, "YAML Ain't Markup Language Version 1.2", 1 October 2021, <https://yaml.org/spec/1.2.2/>.
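
To make the difference concrete, here is a rough sketch that compares the two boolean patterns. The 1.2 pattern is the one quoted above. The 1.1 pattern is the boolean type from the older YAML 1.1 type repository as I remember it, so treat that list as an assumption. The function name `is_bool` is invented for this example, and real parsers may resolve a different subset.

```python
import re

# YAML 1.2 core schema booleans, as quoted from the RFC above.
YAML_1_2_BOOL = re.compile(r"true|True|TRUE|false|False|FALSE")

# YAML 1.1-era booleans (assumed list; individual libraries may differ).
YAML_1_1_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)


def is_bool(scalar: str, version: str = "1.2") -> bool:
    """Return True if an unquoted scalar would be read as a boolean.

    Hypothetical helper: only models the boolean patterns, nothing else.
    """
    pattern = YAML_1_2_BOOL if version == "1.2" else YAML_1_1_BOOL
    return pattern.fullmatch(scalar) is not None


for word in ("on", "NO", "n", "true"):
    print(word, is_bool(word, "1.1"), is_bool(word, "1.2"))
```

Under these patterns `on`, `NO` and `n` are booleans only in the 1.1 reading, which is exactly the ambiguity the quoted paragraph warns about.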

meowface · on Feb 21, 2024

StrictYAML provides a subset of YAML without the annoying footguns: https://hitchdev.com/strictyaml/features-removed/

Quekid5 · on Feb 21, 2024

Oh, how I wish we could get an RFC for that instead.

clcaev · on Feb 21, 2024

indeed, would be a good idea

aidenn0 · on Feb 21, 2024

YAML 1.2 addresses several of those:

- Duplicate keys are not allowed

- The fallback schema gives you no Implicit Typing and no Direct Representations of objects for free (the former explicitly, the latter by restricting which tags are allowed). Schemata in general allow for doing the "right thing" for your domain more easily.

swozey · on Feb 21, 2024

I went through this recently trying to heavily use yaml anchoring to determine various k/v's from a .yml config file, and yeah, it was awful.

But I think the biggest problem for me was every single yaml library I tried in various languages (go, rust, python, ruby) was just not good. IIRC only one of them (Ruby's syck I think) even supported anchors, which is a yaml standard, and NONE of them could read anchors to know where in the yaml file the scan was happening.

So I wound up having to literally crawl tabs/indents, specify what the anchor symbols were (*/&), store that, and rebuild everything.

I THOUGHT I could just say "goto &anchor, copy &data to *anchor, *anchor, *anchor, *anchor.." but nothing knew what &anchor was! What's the point of having a spec with &anchor if your library does literally nothing with it?

I'm sure someone's dealt with this, probably in a more elegant way than me.

rcarmo · on Feb 21, 2024

Even considering I adopted YAML as an easy format to manage append-only tabular data (all the tables in my site are written in YAML), I have to agree with this and would second the vote for that media type.

milliams · on Feb 21, 2024

Are you referring to things like the boolean "NO" string (i.e. the "Norway Problem") or another issue?

grumbel · on Feb 21, 2024

The handling of multi-line strings is needlessly complex and can easily end up in white-space getting eaten unintentionally.

deathanatos · on Feb 21, 2024

… really?

I feel like for multi-line strings, I almost always use |, and only |. It takes the indented block as-is, which I feel like is going to be what you'd want … always?

I think maybe once, ever, I used > to fold some whitespace…?

I know the chomping & indent indicators exist, but I feel like I've only ever seen those in programmatically generated YAML. I have trouble envisioning a real-world use-case that is going to make me pull them out. They're a facet of the language that feels as if it exists solely for a processor that might need to emit any possible string, which | would not permit, and the inline strings might be uglier.

That said … there is a post like yours in every YAML thread. But I think those of us that understand YAML … I'm having a hard time grokking what you're doing that causes consternation.

TomSwirly · on Feb 21, 2024

> It takes the indented block as-is, which I feel like is going to be what you'd want … always?

What if you are writing primarily paragraphs?

What if you want extra blank lines at the end of a block? Or no carriage return at all?

In my reading, unscientifically I feel I see the `>` text formatter most often, then almost as often `|`, then `>-` occasionally, then `>+` just one time.
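
For reference, the three chomping behaviours differ only in what happens to the trailing line breaks. Here is a small sketch of that rule as I understand it from the YAML 1.2 spec. The function `chomp` is a made-up name, it assumes the block's text already ends with at least one newline, and it does not model the folding that `>` performs.

```python
def chomp(block: str, indicator: str = "") -> str:
    """Apply a YAML-style chomping indicator to a block's raw text.

    ""  (clip):  keep a single final newline, drop trailing empty lines
    "-" (strip): drop the final newline and any trailing empty lines
    "+" (keep):  keep the final newline and all trailing empty lines
    """
    body = block.rstrip("\n")
    if indicator == "-":
        return body
    if indicator == "+":
        return block
    if indicator == "":
        # An empty block stays empty; otherwise exactly one newline remains.
        return body + "\n" if body else ""
    raise ValueError(f"unknown chomping indicator: {indicator!r}")


raw = "first line\nsecond line\n\n\n"
clipped = chomp(raw)        # what a bare `|` or `>` would leave at the end
stripped = chomp(raw, "-")  # `|-` or `>-`
kept = chomp(raw, "+")      # `|+` or `>+`
```

So "extra blank lines at the end" needs `+`, and "no carriage return at all" needs `-`.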

deathanatos · on Feb 21, 2024

That's a big what-if, I guess. If you're okay with line wrapping … do line-wrapping & |, if you're not … don't do line wrapping & |?

I'm not usually putting prose into YAML, outside of comments. I think most of my blocks are scripts, small data files, long queries, the like. That's sort of why I was looking for a (perhaps non-hypothetical) use case.

neuromanser · on Feb 21, 2024

> for multi-line strings, I almost always use |, and only |. It takes the indented block as-is

Except it doesn't. Try using

    steps:
      - run: |
          > f echo not redirected

in Github Actions.

anticensor · on Feb 25, 2024

no no, miscellaneous/garbage-input

andoma · on Feb 21, 2024

01HNNWZ0MV43FF · on Feb 21, 2024

---

true

djha-skin · on Feb 21, 2024

The authors come from the Italian government, Axway, and Mozilla. I wonder if this is being standardized like this because of a government requirement for standardization before use in official work, as is often the case. Axway's flagship seems to be an API marketplace platform.

Does anybody else have insight into what interest these parties might have in a YAML RFC?

ucarion · on Feb 21, 2024

Roberto Polli and Erik Wilde are frequent IETF contributors. I'm not following their work much these days, but I don't think they have a specific interest here. They're active in httpapi, the working group that produced this RFC. Having worked a bit with both, they're great and thoughtful folks.

Eemeli Aro maintains the 'yaml' npm package. Seems like a good person to have aboard to make sure things stay practical.

noirscape · on Feb 21, 2024

> .yaml is the preferred extension, .yml is dated

Is this really true? Basically every project I've ever cloned or set up that relied on YAML still uses .yml as the extension.

WorldMaker · on Feb 21, 2024

I've always used .yaml and have never understood why someone would choose .yml. Three letter extensions haven't been a requirement since like DOS 5 and I can't imagine that many developers are both using YAML and still using FAT16 but can't afford VFAT long filename entries.

cesarb · on Feb 21, 2024

> I've always used .yaml and have never understood why someone would choose .yml.

Because it's what the documentation says. For instance, the "Getting Started" example at https://docs.flatpak.org/en/latest/first-build.html says "call it org.flatpak.Hello.yml" - and that's Flatpak, which has always been a Unix-specific thing.

arp242 · on Feb 21, 2024

Old DOS "8.3" restri~1 are still followed by some people. Force of habit I suppose? Also the docume~1 or examples of some projects use .yml and I can't be bothered to experi~1 if both work, because it doesn't really matter.

quectophoton · on Feb 21, 2024

> have never understood why someone would choose .yml

Probably for the same reason people don't type other 4-letter extensions, like `.ruby` or `.rust`.

I guess `.java` is the exception (no pun intended).

o11c · on Feb 21, 2024

.yml is similar to .xml

deathanatos · on Feb 21, 2024

… but YAML ain't a markup language.

(That's the 'A', in YAML, literally.)

BerislavLopac · on Feb 21, 2024

That's a backronym; originally it meant "yet another".

deathanatos · on Feb 22, 2024

The current acronym is old enough to drink. The original acronym, AFAICT looking through The Wayback Machine, lasted less than a year. (The original language was pretty wildly different, and much closer to a markup language.)

For those that don't want to wait for the time capsule, here is the earliest example of it (I've shortened it from the original for brevity; best I can gather this should also have been valid):

  <timesheet.clarkevans.com:record>
    <person:record>
      <id:int>
        293945</id:int>
      <name:record>
        <given>
          Clark</given>
        <family>
          Evans</family></name:record></person:record>
  </timesheet.clarkevans.com:record>

You can feel the XML influence.

Oddly it kept the "Another" well into the evolution of not being a markup language anymore, but on the grand scale of things, didn't really keep it that long.

richm31415 · on Feb 21, 2024

Ansible generally prefers .yml - https://github.com/redhat-cop/automation-good-practices/tree... "When naming files, use the .yml extension and not .yaml. .yml is what ansible-galaxy init does when creating a new role template."

bewuethr · on Feb 21, 2024

It's what the YAML FAQ says: https://yaml.org/faq.html

electroly · on Feb 21, 2024

I'd just be glad for the community to standardize on one or the other. I don't care either way but right now we sometimes have both .yml and .yaml in the same directory! For GitHub Actions (the only place where I use yaml) there doesn't seem to be a clear consensus yet.

paulddraper · on Feb 21, 2024

GitHub Actions uses .yml in examples. [1]

Though it supports both equally.

[1] https://docs.github.com/en/actions/quickstart

numbsafari · on Feb 21, 2024

Well, now there’s an RFC to refer to, and it makes a clear choice: .yaml.
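
If you serve these files yourself, the extension argument mostly goes away once both spellings map to the same type. Here is a minimal sketch with Python's standard `mimetypes` module, assuming you want `.yaml` and `.yml` both reported as `application/yaml`. The function name `yaml_aware_types` is invented for this example, and a private `MimeTypes` instance is used so the global table is left untouched.

```python
import mimetypes


def yaml_aware_types() -> mimetypes.MimeTypes:
    """Return a MimeTypes table that knows both YAML extensions.

    Hypothetical helper: registers .yaml and .yml as application/yaml on a
    private table instead of mutating the module-level defaults.
    """
    types = mimetypes.MimeTypes()
    for extension in (".yaml", ".yml"):
        types.add_type("application/yaml", extension)
    return types


types = yaml_aware_types()
media_type, _encoding = types.guess_type("workflow.yml")
```

Both `workflow.yml` and `workflow.yaml` then resolve to the same media type, whichever spelling a project picked.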

electroly · on Feb 23, 2024

The two replies I received--one saying .yml is the GitHub standard, and the other saying .yaml is the RFC standard--neatly summarize the situation we're in. I don't think this extension mess is ever getting fixed.

milliams · on Feb 21, 2024

I assume that it's because people are still struggling to move away from 8.3 filenames (https://en.wikipedia.org/wiki/8.3_filename).

deathanatos · on Feb 21, 2024

I mean … probably depends on local preferences. Every project I've ever worked on used .yaml.

.yml feels like .htm to me. DOS is long since gone.

But, Github seems to think they're about the same, with a lead for .yml: 23.9M hits for .yml, 17.8M for .yaml.

gweinberg · on Feb 21, 2024

Wouldn't the analog of .htm be .yam?

BandButcher · on Feb 21, 2024

I cringe at .htm

paulddraper · on Feb 21, 2024

While I personally prefer .yml (shorter is better), yeah .yaml is the standard convention.

riffic · on Feb 21, 2024

great, now do markdown!

edit: markdown has been done too :)

https://datatracker.ietf.org/doc/html/rfc7763

https://datatracker.ietf.org/doc/html/rfc7764

benatkin · on Feb 21, 2024

It's good that it identifies it as a family of languages, and doesn't make it refer to original markdown, which doesn't have code fences.

mhitza · on Feb 21, 2024

Does anyone know what's the rationale for settling on application/yaml instead of text/yaml? Couldn't find one within the text.

aragonite · on Feb 21, 2024

It could just be following the analogy of application/json. Which in turn raises the same question about json... here FWIW is Crockford's theory :)

> But overall, it was a really painful process. And in the end, I didn’t even get the mime type I wanted. I wanted Text/JSON. They gave me Application/JSON, which was weird because JSON is not an application, it’s a text format. It’s a way of representing data in text. And I think maybe that… Well, I don’t know why they’ve forced that on me. I’d like to think it’s because there were some XML fans who were resentful about what I’d been doing and decided that I didn’t deserve to have text that was reserved for XML. So they got text XML, but I got Application/JSON, which didn’t affect me personally at all, but for the people who use it, it’s not a big deal. It’s a little ugliness in the header.

https://corecursive.com/json-vs-xml-douglas-crockford/

lifthrasiir · on Feb 22, 2024

RFC 6838 is clear about the `text` top-level type [1]:

> The "text" top-level type is intended for sending material that is principally textual in form. [...] Beyond plain text, there are many formats for representing what might be known as "rich text". An interesting characteristic of many such representations is that they are to some extent readable even without the software that interprets them. It is useful to distinguish them, at the highest level, from such unreadable data as images, audio, or text represented in an unreadable form.

In comparison, if `text/*` should be used for any format that encodes something in a text format as Crockford wanted, then you can put every file format into `text/*` with base64 encoding in principle. Limiting `text/*` to the data that is textual in nature avoids this issue by setting a reasonable expectation [2]. It is even possible to register `text/vnd.foobar+json` if you want, because a structured syntax suffix `+json` doesn't affect the top-level type. But JSON in general doesn't guarantee any textual data even after processing.

[1] https://datatracker.ietf.org/doc/html/rfc6838#section-4.2.1

[2] Therefore `text/rtf` is possible when it doesn't embed any additional object, otherwise `application/rtf` should be used. It is debatable whether a source code is a text or not however, especially when it can be executed in place. The transition from `application/javascript` to `text/javascript` [3] is one example.

[3] https://datatracker.ietf.org/doc/html/rfc9239

aezart · on Feb 22, 2024

We have an integration at work that sends copies of the same message to two different servers running the same software, and one requires text/xml and the other requires application/xml.

wiktor-k · on Feb 21, 2024

There's one issue with text, it defaults to ASCII:

> For example, for any MIME type whose main type is text, you can add the optional charset parameter to specify the character set used for the characters in the data. If no charset is specified, the default is ASCII (US-ASCII) unless overridden by the user agent's settings. To specify a UTF-8 text file, the MIME type text/plain;charset=UTF-8 is used.

See: https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_...
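
Here is a rough sketch of that defaulting rule, using the standard library's header parser. The US-ASCII fallback for `text/*` is taken from the MDN text quoted above, not from anything YAML-specific. The function name `effective_charset` is made up, and returning `None` for non-text types just means "the header doesn't say".

```python
from email.message import Message
from typing import Optional


def effective_charset(content_type: str) -> Optional[str]:
    """Return the charset a receiver would assume for a Content-Type value.

    Hypothetical helper: an explicit charset parameter wins; otherwise
    text/* falls back to US-ASCII (per the quoted MDN text) and any other
    top-level type yields None.
    """
    message = Message()
    message["Content-Type"] = content_type
    charset = message.get_content_charset()  # lowercased, or None if absent
    if charset is not None:
        return charset
    if message.get_content_maintype() == "text":
        return "us-ascii"
    return None


examples = {
    value: effective_charset(value)
    for value in ("text/plain", "text/plain;charset=UTF-8", "application/yaml")
}
```

Under that rule a bare `text/yaml` would be read as ASCII, while a bare `application/yaml` leaves the decision to whoever parses the body.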

paulddraper · on Feb 21, 2024

Many text types default to other encodings.

E.g. text/vcard https://www.iana.org/assignments/media-types/text/vcard

creatonez · on Feb 23, 2024

If you don't have a content type, you're screwed no matter what you do. Web browsers use a goofy detection system that sometimes works and sometimes fails, because it only reads the beginning of the file.

arp242 · on Feb 22, 2024

That's fine, no? Not like application/anything can default to the correct character set. This is why everyone typically uses application/json; charset=UTF-8.

treffer · on Feb 21, 2024

So I got curious, nice questions.

RFC6838: Expected uses for the "application" type name include but are not limited to file transfer, spreadsheets, presentations, scheduling data, and languages for "active" (computational) material.

"File transfer" sounds to me like the distinction is: if you would expect to show it in a browser directly use audio/text/image/.... - otherwise it's application.

This sounds OK for yaml. You _can_ still serve it as text/plain if you want to show it as text, but as yaml it's probably for download purposes.

Btw, the RFC discussions are open. You can join an IETF mailing list and start discussing such topics :-)

01HNNWZ0MV43FF · on Feb 21, 2024

eh I'm holding out for an email service that doesn't need a blood oath to sign up

vbernat · on Feb 21, 2024

Same as for application/json I suppose: this is a structured format to be processed by an application, not by humans directly.

p1mrx · on Feb 21, 2024

https://github.com/ietf-wg-httpapi/mediatypes/issues/11

> While I strongly support defining the application/yaml media type, I do not think that the text/yaml should be defined at this time. [...] Defining more than one media type is superfluous, and creates unnecessary uncertainty. The bare-text presentation of YAML content to a human is already now achievable by presenting YAML as text/plain

aloisklink · on Feb 21, 2024

One reason might be that `text/*` files follow the POSIX standard for text files [1], where no lines can exceed `{LINE_MAX}` bytes in length (and `LINE_MAX` depends on your OS).

I don't believe the YAML spec has any rules on how long lines can be, so this means that some files won't technically be text files (and some line-based UNIX tools might not work correctly on them).

[1]: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1...
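
If you want to see whether a given file would trip that limit, here is a quick sketch. The default of 2048 bytes is an assumption (it is the POSIX minimum as far as I know, and the real `LINE_MAX` on your system may be larger). The function name `overlong_lines` is invented, and the length counts the trailing newline since the limit is meant to include it.

```python
ASSUMED_LINE_MAX = 2048  # assumption: POSIX minimum; check your own system


def overlong_lines(data: bytes, line_max: int = ASSUMED_LINE_MAX) -> list[int]:
    """Return 1-based numbers of lines longer than line_max bytes.

    Hypothetical helper: each line's length includes its line terminator.
    """
    return [
        number
        for number, line in enumerate(data.splitlines(keepends=True), start=1)
        if len(line) > line_max
    ]


document = b"short: value\n" + b"blob: " + b"x" * 3000 + b"\n"
too_long = overlong_lines(document)
```

A YAML file with one giant inline string on a single line is still perfectly valid YAML, it just may not count as a text file in the POSIX sense.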

michael1999 · on Feb 21, 2024

Agree this should be addressed. They explicitly deprecate text/yaml, but don't justify the choice.

I could speculate that the YAML spec assumes full Unicode repertoire, and the only choices are UTF-8/16/32 BE/LE. But text/* allows (encourages) middle-boxes to guess and translate encodings to local standards (e.g. sniff Latin-1 and "helpfully" rewrite to 8859-9) to allow legacy clients to read text. A yaml file is more data than text, so text/* isn't appropriate. Similar to application/json.

Not sure I agree, but it's my smallest complaint about yaml.

slim · on Feb 21, 2024

https://datatracker.ietf.org/doc/html/rfc6838#section-4.2.5

nmz · on Feb 21, 2024

How can there be an RFC for something that is not implementable by anything other than the main yaml library?

aidenn0 · on Feb 21, 2024

I wrote a YAML parser several years ago. It wasn't that hard. YAML block literals can be confusing to write, but are not at all hard to parse.