AFAICT, XML fulfils all of the requirements of this. It is self describing, rigo...

mr_toad · on Aug 19, 2021

Too easy to produce bloated files. People do dumb things like put field attributes and metadata on every row of the output, so you end up with files that can be several times the size of the raw data.
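
A rough way to see the effect described above is to serialize the same rows both ways and compare sizes. This is only a sketch: the `field` element and its `name`/`type`/`nullable` attributes are a made-up layout chosen to mimic per-row metadata, and the exact ratio depends entirely on the data.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data: (id, name, amount)
rows = [(i, f"item-{i}", i * 1.5) for i in range(1000)]


def to_csv(rows):
    """Serialize rows as CSV with a single header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "name", "amount"])
    writer.writerows(rows)
    return buf.getvalue()


def to_verbose_xml(rows):
    """Serialize rows as XML, repeating field metadata on every row."""
    root = ET.Element("table")
    for id_, name, amount in rows:
        row = ET.SubElement(root, "row")
        for field, kind, value in (
            ("id", "integer", id_),
            ("name", "string", name),
            ("amount", "decimal", amount),
        ):
            # The same name/type/nullable attributes are emitted for each cell.
            cell = ET.SubElement(row, "field", name=field, type=kind, nullable="false")
            cell.text = str(value)
    return ET.tostring(root, encoding="unicode")


csv_text = to_csv(rows)
xml_text = to_verbose_xml(rows)
ratio = len(xml_text) / len(csv_text)
print(f"CSV: {len(csv_text)} chars, XML: {len(xml_text)} chars, ratio {ratio:.1f}x")
```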

Parsing times are often horrible.

There’s no standard for tabular data. You invariably need some overly complicated XML map, because people can’t resist the temptation to over-engineer.

ArsenArsen · on Aug 19, 2021

I had anticipated the overhead of XML being brought up, hence the other suggestion there, which is anything + jsonschema. jsonschema was used for illustrative purposes; it's a lot more powerful than most use cases call for, and it pays for that power with a very verbose and long-winded syntax. It'd certainly be an alternative with less overhead, though.

I've not benchmarked XML parsing times in a very long time, I'd be interested in seeing the numbers now.
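
For anyone who wants to get numbers on their own machine, a minimal sketch with the standard library could look like the following. The row layout, the row count, and the choice of `xml.etree.ElementTree` versus `csv.DictReader` are all assumptions made for illustration; other parsers and real-world documents will behave differently, so no result is quoted here.

```python
import csv
import io
import timeit
import xml.etree.ElementTree as ET

N_ROWS = 10_000  # arbitrary size for the synthetic data set

# Build the same synthetic rows in both formats.
csv_text = "date,description,debit\n" + "".join(
    f"2021-08-19,payment {i},{i}.00\n" for i in range(N_ROWS)
)
xml_text = "<rows>" + "".join(
    f'<row date="2021-08-19" description="payment {i}"><debit>{i}.00</debit></row>'
    for i in range(N_ROWS)
) + "</rows>"


def parse_csv():
    """Parse the CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def parse_xml():
    """Parse the XML text and return the list of row elements."""
    return ET.fromstring(xml_text).findall("row")


# Take the best of a few repeats to reduce noise from other processes.
RUNS = 5
csv_seconds = min(timeit.repeat(parse_csv, number=RUNS, repeat=3)) / RUNS
xml_seconds = min(timeit.repeat(parse_xml, number=RUNS, repeat=3)) / RUNS
print(f"CSV: {csv_seconds * 1000:.2f} ms per parse")
print(f"XML: {xml_seconds * 1000:.2f} ms per parse")
```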

Tabular data is barely data, honestly. Hence my PS: I don't think spreadsheets in general are a very good way to store anything.

And yes, absolutely, overengineering is bound to happen, which is unfortunate, but I'm not sure it can really be avoided while still keeping many of those upsides (especially the rigorous definitions).

selfhoster11 · on Aug 19, 2021

Most XML is not human-readable. There's just too much line noise, and most of what's emitted has a weird schema that's difficult to parse using Human Brain 1.0.

ArsenArsen · on Aug 19, 2021

I don't think it's much worse than CSV, with its inflexible structure that doesn't allow any formatting to be inserted. As for weird schemas, that really is an implementation-specific issue. This example was sketched up in minutes and already reads better than most spreadsheets I've seen over the years!

  <bankstatement>
      <transaction date="2021-08-19" description="Hello, world!">
          <debit commodity="USD">100</debit>
          <balance commodity="USD">237.87</balance>
      </transaction>
  </bankstatement>
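
To show how little code a statement like this takes to consume, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes the `balance` element is closed with `</balance>` so that the document is well-formed; the `read_transactions` helper and the dictionary keys are hypothetical names picked for this example.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

STATEMENT = """\
<bankstatement>
    <transaction date="2021-08-19" description="Hello, world!">
        <debit commodity="USD">100</debit>
        <balance commodity="USD">237.87</balance>
    </transaction>
</bankstatement>
"""


def read_transactions(xml_text):
    """Return one dict per <transaction> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    rows = []
    for tx in root.iter("transaction"):
        debit = tx.find("debit")
        balance = tx.find("balance")
        rows.append({
            "date": tx.get("date"),
            "description": tx.get("description"),
            # Decimal avoids binary floating-point rounding on money amounts.
            "debit": Decimal(debit.text),
            "balance": Decimal(balance.text),
            "commodity": debit.get("commodity"),
        })
    return rows


transactions = read_transactions(STATEMENT)
for tx in transactions:
    print(tx["date"], tx["description"], tx["debit"], tx["balance"], tx["commodity"])
```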

selfhoster11 · on Aug 19, 2021

While weird schemas are an implementation-specific issue, the fact remains that most XML that I've seen in commercial settings is exactly as unreadable as I described. It's harder to go against the grain of the status quo when very few people accompany you on the journey.