
AFAICT, XML fulfils all of these requirements.

It is self-describing, rigorously defined for the machine, highly flexible, highly extensible, human-readable, text-based, and an open format. Yet the post mentions it only as a storage format, which doesn't come close to doing it justice!

XML is complex, but there are already well-established libraries for handling it, so that shouldn't be an issue, and complexity is the only drawback XML has.

Hell, even alternatives like JSON/YAML + JSON Schema exist, which still provide rigorous validation while staying human-readable and ubiquitous.

P.S. I think a bigger failure here is trying to generalize the concept of spreadsheets, rather than using a common encoding (XML) for domain-specific data formatting.



Too easy to produce bloated files. People do dumb things like put field attributes and metadata on every row of the output, so you end up with files that can be several times the size of the raw data.

Parsing times are often horrible.

There’s no standard for tabular data. You invariably need some overly complicated XML map, because people can’t resist the temptation to over-engineer.
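To put a rough number on the per-row metadata complaint, here is a minimal sketch that writes the same rows once as CSV and once as XML with field attributes repeated on every row. The sample rows, the `row`/`field` element names, and the `type`/`nullable` attributes are all made up for illustration; real exports will differ.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample rows, invented purely for illustration.
rows = [
    {"date": "2021-08-19", "description": "Hello, world!", "debit": "100", "balance": "237.87"},
    {"date": "2021-08-20", "description": "Coffee", "debit": "3.50", "balance": "234.37"},
]

def to_csv(rows):
    """Serialize the rows as CSV with a single header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def to_verbose_xml(rows):
    """Serialize the rows as XML, repeating field metadata on every row."""
    root = ET.Element("rows")
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for name, value in row.items():
            # The name, type, and nullability are restated for each field of each row.
            field = ET.SubElement(row_el, "field", name=name, type="string", nullable="false")
            field.text = value
    return ET.tostring(root, encoding="unicode")

csv_text = to_csv(rows)
xml_text = to_verbose_xml(rows)
print(f"CSV: {len(csv_text)} chars, XML: {len(xml_text)} chars")
print(f"XML is {len(xml_text) / len(csv_text):.1f}x the size of the CSV")
```

The ratio depends entirely on how much metadata the exporter repeats, so treat this as a way to measure a given file rather than a general figure.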


I had anticipated the overhead of XML being brought up, hence the other suggestion there: anything + JSON Schema. JSON Schema was used for illustrative purposes; it's a lot more powerful than most use cases call for, and it pays for that power with a very verbose, long-winded syntax. It'd certainly be an alternative with less overhead, though.

I've not benchmarked XML parsing times in a very long time; I'd be interested in seeing the numbers now.

Tabular data is barely data, honestly, hence my P.S.: I don't think spreadsheets in general are a very good way to store anything.

And yes, absolutely, overengineering is bound to happen, which is unfortunate, but I'm not sure it can really be avoided while still keeping many of those upsides (especially the rigorous definitions).
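For anyone who wants to get current numbers on their own machine, a rough timing harness using only the Python standard library might look like the sketch below. The row count, the two-column synthetic data, and the attribute-style XML layout are arbitrary assumptions, and the results will vary with the parser, the schema, and the hardware.

```python
import csv
import io
import timeit
import xml.etree.ElementTree as ET

N = 10_000  # arbitrary row count, chosen only for illustration

# Synthetic data: the same N rows encoded as CSV and as attribute-style XML.
csv_text = "id,amount\n" + "".join(f"{i},{i * 1.5}\n" for i in range(N))
xml_text = "<rows>" + "".join(f'<row id="{i}" amount="{i * 1.5}"/>' for i in range(N)) + "</rows>"

def parse_csv():
    """Parse the CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def parse_xml():
    """Parse the XML text into a list of attribute dicts, one per row."""
    return [row.attrib for row in ET.fromstring(xml_text)]

REPEATS = 5
for name, fn in (("csv", parse_csv), ("xml", parse_xml)):
    seconds = timeit.timeit(fn, number=REPEATS)
    print(f"{name}: {seconds / REPEATS:.4f} s per parse")
```

This only compares two standard-library parsers on one toy layout, so it says nothing about other XML libraries or heavier schemas.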


Most XML is not human-readable. There's just too much line noise, and most of what's emitted has a weird schema that's difficult to parse using Human Brain 1.0.


I don't think it's much worse than CSV, with its inflexible structure that doesn't allow any formatting to be inserted. As for weird schemas, that really is an implementation-specific issue. This example was sketched up in minutes and already reads better than most spreadsheets I've seen over the years!

  <bankstatement>
      <transaction date="2021-08-19" description="Hello, world!">
          <debit commodity="USD">100</debit>
          <balance commodity="USD">237.87</balance>
      </transaction>
  </bankstatement>
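As a rough sketch of how little code it takes to consume a document like this, the snippet below reads it with Python's standard `xml.etree.ElementTree`. It assumes the `balance` element is closed with `</balance>` so that the document is well-formed; the `read_transactions` helper and the dictionary keys are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Well-formed version of the statement (balance closes with </balance>).
STATEMENT = """
<bankstatement>
    <transaction date="2021-08-19" description="Hello, world!">
        <debit commodity="USD">100</debit>
        <balance commodity="USD">237.87</balance>
    </transaction>
</bankstatement>
"""

def read_transactions(xml_text):
    """Return one dict per <transaction>, with amounts as Decimal."""
    root = ET.fromstring(xml_text.strip())
    transactions = []
    for tx in root.iter("transaction"):
        debit = tx.find("debit")
        balance = tx.find("balance")
        transactions.append({
            "date": tx.get("date"),
            "description": tx.get("description"),
            "debit": Decimal(debit.text),
            "balance": Decimal(balance.text),
            "commodity": debit.get("commodity"),
        })
    return transactions

transactions = read_transactions(STATEMENT)
for tx in transactions:
    print(tx["date"], tx["description"], tx["debit"], tx["commodity"])
```

Amounts are parsed as `Decimal` rather than `float` to avoid rounding surprises with currency values.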


While weird schemas are an implementation-specific issue, the fact remains that most XML that I've seen in commercial settings is exactly as unreadable as I described. It's harder to go against the grain of the status quo when very few people accompany you on the journey.



