
The article's strongest criticism of CSV is that it's easy for someone to mangle it when manually editing. This is true. It's also true for every format.

It was weakest when it implied there is no real standard. There is, and it's robust for representing data, even data that includes any combination of commas and double-quotes.

The algorithm for creating well-formed CSV from data is straightforward and almost trivial: if the datum has no comma in it, leave it alone. It's good to go. If it has even one comma, wrap the datum in double quotes; and if it also contains double quotes, then double them.
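For anyone who wants to see it spelled out, here is a rough Python sketch of that quoting rule. The function names are made up for illustration. It also assumes, following RFC 4180, that a field containing a double quote or a line break gets wrapped too, which goes a bit beyond the comma-only rule described above.

```python
def quote_field(value: str) -> str:
    """Quote a single CSV field, RFC 4180 style (illustrative helper)."""
    # Assumption: commas, double quotes, and line breaks all trigger quoting.
    needs_quotes = any(ch in value for ch in (",", '"', "\r", "\n"))
    if not needs_quotes:
        return value  # plain datum: leave it alone
    # Double any embedded double quotes, then wrap the whole field.
    return '"' + value.replace('"', '""') + '"'


def to_csv_row(fields: list[str]) -> str:
    """Join already-stringified fields into one CSV record (no line ending)."""
    return ",".join(quote_field(f) for f in fields)


print(to_csv_row(["apple", "red, sweet", 'the "king" of fruit']))
# prints: apple,"red, sweet","the ""king"" of fruit"
```

In real code the standard library's `csv.writer` already does this, so the sketch is only meant to show how small the rule is.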

Not complicated and covers every edge case. CSV is going nowhere. 1000 years from now, computers will still be using CSV.

The answer to his objections could be to extend the format to include metadata. Perhaps a second row that holds type data.

  fruit,price,expiration
  string,$0.00,MM/DD/YYYY
  apple,$0.45,01/24/2022
  durian,$1.34,08/20/2021
etc
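A reader for such a file could look something like the Python sketch below. This is only one guess at how the idea might work: the type markers (`string`, `$0.00`, `MM/DD/YYYY`) are taken from the example above, while the `CONVERTERS` table and `read_typed_csv` are hypothetical names, not part of any existing standard or library.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal

# Hypothetical mapping from the type markers in row two to converters.
CONVERTERS = {
    "string": str,
    "$0.00": lambda s: Decimal(s.lstrip("$")),
    "MM/DD/YYYY": lambda s: datetime.strptime(s, "%m/%d/%Y").date(),
}


def read_typed_csv(text):
    """Parse CSV text whose second row names the type of each column."""
    rows = csv.reader(io.StringIO(text, newline=""))
    header = next(rows)   # first row: column names
    types = next(rows)    # second row: type markers
    convert = [CONVERTERS[t] for t in types]
    return [
        {name: conv(cell) for name, conv, cell in zip(header, convert, row)}
        for row in rows
    ]


sample = (
    "fruit,price,expiration\n"
    "string,$0.00,MM/DD/YYYY\n"
    "apple,$0.45,01/24/2022\n"
    "durian,$1.34,08/20/2021\n"
)
records = read_typed_csv(sample)
print(records[0]["fruit"], records[0]["price"], records[0]["expiration"])
```

An unknown marker raises a `KeyError` here, which seems like a reasonable default for a sketch: better to fail loudly than to guess a type.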


You didn't mention handling newlines as data. Excel, for example, will include these as-is and double-quote the cell.

It is very easy to overlook edge cases in CSV.


> It is very easy to overlook edge cases in CSV.

That was not a criticism from the original article, and isn't even true.

If you have newlines in your original data, and you "overlooked" this "edge case", then neither JSON nor YAML nor any other format will save you. The same fix applies to them all.

This is really a very poor criticism.


The only reason I replied was that you said:

> The algorithm for creating well-formed CSV from data is straightforward and almost trivial: if the datum has no comma in it, leave it alone. It's good to go.

> Not complicated and covers every edge case

Maybe more context was implied, but I didn't want anyone to think it's that simple. I have received and had to process CSV data with unexpected newlines. It catches nearly everyone the first time they try to process a CSV file line by line.
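To make the failure concrete, here is a small Python sketch with made-up sample data. It compares splitting on line breaks with using the standard library's `csv.reader`, which knows that a quoted field may span lines.

```python
import csv
import io

# Made-up sample: the second record has a newline inside a quoted field.
data = 'id,note\r\n1,"first line\nsecond line"\r\n2,plain\r\n'

# Naive approach: treat every physical line as one record.
naive_rows = [line.split(",") for line in data.splitlines()]

# csv.reader tracks quoting, so the embedded newline stays inside the field.
parsed_rows = list(csv.reader(io.StringIO(data, newline="")))

print(len(naive_rows))   # 4 "records": the quoted field was cut in two
print(len(parsed_rows))  # 3 records, as intended
print(parsed_rows[1])    # ['1', 'first line\nsecond line']
```

The naive version doesn't raise an error. It just quietly produces an extra, malformed row, which is why this one tends to slip through.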


That is a fair point :)


I wonder why dquotes are escaped with another dquote instead of a backslash or something?


Because then you'd have to escape backslashes.



