There is a spec (RFC 4180 [1]), but it's definitely not widely followed. Worse, for a lot of data there are no problems for potentially years, until your numbers get too big or a quote or comma shows up in the data for the first time.
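For anyone who hasn't read it, the quoting rules are small. Here's a quick sketch using Python's csv module, whose default dialect quotes the same way the RFC does, to show what compliant output looks like for the troublesome fields (the sample rows are invented):

    import csv
    import io

    # Fields that break naive "join with commas" writers: an embedded
    # comma, an embedded quote, and an embedded newline.
    rows = [
        ["id", "note"],
        ["1", "plain"],
        ["2", "has, a comma"],
        ["3", 'has a "quote"'],
        ["4", "has a\nnewline"],
    ]

    buf = io.StringIO()
    # Per RFC 4180: such fields are wrapped in double quotes, embedded
    # quotes are doubled, and records end in CRLF (the module's default).
    csv.writer(buf).writerows(rows)
    print(repr(buf.getvalue()))
    # Row 3 comes out as: 3,"has a ""quote"""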

In my experience one of the biggest barriers I run into -- and the primary reason I hate using CSV -- is Microsoft Excel. It misinterprets numbers as dates, it converts big numeric identifiers to exponential notation, and more. Merely opening an RFC 4180-compliant file and saving it changes the data, and on the next open Excel will often misinterpret the resulting file in yet another way.
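For what it's worth, when a file is destined for Excel anyway, the ugly workaround I know of is wrapping fragile values in a formula so Excel treats them as text. A sketch, assuming Python and an invented file name; the result is Excel-specific and no longer plain RFC 4180 CSV:

    import csv

    # Values Excel tends to mangle on open: leading zeros stripped,
    # 16-digit IDs displayed as 1.23457E+15, "1-2" turned into a date.
    fragile = ["000123", "1234567890123456", "1-2"]

    # "for_excel.csv" is a hypothetical output path.
    with open("for_excel.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id"])
        for value in fragile:
            # The ="..." formula hack forces Excel to keep the cell as text.
            writer.writerow([f'="{value}"'])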

If humans never used Excel for CSV, it would be a viable format. At the same time, in most cases where humans aren't in the loop (machine-to-machine communication), there are better formats. You could spec "RFC 4180 CSV" and hope no developer just sees "CSV" and assumes they understand it. Or you could specify something like a JSON streaming format and avoid a whole lot of headache.
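To make the JSON streaming option concrete, newline-delimited JSON (also called NDJSON or JSON Lines) is the simplest version: one JSON object per line, so types survive and there is no quoting convention to get wrong. A minimal sketch with invented field names:

    import json

    # One JSON object per line; the encoder escapes embedded quotes and
    # newlines, so a record never spans lines.
    records = [
        {"id": 1234567890123456, "note": 'a "quote" and a, comma'},
        {"id": 2, "note": "line\nbreaks survive as escapes"},
    ]

    with open("out.ndjson", "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

    with open("out.ndjson") as f:
        roundtrip = [json.loads(line) for line in f]

    assert roundtrip == records  # lossless round trip, big IDs included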

[1] https://www.ietf.org/rfc/rfc4180.txt

Regarding Excel and RFC 4180…

I avoided CSV for quite a while because I had Excel-vs-CSV compatibility concerns like this.

However, when I tested this for myself a few years back, Excel's output was to my surprise RFC 4180 or darn near it (it may use CRLF rather than LF, but CRLF is in fact what the RFC specifies). It emitted commas and quotes the same way as the RFC for all the test cases I checked.

That said, I agree with you that Excel is problematic as an input source. Usually the problems are the humans who touch the data in Excel, but what I've found is that the automation problems tend to be in Excel's parsing and interpretation of incoming data (before it ever goes to CSV): exponents, trimmed leading zeros, etc., as you say. But if the data is confirmed good in Excel before being emitted, the CSV it emits is decent.
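If anyone wants to repeat the experiment, it amounts to a round trip: write tricky fields, open and re-save the file in Excel by hand, then re-read and diff. Something like this (paths and test values are illustrative, not my original harness):

    import csv

    # Tricky fields: embedded comma, quote, and newline, plus a control.
    tricky = [["a,b", 'c"d', "e\nf", "plain"]]

    with open("before.csv", "w", newline="") as f:
        csv.writer(f).writerows(tricky)

    # ...open before.csv in Excel, Save As -> CSV to after.csv, then:
    with open("after.csv", newline="", encoding="utf-8-sig") as f:
        after = list(csv.reader(f))  # utf-8-sig tolerates Excel's BOM

    print("round-trip ok" if after == tricky else f"differs: {after}")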

Counterexamples welcome.