"Given how badly common tools mangle unambiguously correct CSV data, how many variations there are which make “unambiguously correct CSV data” a somewhat small proportion of what is out there, and how many tools not only expect but require mis-formatted data and/or output it, it is scary how much the format is relied upon in major industries."
In a nutshell, CSV isn't a format. It's a family of formats, and it's not even a well-specified family of formats.
At least in semi-technical circles, I've had some success in using this to push back against CSV suggestions and get them to use better things. I'm sure that in non-technical circles I'd have zero success with this, though. It sure ain't a magic talisman you can use.
JSON isn't exactly a rigidly specified format, but it's got a lot less flex in it and I've not had as much trouble with it. The biggest problem I have is getting people who use dynamic scripting languages to consistently output either a string or a number for a given field, rather than "whatever the scripting language happened to decide based on what code paths I happened to run". Often they don't even realize their code is casting the value back and forth, and what comes out is effectively random from my point of view.
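To make the string-or-number complaint concrete, here is a minimal Python sketch of pinning each field to one declared type before serializing. The `FIELD_TYPES` table, the field names and the `normalize` helper are all made up for illustration; a real producer would more likely use a proper schema validator.

```python
import json

# Hypothetical field -> type table; the names are illustrative only.
FIELD_TYPES = {"id": int, "sku": str, "qty": int}

def normalize(record):
    """Coerce every field to its declared type, or fail loudly."""
    out = {}
    for key, typ in FIELD_TYPES.items():
        value = record[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, str)):
            raise TypeError(f"{key}: unexpected {type(value).__name__}")
        out[key] = typ(value)  # int("x") raises ValueError, which is what we want
    return out

# The same logical record as two different code paths might emit it.
a = {"id": "42", "sku": 1001, "qty": 3}
b = {"id": 42, "sku": "1001", "qty": "3"}

json_a = json.dumps(normalize(a), sort_keys=True)
json_b = json.dumps(normalize(b), sort_keys=True)
```

Both records serialize to the same JSON after normalization, so the consumer sees one type per field regardless of which path produced the value.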
> In a nutshell, CSV isn't a format. It's a family of formats, and it's not even a well-specified family of formats.
There is RFC 4180, though by 2005, when it came about, there were already so many different variants around that it became just one of a great many.
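As a small illustration of the "family of formats" point, the sketch below feeds the same text to Python's standard `csv` module under two dialect assumptions. The sample data and the `parse` helper are invented for the example; the takeaway is only that nothing in the file itself says which reading is the right one.

```python
import csv
import io

# Semicolon-delimited export with RFC 4180 style doubled quotes (made-up sample).
raw = 'id;name;note\n1;"Smith, John";"said ""hi"""\n'

def parse(text, **fmt):
    """Parse text with csv.reader using the given format parameters."""
    return list(csv.reader(io.StringIO(text), **fmt))

as_comma = parse(raw)                 # default dialect: comma-delimited
as_semi = parse(raw, delimiter=";")   # what the producer actually meant

print(as_comma[1])
print(as_semi[1])
```

Under the default comma dialect the data row is split at the comma inside the name and the quotes are kept as literal characters. With `delimiter=";"` it comes out as three clean fields.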
I try not to push back too hard about CSV, for fear of “well, there is this XML format that is supported”! (bad enough in itself, but sometimes the “XML format” is even more poorly specified than the client's CSV edge cases which we are expected to guess).
JSON is nice as long as, as you say, strings are real strings and numbers are real numbers.
Oh, and dates/times are in an RFC3339 (or ISO8601) numeric (no localised month names, etc.) format either in UTC or with the timezone always specified, as strings (though at a pinch I'll accept a posix time_t for datetime if based on UTC). Not specifying how to handle dates/times/both is the major problem with JSON in my experience.
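For what it's worth, a rough Python sketch of the date convention I mean: always emit an RFC 3339 string in UTC, and refuse anything without an explicit offset. The helper names `to_rfc3339_utc` and `from_rfc3339` are mine, not from any library, and fractional seconds are ignored to keep it short.

```python
from datetime import datetime, timedelta, timezone

def to_rfc3339_utc(dt):
    """Format an aware datetime as an RFC 3339 string in UTC."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: the timezone must be explicit")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def from_rfc3339(text):
    """Parse an RFC 3339 string, rejecting values without an offset."""
    # Older datetime.fromisoformat versions do not accept a trailing "Z",
    # so rewrite it as an explicit +00:00 offset first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    dt = datetime.fromisoformat(text)
    if dt.tzinfo is None:
        raise ValueError("timestamp has no timezone offset")
    return dt

local = datetime(2024, 3, 10, 14, 30, tzinfo=timezone(timedelta(hours=2)))
wire = to_rfc3339_utc(local)      # the string that goes into the JSON
round_trip = from_rfc3339(wire)   # same instant, now expressed in UTC
```

The string goes into the JSON as an ordinary string value. The consumer gets back the same instant no matter what the producer's local timezone was.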