If you interpret "CSV" as purely comma-separated values, then maybe. But in my bubble "CSV" means text files that are separated by some separator, be it tabs, spaces, commas, or any other ASCII character. Some are more usable than others: if you have commas in your data, then use tabs. If you have tabs, use Form Feed or Record Separator or vertical tabs ... and so on.
Of course this is not always applicable, since you sometimes don't control the format you get your data in.
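To make the "pick a separator your data doesn't contain" idea concrete, here is a minimal sketch. The function name `pick_separator` and the candidate order are my own assumptions for illustration; it only applies when you are the one writing the file.

```python
# Candidate separators, tried in order: comma, tab, form feed,
# record separator (0x1E), vertical tab. The order is an assumption.
CANDIDATES = [",", "\t", "\x0c", "\x1e", "\x0b"]


def pick_separator(rows, candidates=CANDIDATES):
    """Return the first candidate that appears in no field of `rows`.

    `rows` is an iterable of rows, each a sequence of strings.
    """
    rows = [list(row) for row in rows]  # allow scanning the data more than once
    for sep in candidates:
        if not any(sep in field for row in rows for field in row):
            return sep
    raise ValueError("every candidate separator occurs in the data")
```

For example, `pick_separator([["a,b", "c"]])` falls through to the tab, because a comma already appears inside a field.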
> But in my bubble "CSV" means text files that are separated by some separator, be it tabs, spaces, commas, or any other ASCII character. Some are more usable than others: if you have commas in your data, then use tabs. If you have tabs, use Form Feed or Record Separator or vertical tabs ... and so on. Of course this is not always applicable, since you sometimes don't control the format you get your data in.
When I see CSV parsers like https://www.papaparse.com/ that even try to support comments and empty lines in the format, I wonder if it'd really be that bad to just raise an error on anything that doesn't fit RFC 4180[1], with an explanation of where and how the file is corrupted. Push the issue to the writers of such files.
Then I remember the Robustness Principle[2] and I chill a little.
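As a rough sketch of the "raise an error and say where" approach, something like this could work with Python's standard `csv` module. The name `read_strict` is hypothetical, and this is only stricter than the default reader, not a full RFC 4180 validator: for instance, it does not reject a bare quote inside an unquoted field.

```python
import csv
import io


def read_strict(text, delimiter=","):
    """Parse `text`, raising ValueError with a line number on bad input.

    Rejects malformed quoting (via the csv module's strict mode) and
    rows whose field count differs from the first row, which also
    rejects empty lines.
    """
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=delimiter, strict=True)
    rows = []
    expected = None
    try:
        for row in reader:
            if expected is None:
                expected = len(row)  # the first row sets the field count
            elif len(row) != expected:
                raise ValueError(
                    f"line {reader.line_num}: expected {expected} fields, "
                    f"found {len(row)}"
                )
            rows.append(row)
    except csv.Error as exc:
        # Re-raise parser errors with the line where parsing stopped.
        raise ValueError(f"line {reader.line_num}: {exc}") from exc
    return rows
```

A ragged file such as `"a,b\r\n1,2,3\r\n"` then fails with a message pointing at line 2 instead of being silently accepted.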
It seems quite common in some European countries to use semicolons as the delimiter instead of commas (because they use commas as the decimal separator?), adding a new level of fun to parsing. In Easy Data Transform we count the number of commas, semicolons, and tabs in the file to make an educated guess at the delimiter (which the user can override).
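The counting heuristic described above might look roughly like this. This is my own sketch, not Easy Data Transform's actual code; `guess_delimiter`, the candidate list, and the tie-breaking order are assumptions.

```python
from collections import Counter

# Candidates in tie-breaking order: comma, semicolon, tab.
CANDIDATES = [",", ";", "\t"]


def guess_delimiter(text, candidates=CANDIDATES, default=","):
    """Guess the delimiter as the most frequent candidate in `text`.

    Returns `default` if no candidate occurs at all. A caller should
    still let the user override the guess.
    """
    counts = Counter(ch for ch in text if ch in candidates)
    if not counts:
        return default
    # On a tie, max() returns the candidate listed first.
    return max(candidates, key=lambda c: counts[c])


sample = "name;price\napple;1,50\npear;2,25\n"
print(repr(guess_delimiter(sample)))  # ';' (3 semicolons vs 2 commas)
```

Python's standard library also ships `csv.Sniffer`, which uses a more involved heuristic for the same job.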
Semicolon-separated "CSV" is a quirk of Microsoft Excel, which they have never fixed. I figure Microsoft would prefer that people use XLS instead of CSV.
In theory, yes, to be pedantic, but for example, LibreOffice saves its exported CSVs by default as tab delimited. You actually have to manually specify you want commas to get those.
> LibreOffice saves its exported CSVs by default as tab delimited
Maybe it's actually presenting what you've selected last?
It's giving me comma as the default separator, and it's the first option in the dropdown. Tab is the 3rd option.
>> Those other things have different names like TSV
That depends on the writer. I've gotten what should be named PTVs (pipe-terminated values) as CSVs. I can understand how it happened. If the underlying software outputs PTVs, you don't want to bother converting them because you're working in a legacy language that's a pain to work with (the type where identifiers can't be longer than 4 chars), and you want the user to be able to double-click the file and have it open in a spreadsheet without prior configuration, then you just push the issue to the reader of the file, since by tradition readers are already quite tolerant of format differences...
Of course, there'll always be the case where the reader is simply not tolerant enough, like when the escaping syntax differs. There doesn't seem to be a way to get LO Calc to interpret "foo|bar\|baz|" as cells "foo" and "bar|baz", for example.
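For what it's worth, reading that particular flavor by hand is not much code. A minimal sketch, assuming a backslash makes the next character literal and every field ends with a pipe (`split_pipe_terminated` is a hypothetical name, and real PTV writers may escape differently):

```python
def split_pipe_terminated(line, delimiter="|", escape="\\"):
    """Split one line of pipe-terminated values into fields.

    An escape character makes the following character literal, so an
    escaped pipe stays inside the field.
    """
    fields = []
    current = []
    chars = iter(line)
    for ch in chars:
        if ch == escape:
            # Take the next character as-is; a lone trailing escape
            # is kept literally.
            current.append(next(chars, escape))
        elif ch == delimiter:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        # Tolerate a final field that is missing its terminator.
        fields.append("".join(current))
    return fields


print(split_pipe_terminated("foo|bar\\|baz|"))  # ['foo', 'bar|baz']
```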
And then Excel imports them all using heuristics when you select "Type: CSV". So you'll never train anyone on the demand end of these documents that they're called anything besides CSVs.
Technically true, but like I said, "CSV" is more a term for human-readable data with some delimiter in it. Maybe it's a comma, maybe not. In every case you need someone to look at it. If you want a machine-to-machine data protocol, you can use XML or JSON if it needs to be somewhat human readable.
Of course this is not always applicable, since you sometimes don't control the format you get your data in.