Besides inconsistencies, the lack of a defined quoting and escaping mechanism is already quite painful. Let's look at the following single-column, single-line CSV:
'\0'
Depending on the source this might be:
- quote back-slash zero quote
- zero
- back-slash zero
- a byte with the numeric value of 0
- quote zero quote
- quote back-slash zero quote
- quote byte-with-value-0 quote
- malformed and should be ignored
Worse, the same application might mix several of these quoting mechanisms for different columns, or even within the same column. Normally it shouldn't, but it's definitely a thing you find in programs that don't use a proper marshalling library/code, and you know, CSV is easy, so you surely don't need to add a library dependency just to use it. (Sorry, sarcasm.)
Or in other words: "because CSV is easy", it's sometimes not properly specified, and sometimes no "proper" marshalling/serialization library/module is used, resulting in ad-hoc fixes for quoting where needed, which potentially don't follow any specification either.
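To make the ambiguity concrete, here is a rough sketch using Python's standard csv module. It reads the same four characters with three different dialect settings and gets three different values. The helper name parse is made up for this illustration, and the three settings are just a sample of what a producer might have intended, not an exhaustive list.

```python
import csv
import io

# The four characters from the example: quote, backslash, zero, quote.
LINE = "'\\0'\n"

def parse(**dialect):
    """Hypothetical helper: read the single field of LINE with the given dialect options."""
    return next(csv.reader(io.StringIO(LINE), **dialect))[0]

# Default dialect: the quote character is the double quote and there is no
# escape character, so the single quotes and the backslash are all kept as data.
default = parse()

# Single quote as the quote character: the quotes are stripped, the backslash stays.
single_quoted = parse(quotechar="'")

# Single quote plus backslash as escape character: only the zero survives.
escaped = parse(quotechar="'", escapechar="\\")

print(repr(default), repr(single_quoted), repr(escaped))
```

Nothing in the file itself says which of the three readings is the right one; that has to come from out-of-band knowledge about the producer.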
The worst part about CSV is the quoting. Oh the quoting. As soon as a human writes something in a plain text field, your nice CSV parsing gets much, much harder.
Salespeople enter things like "Company Foo, the One (which used to be Bar)". The worst part is that often these are legal names, which means you do need to store them.
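As a small sketch of why that name hurts, the snippet below writes it with Python's csv module and then reads the line back twice: once by naively splitting on commas and once with a real CSV reader. The customer id 42 and the variable names are made up for the illustration.

```python
import csv
import io

name = "Company Foo, the One (which used to be Bar)"

# Write one row with a hypothetical customer id and the free-text name.
buf = io.StringIO()
csv.writer(buf).writerow([42, name])
line = buf.getvalue()

# Ad-hoc parsing: splitting on commas cuts the name in two and leaves
# the quote characters embedded in the pieces.
naive = line.strip().split(",")

# A real CSV reader honours the quotes and returns the name intact.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)
print(parsed)
```

The writer only quotes the field because it contains the delimiter; a hand-rolled join(",") would have silently produced a three-column row instead.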
When I worked for a FAANG this was one of my major annoyances as there was no canonical number for a customer, so everyone did string matching which broke whenever a company changed their name (like for instance, if they'd just gone public).