It makes me cry how much time has been invested in formats like CSVs, TSVs, etc. ASCII (and UTF-8) has characters reserved for column, row, and even group separation. Just use them and save a lot of pain.
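To make the idea concrete, here's a minimal sketch (my own illustration, not a standard library) using the ASCII unit separator (0x1F) for columns and record separator (0x1E) for rows:

```python
# Sketch: serialize a table with ASCII control characters instead of , and \n.
# US (0x1F) separates fields ("units"), RS (0x1E) separates records;
# GS (0x1D) and FS (0x1C) remain free for higher-level grouping.
US, RS = "\x1f", "\x1e"

def dump(rows):
    return RS.join(US.join(fields) for fields in rows)

def load(blob):
    return [record.split(US) for record in blob.split(RS)]

table = [["name", "note"], ["Smith, J.", "line one\nline two"]]
blob = dump(table)
assert load(blob) == table  # commas and newlines pass through untouched
```

Note that commas and newlines in the values need no quoting at all, which is the whole appeal.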
2. More accurately, people can screw up TSV files from any editor. Have you not seen embedded spaces used instead of tabs, totally messing everyone up?
Like a lot of things, #1 is only partly true, particularly if you have embedded tabs.
The number of times I've had to deal with parsing errors because of embedded carriage returns, commas, tabs, etc. — sometimes costing millions of dollars — is just... upsetting.
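For what it's worth, a conforming RFC 4180-style CSV implementation does handle these values; it's the naive split-on-comma parsers that break. A quick sketch with Python's stdlib `csv` module:

```python
import csv
import io

# The kind of value that breaks naive parsers: embedded comma, quote, and CRLF.
row = ["ACME, Inc.", 'said "no"', "line one\r\nline two"]

buf = io.StringIO()
csv.writer(buf).writerow(row)
text = buf.getvalue()

# A proper RFC 4180-style reader round-trips it...
assert next(csv.reader(io.StringIO(text))) == row
# ...while a naive split on commas mangles it.
assert text.split(",") != row
```

The pain usually comes from one side of the pipeline using a real parser and the other side using string splitting.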
It's like JSON... we say everything can parse it, but really what we've got is some approximation that will come back to haunt us... and once you've done all the work to make sure everything is precise and correct, you'd really have saved time if you'd just used the tools we already had in the first place.
The flip side to that is that JSON and YAML parsers exist in every language, and would be more than capable of replacing any logic you'd find in a CSV.
Just use these formats if you want to be stuck in the last century when internationalization was that odd thing you could easily afford to ignore. Otherwise, use a real format.
3. If you don't know how to use your editor, maybe you should learn. If your editor can't insert even all the characters in basic ASCII, it's not an editor.
Go ahead and give every engineer who comes across your meta-delimited file a little rundown on how it works. "Can you show me how to import it into Google Sheets?" "I need to email it to someone, can you show me how to change it?" "My IDE says I need a plugin to read it? Do you know anything about that? Can you help me set it up?"
They'll all agree how clever and useful the meta characters are of course, but only after you've given them your time in learning about it. No thanks no thanks no thanks. For me I'd rather deal with a little bit of serialization headache than a support headache.
While that might have been a good idea if we'd started a while ago, at this point CSV/TSV are so much more established that switching to the dedicated separator characters doesn't actually save pain.
But wouldn’t you have to account for the possibility that those separators exist in the content you’re working with, putting you back at square one in terms of escaping?
That's entirely the point: 0x1C and 0x1E should never actually appear in "normal" text unless someone has explicitly put them there, which is not necessarily true of , \t or \n.
But, there is nothing to stop your values from containing these characters. So, you still have to escape your input. And once you've done that, you might as well just use csv / tsv.
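A tiny sketch of the failure mode (my own example): a single value that happens to contain the record separator silently corrupts the table unless you escape it first.

```python
RS, US = "\x1e", "\x1f"

# A value that happens to contain the record separator - e.g. pasted
# from another "separator-delimited" file - splits one record into two.
rows = [["id", "payload"], ["1", "blob with \x1e inside"]]
blob = RS.join(US.join(r) for r in rows)
parsed = [rec.split(US) for rec in blob.split(RS)]

assert parsed != rows       # round trip fails
assert len(parsed) == 3     # two records became three: escaping is still required
```

Once you've added an escaping scheme, the scheme is doing all the work and the choice of delimiter byte is incidental.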
For some reason, in some circles, it seems to be semi-standard to use þ (0xFE, aka thorn) as the delimiter and 0x14 (aka DC4, aka ^T) as the separator; 0x14 displays as a paragraph mark (¶) in some legacy code pages such as CP437, which is not to be confused with the actual pilcrow at 0xB6.
Anyway, these characters are presumably not going to occur in ordinary text.
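This is the convention you see in legal e-discovery load files (Concordance-style DAT, as I understand it): 0x14 separates fields and each value is wrapped in thorns. A minimal sketch of one plausible reading, not a full implementation:

```python
# Sketch of the thorn-delimited convention as I understand it:
# 0x14 separates fields and each field is wrapped in þ (0xFE / U+00FE).
DELIM, THORN = "\x14", "\u00fe"

def parse_line(line):
    # Naive: assumes every field is thorn-wrapped and contains neither byte.
    return [field.strip(THORN) for field in line.split(DELIM)]

line = "\u00feDOC001\u00fe\x14\u00feSmith, J.\u00fe"
assert parse_line(line) == ["DOC001", "Smith, J."]
```

Same caveat as before applies: this only works as long as neither character appears inside a value.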
The whole "upper half" of the 8-bit range can occur in ordinary text in pre-Unicode encodings.
0xFE is a good example - you may get a customer or employee from Iceland with that character in their name (e.g.
https://en.wikipedia.org/wiki/Haf%C3%BE%C3%B3r_J%C3%BAl%C3%A...), or data in Cyrillic cp1251 or koi8-r encoding, where 0xFE also represents characters that you'll encounter in surnames, etc.
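You can see the ambiguity directly: the same byte decodes to a different letter under each of those 8-bit encodings, so it's anything but "safe" as a delimiter.

```python
# The byte 0xFE under three common pre-Unicode 8-bit encodings.
b = bytes([0xFE])

assert b.decode("latin-1") == "\u00fe"  # þ - Icelandic thorn
assert b.decode("cp1251") == "ю"        # Cyrillic small letter yu
assert b.decode("koi8-r") == "Ч"        # Cyrillic capital letter che
```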
ISO 2022 aka ECMA-35¹ standardizes how to use it in general, but the only thing that really caught on is a subset of the terminal control extensions (ISO 6429 aka ECMA-48² aka ‘ANSI’). In the hypothetical alternate universe where people use ASCII-based structured data, ISO 2022 would have standard sequences for such metadata.
If you were rolling your own today, you'd probably wrap the metadata in Application Program Command (ESC _ … ESC \) or Start Of String (ESC X … ESC \) or one of the private-use sequences.
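For illustration, such a wrapper is trivial to construct — here's a sketch using the ECMA-48 APC opening (ESC `_`) and String Terminator (ESC `\`); the `charset=` payload is just a made-up example, not anything standardized:

```python
ESC = "\x1b"
APC = ESC + "_"   # Application Program Command (ECMA-48)
ST = ESC + "\\"   # String Terminator

def wrap_metadata(payload):
    # Hypothetical: embed out-of-band metadata in an APC control string,
    # which conforming terminals/parsers are meant to skip over.
    return f"{APC}{payload}{ST}"

assert wrap_metadata("charset=utf-8") == "\x1b_charset=utf-8\x1b\\"
```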
¹ https://www.ecma-international.org/publications/standards/Ec...