But humans rarely use notepad these days. They use code editors like Coda or VSC...

sfifs · on May 15, 2020

Let's say we have a product master. Something looks fishy and we want a brand person or product supply person to check... Push a CSV and it's trivially easy for them to open in XL and check, make corrections and send.

In a business context, this happens far more often than you may expect. Sure you can build a custom platform to validate and make people connect to a form connecting to a database - but Excel is a great user interface and everyone knows excel.

A funny incident - we were struggling to build a complex workflow in KNIME where at some points we need user input. Nothing out of the box was great - tools either assume a dashboard paradigm or a data flow paradigm - nothing In between.

One of our creative folks came up with the solution of writing to a CSV and getting KNIME to open it in excel. The user would make changes and save, close excel and continue the workflow in KNIME. Even completely non technical people got it.

hnick · on May 15, 2020

We had to ban our non-technical users from opening CSVs in Excel because it would strip leading zeros and do other exciting things like convert large numbers to 1.23123E+20 :(

It does this without any notification or warning when you save.

Tostino · on May 15, 2020

Kill me...i've dealt with flat file ETL for ~9 years now, with the first 2 pretty much dedicated to it. I hate hate hate Excel for no reason other than that. It has wasted so many hours due to customers messing up their identifiers by opening up their feeds in Excel prior to sending, and accidentally transforming like above, then having to fix changing customer identifiers used throughout the system once they figure out what happened.

Semaphor · on May 15, 2020

Excel can only display CSV files. I had to ban myself from editing files in Excel because it’s atrocious at it. It also likes to somewhat change decimal-seperators (again, without any warning).

Someone · on May 15, 2020

Even technical users have trouble round tripping data to Excel.

https://genomebiology.biomedcentral.com/articles/10.1186/s13...:

“A programmatic scan of leading genomics journals reveals that approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions.”

That paper is from 2016, at least 12 years after the problem was first identified.

mark-r · on May 15, 2020

East coast zip codes with leading zeros are a real problem.

mywittyname · on May 15, 2020

That behavior is technically correct because those are all valid representations of that value, e.g., 007449938 = 7449938 = 7.449938E6. Pretty much any plain text (or binary) data format will have this same issue, unless it supports additional metadata for display purposes.

If you intended the data to be a string value, then it should have been enclosed in quotes.

yiyus · on May 15, 2020

The problem is that when you have a number like 123451234512345, it may get rewritten as 1.234512E+14, and then you are loosing information.

hnick · on May 16, 2020

Your assertion sounded incorrect so I tested it. Quoting large numbers does not preserve them on a save.

Adding a ' before it should work, but for data extracted from arbitrary systems you'll have few guarantees of the sort. Formatting the cells in Excel to display the full value instead of the truncated value also works from memory, but you won't always know ahead of time if this happened later in the file.

In our case it was often mailing identifier barcodes so any loss of precision made them entirely useless.

rovolo · on May 15, 2020

What if the value is an ID rather than an integer? CSV doesn't specify whether the data type is a string or a number. The schema needs to be inferred by the parser/user.

recursive · on May 15, 2020

> If you intended the data to be a string value, then it should have been enclosed in quotes.

That's not part of any CSV specification I've seen, including RFC4180.

SifJar · on May 15, 2020

but if using the ASCII separators was common, excel would support that and it seems like this workflow would be unaffected?

paulryanrogers · on May 15, 2020

People still need to open them in non-excel apps. And the format may predate widespread use of spreadsheet software.

Fun fact, Excel will truncate numbers beyond 15 digits unless prefixed with a single quote mark.

the_angry_angel · on May 15, 2020

> But humans rarely use notepad these days. They use code editors like Coda or VSCode at the very least which have all kinds of advanced features. Surely, those can include support for ASCII separators?

I do a fair amount of work with companies that do "EDI" over CSV (or worse CSV-like - think 2 CSVs jammed together with different formats, no headers, no support for escaping or quoting) and fixed width documents. I can absolutely assure you that humans do open these files in notepad far more often than I'd like.

Often one of the main reasons they don't use things like X12, ASCII separators, etc. is because a "human needs to open it at some point" was a prevailing business decision some number of years ago (think "what happens if the IT system fails? how can we still ship stuff even in a complete emergency") and now it's baked into their documented process so deeply its like shouting into the wind to alter things. Third party warehouses are the worst at this.

roto · on May 15, 2020

Developers are by no means the only humans that use and/or edit CSV files.

beaconstudios · on May 15, 2020

Developers rarely use notepad. Non technical users still have to edit csvs by hand, all over the world.

C1sc0cat · on May 15, 2020

The average CSV use is as an export/import from excel and most of those users don't have specialised developer editors.

kube-system · on May 15, 2020

I have not seen very many business analysts open an IDE. In my experience, most edit CSVs in Excel.

JauntyHatAngle · on May 15, 2020

I don't think your first line is at all true.

Modifying Csvs with notepad is rife. I'd wager more than using any one particular code editor.

arethuza · on May 15, 2020

Or they use Excel, which can introduce its own entertaining errors.

A few years back while working on something that unavoidably used large quantities of CSV data we would sternly warn people not to use Excel, but people still did it ("But I just looked at it!").