Hacker News new | past | comments | ask | show | jobs | submit login

We had to ban our non-technical users from opening CSVs in Excel because it would strip leading zeros and do other exciting things like convert large numbers to 1.23123E+20 :(

It does this without any notification or warning when you save.




Kill me...i've dealt with flat file ETL for ~9 years now, with the first 2 pretty much dedicated to it. I hate hate hate Excel for no reason other than that. It has wasted so many hours due to customers messing up their identifiers by opening up their feeds in Excel prior to sending, and accidentally transforming like above, then having to fix changing customer identifiers used throughout the system once they figure out what happened.


Excel can only display CSV files. I had to ban myself from editing files in Excel because it’s atrocious at it. It also likes to somewhat change decimal-seperators (again, without any warning).


Even technical users have trouble round tripping data to Excel.

https://genomebiology.biomedcentral.com/articles/10.1186/s13...:

“A programmatic scan of leading genomics journals reveals that approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions.”

That paper is from 2016, at least 12 years after the problem was first identified.


East coast zip codes with leading zeros are a real problem.


That behavior is technically correct because those are all valid representations of that value, e.g., 007449938 = 7449938 = 7.449938E6. Pretty much any plain text (or binary) data format will have this same issue, unless it supports additional metadata for display purposes.

If you intended the data to be a string value, then it should have been enclosed in quotes.


The problem is that when you have a number like 123451234512345, it may get rewritten as 1.234512E+14, and then you are loosing information.


Your assertion sounded incorrect so I tested it. Quoting large numbers does not preserve them on a save.

Adding a ' before it should work, but for data extracted from arbitrary systems you'll have few guarantees of the sort. Formatting the cells in Excel to display the full value instead of the truncated value also works from memory, but you won't always know ahead of time if this happened later in the file.

In our case it was often mailing identifier barcodes so any loss of precision made them entirely useless.


What if the value is an ID rather than an integer? CSV doesn't specify whether the data type is a string or a number. The schema needs to be inferred by the parser/user.


> If you intended the data to be a string value, then it should have been enclosed in quotes.

That's not part of any CSV specification I've seen, including RFC4180.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: