I absolutely hate Excel. It's not the number crunching that's the problem. Database -> Excel, Excel -> Database issues are almost unavoidable. In 2 months of working with SSIS packages, I encountered every one of these problems (I'm not the original author of this rant):
"Anyone who has worked with a database in a professional capacity for more than 20 minutes should have a list of at least 10 reasons why Excel is a monster. These probably include:
1. The way it butchers postal codes that start with a leading zero, like the town I grew up in (Granby, MA 01033 USA)
2. Dates of any kind
3. Serial numbers that have leading 0's (see #1)
4. The JET database driver for Excel. One large WTF.
5. SQL Server Integration Services Excel datasource. WTF squared.
6. The f-ing "just put an apostrophe" workaround. WTF.
6. a. The equally effective "format as text before you paste" workaround. Gives the illusion of working, only to break later.
7. Save as CSV, then reopen the CSV in Excel. Lots of magical things happen there.
8. While on the topic, CSV files, which are a whole WTF on their own.
The root of all this: Excel makes things that look like tables, and tables are useful for data. There is no other program that is as widespread AND makes things that look like tables, so people use Excel to make tables of data. And it's in fact really, really bad at that. It was designed for ad-hoc numerical analysis and got appropriated as a database loading and reporting tool.
I think it's actually damaged the GNP of whole nations, this Excel program. It'd be interesting to know how badly."
Really, the main problem is that Excel (which has data types) is trying to support CSV (which has no data types) as a pseudo-native format. If Excel forced CSV files through the import wizard, and you could override a data type for each column, it would solve most of the issues. Instead, each column is implicitly treated as Auto and that fails in a lot of cases.
"The way it butchers postal codes that start with a leading zero, like the town I grew up in (Granby, MA 01033 USA)"
Excel, by default, treats any numerical object as a number. Numbers don't have significant leading zeroes. You can change the default data type, i.e. "format", to text or even postal code to preserve leading zeroes.
I find that does not always seem to work. We get CSVs with telephone numbers, and depending on the length, they end up being shown using E notation, even after selecting text formatting.
Why not run the CSV file through a custom made app that "cleans" each line in a proper format that will easily be imported into Excel, taking into account Excel's quirks. The app then spits out another CSV that Excel is able to properly import.
I find this solution a good compromise, although it may not be possible to do it.
Is this what you do? Because if someone sends me a csv file, under your workflow I'd have to open it in a text editor (or a different spreadsheet program) to figure out the data type of each column and run it through a perl script (or awk?). For a few columns, sure. For 200 columns, not ideal. Just to get around using excel's impmort directly. Plus, "taking into account Excel's quirks" is easier said than done.
That said, if you have a script you use that does this, you'd make a lot of friends if you posted a link.
I only do this for automated processes where I know the exact requirements of the CSV and the data is large. I definitely wouldn't do it for ad-hoc type of scenarios.
But for some things it just easier to put out an Excel file. In one scenario I have a set of complext spreadsheets that are updated nightly. I use EPPlus (http://epplus.codeplex.com/) a C# library which updates the appropriate data in the spreadsheet.
In an other scenario, I am taking in transactions (accounts payable/receivable, general ledger,etc.) that are in CSV format, applying some business rules and inserting them into a spreadsheet. This spreadsheet is then used by the accountants to do postbacks to the actual ERP system. I looked at doing this using the ERP's own batch interfaces and couldn't justify the time and expense as Excel was the best way to get the data in.
The above library doesn't require Excel on the machine. With Microsoft moving to an XML format for the file, it's made it much much easier to do these things. This particular library works well as long as you are doing simple data updates. There is no ambiguity in terms of the type of the data as you are able to explicitly state what is stored in the cell.
I would love to know if other library like this exist for Ruby or even Python.
If I'm going to write a custom app in some external programming language to consume and do type-aware processing on the CSV file, I might as well do whatever I analysis I need to do in that app, rather than just using it to reprocess CSV into CSV and then go back to working in Excel.
Its not like any of the languages I would use to do this (Python, Ruby, etc.) don't have easy-to-use libraries to treat the CSV as a data structure and perform analytic operations once I've spent the (minimal) effort to load it and do any basic transformations that I need to do based on the intended semantics of the data in each column.
we have to do cleanup of CSV imports in some areas of LedgerSMB and it isn't pretty.
Additionally the fact that it is even required makes you wonder what sort of monsters are lurking in the shadowy corners only to leap out later when you least expect it.
"Numbers don't have significant leading zeroes" is a rather daring statement. I guess you mean 'the western format of cardinal real numbers includes no leading zeroes,' but that assumes you are speaking of cardinal numbers, which are just some of them. 01033 is as much of a number as 1033.
Sorry for being nitpicky but do not confuse numbers with their representations. You would be amazed at the weird representations humans have used along history.
What a nightmare. Do they not realize how many DB tables start with 'ID'? And CSV is too common a format to just completely ignore. It's not always up to us.
Applies To:
- Microsoft Excel 2004 for Mac
- Microsoft Excel 2001 for Mac
- Microsoft Excel X for Mac
Sure, it sucks that it was there at all, but the most recent mentioned is a 9 year Mac version of the software (which is written by a different team than does the Windows Office anyways).
Yes! This ridiculous bug is still out there and is why I never use 'ID' when designing my schemas. Seriously, why can't reserved words be designed to be so uncommon, you'll never have a conflict? If I see another 'klass' object in Python or email broken because someone started a sentence with 'From' I'm going to cry.
Use LibreOffice calc to import the CSV then save to Excel format. No kidding, Open/LibreOffice gives you (among others) the option to chose the type of each column at import.
> 7. Save as CSV, then reopen the CSV in Excel. Lots of magical things happen there.
Indeed. My favorite one is accounting spreadsheets tending to export accounting numbers as text columns with trailing whitespace. That was off a recent version of excel for the Mac. I understand it's a cute convention for currency formatting but it makes data transformation in a database very, very annoying.
"Anyone who has worked with a database in a professional capacity for more than 20 minutes should have a list of at least 10 reasons why Excel is a monster. These probably include:
1. The way it butchers postal codes that start with a leading zero, like the town I grew up in (Granby, MA 01033 USA)
2. Dates of any kind
3. Serial numbers that have leading 0's (see #1)
4. The JET database driver for Excel. One large WTF.
5. SQL Server Integration Services Excel datasource. WTF squared.
6. The f-ing "just put an apostrophe" workaround. WTF.
6. a. The equally effective "format as text before you paste" workaround. Gives the illusion of working, only to break later.
7. Save as CSV, then reopen the CSV in Excel. Lots of magical things happen there.
8. While on the topic, CSV files, which are a whole WTF on their own.
9. The Jet database driver's "type guess rows" registry entry. WTF factorial.
The root of all this: Excel makes things that look like tables, and tables are useful for data. There is no other program that is as widespread AND makes things that look like tables, so people use Excel to make tables of data. And it's in fact really, really bad at that. It was designed for ad-hoc numerical analysis and got appropriated as a database loading and reporting tool.
I think it's actually damaged the GNP of whole nations, this Excel program. It'd be interesting to know how badly."