Importing data from Excel shouldn't be this painful.
I'd love to see a service that takes in a user's mangled spreadsheet and some regex validation for each column and spits back perfect JSON-formatted data, walking the user through corrections along the way (i.e., intelligently guessing column mappings, highlighting malformed cells, column joins, lower/upper/title casing, and string substitutions). Something that could be integrated in a line or two of JavaScript would be fantastically valuable (especially if that "$100,000 in engineering time" heuristic is accurate).
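To make the "regex validation for each column" part concrete, here is a minimal sketch in Python rather than JavaScript. The `RULES` table, the `validate_rows` helper, and the sample rows are all hypothetical, invented for illustration; a real service would also need the interactive correction step, which this does not attempt.

```python
import re

# Hypothetical per-column rules: a regex each cell must fully match,
# plus a normalizer (casing, trimming) applied before validation.
RULES = {
    "email": {"pattern": r"[^@\s]+@[^@\s]+\.[^@\s]+", "normalize": str.lower},
    "name": {"pattern": r"[A-Za-z' -]+", "normalize": str.title},
    "zip": {"pattern": r"\d{5}", "normalize": str.strip},
}


def validate_rows(rows, rules):
    """Return (clean_rows, errors); each error is (row_index, column, raw_value)."""
    clean, errors = [], []
    for i, row in enumerate(rows):
        out, ok = {}, True
        for col, rule in rules.items():
            raw = (row.get(col) or "").strip()
            value = rule["normalize"](raw)
            if re.fullmatch(rule["pattern"], value):
                out[col] = value
            else:
                # Keep the raw value so a UI could highlight the bad cell.
                errors.append((i, col, raw))
                ok = False
        if ok:
            clean.append(out)
    return clean, errors


rows = [
    {"email": "Alice@Example.COM ", "name": "alice smith", "zip": "12345"},
    {"email": "not-an-email", "name": "bob", "zip": "1234"},
]
clean, errors = validate_rows(rows, RULES)
```

The first row comes back normalized; the second is held back, with its two failing cells reported by row and column so the user can be asked to fix them.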
This would be particularly useful if you could somehow do it in such a way that the service doesn't need to actually be in possession of the CSV/Excel/etc data at any point. (That would be a non-starter for privacy reasons at many companies.)
Would I have paid, oh, $500 a month for this? Heck yes. I would have paid for it on day #1 and continued paying for it for each of the last 4 years, and it would still be cheap at the price.
Huh. That'd be tricky, but not impossible. You'd have to do everything in the browser, which would limit the size of the spreadsheet you could import (it would have to fit in memory), and may limit browser support. It'd also be harder to productize since the secret sauce is now just JavaScript.
I really want this to exist, though. Maybe it could be open sourced and survive on your proposed enterprise licenses. Hmm...
We built this for internal use within a specific app/process last year, and it was so useful we converted it into a generic uploader that devs could use any time they needed to upload CSV/Excel data.
A while back I built something like that for my old employer. It did most of the work on the server, but it did a good job of guessing column mappings, highlighted bad cells and let the user fix them, checked for dupes, and all the validations could be configured with a little XML, including multi-column constraints. I used Aspose for the Excel import.
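For a rough idea of what "guessing column mappings" can look like, here is a small sketch using fuzzy string matching from Python's standard library. This is not how the tool described above worked (that isn't stated); `guess_mapping`, the sample headers, and the 0.6 cutoff are assumptions chosen for illustration.

```python
import difflib


def guess_mapping(headers, expected, cutoff=0.6):
    """Map each expected field to the closest uploaded header, or None."""
    # Normalize uploaded headers for comparison, remembering the originals.
    normalized = {h.strip().lower().replace("_", " "): h for h in headers}
    mapping = {}
    for field in expected:
        match = difflib.get_close_matches(
            field.lower(), list(normalized), n=1, cutoff=cutoff
        )
        mapping[field] = normalized[match[0]] if match else None
    return mapping


uploaded_headers = ["E-mail", "First_Name", "Phone #"]
mapping = guess_mapping(uploaded_headers, ["email", "first name", "zip"])
```

Fields that come back as `None` are the ones a UI would need to ask the user about, which is where the "let the user fix them" step comes in.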
I've thought about making an updated version, maybe as a hosted service, if I can come up with something different enough so my old employer can't claim ownership. I did a little test marketing a while back and it didn't go that well, but maybe it just needs more experimentation.
This was the premise behind Spreadsheet.io (founder here). I wrote the xls/xlsx/csv/tsv pipeline parser that converts to JSON. I also wrote a native Excel add-in that embeds a JavaScript runtime / REPL for applying JS scripts against local files: using scripts to extract, clean up, and integrate data, etc.
Currently, it's sitting on my local machine collecting bitrot. Thinking about open sourcing it.
This sounds exactly like what SheetJS does. Where are the gaps between what patio posted about and what you are asking for? Just the JSON output and working as a service part?
Is such a thing really possible? To convert CSV (flat data) to JSON (hierarchical data) you need to know the hierarchy. When you convert from JSON -> CSV you lose that information, and to get it back converting CSV -> JSON you need some sort of out-of-band schema information. Otherwise you will just end up with "flat" JSON, which is not better than CSV.
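One cheap form of that out-of-band schema is a naming convention in the header row. The sketch below assumes dotted column names such as `address.city` encode the nesting; the convention, the `nest` helper, and the sample data are all made up for illustration, and arrays or repeated groups would need more than this.

```python
import csv
import io


def nest(flat, sep="."):
    """Turn {'a.b': 1, 'c': 2} into {'a': {'b': 1}, 'c': 2}."""
    out = {}
    for path, value in flat.items():
        *parents, leaf = path.split(sep)
        node = out
        for part in parents:
            # Walk (or create) the intermediate objects along the path.
            node = node.setdefault(part, {})
        node[leaf] = value
    return out


CSV_TEXT = "name,address.city,address.zip\nAda,London,N1\n"
records = [nest(row) for row in csv.DictReader(io.StringIO(CSV_TEXT))]
```

Without some such convention or a separate schema, the same file would only ever give you one flat object per row, which is the point being made above.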
Importing spreadsheets with column headings and data rows is bad enough - my biggest struggle is with spreadsheets used as fillable forms. I've spent tons of time working on generalizable tools for extracting this kind of semi-structured data but in the end each group of files requires a lot of custom work.