> I simply didn’t feel right getting that amount of value for free from two projects which are run by very small teams, so I approached both and convinced them to sell me an enterprise license to their project. It is equivalent to the usual OSS license, except it comes with an invoice.
That's really cool of you, patio11! Most open source stuff I've worked on... I'm happy if I get a "thanks!" from time to time. Sure, it has other benefits, and I wouldn't be where I am today without open source, but handing out some actual cash is very classy.
I wonder if more companies would really consider doing this, though - handing over money that you don't have to is not something I've seen a lot of. I've worked for people who don't even want to let the world know they're using various bits of open source software, let alone contribute back anything.
Also, there is a concern that money can really change the dynamics of a community, but by and large, I'd rather see a lot more money funneled to open source than there currently is. For instance:
Convincing businesses to pay for OSS is a frustrating problem. I built a gem[1] that has had almost 5000 downloads, 89 since Monday night (rough proxy for production users), related to payments. Exactly zero people have had any interest in paying for the commercial license I offer, even though the gem is directly related to payments.
The pro offering is probably not quite where it needs to be, but to have zero interest at all is pretty discouraging.
Off topic: This is literally the first gem/package/piece of software I've seen distributed in this way that has sales tax attached to it for residents of a certain state. Is there some Michigan law pertaining to sales tax on software?
My business offers consultancy-based software development, so we are not bound by my state's sales taxes; I was just curious whether Michigan has a software-specific law.
It's terrible. I'm a resident of Michigan and so I have to remit sales tax on "packaged software". The rules for what constitutes "packaged software" recently got changed to include anything downloadable. SaaS is specifically excluded, of course.
And that just covers the very specific case of giving an existing project money, and does not cover things like Jetbrains' program that gives free licenses for open source work, Google's Summer of Code program, and various organizations contributing code back to an OSS project.
What a great tool! I think one challenge putting this into production will be all the "extra" stuff people put into spreadsheets:
- one or more titles on top ("Patient Database", "current as of Jan 5, 2014", etc)
- data in a bunch of different sheets.
- multiple types of "stuff" all packed into one sheet, maybe separated by a few blank rows or maybe a summary pivot table pasted in the top-left corner.
It's easy to handle that if you are pre-processing the spreadsheets yourself, but getting to where non-programmers can prep a spreadsheet for uploading seems hard. How are they supposed to know that column headers belong on the first row, that each row of data should have the same "kind" of thing, etc.?
Anyway, good luck and thank you for the great writeup!
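For the "titles on top" case listed above, here is a minimal sketch of one possible heuristic, in Python. It assumes the sheet has already been read into a list of rows of strings, and the rule it uses (first row with several non-empty, non-numeric cells that is followed by another populated row) is only a guess at what "looks like a header"; `find_header_row` and `min_columns` are names made up for this example.

```python
def _filled(row):
    """Return the non-empty cells of a row as stripped strings."""
    return [str(c).strip() for c in row if c is not None and str(c).strip()]


def _is_number(text):
    """True if the text parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def find_header_row(rows, min_columns=2):
    """Guess the index of the header row, or None if nothing qualifies.

    Heuristic (an assumption, not a rule): skip rows with fewer than
    `min_columns` filled cells (titles, blank spacers), skip rows that
    contain numbers, and require the next row to be populated too.
    """
    for i, row in enumerate(rows[:-1]):
        cells = _filled(row)
        if len(cells) < min_columns:
            continue
        if any(_is_number(c) for c in cells):
            continue
        if len(_filled(rows[i + 1])) >= min_columns:
            return i
    return None


rows = [
    ["Patient Database", "", ""],
    ["current as of Jan 5, 2014", "", ""],
    ["", "", ""],
    ["Name", "Phone", "Appointment"],
    ["Alice", "555-0100", "2014-01-07"],
]
header_index = find_header_row(rows)
```

This will not help with several tables packed into one sheet or a pivot table pasted in the corner; those probably still need a human to look at the file.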
One of the commenters on the article suggested sending the customer an "import template," with all of the column headers predefined. The customer simply needs to copy-paste their data into your CSV import template, and upload it.
Never tried it myself, but it sounds like a decent solution to the problem of wacky formatting.
I tried it. Customers pretty frequently made their own spreadsheets that looked mostly like the template, but not quite. I was constantly fixing their uploads.
I built a system to validate their uploads and let them fix their errors on a website, and all that hassle went away.
(I still gave them the template so they'd know what it was supposed to look like, I just didn't rely on that alone.)
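A rough sketch of the validate-and-report idea described above, in Python. The template columns and regexes are invented for illustration and are not taken from anyone's actual system; the point is only that each problem is reported with a row and column so the user can fix it themselves.

```python
import re

# Hypothetical template: column name -> pattern the whole cell must match.
TEMPLATE = {
    "name": re.compile(r".+"),
    "phone": re.compile(r"\d{3}-\d{4}"),
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def validate_upload(header, rows):
    """Return a list of problems; an empty list means the upload looks fine.

    Row 0 refers to the header row, data rows are numbered from 1.
    """
    errors = []
    normalized = [h.strip().lower() for h in header]

    # Report template columns that the upload does not have at all.
    for column in TEMPLATE:
        if column not in normalized:
            errors.append({"row": 0, "column": column, "problem": "missing column"})

    # Check every cell that belongs to a known column.
    for row_number, row in enumerate(rows, start=1):
        for column, value in zip(normalized, row):
            pattern = TEMPLATE.get(column)
            if pattern and not pattern.fullmatch(value.strip()):
                errors.append({
                    "row": row_number,
                    "column": column,
                    "problem": f"unexpected value {value!r}",
                })
    return errors


errors = validate_upload(
    ["Name", "Phone ", "Date"],
    [["Alice", "555-0100", "2014-01-07"], ["Bob", "5550100", "Jan 7"]],
)
```

A web front end would then show these errors next to the offending cells and let the user edit and resubmit.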
Actually some of that is not automatable, so this cannot by its nature be a complete solution, but with heuristics -- ugh -- it worked. Months and months of effort to get it to work, and we didn't even have the general public as customers, just a small set of B2B companies.
Hey Patrick! I'm a huge fan of your work, and really enjoy reading your blog. I have one question though.
AR is HIPAA compliant, which implies that there is (medically) sensitive information hitting your servers. Why is it not an issue for you and your support agents to actually see that data yourselves (as you would when manually fixing CSV errors)? If your seeing this data doesn't violate the letter of HIPAA, surely the ethical impetus behind the act would prevent you from doing so?
I have spoken all the eldritch rituals which legally permit a doctor to share patient information with me personally as long as they have a contract with my name signed in blood on it.
Just kidding. It isn't actually that bad. Appointment Reminder is a "Business Associate" of Happy Teeth Dental. I'm its HIPAA compliance officer, attend a yearly training session, have been threatened with the most severe of sanctions if I misused patient data, see only the data required for my job, and have my name and access rights recorded in a spreadsheet ready to be audited (along with my access logs). That's probably half of the list. Clearly HIPAA can't completely ban non-doctors from seeing medical data or the entire medical sector grinds to a halt, right?
With regards to support agents, some people at the company are approved for access and some are not. The system enforces access rights, naturally.
HIPAA does not forbid e.g. Patrick from viewing or working with medically sensitive data. If it did, it would effectively prevent any medical software or services from operating at all.
HIPAA does however have an awful lot to say about what can and cannot be done with this data, how it must be handled, who it can and cannot be divulged to, and so on. For example, when and where it must be encrypted, how its use must be audited, etc.
It is in some ways like PCI compliance. All parties handling sensitive medical/financial data on your behalf have to follow certain secure practices, or risk facing steep fines and legal action.
This question seems a bit odd if Patrick viewing the file doesn't actually violate HIPAA (and IANAL, but I believe it doesn't necessarily). What is "the ethical impetus behind the act" you're referring to here?
I had a discussion with the author of sheetjs about a year back as a response to my comment[1] on VBA and Excel. We went back and forth on building a kickstarter campaign - especially for his "transpiler" that converts VBA to JS.
I'm glad to see this has come so far. My offer to contribute to a kickstarter still stands!
Importing data from Excel shouldn't be this painful.
I'd love to see a service that takes in a user's mangled spreadsheet and some regex validation for each column and spits back perfect JSON-formatted data, walking the user through corrections along the way (i.e., intelligently guessing column mappings, highlighting malformatted cells, column joins, down/up/title casing, and string substitutions). Something that could be integrated in a line or two of javascript would be fantastically valuable (especially if that "$100,000 in engineering time" heuristic is accurate).
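To make "guessing column mappings" a bit more concrete, here is a minimal Python sketch using fuzzy string matching from the standard library's `difflib`. The expected column names and the `cutoff` value are assumptions picked for the example; a real service would presumably also look at the cell contents, not just the header text.

```python
import difflib


def guess_column_mapping(uploaded_headers, expected_columns, cutoff=0.6):
    """Map each uploaded header to the closest expected column, or None.

    Uses difflib.get_close_matches on the lowercased header text. Headers
    with no match above `cutoff` map to None so a human can decide.
    """
    mapping = {}
    for header in uploaded_headers:
        key = header.strip().lower()
        matches = difflib.get_close_matches(key, expected_columns, n=1, cutoff=cutoff)
        mapping[header] = matches[0] if matches else None
    return mapping


# Hypothetical target schema and a typical "almost right" upload.
expected = ["first name", "last name", "phone number", "email"]
mapping = guess_column_mapping(["First Name", "Phone No.", "E-mail", "Notes"], expected)
```

The guesses would be shown to the user as pre-filled defaults that they can confirm or correct, rather than applied silently.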
This would be particularly useful if you could somehow do it in such a way that the service doesn't need to actually be in possession of the CSV/Excel/etc data at any point. (That would be a non-starter for privacy reasons at many companies.)
Would I have paid, oh, $500 a month for this? Heck yes. I would have paid for it on day #1 and continued paying for it for each of the last 4 years, and it would still be cheap at the price.
Huh. That'd be tricky, but not impossible. You'd have to do everything in the browser, which would limit the size of the spreadsheet you could import (it would have to fit in memory), and may limit browser support. It'd also be harder to productize since the secret sauce is now just javascript.
I really want this to exist, though. Maybe it could be open sourced and survive on your proposed enterprise licenses. Hmm...
We built this for internal use within a specific app/process last year, and it was so useful we converted it into a generic uploader that devs could use any time they needed to upload CSV/Excel data.
A while back I built something like that for my old employer. It did most of the work on the server, but it did a good job of guessing column mappings, highlighted bad cells and let the user fix them, checked for dupes, and all the validations could be configured with a little xml, including multi-column constraints. I used Aspose for the excel import.
I've thought about making an updated version, maybe as a hosted service, if I can come up with something different enough so my old employer can't claim ownership. I did a little test marketing a while back and it didn't go that well, but maybe it just needs more experimentation.
This was the premise behind Spreadsheet.io (founder here). I wrote the xls/xlsx/csv/tsv pipeline parser that converts to JSON. Also wrote a native Excel add-in that embeds a JavaScript runtime / REPL for applying JS scripts against local files. Using scripts to extract, clean up and integrate data, etc.
Currently, it's sitting on my local machine collecting bitrot. Thinking about open sourcing it..
This sounds exactly like what SheetJS does. Where are the gaps between what patio posted about and what you are asking for? Just the JSON output and working as a service part?
Is such a thing really possible? To convert CSV (flat data) to JSON (hierarchical data) you need to know the hierarchy. When you convert from JSON -> CSV you lose that information, and to get it back converting CSV -> JSON you need some sort of out-of-band schema information. Otherwise you will just end up with "flat" JSON, which is not better than CSV.
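The point about out-of-band schema information can be sketched in a few lines of Python. Here the schema is a hand-written dict mapping each flat column to a path in the nested output; the column names and `nest_record` are hypothetical, and columns without a schema entry simply stay flat.

```python
def nest_record(flat_row, schema):
    """Rebuild a nested dict from a flat row using an explicit schema.

    `schema` maps a flat column name to a tuple of keys describing where
    the value lives in the nested output. Unknown columns stay top-level.
    """
    nested = {}
    for column, value in flat_row.items():
        path = schema.get(column, (column,))
        node = nested
        # Walk (and create) the intermediate dicts along the path.
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return nested


# The hierarchy lives here, not in the CSV itself.
SCHEMA = {
    "patient_name": ("patient", "name"),
    "patient_phone": ("patient", "phone"),
    "appointment_date": ("appointment", "date"),
}

row = {
    "patient_name": "Alice",
    "patient_phone": "555-0100",
    "appointment_date": "2014-01-07",
    "notes": "new patient",
}
record = nest_record(row, SCHEMA)
```

Without `SCHEMA`, every column would fall through to the top level and the output would be exactly the "flat" JSON described above.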
Importing spreadsheets with column headings and data rows is bad enough - my biggest struggle is with spreadsheets used as fillable forms. I've spent tons of time working on generalizable tools for extracting this kind of semi-structured data but in the end each group of files requires a lot of custom work.
I haven't heard of SheetJS before, but now I will definitely look into it! I have recently started using Handsontable though and it's absolutely fantastic.
That's really cool of you, patio11! Most open source stuff I've worked on... I'm happy if I get a "thanks!" from time to time. Sure, it has other benefits, and I wouldn't be where I am today without open source, but handing out some actual cash is very classy.
I wonder if more companies would really consider doing this, though - handing over money that you don't have to is not something I've seen a lot of. I've worked for people who don't even want to let the world know they're using various bits of open source software, let alone contribute back anything.
Also, there is a concern that money can really change the dynamics of a community, but by and large, I'd rather see a lot more money funneled to open source than there currently is. For instance:
https://twitter.com/antirez/status/557851219088375808