A remarkably odd bit of timing: I saw your initial announcement 6 weeks ago, bookmarked your site for a project I've been working on, and just tonight finally got to the stage where I'm ready to use Docverter. I went to the site and was puzzled, because I'd remembered that this was a paid service, and I couldn't figure out whether I'd bookmarked the wrong link or had gone crazy. Then I checked the GitHub repo and saw everything committed ~10 hours ago, and lastly, checked HN only to see the #1 story is your announcement. It usually goes in reverse order! :)
So anyway, a slightly longwinded way of saying sorry that it didn't work out as a business (though I was about to sign up!) and many many thanks for open-sourcing it. I'll be installing in the AM and am deeply grateful.
Another startup operating in this domain is they offer RESTful APIs for many different conversions and document manipulations, plus the basic plan is super cheap.
I have a feeling the parent works for Aspose, the site he is plugging. I use Aspose at work, so I find myself spotting these responses whenever something like this is brought up. Every time, an Aspose employee responds, plugging Aspose without mentioning that they work for them.
Aspose is pretty good, as it does not require Office interop to function. We've hit some limitations with it at work, such as dealing with PDF attachments and larger file sizes.
But anyway, yea. I just wanted to note that I think the parent poster works there. I made a similar response a while back and they actually Tweeted me in response to it, so I know they're on here. :)
Shameless plug: I have a web service similar to your project, but I also added Instagram-like filters for images, and it converts Word and PDF files to images. I'm adding HTML and PDF as output formats and HTML as an input format.
It's a pity that Docverter didn't work out (so far?). Have you tried broadening the range of formats you cover a bit? Obviously (as I'm also working in the same space) I think there is a real need to cover here. Good luck!
My advice, for what it's worth, and based on my experience building and running, would be to continue to offer it free (in your case, open sourced) but provide the hosted version as a service.
For example: I'd very gladly pay a small monthly fee to use your API for my invoicing system which I'm working on right now. I already have a load of open source tech I rely on, and some things just make more sense to pay for (e.g. using GitHub instead of self-hosting something like GitLab - it's a tiny monthly fee that saves me hours of hassle). I'd much rather use a hosted API that I can integrate in minutes than spend hours potentially faffing about with installing stuff from the repository, and worrying about keeping it up to date, explaining it to outsourcers/team members, etc.
If I do end up using your solution in my biz, I'll gladly donate - make sure you have donate buttons up and prominently displayed!
I've implemented a very nice Word-to-HTML converter previously, but market research has shown that people are barely willing to pay for such a service. Maybe related consulting services can make it worthwhile for you. Good luck!
I did talk to some companies who could use it to speed up SEC/EDGAR submissions, but they were trying to get it almost for nothing and still wanted customizations.
Customizations..? Is this SaaS or a standalone package?
If it's multiple companies and they all want the same thing, I would do it.
That said, it's all about business development. Gotta put your sales hat on! Tell them you can do customizations, but to match their price they'll have to commit to a recurring $y amount for at least z months, with the first month free to try it.
Of course I'm making a lot of assumptions about your willingness to add features and to give the first month free...gotta start somewhere!
Thanks for this. It boots pretty much out of the box on our platform thanks to the Heroku support. Very cool buildpack use, BTW.
Why use MS Office for that? I've built a file sharing site that uses LibreOffice amongst others to convert files to mobile and tablet friendly previews.
Best part of LibreOffice?
1) It's fast
2) You can call it via a simple command to convert a file to another format, so it's as easy as 1-2-3 to integrate in your code. Convert to HTML, PDF, whatever, you name it :)
3) It's free
Also, just check out pandoc. You'll love it. Abiword works too.
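For what it's worth, point 2 above can be sketched roughly like this in Python. It's a minimal sketch, not a definitive implementation: it assumes the LibreOffice binary is on your PATH as `soffice`, and the function names `build_convert_command` and `convert` are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_convert_command(src, target_format, outdir, binary="soffice"):
    """Build the argument list for a headless LibreOffice conversion.

    Assumption: the LibreOffice executable is reachable as `soffice`;
    on some systems it is installed as `libreoffice` instead.
    """
    return [
        binary,
        "--headless",                   # run without opening a GUI window
        "--convert-to", target_format,  # e.g. "pdf" or "html"
        "--outdir", str(outdir),        # directory for the converted file
        str(src),
    ]


def convert(src, target_format="pdf", outdir="."):
    """Run the conversion and return the path the output is expected at.

    Assumes target_format is a plain extension such as "pdf", so the
    output keeps the source file's stem with the new extension.
    """
    cmd = build_convert_command(src, target_format, outdir)
    # check=True raises CalledProcessError if LibreOffice exits non-zero;
    # the timeout guards against a hung conversion.
    subprocess.run(cmd, check=True, timeout=120)
    return Path(outdir) / (Path(src).stem + "." + target_format)
```

Calling `convert("report.docx", "pdf", "/tmp/out")` would shell out to LibreOffice and hand back the expected output path. Keeping the command builder separate from the call that runs it makes the integration easy to test without LibreOffice installed.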
Calibre [1] can do a reasonably good job on most types of PDF files, but a lot depends on the type of PDF file you want to convert. PDF is essentially a container format, and as expected, it can contain a whole lot of different types of data such as images, text, fonts, scripting, and much more. The results you'll get from Calibre (or any other conversion tool) will depend heavily on the types of data within the PDF file you want to convert, and also on what kind of output you want to generate.
Not really. The problem is that PDF is basically a destination format. Converting to PDF strips all of the semantics out of it, leaving you with plain text, fonts, and boxes. The latest versions of the official Adobe Acrobat Reader are able to convert PDF to Doc but I have no idea what the quality is like.
Every time I have used Acrobat to convert PDF to Word, the only usable parts have been the tables. The rest is generally garbage.
Fortunately, the tables were the only parts I wanted! I needed to get them from the PDF into text (csv) form. So, from Word, I copied the tables, pasted them into Excel, and saved that as csv. Easy as 1-2-3-4-5!