Nice to provide hardware hints and designs but geez that is almost the least of ...

BeetleB · on June 1, 2021

Back in 2012, there was a guy who started an open source project that did exactly this - he wrote it specifically for the DIY Book scanner. It had a local Django project as the interface. I don't remember the details, but it did a decent job of taking the images, OCR'ing them and creating an output PDF.

I believe he abandoned the project some years later as life got busy and he never found enough volunteers to help him.

Would have to go through my email records to find the name of the project.

markvdb · on June 2, 2021

Spreads[0] is probably what you are referring to. Some backstory:

I saw the diybookscanner community (which at that point mostly had Daniel Reetz [1] as its active contributor) struggle with mechanical contraptions for triggering cameras, with very little software experience to draw on. I built a simple proof of concept to reliably trigger cheap consumer cameras from software. It was based on CHDK [2], the Canon Hack Development Kit, an alternative firmware for Canon's consumer cameras. The proof of concept worked.

I then had a fairly large number of book scanner kits built and shipped, mostly around the EU [3]. It was more a labour of love than a business, really, even if it formally sat under an LLC umbrella. Johannes was initially just a customer. He wanted to build a better software solution and, in the spirit of the project, did so as free software. I tried to support him in this as well as I could: setting up build infrastructure, trying to reel in more people, getting him some cameras to test with, getting the amazing CHDK people to port to new camera models, ...

Then real life did indeed intervene.

Johannes, if you read this, I'm still grateful for the experience of having worked with a great developer like you!

[EDIT] And of course, I should also mention Dan Reetz's incredibly inspiring work bootstrapping an amazing open hardware project! Hats off!

[0] https://github.com/DIYBookScanner/spreads

[1] https://danreetz.com/

[2] https://chdk.fandom.com/wiki/CHDK

[3] http://diybookscanner.eu

daniel_reetz · on June 2, 2021

Hi Mark! We had electronic and USB triggering working with SDM and CHDK before you joined, but no image transfer or control of settings over USB. We deliberately pursued mechanical triggering for places where a computer and crazy firmware weren't an option. I donated quite a few scanners to projects and people who simply couldn't use that stuff at the digitization site.

Johannes (spreads) was one of the most inspiring people I've ever worked with, and I'm so thankful for the energy and intellect he brought to the project, and for the software he built. I donated a pair of DSLRs to him as a thank-you. Last I heard he was still working in a related space, but at a higher level.

Personally, I left the project to join Apple (they refused to let me continue any work on the open-source project while I was employed there) and gave Jonathon (tenrec) control. He redesigned the scanner again and sold kits, and also produced a Raspberry Pi based controller with nice software. It seems he has since closed the store.

jbaiter · on June 2, 2021

Hey Mark, hey Daniel, Johannes here, thanks for the praise, you're making me blush :D. The inspiration was mutual, I remember the time working on spreads and with you guys very fondly, learned a lot from it.

I'm still active in the "digitizing books" sector, albeit now officially employed at a library and more concerned with what we can do with the books after they're scanned :-)

mdaniel · on June 2, 2021

> (they refused to let me continue any work on the open-source project while I was employed there)

I somehow thought that was illegal for residents of California, and IMHO it should be illegal nationwide on general "not indentured servitude" grounds. Then again, who wants to go up against Apple's legal team to find out?

markvdb · on June 2, 2021

Hello Dan! Nice to see you here!

Rereading my previous comment, I realise that part of it could be misinterpreted. The diybookscanner project was wonderful to be part of, to contribute to.

One thing I realise was particularly impressive about diybookscanner.org was how much time you spent making it an environment friendly to broad experimentation and tinkering. Exploring broadly was absolutely necessary for a project like this. Mechanical triggering, the SDM experiments I had forgotten about, lighting, glass experiments, and more.

You sowed some powerful seeds. Your effort nursed diybookscanner.org into something that still speaks to the imagination of so many people. I feel privileged to have been part of that, and I'm more than happy to give you full credit.

BeetleB · on June 2, 2021

Actually, it was a project called Paper Upgrade. Here is an old archive link:

http://web.archive.org/web/20140101000000*/http://www.paperu...

I don't know if you can find the code through there, but I'm pretty sure he had made it free. I think spreads is a bit newer.

Edit: Found some more info. It did indeed use Scantailor in the backend; his software was more of a web-based frontend tying all the parts together. You can see a video demo of it here:

https://www.youtube.com/watch?v=Ad7aFYdbDos

Start at about 4:40.

The source is here:

https://code.google.com/archive/p/diy-ebook-creator/

daniel_reetz · on June 2, 2021

Paper Upgrade was an awesome project and the author changed my life. I met my future wife on the plane while flying to visit him and donate a book scanner.

BeetleB · on June 2, 2021

Wow. He spoke highly of you when I met him. My guess is you had not yet married, though: I met him shortly after your visit, perhaps a month or two later.

And yes, Paper Upgrade was awesome (especially if he did all the work on it). I was sad to hear he had shut it down.

Oh, and I'll make a note to myself never to work for Apple ;-)

fernly · on June 1, 2021

Isn't that the same as Scan Tailor [1, 2], which was referenced from the Instructables link cited earlier? That apparently was a comprehensive toolkit in C++ and Qt, now archived.

[1] https://scantailor.org/

[2] https://web.archive.org/web/20210304015939/https://github.co...

jccalhoun · on June 2, 2021

A couple of folks have forked Scan Tailor, though I'm not sure of the status of those forks. Here are two: https://github.com/4lex4/scantailor-advanced and https://github.com/trufanov-nok/scantailor-universal

hackeyed · on June 2, 2021

Right: step 1 -> get page images, step 2 -> author the images into a book file. While OCR is obviously useful for search, a rotated phone screen will let you comfortably read a PDF book just fine, unless you are talking about something like a textbook, in which case you probably wanted a tablet anyway.

I wrote up a guide on the authoring process using FOSS tools for some Digital Humanities folks a couple years ago: https://github.com/wikey/bookscan

It gives some background on the problem and covers a Scantailor (page crop, rotate, deskew), pdfbeads (compression, book metadata) authoring workflow, with pdftk for some general odds and ends.
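For anyone curious what the deskew step involves: one common technique (not necessarily what Scantailor does internally) is a projection profile. Rotate the dark pixels by each candidate angle, histogram their y coordinates, and keep the angle where the text lines pile up into the sharpest peaks. Below is a rough Python sketch, assuming you already have the (x, y) coordinates of dark pixels from a binarized page; the function name and its defaults are hypothetical. Note it only estimates a global rotation; page curvature towards the spine is a separate dewarping problem it does not address.

```python
import math
from collections import Counter


def estimate_skew(points, max_angle=5.0, step=0.1, bin_size=1.0):
    """Projection-profile skew estimate, in degrees.

    points: iterable of (x, y) coordinates of dark pixels on a binarized page.
    Returns the candidate angle whose rotated y-profile is sharpest; rotating
    the page by the negative of this angle should level the text lines
    (the sign convention depends on your image library).
    """
    points = list(points)
    n = round(max_angle / step)
    best_angle, best_score = 0.0, -1
    # Try small angles first so that ties resolve toward the smallest correction.
    for k in sorted(range(-n, n + 1), key=abs):
        a = math.radians(k * step)
        cos_a, sin_a = math.cos(a), math.sin(a)
        # Rotate every point by -a and histogram the resulting y coordinates.
        profile = Counter(
            math.floor((y * cos_a - x * sin_a) / bin_size) for x, y in points
        )
        # Aligned text lines fall into few bins, which maximizes the sum of squares.
        score = sum(count * count for count in profile.values())
        if score > best_score:
            best_angle, best_score = round(k * step, 3), score
    return best_angle
```

The sum of squared bin counts is just one sharpness measure; variance of the profile or counting empty bins between lines are common alternatives.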

jccalhoun · on June 2, 2021

Scantailor will get you most of the way there. The original project is dead, but there are a few forks on GitHub. It has been a while since I did any serious scanning, so I can't remember which version I used. https://github.com/4lex4/scantailor-advanced https://github.com/trufanov-nok/scantailor-universal

Mediterraneo10 · on June 2, 2021

I scan heavily from academic libraries in order to contribute to LibGen, but even with Scantailor it is very time-consuming. For example, scientific literature from the Eastern Bloc was often printed on low-quality, speckled paper, so Scantailor frequently identifies too much of the scan as the page block, and then you have to tweak the rectangle manually.
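To illustrate why speckled paper throws off page-block detection: a naive bounding box around every dark pixel gets stretched by a single speck near the margin, whereas requiring each row and column to reach a minimum ink density ignores isolated specks. This is a toy Python sketch of that idea, not Scantailor's actual content-detection algorithm; the function name and the threshold value are hypothetical, and dense noise or marginal notes would still fool it.

```python
def content_box(ink, min_fraction=0.0):
    """Return (top, bottom, left, right) of the text block, inclusive, or None.

    ink is a 2-D list of 0/1 values (1 = dark pixel after binarization).
    A row or column counts as content only if more than min_fraction of its
    pixels are dark; with the default 0.0, any single dark pixel counts,
    which gives the naive bounding box.
    """
    height, width = len(ink), len(ink[0])
    rows = [r for r in range(height) if sum(ink[r]) / width > min_fraction]
    cols = [
        c for c in range(width)
        if sum(ink[r][c] for r in range(height)) / height > min_fraction
    ]
    if not rows or not cols:
        return None  # blank page, or everything fell below the threshold
    return rows[0], rows[-1], cols[0], cols[-1]
```

On a page with a text block plus a few stray specks near the edges, `content_box(page)` returns a box reaching out to the specks, while something like `content_box(page, min_fraction=0.05)` stays on the text block.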

walrus01 · on June 1, 2021

The simplest of these would be to turn the images into a multi-page raster PDF using freely licensed Linux command-line tools for PDF generation. That will of course result in a much larger file than doing OCR, but it might be the best preservation method for books with illustrations, unusual fonts, catalogs, mixed text and photos, etc.
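For the raster-PDF route, the camera JPEGs don't even need to be re-encoded: a PDF can embed JPEG data directly as an image stream with the /DCTDecode filter (img2pdf works this way). Here is a minimal stdlib-only Python sketch of that idea. It is not any of the tools mentioned in this thread; the function names, the 300 DPI default, and the restriction to grayscale/RGB JPEGs are assumptions of the sketch.

```python
import struct


def jpeg_info(data: bytes) -> tuple[int, int, int]:
    """Return (width, height, components) from the first SOFn segment of a JPEG."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    i = 2
    while i + 9 < len(data):
        if data[i] != 0xFF:
            raise ValueError("corrupt JPEG marker stream")
        marker = data[i + 1]
        if marker == 0xFF:  # fill byte before a marker
            i += 1
            continue
        if marker == 0xDA:  # start of scan reached without a frame header
            break
        (length,) = struct.unpack(">H", data[i + 2:i + 4])
        # SOF0..SOF15 carry the frame size; C4 (DHT), C8 (JPG), CC (DAC) do not.
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            height, width = struct.unpack(">HH", data[i + 5:i + 9])
            return width, height, data[i + 9]
        i += 2 + length
    raise ValueError("no SOF marker found")


def jpegs_to_pdf(pages: list[bytes], dpi: float = 300.0) -> bytes:
    """Build a multi-page raster PDF, one JPEG per page, without re-encoding."""
    objects: list[bytes] = [b"", b""]  # 1 = catalog, 2 = page tree (filled in last)
    kids = []
    for jpg in pages:
        width, height, comps = jpeg_info(jpg)
        colorspace = {1: b"/DeviceGray", 3: b"/DeviceRGB"}[comps]  # CMYK not handled
        objects.append(
            b"<< /Type /XObject /Subtype /Image /Width %d /Height %d /ColorSpace %s "
            b"/BitsPerComponent 8 /Filter /DCTDecode /Length %d >>\nstream\n"
            % (width, height, colorspace, len(jpg)) + jpg + b"\nendstream")
        image_ref = len(objects)
        # Page size in points (1/72 inch), derived from the assumed scan DPI.
        pw, ph = width * 72.0 / dpi, height * 72.0 / dpi
        content = b"q %.2f 0 0 %.2f 0 0 cm /Im0 Do Q" % (pw, ph)
        objects.append(b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream")
        content_ref = len(objects)
        objects.append(
            b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 %.2f %.2f] "
            b"/Resources << /XObject << /Im0 %d 0 R >> >> /Contents %d 0 R >>"
            % (pw, ph, image_ref, content_ref))
        kids.append(len(objects))
    objects[0] = b"<< /Type /Catalog /Pages 2 0 R >>"
    objects[1] = b"<< /Type /Pages /Kids [%s] /Count %d >>" % (
        b" ".join(b"%d 0 R" % k for k in kids), len(kids))

    # Serialize objects, remembering byte offsets for the cross-reference table.
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objects) + 1)
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1, xref_pos)
    return bytes(out)
```

Usage would be something like `Path("book.pdf").write_bytes(jpegs_to_pdf([p.read_bytes() for p in sorted(Path("scans").glob("*.jpg"))]))` with `from pathlib import Path`, where the `scans` directory name is just an example. Since the JPEG bytes are copied verbatim, there is no generation loss; the file is roughly the sum of the input JPEG sizes.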

I'm not clear on the extent to which the existing workflow de-skews the camera images to deal with page curvature towards the spine.

I think I recall the Internet Archive having an open-source design for something similar to this? And there are other projects that accomplish generally the same idea.

vixen99 · on June 2, 2021

Just page images? No: Czur's software, with its OCR, generates searchable PDFs and Word or Excel files with no further input. With careful attention to the scanned area, it's easy to get .xlsx files needing zero or minimal editing. The other advantage of the Czur is its automatic curvature correction when scanning books with narrow margins on either side of the spine.

No, I have no connection with Czur - just an enthusiastic user!

ajot · on June 2, 2021

This is an old article, so some of the software may no longer be the best option, but it gives you the idea of post-processing: http://natecraun.net/articles/linux-guide-to-book-scanning.h...

rahimnathwani · on June 1, 2021

https://www.instructables.com/Bargain-Price-Book-Scanner-Fro...

"Step 10 - Post-processing" covers some of the steps.

sandeep1338 · on June 2, 2021

Video looks interesting, I'll check it out!