I'm not ignorant of the needs and concerns of self-promotion in order to build a popular campaign...but I hope they have a technical advisor who will, at some point, inform them about the technique of OCR and how a large hashtag-watermark can obstruct such a technique.
> the images should also be rotated to their proper orientation
There's a rotate button when you click through to the detail page. We're tracking which way people rotate each image, and setting its orientation according to that. It's still a bit buggy, but we're getting there.
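For anyone curious how tracked rotations could be turned into a stored orientation, here is a minimal sketch, not the site's actual code. It assumes each user rotation is logged as an angle in degrees; the function name `settle_orientation` and the `min_votes` / `min_share` thresholds are hypothetical and picked only for illustration.

```python
from collections import Counter

def settle_orientation(rotations, min_votes=3, min_share=0.6):
    """Return the agreed rotation in degrees, or None if still undecided.

    rotations: angles (in degrees) that individual users rotated the image to.
    min_votes / min_share: hypothetical thresholds, not taken from the thread.
    """
    # Map angles into [0, 360) so that -90 and 270 count as the same vote.
    votes = Counter(r % 360 for r in rotations)
    if not votes:
        return None
    angle, count = votes.most_common(1)[0]
    total = sum(votes.values())
    # Only accept an orientation once enough users agree on it.
    if total < min_votes or count / total < min_share:
        return None
    return angle
```

Requiring a minimum number of votes and a clear majority keeps a single stray click from flipping a page for everyone, which seems like the sort of thing that could make this feel buggy.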
> the technique of OCR and how a large hashtag-watermark can obstruct such a technique
We're running OCR over non-watermarked versions. We're hoping to have a search function up later today.
Thanks for the links -- we'll look at them, and see what we can use.
Also, minor detail, but the images should also be rotated to their proper orientation. Crowdsourcing data collection has to be as frictionless as possible, and this is an easy fix.
Depending on how many actual documents there are (i.e. how many pages are in those 200 folders), it might be worth it to go the route of ProPublica's "Free the Files" project, in which they built a mini-app that let people voluntarily transcribe the important fields in each document:
https://www.propublica.org/series/free-the-files
Their Al Shaw wrote a piece about designing for efficient crowd-sourcing:
http://www.propublica.org/nerds/item/casino-driven-design
They even open-sourced the Rails plugin for it:
http://www.propublica.org/nerds/item/transcribable-free-the-...