Additionally, my company (link in profile) builds a commercial product that helps people define and iterate on prediction problems in a structured way, based on the ideas in that paper.
You need a good random sample and a lot of manpower to label it manually. Mechanical Turk [0] is one place to get that manpower if you don't have a "grunt work" team and aren't willing to spend a few days doing it yourself.
There are also some methodologies out there that can help you label data sets more efficiently. I don't often see them used, but they exist. Look up "active learning" and "semi-supervised learning".
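To give a rough idea of what active learning looks like in practice, here is a minimal sketch of one common strategy, uncertainty sampling: instead of labeling items at random, you label the ones your current model is least sure about. The function names and the scores below are made up for illustration, and it assumes you already have some binary classifier that outputs a probability for each unlabeled item.

```python
def uncertainty(p):
    """Uncertainty of a binary prediction: highest at p=0.5, zero at p=0 or p=1."""
    return 1.0 - abs(2.0 * p - 1.0)


def select_for_labeling(probs, k):
    """Return the indices of the k unlabeled items the model is least sure about."""
    ranked = sorted(
        range(len(probs)),
        key=lambda i: uncertainty(probs[i]),
        reverse=True,  # most uncertain first
    )
    return ranked[:k]


# Hypothetical positive-class probabilities for five unlabeled items.
scores = [0.95, 0.52, 0.10, 0.45, 0.80]

# Items 1 and 3 sit closest to 0.5, so they go to the human labelers first.
to_label = select_for_labeling(scores, 2)
print(to_label)
```

In a real loop you would send those items out for labeling, retrain on the enlarged labeled set, re-score the remaining pool, and repeat until your labeling budget runs out.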
Some folks in this area were also building annotation tools of various kinds. I believe the focus was on tasks that can be reduced to a web interface, but if you can label your data that way, check it out: https://en.wikipedia.org/wiki/CrowdFlower