
What are the best resources for "defining and generating" labels? Any recommendations?



I don't know of a definitive public resource for this. I published a paper on it at IEEE's Data Science and Advanced Analytics conference back in 2016. You can find it here: https://dai.lids.mit.edu/wp-content/uploads/2017/10/Pred_eng...

Additionally, my company (link in profile) builds a commercial product to help people define and iterate on prediction problems in a structured way, based on the ideas in that paper.
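For anyone wondering what "defining a label" can look like in code, here is a minimal, generic sketch. It is not taken from the paper or the product above. It assumes a hypothetical event log of `(entity_id, timestamp, event_type)` tuples and a hypothetical target ("did this customer make a purchase within 30 days after a cutoff date?"). The function name `make_labels` and the `"purchase"` event type are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple

# Hypothetical event schema: (entity_id, timestamp, event_type).
Event = Tuple[str, datetime, str]


def make_labels(
    events: Iterable[Event],
    entities: Iterable[str],
    cutoff: datetime,
    window: timedelta,
    target_type: str = "purchase",
) -> Dict[str, int]:
    """Label each entity 1 if it has a `target_type` event in (cutoff, cutoff + window], else 0.

    Only events after the cutoff determine the label; features for a model
    should be built from events at or before the cutoff to avoid leakage.
    """
    end = cutoff + window
    labels = {e: 0 for e in entities}
    for entity_id, ts, event_type in events:
        if entity_id in labels and event_type == target_type and cutoff < ts <= end:
            labels[entity_id] = 1
    return labels


events = [
    ("alice", datetime(2016, 3, 2), "purchase"),   # inside the window -> counts
    ("bob", datetime(2016, 2, 20), "purchase"),    # before the cutoff -> ignored
    ("bob", datetime(2016, 3, 5), "page_view"),    # wrong event type -> ignored
    ("carol", datetime(2016, 4, 15), "purchase"),  # after the window -> ignored
]
labels = make_labels(
    events, ["alice", "bob", "carol"], cutoff=datetime(2016, 3, 1), window=timedelta(days=30)
)
print(labels)  # {'alice': 1, 'bob': 0, 'carol': 0}
```

The useful part of writing labels this way is that the cutoff, window, and target are explicit parameters, so you can regenerate the whole label set when you change the problem definition instead of relabeling by hand.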


You need a good random sample and a lot of manpower to label it by hand. Mechanical Turk [0] is one place to find that manpower if you don't have a "grunt work" team and aren't willing to spend a few days doing it yourself.
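A minimal sketch of the "good random sample" step, assuming your unlabeled items are just strings in a list. It draws a uniform sample without replacement with a fixed seed (so you can record exactly which items went out) and writes a CSV with an empty `label` column. The corpus, column names, and function names are hypothetical; check which input format your labeling platform actually expects.

```python
import csv
import io
import random
from typing import List, Sequence


def sample_for_labeling(items: Sequence[str], k: int, seed: int = 0) -> List[str]:
    """Draw a uniform random sample of k items without replacement.

    A fixed seed makes the batch reproducible, so you can document exactly
    which items were sent out for labeling.
    """
    return random.Random(seed).sample(list(items), k)


def to_labeling_csv(batch: Sequence[str]) -> str:
    """Render the batch as CSV with an empty 'label' column for annotators to fill in."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["item_id", "text", "label"])
    for i, text in enumerate(batch):
        writer.writerow([i, text, ""])
    return buf.getvalue()


population = [f"review #{n}" for n in range(10_000)]  # hypothetical unlabeled corpus
batch = sample_for_labeling(population, k=500, seed=42)
print(to_labeling_csv(batch[:3]))
```

Sampling uniformly keeps the labeled set representative of the data you will actually predict on; if some classes are rare, you may need to sample more items overall (or oversample and reweight) to get enough positives.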

There are also techniques that can help you label data sets more efficiently. I don't often see them used, but they exist. Look up "active learning" and "semi-supervised learning".
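To make "active learning" concrete, here is a rough sketch of one common variant, pool-based uncertainty sampling: train on what you have, then ask a human to label the item the model is least sure about (predicted probability closest to 0.5), and repeat. Everything here is an illustrative assumption: a tiny pure-Python logistic regression stands in for a real model, 2-D random points stand in for your data, and the `oracle` function stands in for a human annotator.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w: List[float], x: Point) -> float:
    # w = [bias, w_1, ..., w_d]
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def train_logreg(data: Dict[Point, int], epochs: int = 300, lr: float = 0.5) -> List[float]:
    """Fit binary logistic regression with plain stochastic gradient descent."""
    dim = len(next(iter(data)))
    w = [0.0] * (dim + 1)
    for _ in range(epochs):
        for x, y in data.items():
            g = predict_proba(w, x) - y  # gradient of log loss w.r.t. the logit
            w[0] -= lr * g
            for i, xi in enumerate(x):
                w[i + 1] -= lr * g * xi
    return w


def uncertainty_sampling(
    pool: Sequence[Point],
    oracle: Callable[[Point], int],
    seed_labels: Dict[Point, int],
    budget: int,
) -> Dict[Point, int]:
    """Each round: retrain, then ask the oracle to label the unlabeled
    item whose predicted probability is closest to 0.5."""
    labeled = dict(seed_labels)
    unlabeled = [x for x in pool if x not in labeled]
    for _ in range(budget):
        if not unlabeled:
            break
        w = train_logreg(labeled)
        query = min(unlabeled, key=lambda x: abs(predict_proba(w, x) - 0.5))
        labeled[query] = oracle(query)  # in practice: a human labels this item
        unlabeled.remove(query)
    return labeled


rng = random.Random(0)
pool = [(rng.random(), rng.random()) for _ in range(200)]


def oracle(p: Point) -> int:
    # Hypothetical ground truth standing in for a human labeler.
    return int(p[0] + p[1] > 1.0)


seed = {(0.1, 0.1): 0, (0.9, 0.9): 1}  # one labeled example per class to start
labeled = uncertainty_sampling(pool, oracle, seed, budget=20)
w = train_logreg(labeled)
accuracy = sum((predict_proba(w, p) > 0.5) == bool(oracle(p)) for p in pool) / len(pool)
print(len(labeled), accuracy)
```

The point is that the 20 human labels get spent on items near the decision boundary, where they change the model the most, rather than on easy items far from it. Semi-supervised methods go the other direction and try to get value out of the items nobody labels.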

[0]: http://nlp.cs.illinois.edu/HockenmaierGroup/Papers/AMT2010/W...


There were some folks in this area building tools for generating annotation interfaces of various kinds. I believe the focus was on tasks that can be reduced to a web interface, but if you can label your data that way, check it out: https://en.wikipedia.org/wiki/CrowdFlower



