Right - you don't send to Mechanical Turk. You instead send it to a highly train...

jrabone · on April 25, 2012

You can always require Mechanical Turk workers to pass qualification tests before they work on your HITs. There are also pre-existing qualifications like language and location.

Of course, you need to pay well enough to get enough people to bother, but I imagine one could build a reasonably skilled workforce for many areas if your training material is good and you have an objective measure of the correct answer.
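To make the "objective measure" part concrete, here is a minimal sketch of scoring a worker against questions whose correct answers are already known. The function name, the dict layout and the 0.8 threshold are all assumptions for illustration; none of this is the Mechanical Turk API.

```python
def passes_qualification(answers, gold, threshold=0.8):
    """Return True if the worker's accuracy on known-answer questions meets the threshold.

    answers: dict mapping question id -> the worker's answer (hypothetical layout)
    gold:    dict mapping question id -> the known correct answer
    """
    if not gold:
        raise ValueError("need at least one known-answer question")
    # Unanswered questions count as wrong.
    correct = sum(1 for q, expected in gold.items() if answers.get(q) == expected)
    return correct / len(gold) >= threshold
```

A worker who gets two of three known-answer questions right scores about 0.67, so they would fail at the default threshold and pass at 0.6.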

This image description application is both perfect and terrible for Mechanical Turk though - it's an ideal task for a human rather than a computer, but it's also impossible to score the result objectively, so you'll have to pay everyone, all the time, or introduce another level of scoring - "Is this an accurate description?", "Does this description read fluently in $LANGUAGE?", etc.
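As a rough sketch of that second level of scoring: several reviewers each answer the two yes/no questions, and the description is accepted only if enough of them say yes to both. The function name, the vote format and the thresholds are assumptions picked for illustration, not anything a real platform prescribes.

```python
def accept_description(votes, min_votes=3, min_approval=0.6):
    """Decide whether a description passes second-level review.

    votes: list of (accurate, fluent) boolean pairs, one per reviewer (hypothetical format)
    Returns None while there are too few votes, otherwise True or False.
    """
    if len(votes) < min_votes:
        return None  # not enough reviews yet to decide
    # A reviewer approves only if they answered yes to both questions.
    approvals = sum(1 for accurate, fluent in votes if accurate and fluent)
    return approvals / len(votes) >= min_approval
```

The catch is cost: every description now needs several extra paid judgments, so the thresholds trade money against quality.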

anandkulkarni · on April 25, 2012

Indeed, we do this kind of work today in MobileWorks, using exactly this fact.

Once your workforce is skilled, all kinds of new opportunities of this nature open up.

ghshephard · on April 25, 2012

Nice. I guess we don't need to look too far to see that soon-to-be billion-dollar company then. Have you considered doing real-time feedback to a device that has remote telemetry (video/sound/pictures/etc.)?

anandkulkarni · on April 25, 2012

We've done some work in this vein already, as a matter of fact: real-time communication between the crowd and cameras / robots. We're currently supporting a few fast camera-driven smartphone applications that developers have built on the platform.

I think what you're after, though, is more comprehensive integration: a remote telemetry system that has crowd intelligence baked into its circuits along several media, rather than one: analyzing simultaneous audio and video. I don't think this would be difficult for a developer to build, and it's a great idea.

mbesto · on April 25, 2012

So why is this "technology" ground-breaking then? I thought the point of this is that you 1. take a picture 2. send it over the internets 3. wait 3~6min to get a response on what the picture is about.

Try scaling this out..."highly-trained" means poor scalability and lots of liability.

ghshephard · on April 26, 2012

The technology isn't ground-breaking - it's the combination of five or six technologies maturing at the same time that makes this sort of application (real-time human assessment of remote telemetry) work.