It shouldn't take an intern too long to collect a representative set of Congress...

jeremyjbowers · on June 7, 2018

Hi! I'm Jeremy, one of the developers.

We'll probably work on something like this for the next version. One reason it's harder than you think: We would have to buy / own rights to the photographs before we could use them to train -- most of those photos are owned by Getty or the AP. And our own photographs are perfectly lit and square, which made them awful for training face recognition.

The other hangup (which I didn't get to in the article) is having to add / remove people. New members are constantly being added and that's a maintenance burden for us. Amazon usually has the new member within a day or two. (Our team is very small and we have a lot of other responsibilities!)

But good points, definitely.

WrtCdEvrydy · on June 7, 2018

> We would have to buy / own rights to the photographs before we could use them to train -- most of those photos are owned by Getty or the AP.

I think your model would be covered by derivative art... unless you started selling the model itself.

hooloovoo_zoo · on June 7, 2018

"We would have to buy / own rights to the photographs before we could use them to train..."

Is this actually true?

pbhjpbhj · on June 7, 2018

In the USA I don't think it is, because the end use is transformative [1].

In the UK it would be tortious, because it relies on Fair Use to temporarily store the images in order to extract the facial structure data. Fair Dealing is really draconian in comparison.

[1] https://www.lib.umn.edu/copyright/fairuse

happyopossum · on June 8, 2018

Really doesn’t matter - the legal team at the NYT thinks it might be, and lawyers exist to tell people “you’d better not”.

acct1771 · on June 8, 2018

And it's our job, as people who know what a computer is, to move forward with common sense if they're overreaching, which is their job.

They have every incentive to be as conservative in their advice as possible, and no incentives to "allow" risks. Doesn't increase their compensation any.

mxuribe · on June 8, 2018

But...doesn't Congress maintain some sort of API for available bio data - possibly including photos - to avoid the maintenance issue brought up in the piece? A quick Google search shows ProPublica has such an API, and it seems to have originally been developed by NYT... https://projects.propublica.org/api-docs/congress-api/

acct1771 · on June 8, 2018

> We would have to buy / own rights to the photographs before we could use them to train

Please do elaborate on who's enforcing this mindset on you/your team.