Extracting image metadata at scale (netflix.com)
137 points by _jomo on March 21, 2016 | 17 comments



For their image resizing tasks, I wonder if they've tried anything more sophisticated than simply cropping around points of interest, something like seam carving [0]. I imagine it would be pretty cheap to run a bunch of different algorithms on an image and then A/B test the results on Amazon Mechanical Turk.

[0] https://en.wikipedia.org/wiki/Seam_carving, https://www.youtube.com/watch?v=6NcIJXTlugc, https://www.youtube.com/watch?v=AJtE8afwJEg
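
For anyone curious what the core algorithm looks like, here's a minimal, unoptimized sketch (my own illustration, not anything from the Netflix post): compute a per-pixel energy, then use dynamic programming to find and remove the lowest-energy connected vertical seam.

  import numpy as np

  def energy_map(gray):
      # Gradient-magnitude energy: high where the image changes quickly.
      gy, gx = np.gradient(gray.astype(float))
      return np.abs(gx) + np.abs(gy)

  def min_vertical_seam(energy):
      # cost[i, j] = cheapest energy of any connected seam ending at (i, j).
      h, w = energy.shape
      cost = energy.copy()
      back = np.zeros((h, w), dtype=int)
      for i in range(1, h):
          for j in range(w):
              lo, hi = max(j - 1, 0), min(j + 2, w)
              k = lo + int(np.argmin(cost[i - 1, lo:hi]))
              back[i, j] = k
              cost[i, j] += cost[i - 1, k]
      # Walk back up from the cheapest bottom-row pixel.
      seam = [int(np.argmin(cost[-1]))]
      for i in range(h - 1, 0, -1):
          seam.append(back[i, seam[-1]])
      return seam[::-1]  # seam[i] = column to drop in row i

  def remove_seam(img, seam):
      # Drop one pixel per row; works for grayscale or color arrays.
      h, w = img.shape[:2]
      mask = np.ones((h, w), dtype=bool)
      mask[np.arange(h), seam] = False
      return img[mask].reshape(h, w - 1, *img.shape[2:])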


Interesting. From the Wikipedia article: "A 2010 review of eight image retargeting methods found that seam carving produced output that was ranked among the worst of the tested algorithms. It was, however, a part of one of the highest-ranking algorithms: the multi-operator extension mentioned above (combined with cropping and scaling)."

Here is the paper: http://people.csail.mit.edu/mrub/papers/retBenchmark.pdf

I don't have time to read the paper, but I wonder if it performed poorly because of the algorithm that calculated the energy levels?

Edit: The paper seems to suggest that an algorithm for retargeting of streaming video [1] was rated highest by human viewers.

[1] https://s3-us-west-1.amazonaws.com/disneyresearch/wp-content...


I'm somewhat surprised that the best known methods aren't neural nets.


In 2010?


Seam carving is cool! And there are even better methods out there - for anyone interested, this is a great starting point: http://people.csail.mit.edu/mrub/papers/retBenchmark.pdf

All of these methods have advantages, but it's pretty hard to outweigh the simplicity of cropping, especially if you're sensitive to bad results.


That looks amazing, and relatively easy to implement. However, it seems that Mitsubishi owns a patent on it, so maybe we'll start seeing it used in __ years when it expires.


Seam carving has been in Photoshop and other applications for years now (as the linked Wikipedia article clearly mentions).


Seam carving is available via Imagemagick.
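
For example, via the -liquid-rescale flag (assuming your ImageMagick build includes liblqr; file names here are placeholders), shrinking an image to 75% of its width with seam carving:

  import subprocess

  # ImageMagick's -liquid-rescale option performs seam carving.
  # "75x100%" = keep 100% of the height, carve the width down to 75%.
  subprocess.run(["convert", "in.jpg", "-liquid-rescale", "75x100%",
                  "out.jpg"], check=True)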


This is very interesting, but the real question is: how do you test which approach is better?

For example, in the text detection case there are almost unlimited combinations of transforms that you can put together. Usually you use some hybrid of gut feeling and results to decide, but I bet Netflix has enough data to make that call in a more principled way.

Would be awesome to hear about that. How do you create a labeled dataset? How exactly do you measure which approach is better? Is there a perceptual element to it, or is it all quantitative?

Edit: here's the related money quote from the retargeting paper linked in the other comment:

  "In terms of objective measures for retargeting, our results show that we are still a long way from imitating human perception. There is a relatively  large discrepancy  between such measures and the subjective data we collected, and in fact, the most preferred algorithm by human viewers, SV, received low ranking by almost all automatic distance measures."


I don't think it actually talks much about how they do it at "scale". How expensive is it to perform these operations? Are images cropped dynamically as they're requested, or do they pre-process the images and cache them somewhere?

Did they do anything clever to parallelize the process? What underlying technologies do they use?


From the code samples, it looks like OpenCV... which is pretty hard to beat for well-understood image processing algos like thresholding etc.
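
Something along these lines (a minimal sketch of that kind of step; file names and kernel size are my own guesses, not the post's code):

  import cv2

  img = cv2.imread("frame.jpg")
  gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

  # Otsu's method picks the binarization threshold from the histogram.
  _, binary = cv2.threshold(gray, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)

  # A wide closing kernel merges nearby strokes into candidate text regions.
  kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 3))
  closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
  cv2.imwrite("candidates.png", closed)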

I guess at this point you can do it "at scale" by throwing enough servers and caching at the problem :)


It would be more interesting to see what they're using to manage the servers and run OpenCV, which is what the comment was asking.


The authors may want to consider how much of this work could be done easily and effectively with deep learning. For content-based search and image similarity, even simple, pre-trained convnets will likely crush the histogram-based approaches you have here.
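
For instance, a sketch of the embedding approach (using torchvision's pre-trained ResNet-50 as an example backbone; file names are placeholders):

  import torch
  import torchvision.models as models
  import torchvision.transforms as T
  from PIL import Image

  # Pre-trained convnet as a fixed feature extractor; compare images by
  # cosine similarity of their 2048-d embeddings.
  model = models.resnet50(pretrained=True)
  model.fc = torch.nn.Identity()  # drop the classifier head
  model.eval()

  preprocess = T.Compose([
      T.Resize(256), T.CenterCrop(224), T.ToTensor(),
      T.Normalize(mean=[0.485, 0.456, 0.406],
                  std=[0.229, 0.224, 0.225]),
  ])

  def embed(path):
      x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
      with torch.no_grad():
          return model(x).squeeze(0)

  a, b = embed("poster_a.jpg"), embed("poster_b.jpg")
  print(torch.nn.functional.cosine_similarity(a, b, dim=0).item())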

Just run your images through Google Cloud Vision to do the face detection and text detection. With 2M images, it will be cheaper than the amount of dev time you spent here, and you'll get excellent quality.
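
Roughly like this (a sketch using the Python client library; assumes credentials are already configured, and the file name is a placeholder):

  from google.cloud import vision

  client = vision.ImageAnnotatorClient()
  with open("poster.jpg", "rb") as f:
      image = vision.Image(content=f.read())

  # Each call returns annotations with bounding polygons and confidences.
  faces = client.face_detection(image=image).face_annotations
  texts = client.text_detection(image=image).text_annotations
  print(len(faces), "faces;", texts[0].description if texts else "no text")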


They explain that, in this case, not all of the images they care about are faces, so you'd have to train your own model on "interesting regions" (though there is some work in that area). Part of the challenge there is generating all the labels for what counts as an interesting region. This way, at least, they don't need to generate labels.


YouTube did something similar for 'interesting thumbnails' last year with deep nets (many uploaders do not specify a good thumbnail preview), and reported that it gave a nice performance boost.


My thoughts as well. With recent advances, image similarity and content-based search should be fairly simple to do with pre-trained convnets, and probably more effective.


It would be awesome if they released this data for another edition of the Netflix challenge. Then we could all try this ourselves!



