Improving YouTube video thumbnails with deep neural nets (youtube-eng.blogspot.com)
99 points by jplevine on Oct 9, 2015 | 25 comments



I wonder if they included sexy images in their negative training sets -- many videos accrue millions of views (and ad dollars) by having a few frames of cleavage interspersed with other (often derivative) footage.

It would be great if their algorithm picked a thumbnail that reflected the entire video, not just a few frames specifically chosen to game people's compulsive clicking.


Most of those are just thumbnails manually selected by the uploader. After uploading, YT gives you 3-5 thumbnails you can choose from.

Also, partnered accounts are allowed to upload custom thumbnails (which can be any image, not necessarily even a screenshot from the video).


Not just partnered accounts. My YouTube account lets me upload custom thumbnails, and I'm certainly not partnered. I have maybe a couple dozen videos with maybe a few hundred views among them all.


Is this definitely not algorithmic? I've been noticing for a while that videos might have an incidental flash of cleavage and then that is used as the thumbnail. I'd always wondered if this was arising "naturally" somehow (people pausing that scene perhaps?)

Based on the type of video, I'd discounted manual intervention. Though if people can just upload any image, I'm now surprised they're not all like this.


> After uploading, YT gives you 3-5 thumbnails you can choose from.

Can you pick an arbitrary video frame, or only one of the suggested thumbnails?


It automatically captures 3 different thumbnails (I guess using the algorithm in OP) and lets you select any 1 of those 3.


I presume they use the image selection as training data too—if not that seems like awfully low hanging data fruit.


Many videos seem to have completely arbitrary thumbnails which are not from the video. Most of the Epic Rap Battles videos, for example.

Perhaps this option 'unlocks' after you reach a certain subscriber count.


As mentioned upthread:

> partnered accounts are allowed to upload custom thumbnails (which can be any image, not necessarily even a screenshot from the video).


They stated that the negative training set was constructed by randomly sampling frames from the video.
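As a rough sketch of that setup (not the actual pipeline from the post): positives would be the thumbnails uploaders actually chose, negatives random frames from the video, and a binary classifier trained on the labeled pairs. The file path and sample count below are made up.

    import random
    import cv2  # OpenCV, used here just to grab frames

    def sample_random_frames(video_path, n_samples=20, seed=0):
        # Grab frames at uniformly random positions to serve as negatives.
        cap = cv2.VideoCapture(video_path)
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        rng = random.Random(seed)
        frames = []
        for idx in sorted(rng.sample(range(total), min(n_samples, total))):
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
            ok, frame = cap.read()
            if ok:
                frames.append(frame)
        cap.release()
        return frames

    # Positives: thumbnails uploaders chose. Negatives: random frames, labeled 0.
    negatives = [(f, 0) for f in sample_random_frames("some_video.mp4")]  # path is hypothetical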

If someone wants to game the thumbnails, they will just manually select the thumbnail to use; and there are too many legitimate use cases for this ability for YouTube to remove it.


> If someone wants to game the thumbnails, they will just manually select the thumbnail to use; and there are too many legitimate use cases for this ability for YouTube to remove it.

Many channels I watch carefully select an iconic frame from the video to serve as the thumbnail, or construct an artificial thumbnail that provides useful information about the type and subject of the video. Manual will frequently produce better results than automatic for a good-quality channel.


Is there a way YouTube could alter the "view count" to only include views where 100% of the video has been watched? May help cut down on videos with misleading thumbnails and/or titles.


> Is there a way YouTube could alter the "view count" to only include views where 100% of the video has been watched?

You wouldn't want to require 100%, as many people stop when a video starts rolling credits, or when it switches to a screen using annotations to link to other videos. But 50-75% would work well as a threshold to count "views".
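Something like this toy version of the idea, where the 60% cutoff is just a guess at a reasonable threshold:

    def qualified_views(watch_times_s, video_length_s, threshold=0.6):
        # Count only sessions that covered at least `threshold` of the video.
        return sum(1 for t in watch_times_s if t / video_length_s >= threshold)

    # A 300 s video watched for 290 s, 40 s and 200 s -> 2 qualified views.
    print(qualified_views([290, 40, 200], video_length_s=300))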


A better way to filter those out algorithmically would be to simply look at the thumbs-up vs thumbs-down ratio.

The ones with misleading titles/thumbnails often have far more down-votes than up-votes, yet YouTube keeps ranking them among the most recommended/relevant results (I guess Google prefers click-throughs over user satisfaction).


Both explicit signals (thumbs up/down) and implicit ones (clicking away / closing the window) may count toward quality.

There are other confusing cases. I watch a lot of long-form videos, some too long to view in a single session, many of which I download for offline viewing (yt-download). I've been quite actively dissuaded from either publicly rating videos, or even linking to YouTube itself on my primary social channel (G+) given the Anschluss forced-marriage between YouTube, G+, and what had once been individual and separate accounts (similar logic applies to Google Play, and I've taken to "registering" my Android devices under randomly generated usernames).

For videos I particularly like, I may reference them, but only specific portions which I skip to, view, and then close. That's far less than a 100% view, but still significant.

It's not that I'm opposed to providing appropriateness and quality data to YouTube. I absolutely give massive shits about who they share that data with, and how. The "make it all public" default is utterly fucked in the head.

I think Google are starting to realise that.


> A better way to filter those out algorithmically would be to simply look at the thumbs-up vs thumbs-down ratio.

> The ones with misleading titles/thumbnails often have far more down-votes than up-votes

Especially once the total votes pass a certain threshold. Below a certain threshold, any activity makes something interesting; you wouldn't want to let a handful of downvotes bury something early on (as in, 4 upvotes and 6 downvotes). But once you hit the hundreds or thousands of votes, the ratio should take over.
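One simple way to get that behaviour (not necessarily what YouTube does) is a smoothed ratio, where a fixed prior dominates while votes are scarce and the observed ratio takes over as votes accumulate; the prior constants here are arbitrary:

    def approval_score(ups, downs, prior_ups=50, prior_total=100):
        # With few votes the 0.5 prior dominates; with hundreds or
        # thousands of votes the real up/down ratio takes over.
        return (ups + prior_ups) / (ups + downs + prior_total)

    print(approval_score(4, 6))      # ~0.49: a handful of downvotes barely moves it
    print(approval_score(400, 600))  # ~0.41: at volume the ratio dominates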


It looks like they prefer images with a few large faces near the center of the frame. That's probably the right answer for social media. (Plus a cat recognizer.) Used on news footage, you probably get the talking head rather than the news event.


We can only guess at how the NN ranks images, but it looks to me like it prefers frames with high entropy in certain regions.
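For what it's worth, that hunch is easy to eyeball on a few frames: split a grayscale frame into a grid and compute per-cell Shannon entropy (the grid size here is arbitrary):

    import numpy as np

    def region_entropy(gray, grid=4):
        # Returns a grid x grid array of intensity entropies (in bits).
        h, w = gray.shape
        out = np.zeros((grid, grid))
        for i in range(grid):
            for j in range(grid):
                cell = gray[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]
                hist, _ = np.histogram(cell, bins=256, range=(0, 256))
                p = hist / hist.sum()
                p = p[p > 0]
                out[i, j] = -(p * np.log2(p)).sum()
        return out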


There's an outside company that was working on this: Neon Labs (https://www.neon-lab.com/).

Their insight is that there are not only images that are "high-quality", but also images that are positive, and positive images get more clicks than a merely decent one. I wonder if that information is encoded in the RNN in some way.

(This is where I'd normally rant about RNNs and other ML techniques hiding this information from their creators by locking it up inside the black box, but I'll save that for another day.)


They've got to be training on more inputs than mentioned. For example, is one timestamp (or a cluster of nearby timestamps) in the video linked externally and generating traffic? Grab the frames from that time period and run them through the quality classifier; there might be iconic frames in that section that people are looking for.

Are people re-watching a small segment of the video? Try classifying individual frames from that segment or just before it. Of course, those are often action moments with smeared motion and compression artifacts, so they may not yield a quality thumbnail.
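A sketch of the candidate-frame part of that idea (score_frame stands in for whatever quality classifier is in use; the window and step sizes are arbitrary):

    import cv2

    def candidate_frames(video_path, hot_times_s, window_s=2.0, step_s=0.5):
        # Pull frames from a small window around each high-traffic timestamp.
        cap = cv2.VideoCapture(video_path)
        frames = []
        for t in hot_times_s:
            offset = -window_s
            while offset <= window_s:
                cap.set(cv2.CAP_PROP_POS_MSEC, max(0.0, t + offset) * 1000.0)
                ok, frame = cap.read()
                if ok:
                    frames.append((t + offset, frame))
                offset += step_s
        cap.release()
        return frames

    # best_time, best_frame = max(candidate_frames("video.mp4", [93.0, 412.5]),
    #                             key=lambda tf: score_frame(tf[1]))  # score_frame: hypothetical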

These ideas also only come into play when a video has been live for a while, after the uploader has initially picked a thumbnail. Maybe a "We have some new thumbnail suggestions for you, take a look" alert or message?


So, in an article about image processing, why not include nice big beautiful images, that get even bigger when you click on them?

I click on the low detail inline images, and they stay the same disappointing size and reveal no further detail.

They're all, like, 600px x 200px? Am I being greedy for wanting gigantic images, upwards of 3000px wide?

I suppose it is an article about thumbnails, after all, so maybe I shouldn't be so surprised.


Seeing this run through an equivalent of the deep dream visualizer could be really interesting -- what _are_ people looking for in thumbnails? I'm having difficulty imagining what features would even be relevant in such a situation.


I'm guessing: "sharpness" of image, good saturation, presence of (smiling?) human faces, non-human mammals facing the camera, bare human skin (?)

(I agree that'd be cool.)
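Two of those guessed features are cheap to compute directly, e.g. sharpness as variance of the Laplacian and mean HSV saturation (thresholds and weighting omitted):

    import cv2

    def sharpness(bgr):
        # Higher variance of the Laplacian ~ sharper image.
        gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
        return cv2.Laplacian(gray, cv2.CV_64F).var()

    def mean_saturation(bgr):
        # Average S channel in HSV, scaled to [0, 1].
        hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
        return float(hsv[:, :, 1].mean()) / 255.0

    # Face presence could be checked with cv2.CascadeClassifier plus a Haar cascade.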


Meanwhile, I still can't edit a playlist while playing it.

edit: to put it constructively - there's still simpler stuff to fix that would improve UX and match user patterns, isn't there?


When you have a big system, the most consistent argument against working on one thing is that you should be working on something else. This is true for everything in the system, because everyone has a different opinion about what that something else is.

For example: why should YouTube spend time on playlist playback when it could instead work on automatic categorization? Content creators have to manually create playlists, even when they number their videos sequentially. By that logic, YouTube shouldn't waste time on playlist editing when it could be doing the right thing automatically.



