[dupe] Blade Runner re-encoded using neural networks (vox.com)
151 points by signa11 on June 3, 2016 | 66 comments



This is the original article by Terrence Broad:

https://medium.com/@Terrybroad/autoencoding-blade-runner-889...

and its appearance on HN the other week:

https://news.ycombinator.com/item?id=11766063


Thanks; forgot about that one.


This article alludes to, but never actually brings up, something much more interesting than the project in question:

How can copyright coexist with human-level automatic analysis and synthesis of works of art?

For example, Spotify is flooded with "covers" of hit songs that are made to sound as similar to the original as possible. In my understanding, it doesn't matter how similar it sounds: as long as it's "remade", it's considered a new work of art. So what happens when the cover-song process can be automated? It seems likely that courts and lawmakers will need to define a sphere of similarity around each work of art, within which all points are considered copies.


Music has that sorted already.

There is copyright.

There is publishing right.

If you make a cover of a recording, it is the latter that you are involved with. You'd have to go to the publishers of a piece of music to license the right to publish your interpretation of the work.

Obviously this post simplifies, but it's already answered.


There are some questions I don't think this addresses, though.

Right now, I have the ability to produce covers of lots of songs. That's addressed in law: if I make a recording, I need to get a mechanical license; if I'm performing at a certain scale, I or the venue need to have an agreement with the right publisher for performance rights.

At a personal scale, though, I can perform covers in somebody's living room all day and nobody will care (or would be capable of doing much about it if they did). In fact, I once went looking for an agreement with a publisher to perform covers at a small-scale venue, and they told me not to bother, because either the venue itself already had the agreement in place or it didn't matter.

But what happens if/when everyone can have software that is capable of performing a cover of any work it's familiar with? Even if it doesn't have any particular digital copy directly encoded in any way we might now recognize?

If the law expands its definition of copyright to cover that situation, we're getting into territory in which it may be a copyright violation for me to remember a song. Or if enforcement of performance licensing laws starts to extend down to living-room scale performances, we're getting into really invasive surveillance.

But if that doesn't happen, those personal-performance-scale one-off covers (not stored, just part of the capabilities of the particular learning-machine) are probably going to be legal in the same way that my living room performances are right now.


If you had to get permission, these millions of sound-alike tracks wouldn't exist. There's a compulsory license for cover songs, but the question becomes: is it still a cover if a machine does it? Or does it infringe the recording's copyright, since it's merely a "processed" version of the original recording? And how would you tell the difference between a human cover and a machine cover? Interesting times ahead...


Re: publishing rights, you can make a cover version of a song using a "compulsory mechanical license", whether the owner likes it or not.


So are remixes allowed, or do you need permission from the original publisher?


For a remix, you would also need permission to use the specific recording (which can be declined), as well as rights from the song owner.

For example, if you wanted to cover Viva la Vida by Coldplay, you just do it and pay the mechanical licensing fees. If you wanted to sample or remix the recording of Viva la Vida, you would need to contact Capitol Records, who could decline or request specific payment. If you wanted to sample or remix the performance of Viva la Vida from Super Bowl 50, you would probably need the rights from CBS.

https://loudr.fm/faq

http://www.ascap.com/playback/2011/01/features/limelight.asp...


Assuming the remix uses the actual recording you need permission from the person who made that recording (the recording is covered by copyright).


Corner case: Neil Innes' songs for The Rutles. They are so clearly not covers yet so perfectly recognizable.


I'm pretty sure the law as it stands is robust enough to deal with these issues on the basis of similarity vs substantive difference. A cover usually sounds quite different to an original, enough so that a fan would certainly not consider them the same thing.

OTOH, simply using some well recognised phrases (not necessarily a sample) can infringe on the mechanical rights, which are the copyright on the composition as distinct from the performance.

Also consider fake paintings and photos. Recent cases have shown that even doing things like re-staging an iconic photograph or transferring the image to another medium can constitute copyright infringement. As ever, it comes down to the duck test, generally in the opinion of a judge.

An automated re-encoding of a movie would almost certainly be at least a derivative work (if say, you applied a deep-dream filter to each frame) or at worst a straightforward lossy copy like any other encoding.

In order to be considered a new work, there would need to be some substantial reinterpretation of the original. It shouldn't matter whether that reinterpretation is done by a human or an AI, but current AI just isn't capable of doing that yet.

Disclaimer: IANAL, so treat my understanding of copyright law with the usual level of scepticism applied to Internet comments.


So then the algorithm for creating a cover should include a way to detect whether the song is still within the sphere of similarity. This could be used, for example, to iteratively come to a point where the song is no longer considered similar.
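Sketched as a loop, it might look something like this (entirely hypothetical: similarity and generate_variation stand in for whatever metric and generative model you'd actually use):

    def make_noninfringing_cover(original, similarity, generate_variation,
                                 threshold=0.8, max_steps=100):
        # Keep mutating the candidate until the (hypothetical) similarity metric
        # says it has left the "sphere of similarity" around the original.
        candidate = original
        for _ in range(max_steps):
            if similarity(candidate, original) < threshold:
                return candidate
            candidate = generate_variation(candidate)
        raise RuntimeError("couldn't escape the similarity sphere")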


The AI can get as close to the similarity line as possible without infringing on copyright and make the final output even better. This could be applicable for everything from music to patented product designs.

Lyrics excluded, there are plenty of human cases where the cover of a song ends up better and more popular than the original. Hell, many of the most well-known singers aren't even very good; they're just a lot better at marketing than, say, a classically trained opera singer.

Rather than a conversational AI that can answer complex questions, the most immediate, low-hanging commercial fruit for AI-based companies may very well be along the lines of sucking up intellectual property and spitting out stuff that is as good or better. This might sound like theft, but it is exactly what every human content producer is doing.


I found this article talking about what someone has to do to secure the rights to make a remake of another work: http://www.movieoutline.com/articles/how-do-you-secure-the-r...

It basically points out that even shot-for-shot remakes are violating various copyrights, especially the copyright for the screenplay.


The question reminds me a bit of the idea behind https://en.wikipedia.org/wiki/OFFSystem

It was a fun project back in the day :)


Well that's a derivative work: https://en.wikipedia.org/wiki/Derivative_work


Can anyone explain what this encoder actually does and how it is different from any other encoder? The article almost makes it sound like this encoder somehow reconstructed the film...


This is possibly the next generation of video/audio/image codecs.

What he did was create a specialized compression algorithm that works very well to compress the data that is each frame of Blade Runner, and decompress it (lossily, like mp3) back into a video stream.

To put this into perspective: Blade Runner is 117 minutes long. At 25 frames per second, that is 175,500 frames.

As he says, the input data he used was 256x144 with 3 colour channels, meaning each frame was 110,592 bytes. This results in roughly 18,509 MB of input data, uncompressed.

His neural network, though, compresses each image down to 200 floats, i.e. 800 bytes. So the whole movie as compressed by the NN is about 134 MB.

A friend of mine who works with neural networks estimated his NN to be roughly 90 MB at 256x144, so the storage needed for the movie he created is about 224 MB.
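If you want to sanity-check those figures, here's the back-of-envelope arithmetic (the 200-float code size comes from the article; the 90 MB network size is only my friend's rough estimate):

    # Rough size estimates for the Blade Runner autoencoder experiment.
    frames = 117 * 60 * 25          # 117 minutes at 25 fps -> 175,500 frames
    raw_frame = 256 * 144 * 3       # 256x144 pixels, 3 colour channels -> 110,592 bytes
    code_frame = 200 * 4            # 200 float32s per frame -> 800 bytes
    network_mb = 90                 # rough estimate of the trained network's size

    raw_mb = frames * raw_frame / 2**20        # ~18,509 MB uncompressed
    coded_mb = frames * code_frame / 2**20     # ~134 MB of per-frame codes
    print(raw_mb, coded_mb, coded_mb + network_mb)   # ~18,510 / ~134 / ~224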

This means that if this technology can be made fast enough, and the reproduction high enough in fidelity, we could be looking at replacing compressors made to handle all types of input equally well (Fraunhofer MP3, LAME, MP4, DivX, JPEG) with compressors that are so good for one specific set of data that compressor + data is smaller than anything the traditional compressors could create.

And even if it's not fast enough, it can still be a very efficient compressor, trading size for CPU.

Plus, lastly, the NN itself can be compressed as well, and classic compressors can possibly be applied to the frame data.


"with compressors that can are so good for one specific set of data that compressor + data is smaller than anything the traditional compressors could create."

This is, well, hopeful but probably wrong... because we already know how to make it smaller for this type of data.

The algorithms used for interframe/intraframe prediction are chosen to trade off speed vs. size. If you built a really large-scale predictor that was able to generate very small representations for changes, you would get what he's built.

(Note that encoders already complexly select from tons of different prediction algorithms for each set of frames, etc)

We can already do this if someone wanted to. We just don't.

Because it's not fast enough (and nothing in that work changes this)

Would it be useful to apply NNs to video encoders to better select among prediction modes, etc.? Probably. But that's already being done, and is not this guy's work.

"And even if its not fast enough, it can still be a very efficient compressor, trading size for CPU. "

The problem you have is not just compression time; it's decompression time. The bitstream of H.265 etc. is meant to be decodable fast.

What this guy is building is not. If you were to make it so, it would probably look closer to a normal video codec bitstream, and take up that much space.

In fact, he hasn't built anything truly new; he's just taking existing papers and making an implementation. He also says, in his master's thesis, that it is primarily an artistic exploration.

Even with hardware decoding, you can only make stuff so fast.

TL;DR While interesting, there are people working on the things you are talking about, and it's not this guy (at least in this work).

I would not expect magic here. We already create video codec algorithms by trading off CPU cost and size. The trick is getting better size without increasing CPU cost significantly. As these resources change (and remember, Moore's law is pretty much dead), the video codecs will change and videos will get smaller, but you aren't likely to see serious breakthroughs. We could already produce very small videos by applying tremendous amounts of CPU power.


It wouldn't be decoded on the CPU, but on the GPU. Or even specialized hardware. As convnets are being used more in image processing, that isn't too unrealistic. There is interest in making specialized consumer hardware for convnets.

And it doesn't need to work on every frame, it could pump out I-frames every 15 seconds or so.


Except that, in my understanding, the decoded version looks like shit. So, while I'm optimistic that this technology might eventually be quite good, more heuristics are probably needed to find the right way to extract an optimal encoding.

e.g. see the screenshots here:

http://www.eteknix.com/blade-runner-gets-trippy-auto-encoded...


As part of a compression suite, that may not matter all that much. I can think of a hybridized version of this that starts with this ANN and then applies 'fixes' by a traditional video codec stream. If the ANN can represent most of the bits of the frame pretty well, the ANN + Codec data can match a traditional codec-only approach with substantially less data.


The thing is that the diff between the film and the "encoded" version looks pretty fuzzy/muddy, so compressing that diff would be difficult.

Anyway, the article doesn't mention this idea; in the first HN discussion someone claimed some other ridiculous use for the fuzzy/muddy images that are supposed to mean something, and ultimately I'd say it's an example of "my research project has cool images, I can use them for publicity".


This is only a proof of concept, with a very small input size, and very strong compression. The impressive thing is that it works at all.


Right, but my point is that your numbers don't indicate anything about the actual potential as a compression format. It is certainly possible to use it that way, but there is still much that is uncertain about whether a neural network can do better than hand-crafted compression formats. So it's a little early to call it "next generation." It is literally meaningless to point out that "his neural network compresses each image down to 200 floats", since it doesn't actually work well with such a small latent space.

It's not exactly unknown that autoencoders can reconstruct images, so I don't see why it's "impressive that it works at all."


> So it's a little early to call it "next generation."

Please don't ignore words other people write, or at least reread before hitting post. I did say possibly.

> I don't see why it's "impressive that it works at all."

I thought it was implicit from my explanation that the "works at all" includes the fact that the data sizes are reasonable already. If we had the same result but the intermediary data was 18 gigabytes, there would be nothing impressive about it. As it is, we're a lot below that, even before further compression, so it is impressive.

Remember: Prototype, Proof of Concept. You're looking at the first step, not the last step. You're looking at a motor carriage, not a Tesla.


> I thought it was implicit from my explanation that the "works at all" includes the fact that the data sizes are reasonable already.

Right, but why is that impressive if it doesn't actually result in a good reconstruction? I can take any collection of numbers and summarize it with the mean value, that doesn't imply that averaging is a good compression method.

If you consider this a proof of concept, then what concept, exactly, does it prove? That statistics can represent a dataset?


You would store the diff between the reconstructed version and the original. The diff could be compressed into much less space than the original image, because the NN can predict most of the pixels and get within a few bits of the rest.


This network was trained on a single Nvidia 960 GPU for two weeks; I wonder how much better it could have been with more computing power thrown at it.


It sounds to me, perhaps superficially, like this technique might benefit from work done in compressive sensing. The mechanisms of encoding and decoding seem very related.


Is it like a virtual, self modifying, FPGA for compression?


As far as I can tell, it just sent it through a deep learning network. Cool, but not exactly the same as an AI reconstructing the movie. It's just a distorted version of the original, and Warner Bros was not entirely incorrect in requesting its removal (though it doesn't hurt them much to allow it, because it's so heavily distorted).


Autoencoders are a special type of deep neural network where there is a small middle layer, and the labels are the same as the input.

The idea is you train the network to reproduce the input, but it has to pass the information through the small middle layer which has relatively few neurons. Thus the network learns a good (hopefully) way to represent typical inputs with not much information. You have automatically generated an encoding (hence the name).
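A toy sketch of that structure, in case it helps (this is just a generic fully connected autoencoder with made-up layer sizes, not the author's actual model):

    import torch
    import torch.nn as nn

    # Toy autoencoder: force a 256x144 RGB frame through a 200-number bottleneck,
    # then try to rebuild the original pixels from just those 200 numbers.
    class ToyAutoencoder(nn.Module):
        def __init__(self, code_size=200):
            super().__init__()
            n_pixels = 256 * 144 * 3
            self.encoder = nn.Sequential(
                nn.Linear(n_pixels, 1024), nn.ReLU(),
                nn.Linear(1024, code_size),          # the small middle layer
            )
            self.decoder = nn.Sequential(
                nn.Linear(code_size, 1024), nn.ReLU(),
                nn.Linear(1024, n_pixels), nn.Sigmoid(),
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))

    model = ToyAutoencoder()
    frame = torch.rand(1, 256 * 144 * 3)                  # stand-in for a flattened frame
    loss = nn.functional.mse_loss(model(frame), frame)    # the "label" is the input itself
    loss.backward()

The encoder's 200-number output is the learned encoding; the decoder is what turns it back into a (blurry) frame.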

You're right it's not really 'reconstructing' the film in the way they imply. But it isn't just a distorted version of the original either. The really important question is how big the middle layer is (and hence how much it has learned / how good the compression is). Unless this is just done for artistic purposes, which seems to be the case.


It applies the unsupervised learning technique of https://en.wikipedia.org/wiki/Autoencoder

As for why it's important and not just another encoder like mpeg4, see my response at https://news.ycombinator.com/item?id=11830437


I believe it's similar to Google's deep dreaming created images, but with videos.


This technique is quite different from deep dream. An autoencoder is a model that is trained to compress and then reconstruct an image. After seeing and trying to reconstruct many images, it learns how to make a "compressed representation" of original images by going through this compress -> uncompress scheme, which typically results in blurry/lossy reconstructions that are very different from the normal artifacts we see in lossy compression schemes like JPEG.

In deep dream, a network whose goal is to predict image categories is "run in reverse" and used as a generative model, even though it was not intended to be generative from the outset. Autoencoders, by contrast, are often intended to be generative, or at least to learn relationships between images (sometimes called a latent space or latent variables), whereas classification tasks care more about just getting the classification correct, with no explicit concern for learning relationships between categories or the images in those groups.


Would it be possible to make it look better or even the same by increasing digits per frame (eg 1000 instead of 200)?


Yes, although the particular technique used here (variational autoencoder, or VAE) doesn't benefit from increasing the code space due to a particular penalty (KL divergence against a fixed N(0, 1) Gaussian prior) during training. So even by making it go to 1000 digits in the code space, the model will probably still only choose to use 5 or 10 dimensions unless the KL penalty is relaxed, making it closer to a standard autoencoder.
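For the curious, the penalty being described looks roughly like this (a generic VAE loss sketch in PyTorch, not the code used in the project):

    import torch
    import torch.nn.functional as F

    def vae_loss(reconstruction, target, mu, log_var, beta=1.0):
        # Reconstruction term: how well the decoder rebuilt the frame.
        recon = F.mse_loss(reconstruction, target, reduction="sum")
        # KL term: pulls every code dimension toward the fixed N(0, 1) prior.
        # Dimensions that don't earn their keep in reconstruction quality get
        # squashed to the prior, which is why only a handful end up being used.
        kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
        # Shrinking beta relaxes the penalty and moves this toward a plain autoencoder.
        return recon + beta * kl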

An autoencoder with a much larger code space could be an improvement, or some newer works such as Adversarially Learned Inference / Adversarial Feature Learning [0, 1] or real NVP [2, 3] could probably do a much better job at the task, at the cost of increased computation.

Also something like inpainting parts of the frames with pixel RNN [4] would be interesting.

[0] http://arxiv.org/abs/1606.00704

[1] http://arxiv.org/abs/1605.09782

[2] http://www-etud.iro.umontreal.ca/~dinhlaur/real_nvp_visual/

[3] http://arxiv.org/abs/1605.08803

[4] http://arxiv.org/abs/1601.06759


Given that Adversarially Learned Inference came out yesterday, the best approach may be to just wait a couple of months and see what the state of the art is then.


Sure - that is always an option, especially in deep learning right now. But this current crop of models (counting DCGAN and LAPGAN/Eyescream) has really made a leap in my eyes, from "oh, cool generative model" to "are these thumbnails real?". They are really generating a lot of cohesive global structure, which is pretty awesome!


Yes, technically yes. Although I'd say that is not what this experiment aims at.

In the future we may be able to colourise b/w movies and increase the frame rate via this kind of automatic software.



The worst part about this encoder is that it treats plenty of scenes as static pictures. The version on the right simply has no movement while the movie continues to play on the left. Also, faces don't move when a person speaks. That's a no-go even if the picture quality didn't look like crap.


It depends what your goal is. This is quite impressive for compressing each image down to 200 bytes.

If you were actually using this for compression, this could be useful. You could store the "diff" between the reconstruction and the actual pixels, which would take many fewer bits than storing the whole image. Video compression already does something like this: store a frame, then make the following frames just diffs against the pixels of the previous frame.
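A crude sketch of that idea (zlib standing in for a real entropy coder; none of this is from the actual project):

    import numpy as np
    import zlib  # stand-in for a proper entropy coder

    def encode_frame(original, reconstruction):
        # Keep only the residual between the NN's guess and the real pixels;
        # if the guess is close, the residual is mostly small values and packs well.
        residual = original.astype(np.int16) - reconstruction.astype(np.int16)
        return zlib.compress(residual.tobytes())

    def decode_frame(reconstruction, blob, shape):
        residual = np.frombuffer(zlib.decompress(blob), dtype=np.int16).reshape(shape)
        return (reconstruction.astype(np.int16) + residual).astype(np.uint8)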

The goal of this project seems to be more artistic, so the poor quality might even be desirable.


It's not impressive to get an image small if you delete a bunch of critical stuff from it and warp what remains. I'll agree on the art project part, though.


What do you expect from compressing an image to 200 bytes? It's impressive that it has as much detail as it does.


I expect that decompressing the image produces one that looks close enough to the original to meaningfully call it compression. This I call mostly image deletion.


Bullshit... this is still encoding. It doesn't matter that it uses a neural network to encode frames. Guys, when will you get over this neural network hype? I could just as well create an encoder using some machine learning technique to produce a blurry and inferior version of the movie. But what's the point? I could just as well have used some other prediction method with some kind of memory, much superior, and nobody would enjoy it. Summing up: nothing impressive here.


> But what's the point?

Autoencoders don't really have much practical use right now, but that's not the point of them. The point is that we really want to figure out how to do unsupervised learning well, and autoencoders are one of the few ways of doing it.

We want to do unsupervised learning well because most learning that humans do is unsupervised.

The idea behind autoencoding is that by forcing the network to try to learn efficient ways to compress the data, it could learn important features of the data. The fact that the pictures are so blurry means that this doesn't work very well, but that's why it's a research problem.

Autoencoders don't work well, but some unsupervised techniques that extend on them do, and we get impressive results like https://arxiv.org/pdf/1511.06434.pdf (see page 5) where the network learns to generate natural-looking bedroom images.


Note that DCGAN (or standard GAN generally) doesn't have an encode path (or any other way to easily condition generation) so it is not well suited to this particular task. The image quality is stellar though, and I linked to several recent papers above which combine a GAN approach with an explicit encode. This should allow them to be used for the kinds of things in this link.


Thanks for great answer!


The author's own blog https://medium.com/@Terrybroad/autoencoding-blade-runner-889... makes it clearer that this is primarily an art project. It's supposed to make you think about how minds understand things, not to be a superior encoder.


I agree. I was wondering why this is such a big deal. When a "neural networking" computer has a flash of inspiration, writes a screenplay for a movie, and creates a virtual world based on that screenplay, then we'll have something to talk about.


See my explanation for why this matters here: https://news.ycombinator.com/item?id=11830140


OH. Thank you. Now that makes a bit more sense, and it looks like an impressive feat of data compression.


An explanation written by someone who actually did it a decade ago:

https://www.quora.com/What-is-the-potential-of-neural-networ...


But no audio, right?

So, you'd think that alone would be a strong indicator that there was no intent (or capacity) to profit from the reconstructed version.


This reminds me very much of Parag Mital's work on audiovisual resynthesis: http://pkmital.com/home/

IIRC he has been beset with takedown notices for his 'smash up' videos, which resynthesise the week's #1 most-viewed YouTube video from the next 9 most-viewed videos: http://pkmital.com/home/youtube-smash-up/


The idea of this is way more interesting than the actual result. Unfortunately it just ends up looking like something run through content aware fill rather than anything interesting visually.


Discussions would be more focused if participants were aware of the AFC test -- Abstraction-Filtration-Comparison

https://en.wikipedia.org/wiki/Abstraction-Filtration-Compari...

Will that test be applied here?

Does it need to be revised in light of deep learning?


These complexities and nuances of sci-fi culture and artificial learning were quite possibly lost on whoever decided to file the takedown claim for Warner Bros.


> whoever decided to file the takedown

maybe that was a bot, which would add another layer to the story.


It says it's down for scheduled maintenance for me; anyone got a mirror?


Sure do.

If you're running IPFS locally use this link: http://127.0.0.1:8080/ipfs/QmPo9savMewcGCzUhzNQi96b6gSSoQ4k4...

If not, use this one (public IPFS gateway) http://ipfs.io/ipfs/QmPo9savMewcGCzUhzNQi96b6gSSoQ4k4DFEVkF5...

Be aware, there is no sound with this video. You'll have to add it from your own sources (I didn't want to add the sound from my own DVD set).

And frankly, IPFS is awesome :)


[flagged]


Just guessing but the downvotes may be due to the 'sig' you've included coming across as spammy. You might notice that no one else here is doing that, and it likely isn't a good way to encourage interest in your product. How about submitting a post where you discuss some of the problems you've solved?



