Hacker News

Heh, yeah, tough crowd I guess. The full code, models, and videos are all released and people are still skeptical.

I feel like 95%+ of papers don't do anything besides tell you what happened and you're just supposed to believe them. Drives me nuts. Not sure why all the hate when you could just see for yourself. I'd welcome someone who can actually prove the model just "memorized" every combo possible and didn't do any generalization. I imagine the original GameGAN researchers from NVIDIA would be interested too.

Interesting re: guided diffusion; I wasn't aware of it until now. We've had our heads down for a while. Will look into it, thanks!




> I feel like 95%+ of papers don't do anything besides tell you what happened and you're just supposed to believe them.

Honestly, I think there's a big problem with page limits. My team recently had a pre-print that was well over 10 pages and we still couldn't fit everything in, and then when we submitted to NeurIPS we had to cut it down to 9! This seems to be a common problem, and it's why you should often check the different versions on arXiv. And since the pre-print we had even more experiments and data to convey. The problem is growing as we have to compare more things, and tables can easily take up an entire page. I think this exaggerates the problem that has always existed of not explaining things in detail and expecting readers to be experts. Luckily, most people share source code, which helps show all the tricks the authors used, and blogging is becoming more common, which helps further.

> I'd welcome someone who can actually prove the model just "memorized" every combo possible

Honestly, this would be impressive in and of itself.


There's the Hutter Prize [1] - memorizing is useful (and arguably intelligent) if it's compressed.

http://prize.hutter1.net/


Indeed. Novel, efficient program synthesis is still novel, efficient program synthesis even if it's a novel, efficient data compression codec you're synthesising.


>> The full code, models, and videos are all released and people are still skeptical.

If you're uncomfortable with criticism of your work you should definitely try publishing it, e.g. at a conference or journal. It will help you get comfortable with being criticised very quickly.


I think he’s pointing out that the “criticism” here is similar to that of a person criticizing a book they’ve never read or even flipped through.


Perhaps, but that kind of criticism should be the easiest to ignore. The OP expresses frustration at lay criticism, and I expect that even brief contact with academic criticism would make that frustration fade into irrelevance.


I've been learning about this stuff for about a year now. Your earlier experiments with learning to drive in GTA V were an inspiration for me - because they hit that perfect intersection of machine learning, accessibility in education, and just plain cool.

Then the pandemic hit, and OpenAI released DALL-E and CLIP. I was unemployed and bored with my Python skills and decided to just dive in. I found that a nice gentleman named Phil Wang had been replicating the DALL-E effort on GitHub, and decided to start contributing!

You can find that work here

https://github.com/lucidrains/DALLE-pytorch

and you'll find me here:

https://github.com/afiaka87

We have a few checkpoints available, with Colab notebooks ready to go. There's also a research team with access to more compute who will eventually be able to perform a full replication study and match a scale similar to OpenAI's, and then some, because we're also working with another brilliant German team, https://github.com/CompVis/, who have provided us with what they call a "VQGAN" (if you're not familiar): a variational autoencoder for vision tokens that borrows the neat trick from GAN-land of using a discriminator to produce fine details.

https://github.com/CompVis/taming-transformers

We use their pretrained VQGAN to convert an image into integer tokens. We use another pretrained text tokenizer to convert the words into tokens too. Both sets of tokens go into a Transformer, and a causal mask is applied so that each token can only attend to the tokens before it; since the text tokens come first, they can never see the image tokens. Tokens come out the other end, and we decode them back into text and image respectively. Then a cross-entropy loss is computed on the predicted tokens. Rinse, wash, repeat. Slowly but surely, text predicts image without ever having been able to actually _see_ the image. Insanity.
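To make the masking idea concrete, here's a minimal sketch (not the actual DALLE-pytorch code; the function name and sizes are made up for illustration). Text tokens occupy the first positions in the sequence and image tokens follow, so a plain causal mask already guarantees the text positions never attend to any image position:

```python
def causal_mask(n_text, n_image):
    """Build a causal attention mask for a sequence of n_text text
    tokens followed by n_image image tokens. mask[i][j] is True where
    position i is allowed to attend to position j (i.e. j <= i)."""
    n = n_text + n_image
    return [[j <= i for j in range(n)] for i in range(n)]

mask = causal_mask(n_text=3, n_image=4)

# No text position (0..2) can see any image position (3..6)...
assert not any(mask[i][j] for i in range(3) for j in range(3, 7))
# ...but every image position can see all of the text.
assert all(mask[i][j] for i in range(3, 7) for j in range(3))
```

In a real model this boolean grid would be a tensor added (as -inf on the False entries) to the attention logits before the softmax; the sketch only shows which attention edges are permitted.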

Anyway, taking a caption and making a neural network output an image from it has again hit that "perfect intersection of machine learning, accessibility in education, and just plain cool". I don't know if you could fit it into the format of your YouTube channel but perhaps it would be a good match?


FWIW, I saw your video a couple of days ago via Reddit and loved it. I even sent the link to a friend of mine because I found it very inspiring and interesting.

I hope you don't let naysayers get to you :)


This is wild - thanks for putting the video together, it’s very cool.



