Parti: Pathways Autoregressive Text-to-Image Model (research.google)
134 points by amrrs on June 22, 2022 | 40 comments



It's funny how they never release the model. I guess they are scared of spammers, 4chan or worse, the Russians. This is the harbinger of the future, isn't it? Technology that's too powerful to be widely deployed is kept under lock and key by priests who deal in secret knowledge only available to the properly initiated.


Give it a few days and lucidrains will have the code up[0].

But honestly, it's probably about how people react. We saw this with PULSE, GPT, and many others. The authors are clear about the limitations, but some people talk the work up too much and others shit on it. There's also a reproducibility crisis in ML; many famous networks, like Swin[1][2][3], can't be reproduced, which is even worse when reviewers concentrate on benchmarks. It isn't like many people could train a model like this anyway. Holding the weights back gives the authors the benefit of the doubt and keeps the publicity good rather than controversial.

Of course, this is extremely bad from an academic perspective, and personally I believe a paper should be retracted if it isn't reproducible. You'd be surprised how many authors don't track the random seed or measure variance. We have GitHub; you should be able to publish training options that get approximately the same results as the paper. Otherwise I don't trust your results. (A rough sketch of the seed/config bookkeeping I mean is below the links.)

[0] https://github.com/lucidrains/parti-pytorch

[1] https://github.com/microsoft/Swin-Transformer/issues/183

[2] https://github.com/microsoft/Swin-Transformer/issues/180

[3] https://github.com/microsoft/Swin-Transformer/issues/148
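
To be concrete, here's a minimal sketch of that bookkeeping, assuming a PyTorch/NumPy training script (the names and values are mine, purely illustrative):

    import json, random
    import numpy as np
    import torch

    def set_seed(seed: int) -> None:
        # Pin every RNG the run touches so reruns are comparable.
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Trade a little speed for deterministic cuDNN kernels.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

    def save_run_config(path: str, config: dict) -> None:
        # Dump the exact options used, so "approximately the same results"
        # is at least checkable from the repo.
        with open(path, "w") as f:
            json.dump(config, f, indent=2, sort_keys=True)

    set_seed(42)
    save_run_config("run_config.json", {"seed": 42, "lr": 3e-4, "batch_size": 256})

Run it for a handful of seeds and report the mean and variance; that covers most of what I'm asking for.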


I really doubt that the motivation for not publishing the model for "turns text into trippy images" is "it's too powerful to trust the world with it" vs banal business reasons.


The claimed reason for not publishing the model in TFA:

> This leads such models, including Parti, to produce stereotypical representations of, for example, people described as lawyers, flight attendants, homemakers, and so on...

> Models which produce photorealistic outputs, especially of people, pose additional risks and concerns around the creation of deepfakes.

> ...the range of outputs from a model is dependent on the training data, and this may have biases toward Western imagery and further prevent models from exhibiting radically new artistic styles...

> For these reasons, we have decided not to release our Parti models, code, or data for public use without further safeguards in place.

So maybe they're partly afraid that it's too powerful, but even more so they're afraid it's too white.


Rather, I think they're afraid of the criticism they'll face over such biases if they make the model generally available.

It doesn't take much imagination to see these issues being used by AI opponents as justification to ban or heavily regulate machine learning research.


Eh, the cost of training is already within the reach of wealthy individuals, and the cost drops as better TPUs/GPUs appear in the cloud.

This is definitely going to become commonly available tech this decade.


The training cost of state of the art models has been increasing every year for the last 10 years. I don’t see any indication this will change in the near future.


The parts of the model and how to go about training it are well known (ViT-VQGAN, autoregressive transformers, etc.). They also use LAION-400M for training, which is an open dataset. It can be replicated, but it will take time, patience, and compute. (Rough sketch of the recipe below.)

This is not true for some of the other recent headline papers (DALL-E 2, PaLM, Imagen, etc.), where the datasets are the primary deterrent; the model details are well known.
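
For the curious, a very rough sketch of that recipe, assuming PyTorch. This is not the actual Parti code: the image tokenizer and the text side are random stand-ins, and details like positional embeddings and the real ViT-VQGAN are omitted.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Stage 1 stand-in: an image tokenizer (ViT-VQGAN in the paper) that maps an
    # image to a grid of discrete codebook indices. Random ids here, for shape only.
    IMG_TOKENS, CODEBOOK = 256, 8192      # e.g. a 16x16 grid of codes
    TXT_TOKENS, TXT_VOCAB = 64, 32000

    def fake_image_tokenizer(images: torch.Tensor) -> torch.Tensor:
        # Placeholder for a trained encoder + vector quantizer.
        return torch.randint(0, CODEBOOK, (images.size(0), IMG_TOKENS))

    # Stage 2: an encoder-decoder transformer that reads the caption tokens and
    # autoregressively predicts the image-token sequence.
    model = nn.Transformer(d_model=512, num_encoder_layers=2,
                           num_decoder_layers=2, batch_first=True)
    txt_embed = nn.Embedding(TXT_VOCAB, 512)
    img_embed = nn.Embedding(CODEBOOK, 512)
    to_logits = nn.Linear(512, CODEBOOK)

    images = torch.randn(2, 3, 256, 256)                    # a tiny (image, caption) batch
    captions = torch.randint(0, TXT_VOCAB, (2, TXT_TOKENS))
    img_ids = fake_image_tokenizer(images)                  # (2, 256) discrete codes

    # Teacher forcing: predict image token t from the caption plus image tokens < t.
    bos = torch.zeros(2, 1, dtype=torch.long)               # reuse id 0 as a start token
    dec_in = torch.cat([bos, img_ids[:, :-1]], dim=1)
    causal = model.generate_square_subsequent_mask(IMG_TOKENS)
    hidden = model(txt_embed(captions), img_embed(dec_in), tgt_mask=causal)
    loss = F.cross_entropy(to_logits(hidden).transpose(1, 2), img_ids)
    loss.backward()

At generation time you'd sample the image tokens one at a time from the decoder and run them back through the tokenizer's decoder to get pixels; the hard part is doing this at scale, not the wiring.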


By now these researchers could show me a deep learning model that accurately predicts the future, and I'd shrug my shoulders and say "so what?".

As a mortal, I feel there's not much to learn from these insanely big models anymore, which makes me kinda sad. Training them is prohibitively expensive, the data and code are often inaccessible, and I highly suspect that the learning rate schedules needed to get these to converge are also black magic-ish...


There is public code and data available to train similar models (text generation, image generation, whatever you like). Training details are also often available. The learning rate schedule is actually nothing special.

However, you are fully right that the computation costs are very high.

One thing we can learn is that it really works: it scales up and gets better, without doing anything special. That was unexpected to most people, and it is really interesting. Most people expected there to be some limit where the performance would level out, but so far this does not seem to be the case. It rather looks like you could keep scaling up and keep getting better and better performance, without any obvious limit.
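
For what it's worth, the empirical shape usually reported for this (not from TFA, just the general scaling-law literature) is a smooth power law in training compute C:

    L(C) ≈ (C_0 / C)^α,  for some small constant α > 0

so more compute keeps helping, with diminishing returns rather than a hard ceiling.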

So, what to do now with this knowledge?

Maybe we should focus the research on reducing the computation costs, e.g. with better hardware (maybe neuromorphic) or more computationally efficient models.


>By now these researchers could show me a deep learning model that accurately predicts the future, and I'd shrug my shoulders and say "so what?".

So what? Are you just taking the piss? You are saying a literal Oracle wouldn't be impressive because the "learning rate schedules" are black magic??


It bothers me that foundation models are not decentralised. While it's too dangerous to distribute these models generally to the public (because that would increase the amount of fiction in the world), the technology continues to improve regardless, albeit in secret. Seems inevitable that what is being built is an AI god - an intelligence that demands trust and faith in it since we're merely spectating the rise of AGI rather than being allowed to augment ourselves with it. But I suppose that a warning is better than no warning, so thanks G.


Very true, but here's a counterpoint: EleutherAI and at least one other effort (BigScience, I think, or something similar) are making good headway from the open source side.

Also, it's not quite in secret: the big labs continue to publish papers on the major advancements (at least, as far as we know), which in itself enables open source.

Lastly, academia is lagging but is making efforts to get in on making such models - see Stanford's CRFM, which I believe released some models recently.

PS: Personally, I rather disagree that an 'AI god' is likely to come out of these efforts anytime soon, though that's a longer discussion.


It's interesting how LAION-400M, an open-access dataset for democratized AI, was used to train this model which will seemingly never be truly available in its full capacity for the lay population. Is it time for open-access datasets to consider licensing measures to prevent this?


More restrictive licensing wouldn’t be enough. This stuff is sufficiently transformative to count as fair use without any permission at all from the data owner. New laws will be required for stuff like this.


I don't think fair use comes into play for this matter. It's the use of the dataset for training that's being objected to, not its reproduction during inference.


But then what about training on Hacker News comments, for example? That happens without permission from Y Combinator or the posters; there's even less permission there than any curated dataset could provide or revoke.


Like Imagen, Parti is not open-sourced/easily accessible for the same reasons.


There is equal contribution, core contribution, and then the order of authors. What does each of those attributes actually mean?

I thought the order indicates how much someone contributed. "Core contribution" sounds like it should be the most, so those authors should come first, but that is not the case here. "Equal contribution" sounds like those authors should appear right next to each other in the order, but this is also not the case here.


I can't tell whether it's a curse or a blessing that the company most invested in AI, and seemingly the most successful at it, is also the one that seems least capable of commercially leveraging it, having failed to do so with most of their products.


Or are their applications of AI to their products so transparent that you can't even recognize them? I'm pretty sure their revenue would drop by more than half without their machine learning infrastructure.


But they actually have machine learning in almost all their products: Google Search, YouTube, Gmail, Maps, and AdSense all have machine learning at their core.


The ML in Google Search is apparently atrocious... see all of the posts about how Search doesn't work anymore.


All the posts moan about poor results. But few say 'my query was this, and the ideal result would have been this URL: xxx'


That's akin to saying you can't criticize art if you can't make it yourself.

The close-to-ideal result would have been whatever google gave me 5 years ago before they started their latest ML push with query embeddings.


IME the issue is not "the ideal result would have been xxx", the issue is "it is trivially obvious that the first ten results are ludicrously far from ideal."

Browser extensions like Unpinterested tell a story.


https://news.ycombinator.com/item?id=31484562

Relevant discussion of the previous model (Imagen).


How are two separate groups at Google doing the exact same thing and releasing it within a month? Imagen [1] and Parti.

Is this a clear sign of organizational dysfunction and I should sell my shares? Or am I missing something?

[1] https://news.ycombinator.com/item?id=31484562


It's not the same internally: Imagen is a diffusion model, Parti is autoregressive. Research at Google is quite decentralized, but IMHO it is wise of them to pursue both approaches. At this time it is very unclear what the successor to GANs will be, but it will probably be one of these two approaches, or some refinement of them.


What are the inputs derived from the training images?

Do they run object detection beforehand, and if so, at what granularity (toes, eyes, hat, gloves, etc.)? Or is it at the pixel level?


I think the training data is just image/caption pairs. I don't think there's any notion of localizing or detecting specific objects in the training images.


Yet another LLM that is not released because the model doesn't produce outputs which align with the researchers' Western, liberal viewpoint. If the authors care so much then why are they even releasing the architecture? To do the research but not release the model weights because your feelings were hurt by the output of some matrix multiplication is hypocrisy at its finest - the authors get all the PR attention and benefits of publishing with the veneer of being politically correct, but the actual negative impact is not mitigated in the slightest. The real difficulty is not reproducing the research but identifying the architecture that works best in the first place, and the authors have done that for any would-be malicious actors.


Actually, the architecture of many of these models is profoundly unsurprising. The real difficulty is in training them. Just preprocessing the data for a language model can take several hundred days of CPU time. Training takes months on thousands of GPUs. We


Yes, thankfully Google has saved us from this one-in-a-century world-ending catastrophe.


> because the model doesn't produce outputs which align with the researchers' Western, liberal viewpoint.

What evidence do you have for claiming this motivation?


RTFA. After discussing the various biases at work in the training set, it says:

> For these reasons, we have decided not to release our Parti models, code, or data for public use without further safeguards in place.

The problem is that the results align too much with a Western liberal viewpoint, which is antithetical to the Western liberal viewpoint. They would prefer an AI with culturally diverse output.


what on earth are you talking about


I believe he's trying to say something similar to the "one sided political view" section here: https://gist.github.com/yoavg/9fc9be2f98b47c189a513573d902fb...

I believe it's more complex than that, but it's undeniable that there is a cadre of Twitter AI activists who would pick a model like this apart if it were released and use the worst anecdotal examples to generate a ton of bad press over "racist AI", which is why you don't see these made public.


It's really true. Politically, I am a progressive (in US terms). The way these companies describe the problem and solution -- the model data reflects our culture back at us, so we are not giving anyone access to it -- is so nonsensical as to leave me wondering whether there is a "real" reason, or whether that is just how they think.


Check the authors' names before calling them names.



