
Stable Diffusion open source release and LLaMA release



But what technically allowed for so much progress?

There’s been open source AI/ML for 20+ years.

Nothing comes close to the massive milestones over the past year.


Attention, transformers, diffusion. Prior image synthesis techniques - e.g. GANs - had problems that made it difficult to scale them up, whereas the current techniques seem to have no limit other than the amount of RAM in your GPU.
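Since attention gets named so often, here's a minimal sketch of the scaled dot-product attention at the core of transformers, in plain NumPy (illustrative only, not any particular library's implementation):

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # softmax(Q K^T / sqrt(d)) V -- each query attends to every key
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)                      # (n_q, n_k) similarities
        weights = np.exp(scores - scores.max(-1, keepdims=True))
        weights /= weights.sum(-1, keepdims=True)          # softmax over the keys
        return weights @ V                                 # weighted sum of values

    # toy example: 3 query tokens, 4 key/value tokens, dimension 8
    rng = np.random.default_rng(0)
    Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
    print(scaled_dot_product_attention(Q, K, V).shape)     # (3, 8)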


> But what technically allowed for so much progress?

The availability of GPU compute time. Up until the Russian invasion of Ukraine, interest rates were low AF, so everyone and their dog thought it would be a cool idea to mine one or another sort of shitcoin. Once rising interest rates killed that business model for good, miners dumped their GPUs on the open market, and an awful lot of cloud computing capacity suddenly freed up.


The "Attention Is All You Need" paper from Google, which may end up being a larger contribution to society than Google Search, is foundational.

Emad Mostaque's investment in Stable Diffusion, and his decision to release it to the world.

I'm sure there are others, but those are the two that stick out to me.


Public availability of large transformer-based foundation models trained at great expense, which is what OP is referring to, is definitely unprecedented.


People figuring out how to train and scale newer architectures (like transformers) effectively, to be wildly larger than ever before.

Take AlexNet - the major "oh shit" moment in image classification.

It had an absolutely mind-blowing number of parameters at a whopping 62 million.

Holy shit, what a large network, right?

Absolutely unprecedented.

Now, for language models, anything under 1B parameters is a toy that barely works.

Stable Diffusion has around 1B or so - or the early models did, I'm sure they're larger now.
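For a rough sense of where numbers like that come from: a decoder-only transformer's parameter count is dominated by its per-layer attention and MLP weight matrices, roughly 12 * n_layers * d_model^2, plus embeddings. The layer counts and widths below are illustrative, not any specific model's published config:

    def approx_transformer_params(n_layers, d_model, vocab_size=50_000):
        # per layer: ~4*d^2 for the Q/K/V/output projections
        # plus ~8*d^2 for a 4x-wide MLP (two d x 4d matrices)
        per_layer = 12 * d_model ** 2
        return n_layers * per_layer + vocab_size * d_model  # plus token embeddings

    # illustrative sizes (not exact published counts)
    print(f"{approx_transformer_params(12, 768):,}")    # ~123 million, GPT-2-small territory
    print(f"{approx_transformer_params(24, 2048):,}")   # ~1.3 billion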

A whole lot of smart people had to do a bunch of cool stuff to be able to keep networks working at all at that size.

Many, many times over the years, people have tried to make larger networks, only to have them fail to converge (read: learn to do something useful) in all sorts of crazy ways.

At this size, it's also expensive to train these things from scratch, and it takes a shit-ton of data, so research/discovery of new things is slow and difficult.

But, we kind of climbed over a cliff, and now things are absolutely taking off in all the fields around this kind of stuff.

Take a look at XTTSv2 for example, a leading open source text-to-speech model. It uses multiple models in its architecture, and one of them is a GPT-style model.

There are a few key models - CLIP, U-Net, GPT, and similar variants - that keep getting reused across a bunch of different modalities. When they were released or made available, people jumped on them and started experimenting.
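To make that reuse concrete, here's a minimal zero-shot image classification sketch using one of the publicly released CLIP checkpoints through the Hugging Face transformers library (the image path and candidate labels are just placeholders):

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    # openai/clip-vit-base-patch32 is one of the original public CLIP checkpoints
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("cat.jpg")  # any local image
    labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

    inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)

    # image-text similarity scores -> probabilities over the candidate labels
    probs = outputs.logits_per_image.softmax(dim=-1)
    print(dict(zip(labels, probs[0].tolist())))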


> Stable Diffusion has around 1B or so - or the early models did, I'm sure they're larger now.

SDXL is 6.6 billion parameters.



