32GB M1 Max is taking 25 seconds on the exact same prompt as in the example.
Edit: it seems the "per second" requires the `--continuous` flag to bypass the initial startup time. With that, I'm now seeing the ~1 second per image time (if initial startup time is ignored).
I’m probably missing something but if the bottleneck is disk read speed, wouldn’t it only take about 5-6 seconds to fill the entire 32GB memory from disk? I just googled and found a benchmark quoting 5,507 MB/s read on an M1 Max.
Every time I execute: python main.py \
"a beautiful apple floating in outer space, like a planet" \
--steps 4 --width 512 --height 512
it re-downloads 4 gigs worth of stuff on every execution. Can't you have the script save the files, check if they're there, and only download if they're missing, or am I doing something wrong?
For me it does not re-download anything on the second run. But it is also only running on the CPU and is slow AF.
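If it helps: assuming main.py loads the model through diffusers, the download should land in the Hugging Face cache and be reused on later runs. A minimal sketch, not the repo's actual code (the model id and cache path here are assumptions):

```python
# Sketch only: with diffusers, from_pretrained() caches downloaded weights on
# disk (default ~/.cache/huggingface/hub) and reuses them on the next run.
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7",   # assumed model id; use whatever main.py loads
    cache_dir="./models",             # optional; omit to use the default cache location
)
```

If it really is re-downloading every time, check that the default cache directory is writable, or point the HF_HOME environment variable somewhere that is.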
With 5 iterations the quality is...not good. It looks just like Stable Diffusion with low iteration count. Maybe there is some magic that kicks in if you have a more powerful Mac?
This is awesome! It only takes a few minutes to get installed and running. On my M2 mac, it generates sequential images in about a second when using the continuous flag. For a single image, it takes about 20 seconds to generate due to the initial script loading time (loading the model into memory?).
I know what I'll be doing this weekend... generating artwork for my 9 yo kid's video game in Game Maker Studio!
Does anyone know any quick hacks to the python code to sequentially prompt the user for input without purging the model from memory?
Answered my own question. Here's how to add an --interactive flag to the script to continuously ask for prompts and generate images without needing to reload the model into memory each time.
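In case it's useful to anyone else, here is roughly the shape of it as a minimal sketch, not the actual patch; the model id, device, variable names, and output path are all assumptions:

```python
# Hypothetical --interactive mode: load the pipeline once, then loop over
# prompts from stdin so the model stays in memory between generations.
import argparse
from diffusers import DiffusionPipeline

parser = argparse.ArgumentParser()
parser.add_argument("prompt", nargs="?", default=None)
parser.add_argument("--interactive", action="store_true")
parser.add_argument("--steps", type=int, default=8)
args = parser.parse_args()

pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7").to("mps")  # assumed

def generate(prompt, i=0):
    image = pipe(prompt, num_inference_steps=args.steps).images[0]
    image.save(f"output_{i}.png")

if args.interactive:
    i = 0
    while True:
        prompt = input("prompt> ").strip()
        if not prompt:
            break
        generate(prompt, i)
        i += 1
elif args.prompt:
    generate(args.prompt)
```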
I've got a 500Mb wifi connection. it took me less than 5 minutes from git clone to having my first image (I did have python installed already, though).
They likely won't show any generative software until the next macOS version comes out; they don't usually showcase standalone features without a bigger strategy for tying them into the OS.
* on line 17 of `main.py` change `torch.float32` to `torch.float16` and change `mps:0` to `cuda:0`
* add a new line after line 17: `model.enable_xformers_memory_efficient_attention()`
The xFormers step is optional, but it should make it a bit faster; a rough sketch of the resulting setup is below.
For me this got it generating images in less than a second [00:00<00:00, 9.43it/s] and used 4.6GB of VRAM.
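For reference, here's roughly what those two edits amount to in diffusers terms, assuming the script loads the pipeline into a variable called `model` (the model id is an assumption):

```python
# Sketch of the CUDA + fp16 + xFormers setup described above; the last call
# requires the xformers package to be installed.
import torch
from diffusers import DiffusionPipeline

model = DiffusionPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7",   # assumed model id
    torch_dtype=torch.float16,         # was torch.float32
).to("cuda:0")                         # was "mps:0"

model.enable_xformers_memory_efficient_attention()  # optional speed-up
```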
Mac shortcuts are exactly the use case for this. Menu bar, ask for a prompt, run script. I was always wary of shortcuts, but they're quite powerful and nicely integrated with the OS in the latest versions
What will be possible to do once these things run at interactive frame rates? It’s a little mind boggling to think about what types of experiences this will allow not so long from now.
Yeah, high-res performance is very non-linear, especially without swapping out the attention for xFormers, FlashAttention-2, or torch SDPA (and I don't think torch MPS works with any of those).
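For what it's worth, on CUDA with PyTorch 2.x you can ask diffusers to use torch's scaled-dot-product attention explicitly. A sketch, assuming `pipe` is an already-loaded pipeline (newer diffusers versions may already default to this):

```python
# Swap the UNet's attention processors for torch SDPA (needs PyTorch >= 2.0).
from diffusers.models.attention_processor import AttnProcessor2_0

pipe.unet.set_attn_processor(AttnProcessor2_0())
```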
True, not the best quality, but still fantastic results for a free model running locally on a laptop. Setting the steps between 10-20 seemed to produce the best results for me for realistic-looking images. About one out of 10 images was useful for my test case of "a realistic photo of a german shepard riding a motorcycle through Tokyo at night"
Good point. I haven't done a lot of testing yet. I'm not sure if the default of 8 steps yields poorer results than 10-20 steps. Either way, it was fast on my M2 mac with 8 to 20 steps, much faster than other models I've played with.
Was gonna comment the same thing; it feels ridiculous to include it here for local use. I believe you should be able to remove it if you edit the Python inference code from Hugging Face.
On my M1 MacBook, I did a test of 10 images, including the one-off loading time. With the safety checker: 10.51s; without: 9.48s. So not that big of a hit.
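If anyone wants to try skipping it, a common way with diffusers is to drop the checker when loading the pipeline. Sketch only; the model id is assumed and the repo's script may already expose a flag for this:

```python
# Load the pipeline without the safety checker (standard diffusers pattern).
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7",   # assumed model id
    safety_checker=None,
    requires_safety_checker=False,
)
# or, on an already-loaded pipeline:
# pipe.safety_checker = None
```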
I've got a Windows laptop with an RTX 3080 in it that runs this model no problem. I don't have it to hand or else I'd post some timings.
On my desktop PC with a 4090 in it, I was getting speeds of 0.2 to 0.3 seconds for reasonably acceptable quality settings, so I would expect 0.5s or so on the laptop.
What Apple are ahead on is doing this on a fanless laptop that doesn't hit internal temperatures of triple digits.
> What Apple are ahead on is doing this on a fanless laptop that doesn't hit internal temperatures of triple digits.
You also forgot the bit where Apple are ahead on doing it on a laptop that can hit that performance without needing to be tethered to a power socket.
> You also forgot the bit where Apple are ahead on doing it on a laptop that can hit that performance without needing to be tethered to a power socket.
Kind of sad that a huge anti-competitive, trillion dollar company is the one offering it. Especially given their stances around user freedom.
I'd much rather innovation be distributed; the goalposts should move to a point where everyone is pushing towards the next thing. Having Apple be the only game in town is unhealthy.
45it/s (~0.1s per image) on a 7900 XTX here, so a discrete GPU is still an order of magnitude faster, at a lot higher power draw than the Macs. Being only 10x slower while untethered is quite a nice outcome.
> What Apple are ahead on is doing this on a fanless laptop that doesn't hit internal temperatures of triple digits.
I think you could pull this off on an Asus G14 in an ultra power-saver mode, with the fans off or running inaudibly. The cooling is so beefy that they will actually work fanless if you throttle everything down and mostly keep the GPU asleep.
The M chips could certainly sustain image generation better without a fan.
At this point, what Apple is ahead on is the hype that M-series Macs are that fast, and developers targeting them because things just work. Plenty of people should be able to run these models locally, but there's close to no nice software that does that out of the box for Windows or Linux.
Not sure why you think it's limited to M series Macs or has to do anything with Apple at all. It's just an instruction on how to run a diffusion model trained in a novel way on particular hardware.
It's possible to do on non-Apple Silicon Macs, just more annoying. There are a few generative AI implementations which use raw Metal but not sure what the most popular one is.
The implementation is not even optimized for Macs. LCM is just very easy to make fast (batch size = 1 and only 2 to 8 steps, depending on what kind of headline you are trying to make).
They also have a decent advantage for LLMs because of the high bandwidth to their unified system memory, versus GPUs whose limited VRAM has to reach system memory over PCIe.