Diffusion Bee: Stable Diffusion GUI App for M1 Mac (github.com/divamgupta)
727 points by divamgupta on Sept 12, 2022 | 190 comments



This really is a one-click installer, which is excellent. As far as I know it's the first and only one for M1. It's quite a polished UI, but there are a lot of features missing. The first ones that come to mind are:

- No way to specify a seed. This is an important part of SD workflows, letting you redo an image with a slightly tweaked prompt.

- No way to specify a custom model. Alternative models (such as Waifu Diffusion) are fun to play with too.

- No way to generate batches of images.

- No way to specify which sampler to use.

- No way to adjust the weight of specific sub-phrases, or use negative weights.

There's also no img2img yet, but it sounds like that's a planned feature.

Other GUIs such as https://github.com/sd-webui/stable-diffusion-webui, https://github.com/AUTOMATIC1111/stable-diffusion-webui have many more features - but not all will be trivial to port to M1.

P.S.: For the people asking - yes, it can do NSFW images. I checked.


Unfortunately they did not include a one-click uninstaller! Besides removing the .app, you also need to delete the 4GB+ model that it downloads and other files that it scatters about:

  $HOME/.diffusionbee
  $HOME/Library/Application\ Support/DiffusionBee
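
If you want to script that cleanup, here is a minimal sketch (assuming these two locations are the only leftovers, which may not hold for future versions) - drag the .app to the Trash first:

  import shutil
  from pathlib import Path

  # The two locations mentioned above; adjust if future versions store data elsewhere.
  leftovers = [
      Path.home() / ".diffusionbee",
      Path.home() / "Library" / "Application Support" / "DiffusionBee",
  ]

  for path in leftovers:
      if path.exists():
          shutil.rmtree(path)
          print(f"Removed {path}")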


To be fair, this is a long-standing tension between how Apple wants people to uninstall an app (by throwing it in the trash) and how modern apps really work in practice (some apps need or generate temp data in Application Support, and sometimes it's a non-trivial amount).

As a developer, the best you can usually do is put some instructions somewhere (the website, a menu in the UI), or do a package install that includes an uninstaller (more the Windows way, but it breaks user expectations like having your app be a single icon in `Applications`, not its own folder containing the app and maybe an "Uninstall" app too). For Aerial I give uninstall instructions on the website, but there's not much of a standard on how to handle this (that I know of, at least).


Not quite sure why this couldn't sit in the Application bundle itself.


As far as I understand, this is downloaded data and they may offer model selection later on? If so, they have little choice: putting your data in the user's `~/Library/Application Support` (or maybe `~/Library/Caches`, but that's a mess in itself) is what you're supposed to do.

With the various security changes in "recent" macOS, you can't modify your bundle and download files inside it, as it would change its signature and break the notarisation system.


Do you mean that it should be distributed with it, or that it should write it in there when it downloads it at runtime?


The latter. A model update would then just be an application update. To answer my own question - if they want to allow a choice of downloadable models, then Application Support is the perfect place.


And also to allow updating the app independently of the model, without a 4GB download every time.


For those with Stable Diffusion running, an acquaintance and I have been working on another GUI: https://github.com/breadthe/sd-buddy/

It offers a custom seed and batches of images (we are also working on parametric prompting: https://github.com/breadthe/sd-buddy/discussions/12#discussi... )

I'd love to get to img2img and alt models next.


A note that this appears to be, at the time of writing, shared source and not strictly "open source" as it claims to be.

"Stable Diffusion Buddy is open source and free for personal use.

You may not use Stable Diffusion Buddy for any commercial purpose. That means you may not sell or profit in any way from the compiled app, from compiling the app yourself, from the source code, or a fork of it. The images generated by using this app do not fall under these limitations for obvious reasons."


Hey, that's very nice. However, I wasn't able to actually generate images because I need to have a specific conda environment enabled first.

A couple of suggestions, maybe you can implement:

1) An option to switch the conda environment, since I had set up the dependencies in a specific environment (for me, the command is "conda activate ldm").

2) It would be nice to know what the error was when there is an error.


Looking good over here too: M1 Max.

God I would hate to be whoever OpenAI blames this on.

I’m going to leave that DALL-E beta approval email unread like a food delivery recruiter email.


> This really is a one-click installer, which is excellent. As far as I know it's the first and only one for M1.

There's https://www.charl-e.com/


That's a very pretty landing page, but it entirely avoids describing any features, or showing what it looks like beyond a thumbnail-sized image. Do you have any insights?


Unfortunately, no, I haven't had time to try it yet; I bookmarked it a few days ago. Just wanted to comment that there are others, and there will probably be many more coming shortly.

I'm used to troubleshooting all kinds of stuff, but I'm not a Python guy, and wrangling with dependencies and virtual environments is not fun in my book. I've got the lstein repo working but can't wait to have a cleaner way of doing things.


I've tried it. The first time it did generate an image, albeit missing one of the few elements I mentioned.

Now it only generates black frames, even after being restarted. Not impressed at this point.

The options are sizes up to 768 x 768, "steps," and "guidance scale." You can do text-to-image or image-to-image.


I tried Charl-E and got Python errors. (Maybe because I don't have Python installed?)


`mkdir -p ~/Desktop/charl-e/samples` to fix that.


Why do you need a folder on the desktop? I never put files there and it seems like a strange requirement.


It’s just a bug in the code that assumes that folder is there. (Not my app)


I just tried it as well and also got errors. Well, for a one-click install app, you shouldn't need Python preinstalled.


Can it be built from source? (I'm thinking if yes, then the seed feature should be easy to add.)


The seed problem is in upstream GPU libraries provided by Apple. The community is working on solutions, but they are not easy.


Can you elaborate? Isn’t the seed a simple int (or long int)?


Currently, reusing the same seed does not produce repeatable results on Apple silicon. I assume this UI hides it because it would be misleading.


The logs for the app (logs tab) state it's just repeatedly using the number 42 as a seed. But yes, in the UI the results are different each run, even with an unchanged prompt. I didn't quite parse your last sentence, but if you're saying it's misleading, yeah, kind of, because you can't share or save a prompt to redo the same result.

Weird that it's considered a big challenge to fix… something in Metal? Core ML? SIMD libraries? Some translation layer? Something is not being said here.
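
For context, this is roughly how a seed is normally threaded through with the Hugging Face diffusers API (a sketch, not Diffusion Bee's own code; the model id and parameters are illustrative). Even with a fixed generator, outputs on the "mps" backend were not reproducible at the time, which is the upstream issue being described:

  import torch
  from diffusers import StableDiffusionPipeline

  # Illustrative only: on CUDA/CPU a seeded generator makes runs repeatable;
  # on "mps" (Apple silicon) this did not hold at the time.
  pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
  pipe = pipe.to("mps")

  generator = torch.Generator().manual_seed(42)  # the seed the logs mention
  image = pipe(
      "a photo of an astronaut riding a horse",
      num_inference_steps=25,
      generator=generator,
  ).images[0]
  image.save("seed42.png")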


Has anyone got a fork with img2img working for M1? I keep getting errors no matter what I try.



I will add it to diffusion bee soon.


PRs welcome!


I can confirm that this works great on the M1 Max. It has been taking less than a minute to run the model and generate the images. So far I have been really satisfied with the output. Let's please make one-click installers the norm for future technology. I love nerding out, but wading through dependency hell is not worth it most of the time.


You're right. On top of that, python dependency / package management is a joke.


Normally it's fine, but Apple have decided to force everyone into installing a conda python alongside normal python, which makes everything way more complicated than it needs to be.


I feel like I'm maybe missing some context here, because I've never had to install "conda Python" on any Mac ever. I had the system Python and a Homebrew-installed Python happily co-existing, and at various points have had "virtualized" Pythons installed via virtualenv or pipx. On Apple Silicon Macs there's no Python installed by default at all.

Now, I've seen an awful lot of programs written in Python that decide to force you to install them with virtualenv or pipx, but that's not Apple's doing.


> On Apple Silicon Macs there's no Python installed by default at all.

Not true. There's no python2 anymore, but there is a python3. You can find it at /usr/bin/python3 (most likely your Homebrew install precedes it in your PATH).


Which doesn't work, unless you have Xcode or the Command Line Tools installed.


In my experience, quite a few libraries only worked when pulled from conda, due to the whole M1-compatible compilation thing. So while Apple does not force you to install conda, in practice you kind of are forced.

This is/has been changing slowly, though. Things are much better than they were a year ago.


You do if you want Apple's Metal support for TensorFlow.


It can be complicated and frustrating getting Stable Diffusion (or indeed, anything related to ML, in my experience) running on other OSs too. It's not really related to Apple's approach to Python.


This isn't true.

The best way to install Python on a mac is the downloadable installer from Python.org (!)


Which would be less of an issue if I didn't have to fix a conda bug while installing StableDiffusion.

https://github.com/conda/conda/pull/11818


It's bad either way. Should be easy like npm.


Diffusion Bee is the easiest way to run Stable Diffusion locally on your M1 Mac. Comes with a one-click installer. No dependencies or technical knowledge needed.

Features:

- Full data privacy - nothing is sent to the cloud

- Clean and easy-to-use UI

- One-click installer

- No dependencies needed

- Multiple image sizes

- Optimized for M1/M2 chips

- Runs locally on your computer


Does it work with pornographic or potentially pornographic prompts?


All I want is the ability to use it in peace without getting a threatening message whenever I ask for something vaguely edgy - and that need not be porn either.


And the real questions come out.

Seriously though, I imagine this is less a case of whether this specific implementation permits pornography, but whether any porn was included in the dataset it was trained on. No matter how good AI is, it only knows what it knows.


"Stable Diffusion" is referring to one specific set of models, which were trained on a dataset including some porn.

The official implementation has a second model that detects pornography, and replaces outputs including it with a picture of this dude https://www.youtube.com/watch?v=dQw4w9WgXcQ (Not kidding). Removing that is a really simple one-line change in the official script.


> Not kidding

I was amused by this when reading the source. Here’s the function that loads the replacement.

https://github.com/CompVis/stable-diffusion/blob/69ae4b35e0a...

It looks like removing line 309 of the same file would disable the check, but I haven’t tried it.


This is the line to change to disable the nsfw filter: https://github.com/CompVis/stable-diffusion/blob/69ae4b35e0a...


I don't see this line in v2.1.


I guess the question is whether this toggle is available in the GUI, or if someone has to edit source code


For the CLI version there is no UI; you have to open the Python file and stub out the NSFW check (trivial though).
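
As a rough illustration of what "stubbing out" the check looks like (the exact function name and signature here are an assumption based on the CompVis scripts/txt2img.py layout and vary between forks):

  # The safety checker takes the decoded images and returns (possibly replaced)
  # images plus per-image NSFW flags. A pass-through replacement looks like this:
  def check_safety(x_image):
      # No-op: return the images unchanged and report no NSFW content.
      return x_image, [False] * len(x_image)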


If you're curious you can see some of the imagery used to train SD here:

https://laion-aesthetic.datasette.io/laion-aesthetic-6pls/im...

This is just a small fraction of the imagery and it does include pornography.

I respect that if you're online and reading HN, you're probably mature enough to handle seeing pornography. So if you're curious to see some of the training data that made it in: choose from the dropdown "-column-" and change it to "punsafe" and set the value in the adjacent field to "1", then press Apply.

Obviously this will show pornography on your screen.

An article which talks about the imagery and how this browser came about is here: https://waxy.org/2022/08/exploring-12-million-of-the-images-...


I mean I don't know why we have to talk around this point: yes, you can definitely generate pornography with SD. Be prepared to see a lot of weird things, because that super-realism really means it just drives right into the uncanny/body horror valley at full speed.

But the second point here is also wrong: the whole reason these models are interesting is because they can generate things they haven't seen before - the corpus of knowledge represents some type of abstract understanding of how words relate to things that theoretically does encode a little bit of the mechanisms behind it.

For example, it theoretically should be able to reconstruct human-like poses it has never seen before, provided it has examples of what humans look like and something which transposes to an approximate value - an obvious example in the context of the original question would be building photorealistic versions of a sketched concept (since somewhere in its model is an axis which traces from "artistic depiction of a human" to "photograph of a human" in terms of style content).

Of course, most people aren't very good at drawing realistic human poses - it's a learned skill. But the magic of deep learning is really that eventually it doesn't need to be - we would hopefully be able to train a model, easily copied and distributed, which represents the skill, and SD is a big step in that direction (whether it's a local maximum remains to be seen - it's dramatic, but is it versatile?)


>it can generate porn

It's not good at penises or vaginas, just breasts and butts. I can find what I like pornographically without AI. But the ducking and dodging around nudity and sexuality is childish and tiresome, and we ought to discuss this topic as disinterestedly and nonchalantly as we do the other things it excels or struggles at. Penises and vaginas are not somehow more vile than other things.

Sure enough, sexual content ought to be age-gated and there is potential for abuse (slapping a person's face onto explicit imagery without their consent isn't cool, y'all). But are we working toward AGI or not? Because at some point it's gonna have to know about the birds and the bees.


I think you mean vulva, not vagina[1]. Vaginas are mostly internal.

[1] https://www.allure.com/story/vagina-vulva-difference-planned...


> to know about the birds and the bees

:) made me chuckle. AGI would need to know how to be evil, as well, right?


isn't it obvious yet?

we're creating the speakwrite machine.


Are we talking about Artificial God Intelligence?


> But the second point here is also wrong: the whole reason these models are interesting is because they can generate things they haven't seen before

No, it's essentially generating mashups of its training data, which can be very interesting.

So a model that hasn't been trained on a lot of porn will of course do a very bad job at generating porn.


Mashups is the wrong way to think about it. It's generalizing at a higher level than texture / image sampling and it can tween things in latent space to get to visual spaces that haven't been explored by human artists before.

It requires a good steer and prompting is a clumsy tool for fine tuning - it's adequate for initialization but we lack words for every shade of meaning, and phrase weighting is pretty clumsy too, because words have a blend of meaning.


"It's generalizing at a higher level than texture / image sampling and it can tween things in latent space to get to visual spaces that haven't been explored by human artists before."

The very fact that the model is interpolating between things in the latent space probably explains why its images haven't been explored by human artists before: because there is a disconnect between the latent space of the model and genuine "latent space" of human artistic endeavor, which is an interplay between the laws of physics and the aesthetic interests of humans. I think these models know very little about either of those things and thus generate some pretty interesting novelty.


I think of artistic endeavour as a bit like the inverse of txt2img, but running in your head, and just projecting to the internal latent space, not all the way to words. It's not just aesthetic, it's about triggering feelings through senses. Images need to connect with the audience through associations with scenes, events, moods and so on from the audience members' lives.

Aesthetic choices like colour and shapes and composition combine with literal representations, facial emotions, symbolic meanings and so on. AI art so far feels quite shallow by this metric, usually only hitting a couple of notes. But sometimes it can play those couple of notes very sweetly.


It's only a question because devs went out of their way to limit it, to everyone's surprise.

It's like if Photoshop broke itself when you tried to modify or create anything nude or provocative, as the default.

That would just be weird, and that's what these AI software devs have done.

So everyone patches that contrived feature flag, but nobody knows whether a given build has patched it.


Fun fact: Photoshop does exactly that if you open images of currency. See: https://helpx.adobe.com/photoshop/cds.html


As if anyone who is really going to print money can't just open it in GIMP.


Probably useful to stop 12 year olds from "accidentally" committing felonies though


It used to just give a warning


It has to be. There was a sub (now closed) called r/UnstableDiffusion, where people shared their porn output from SD.

The filter should be easy to remove and there are already people who simply removed the filter.


Based on the upvotes, inquiring minds _really want to know_!


It can make porn images, but I haven't been able to get consistently good results. There's probably ways to tweak the parameters to improve things, but I haven't figured that out.

It mostly understands what naked people look like, but the images I've generated involve a lot of accidental body horror. You get a lot of people with extra arms, weird eyes, or body parts in the wrong places. The fact that they explicitly removed porn from the training set comes through pretty clearly in the model.

I suspect it could be improved a lot with some specialized retraining. As far as I know, nobody has done that work yet.


In particular, it is very confused about genitalia; it tends towards hermaphroditic results.


How do you see upvotes? I've never had any visibility into upvote counts on comments (downvoted, of course, yes), or at least it's not obvious at all to me, and I see no indication of it.


You can see the total score (upvotes - downvotes) for your comment where you see the upvote/downvote button for everyone else's comments. Or at least I can.


I’ve always been curious about that, too. I don’t think I see anyone else’s votes, up or down.


Yes (I checked)


Thank you for this. How hard would this be to port to iPad?


Probably isn’t fast enough.


The Pro and Air models have M1 chips, roughly on par with a MacBook Air


Oh wow! Didn’t realize.


iPad Pro got it in spring 2021, so an M2 refresh seems likely too. October event along with an M2 MacBook Pro refresh? Or maybe not until spring 2023.

Another comment mentions RAM capabilities. Unfortunately that’s tied to the storage tiers instead of being something you can pick separately, so if you want 16 GB of RAM you have to buy the 1 TB or 2 TB models. Meaning for a 12.9” iPad Pro, if you want 16 GB you’re looking at an $1800 tablet. Not ideal.


RAM might hold it back?

Still, probably not too hard to build for iPad or iOS.


The iPad Pro has a 16GB option. It's essentially the same hardware as the MacBook.


Do you mind putting a license on your repo? As it is right now, Diffusion Bee is technically not open source.


If it works, it's incredible!


It works, well, incredibly.


How is this possible without a dedicated GPU? I thought Stable Diffusion would require far more horsepower.


M1s, especially the Max and Ultra, have pretty decent GPUs.


Horsepower can make it go faster, but the major limitation is graphics memory. Graphics cards with under 12 GB of memory can't handle the model (although I believe there are lower-memory optimizations out there), which means you need a pretty high-end dedicated graphics card on a PC. But because Apple Silicon chips have reasonably fast on-chip graphics with integrated memory, it can run pretty efficiently as long as your Mac has 16 GB or more of RAM.


They have an integrated GPU that I believe Apple claimed was comparable to an RTX 3090 (perhaps since debunked, or at the least a misleading claim).


Apple compared the M1 Max to the RTX 3080 (mobile), which was a stretch.

The M1 Ultra was compared to the RTX 3090, which was a larger stretch.

The M1 Max delivers about 10.5 TFLOPS; the M1 Ultra, about 21.

The desktop RTX 3080 delivers about 30 TFLOPS and the RTX 3090 about 40.

Apple’s comparison graph showed the speed of the M1s vs. RTXs at increasing power levels, with the M1s being more efficient at the same watt levels (which is probably true). However, since the graph stopped before the RTX GPUs reached full potential, the graph was somewhat misleading.

The M1 max and Ultra have extra video processing modules that make them faster than the RTX GPUs at some video tasks though.


Tflops isn't an orange to orange comparison.


I believe that's cherry-picked data. More specifically, Apple says it's comparable to the 3090 at a given power budget of 100 watts. They don't mention that the 3090 goes up to 360 watts.


The relative feebleness of x86 iGPUs is partly about the bad software situation (fragmentation etc., the whole story of how WebGL is now the only portable way) and lack of demand for better iGPUs. AMD tried the strategy of beefier iGPUs for a while with a "build it and they [software] will come" leap of faith, but pulled back after trying for many years.


Thanks for sharing. Can you say a little bit more about what prompted you to make this?


Well, all the other offline tools didn't seem very intuitive to install for someone without technical knowledge.


Even for someone with technical knowledge this is a breath of fresh air... why go through the trouble of even writing a script? I just wanna click a button and see something.


Didn't realize that it would become #1 on HN. Thanks, everyone.

If you are on twitter and would like to share : https://twitter.com/divamgupta/status/1569014206912929796


If anyone using this is getting bad results, it's probably your prompt that needs work. I recommend looking at https://lexica.art for prompts (NSFW warning - unfiltered user generated data). It has the largest collection of results along with their prompts and a good search engine.


I've been putting together a guide to prompts here: https://github.com/sw-yx/prompt-eng/blob/main/PROMPTS.md


And also this one: https://generrated.com/


How can I post my images to lexica.art? I don't see an upload option.


I am not sure; it could be that they are listening to tools that output results publicly (Midjourney's Discord bot, for example).

Edit:

Yep, that's it:

"you can't (yet)

this is reiterated here like a 100 times at this point

images and prompts were scraped from dicord bots

on the official discord

as there's no copyright on the images"


Thanks. That's helpful.


For those of you following the lstein fork (especially the development branch), it has been making great progress. I'm not getting black images anymore and the speed has gone up significantly.

Not a one-click install by a long shot, but the documentation is pretty clear to follow. Anyone with a bit of CLI experience can do it, and if you don't have that, this is a great way to kind of stumble your way towards something working and learn in the process...

https://github.com/lstein/stable-diffusion/tree/development/...


As a slightly shameless plug, I've been hacking up a UI specifically for that fork with a focus on a more efficient workflow for image synthesis. A demonstration video can be found here: https://vimeo.com/748114237

If people have any sort of feedback I'd love to hear it, or if people have some specific features that are missing from the other UIs :)


Nice UI. Which GPU are you using? Seems to be really fast. I have a 3090 and it's not that fast. 4 images with 20 timesteps take 7.4 seconds.


Thanks :)

I generate one image in about ~3 seconds with the DDIM sampler, 20 steps, on a RTX 2080Ti (~8it/s). The video on the Patreon page is sped up as it's not very interesting to sit and watch renders haha.

Although, some of the users who started using my UI weren't using the fork my app connects to, and were surprised it was a bit faster than what they were using before, so maybe you can give it a try. The repository is https://github.com/lstein/stable-diffusion


Oh, I thought it was real-time :) Anyway, I might try out that fork too.


Bravo. This looks really promising and is exactly the sort of thing one needs to get models like these to generate what you imagine.


Thank you so much! Really happy to hear.

Hopefully I can do a Show HN in the future, when/if there is a free version for people to play around with, and get some really good feedback that way.


Please do, I am sure you will get some good feedback.


I don't understand why all these crazy forks don't switch to using the HuggingFace codebase[1]. It's much better code and easier to add features to.

It's true you have to use the code from git rather than a release, but that's not hard.

https://github.com/nlothian/m1_huggingface_diffusers_demo is my clean demo repo with a notebook showing the usage. The standard HuggingFace examples (eg for img2img[2]) port across with no trouble too.

[1] https://github.com/huggingface/diffusers

[2] https://github.com/huggingface/diffusers#image-to-image-text...
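
For example, the img2img flow in [2] ports to Apple silicon along these lines (a sketch under the assumption of a recent diffusers build; the model id, file names, and parameters are illustrative, and older releases named the image argument `init_image`):

  from PIL import Image
  from diffusers import StableDiffusionImg2ImgPipeline

  pipe = StableDiffusionImg2ImgPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
  pipe = pipe.to("mps")               # Apple-silicon GPU backend
  pipe.enable_attention_slicing()     # lowers peak memory on unified-memory Macs

  init = Image.open("sketch.png").convert("RGB").resize((512, 512))
  result = pipe(
      prompt="a fantasy landscape, matte painting",
      image=init,                     # `init_image=` in older diffusers releases
      strength=0.75,                  # how far to move away from the input image
      guidance_scale=7.5,
  ).images[0]
  result.save("fantasy.png")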


Inertia, mostly. The official press release on August 10th linked to https://github.com/CompVis/stable-diffusion and diffusers didn't add support for Stable Diffusion until 5782e0393d on August 14th. There has been a ton of work on adding features on top of the CompVis Github release and backporting that work to Diffusers just isn't as interesting as adding new features to the existing fork. There has been some adoption of Diffusers though.


Has anyone looked at the code and verified the statements in the readme?


Ran it with Little Snitch installed (why bother looking at the code when a malicious actor can just upload a modified binary anyway?) and the claims seem to be legit so far.


Presumably you'd need to look at the code, build it, and compare the binaries.


Why not just do this yourself before commenting?


Because this is a community, and I appreciate having had the question asked and answered, as do many others I'm sure.


Open AI vs "OpenAI", very different things.

The fact that I can do this on commodity hardware with a 4GB model, a model that understands text and visual imagery, just absolutely blows my mind.

I almost feel like in the near future, a 100GB model may be able to handle, offline, speech -> text and video -> live scene graph. A robot could have a base-level physical understanding of our world like a 4-year-old does (objects, their relationships to other objects, and behaviors).


This requires macOS 12.5 but it doesn't seem like it's released? At least for my region?

System preferences "Software updates" tab says I'm on the latest (12.4) and there are no updates for me to install.

How am I supposed to try this?


I have the same issue. 12.5.1 seems to have been released in August: https://9to5mac.com/2022/08/17/macos-12-5-1-monterey-securit...

What gives.

edit: I went to the App Store and found the listing for Monterey and clicked "GET" which opened the software update dialog with 12.5.1 and the option to upgrade.


I had the same issue (12.2.1); something seems to be borked with Software Update. Resolved by booting into safe mode; the update showed up as expected.


Same here but after a reboot the option to upgrade to 12.6 appeared.


Thank you for this public service OP! Might consider adding “Show HN” to the prefix if you’ve got the characters left.


Guess I'll ask. Any Linux love here? AMD appreciated as well. Seems like maybe a job for an AppImage? Docker maybe?



There are a couple decent videos on YouTube that walk through getting SD to run on Linux with AMD cards. The best one works with Arch but the same general steps should be compatible.


Work in progress!


Both of my M1 Macs only have 8GB of RAM. Is it a waste of time trying to run this with 8GB?


Takes about 8 minutes per image with 15 steps on my 8GB M1 Air.


Yes, it's a waste of time. It works, kinda, sorta, but only after shutting down every other app, and even then occasionally it seems to fall over. The model is larger than 8GB in memory, so it's agonizing.


What about 16MB RAM?


16MB? No. 16GB? Sure.


My M1 Pro with 16GB handles it fine. My M1 with 8GB does not.


I have a 2020 mac mini with 8GB RAM, an image takes about 7 minutes with the default settings, no problems so far.


Oof. I guess I should get at least 16GB on my next Mac (I've been very happy with 8GB on my M1 Air for just about everything else, though).


RAM won't affect the speed much; it mostly impacts how large an image you can render (8GB would be limited to 512x512, if not smaller). Memory bandwidth and the available compute cores on the GPU matter more when it comes to generation speed.


Apparently it is many times faster on a 16GB M1 Mac. It was taking >5 minutes to render at 512x512 on my 8GB M1.

According to others, it’s about a minute or less with 16GB.


Is there an equivalent for Windows 10 + an Nvidia GPU with 8GB of RAM?


I've been using this with an Nvidia GPU with 6GB of RAM:

https://nmkd.itch.io/t2i-gui https://github.com/n00mkrad/text2image-gui



I've been using a one-click installer for Windows (GRisk) to play with SD, and so far I'm very impressed. The technology is there; you just need to tweak your prompts and the GUI's settings to get whatever you want. The whole img2img thing is awesome too: you can simply do a quick sketch in Paint (yes! Paint!) and then feed it to SD. It'll output your exact idea in whatever style you want.

We're at a turning point.


Why is "trending on artstation" a keyword that influences the ML model? seems weird


It's trained on a dataset that contains images together with text describing the image. Some images have been scraped from artstation, and if they were scraped from the "Trending" page, where well done images end up, it's included in the description. So by adding that to the prompt, you influence the image to be more similar to images that have been trending on artstation.


Does anyone have something for the Intel Macs?


Same question.


Sometimes (~30% of the time) I'm getting a black square. What could be the reason?


Same here.


Looking forward to following this variant of Stable Diffusion, as it's working great on my laptop. Mighty glad I got the 16GB of RAM, though I find if I step a canvas dimension down from 512 I get snappier generation… no biggie, anything I got that's usable I'd have to upscale anyway…

Since it's a Mac app, I have to wonder if it could stick the prompt, steps, and guidance into the notes field of Get Info? I find I'm generating a lot of relatively low-guidance images (I'd love a 6.5 option) and iterating on the prompts with an eye to what it's suggesting to the algorithm. As such, I have no way to closely track what prompt was active on any output, as it changes so often.
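
One way a tool could track this (not something Diffusion Bee does today, and Finder's Get Info comments would need extra plumbing) is to embed the parameters in the PNG itself; a minimal sketch with Pillow, using made-up values:

  from PIL import Image
  from PIL.PngImagePlugin import PngInfo

  # Hypothetical example: store the generation parameters as PNG text chunks.
  params = {"prompt": "ink-wash landscape", "steps": "25", "guidance": "6.5"}

  meta = PngInfo()
  for key, value in params.items():
      meta.add_text(key, value)

  img = Image.open("output.png")
  img.save("output_tagged.png", pnginfo=meta)

  # Reading the parameters back later:
  print(Image.open("output_tagged.png").text)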

I strongly suspect the real merit of this approach is not the crowd-pleasing, 'set very high guidance on some artistic trope so it's forced to fake something very impressive', but rather the ability to integrate a bunch of disparate guidances and occasionally hit on a striking image. It's like the harder you force it into a particular mold, the more derivative and stifled its output becomes, but if you let it free associate… I'll be experimenting. Seems like getting the occasional black image shows you're giving it the freest rein.

Looking forward to 'image to image' a lot. I assume the prompt still matters, as it's fundamental to the diffusion denoising? Image to image means iterating on visual 'seeds'.

I've seen talk of textual inversion training: it would interest me greatly to be able to generate objects and styles and train a personal version of SD in a sort of back-and-forth iteration. The link to language is really important here, but so is the ability to operate as an artist and generate drawings, aesthetics and so on, to train the model. I did 440 episodes of a hand-drawn webcomic once, which had recurring characters and an ink-wash grayscale style I gradually developed. That means I have my own dataset, which is my own property, and I certainly didn't make it big enough to make it into Stable Diffusion like, say, Beeple did.

Interesting times for the cybernetic artist. Basically computer-assisted hallucinatory unconscious, plus computer-assisted rendering. You could feed all of Cerebus (Dave Sim and Gerhard) into a model like this, panel by panel, and you'd probably get a hell of a lot of Gerhard out because so much of the panel area is tone and texture from him…


Pretty cool, generates an image in 17 seconds on my M1 Max with 64GB. Not sure how the quality compares; DALL-E seemed a bit more impressive from the small sample I've tried, but it's great to have it on your laptop.


How long is the generation taking on say a 2021 MacBook M1 Pro?


55 seconds on an M1 Pro MacBook with 16GB RAM to generate a picture (with default settings, so 512x512, 25 steps).

On an M1 Ultra, it takes 12 seconds with 64GB RAM (same settings). While computing, the Mac Studio is pulling 100 watts of power.


> 55 seconds on a M1 Pro MacBook with 16GB RAM to generate a picture

I've been running webui [1] on an M1 MacBook Air with 16GB RAM: 512x512, 50 steps takes almost 300 seconds. I suspect that it is running on the CPU, because the script says "Max VRAM used for this generation: 0.00G" and Activity Monitor says that it's using lots of CPU % and no GPU % at all. When M1 users are running Stable Diffusion, does Activity Monitor show the GPU usage correctly?

[1] https://github.com/lstein/stable-diffusion
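
A quick sanity check is to ask PyTorch directly whether it can see the Apple GPU (a sketch; MPS needs macOS 12.3+ and a recent PyTorch build):

  import torch

  # True if this PyTorch build was compiled with MPS support
  print("MPS built:", torch.backends.mps.is_built())
  # True if the current macOS version and hardware can actually use it
  print("MPS available:", torch.backends.mps.is_available())

  device = "mps" if torch.backends.mps.is_available() else "cpu"
  print("Would run on:", device)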


I found the reason why it was using the CPU only: I was running macOS 12.2. I upgraded to 12.5.1 and it's now using the GPU; Activity Monitor shows it. Also, the time for 50 steps dropped to 175 secs. 25 steps takes about 80 secs, which is closer to the MacBook Pro time.


M1 Pro MBP with 32GB RAM here, takes 30 seconds on the default settings.


Based on the screenshot, it looks like this is set to default to 25 steps. I've been using 50 steps on my M1 MacBook Pro with 16GB of RAM. It takes about 1m30s per image.


Just another datapoint, my M2 Air with 24GB RAM takes around 75-80 seconds.


Thanks for the replies! M1 Pro with 32GB here, and looking forward to firing this up! I wonder if the images will rival the "quality" of what you get with Midjourney..? I don't know much about any of this ML stuff, so I'm not really sure how it all works or how close this might be in the details etc. of what Midjourney does.


And another datapoint: my M1 Max (24-core GPU) with 64GB RAM takes ~36s for 50 steps and ~19s for 25 steps.


Looking to buy a laptop (preferably a Mac) to run such models now and for the upcoming years.

Should I max out on the Mac GPUs?


On a 2020 M1 Air with 8GB memory, I timed 9 minutes 48 seconds.


Would love to see this turn into a cross-platform/cross-hardware AI art GUI+backend with an open source license.

Speaking of which, what is the license on this? (the electron app)


Why does it need network access? It needs to download models. It will put them in your home dir, btw... in case you suddenly find 5GB of storage missing.


Nice one. Too bad "image to image" is not available, as that's the one I'd like to experiment with the most.


Any plans for video support?

If we can have "keyframes" with prompts that would output a PNG sequence or video, that would be awesome.


Can you support MacOS Monterey 12.4? You're only supporting Ventura 12.5+ and it isn't even released yet.


12.5.1 is the latest, non-beta release. That's still Monterey. Ventura will be 13.x.


Wow! Literally the easiest way to install Stable Diffusion. It worked perfectly on my macOS Ventura beta build.

Thank you. :)


Any benchmarks on this? How long does it take to generate a batch of images, say with steps=100?


On my M1 Max with 32 GB I'm getting 1.5 iterations/second (i.e., ~30 seconds for the standard 50 iterations) using this example: https://github.com/nlothian/m1_huggingface_diffusers_demo


That's pretty good then - on my 3080, I'm getting ~8it/s.


Anyone try it on an M1 MacBook Air?


Does this incorporate GFPGAN for face detection/cleaning and Real-ESRGAN for upscaling?


no


The prompt example shows "photrorealistic", which is unlikely to work well.


Amazing, looking forward to trying it out. Has anyone done the same for Windows?


Yes: https://nmkd.itch.io/t2i-gui

No idea if it supports AMD GPUs though.


Do you have any info on perf compared to this: https://news.ycombinator.com/item?id=32805174


More optimization work has gone and is going into Nvidia support, so those are currently faster. PyTorch support for MPS devices is relatively new, so there's a ton of optimization that hasn't been done yet. It's not clear which underlying hardware is actually faster for this specific task, but it looks like top-end Apple Silicon is in the same bracket as a consumer-grade Nvidia GPU.


My consumer 3060 does 7.2 it/s, which is 5x faster than the M2.

Sorry, Apple is nowhere close yet.


Just done a run on my 3080 under Windows using https://github.com/bfirsh/stable-diffusion.git and it's about 8 iterations/sec when nothing else is using CPU or GPU.


Thanks for this, exactly what I was looking for


I use GRisk GUI.


It’s fantastic how quickly these models can be shared with anyone.


Which settings are generating the best images for people?


What do the advanced options mean?


Why cap it at 50 steps?


It stops changing.


Depends on cfg scale and sampler. Sometimes 100 and 150 give different results.



