This really is a one-click installer, which is excellent. As far as I know it's the first and only one for M1. It's quite a polished UI, but there are a lot of features missing. The first ones that come to mind are:
- No way to specify a seed. This is an important part of SD workflows, letting you redo an image with a slightly tweaked prompt (see the sketch after this list).
- No way to specify a custom model. Alternative models (such as Waifu Diffusion) are fun to play with too.
- No way to generate batches of images.
- No way to specify which sampler to use.
- No way to adjust the weight of specific sub-phrases, or use negative weights.
There's also no img2img yet, but it sounds like that's a planned feature.
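For reference, here's roughly what a couple of those knobs look like if you drive Stable Diffusion through Hugging Face's diffusers library instead of a GUI. This is a minimal sketch, not DiffusionBee's internals; the model ID, scheduler, prompt and filenames are just placeholder choices:

```python
# A minimal sketch (diffusers, not DiffusionBee): fixing the seed and picking
# a sampler, two of the knobs the UI doesn't expose yet.
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)  # explicit sampler choice
pipe = pipe.to("mps")  # "mps" on Apple Silicon, "cuda" on Nvidia

generator = torch.Generator().manual_seed(42)  # same seed + same prompt -> same image
image = pipe(
    "a lighthouse at dusk, oil painting",
    num_inference_steps=25,
    guidance_scale=7.5,
    generator=generator,
).images[0]
image.save("lighthouse_seed_42.png")
```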
Unfortunately they did not include a one-click uninstaller! Besides removing the .app, you also need to delete the 4GB+ model that it downloads and other files that it scatters about.
To be fair, this is a long-standing tension between how Apple wants people to uninstall apps (by throwing them in the trash) and how modern apps really work in practice (some apps need or generate temporary data in Application Support, and sometimes it's a non-trivial amount).
As a developer, the best you can usually do is put instructions somewhere (the website, or a menu in the UI), or ship a package installer that also installs an uninstaller (more of the Windows way, but it breaks user expectations on macOS, like having your app be a single icon in `Applications` rather than its own folder containing the app and maybe an "Uninstall" app too). For Aerial I give uninstall instructions on the website, but there's not much of a standard for handling this (that I know of, at least).
As far as I understand, this is downloaded data and they may offer model selection later on? If so, they have little choice: putting your data in the user's `~/Library/Application Support` (or maybe `~/Library/Caches`, but that's a mess in itself) is what you're supposed to do.
With the various security changes in "recent" macOS, you can't modify your bundle and download files inside it, as it would change its signature and break the notarisation system.
The latter. A model update would then just be an application update. To answer my own question - if they want to allow a choice of downloadable models, then Application Support is the perfect place.
A note that this appears to be, at the time of writing, shared source and not strictly "open source" as it claims to be.
"Stable Diffusion Buddy is open source and free for personal use.
You may not use Stable Diffusion Buddy for any commercial purpose. That means you may not sell or profit in any way from the compiled app, from compiling the app yourself, from the source code, or a fork of it. The images generated by using this app do not fall under these limitations for obvious reasons."
That's a very pretty landing page, but it entirely avoids describing any features, or showing what it looks like beyond a thumbnail-sized image. Do you have any insights?
Unfortunately, no, I haven't had time to try it yet; I bookmarked it a few days ago. Just wanted to comment that there are others, and there will probably be many more coming shortly.
I'm used to troubleshooting all kinds of stuff, but I'm not a Python guy, and wrangling with dependencies and virtual environments is not fun in my book. I've got the lstein repo working but can't wait to have a cleaner way of doing things.
The logs for the app (logs tab) state it's just repeatedly using the number 42 as a seed. But yes, in the UI the results are different each run, even with an unchanged prompt. I didn't quite parse your last point, but if you're saying it's misleading, yeah, kind of, because you can't share or save a prompt to redo the same result.
Weird that it's considered a big challenge to fix… something in Metal? Core ML? SIMD libraries? Some translation layer? Something is not being said here.
I can confirm that this works great on the M1 Max. It has been taking less than a minute to run the model and generate the images. So far I have been really satisfied with the output. Let's please make one-click installers the norm for future technology. I love nerding out, but wading through dependency hell is not worth it most of the time.
Normally it's fine, but Apple have decided to force everyone into installing a conda python alongside normal python, which makes everything way more complicated than it needs to be.
I feel like I'm maybe missing some context here, because I've never had to install "conda Python" on any Mac ever. I had the system Python and a Homebrew-installed Python happily co-existing, and at various points have had "virtualized" Pythons installed via virtualenv or pipx. On Apple Silicon Macs there's no Python installed by default at all.
Now, I've seen an awful lot of programs written in Python that decide to force you to install them with virtualenv or pipx, but that's not Apple's doing.
> On Apple Silicon Macs there's no Python installed by default at all.
Not true. There's no python2 anymore, but there is a python3. You can find it at /usr/bin/python3 (most likely your Homebrew install precedes it in your PATH).
In my experience, quite a few libraries only worked when pulled from conda, due to the whole M1-compatible compilation thing. So while Apple does not force you to install conda, in practice you kind of are forced.
This is/has been changing slowly, though. Things are much better than they were a year ago.
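If you're not sure whether a given interpreter is a native arm64 build or an x86_64 one running under Rosetta (which is where a lot of the "only works from conda" pain comes from), the standard library can tell you; nothing here is specific to any particular install:

```python
# Quick sanity check: which Python is this, and is it running natively on the M1?
import platform
import sys

print(sys.executable)            # e.g. /usr/bin/python3, or a Homebrew/conda interpreter
print(platform.machine())        # "arm64" = native Apple Silicon, "x86_64" = Rosetta
print(platform.python_version())
```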
It can be complicated and frustrating getting Stable Diffusion (or indeed, anything related to ML, in my experience) running on other OSs too. It's not really related to Apple's approach to Python.
Diffusion Bee is the easiest way to run Stable Diffusion locally on your M1 Mac. Comes with a one-click installer. No dependencies or technical knowledge needed.
Features:
- Full data privacy - nothing is sent to the cloud
- Clean and easy to use UI
- One click installer
- No dependencies needed
- Multiple image sizes
- Optimized for M1/M2 Chips
- Runs locally on your computer
All I want is the ability to use it in peace without getting a threatening message whenever I ask for something vaguely edgy - and that need not be porn either.
Seriously though, I imagine this is less a case of whether this specific implementation permits pornography, but whether any porn was included in the dataset it was trained on. No matter how good AI is, it only knows what it knows.
"Stable Diffusion" is referring to one specific set of models, which were trained on a dataset including some porn.
The official implementation has a second model that detects pornography and replaces outputs containing it with a picture of this dude https://www.youtube.com/watch?v=dQw4w9WgXcQ (not kidding). Removing that is a really simple one-line change in the official script.
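I won't paste the official script here, but for comparison, the diffusers pipeline exposes the same filter as a separate component you can simply drop. A sketch only; the exact mechanics differ between diffusers versions:

```python
# Sketch (diffusers route, not the official CompVis script): the NSFW check is a
# standalone component attached to the pipeline, so removing it is one assignment.
# On older diffusers versions you may need to swap in a no-op instead of None.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.safety_checker = None  # skip the post-generation NSFW filter
```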
This is just a small fraction of the imagery and it does include pornography.
I respect that if you're online and reading HN, you're probably mature enough to handle seeing pornography. So if you're curious to see some of the training data that made it in: choose from the dropdown "-column-" and change it to "punsafe" and set the value in the adjacent field to "1", then press Apply.
Obviously this will show pornography on your screen.
I mean I don't know why we have to talk around this point: yes, you can definitely generate pornography with SD. Be prepared to see a lot of weird things, because that super-realism really means it just drives right into the uncanny/body horror valley at full speed.
But the second point here is also wrong: the whole reason these models are interesting is because they can generate things they haven't seen before - the corpus of knowledge represents some type of abstract understanding of how words relate to things that theoretically does encode a little bit of the mechanisms behind it.
For example, it should theoretically be able to reconstruct human-like poses it has never seen before, provided it has examples of what humans look like and something that maps to an approximate value. An obvious example in the context of the original question would be building photorealistic versions of a sketched concept (since somewhere in its model is an axis which traces from "artistic depiction of a human" to "photograph of a human" in terms of stylistic content).
Of course, most people aren't very good at drawing realistic human poses - it's a learned skill. But the magic of deep learning is really that eventually it doesn't need to be - we would hopefully be able to train a model, easily copied and distributed, which represents the skill, and SD is a big step in that direction (whether it's a local maximum remains to be seen - it's dramatic, but is it versatile?)
It's not good at penises or vaginas, just breasts and butts. I can find what I like pornographically without ai. But the ducking and dodging around nudity and sexuality is childish and tiresome and we ought to discuss this topic as disinterestedly and nonchalantly as we do the other things it excels or struggles at. Penises and vaginas are not somehow more vile than other things.
Sure enough, sexual content ought to be age-gated and there is potential for abuse (slapping a person's face onto explicit imagery without their consent isn't cool yall). But are we working toward AGI or not? Because at some point it's gonna have to know about the birds and the bees.
Mashups is the wrong way to think about it. It's generalizing at a higher level than texture / image sampling and it can tween things in latent space to get to visual spaces that haven't been explored by human artists before.
It requires a good steer and prompting is a clumsy tool for fine tuning - it's adequate for initialization but we lack words for every shade of meaning, and phrase weighting is pretty clumsy too, because words have a blend of meaning.
"It's generalizing at a higher level than texture / image sampling and it can tween things in latent space to get to visual spaces that haven't been explored by human artists before."
The very fact that the model is interpolating between things in the latent space probably explains why its images haven't been explored by human artists before: because there is a disconnect between the latent space of the model and genuine "latent space" of human artistic endeavor, which is an interplay between the laws of physics and the aesthetic interests of humans. I think these models know very little about either of those things and thus generate some pretty interesting novelty.
I think of artistic endeavour as a bit like the inverse of txt2img, but running in your head, and just projecting to the internal latent space, not all the way to words. It's not just aesthetic, it's about triggering feelings through senses. Images need to connect with the audience through associations with scenes, events, moods and so on from the audience members' lives.
Aesthetic choices like colour and shapes and composition combine with literal representations, facial emotions, symbolic meanings and so on. AI art so far feels quite shallow by this metric, usually only hitting a couple of notes. But sometimes it can play those couple of notes very sweetly.
It can make porn images, but I haven't been able to get consistently good results. There's probably ways to tweak the parameters to improve things, but I haven't figured that out.
It mostly understands what naked people look like, but the images I've generated involve a lot of accidental body horror. You get a lot of people with extra arms, weird eyes, or body parts in the wrong places. The fact that they explicitly removed porn from the training set comes through pretty clearly in the model.
I suspect it could be improved a lot with some specialized retraining. As far as I know, nobody has done that work yet.
How do you see upvotes? I've never had any visibility into upvoted comments (downvoted, of course, yes), or at least it's not obvious at all to me and I see no indication of it.
You can see the total score (upvotes - downvotes) for your comment where you see the upvote/downvote buttons for everyone else's comments. Or at least I can.
iPad Pro got it in spring 2021, so an M2 refresh seems likely too. October event along with an M2 MacBook Pro refresh? Or maybe not until spring 2023.
Another comment mentions RAM capabilities. Unfortunately that’s tied to the storage tiers instead of being something you can pick separately, so if you want 16 GB of RAM you have to buy the 1 TB or 2 TB models. Meaning for a 12.9” iPad Pro, if you want 16 GB you’re looking at an $1800 tablet. Not ideal.
Horsepower can make it go faster, but the major limitation is graphics memory. Graphics cards with under 12 GB of memory can't handle the model (although I believe there are lower-memory optimizations out there), which means you need a pretty high-end dedicated graphics card on a PC. But because Apple Silicon chips have reasonably fast on-chip graphics with integrated memory, it can run pretty efficiently as long as your Mac has 16 GB or more of RAM.
They have an integrated GPU that I believe Apple claimed was comparable to an RTX 3090 (perhaps since debunked, or at the least a misleading claim).
Apple compared the M1 Max to the RTX 3080 (mobile), which was a stretch.
The M1 Ultra was compared to the RTX 3090, which was a larger stretch.
The M1 Max delivers about 10.5 TFLOPS.
The M1 Ultra about 21 TFLOPS.
The desktop RTX 3080 delivers about 30 TFLOPS and the RTX 3090 about 40.
Apple’s comparison graph showed the speed of the M1s vs. RTXs at increasing power levels, with the M1s being more efficient at the same watt levels (which is probably true). However, since the graph stopped before the RTX GPUs reached full potential, the graph was somewhat misleading.
The M1 Max and Ultra have extra video processing modules that make them faster than the RTX GPUs at some video tasks, though.
I believe that's cherry-picked data. More specifically, Apple says it's comparable to the 3090 at a given power budget of 100 watts. They don't mention that the 3090 goes up to 360 watts.
The relative feebleness of x86 iGPUs is partly about the bad software situation (fragmentation etc, the whole story of how webgl is now the only portable way) and lack of demand for better iGPUs. AMD tried the strategy of beefier iGPUs for a while with a "build it and they [software] will come" leap of faith but pulled back after trying for many years.
Even for someone with technical knowledge this is a breath of fresh air...why go through the trouble of even writing a script? I just wanna click a button and see something
If anyone using this is getting bad results, it's probably your prompt that needs work. I recommend looking at https://lexica.art for prompts (NSFW warning - unfiltered user generated data). It has the largest collection of results along with their prompts and a good search engine.
For those of you following the lstein fork (especially the development branch), it has been making great progress. I'm not getting black images anymore and the speed has gone up significantly.
Not one click install by a long shot, but the documentation is pretty clear to follow. Anyone with a bit of CLI experience can do it and if you don't have that, this is a great way to kind of stumble your way towards something working and learn in the process...
As a slightly shameless plug, I've been hacking up a UI specifically for that fork with a focus on a more efficient workflow for image synthesis. A demonstration video can be found here: https://vimeo.com/748114237
If people have any sort of feedback I'd love to hear it, or if people have some specific features that are missing from the other UIs :)
I generate one image in about ~3 seconds with the DDIM sampler, 20 steps, on an RTX 2080Ti (~8 it/s). The video on the Patreon page is sped up as it's not very interesting to sit and watch renders haha.
Although, some of the users who started using my UI weren't using the fork my app connects to, and were surprised it was a bit faster than what they were using before, so maybe you can give it a try. The repository is https://github.com/lstein/stable-diffusion
Hopefully I can do a Show HN in the future, when/if there is a free version for people to play around with, and get some really good feedback that way.
Inertia, mostly. The official press release on August 10th linked to https://github.com/CompVis/stable-diffusion and diffusers didn't add support for Stable Diffusion until 5782e0393d on August 14th. There has been a ton of work on adding features on top of the CompVis Github release and backporting that work to Diffusers just isn't as interesting as adding new features to the existing fork. There has been some adoption of Diffusers though.
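For anyone who hasn't looked at Diffusers yet, the draw is that the whole repo-checkout-plus-conda-env dance collapses into a pip install and a few lines. A rough sketch, assuming the standard v1-4 weights (downloading them may require a Hugging Face account/token):

```python
# pip install diffusers transformers torch
# Roughly equivalent to the CompVis txt2img script, minus the repo checkout.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,   # use the default float32 on Apple Silicon / CPU
)
pipe = pipe.to("cuda")           # or "mps" on Apple Silicon

image = pipe("an astronaut riding a horse on mars").images[0]
image.save("astronaut.png")
```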
Ran it with Little Snitch installed (why bother looking at the code when a malicious actor can just upload a modified binary anyway?) and the claims seem to be legit so far.
The fact that I can do this on commodity hardware with a 4GB model, a model that understands both text and visual imagery, just absolutely blows my mind.
I almost feel like in the near future, a 100GB model may be able to handle, offline, speech -> text and video -> live scene graph. A robot that could have a base-level physical understanding of our world like a 4-year-old does (objects, their relationships to other objects, and behaviors).
edit: I went to the App Store and found the listing for Monterey and clicked "GET" which opened the software update dialog with 12.5.1 and the option to upgrade.
There are a couple decent videos on YouTube that walk through getting SD to run on Linux with AMD cards. The best one works with Arch but the same general steps should be compatible.
Yes, it's a waste of time. It works, kinda, sorta, but only after shutting down every other app, and even then occasionally it seems to fall over. The model is larger than 8GB in memory, so it's agonizing.
RAM won't affect the speed much; it mostly impacts how large an image you can render (8GB would be limited to 512x512, if not smaller). Memory bandwidth and the available compute cores on the GPU matter more when it comes to generation speed.
I've been using a one-click installer for Windows (GRisk) to play with SD and so far I'm very impressed. The technology is there; you just need to tweak your prompts and the GUI's settings to get whatever you want. The whole img2img thing is awesome: you can simply do a quick sketch in Paint (yes! Paint!) and then feed it to SD. It'll output your exact idea in whatever style you want.
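For the curious, that img2img flow looks roughly like this through diffusers. A sketch of the general API only, not necessarily what the GRisk GUI does internally; filenames are placeholders, and the image argument has been renamed between diffusers releases:

```python
# Sketch: rough Paint sketch + prompt -> finished image via img2img.
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("cuda")  # or "mps" on Apple Silicon

sketch = Image.open("rough_paint_sketch.png").convert("RGB").resize((512, 512))
result = pipe(
    prompt="a castle on a cliff at sunset, digital painting",
    image=sketch,        # called init_image in early diffusers releases
    strength=0.75,       # how far the model may stray from the sketch
    guidance_scale=7.5,
).images[0]
result.save("castle.png")
```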
It's trained on a dataset that contains images together with text describing the image. Some images have been scraped from artstation, and if they were scraped from the "Trending" page, where well done images end up, it's included in the description. So by adding that to the prompt, you influence the image to be more similar to images that have been trending on artstation.
Looking forward to following this variant of Stable Diffusion, as it's working great on my laptop. Mighty glad I got the 16GB of RAM, though I find if I step a canvas dimension down from 512 I get snappier generation… no biggie, anything I got that's useable I'd have to upscale anyway…
Since it's a Mac app, I have to wonder if it could stick the prompt, steps, and guidance into the notes field of Get Info? I find I'm generating a lot of relatively low guidance (I'd love a 6.5 option) images and iterating on the prompts with an eye to what it's suggesting to the algorithm. As such I have no way to closely track what prompt was active on any output as it changes so often.
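An alternative to the Get Info notes field would be to embed the prompt and settings in the PNG itself as text chunks, so the parameters travel with the file. A sketch only; this isn't something the app does today, and the values below are made up:

```python
# Sketch: stash generation parameters inside the PNG at save time.
from PIL import Image
from PIL.PngImagePlugin import PngInfo

meta = PngInfo()
meta.add_text("prompt", "ink-wash landscape, free association, low guidance")
meta.add_text("steps", "25")
meta.add_text("guidance_scale", "6.5")

img = Image.open("output.png")
img.save("output_tagged.png", pnginfo=meta)

# Reading it back later:
print(Image.open("output_tagged.png").text)
```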
I strongly suspect the real merit of this approach is not the crowd-pleasing, 'set very high guidance on some artistic trope so it's forced to fake something very impressive', but rather the ability to integrate a bunch of disparate guidances and occasionally hit on a striking image. It's like the harder you force it into a particular mold, the more derivative and stifled its output becomes, but if you let it free associate… I'll be experimenting. Seems like getting the occasional black image shows you're giving it the freest rein.
Looking forward to 'image to image' a lot. I assume the prompt still matters, as it's fundamental to the diffusion denoising? Image to image means iterating on visual 'seeds'.
I've seen talk of textual inversion training: it would interest me greatly to be able to generate objects and styles and train a personal version of SD in a sort of back-and-forth iteration. The link to language is really important here, but so is the ability to operate as an artist and generate drawings, aesthetics and so on, to train the model. I did 440 episodes of a hand-drawn webcomic once, which had recurring characters and an ink-wash grayscale style I gradually developed. That means I have my own dataset, which is my own property, and certainly didn't make it big enough to make it into Stable Diffusion like say Beeple did.
Interesting times for the cybernetic artist. Basically computer-assisted hallucinatory unconscious, plus computer-assisted rendering. You could feed all of Cerebus (Dave Sim and Gerhard) into a model like this, panel by panel, and you'd probably get a hell of a lot of Gerhard out because so much of the panel area is tone and texture from him…
Pretty cool, generates an image in 17 seconds on my M1 Max with 64GB. Not sure how the quality compares, Dall-E seemed a bit more impressive from the small sample I've tried, but great to have it on your laptop.
> 55 seconds on a M1 Pro MacBook with 16GB RAM to generate a picture
I've been running webui [1] on M1 MacBook Air 16GB RAM: 512x512, 50 steps takes almost 300 seconds. I'm suspecting that it is running on CPU, because the script says "Max VRAM used for this generation: 0.00G" and Activity Monitor says that it's using lots of CPU % and no GPU % at all. When M1 users are running stable diffusion, does the Activity Monitor show the GPU usage correctly?
I found the reason why it was using CPU only. I was running macOS 12.2. I upgraded to 12.5.1 and it's now using the GPU. Activity Monitor shows it. Also, the time for 50 steps dropped to 175 secs. 25 steps is about 80 secs, which is closer to the MacBook Pro time.
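For anyone else hitting this, a quick way to confirm whether your PyTorch build can actually see the Apple GPU (the MPS backend needs macOS 12.3 or later, which lines up with the 12.2 behaviour above):

```python
# Quick check: is the MPS (Apple GPU) backend compiled in and usable?
import torch

print(torch.backends.mps.is_built())      # PyTorch was compiled with MPS support
print(torch.backends.mps.is_available())  # macOS version + hardware can actually use it
```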
Based on the screenshot, it looks like this is set to default to 25 steps. I've been using 50 steps on my M1 MacBook Pro with 16GB of RAM. It takes about 1m30s per image.
Thanks for the replies! M1 Pro with 32G and looking forward to firing this up! I wonder if the images will rival the “quality” of what you get with MidJourney..? I don’t know much about any of this ML stuff so not really sure how it all works or how close this might be in the details etc of what Midjourney does..
More optimization work has gone, and is going, into Nvidia support, so those cards are currently faster. PyTorch support for MPS devices is relatively new, so there's a ton of optimization that hasn't been done yet. It's not clear which underlying hardware is actually faster for this specific task, but it looks like the top-end Apple Silicon is in the same bracket as a consumer-grade Nvidia GPU.
Other GUIs such as https://github.com/sd-webui/stable-diffusion-webui, https://github.com/AUTOMATIC1111/stable-diffusion-webui have many more features - but not all will be trivial to port to M1.
P.S.: For the people asking - yes, it can do NSFW images. I checked.