Show HN: InvokeAI, an open source Stable Diffusion toolkit and WebUI (github.com/invoke-ai)
414 points by sophrocyne on Oct 10, 2022 | 102 comments
Hey everyone!

Excited to be able to share the release of `InvokeAI 2.0 - A Stable Diffusion Toolkit`, an open source project that aims to provide both enthusiasts and professionals a suite of robust image creation tools. Optimized for efficiency, InvokeAI needs only ~3.5GB of VRAM to generate a 512x768 image (and less for smaller images), and is compatible with Windows/Linux/Mac (M1 & M2).

InvokeAI was one of the earliest forks off of the core CompVis repo (formerly lstein/stable-diffusion), and recently evolved into a full-fledged community driven and open source stable diffusion toolkit titled InvokeAI. The new version of the tool introduces an entirely new WebUI Front-end with a Desktop mode, and an optimized back-end server that can be interacted with via CLI or extended with your own fork.

This version of the app improves in-app workflows leveraging GFPGAN and Codeformer for face restoration, and RealESRGAN upscaling - Additionally, the CLI also supports a large variety of features: - Inpainting - Outpainting - Prompt Unconditioning - Textual Inversion - Improved Quality for Hi-Resolution Images (Embiggen, Hi-res Fixes, etc.) - And more...

Future updates planned included UI driven outpainting/inpainting, robust Cross Attention support, and an advanced node workflow for automating and sharing your workflows with the community.

We're excited by the release, and about the future of democratizing the ability to create. Check out the repo (https://github.com/invoke-ai/InvokeAI) to get started, and join us on Discord (https://discord.gg/ZmtBAhwWhy)!




Speaking of SD, I wonder if 1.4 will be the last truly open release. Emad said 1.5 would release a while ago, but it's been held up for "compliance" reasons. Maybe they got legal threats over the use of artists' works and stock images. If so, that would be sad to see.

In a way it reminds me of people who make unofficial remakes of games but get cease and desists if they show gameplay while in development. The correct move is to fully develop the game and release it, then if you get C&Ds, too late, the game is already available to download.


There was an AMA with Emad yesterday on discord. He got asked this. The promise is that 1.5 will be released in the following week.

The slowdown has numerous causes. They got legal threats, death threats, and threats from some congresswoman to have them banned by the NSA (1).

Stability.ai workers (except for one) have a clause saying they can open-source anything they're working on. They do, and supposedly will open-source everything, because they want to build an ecosystem, not a cash grab in the model of DALL-E.

Also they don't have one central place for all their projects and will scale from 100 to 250 employees in the following year so things should speed up.

1) https://eshoo.house.gov/media/press-releases/eshoo-urges-nsa...


> banned by the NSA

Note that this NSA is the National Security Advisor, not the National Security Agency.


This is quite a distinction to make, and I'm not sure why GP didn't make it. The National Security Agency is what everyone thinks of as the NSA.


make what you will of it but as of yesterday this was his answer to one of my readers: https://twitter.com/EMostaque/status/1579204017636667392

> No actually dev decision. Generative models are complex to release responsibly and team still working on release guidelines as they get much better, 1.5 is only a marginal FID improvement.


Honestly, this whole "responsible AI" thing is a sad last attempt to run away from the inevitable. The reality is that our politicians will end up in fake photos/videos/audio recordings, people who can't even draw a straight line will be able to make crazy online memes with any kind of imaginable image in just seconds, no matter how offensive it is to some people, and there is absolutely nothing we can do about it.

When these companies and OS projects create "responsible" or restricted AIs they are at the same time creating a demand for AIs that have no limitations and eventually open source or even commercial AIs will respond to this demand.

I hope that while they still play this "responsible" game, they are at least using the time to figure out how we can live with this kind of advanced AI in a future where everything is fake/false by default.


It's a farce.

Stable Diffusion cost less than $1 million to train.

I'm dumbfounded to see all the gatekeeping around these ML models considering this.

Within 24 months someone will train a stronger model with a $20,000 home setup and this gatekeeping ridiculousness will be dead for good.

I'm excited to see what $1 million gets you in 24 months, possibly a multi-trillion parameter NLP god. :p


This.

"Responsible" is just PR talk for "We're gonna try and NOT release it for as long as possible so that we can milk money from you by blackboxing the models and having you pay for tokens".


I believe the current stable diffusion license is "responsible" enough.


Sounds like some middle manager is trying to put his foot down and is saying things like "no more releases till we have designed and tested a 37 step release signoff procedure"


My take is that the “genie is out of the bottle”

Single source “massive models” may be more difficult to get out, but Emad said they're working on licensing a ton of content to train future models. Even then, anyone can train new models now - the output from Dreambooth and Textual Inversion is already impressive, and seems like just the beginning.

Going to be an interesting road ahead.


Question from an ML-illiterate:

Are there known ways the training for these models could be distributed/decomposed? E.g. SETI-style distribution of a homogenous centrally-defined task, or — much more exciting — recombination of several different models / sets of weights? (I'm just throwing words around here without really understanding them.)

I'm imagining a world in which one group of enthusiasts could work together to train a model on all images on Wikipedia, another group could work on training a model that understands hands really well, and then later yet another group could combine the work of the other two without doing all that training from scratch.

Is that even remotely plausible?
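
For what it's worth, the closest thing the community does today is crude "checkpoint merging": a weighted average of the weights of two compatible models. It's nothing like real distributed training, but it does let you blend two finetunes without retraining. A minimal sketch, assuming the usual CompVis-style checkpoint layout; the file names and the 0.5 blend ratio are just placeholders:

    import torch

    alpha = 0.5  # blend ratio, purely illustrative
    a = torch.load("model_a.ckpt", map_location="cpu")["state_dict"]
    b = torch.load("model_b.ckpt", map_location="cpu")["state_dict"]

    # average float tensors that exist in both models; copy everything else as-is
    merged = {
        k: (alpha * a[k] + (1 - alpha) * b[k]) if torch.is_floating_point(a[k]) else a[k]
        for k in a if k in b
    }
    torch.save({"state_dict": merged}, "merged.ckpt")

Both checkpoints have to share the same architecture and key names for this to make any sense, and the result is a compromise between the two rather than a model that "knows" both things.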


It’s almost certainly not worth the bother.

Setting up a distributed community training system would take a lot of effort and time, and it would be extremely prone to abuse, errors, and uncertainty about the results.

You could get better quality, more quickly by simply running a kickstarter and paying for dedicated gpu time.


Not sure how illiterate you are (you're asking good questions), but FWIW, if you watch the Corridor Digital video you should be able to grasp how much transfer learning is possible: https://www.youtube.com/watch?v=W4Mcuh38wyM


Yes it is!


It definitely is out of the bottle, especially now that cards capable of training are getting within reach of regular people.

Sort of hinted at it upthread, but would be interesting if this eventually brings competition to the GPU compute space (AMD, Intel?) .


Just train on the output of existing models minus any photos with watermarks - being twice removed is sure to make it even harder to claim copyright :)


I wonder if the same will happen to Midjourney or Dall-E. I have generated images on Midjourney that literally had a 'Shutterstock' watermark plastered across them. This watermark was conspicuously missing when the image was upscaled.


not to Dall-E, because OpenAI is the very source of all these "ethical concerns" about unwashed masses having access to these tools


I've seen models amplify textures into noise, which tends toward periodic noise, which has a goodly chance of turning into a watermark. Usually the watermark is gibberish proto-letters, but yeah, I've seen straight-up Shutterstock watermarks appear too.


I've had stock photo watermarks show up repeatedly in SD generations as well.


And in "paintings" I generate with SD I often see a squiggle in the corner that is presumably the result of a signature in the training set.


If you browse LAION (e.g. using this [0]), which SD uses, you find all sorts of owned IP there (Shutterstock, ArtStation, fan art from DeviantArt and Tumblr, etc.) - because it's a research data set. LAION has a disclaimer that it does not claim ownership and that any trouble you run into by using the data set is your own.

[0] https://rom1504.github.io/clip-retrieval
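
If you'd rather poke at the index programmatically, the clip-retrieval package exposes a small client for the hosted LAION service. A rough sketch; the endpoint URL and index name are whatever the public service currently exposes, so treat them as assumptions and check the clip-retrieval README:

    from clip_retrieval.clip_client import ClipClient

    # URL and indice_name are assumptions; the hosted service has changed over time
    client = ClipClient(
        url="https://knn5.laion.ai/knn-service",
        indice_name="laion5B",
        num_images=20,
    )

    # each hit is a dict with fields like "url", "caption" and "similarity"
    for hit in client.query(text="stock photo of a golden retriever"):
        print(hit.get("similarity"), hit.get("url"), hit.get("caption"))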


[OT] It's been hard for me to trace the universe of Stable Diffusion forks, so I've been maintaining a list here: https://github.com/sw-yx/prompt-eng#sd-major-forks

Please let me know/send PRs if I missed anything; it's been a couple of months, so I'm overdue for a round of cleanup/reorganizing.


I'm personally working on a UI as well, that is using InvokeAI :) It has a bit of a different focus, namely organization of generated images and facilitating generating a lot of images quickly via randomization. Here is the current page for it: https://patreon.com/auto_sd_workflow

Currently expanding it a lot with some fun features:

- multi-gpu support (first UI that would support that I think)

- no-click installer (installs everything when you start it up) that works on Windows, Linux and macOS

- A cloud version where you can "rent" access to the UI + very powerful GPU instances without having to run anything locally yourself.

Been waiting to submit it all to HN as a Show HN but have to wait a bit for everything to get into place first :)


Since it's paid, I'd classify you as a distro :) https://github.com/sw-yx/prompt-eng/blob/main/README.md#sd-d...


Yes, that makes sense :)


Here's another one for you: https://github.com/brycedrennan/imaginAIry


Imaginairy is great.



i have diffusionbee.com


I see you've got my https://github.com/leszekhanusz/diffusion-ui GUI, but it seems to be linked to a completely unrelated face-swapping interface?

And in communities, you can probably add https://stablehorde.net


uhhh... copy paste brainfart sorry. thanks for correction


Since everyone's jumping on the plug wagon, allow me to throw in my project, www.stablecabal.org.

We've got a GRPC server that's backwards compatible with the official grpc.stability.ai but adds advanced features, a Flutter based infinite canvas web client in heavy development, a Discord server and a Krita & Photoshop plugin (all except the last are open source under various licenses).

The outpainting the server supports is (IMHO) the best I've seen, and it recently got a tiny glimpse of attention when combined with the Photoshop plugin - https://twitter.com/NicolayMausz/status/1577767106384433156


Great submission, thank you. God, it's so hard to keep on top of who is doing what.

am reworking my readme now with all the updates


Shameless plug: I was frustrated with the poor UI of notebook-based frontends so I wrote a desktop version here: https://github.com/ahrm/UnstableFusion .

Here is a video of some of its features: https://www.youtube.com/watch?v=XLOhizAnSfQ&t=1s


Here's a cross-platform desktop GUI[0] for img2img/txt2img that takes a unique angle by remaining independent of specific models/scripts, though it was originally designed to work with Stable Diffusion.

[0] https://github.com/westoncb/generation-q


https://github.com/cmdr2/stable-diffusion-ui is pretty popular, and is a 1-click installer for Win and Linux (Mac coming soon). Quite a lot of features, and well-liked by users for its easy-to-install and user-friendly GUI.


ah thanks, i've seen you around but somehow forgot to add. I've put you in as a distro since you bundle SD: https://github.com/sw-yx/prompt-eng/blob/main/README.md#sd-d...


Thanks! :)


Is "dreambooth" a fork? Or another feature that has been created by composing Stable Diffusion w/something else?


yeah you're right, more the latter. i should split it out.

OG dreambooth is proprietary Google code. what everyone's using is a third party replication of it using SD


I wasn't intending to challenge anything of yours. I just saw a lot of "dreambooth" activity on /r/stablediffusion. I haven't kept up on activity in the last week or two and figured you'd know what it was.


Another feature. You can "teach" SD a new concept, e.g., a new person, with a limited number of training images.
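
For the textual-inversion flavor of this (as opposed to Dreambooth proper), the learned concept comes out as a small embedding file you can graft onto a stock pipeline at inference time. A rough sketch using the diffusers library; the file name and placeholder token are made up for illustration:

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to("cuda")

    # learned_embeds.bin is the output of textual-inversion training;
    # it maps a placeholder token (e.g. "<my-cat>") to one embedding vector
    learned = torch.load("learned_embeds.bin", map_location="cpu")
    token, embedding = next(iter(learned.items()))

    # register the new token and slot its embedding into the text encoder
    pipe.tokenizer.add_tokens(token)
    pipe.text_encoder.resize_token_embeddings(len(pipe.tokenizer))
    token_id = pipe.tokenizer.convert_tokens_to_ids(token)
    pipe.text_encoder.get_input_embeddings().weight.data[token_id] = embedding

    pipe(f"a photo of {token} sitting on a beach").images[0].save("concept.png")

Dreambooth, by contrast, fine-tunes the model weights themselves rather than adding a single embedding.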


Oh, I used the dream.py script to back a Telegram bot. It later ended up in my demo for my talk Chat Bots as User Interfaces (with Elixir): https://www.youtube.com/watch?v=DFGHaER6_j4

I primarily used the InvokeAI release because I found it was easy to get going with on Linux and then it was simple enough to hack around with.

Also, this is the first tool I've ever used where I've ridden on the ragged edge of what my 3070 is okay with. I've had graphical glitches due to occupying all the video memory (KDE doesn't like it). I've had to quit apps to make it work.

Thanks for making a useful thing of all this Stable Diffusion stuff. I've enjoyed it.


A shameless plug: if anyone is interested in building apps using Stable Diffusion and wants to keep things as cheap as possible, I built a very user-friendly API that is 1/4 the cost of the official Stable Diffusion API. There is also a free demo.

You can try it out:

https://computerender.com


The page at https://computerender.com/cost.html has the title "How is computerender 4x cheaper than other services hosting Stable Diffusion?" but doesn't actually explain how/why it is cheaper, just that "crowd-sourced servers are much more difficult to work" without elaborating on how what you're doing is different than that.

Care to shine some light on it? Using something like runpod/vast.ai would be my guess?


Hi! Your guess is correct. I monitor prices on vast.ai and runpod to get the best possible GPU power per dollar.

I updated the link you mentioned with some of this info!


Is there anything new here that might interest an existing user of auto's GUI enough to switch?


To be fair, Auto has been acquiring features at an insane clip (recently getting banned from the SD Discord over accusations of code theft, lol).

I think Invoke is competitive for now, but its biggest advantages are an improved UX and a large community with an ambitious roadmap focused more on enthusiasts/pros.

I’d give it a whirl and see where you end up preferring to do your SD projects :)


Oh, I see that my comment might be interpreted as snarky. I was literally just asking to please list the things that are new or different, because that would be very helpful for everyone.


Oh, no snark interpreted! Very valid question.

I legitimately think the answer is:

- Experimental/novel SD features: Automatic typically has them first.

- Better UI/UX and exploration/gallery workflows: Invoke.

From a feature perspective, both have their own flavor of certain features.

Goal of Invoke is to eventually have prompt/concept library, a node-based workflow UI (with the ability to share techniques), etc.

It's kind of: switch if you want a better UX now; keep an eye on it if you want new workflow solutions long term.


Or you can stay on automatic1111 and use https://diffusionui.com/b/automatic1111 for better UX.


You can use https://diffusionui.com/b/automatic1111 once the automatic1111 webui is running:

- better inpainting

- a gallery to easily compare generations and easily regenerate images with small modifications

- responsive design --> works great on mobile, swipe left/right to switch between pictures in same generation and up/down to switch to another generation to compare

Here is the repo: https://github.com/leszekhanusz/diffusion-ui

If you don't have the hardware, you can also get images for free using the Stable Horde (https://stablehorde.net), a cluster of backends provided for free by volunteers.

You can test it here: https://diffusionui.com/b/stable_horde


Automatic1111's webui is what 90% of the Stable Diffusion community uses, but he recently made the decision to use a company's proprietary code after it was leaked by a hacker. When confronted about it, instead of removing it as requested, he chose to lie about it, despite the git history evidence and the fact that the paper he claimed to have used as a reference wasn't related at all to the techniques used by the stolen code.

The company whose code was stolen works closely with the man behind SD, and the decision was made to merely ban him from the community instead of torpedoing the repo via DMCA.


This seems to have been proven not to be the case, and the code in fact originates from a pre-Stable-Diffusion, MIT-licensed repo.

It's not even a particularly large or interesting piece of code; the only reason it's controversial is that the code is only necessary to use the leaked NovelAI models.


I'd be interested to see that repo, since one of the commits to Automatic's GitHub is Anlatan's code, while a second commit changes function names and tweaks the code slightly in an apparent effort to disguise its origin. Most of the discussion about this has been on 4chan, where the amount of misinformation has been staggering, before it pretty much devolved into "it doesn't matter if their code was taken, it should have been open source anyway".


This is what other people have found over on Reddit: https://reddit.com/r/StableDiffusion/comments/xzipjx/automat... According to that comment, this (https://user-images.githubusercontent.com/23345188/194727572...) is allegedly the only reasonable match with the leaked code from NovelAI, but note that I have not verified that myself, and since NovelAI/Stability never said which part they take issue with, it is hard to tell. This is the file in that other repo that the code actually seems to originate from: https://github.com/lucidrains/perceiver-pytorch/blame/main/p...

As you can see, that repo from two years ago even originates the "# attention, what we cannot get enough of" comment and is an exact 1:1 match to Automatic's commit, while the one from NovelAI even has a small change in the if clause that Automatic doesn't have.


The code that was taken from Anlatan is actually this: https://user-images.githubusercontent.com/23345188/194727441... But that same GitHub issue also shows code from the lucidrains repo (the picture you've linked there), and people have latched onto it, saying nothing was taken, when the picture I linked is right above it, containing code that doesn't exist outside the leaked code and Automatic's fork.


One thing is that invoke-ai can be run via CLI or possibly programmatically. I haven't found a good way to do that with the automatic GUI. Personally, I've also found features in automatic to be buggy. For example, batching seems to always break the UI for me. With the invoke-ai fork, I can run the CLI and produce images all night if I want to.

The bee's knees would be being able to use automatic as a CLI or with some programmatic interface, because it is more feature-rich. But I haven't seen anything that allows me to do that yet, so I'm stuck with its clunky UI or with invoke-ai.
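
In the meantime, if all you need is "generate a pile of images from a script", the diffusers library is one hedge: it's not automatic's or invoke's feature set, but it is a plain Python API you can loop over all night. A minimal sketch; the model id, prompt file and output names are just examples, and it assumes you've accepted the model license on Hugging Face and run huggingface-cli login:

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    ).to("cuda")

    # one prompt per line; blank lines are skipped
    prompts = [line.strip() for line in open("prompts.txt") if line.strip()]

    for i, prompt in enumerate(prompts):
        image = pipe(prompt, num_inference_steps=50).images[0]
        image.save(f"out_{i:04d}.png")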


Like I mentioned elsewhere, https://github.com/cmdr2/stable-diffusion-ui is pretty popular, and is a 1-click installer for Win and Linux (Mac coming soon). Quite a lot of features, and well-liked by users for its easy-to-install and user-friendly GUI.


I've been using a modified version of lstein's fork since almost the beginning. Recommended! It does lack some of the features of e.g. automatic1111, but it has a good CLI, and it actually has a license, which is pretty important (as NovelAI has learned).


Sounds awesome! Unfortunately, it says that it requires a GPU. Please consider making it accessible to people without a GPU, for example by using OpenVINO like this (command-line only) project does:

https://github.com/bes-dev/stable_diffusion.openvino

Thanks!


Those who don't have a GPU could use the Stable Horde: https://stablehorde.net


What you're asking for isn't entirely possible for local installs. Yes, you can run SD on a CPU, but each image takes minutes at a time vs. seconds on a GPU.

For example, it's not possible to run SD on my two-year-old 16-inch Intel MacBook Pro. This is because PyTorch doesn't have support for the slightly older AMD GPU on board. There's a newer framework called ROCm for AMD cards that allows them to work with recent versions of PyTorch.

Given all that, the requirement to have an Nvidia card is entirely acceptable, and for the most part a technical requirement.


Minutes isn't really that big a deal though; one could give it a list of prompts to round-robin through and come back in the morning to a huge collection of images to explore. It's just a different workflow.

Ironically, CPU support would be faster for me (in terms of throughput, at least) because I have on the order of a thousand Zen cores but only a couple of CUDA-compatible GPUs with enough RAM to run SD.


> What you're asking for isn't entirely possible for local installs.

The project I linked to does it, so it's clearly possible. I didn't ask for speed, I only ask to be able to run it at all.


How hard a requirement is the Nvidia graphics chip? Polaris-era AMD chips do work decently at the 4GB level (although a bit finicky), and Navi/Big Navi AMD cards work reasonably well with modern ROCm.


Stable Diffusion works for me with a Polaris GPU. Had to compile my own local copy of Tensorflow to use it, but everything runs.


Which documentation/build environment are you using?

I’m using Ubuntu(to follow what AMD has for ROCm) and building the entirety of (gfx803 patched) ROCm from source.

It works with some forks but not others.


IIRC it’s just the standard AMD build from https://repo.radeon.com/rocm/apt/5.2.3/ I think.

It’s possible I had to do something weird, but I think it supports Polaris OOB. rocminfo certainly thinks so.

I’m running it on Debian testing, with an equivs package to get it to install cleanly:

    $ cat rocm-equivs 
    Package: amdgpu-driver-fixes
    Provides: python,libstdc++-7-dev,libgcc-7-dev
    Architecture: all
    Description: Fixes the AMD GPU driver installation on Debian testing
I had to compile my own version of pytorch to get gfx803 support there.

I'll see if I can recreate the steps & create a runbook.


I am in the same boat with a gfx803 card. What patch did you use? The ones here? https://github.com/xuhuisheng/rocm-build

I also tried to compile PyTorch with its Vulkan backend, but ended up throwing in the towel, as LDFLAGS are a mess to get right (I successfully compiled it, but that was only part of the build chain, and I decided I had better things to spend time on). I wonder how that would perform; ncnn works pretty decently.


Yes. I went through their build-instruction shell scripts with some issues, but it did work, as below:

It works with an older fork that has img2img / txt2img separated from each other, but nothing else did.

When you built it, did you at least source the env.sh to pick up the build environment that it needed?


I did try that, but I don’t think the pytorch Vulkan backend is complete enough. I never did get a working install going down that route though, so I could be wrong.


I ran some SD fork on a Radeon Pro Vega 20 GPU. I'm not familiar with the whole setup, but it was running the "torch mps backend"? Anyway, it was pretty fast and worked well, so I'm a bit surprised at the lack of Intel Mac support from all those SD forks.


Can you make the InvokeAI UI as easy to install as running a Windows 11 command-line script?

I couldn't get it to work following https://invoke-ai.github.io/InvokeAI/installation/INSTALL_WI...

Similar to https://github.com/cmdr2/stable-diffusion-ui/releases/tag/v2...


One click install is a goal, we just need a contributor who is confident taking it on as a project.


Hi, I'm the author of the cmdr2 UI and installer (that iFire linked to). I'd be happy to contribute the 1-click installer used by my project. It's a battle-hardened installer (over 100k installations on all kinds of PCs and networks), and I'm finishing up a rewrite in python, so that the installer code is easier to maintain for others.

I'd be happy to submit a PR to your project, if you're interested in using it. I actually got it working with your project a few weeks ago, so I know it works with your repo.

I've opened a github issue as well, so we can talk there if you'd like: https://github.com/invoke-ai/InvokeAI/issues/1042


Nice! lstein is the SD fork that I ended up using, and I'm delighted to see it evolve into InvokeAI and keep getting better.


How good are solutions like Stable Diffusion at inpainting nowadays? What about the watermarks from Getty et al. that have been part of some DALL-E 2 images? Could one feasibly remove such watermarks, or stuff like a white grid array, with these solutions?

So how convincing are these solutions in the worst case, is what I am asking.
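
For reference, the inpainting workflow with the diffusers library looks roughly like this: you paint a mask over the region (say, a watermark) and let the model fill it in. The model id, file names, and prompt are examples only, and results depend heavily on the mask and prompt:

    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting"
    ).to("cuda")

    init = Image.open("photo.png").convert("RGB").resize((512, 512))
    mask = Image.open("mask.png").convert("L").resize((512, 512))  # white = area to repaint

    result = pipe(prompt="clean background, no text", image=init, mask_image=mask).images[0]
    result.save("inpainted.png")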


This is much needed. Even for a software engineer like me, it was quite cumbersome to use Stable Diffusion locally without such a UI.

I feel like there's just so much to improve though. Maybe SD is the definitive proof that a single feature can trickle down into many others just by adding a good UI on top of it.


Luckily, the team has a pretty jampacked roadmap. This is v1 of the full WebUI.


What about safety filters? All the safety filters in the SD interfaces/services I used so far are too false-positive happy. Can these filters be disabled or at least toned down in InvokeAI? If so, how easily?


It ships without any filters.


Yay! I built an IRC bot for SD using lstein's repo because it was the first one that I could get to work reliably on M1, so I'm really glad to see the project continue really well with InvokeAI!


PSA: You can email support@github to ask them to "detach my repo as a fork", in case the repo has matured so much it shouldn't have the "forked from …" treatment.


That's all good but it's nice to give credit where credit is due. I like how they do it in the README.


Min requirements say 12GB; I take it this doesn't have the optimizations that automatic1111 has for <8GB cards?


It says 12GB RAM, not VRAM. Right above that it says that it can work on 4GB VRAM cards.


You can run it with lower VRAM for sure; up until some weeks ago, I was using that repository with an 11GB card.


Yeah. Stable Diffusion runs fine on my 8GB Polaris card (RX 580), and I've heard of forks that will let you run it in 6 or even 4GB of VRAM at a small cost in render time.
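
For anyone curious what those low-VRAM tricks look like outside any particular fork, here's a rough sketch with the diffusers library; half-precision weights plus attention slicing is most of the magic (the model id is just an example):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        torch_dtype=torch.float16,      # half-precision weights roughly halve VRAM use
    ).to("cuda")
    pipe.enable_attention_slicing()     # compute attention in chunks, lowering peak VRAM

    image = pipe("a lighthouse at dusk", height=512, width=512).images[0]
    image.save("lighthouse.png")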


I was unable to get this to run on the Mac M1 over the last week - has anyone here had any success?


Yes.

File an issue if it's not working for you; it's working fine for me.

(See for example https://github.com/invoke-ai/InvokeAI/issues/1021 ; if you had a previous install, delete it entirely)


Thank you for the reply, I already did; must be my setup.


I am super stoked to see all these Stable Diffusion forks floating around, and I don't want to shit on the authors and their work that hard, but I swear the installation and packaging of these things is INSANE.

* Every single one of these seems to be a web UI, when this is desktop software that needs a desktop computer or workstation to run. Have we all collectively forgotten how to program PyGTK?

* Model files always go in the code repo. Have we forgotten how home folders work or what their purpose is? At the very least this one instructs you to make a shortcut/symlink if you don't want to copy the ckpt file yet again

* On that note, everything is autodownloaded to wherever the hell the programmer wants (once again, usually in the code repo itself). I must have four or five different copies of ESRGAN, and I spent a bunch of time monkeying around with automatic1111's fork trying to get it to correctly see everything when I ripped out the models folder and symlinked one in from a different place on my hard drive.

To the authors: can you all please get together and standardize some of this stuff? Models should go in users' home folders, or at a customizable location, and NOT within the scope of stuff that can be touched by git pull. (Doing so causes git to freak out in many circumstances.)
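
Something as simple as this would already be an improvement; a sketch of what I mean by "customizable location" (the env var name here is made up, not an existing convention):

    import os
    from pathlib import Path

    def models_dir() -> Path:
        # honor an explicit override first (SD_MODELS_DIR is a hypothetical name),
        # otherwise fall back to a per-user cache directory outside the repo
        override = os.environ.get("SD_MODELS_DIR")
        if override:
            return Path(override).expanduser()
        xdg = os.environ.get("XDG_CACHE_HOME", "~/.cache")
        return Path(xdg).expanduser() / "stable-diffusion" / "models"

    ckpt_path = models_dir() / "sd-v1-4.ckpt"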

The breakneck pace of innovation here is awesome, but it feels like all gas no brakes on the usability front.

In the Bad Old Days(tm) you ran an install script which generated a desktop icon, and you clicked that to run it. Meanwhile with this, on Windows, one has to open an Anaconda prompt, activate the conda venv (or whatever it is), then manually invoke the whole thing with 'python scripts/invoke.py --web'. And if there's a one-click install script included (which invoke doesn't have, but I am not knocking it for this!), half the time they seem to try and pull down the entire world all over again (a la sd-webui).

Like, I get the need to make it easy to use, but c'mon, there is existing convention for all these things. Folks, please follow it!

If I had a wishlist, or the wherewithal to fork my own version, it would have:

* an actual GUI made with an actual windowing toolkit. I don't know why the hell everyone is so afraid of GTK, but I would use that. PyGTK is pretty simple IME; you can even read the C++ docs and it all maps over really nicely to Python. It doesn't need to be pretty!

* configurable model locations, preferably in an agreed-upon standardized hierarchy

* a standardized way of embedding prompt data into the PNG, a la automatic1111 (see the sketch after this list)

* an uncomplicated but not overly optimistic setup process. An install.py and run.py, both with sensible defaults so that you don't need any command-line switches to run it except in special circumstances, and if it wants to auto-download updates then CHECK WITH ME FIRST! And preferably one that doesn't try to move my entire world (here's looking at you, sd-webui). And it will load the venv/conda environment for me.
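
On the PNG point above: PIL already makes this trivial, which is why it's frustrating that everyone rolls their own. A sketch; the "parameters" key mirrors what automatic1111 uses, but the string format here is only illustrative, not a spec:

    from PIL import Image
    from PIL.PngImagePlugin import PngInfo

    image = Image.open("generated.png")

    meta = PngInfo()
    # store the generation settings as a PNG text chunk
    meta.add_text("parameters",
                  "a lighthouse at dusk\nSteps: 50, Sampler: k_lms, Seed: 42")

    image.save("generated_with_meta.png", pnginfo=meta)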

And yes, for all the "put your money where your mouth is", I've been thinking about forking. But I don't know if I have the time or energy to keep up with all the developments in this space. But hey you never know...


Gotta love the tools that edit your bashrc! I'm not sure which is worse, that or the mystery-meat background auto-downloading. 0_o

Once the pace of innovation slows down I'm sure we'll see more effort from people with traditional software engineering experience come in to clean things up.


Oh god don't even get me started on that....


- A web app is more versatile, as many users are running this and then accessing the client via other laptops & devices in the house. Understand your point, but not a priority. We do have a GUI mode that runs it in flask, but I don't think that's what you're getting at ;)

- 1 Click Install & Run is in the works to make this easier to install. We agree.

- Model locations are a valid point - We're working on being able to hot swap models mid-session, so I'll bring this up into the convo.

- Invoke has aligned on its own metadata structure for the ability to easily pull those parameters into future invocations. We're not worried about compatibility with Automatic.

No need to fork - Just join us on discord and complain loudly until we make things better. :)


I wasn't expecting this list of complaints to be read so positively. So, props for that. Might just see you guys on discord...


If you're using a Mac M1, DiffusionBee is a one-click app install that I've been using to generate high-quality renders in seconds.

The devs recently added img2img support too.

I had tried to install the more complete versions but I just end up in a wormhole of python and conda errors.


FYI, the HuggingFace diffusers library handles downloading models sensibly.

It's probably worth following that.
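
Roughly, this is the behavior worth copying: downloads land in a per-user cache, never inside the git checkout. A sketch; the model id and cache path are examples, and the HF_HOME env var works as an override too:

    import os
    from diffusers import StableDiffusionPipeline

    # weights are fetched into the Hugging Face cache under the user's home
    # directory by default; cache_dir overrides the location explicitly
    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        cache_dir=os.path.expanduser("~/.cache/huggingface/diffusers"),
    )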


This is great, but it requires a lot of "geek" work (installing dependencies, borking your system with brew, etc...)

Vs DiffusionBee which just works

https://diffusionbee.com/

Maybe the two projects can merge?



