Stable Diffusion with Core ML on Apple Silicon (machinelearning.apple.com)
723 points by 2bit on Dec 1, 2022 | 178 comments



How come you always have to install some version of PyTorch or TensorFlow to run these ML models? When I'm only doing inference, shouldn't there be easier ways of doing that, with automatic hardware selection etc.? Why aren't models distributed in a standard format like ONNX, and inference on different platforms solved once per platform?


>How come you always have to install some version of PyTorch or TensorFlow to run these ML models?

The repo is aimed at developers and has two parts. The first adapts the ML model to run on Apple Silicon (CPU, GPU, Neural Engine), and the second allows you to easily add Stable Diffusion functionality to your own app.

If you just want an end user app, those already exist, but now it will be easier to make ones that take advantage of Apple's dedicated ML hardware as well as the CPU and GPU.

>This repository comprises:

    python_coreml_stable_diffusion, a Python package for converting PyTorch models to Core ML format and performing image generation with Hugging Face diffusers in Python

    StableDiffusion, a Swift package that developers can add to their Xcode projects as a dependency to deploy image generation capabilities in their apps. The Swift package relies on the Core ML model files generated by python_coreml_stable_diffusion
https://github.com/apple/ml-stable-diffusion
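
For reference, the conversion half of that boils down to the usual coremltools flow. A minimal toy sketch, not the repo's actual conversion code (which handles the U-Net, text encoder, VAE, etc.):

    import torch
    import coremltools as ct

    # Toy module standing in for one of the Stable Diffusion sub-models
    class TinyNet(torch.nn.Module):
        def forward(self, x):
            return torch.nn.functional.relu(x) * 2.0

    traced = torch.jit.trace(TinyNet().eval(), torch.rand(1, 3, 64, 64))
    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(name="x", shape=(1, 3, 64, 64))],
        convert_to="mlprogram",  # emits an .mlpackage usable from Core ML / the Swift package
    )
    mlmodel.save("TinyNet.mlpackage")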


That's done in professional contexts: when you only care about inference, onnxruntime does the job well (including for Core ML [1]).

I imagine that here Apple wants to highlight a more research/interactive use, for example to allow fine-tuning SD on a few samples from a particular domain (a popular customization).

[1] https://onnxruntime.ai/docs/execution-providers/CoreML-Execu...
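
For the curious, the onnxruntime route really is just a couple of lines. A minimal sketch, where the model path and input name are made up:

    import numpy as np
    import onnxruntime as ort

    # Ask for the Core ML execution provider, falling back to CPU for unsupported ops
    sess = ort.InferenceSession(
        "model.onnx",
        providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
    )
    x = np.random.rand(1, 3, 224, 224).astype(np.float32)
    outputs = sess.run(None, {"input": x})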


Most models seem to be distributed by/for researchers and industry professionals. Stable Diffusion is state of the art technology, for example.

People who can't get the models to work by themselves given the source code aren't the target audience. There are other projects, though, that do distribute quick and easy scripts and tools to run these models.

Apple stepping in to get Stable Diffusion working on their platform is probably an attempt to get people to take their ML hardware more seriously. I read this more like "look, ma, no CUDA!" than "Mac users can easily use SD now". This module seems to be designed so that the upstream SD code can easily be ported back to macOS without special tricks.


Seconded. I wish there were a way to work with ML models from native code rather than through some Python scripting interface. I believe TensorFlow is there with C++, but it works only with C++ and not through an FFI.


It would increase my interest in experimenting with these models 1000% at the least. I really can't be bothered to spend hours fucking around with pip/pipenv/poetry/virtualenv/anaconda/god knows what other flavour of the month package manager is in use. I just want to clone it and run it, like a Go project. I don't want to download some files from a random website and move them into a special directory in the repo only created after running a script with special flags or some bullshit. I want to clone and run.


It's one of the reasons I recently ported the Whisper model to plain C/C++. You just clone the repo, run `make [model]` and you are ready to go. No Python, no frameworks, no packages - plain and simple.

https://github.com/ggerganov/whisper.cpp


PyTorch has libtorch as its purely native library. There are also Rust bindings for libtorch:

https://github.com/LaurentMazare/tch-rs

I used this in the past to make a transformer-based syntax annotator. Fully in Rust, no Python required:

https://github.com/tensordot/syntaxdot


If you are okay with the NVIDIA ecosystem, check out TensorRT.


Apple has their own mlmodel format, but they can't distribute this model as a direct download due to the model's EULA. The first task is to translate the model.


What part of the SD license prohibits that?


No part of it.


I mean, it is a legal time bomb in general[0], with a non-standard license that has special stipulations in an attachment. Do you really want to incur the weeks of lead time it would take Legal to review the legality of redistributing this model?

0: https://github.com/CompVis/stable-diffusion/blob/main/LICENS...


Redistributing that model to end users that violate Attachment A seems like a minefield.


Not really. You're not responsible for how users use products you distribute. The license is passed along to them; they would be the ones violating it.


Are you an attorney?


No, I'm not. If you have supporting precedent for your position (that a licensor can be held liable for the unpreventable actions of a licensee) I would like to see it.


In a professional context (apart from individual apps distributed by small creators / indie hackers), models are usually run with standardized runtimes in native code (typically C++): TensorRT (for NVIDIA devices), onnxruntime (vendor-agnostic), etc.


DiffusionBee is an app that is completely self-contained and lets you play with this stuff completely trivially, no installs required.

https://diffusionbee.com/


But it's not optimised to work with Apple's Core ML (yet), is it?


It's pretty fast. On an 8GB M2 MacBook Air, it produces more than 2 images per minute using the default settings.

E.g., it's about 20x as fast as InvokeAI, which doesn't have an FP16 option that works on a Mac.


I don't know, but it seems extremely fast.


If you want it and it doesn't exist, why not simply do it yourself? It's open source no?


Atila from Apple on the expected performance:

> For distilled StableDiffusion 2 which requires 1 to 4 iterations instead of 50, the same M2 device should generate an image in <<1 second

https://twitter.com/atiorh/status/1598399408160342039


With the full 50 iterations it appears to be about 30s on M1.

They have some benchmarks on the github repo: https://github.com/apple/ml-stable-diffusion

For reference, I was previously getting a bit under 3 minutes for 50 iterations on my MacBook Air M1. I haven't yet tried Apple's implementation, but it looks like a huge improvement. It might take it from "possible" to "usable".


Yeah, it's just that the PyTorch MPS backend is not fully baked and has some slowness. You should be able to get close to that number with maple-diffusion (probably 10% slower) or my app: https://drawthings.ai/ (probably around 20% slower, but it supports samplers that take fewer steps (50 -> 30)).


For comparison, it's also taking ~3min @ 50 iterations on my 12c Threadripper using OpenVino. It sounds like the improvements bring the M1 performance roughly in line with a GTX 1080.


The Apple Neural Engine in the M1 is supposed to be able to perform 11 TOPS. The GTX 1080 does about 9-11 TFLOPS.

So sounds plausible that the m1 can reach the same level in some use cases with the right optimizations.


I have a MacBook Air M1, which is passively cooled. When cooled properly, that is with a thermal pad mod combined with a fan under the laptop, I'm getting closer to 2 min: something like 2.8s per iteration. I guess it would be something like 140s for 50 iterations on a MacBook Pro or Mac mini with an M1.


This is accurate re: M1 Mac Mini times IME


Not SD 2.0 but SD 1.5: I am getting 30 iterations in 10 seconds on a 1080 Ti, and 50 iterations in 18 seconds. (100%|| 30/30 [00:10<00:00, 2.84it/s])


How do DreamStudio/Craiyon/Hugging Face manage to be seemingly quicker on their interfaces? Are they hosting these models on super beefy and costly GPUs for free?


M1's single-threaded CPU performance and power efficiency are exceptional; however M1's GPU performance is nothing special compared to normal discrete GPUs. You don't need something super beefy to beat M1 on the GPU side.

But also yes, it's gotta be expensive to host these models and I'm not sure where all these subsidies are coming from. I expect that we'll eventually see these things transition to more paid services.


For a low-power SoC, the GPU performance is actually pretty impressive. We recently did some transformer benchmarks and the inference performance of the M1 Max is almost half that of an RTX3090:

https://explosion.ai/blog/metal-performance-shaders

However the SoC only uses 31W when posting that performance.


Haven't tried this yet, but it sounds slower than SD itself if you use one of the alt builds that supports MPS in place of CUDA.

Mac Studio with M1 Ultra gets 3.3 iters/sec for me.

MacBook Pro M1 Max gets 2.8 iters/sec for me.


You're talking about the higher-end SKUs, though, with many more GPU cores and significantly more RAM (I think the lowest you can get is 32GB vs the 8GB on their chip).


If you told me this was possible when I bought an M1 Pro less than a year ago, I wouldn’t believe you. This is insane.


Agreed.

And the posted benchmarks for the M2 Macbook Air make me consider 'upgrading' to an Air.


That laptop feels like liquid power. It's uncanny.

Macbook Airs (way back when) felt sluggish. The MBA M1 changed that, it was "fine". These M2s are unexpectedly responsive on an ongoing basis.

The MacBook Pro M1 Max is great (would be fantastic except they lost a Thunderbolt port in favor of legacy HDMI and memory card jacks), but you expect that machine to be responsive, so it's less surprising.

The Studio Ultra, though, never slows down for anything.

Still, if the Air could drive two external screens instead of one, I'd "downgrade" from the Max.


I'd give the M1 Air more credit - I moved from a 2019 16" Pro to the Air and performance was nearly identical except for long-running tasks (> 10 minutes). So for mobile app builds, it was blazing fast. Meanwhile the Intel machine was blaring its fans after the first 30 seconds while the Air barely got warm. And then the real kicker was watching the battery on the Intel machine visibly drop a few percentage points, while the Air sat at the same level the whole time.

I've since moved to the M2 air, and it is noticeably faster than M1, but it isn't the huge leap from last gen intel that the M1 was. But the hardware itself feels way better.


I don't like the lack of open source drivers, but honestly, for work, DisplayLink works just fine on macOS. E.g. I used 4 monitors on an M1 Air using DisplayLink:

* Air built-in display

* 2K display connected via USB-C -> DisplayPort adapter

* Two more 2K displays of same model via DisplayLink connected via USB hub

For all practical purposes it's almost impossible to see any DisplayLink compression artifacts, even in most games.

PS: Each adapter cost me $40:

https://www.amazon.com/gp/product/B08HN2X88P/


Appreciate this reply, TY for sharing the exact product that's working for you!

Been nervous to dip into it, given the architecture change and last year's challenges with display link docks.

// UPDATE: Oops, looking at the product, I see I should have specified: 4K screens or higher. About half our desks are 2 x 4K, about half 2 x 5K, except the Air M1 folks who are 1 x 5K.


Sadly I can only report it working at 2560x1440, even though a lower resolution is specified on Amazon.

For higher resolution some other solution is required.


Last nail in the coffin for DALL·E.


Not really; everyone will have their own flavor of how to rapidly train the model.

DALL-E et al. will still be able to bandwagon off all the free ecosystem being built around the $10M SD 1.4 model that is showing what is possible.

E.g. DALL-E could go straight to Hollywood if their model training works better than SD's. The toolsets will work.


Source for the $10M number? I haven't heard that one before; everyone just keeps parroting the $600k single-run number, which is obviously misleading.


yeah, finally we see the real openAI


more open than open source, it's the open model age


The true metric includes the output quality of the images, not just the speed. DALL-E output is, generally, much better for things that aren't standard looking.


If that's the metric, MidJourney --v 4 --q 2 is the leader, and it's not close.


I think they can move upmarket just as well as anyone else.


SD2 is the one that was neutered, right?

Maybe a dumb question but can the old model still be run?


It's less versatile out of the box. Give it a couple months for the community to catch up. Everyone is still figuring out what goes where, and SD 1.x was "everything goes in one spot." It was cool and powerful, but limited.


You can still do nice things with SD2, it just requires a different approach. https://news.ycombinator.com/item?id=33780543


Also, can you not "upgrade" but still run new models?


You can do anything you want.

SD2 wasn't "neutered"; the piece of it from OpenAI that knew a lot of artist names but wasn't reproducible was replaced with a new one from Stability that doesn't. You can fine-tune anything you want back in.


The training set was nerfed pretty heavily as well; it wasn't just OpenCLIP that was replaced. They will successively re-admit more training data during the 2.x releases, I guess.


Yes, they removed some NSFW content, which might've hurt it, but releasing models that can generate CP /will/ get you in legal trouble.

The "in the style of Greg Rutkowski" prompts from SD1 though, IIRC, were thought to be proof it was reproducing the training set. But it actually only saw ~27 images of his, and the rest was residual biases from CLIP.


Note that this is extrapolation for the distilled model, which isn't released quite yet (but it will be very exciting when it is!).


I'm very ignorant here, so forgive me, but if it can generate images that fast, can it be used to generate video?


There are different requirements for generating video -- at a minimum, continuity is tough. There are models for producing video, but (as far as I've seen) they're still a bit wobbly.


Yeah, sure. The issue is with temporal consistency. Meta and Google have had some success in that area.

https://mezha.media/en/2022/10/06/google-is-working-on-image...

Give it some time and SD will be able to do the same.


They already do, with varying levels of performance and success.

See deforum[1] and andreasjansson's stable-diffusion-animation[2]

[1]: https://deforum.github.io/

[2]: https://replicate.com/andreasjansson/stable-diffusion-animat...


Video is really a series of frames; the framerate for film/humans can get away with 24 frames/second, so maybe ~40ms/image for real time at least?

What's cool about the era in which we live is that, if you look at high-performance graphics for games or simulations, it may in fact be faster to run a model to "enhance" a low-resolution frame than to render it fully on the machine.

ex. AMD's FSR vs NVIDIA DLSS

- AMD FSR (FidelityFX Super Resolution): https://www.amd.com/en/technologies/fidelityfx-super-resolut...

- NVIDIA DLSS (Deep Learning Super Sampling): https://www.nvidia.com/en-us/geforce/technologies/dlss/

Both FSR and DLSS aim to improve frames-per-second in games by rendering them below your monitor's native resolution, then upscaling them to make up the difference in sharpness. Currently, FSR uses spatial upscaling, meaning it only applies its upscaling algorithm to one frame at a time. Temporal upscalers, like DLSS, can compare multiple frames at once to reconstruct a more finely detailed image that both more closely resembles native res and can better handle motion. DLSS specifically uses the machine learning capabilities of GeForce RTX graphics cards to process all that data in (more or less) real time.

This is a different challenge than generating the content from scratch.

I don't think this is possible in real time yet, but someone used a filter trained on the German countryside to produce photorealistic Grand Theft Auto driving gameplay:

https://www.youtube.com/watch?v=P1IcaBn3ej0

Notice the mountains in the background go from Southern California brown to lush green.

https://www.rockpapershotgun.com/amd-fsr-20-is-a-more-demand....


FSR 2.0 also uses temporal information and movement vectors to upscale, for what it's worth. DLSS 2.0 also renders at a lower resolution and upscales it. DLSS 3.0 frame generation is interesting, in that it holds "back" a frame and generates an extra one in between frame 1 and frame 2, allowing you to boost perceived frame rate massively, at the cost of some artifacting right now.


You can generate video a lot more efficiently than frame by frame. For example, you can generate every other frame and use something like DLSS 3.0 to fill in the missing ones.


There's also https://draw.nnc.ai/ - which is an iOS / iPad app running Stable Diffusion.

The author has a detailed blogpost outlining how he modified the model to use Metal on iOS devices. https://liuliu.me/eyes/stretch-iphone-to-its-limit-a-2gib-mo...


Yeah, that's what immediately came to mind for me as well. I don't know how similar/different the two solutions are, but it made me smile a bit that what Apple is showing off here has been already done by a single independent developer :)


I think it's sad that Apple doesn't even give attribution to any of the authors. If you copy the Bibtex from this site, the Author field is just empty. Their names are also not mentioned anywhere on this site.

This site is purely a marketing effort.


This is about an update to macOS and iOS. Are the 'authors' of macOS updates normally credited? Authors are credited on other papers published on this site that aren't just about OS updates.


Is it standard for Apple to attribute authors in the Bibtex? Or do they usually leave it empty?


> I think it's sad that Apple doesn't even give attribution to any of the authors.

Pretty much like Stable Diffusion and the grifters using it in general: they will never credit the artists and images that they stole to generate these images.


This is sort of like if you learned English from reading a book and the author said they owned all your English sentences after that.

Of course you can see the original images (https://rom1504.github.io/clip-retrieval/); it was legal to collect them (they used robots.txt for consent, just like Google Image Search), and it was legal to do this with them (though not under US legal principles, since it was made in Germany).

"Crediting the artist" isn't a legal principle - it's more like some kind of social media standard which is enforced by random amateur artists yelling at you if you don't do it. It's both impossible (there are no original artists for a given output) and wouldn't do anything to help the main social issue (future artists having their jobs taken by AIs).


The artist(s) are normally cited. Just download any Stable Diffusion -made image and look at the PNG info / metadata and you'll see "Greg Rutkowski" (lol) in the prompt.


That proves nothing except that someone decided to say his name to an AI. He basically isn't in SD's training set! You can look it up.

It seems that his name works coincidentally because CLIP associates it with concept art.


> future artists having their jobs taken by AIs

That's simply not going to happen. As with every technological development so far, this is just another tool.

1) Artists create the styles out of thin air.

2) Artists create the images out of thin air.

3) Computers are just collectors of this data and do not actually originate anything new. They are just very clever copycats.

You're looking at an artist's tool more than anything. Sure, it's an unconventional one, and a threatening one, but that's been true of literally every technological development since the Industrial Revolution.


4) If computers get good enough at 1) or 2), then there'd be much bigger problems, and essentially all humans will become the starving artists.

Also, I'm not so sure that generative models like SD, Imagen, GPT-3, and PaLM are purely copycats. And I'm not so sure that most human artists are not mostly copycats either.

My suspicion is that there's much more overlap between how these models work and what artists do (and how humans think in general), but that we elevate creative work so much that it's difficult to admit the size of the overlap. The reason why I lean this way is because of the supposed role of language in the evolution of human cognition (https://en.m.wikipedia.org/wiki/Origin_of_language)

And the reason I'm not certain that the NN-based models are purely copycats is they have internal state; they can and do perform computations, invent algorithms, and can almost perform "reasoning". I'm very much a layperson but I found this "chains of thought" approach (https://ai.googleblog.com/2022/05/language-models-perform-re...) very interesting, where the reasoning task given to the model is much more explicit. My guess is that some iterative construction like this will be the way the reasoning ability of language/image models will improve.

But at a high level, the only thing we humans have going for us is the anthropic principle. Hopefully there's some magic going on in our brains that's so complicated and unlikely that no one will ever figure out how it works.

BTW, I am a layperson. I am just curious when we will all be killed off by our robot overlords.


> and essentially all humans will become the starving artists

all of these assumptions miss something so huge that it surprises me that so many miss it: WHO is doing the art purchasing? WHO is evaluating the "value" of... well... anything, really? It is us. Humans. Machines can't value anything properly (example: Find an algorithm that can spot, or create, the next music hit, BEFORE any humans hear it). Only humans can, because "things" (such as artistic works, which are barely even "things", much more like "arbitrary forms" when considered objectively/from the universe's perspective) only have meaning and value to US.

> when we will all be killed off by our robot overlords

We won't. Not unless those robots are directed or programmed by humans who have passionate, malicious intent. Because machines don't have will, don't have need, and don't have passion. Put bluntly and somewhat sentimentally, machines don't have love (or hate), except that which is simulated or given by a human. So it's always ultimately the human's "fault".


>who is purchasing art

Mostly money launderers, I've heard.

>we won't be killed off by AGI because humans don't have malicious intent

I wouldn't say malice is necessary. It's just economics. Humans are lazy, inefficient GI that farts. The only reason the global economy feeds 8 billion of us is that we are the best, cheapest (and only) GI.


If we manage to create life capable of doing 1) and 2), but also capable of self-improvement and self-design of its own intelligence, then I think what we've just done is create the next step in the universe understanding itself, which is a good thing. Bacteria didn't panic when multicellular life evolved. Bacteria are still around; they're just a thriving part of a more complex system.

At some point biological humans will either merge with their technology or stop being the forefront of intelligence in our little corner of the universe. Either of those is perfectly acceptable as far as I am concerned and hopefully one or both of them come to pass. The only way they don't IMO is if we manage to exterminate ourselves first.


Bacteria obviously lack the capacity to panic about the emergence of multicellular life.

A vast number of species are no longer around, and we are relatively unusual in being a species that can even contemplate its own demise, so it's entirely reasonable that we would think about and be potentially concerned about our own technological creations supplanting us, possibly maliciously.


Artists most definitely don't create images/styles out of thin air. No human can creatively create anything out of thin air.


I think humans do in fact create things out of 'thin air' - but only in very, very small pieces. What we consider to be an absolute genius is typically a person who has made one small original thought and applied it to what already exists to make something different.


Creating something novel is not even remotely the same as creating something out of thin air. Even the genius with an original thought could only come by that thought by being informed through their life experiences. Not unlike an AI training set allowing an AI to create something novel.


Is creation coming about by analysis of life experience somehow different from creation coming about by analysis of training data?


Yes, because it's multimodal, and because you can think of new things to look at and go out in the world to look at them.


Humans have access to much better thinking abilities than art AIs do.

e.g. SD2 prompted with "not a cat" produces a cat, and "1 + 1" doesn't produce "2".


Some Stable Diffusion interfaces let you specify a "negative input" which will bias results away from it. It wouldn't be terribly hard to do some semantic interpretation prior to submission to the model that would turn "not a <thing>" into "negate-input <thing>".
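
Something like this, as a rough sketch on top of diffusers' negative_prompt parameter (the regex and the positive/negative split are just illustrative):

    import re

    def split_negations(prompt: str):
        """Turn 'a photo of a dog, not a cat' into a (positive, negative) pair."""
        negatives = re.findall(r"\bnot an? ([\w ]+)", prompt)
        positive = re.sub(r",?\s*\bnot an? [\w ]+", "", prompt).strip(" ,")
        return positive, ", ".join(negatives)

    positive, negative = split_negations("a photo of a dog, not a cat")
    # then: image = pipe(positive, negative_prompt=negative).images[0]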


> that's simply not going to happen.

I don't think it will either, but artists think it will, so it's strange that their proposed solution "credit the original artists behind AI models" won't solve the problem they have with it.


> > future artists having their jobs taken by AIs

> that's simply not going to happen

It will indeed happen, though not to all artists.

> as in every technological development so far, this is just another tool.

Just like every other tool, it changes things, and not everyone wants to change. Those who embrace the new tech are more likely to thrive. Those who don't, less likely.

> 1) artists create the styles out of thin air

> 2) artists create the images out of thin air

I understand what you're saying, but as an artist, I can't agree. No artist lives in total isolation. No artist creates images out of thin air. Those who claim to are lying, or just don't realize how they're influenced.

How artists are influenced varies, obviously, but for me I think that however I've been influenced, that influence impacts my output similarly to how the latest generation of AI driven image generation works.

I'm influenced by the collective creative output of every artist whose stuff I've seen. An AI tool is influenced by its model. I don't see a lot of differences there, conceptually speaking. There are obvious differences around human experience, model training, bias, etc., but that's a much larger conversation. Those differences do matter, but I don't think they matter enough to change my stance: conceptually, they work the same in terms of leveraging "influence" to create something unique.

> 3) computers are just collectors of this data and do not actually originate anything new. they are just very clever copycats.

Stable Diffusion does a pretty damn good job of mixing artistic styles, to the point where I have no problem disagreeing with you here. It comes as close to originating something new as humans do. You could argue that how it does it disqualifies its output as "origination", but those same arguments would be just as effective at disqualifying humans for the same reasons.

That all said, I agree with you that the tech is a disruptive tool. It's a threat the same way that cameras were a threat to portrait artists, or AutoCAD for architects, or CNC machines for machinists. The idea that new tech doesn't take jobs is naive - it always does. But it doesn't always completely eliminate those jobs. Those who adapt, leverage, and take advantage of the new tools can still survive and thrive. Those who reject the new tech might not. Some might find a niche in using "old" techniques (which in a way still leverages the new tech - as a marketing/differentiation strategy).

For me, I've been using Stable Diffusion a lot lately as a tool for creating my own art. It's an incredibly useful tool for sketching out ideas, playing with color, lighting, and composition.


Adam Neely discusses this from the standpoint of music: https://www.youtube.com/watch?v=MAFUdIZnI5o He equates it to sandwich recipes: there are only so many ways to make a sandwich, and it's silly to think of copyrighting "a ham sandwich with gouda, lettuce, and dijon mustard."


> Of course you can see the original images (https://rom1504.github.io/clip-retrieval/)

In fairness, this is an obscure GitHub page that <0.001% of people will be aware of. If the creators of all these AI generation tools had sat down and thought about the consequences, the authors' names could have been watermarked by default, with the license requiring you to keep them unless the author allows otherwise, for example.

There clearly was no thought around mitigating any of these problems, and we are having what we are having now: a storm around "robots taking artists' jobs", which they may (at least for some 90% of "artists" who are just rehashing existing styles) or may not. Only time will tell.


So your point is that Apple and those grifters are equally reputable?

Two wrongs don't make a right.


I'm neither defending Apple or the grifters using Stable Diffusion in my comment. Both are as bad as each other, giving no attribution or credit.



Oh gosh that's an intimidating installation process. I'll be much more interested when I can just `brew install` a binary.


A bit different take is DiffusionBee, if you're curious to try it out in a GUI form.

https://diffusionbee.com


I've used this a fair amount but am not sure it's a much better place to begin than automatic1111, especially for the HN crowd.


automatic1111 does have an M1 workaround in the wiki, but it is incorrect.

It's correct enough, though, that if you know your way around a CLI, git, and package management, you can figure it out.


It sucks to have to figure it out; anyone who figures it out should submit a PR to the very outdated Apple Silicon readme.


You can't send a PR for a wiki, right?

Also, I wonder if anyone has done a blog post yet.


On the one hand, I appreciate the attempt to bring this stuff into the realm of "double click to run" boneheads like me, but on the other hand, I really despise Electron apps when they're multi-platform, where such use is somewhat understandable if still despicable. For a Mac-only app to use Electron… Why do they hate us so?


I'm baffled by continued hate on Electron. The option isn't between Electron and a lean OS-native application, but between Electron and nothing.

I can build an Electron app in under a day with a pretty UI. It would take me several months to get anything sensible that is OS native. And I'm not going to sit down and learn the alternative.

So please just say "thank you" to the developers that are sharing free things with you.


I would argue that shipping bad software is worse than shipping no software at all, yes. And it's impossible not to create bad software when you start with "it runs in a web browser, but it's not a web page." I say this as a web developer with over fifteen years of professional experience.

Worst of all is the shamelessness, though. Don't Electron developers feel ashamed when they ship their products? Or have their brains been so muddled by this "JavaScript everywhere" mentality that they don't realize it's bad? Will future generations even know what a native application is anymore?

This program suggests quitting other applications while it runs. Maybe that wouldn't be so necessary if it wasn't using a framework which needs like 2GB of memory before it can draw a window.

I note that my OP hasn't been downvoted into oblivion as most of my critical HN posts are. I think there's at least a significant silent minority who agree with me on this one.


Developers, just like people in any other inventive field, have to balance time and effort in working towards a good product.

Just because the app is written using a Chromium framework does not necessarily mean that it's written poorly; VS Code is a great example of a performant application written in Electron.

I don't know where you're getting 2 GB of required memory, but if you spin up an Electron app it's rare that it requires more than 100 MB if it's not doing anything.

If you knew anything about these types of Stable Diffusion interfaces, you'd know that they basically have to load the entire model into memory, so that's likely where the multiple gigabytes are coming from.

A lot of us got into development work because we want to create new things, you sound more like the person who spends 99% of their time endlessly optimizing the game engine without actually remembering to build a compelling game experience.

You're getting downvoted because your arrogant tone makes you sound like an insufferable bore.


While I agree the poster's comment felt entitled, it should be possible to pick up SwiftUI and make a SwiftUI version of the app fairly quickly.

I assume the developer went for electron due to familiarity, but it would be a pretty good exercise for someone to port it to SwiftUI and native Swift for the front end.

I would do it myself but sadly am bound by other clauses.


It reminds me "Teach Yourself C++ in 21 days". You just need to quickly learn Swift (which you will use exactly nowhere after this task).

It's astonishing how ungrateful people are. Even writing documentation for the software is quite a time-consuming action - writing the software itself is much more time-consuming.

So you are looking at some free software, that gives you the ability to play with StableDiffusion in 2 clicks, has a wide range of features and settings, surely required a ton of time to implement, and you arrogantly saying “pff, an Electron app...”


I think you completely misunderstood what I was saying.

I wasn’t saying that the author of DiffusionBee should make a SwiftUI application. In fact I said the opposite in that I agree that the person who expected a native app is entitled.

I was however refuting the person I was responding to who said making a native app is a huge undertaking, because learning SwiftUI is fairly quick. That’s not to say that the maintainer should learn it but just that it’s fairly quick to learn should someone else want to.

I was also saying that someone (maybe someone other than the maintainer of DiffusionBee) could contribute a SwiftUI front end.

Finally I was saying I would gladly contribute it myself if I could (but unfortunately have other reasons why I can’t)

Anyway, hopefully that clears things up; the hostility in your post is unwarranted.


It is still a kind of toxicity: "cool, you did it, but you could have done it better - I could do it better, I'm just out of time".

Don't be toxic and you won't get that hostility.


That’s not at all what I’m saying, in fact you keep trying to infer the opposite of what I’m saying, and now you’re just doubling down.

If anything you’re the one being toxic because you’re unable to have a reasonable conversation about a misunderstanding, and are instead trying to put words in my virtual mouth to conform to your outrage.


If you feel that I’m putting words into your mouth - I’m sorry about that, it was not intended.


Does it use the optimised model for Apple chips?


Not yet, likely, but the project is very active. I could see it coming quite soon.


I just tested that app and it was taking about 1s/it using the "Double quality, double time" version. Spat out quite nice images at 25 iterations. Way better than stuff I had tried before which looked worse after a minute than this generates in 25 seconds.


Let's give it a few days and someone will have something semi-automatic ready


> Oh gosh that's an intimidating installation process

I'm not seeing any installation instructions on either link - what am I missing?


All I had to do was:

- create a virtual environment (Python 3.8.15 worked best)

- upgrade pip

- pip install wheel

- pip install -r requirements.txt

- and then, python setup.py install

- Had to update my XCode to use the generated mlpackage files :/

- Expand drawer with instructions and follow them to download model and convert it to Core ML format

- Run their CLI command as mentioned


> Had to update my XCode to use the generated mlpackage files :/

I keep running into this, message is

  RuntimeError: Error compiling model: "Error reading protobuf spec. validator error: The model supplied is of version 7, intended for a newer version of Xcode. This version of Xcode supports model version 6 or earlier.".
I upgraded XCode, tried re-installing the command line tools with various invocations of `sudo rm -rf /Library/Developer/CommandLineTools ; xcode-select --install` etc but still get the above message

(thanks in advance, in case you see this and reply)

edit: I see from https://github.com/apple/ml-stable-diffusion/issues/7 that somebody upgraded to macos 13.0.1 and that fixed the issue for them. I've put off upgrading to Ventura so far and don't want to upgrade just to mess around with stable diffusion on m1, if it can be avoided.


I'm past the edit window, but: I'm a dope, I didn't see the quite clear "macos 13 or newer" requirement.


Where did you get those instructions from? Is creating a virtual environment necessary if I'm fine with it running on my real system?

I assume the environment part is what the "conda" commands on the GitHub repo readme are doing, but finding "conda" to install seems to be its own process. It's not on MacPorts, pip seems to only install a Python package instead of an executable, and getting a package from some other site feels sketchy.

What is it with ML and Python, anyway? Why is this amazing new technology being shrouded in an ecosystem and language which… well, I guess if I can't say anything nice…


Conda's actually a pretty well-respected Python distribution and package manager from Anaconda.com (see e.g. https://en.wikipedia.org/wiki/Anaconda_(Python_distribution)). Anaconda has a lot of the standard scientific Python computing packages in addition to a virtual environment and package manager, or you could use the Miniconda version for just the conda package manager + virtualenv.

I think whether you need a virtualenv depends on your system python version and compatibility of any of the dependencies, but it's also pretty nice to be able to spin up or blow away envs without bloating your main python directory or worrying that you're overwriting dependencies for a different project.


> finding conda to install seems to be its own process

    brew install miniconda
brew comes from:

    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Don't take my word for it, visit https://brew.sh.


I’m still stubbornly using MacPorts. If it ain’t broke…

But given that the entire world of technical documentation assumes all technically-inclined people using Macs are using Homebrew, I’ll probably have to give up and switch over at some point. But not yet.


Grew up on BSD so I feel you. I'd say it became time to give in after they cleaned up need for su and put everything in opt.

In fact, if you used it before they cleaned all that up, or used it before moving from Intel to ARM and did a restore to the new arch instead of fresh install, it's worth doing a brew dump to a Brewfile, uninstalling ALL packages and brew, and reinstalling fresh on this side of the permissions and path cleanups.

- Migrate Homebrew from Intel Macs to Apple Silicon Macs:

- https://sparanoid.blog/749577


They are basically standard Python conventions, but the actual instructions I just found are in unexpanded sections of the README on the GitHub repo. You have to run one of the commands, which downloads the model and converts it to Core ML for you. If you've never used Hugging Face, you'll need to create an account to get a token and then use their CLI to log in with the token to be able to download the model. Then you can run prompts from the CLI with the commands they give.
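
If you'd rather do the login step from Python instead of the CLI, this should work (assuming the huggingface_hub package is installed):

    from huggingface_hub import login

    # Prompts for the access token from https://huggingface.co/settings/tokens and caches it
    login()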


+1


Where are you seeing the installation process?


I could be wrong but I think part of the issue is this needs some large files for the trained dataset?


I’ve been using InvokeAI: https://github.com/invoke-ai/InvokeAI

Great support for M1, basically since the beginning. The install is painless.

Release video for InvokeAI 2.2: https://www.youtube.com/watch?v=hIYBfDtKaus


Great stuff. I like that they give directions for both Swift and Python.

This gets you from text descriptions to images.

I have seen models that given a picture, then generate similar pictures. I want this because while I have many pictures of my grandmothers, I only have a couple of pictures of my grandfathers and it would be nice to generate a few more.

Core ML is so well done. A year ago I wrote a book on Swift AI and used Core ML in several examples.


That’s DreamBooth. There are some services that will do it for you.


Thanks!


I'm making one of those services; if you are interested, please reach me at my email. I would like to know what you have in mind regarding your grandmothers.


Man, this takes a ton of room to do the CoreML conversions - ran out of space doing the unet conversion even though I started with 25GB free. Going on a delete spree to get it up to 50GB free before trying again.


All hail Grand Perspective back in the day, not sure who is carrying the "what's wasting my disk space" torch for free these days.

Edit: still alive! https://grandperspectiv.sourceforge.net/


I suspect it was virtual memory - the CoreML conversion process was at 32Gi at one point and there's only 16GB in this laptop. That would explain why it was consuming 30Gi+ of disk space when the output CoreML models only totalled 2.5Gi.


Just used this again on 3 different computers, including mine. Works fantastically still.

Found a >100GB accidental “livestream” recording on one computer. Would have taken forever to find what was taking up all the room otherwise.


ncdu is the best in my book. TUI, supports deletion of files and folders, and very simple to understand.

GUI apps for this task like GP and the like are more visually complex than they need to be.


OmniDiskSweeper is a GUI that isn’t complex.


Good point!

One gotcha for me is ncdu 2 moving to Zig, and Zig dropping support for OS versions as Apple does.


How much space do you have and how much do you try to keep free? I get freaked out if I have less than 400gb free.


    /dev/disk3s5  926Gi  857Gi   52Gi    95% 8067489 540828800    1%   /System/Volumes/Data
It normally hovers around 30-35Gi free.


For the uninitiated, which MacOS GUI app is this library most likely to show up in first/best? DiffusionBee?


automatic1111's webui typically gets the most frequent updates. Middlingly easy to install.


Great, thank you. Look like there’s already a GH issue: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issu...


I can't fine-tune the model on Apple Silicon due to PyTorch support issues. I don't have high hopes it will be supported.

https://github.com/pytorch/pytorch/issues/77794

https://github.com/pytorch/pytorch/issues/77764


How does this compare with using the Hugging Face `diffusers` package with MPS acceleration through PyTorch Nightly? I was under the impression that that used CoreML under the hood as well to convert the models so they ran on the Neural Engine.
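
For context, the MPS path I mean is roughly this - a sketch assuming a recent PyTorch nightly and the CompVis weights, with no Core ML involved as far as I can tell:

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
    pipe = pipe.to("mps" if torch.backends.mps.is_available() else "cpu")

    # The first call is slow while the MPS kernels warm up
    image = pipe("a red bicycle in the rain", num_inference_steps=50).images[0]
    image.save("bicycle.png")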


It doesn't. MPS largely runs on the GPU. PyTorch's MPS implementation was also incomplete as of a few weeks ago. This is about 3x faster.


Is it? I just ran it on my M1 MacBook Air and am getting 3 it/sec, same as I was using Stable Diffusion for M1. Maybe I'm doing something wrong?


That's surprising to me, although I last looked about 3 weeks ago, and MPS support is a moving target. It is just the M1, without Pro or Ultra, right? Also, diffusers does support backends other than PyTorch.


Would it be possible to run 2 SD instances in parallel on a single M1/M2 chip?

One on the GPU and another on the ML core?
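
In coremltools terms, I imagine it would look something like loading the same converted model twice with different compute units. A sketch with hypothetical paths (whether two full pipelines fit in RAM at once is another question):

    import coremltools as ct

    # Hypothetical path to a converted Stable Diffusion U-Net package
    unet_gpu = ct.models.MLModel("Unet.mlpackage",
                                 compute_units=ct.ComputeUnit.CPU_AND_GPU)
    unet_ane = ct.models.MLModel("Unet.mlpackage",
                                 compute_units=ct.ComputeUnit.CPU_AND_NE)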


Can anyone explain in relatively lay terms how Apple's neural cores differ from a GPU? If they can run stable diffusion so much faster, which normally runs on a GPU, why aren't they used to run shaders for AAA games?


They're designed to run ML-specific functions like matrix multiplies and such. Nvidia has a similar idea in "tensor cores". I think it's because they do low-bit operations like 8 or 16 bit, which is faster but too low-precision for GPU graphics work.


This may sound naive, but what are some use cases of running SD models locally? If the free/cheap options exist (like running SD on powerful servers), then what's the advantage of this new method?


> There are a number of reasons why on-device deployment of Stable Diffusion in an app is preferable to a server-based approach. First, the privacy of the end user is protected because any data the user provided as input to the model stays on the user's device. Second, after initial download, users don’t require an internet connection to use the model. Finally, locally deploying this model enables developers to reduce or eliminate their server-related costs.


Stability! The main reason why I use it locally is because I don't want some random dev unilaterally deciding to change or "sunsetting" features I rely on.

Centralized services small and large are guilty of this and I'm sick of it.


"Hey Siri, draw me a purple duck" and it all happens without an internet connection!

If you mean monetary usecases: Roughly something like Photoshop/Blender/UnrealEngine with ML plugins that are low latency, private, and $0 server hosting costs.


Even with the slower pytorch implementation my M1 Pro MBP, which tops out at consuming ~100W of power, can generate a decent image in 30 seconds.

I'm not sure exactly what that costs me in terms of power, but it is assuredly less than any of these services charge for a single image generation.


Works offline, privacy, independent of SaaS (API stability, longevity, …). I'm sure there are more.


Don't want to take the risk of being banned for generating certain kinds of images, like NSFW.


Powerful servers with GPUs are expensive. Laptops you already own, aren't.


Fine-tuned custom models, models with IP knowledge, models that know what you look like. Better latency, etc. Obviously some of these can be served by models hosted locally. You can host a model with Triton and create an API to call it from your native application.


You can set it to generate 100 images, hit start, come back later and scroll through the results. Can't do that without spending a bunch of money on the hosted services.
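
With the diffusers route that's just a loop over seeds. A rough sketch, where the model id, prompt, and seed scheme are my own choices:

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to("mps")

    for seed in range(100):
        generator = torch.Generator().manual_seed(seed)
        image = pipe("a watercolor fox", generator=generator).images[0]
        image.save(f"fox_{seed:03d}.png")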


Soon you will be able to render home iMovies as if they were edited by the team that made The Dark Knight (which costs ~$100k/min if done professionally).


"A long time ago in a galaxy far, far away"

but seriously, I wonder when you'll be able to paste in a script, and get out a storyboard or a movie


What are some good resources to get into working with this and learning the basics around ML to get some fundamental understanding of how this works?



While running locally on an M1 Pro is nice, recently I've switched over to a Runpod[0] instance running Stable Diffusion instead. The main reasons being high workloads placed on the laptop degrade the battery faster and it takes ~40s to render a single image. On an A5000 it takes mere seconds to do 40 steps. The cost is around $0.2/hr.

[0] https://runpod.io


Can't the battery problem be mitigated by plugging in your MacBook while running Stable Diffusion?


The laptop body still heats up, and over long periods of time this can degrade the battery; I've measured a sharp drop in capacity from the device itself.


Can't wait to see this integrated into automatic1111 so I can use it as a normie


Where is the community for this project?


anyone know how to link this to a GUI?


Macbook Air M1 / 16GB RAM took 3.56 to generate an image, this is pretty wild


> 3.56 to generate an image

3.56 seconds?


ah 3.56 minutes, my mistake


8 gb ram


What about it?



