
Autoregressive transformer models are usually memory bound, whereas SD is compute bound, so perhaps the difference lies there. That would also explain why SD runs so much faster on the GPU than on the CPU.
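
To make the memory-bound vs. compute-bound distinction concrete, here is a rough roofline-style sketch. The hardware numbers (`PEAK_FLOPS`, `BANDWIDTH`) and the matrix shapes are made-up assumptions for illustration, not measurements of any real chip or model, and the byte count assumes each operand is moved exactly once.

```python
def matmul_intensity(m, k, n, bytes_per_elem=2):
    """Arithmetic intensity (FLOPs per byte) of an (m x k) @ (k x n) matmul.

    Assumes each operand and the result cross the memory bus exactly once,
    which is an idealization.
    """
    flops = 2 * m * k * n  # one multiply and one add per term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def classify(intensity, peak_flops, bandwidth):
    """Compare intensity against the roofline ridge point (FLOPs per byte)."""
    ridge = peak_flops / bandwidth
    return "memory bound" if intensity < ridge else "compute bound"


# Hypothetical hardware, chosen only to give a round ridge point of 20.
PEAK_FLOPS = 2e12   # FLOP/s (assumption)
BANDWIDTH = 1e11    # bytes/s (assumption)

# Autoregressive decoding at batch size 1: a single token vector times a
# weight matrix, so every weight is read once and used for one multiply-add.
decode = matmul_intensity(1, 4096, 4096)

# A large batched matmul, standing in for the many-positions-at-once work
# in an image model. The shape is illustrative only.
batched = matmul_intensity(4096, 320, 320)

print(f"decode:  {decode:.2f} FLOPs/byte ->", classify(decode, PEAK_FLOPS, BANDWIDTH))
print(f"batched: {batched:.2f} FLOPs/byte ->", classify(batched, PEAK_FLOPS, BANDWIDTH))
```

With fp16 weights the single-token case comes out at roughly 1 FLOP per byte, far below the ridge, while the batched case lands well above it. The exact crossover depends entirely on the assumed hardware numbers.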



M1 has (fast) unified memory between GPU and CPU, so something being memory bound ought not to have much bearing on whether it belongs on CPU or GPU… at least in theory. I’m a total noob here though so I may be wrong.


We were mostly discussing the NPU; I don't know if that makes a difference.


From https://en.wikipedia.org/wiki/Apple_M1#Memory

> The M1 uses a 128-bit LPDDR4X SDRAM in a unified memory configuration shared by all the components of the processor.

I assume that includes the NPU, media engine, etc.
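
For a sense of scale, the peak bandwidth of that shared bus can be estimated from its width. The 128-bit figure comes from the quote above; the 4266 MT/s transfer rate and the 14 GB model size are assumptions not stated in this thread, so treat the output as a back-of-the-envelope bound rather than a benchmark.

```python
def peak_bandwidth_bytes_per_s(bus_width_bits, transfers_per_s):
    """Theoretical peak bandwidth: bytes per transfer times transfers per second."""
    return (bus_width_bits / 8) * transfers_per_s


def max_tokens_per_s(bandwidth_bytes_per_s, model_bytes):
    """Upper bound on decode rate if every token must stream all weights once."""
    return bandwidth_bytes_per_s / model_bytes


BUS_WIDTH_BITS = 128        # from the quoted Wikipedia sentence
TRANSFERS_PER_S = 4266e6    # assumption: LPDDR4X running at 4266 MT/s

bw = peak_bandwidth_bytes_per_s(BUS_WIDTH_BITS, TRANSFERS_PER_S)
print(f"peak bandwidth: {bw / 1e9:.1f} GB/s")

# Hypothetical model: 14 GB of weights (e.g. 7e9 parameters at 2 bytes each).
MODEL_BYTES = 14e9
print(f"decode upper bound: {max_tokens_per_s(bw, MODEL_BYTES):.1f} tokens/s")
```

Whichever unit does the math (CPU, GPU, or NPU), a memory-bound workload is capped by this same shared figure, which fits the point about unified memory made earlier in the thread.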



