
Note that there is a Metal backend for PyTorch [0]. Sadly it doesn't work well with codebases that didn't account for it from the start...

[0] https://developer.apple.com/metal/pytorch/




This is (partly) outdated. MPS (Metal Performance Shaders) support is now (since torch 2.x) fully integrated into standard PyTorch releases; no external backends or special torch versions are needed.

There are few limitations left compared with other backends. Instead of the 'cuda' device, one simply uses 'mps' as the device.
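
A minimal sketch of the device switch in plain PyTorch 2.x (the model and tensor shapes here are placeholders):

    import torch

    # Prefer Apple's GPU via the mps device, fall back to cuda or cpu.
    if torch.backends.mps.is_available():
        device = torch.device("mps")
    elif torch.cuda.is_available():
        device = torch.device("cuda")
    else:
        device = torch.device("cpu")

    model = torch.nn.Linear(128, 10).to(device)   # placeholder model
    x = torch.randn(32, 128, device=device)       # placeholder batch, allocated on the device
    y = model(x)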

What remains is: the optimizations PyTorch provides (especially compile() as of 2.1) focus on CUDA and the historical restrictions that follow from CUDA _not_ being unified memory. A lot of energy goes into architectural workarounds that limit copying between GPU and CPU memory, resulting in specialized compilers (like Triton) that lower parts of the Python code onto vendor-specific hardware.
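
For reference, this is the compile() path meant here, as a bare torch 2.x sketch (model and shapes are placeholders); the default Inductor backend lowers the model to Triton kernels targeting the CUDA device:

    import torch

    model = torch.nn.Sequential(torch.nn.Linear(128, 128), torch.nn.ReLU()).to("cuda")
    compiled = torch.compile(model)  # default Inductor backend; on CUDA it emits fused Triton kernels
    out = compiled(torch.randn(32, 128, device="cuda"))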

Apple's unified memory would make all of those super complicated architectural workarounds mostly unnecessary (which they demonstrate with their project).

Getting current DL platforms to support both paradigms (unified / non-unified memory) will be a lot of work. One possible avenue is the MLIR project, currently leveraged by Mojo.


> This is (partly) outdated. MPS (Metal Performance Shaders) support is now (since torch 2.x) fully integrated into standard PyTorch releases; no external backends or special torch versions are needed.

Not sure what you're referring to; the link I provided shows how to use the "mps" backend/device from the official PyTorch release.

> a lot of energy goes into architectural workarounds that limit copying between GPU and CPU memory

Does this remark apply to PyTorch running on NVIDIA platforms with unified memory, like the Jetsons?


Your link suggests downloading nightly previews and torch v1.12, both of which are slightly out of date.


Practically speaking, does using unified memory mean that the slow transfer of training/testing data to the GPU would be eliminated?
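
To make the question concrete, I mean the per-batch copy in an ordinary training loop (a hypothetical sketch; names and shapes are arbitrary):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Hypothetical data/model just to show where the copy happens.
    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
    dataset = TensorDataset(torch.randn(1024, 128), torch.randint(0, 10, (1024,)))
    loader = DataLoader(dataset, batch_size=32)
    model = torch.nn.Linear(128, 10).to(device)

    for x, y in loader:
        # The host-to-device copy in question; with truly unified memory
        # it could in principle become (close to) free.
        x, y = x.to(device), y.to(device)
        loss = torch.nn.functional.cross_entropy(model(x), y)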



