
Probably not in the same amount of training time, but I'd imagine a recent MBP GPU could handle GPT-2 training. The biggest challenge is that the training code would need to be reimplemented for Metal instead of CUDA.
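For illustration, here's a minimal sketch of the framework-level alternative, assuming PyTorch (>= 1.12) with its MPS backend rather than hand-written Metal kernels. The tiny model and training step are placeholders, not the actual GPT-2 reproduction code:

    import torch

    # Prefer CUDA on NVIDIA hardware, fall back to Metal (MPS) on Apple Silicon.
    if torch.cuda.is_available():
        device = torch.device("cuda")
    elif torch.backends.mps.is_available():
        device = torch.device("mps")
    else:
        device = torch.device("cpu")

    model = torch.nn.Linear(768, 768).to(device)  # placeholder, not GPT-2
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

    x = torch.randn(8, 768, device=device)
    loss = model(x).pow(2).mean()  # dummy objective, just to exercise backward()
    opt.zero_grad()
    loss.backward()
    opt.step()

A from-scratch CUDA implementation with custom kernels would still need a genuine Metal port; the point is only that framework-level training code can switch backends with a one-line device change.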



Slightly off topic -- I just saw people saying that the Mac's unified memory makes it well suited to training models: https://www.macrumors.com/2024/07/10/apple-leads-global-pc-g..., and how energy efficient Macs are, etc. But what I actually see is that people often don't touch Macs at all -- they write code with CUDA and that's it. I find this disconnect fascinating.


Ah, so I couldn't just run this on my laptop for ~48 hours? That's too bad.


He does it on 8 H100s in 24 hours, i.e., 192 H100-hours. It's going to be thousands of laptop-hours.


H100 SXM is 2000 TFLOPS at FP16. Multiply by 8.

M3 Max is 28 TFLOPS at FP16.

Based on FLOPS alone, it would be more like a year or two.
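Rough arithmetic behind that estimate (my numbers, treating FLOPS as the only bottleneck and ignoring memory bandwidth and batch-size limits):

    # Back-of-the-envelope scaling, FLOPS only.
    h100_hours = 8 * 24      # 192 H100-hours for the full run
    h100_fp16 = 2000         # TFLOPS (SXM spec-sheet figure, with sparsity)
    m3_max_fp16 = 28         # TFLOPS

    laptop_hours = h100_hours * h100_fp16 / m3_max_fp16
    print(laptop_hours / (24 * 365))  # ~1.6 years

    # With the dense 1000 TFLOPS figure (see the sparsity caveat below):
    print(h100_hours * 1000 / m3_max_fp16 / (24 * 365))  # ~0.8 years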


Can you estimate how long it would take to replicate AlphaGo Zero today on a single 8xH100 node?


(The H100 SXM is 1000 TFLOPS dense; the 2x comes from the "with sparsity" figure, which isn't used here.)


Right... and there are probably some communication overheads over NVLink that wouldn't be present on a single laptop. So a few months, maybe :)


The MI300X is 1300 TFLOPS at FP16 (without sparsity). Looking forward to seeing the results.

https://www.amd.com/en/products/accelerators/instinct/mi300/...



