
Probably not in the same amount of training time, but I'd imagine a recent MBP GPU could handle GPT-2 training. The biggest challenge is that the training code would need to be reimplemented for Metal instead of CUDA.
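For illustration, here's a minimal sketch of the framework-level alternative, assuming PyTorch (>= 1.12) with its MPS backend rather than hand-written Metal kernels. The tiny model and training step are placeholders, not the actual GPT-2 reproduction code:

    import torch

    # Prefer CUDA on NVIDIA hardware, fall back to Metal (MPS) on Apple Silicon.
    if torch.cuda.is_available():
        device = torch.device("cuda")
    elif torch.backends.mps.is_available():
        device = torch.device("mps")
    else:
        device = torch.device("cpu")

    model = torch.nn.Linear(768, 768).to(device)  # placeholder, not GPT-2
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

    x = torch.randn(8, 768, device=device)
    loss = model(x).pow(2).mean()  # dummy objective, just to exercise backward()
    opt.zero_grad()
    loss.backward()
    opt.step()

A from-scratch CUDA implementation with custom kernels would still need a genuine Metal port; the point is only that framework-level training code can switch backends with a one-line device change.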



Slightly off topic -- I just saw people saying that the Mac's unified memory makes it well suited to training models: https://www.macrumors.com/2024/07/10/apple-leads-global-pc-g..., and how energy efficient Macs are, etc. But what I actually see is that people often don't touch Macs at all -- they write code with CUDA and that's it. I find this disconnect fascinating.


Ah, so I couldn't just run this on my laptop for ~48 hours? That's too bad.


He does it on 8 H100s in 24 hours, i.e., 192 H100-hours. It's going to be thousands of laptop-hours.


H100 SXM is 2000 TFLOPS at FP16. Multiply by 8.

M3 Max is 28 TFLOPS at FP16.

Based on FLOPS alone, it would be more like a year or two.
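Rough arithmetic behind that estimate (my numbers, treating FLOPS as the only bottleneck and ignoring memory bandwidth and batch-size limits):

    # Back-of-the-envelope scaling, FLOPS only.
    h100_hours = 8 * 24      # 192 H100-hours for the full run
    h100_fp16 = 2000         # TFLOPS (SXM spec-sheet figure, with sparsity)
    m3_max_fp16 = 28         # TFLOPS

    laptop_hours = h100_hours * h100_fp16 / m3_max_fp16
    print(laptop_hours / (24 * 365))  # ~1.6 years

    # With the dense 1000 TFLOPS figure (see the sparsity caveat below):
    print(h100_hours * 1000 / m3_max_fp16 / (24 * 365))  # ~0.8 years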


Can you estimate how long it would take to replicate AlphaGo Zero today on a single 8xH100 node?


(The H100 SXM is 1000 TFLOPS dense; the 2x comes from the "with sparsity" figure, which isn't used here.)


Right... and there are probably some communication overheads over NVLink that wouldn't be present on a single laptop. So a few months, maybe :)


The MI300X is 1300 TFLOPS at FP16 (without sparsity). Looking forward to seeing the results.

https://www.amd.com/en/products/accelerators/instinct/mi300/...



