
Is there a handy table for this? My napkin math has either underestimated throughput by 2 orders of magnitude or the above estimate is high.



Training an LLM requires roughly 6 * parameters * tokens FLOPs[1]. That means throughput is (H100 FLOP/s * MFU) / (6 * parameters) tokens per second. Assuming a 1B-parameter model and 40% MFU: (1000 * 10^12 * 0.4) / (6 * 10^9) ≈ 67,000 tokens/sec.

This repo[2] by Meta achieves 48% MFU, or about 80k tokens/second.

[1]: https://arxiv.org/pdf/2001.08361

[2]: https://github.com/facebookresearch/lingua
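The estimate above can be sketched in a few lines of Python. This is a back-of-envelope check, not a benchmark: the ~1000 TFLOP/s H100 peak and the MFU values are the assumptions from the comment.

```python
# Throughput estimate from the 6 * params * tokens FLOPs rule [1].
# Assumed: H100 dense BF16 peak of ~1000 TFLOP/s (a round number, not a spec).

def tokens_per_second(peak_flops: float, mfu: float, params: float) -> float:
    """Training tokens/sec = achieved FLOP/s / (6 * parameter count)."""
    return (peak_flops * mfu) / (6 * params)

h100_peak = 1000e12  # ~1 PFLOP/s, assumed

print(round(tokens_per_second(h100_peak, 0.40, 1e9)))  # → 66667 (≈ 67k)
print(round(tokens_per_second(h100_peak, 0.48, 1e9)))  # → 80000
```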


(1,000,000,000,000/63,000)/(60*60)

(1T tokens / 63k tokens per second) / (60 seconds per minute * 60 minutes per hour)

is approximately 4,400 hours.

So I guess that’s how the calculation went.

Or did you mean a source for the number of tokens per second?
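The arithmetic above, spelled out (using the 63k tokens/sec rate from this comment):

```python
# 1T tokens at 63k tokens/sec, converted to hours.
tokens = 1_000_000_000_000  # 1T training tokens
rate = 63_000               # tokens per second
hours = tokens / rate / (60 * 60)
print(round(hours))  # → 4409, i.e. approx 4,400 hours
```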


Tokens per second ;) I can do the arithmetic on my own.



