> With just one click, you can train, generate and serve a 1.3 billion parameter ChatGPT model within 1.36 hours on a single consumer-grade NVIDIA A6000 GPU with 48GB memory. On a single DGX node with 8 NVIDIA A100-40G GPUs, DeepSpeed-Chat enables training for a 13 billion parameter ChatGPT model in 13.6 hours. On multi-GPU multi-node systems (cloud scenarios), i.e., 8 DGX nodes with 8 NVIDIA A100 GPUs/node, DeepSpeed-Chat can train a 66 billion parameter ChatGPT model in under 9 hours. Finally, it enables 15X faster training over the existing RLHF systems.
> The following are some of the open-source examples that are powered by DeepSpeed: Databricks Dolly, LMFlow, CarperAI-TRLX, Huggingface-PEFT
(disclaimer: MSFT/GH employee, not affiliated with this project)
Is it? It might have even more computing power, but are cards now able to share VRAM? My hands-on experience with all this is from a few years ago, and I think it was not possible back then.
I can see a future where each company has an "assistant AI model" trained/updated on its internal data at periodic intervals. The data sources could be group emails, Slack/Teams messages, docs, company PDFs and so on. Maybe MS will provide it to you, since it already has access to many of these data sources.
No, you can see the time breakdown in the table under the "coffee break" quote: it is the time for the 3-step RLHF process only. Training a 1.3B parameter model from scratch is still a very large undertaking.
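To make the scope of that timing concrete, here is a minimal sketch of the three-step RLHF pipeline the table measures (supervised fine-tuning, reward model training, then PPO against the reward model). The function names and string tags are purely illustrative, not DeepSpeed-Chat's actual API; the point is only how the stages compose, starting from an already-pretrained base model.

```python
# Hypothetical sketch of the 3-step RLHF process; names are illustrative,
# not DeepSpeed-Chat's real API. Each step is a stub that records what it
# would produce, to show how the stages feed into each other.

def supervised_finetune(base_model, demo_data):
    # Step 1: fine-tune the pretrained base model on human demonstrations.
    return f"{base_model}+sft"

def train_reward_model(base_model, preference_data):
    # Step 2: train a reward model on human preference comparisons.
    return f"{base_model}+rm"

def rlhf_ppo(sft_model, reward_model, prompts):
    # Step 3: optimize the SFT model with PPO, scored by the reward model.
    return f"{sft_model}+ppo[{reward_model}]"

def rlhf_pipeline(base_model, demo_data, preference_data, prompts):
    # The timing tables cover these three steps only -- pretraining
    # base_model from scratch is a separate, much larger job.
    sft = supervised_finetune(base_model, demo_data)
    rm = train_reward_model(base_model, preference_data)
    return rlhf_ppo(sft, rm, prompts)

print(rlhf_pipeline("opt-1.3b", [], [], []))
# → opt-1.3b+sft+ppo[opt-1.3b+rm]
```

Note that steps 1 and 2 both start from the pretrained base model; only step 3 ties them together.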
FYI, they don't compare to trlX because trlX is roughly just as fast. Similarly, they put trl in the worst light possible (trl is actually much faster than they claim).