Oh, absolutely. I'm not defending OpenAI, I just care about accurate reporting. Even on HN - even in this thread - you see people who came away with the conclusion that DeepSeek did something while "cutting cost by 27x".
But that's a bit like saying that by painting a bare wall green you have demonstrated that you can build green walls 27x cheaper, ignoring the cost of building the wall in the first place.
Smarter reporting and discourse would explain how this iterative process actually works and who is building on whom and how, not frame it as two competing from-scratch clean-room efforts. That would help set expectations for what's coming next.
It's a bit similar to how many are saying DeepSeek have demonstrated independence from nVidia, when part of the clever thing they did was figure out how to make the intentionally gimped H800s work for their training runs by doing low-level optimizations that are, if anything, even more nVidia-specific, etc.
Rarely have I seen a highly technical topic produce more uninformed snap takes than it has this week.
You are underselling or not understanding the breakthrough. They trained a ~600B-parameter model on 15T tokens for <$6M. Regardless of the provenance of the tokens, this in itself is impressive.
Not to mention post-training: their novel GRPO technique, used for preference optimization / alignment, is also much more efficient than PPO.
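For context on GRPO (this is my own sketch of the idea from the DeepSeekMath paper, not their code): instead of training a separate critic/value network as PPO does, you sample a group of completions per prompt and normalize each completion's reward against the group's own mean and standard deviation, roughly:

    import torch

    def grpo_advantages(rewards):
        # rewards: shape (G,), one scalar reward per completion sampled
        # for the same prompt (e.g. from a rule-based correctness check)
        # group-relative advantage: normalize against the group's own
        # mean/std instead of a learned value function -- no critic needed
        return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    # toy example: 4 sampled answers to one prompt
    rewards = torch.tensor([1.0, 0.0, 0.0, 1.0])
    print(grpo_advantages(rewards))  # ~[0.87, -0.87, -0.87, 0.87]

    # each token of completion i is then weighted by its group-relative
    # advantage in a PPO-style clipped objective, plus a KL penalty
    # toward a reference model

The practical win is not having to train and keep a whole second value model in memory during RL, which matters at this scale.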
Let's call it underselling. :-) Mostly because I'm not sure anyone's independently done the math and we just have a single statement from the CEO. I do appreciate the algorithmic improvements and the excellent attention to performance detail in their implementation (careful treatment of precision, etc.), making the H800s useful, and so on. I agree there's a lot there.
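For what it's worth, the headline number is just the V3 tech report's own accounting: reported GPU-hours times an assumed rental price (figures quoted from memory, so check the paper):

    # back-of-the-envelope version of DeepSeek's own claimed cost for V3
    gpu_hours = 2.788e6      # self-reported H800 GPU-hours for the training run
    price_per_hour = 2.0     # USD, their assumed per-GPU-hour rental rate
    print(f"${gpu_hours * price_per_hour / 1e6:.2f}M")  # ~$5.58M

By construction that covers only the final training run, not prior experiments, ablations, data work, or the cluster itself, which is exactly the caveat I'm making.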
> that's a bit like saying that by painting a bare wall green you have demonstrated that you can build green walls 27x cheaper, ignoring the cost of building the wall in the first place
That's a funny analogy, but in reality DeepSeek did reinforcement learning to generate chains of thought, which were then used to fine-tune LLMs. The pure-RL model is called DeepSeek-R1-Zero, while DeepSeek-R1 adds an SFT cold-start stage before RL.
They might have bootstrapped the Zero model with some demonstrations.
> DeepSeek-R1-Zero struggles with challenges like poor readability, and language mixing. To make reasoning processes more readable and share them with the open community, we explore DeepSeek-R1, a method that utilizes RL with human-friendly cold-start data.
> Unlike DeepSeek-R1-Zero, to prevent the early unstable cold start phase of RL training from the base model, for DeepSeek-R1 we construct and collect a small amount of long CoT data to fine-tune the model as the initial RL actor. To collect such data, we have explored several approaches: using few-shot prompting with a long CoT as an example, directly prompting models to generate detailed answers with reflection and verification, gathering DeepSeek-R1-Zero outputs in a readable format, and refining the results through post-processing by human annotators.
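So the overall recipe, as I read the paper, looks roughly like this (my own condensed summary, not their code; stage names are mine):

    # the R1 training recipe as described in the paper, condensed (my wording)
    stages = [
        "R1-Zero: pure RL (GRPO) on the base model with rule-based rewards, no SFT",
        "R1, step 1: SFT the base model on a small 'cold start' set of long, readable CoT",
        "R1, step 2: reasoning-focused RL on top of that SFT checkpoint",
        "R1, step 3: rejection-sample the RL checkpoint (plus other data) into a new SFT set",
        "R1, step 4: SFT again, then a final RL pass for general helpfulness/harmlessness",
    ]
    for i, stage in enumerate(stages, 1):
        print(i, stage)

The cold-start set is described as small; the bulk of the reasoning behavior comes out of the RL stages.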
I don't agree. Walls are physical items, so your example holds for them, but models are data. Anyone can train off of these models; that's the current environment we exist in. Just like OpenAI trained on data that has since been locked up in a lot of cases. In 2025, training a model like DeepSeek's is indeed 27x cheaper, and that includes both their innovations and the existence of new "raw material" to do such a thing.
What I'm saying is that in the media it's being portrayed as if DeepSeek did the same thing OpenAI did, 27x cheaper, and the outsized market reaction is in large part a response to that narrative. The reality is more that being a fast follower is cheaper (for concrete reasons, e.g. being able to source training data synthetically from prior LLMs), which shouldn't have surprised anyone and is just how technology in general trends.
The achievement of DeepSeek is putting together a competent team that excels at end-to-end implementation, which is no small feat and is promising with respect to their future efforts.