Oh, absolutely. I'm not defending OpenAI, I just care about accurate reporting. Even on HN - even in this thread - you see people who came away with the conclusion that DeepSeek did something while "cutting cost by 27x".
But that's a bit like saying that by painting a bare wall green you have demonstrated that you can build green walls 27x cheaper, ignoring the cost of building the wall in the first place.
Smarter reporting and discourse would explain how this iterative process actually works and who is building on whom and how, not frame it as two competing from-scratch clean-room efforts. That would help set expectations for what's coming next.
It's a bit similar to how many are saying DeepSeek have demonstrated independence from nVidia, when part of the clever thing they did was figure out how to make the intentionally gimped H800s work for their training runs by doing low-level optimizations that are, if anything, even more nVidia-specific, etc.
Rarely have I seen a highly technical topic produce more uninformed snap takes than it has this week.
You are underselling or not understanding the breakthrough. They trained a ~600B-parameter model on 15T tokens for <$6M. Regardless of the provenance of the tokens, this in itself is impressive.
Not to mention post-training: their novel GRPO technique, used for preference optimization / alignment, is also much more efficient than PPO.
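For context on GRPO (this is my own sketch of the idea from the DeepSeekMath paper, not their code): instead of training a separate critic/value network as PPO does, you sample a group of completions per prompt and normalize each completion's reward against the group's own mean and standard deviation, roughly:

    import torch

    def grpo_advantages(rewards):
        # rewards: shape (G,), one scalar reward per completion sampled
        # for the same prompt (e.g. from a rule-based correctness check)
        # group-relative advantage: normalize against the group's own
        # mean/std instead of a learned value function -- no critic needed
        return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    # toy example: 4 sampled answers to one prompt
    rewards = torch.tensor([1.0, 0.0, 0.0, 1.0])
    print(grpo_advantages(rewards))  # ~[0.87, -0.87, -0.87, 0.87]

    # each token of completion i is then weighted by its group-relative
    # advantage in a PPO-style clipped objective, plus a KL penalty
    # toward a reference model

The practical win is not having to train and keep a whole second value model in memory during RL, which matters at this scale.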
Let's call it underselling. :-) Mostly because I'm not sure anyone's independently done the math and we just have a single statement from the CEO. I do appreciate the algorithmic improvements and the excellent attention to performance detail in their implementation (careful treatment of precision, etc.), making the H800s useful, and so on. I agree there's a lot there.
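For what it's worth, the headline number is just the V3 tech report's own accounting: reported GPU-hours times an assumed rental price (figures quoted from memory, so check the paper):

    # back-of-the-envelope version of DeepSeek's own claimed cost for V3
    gpu_hours = 2.788e6      # self-reported H800 GPU-hours for the training run
    price_per_hour = 2.0     # USD, their assumed per-GPU-hour rental rate
    print(f"${gpu_hours * price_per_hour / 1e6:.2f}M")  # ~$5.58M

By construction that covers only the final training run, not prior experiments, ablations, data work, or the cluster itself, which is exactly the caveat I'm making.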
> that's a bit like saying that by painting a bare wall green you have demonstrated that you can build green walls 27x cheaper, ignoring the cost of building the wall in the first place
That's a funny analogy, but in reality DeepSeek did reinforcement learning to generate chains of thought, which were then used to fine-tune LLMs. The pure-RL model is called DeepSeek-R1-Zero, while DeepSeek-R1 adds an SFT cold-start stage before RL.
They might have bootstrapped the Zero model with some demonstrations.
> DeepSeek-R1-Zero struggles with challenges like poor readability, and language mixing. To make reasoning processes more readable and share them with the open community, we explore DeepSeek-R1, a method that utilizes RL with human-friendly cold-start data.
> Unlike DeepSeek-R1-Zero, to prevent the early unstable cold start phase of RL training from the base model, for DeepSeek-R1 we construct and collect a small amount of long CoT data to fine-tune the model as the initial RL actor. To collect such data, we have explored several approaches: using few-shot prompting with a long CoT as an example, directly prompting models to generate detailed answers with reflection and verification, gathering DeepSeek-R1-Zero outputs in a readable format, and refining the results through post-processing by human annotators.
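So the overall recipe, as I read the paper, looks roughly like this (my own condensed summary, not their code; stage names are mine):

    # the R1 training recipe as described in the paper, condensed (my wording)
    stages = [
        "R1-Zero: pure RL (GRPO) on the base model with rule-based rewards, no SFT",
        "R1, step 1: SFT the base model on a small 'cold start' set of long, readable CoT",
        "R1, step 2: reasoning-focused RL on top of that SFT checkpoint",
        "R1, step 3: rejection-sample the RL checkpoint (plus other data) into a new SFT set",
        "R1, step 4: SFT again, then a final RL pass for general helpfulness/harmlessness",
    ]
    for i, stage in enumerate(stages, 1):
        print(i, stage)

The cold-start set is described as small; the bulk of the reasoning behavior comes out of the RL stages.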
I don't agree. Walls are physical items, so your example holds for them, but models are data. Anyone can train off of these models; that's the current environment we exist in. Just like OpenAI trained on data that has since been locked up in a lot of cases. In 2025, training a model like DeepSeek's is indeed 27x cheaper, and that includes both their innovations and the existence of new "raw material" to do such a thing.
What I'm saying is that in the media it's being portrayed as if DeepSeek did the same thing OpenAI did, 27x cheaper, and the outsized market reaction is in large part a response to that narrative. The reality is more that being a fast follower is cheaper (for concrete reasons, e.g. being able to source training data synthetically from prior LLMs), which shouldn't have surprised anyone and is just how technology in general trends.
The achievement of DeepSeek is putting together a competent team that excels at end-to-end implementation, which is no small feat and is promising with respect to their future efforts.