I agree that the evidence is weak, and even if they had some, they cannot really do anything.

To me, it's just very likely they distilled GPT-4, because:

1) Again, you just cannot get that performance at that cost. And no, what they describe in the paper is not enough to explain a 1,000x decrease in cost.

2) Very often, DeepSeek tells you it's ChatGPT or OpenAI; it's actually quite easy to get it to do that. Some say that's related to "the background radiation on the post-AI internet". I'm not a fentanyl consumer, so, unfortunately, I think that argument is trash.
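For context, "distilling" here just means training a student model on a teacher model's outputs. Roughly, a sequence-level distillation pipeline looks like the sketch below; the model names, prompt corpus, and hyperparameters are placeholders, not anything DeepSeek is known to have used.

  # Illustrative sequence-level distillation: collect teacher outputs,
  # then fine-tune a student on them. All names here are stand-ins.
  import torch
  from openai import OpenAI
  from transformers import AutoModelForCausalLM, AutoTokenizer

  client = OpenAI()  # assumes OPENAI_API_KEY is set

  def teacher_answer(prompt):
      resp = client.chat.completions.create(
          model="gpt-4",
          messages=[{"role": "user", "content": prompt}],
      )
      return resp.choices[0].message.content

  prompts = ["Explain backpropagation in one paragraph."]  # placeholder corpus
  pairs = [(p, teacher_answer(p)) for p in prompts]

  tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in student
  student = AutoModelForCausalLM.from_pretrained("gpt2")
  opt = torch.optim.AdamW(student.parameters(), lr=1e-5)

  for prompt, answer in pairs:
      ids = tok(prompt + "\n" + answer, return_tensors="pt").input_ids
      loss = student(input_ids=ids, labels=ids).loss  # cross-entropy on teacher text
      loss.backward()
      opt.step()
      opt.zero_grad()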




If it's just a distillation of GPT-4, wouldn't we expect it to have worse quality than o1? But I've seen countless examples of DeepSeek-r1 solving math problems that o1 cannot.

>Very often, DeepSeek tells you it's ChatGPT or OpenAI; it's actually quite easy to get it to do that. Some say that's related to "the background radiation on the post-AI internet". I'm not a fentanyl consumer, so, unfortunately, I think that argument is trash.

The exact same thing happened with Llama. Sometimes it also claimed to be Google Assistant or Amazon Alexa.


>wouldn't we expect it to have worse quality than o1?

That's tricky: you can optimize a model to do really well on synthetic benchmarks.

That said, DeepSeek performs a bit worse than GPT-4 in general and substantially worse on benchmarks like ARC, which are designed with this in mind.
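A crude way to probe the benchmark-gaming angle is a training-data contamination check, e.g. n-gram overlap between benchmark items and the training corpus. Purely a sketch; both datasets below are placeholders:

  # Illustrative contamination check: flag benchmark items that share
  # long n-grams with the training corpus. Datasets are placeholders.
  def ngrams(text, n=8):
      words = text.lower().split()
      return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

  train_docs = ["... placeholder training document ..."]
  benchmark_items = ["... placeholder benchmark question ..."]

  train_grams = set()
  for doc in train_docs:
      train_grams |= ngrams(doc)

  contaminated = [item for item in benchmark_items
                  if ngrams(item) & train_grams]  # any shared 8-gram
  print(f"{len(contaminated)}/{len(benchmark_items)} items overlap")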


Are you sure you checked R1 and not V3? By default, R1 is disabled in their UI.

  Prompt: Find an English word that contains 4 'S' letters and 3 'T' letters.

  Deepseek-R1: stethoscopists (correct, thought for 207 seconds)

  ChatGPT-o1: substantialists (correct, thought for 188 seconds)

  ChatGPT-4o: statistics (wrong, even with "let's think step by step")

In almost every example I provide, it's on par with o1 and better than 4o.
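For what it's worth, the letter counts are easy to verify:

  # Check the S/T counts in each proposed answer.
  from collections import Counter

  for word in ["stethoscopists", "substantialists", "statistics"]:
      c = Counter(word)
      ok = c["s"] == 4 and c["t"] == 3
      print(word, c["s"], c["t"], "correct" if ok else "wrong")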

>substantially worse on benchmarks like ARC, which are designed with this in mind.

Wasn't it revealed that OpenAI trained their model on that benchmark specifically, and had access to the entire dataset?


That prompt means nothing. Check out the benchmarks.

Also, compare V3 to 4o and R1 to o1; that's the right comparison.


No, because it is not a distillation, but an extension. A selling point of the model is using RL to push past the quality of the base model.
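To make that concrete: the claim is RL against a verifiable reward (e.g. "did the final answer match?"), which can push the policy beyond pure imitation of any teacher. A heavily simplified REINFORCE-style toy, with a made-up task and hyperparameters:

  # Toy REINFORCE-style RL with a verifiable reward. Everything here
  # (model, task, reward rule, hyperparameters) is a simplified stand-in.
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  tok = AutoTokenizer.from_pretrained("gpt2")
  policy = AutoModelForCausalLM.from_pretrained("gpt2")
  opt = torch.optim.AdamW(policy.parameters(), lr=1e-6)

  prompt = "Q: What is 7 * 8? A:"
  ids = tok(prompt, return_tensors="pt").input_ids

  for step in range(4):
      out = policy.generate(ids, max_new_tokens=8, do_sample=True,
                            pad_token_id=tok.eos_token_id)
      completion = out[0, ids.shape[1]:]
      # Verifiable reward: no human labels, just check the answer.
      reward = 1.0 if "56" in tok.decode(completion) else -0.1

      # REINFORCE: weight the log-prob of the sampled tokens by the reward.
      logits = policy(out).logits[0, ids.shape[1] - 1:-1]
      logp = torch.log_softmax(logits, dim=-1)
      logp_taken = logp.gather(1, completion.unsqueeze(1)).sum()
      (-reward * logp_taken).backward()
      opt.step()
      opt.zero_grad()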


The identity issue is not evidence at all. It is the easiest thing to clean from the data: if you were actually distilling GPT-4, removing those samples would be the first thing you'd do.
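A sketch of that kind of cleanup, with illustrative trigger phrases:

  # Illustrative filter: drop samples that leak the teacher's identity.
  LEAKY = ("chatgpt", "openai", "i am an ai language model")

  def clean(samples):
      return [s for s in samples if not any(k in s.lower() for k in LEAKY)]

  print(clean(["As ChatGPT, I can't do that.", "The answer is 42."]))
  # -> ['The answer is 42.']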

It is predicting the next token; are we really taking its word for it and assuming the model knows what it is saying?



