
While impressive, the DeepSeek models aren't really "on par" with either OpenAI or Anthropic offerings right now. The models seem to be a bit overfitted in the post-training step. They are very "stubborn" models: they usually do well on tasks that fall within what they can handle, but steering them is quite difficult. As a result, they score very well on various benchmarks, but often perform slightly worse in real-life scenarios.



The blind test at lmarena.ai does give it a higher Elo than GPT-4o (API), Claude, and Gemini 1.5 Pro. It seems that people do enter real-life scenarios in the arena.
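For a sense of what an Elo gap means in head-to-head votes, here is a minimal sketch assuming the textbook Elo logistic formula. The arena's exact rating procedure may differ, and the ratings used below are made-up placeholders, not actual leaderboard numbers.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Hypothetical rating gaps, for illustration only (not leaderboard data).
for gap in (0, 10, 50, 100):
    rate = expected_score(1000 + gap, 1000)
    print(f"{gap:>3}-point lead -> expected win rate {rate:.3f}")
```

Under this formula a small rating lead translates into only a slightly better than even chance of winning any single blind comparison.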


DeepSeek v3 feels very much like Sonnet 3.5 (v1) in particular, minus the character. It performs more or less similarly, "feels" overfitted to about the same degree, and repeats itself in multi-turn chats even more. I hope they address it in v3.5, v4, or whatever comes next.


> They are very "stubborn" models
Have you found this to be the case even when using the recommended temperature settings (ranging from 0 for math, to 1.5 for creative tasks)?
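
As a rough sketch of what per-task temperature selection could look like: only the two endpoints (0 for math, 1.5 for creative tasks) come from this thread. The task labels, the fallback value of 1.0, and the helper name `pick_temperature` are assumptions for illustration.

```python
# Endpoints taken from the thread: 0 for math, 1.5 for creative tasks.
TEMPERATURE_BY_TASK = {
    "math": 0.0,
    "creative": 1.5,
}

# Assumption: a middle-of-the-road fallback for tasks not listed above.
DEFAULT_TEMPERATURE = 1.0


def pick_temperature(task: str) -> float:
    """Return the sampling temperature for a task label (hypothetical helper)."""
    return TEMPERATURE_BY_TASK.get(task.strip().lower(), DEFAULT_TEMPERATURE)


for task in ("math", "creative", "summarization"):
    print(task, pick_temperature(task))
```

The returned value would be passed as the `temperature` field of whatever chat-completion request you send.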


I use 0.05 for math. I just ran a 5k-problem set because I'm trying to fine-tune a smaller model on the outputs. It has some very interesting training, borrowed from R1 per the tech report, where it does the o1/QwQ "thinking steps", but a bit shorter. It solves ~80% of the problems within 4k context, while QwQ would go on for 8k-16k. It's very good at what it does.

But as soon as I need it to do something other than solve a problem (say, rewrite the problem in simpler terms, or provide hints given a problem + solution, or rewrite the solution with these <tags>, etc.), it kinda stops working. Often it still goes ahead and solves the problem. That's why I'm saying it's stubborn. If a task looks like one that it can handle very well, it's really hard to make it perform a different task that is similar but not quite the same.

In a similar vein - https://github.com/cpldcpu/MisguidedAttention/tree/main/eval...


I found DeepSeek very useful for coding with Aider. It's on par with Claude.



