The counternarrative is that it is a very accomplished piece of work that most in the sector were not expecting: it's open source, with an API available at a fraction of the cost of comparable services.
It has upended a lot of theory about how much compute is likely to be needed over the next couple of years, how much profit potential the AI model vendors have in the near term, and how big an impact export controls are having on China.
V3 took the top slot on HF trending models for the first part of Jan ... R1 has 4 of the top 5 slots tonight.
Almost every commentator is talking about nothing else.
You can just use it and see for yourself. It's quite good.
I do believe they were honest in the paper, but the $5.5m training cost (for V3) is defined in a limited way: only the GPU cost, at $2/hr, for the one training run that produced the final V3 model. Headcount, overhead, experimentation, and R&D trial costs are not included. The paper had something like 150 people on it, so total costs are obviously quite a bit higher than the limited-scope cost they disclosed. They also didn't disclose R1 costs.
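To make the scope of that number concrete, here is a rough back-of-the-envelope sketch using only the two figures quoted above ($5.5m total, $2 per GPU-hour). The function name is made up for illustration, and the result is an implied figure from this comment's rounded numbers, not a value taken from the paper.

```python
# Rounded figures as quoted in the comment above (not exact paper values).
REPORTED_COST_USD = 5.5e6      # disclosed cost of the final V3 training run
GPU_HOURLY_RATE_USD = 2.0      # assumed rental price per GPU-hour


def implied_gpu_hours(total_cost_usd: float, hourly_rate_usd: float) -> float:
    """Return the GPU-hours implied by a total cost and a flat hourly rate."""
    if hourly_rate_usd <= 0:
        raise ValueError("hourly rate must be positive")
    return total_cost_usd / hourly_rate_usd


gpu_hours = implied_gpu_hours(REPORTED_COST_USD, GPU_HOURLY_RATE_USD)
print(f"Implied GPU-hours for the final run: {gpu_hours:,.0f}")
# prints "Implied GPU-hours for the final run: 2,750,000"
```

Everything outside that single run (salaries, failed experiments, ablations, R1) would have to be added on top, and none of those amounts are disclosed.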
Still, though, the model is quite good, there are quite a few independent benchmarks showing it's pretty competent, and it definitely passes the smell test in actual use (unlike many of Microsoft's models, which seem to be gamed on benchmarks).
Agreed. I am no fan of the CCP, but I have no issue with using DeepSeek since I only need it for coding, which it does quite well. I still believe Sonnet is better. DeepSeek also struggles when the context window gets big. This might be a hardware limitation, though.
Having said that, DeepSeek is 10 times cheaper than Sonnet and better than GPT-4o for my use cases. Models are a commodity product and it is easy enough to add a layer above them to only use them for technical questions.
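One way to picture the "layer above" idea is a small router that only sends technical prompts to a given model. This is a minimal sketch under loud assumptions: the keyword list, `is_technical`, and `make_router` are all hypothetical, and the backends are stand-in callables rather than any real vendor API. A real setup would likely use a proper classifier instead of keyword matching.

```python
import re
from typing import Callable

# A backend is any function that takes a prompt and returns a reply.
Backend = Callable[[str], str]

# Hypothetical keyword list; purely illustrative.
TECHNICAL_HINTS = frozenset(
    {"code", "function", "bug", "compile", "error", "sql", "regex", "python"}
)


def is_technical(prompt: str) -> bool:
    """Crude check: does the prompt contain any technical keyword as a whole word?"""
    tokens = set(re.findall(r"[a-z0-9]+", prompt.lower()))
    return bool(tokens & TECHNICAL_HINTS)


def make_router(technical_backend: Backend, general_backend: Backend) -> Backend:
    """Build a function that forwards each prompt to one of two backends."""
    def route(prompt: str) -> str:
        backend = technical_backend if is_technical(prompt) else general_backend
        return backend(prompt)
    return route


# Usage with placeholder backends (swap in real model clients as needed).
route = make_router(lambda p: f"[technical model] {p}",
                    lambda p: f"[general model] {p}")
print(route("Why does this Python function raise an error?"))
print(route("What should I cook tonight?"))
```

Because the backends are just callables, swapping one model for another is a one-line change, which is the commodity point being made.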
If my usage can help v4, I am all for it as I know it is going to help everyone and not just the CCP. Should they stop publishing the weights and models, v3 can still take you quite far.
Curious why you have to qualify this with a “no fan of the CCP” prefix. On the face of it, this is just a private organization, and its links to the CCP aren’t any different from, say, Foxconn’s or DJI’s or those of any of the countless Chinese manufacturers and businesses.
You don’t invoke “I’m no fan of the CCP” before opening TikTok or buying a DJI drone or a BYD car. So why here? I’ve seen the same line repeated everywhere.
Anything that becomes valuable will become a CCP property and it looks like DeepSeek may become that. The worry right now is that people feel using DeepSeek supports the CCP, just as using TikTok does. With LLMs we have static data that provides great control over what knowledge to extract from it.
This is just an unfair clause set up to solve the employment problem of people within the system, to play a supervisory role and prevent companies from doing evil. In reality, it has little effect, and they still have to abide by the law.
It's pretty nutty indeed. The model still might be good, but the botting is wild. On that note, one of my favorite benchmarks to watch is SimpleBench, and R1 doesn't perform as well on it as it does on the other public benchmarks, so that might be telling of something.
Yeah, I mean, in practice it is impossible to verify. You can kind of smell it, though, and I smell nothing here, even though some of the 100 listed authors should be HN users and be writing in this thread.
Some obvious astroturf posts on HN seem to follow the template "Watch, we did boring corporate SaaS thing X that no one cares about!" and then get a disproportionate number of comments and upvotes and 'this is a great idea', 'I used it, it is good' or congrats posts, compared to the usual cynical computer-nerd "everything sucks, especially some minute detail about the CSS of your website" mindset you'd expect.
Of course it isn’t all botted. You don’t put astroturf muscle behind things that are worthless. You wait until you have something genuinely good and then give as big of a push as you can. The better it genuinely is the more you artificially push as hard as you can.
Go read a bunch of AI-related subreddits and tell me you honestly believe all the comments and upvotes are just from normal people living their normal lives.
Usually, the words 'astroturfing' and 'propaganda' aren't used to describe the marketing strategies of valuable products/ideologies. Maybe reconsider your terminology.