
A lot of talk about how much cheaper it is than all other models.

It remains to be seen what the pricing will be when run by non-DeepSeek providers. They might be loss-leading.

The comparison for cheap models should also include Gemini 2.0 Flash Exp. I could see it being even cheaper when it stops being free, if it ever does. There's definitely a scenario where Google just keeps it free-ish for a long time with relatively high limits.



Per available providers on OpenRouter right now:

DeepSeek: $0.14 per million input tokens, $0.28 per million output tokens (66 tokens/s)

Fireworks: $0.90 per million input tokens, $0.90 per million output tokens (23 tokens/s)

DeepInfra: $1.00 per million input tokens, $2.00 per million output tokens (1.27 tokens/s)

Compared to Llama 3.1 405B (a smaller model than this, afaik):

The cheapest is $0.80/$0.80 at 24 t/s, all the way up to $4.00/$4.00 at 8 t/s.

So third-party costs seem similar, but there aren't many people hosting DeepSeek right now.
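
For what that implies per request, here's a minimal sketch of the arithmetic at the rates quoted above; the 4k-in / 1k-out token counts are made-up example values, not real workload data:

    # Hypothetical per-request cost at the OpenRouter rates quoted above.
    # Values are ($ per 1M input tokens, $ per 1M output tokens).
    PRICES = {
        "DeepSeek":  (0.14, 0.28),
        "Fireworks": (0.90, 0.90),
        "DeepInfra": (1.00, 2.00),
    }

    def cost_usd(tokens_in, tokens_out, provider):
        rate_in, rate_out = PRICES[provider]
        return tokens_in / 1e6 * rate_in + tokens_out / 1e6 * rate_out

    # Example: a 4k-token prompt producing a 1k-token completion.
    for name in PRICES:
        print(f"{name}: ${cost_usd(4_000, 1_000, name):.6f}")
    # DeepSeek: $0.000840, Fireworks: $0.004500, DeepInfra: $0.006000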


Just a minor clarification: DeepSeek's pricing for this model is temporary, to match their previous model. They announced [1] that it will be the following after February 8:

DeepSeek: $0.27 per million input tokens, $1.10 per million output tokens (66 tokens/s)

Still much cheaper than the others for input pricing, though.
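
To put the change in numbers, a quick sketch with the same hypothetical 4k-in / 1k-out request as above:

    # DeepSeek rates before vs. after February 8, per the announcement in [1].
    OLD = (0.14, 0.28)  # ($/1M input, $/1M output)
    NEW = (0.27, 1.10)

    def cost(tokens_in, tokens_out, rates):
        return tokens_in / 1e6 * rates[0] + tokens_out / 1e6 * rates[1]

    before = cost(4_000, 1_000, OLD)  # $0.000840
    after = cost(4_000, 1_000, NEW)   # $0.002180
    print(f"{after / before:.2f}x")   # ~2.60x for this input/output mix

So roughly 2x on input and ~4x on output, but still well under the third-party rates listed above.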

[1] https://api-docs.deepseek.com/news/news1226#-api-pricing-upd...


This is at a peasant 8k context size, too.


Yeah, I was assuming they're selling it cheap to get people to try the model.

But it's still certainly cheaper than everyone else at the moment.


Given that we have open weights for it, the costs to run it relative to other open-source models are fairly transparent.


True, but the cost for other models will continue to go down as well.


Exactly. Gemini 2.0 Flash ranks better on quality, is faster, and is cheaper if you assume the same pricing as 1.5 (which might go down).

These models are being commoditized.

https://artificialanalysis.ai/models/deepseek-v3


For what it's worth, as always, 99% of benchmarks are very unreliable, and per-task performance still differs greatly between models, with plenty of cases where results are wildly different.

I have a task I use in my work where Gemini 1.5 Pro is SOTA, handily beating o1, Sonnet-3.5, Gemini-exp, and everyone else, very consistently and significantly.

The newer/bigger models are better at reasoning and especially coding, but there are plenty of tasks that have little overlap with those skills.



