Hacker News
coolspot 38 days ago | on: Deepseek: The quiet giant leading China’s AI race
It activates only 37B parameters per token, but you don’t know which experts it will pick ahead of time, so you gotta store all 671B in (V)RAM.
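Here is a minimal sketch of why every expert has to stay resident, assuming standard top-k routing as in mixture-of-experts layers. The router scores experts from each token's own hidden state, so which experts are needed is only known at run time, and different tokens pick different experts. The toy identity router, the 4-expert size, and the helper names `route` and `weights_gb` are hypothetical. The memory figures assume 1 byte per parameter (FP8) or 2 bytes (BF16) and count weights only, not KV cache or activations.

```python
# Minimal sketch of per-token top-k expert routing (toy sizes, hypothetical router).

def route(hidden_state, router_weights, k):
    """Return the indices of the k experts with the highest router score for one token."""
    scores = [sum(h * w for h, w in zip(hidden_state, row)) for row in router_weights]
    # This toy skips gating normalization; an order-preserving gate picks the same experts.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def weights_gb(params, bytes_per_param):
    """Memory for the weights alone, in GB (1e9 bytes)."""
    return params * bytes_per_param / 1e9

# Identity router over 4 experts: expert i's score is simply hidden_state[i].
ROUTER = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

tokens = [[0.9, 0.1, 0.5, 0.0], [0.0, 0.7, 0.2, 0.8]]
choices = [route(t, ROUTER, k=2) for t in tokens]
print(choices)  # [[0, 2], [3, 1]]

# Just two tokens already touch every expert in this toy layer.
needed = sorted({e for c in choices for e in c})
print(needed)  # [0, 1, 2, 3]

# Weights that must be resident vs. weights actually used for one token.
print(weights_gb(671_000_000_000, 1))  # 671.0  (all params, FP8)
print(weights_gb(671_000_000_000, 2))  # 1342.0 (all params, BF16)
print(weights_gb(37_000_000_000, 1))   # 37.0   (active params, FP8)
```

Each token only does the compute of its k experts, but a batch (or even a short sequence) spreads across all of them, which is why the full parameter count sets the memory bill.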
cma 38 days ago
But you don't need cluster networking or NVLink as much as you do when splitting Llama 405B across machines. You could even split the experts among friends over internet-level bandwidth.
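To put rough numbers on the bandwidth point, here is a back-of-envelope sketch. With experts split across machines, only token activations cross the network (sent to each remote expert, results sent back); the weights stay where they are. The helper names and every size below (hidden size, bytes per element, remote experts per layer, MoE layer count, link speed, round-trip time) are hypothetical placeholders, not DeepSeek's actual configuration, so plug in your own.

```python
# Back-of-envelope network cost of expert parallelism for ONE generated token.
# All numbers are hypothetical placeholders, not a real model's configuration.

def dispatch_bytes_per_token(hidden_size, bytes_per_elem, remote_experts_per_layer, moe_layers):
    """Bytes of activations crossing the network: one hidden vector out and one back per remote expert."""
    per_hop = hidden_size * bytes_per_elem
    return 2 * per_hop * remote_experts_per_layer * moe_layers

def seconds_per_token(bytes_moved, link_bytes_per_s, round_trip_s, moe_layers):
    """Transfer time plus one network round trip per MoE layer (layers run one after another)."""
    return bytes_moved / link_bytes_per_s + round_trip_s * moe_layers

moved = dispatch_bytes_per_token(hidden_size=4096, bytes_per_elem=2,
                                 remote_experts_per_layer=2, moe_layers=32)
print(moved)  # 1048576 (1 MiB per token)

# Hypothetical 100 Mbit/s link (12.5 MB/s) with a 20 ms round trip.
t = seconds_per_token(moved, link_bytes_per_s=12_500_000, round_trip_s=0.02, moe_layers=32)
print(round(t, 3))  # 0.724
```

The sketch counts only activation traffic; no weights ever move between peers, which is the property the comment relies on. The per-layer round-trip term is kept separate because over internet links latency, not raw bandwidth, may dominate.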