Hacker News
nik736
5 months ago
on: Apple M5 chip
If you have enough memory to load a model but not enough memory bandwidth to stream its weights for every token, you will get very low tokens/s output.
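A rough way to see the bandwidth point: during generation, each new token needs roughly one full read of the model weights, so memory bandwidth divided by model size gives an upper bound on tokens/s. The sketch below is only a back-of-the-envelope estimate. The function name and all numbers are made up for illustration and are not the specs of any real chip.

```python
def bandwidth_bound_tokens_per_s(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode speed, assuming every weight is read once per token."""
    return bandwidth_bytes_per_s / model_bytes


GB = 1e9

# Hypothetical 40 GB model on two hypothetical machines. Both have enough
# memory to hold the model; only the memory bandwidth differs.
slow = bandwidth_bound_tokens_per_s(40 * GB, 100 * GB)
fast = bandwidth_bound_tokens_per_s(40 * GB, 800 * GB)

print(f"100 GB/s: at most {slow:.1f} tokens/s")
print(f"800 GB/s: at most {fast:.1f} tokens/s")
```

Real throughput will be lower than this bound, since it ignores the KV cache, activations, and any overhead.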
Rohansi
5 months ago
You can also have enough bandwidth but be compute-limited and get lower performance than expected. This is more likely to be the case for Apple Silicon than for high-power GPUs.
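One way to picture the two limits together is a simple roofline-style estimate: throughput is capped by whichever of the bandwidth limit and the compute limit is lower. This is a sketch under stated assumptions, not a benchmark. It assumes about 2 FLOPs per parameter per token (a common approximation for a dense model) and that weights are read once per forward pass regardless of how many tokens that pass handles. The function name, the `tokens_per_pass` parameter, and all hardware numbers are hypothetical.

```python
def estimate_tokens_per_s(
    params: float,
    bytes_per_param: float,
    bandwidth_bytes_per_s: float,
    flops_per_s: float,
    tokens_per_pass: int = 1,
    flops_per_param: float = 2.0,  # assumption: ~2 FLOPs per parameter per token
):
    """Return (tokens/s upper bound, name of the limiting resource)."""
    # Weights are streamed once per pass, so one read is shared by all
    # tokens processed in that pass.
    passes_per_s = bandwidth_bytes_per_s / (params * bytes_per_param)
    bandwidth_limit = passes_per_s * tokens_per_pass

    # Compute cost scales with every token, no matter how they are grouped.
    compute_limit = flops_per_s / (params * flops_per_param)

    if bandwidth_limit <= compute_limit:
        return bandwidth_limit, "bandwidth"
    return compute_limit, "compute"


# Hypothetical machine: 400 GB/s memory bandwidth, 10 TFLOP/s of compute.
# Hypothetical model: 10B parameters at 2 bytes each (20 GB of weights).
machine = dict(bandwidth_bytes_per_s=400e9, flops_per_s=10e12)
model = dict(params=10e9, bytes_per_param=2)

print(estimate_tokens_per_s(**model, **machine, tokens_per_pass=1))
print(estimate_tokens_per_s(**model, **machine, tokens_per_pass=512))
```

With one token per pass the estimate is bandwidth-limited. When many tokens share one read of the weights, as in prompt processing or batching, the same hypothetical machine becomes compute-limited, which is where a chip with less raw compute would fall behind.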