
Thanks for the uploads! Was reading through the Unsloth docs for Qwen3-Coder before I found the HN thread :)

What would be a reasonable throughput level to expect from running 8-bit or 16-bit versions on 8x H200 DGX systems?



Oh, 8x H200 is nice - for llama.cpp, definitely look at https://docs.unsloth.ai/basics/qwen3-coder-how-to-run-locall... - llama.cpp has a high-throughput mode which should be helpful.

You should be able to get 40 to 50 tokens/s at a minimum. High-throughput mode plus a small draft model might get you to 100 tokens/s generation.
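For reference, a draft model is wired up via llama-server's speculative-decoding flags. This is only a sketch: the GGUF filenames are placeholders (not from this thread), and the tuning values (`--parallel`, `--draft-max`, etc.) would need adjusting for your hardware and workload.

```shell
# Sketch of a llama-server launch with multiple slots + speculative decoding.
# Filenames below are placeholder assumptions, not actual release artifacts.
llama-server \
  -m Qwen3-Coder-Q8_0.gguf \
  -md qwen3-draft-small.gguf \
  -ngl 99 \
  -c 32768 \
  --parallel 8 \
  --draft-max 16 --draft-min 4
# -m   : main 8-bit quantized model
# -md  : small draft model used for speculative decoding
# -ngl : offload all layers to GPU
# --parallel : serve multiple slots; continuous batching is on by default
```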



