Hacker News

I'm eagerly awaiting Qwen 3 Coder becoming available on Cerebras.

I run plenty of agent loops and the speed makes a somewhat interesting difference in time "compression". Having a Claude 4 Sonnet-level model running at 1000-1500 tok/s would be extremely impressive.
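Rough arithmetic behind that "compression" (the token count per iteration is illustrative, not measured):

```python
def generation_time_s(tokens: int, tok_per_s: float) -> float:
    # wall-clock time to emit `tokens` at a given decode speed
    return tokens / tok_per_s

# an agent loop emitting ~30k tokens per iteration
slow = generation_time_s(30_000, 60)     # typical hosted-model speed: 500 s
fast = generation_time_s(30_000, 1_500)  # Cerebras-class speed: 20 s
```

Roughly eight minutes per iteration versus twenty seconds, which is the difference between context-switching away and staying in the loop.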

To FEEL THE SPEED, you can try it yourself on the Cerebras Inference page, through their API, or, for example, on Mistral's Le Chat with "Flash Answers" (powered by Cerebras). Iterating on code at 1000 tok/s makes it feel even more magical.





Exactly. I can see my efficiency going up a ton with this kind of speed. Every time I'm waiting for agents, my mind loses some focus and context. Running parallel agents gets more speed, but at the cost of focus. Near-instant iteration loops in Cursor would feel magical (even more magical?).

It will also impact how we work: interactive IDEs like Cursor probably make more sense than CLI tools like Claude Code when answers are nearly instant.


I was just thinking the opposite. If the answers are this instant, then, subject to cost, I'd be tempted to have the agent fork off and try a dozen different things, then run a review process to decide which approach(es), or parts of approaches, to present to the user.

It opens up a whole lot of use cases that'd be a nightmare if you have to look at each individual change.
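That fork-and-review idea can be sketched as a best-of-n fan-out. Everything here is a stand-in: `run_agent` fakes an agent attempt with a random score where a real system would run a model and a judge.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_agent(task: str, seed: int) -> dict:
    # hypothetical stand-in for one forked agent attempt;
    # a real version would call a model and score the result
    rng = random.Random(seed)
    return {"seed": seed, "patch": f"attempt-{seed} for {task}", "score": rng.random()}

def best_of_n(task: str, n: int = 12) -> dict:
    # fan out n attempts in parallel, then let a review step pick the winner
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda s: run_agent(task, s), range(n)))
    return max(candidates, key=lambda c: c["score"])
```

At 1000+ tok/s, the n attempts finish in roughly the time one attempt takes today, so only the reviewed winner needs human attention.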


Same.

However, I think Cerebras first needs to make its APIs more OpenAI-compatible. I tried their existing models with a bunch of coding agents (including Cline, which they submitted a PR for), and they all failed, either with a 400 error or with tool calls not formatted correctly. Very disappointing.
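For context, this is the tool-call shape an OpenAI-compatible chat completions response is expected to carry (field names follow the OpenAI spec; the id, function name, and arguments are made up). Agents like Cline parse exactly this structure, so a provider that deviates from it breaks them:

```python
import json

# an OpenAI-style assistant message carrying a tool call
message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_001",          # hypothetical id
        "type": "function",
        "function": {
            "name": "read_file",   # hypothetical tool
            # note: arguments is a JSON-encoded *string*, not a nested object
            "arguments": json.dumps({"path": "src/main.py"}),
        },
    }],
}

call = message["tool_calls"][0]["function"]
args = json.loads(call["arguments"])
```

A common compatibility bug is returning `arguments` as an object instead of a string, which makes the client's `json.loads` step fail.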


I just set up Groq with Kimi K2 the other day and was blown away by the speed.

Deciding if I should switch to Qwen 3 and Cerebras.

(Also, off-topic, but the name reminds me of cerebrates from Starcraft. The Zerg command hierarchy lore was fascinating when I was a young child.)


Have you used Claude Code, and how does the quality compare to Claude models? I'm heavily invested in tools around Claude and still struggling to make the switch and start experimenting with other models.

I still exclusively use Claude Code. I have not yet experimented with these other models for practical software development work.

A workflow I've been hearing about is: use Claude Code until quota exhaustion, then use Gemini CLI with Gemini 2.5 Pro free credits until quota exhaustion, then use something like a cheap-ish K2 or Qwen 3 provider, with OpenCode or the new Qwen Code, until your Claude Code credits reset and you begin the cycle anew.
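That cycle is just a quota-ordered fallback chain. A toy sketch (provider names and the quota bookkeeping are purely illustrative):

```python
# fallback order mirroring the workflow described above
PROVIDERS = ["claude-code", "gemini-cli", "cheap-k2-or-qwen3"]

def pick_provider(remaining: dict) -> str:
    # use the first provider in the chain with quota left
    for name in PROVIDERS:
        if remaining.get(name, 0) > 0:
            return name
    # all exhausted: wait for Claude credits to reset and start over
    return PROVIDERS[0]
```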


Are you using Claude Code or the web interface? I would like to try this with CC myself; apparently, with some proxying, an OpenAI-compatible LLM can be swapped in.

I am using Claude Code, and my experience with it so far is great. I use it primarily from the terminal; that way I stay focused while reading code, with CC doing its job in the background.

I've heard it repeated that, using env vars, you can point it at GPT models, for example.

But then also that running a proxy tool locally is needed.
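As I understand it, the env-var swap looks roughly like this (the URL and token are placeholders; the proxy is whatever local tool translates Anthropic-style requests to an OpenAI-compatible backend):

```shell
# point Claude Code at a locally running translation proxy
export ANTHROPIC_BASE_URL="http://localhost:3456"   # hypothetical proxy address
export ANTHROPIC_AUTH_TOKEN="placeholder-key"       # whatever the proxy expects
claude
```

I can't vouch that every backend handles CC's tool calls correctly through such a proxy, which is the part I'd test first.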

I haven’t tried this setup, and can’t say offhand if Cerebras’ hosted qwen described here is “OpenAI” compatible.

I also don’t know if all of the tools CC uses out of the box are supported in the most compatible non-Anthropic models.

Can anyone provide clarity / additional testimony on swapping out the engine on Claude Code?


I've used Kimi K2, it works well. Personally I'm using Claude Code Router.

https://github.com/musistudio/claude-code-router


The issue is that most Groq models are limited in context length, as context costs a lot of memory.
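The memory cost is dominated by the KV cache, which grows linearly with context length. A back-of-the-envelope sketch (the model dimensions are illustrative, not any provider's actual deployment):

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    # keys + values, per layer, per KV head, assuming fp16/bf16 (2 bytes)
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

# e.g. a 64-layer model with 8 KV heads of dim 128 at 128k context
one_seq = kv_cache_bytes(64, 8, 128, 131_072)  # 32 GiB for a single sequence
```

That is per concurrent sequence, on top of the weights, which is why providers cap context aggressively.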

Obligatory reminder that 'Groq' and 'Grok' are entirely different and unrelated. No risk of a runaway Mecha-Hitler here!

Instead, there's the risk of requiring racks of hardware to run just one model!

It'll be nice if this generates more pressure on programming-language compilation times. If agentic LLMs get fast enough that compilation becomes the main blocker in the development loop, there will be significant economic incentives to improve compiler performance.




