I connect local models to MCP servers through LM Studio and I'm blown away by how good they are. But as you said, issues creep up once you hit longer contexts.
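For anyone curious, wiring a local server into LM Studio is just a config edit. This is a minimal sketch from memory: LM Studio's mcp.json uses the same mcpServers shape as Claude Desktop's config, and the filesystem server and path here are just placeholders, so check the current LM Studio docs before copying.

    {
      "mcpServers": {
        "filesystem": {
          "command": "npx",
          "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/dir"]
        }
      }
    }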
OpenAI's and Anthropic's real moat is hardware. For local LLMs, context length and hardware performance are the limiting factors. Qwen3 4B with a 32,768-token context window is great, right up until the window starts filling and performance drops off quickly.
I use local models whenever possible. MCP servers work well with them, but the sheer amount of context they inject (tool definitions up front, tool results on every call) makes switching to an online provider a no-brainer.
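To put rough numbers on that injection (every token count below is an assumption I made up for illustration, not a measurement):

    # Back-of-the-envelope: how much of a 32k window MCP overhead eats.
    CONTEXT_WINDOW = 32_768      # Qwen3 4B, as above

    system_prompt   = 400        # assumed
    num_tools       = 15         # assumed: a few MCP servers' worth of tools
    tokens_per_tool = 500        # assumed: name + description + JSON schema

    overhead  = system_prompt + num_tools * tokens_per_tool
    remaining = CONTEXT_WINDOW - overhead

    print(f"tool definitions: {num_tools * tokens_per_tool:,} tokens")
    print(f"left for chat + tool results: {remaining:,} "
          f"({remaining / CONTEXT_WINDOW:.0%} of the window)")

And that's before tool results get echoed back into the context on each call, which is usually what actually blows up the window.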