It isn’t streaming the ollama output so it feels slow (~3 words/second on a 3090 with the defaults). Using ollama directly streams within a second and you can kill it early. I don’t understand the UX of looping responses to the same question either. This does not feel like magic.