To spell it out for myself and others: if the calculation for each individual attention block approaches exact equivalence, then the combination of them approaches equivalent performance too. And with the error approaching floating-point precision, the performance should be practically identical to regular attention. Elementwise errors of that magnitude can't lead to any noteworthy change in the overall result, especially given how robust LLM networks seem to be to small deviations.
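A toy sketch of that point (NumPy, made-up float32 tensors, not the actual method under discussion): two mathematically equivalent ways of computing the same attention output differ only at float-noise level.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 128, 64
    Q, K, V = (rng.standard_normal((n, d)).astype(np.float32) for _ in range(3))

    scale = np.float32(1 / np.sqrt(d))
    scores = (Q @ K.T) * scale  # stays float32 throughout

    # Plain softmax.
    p1 = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)

    # Max-subtracted softmax: mathematically identical, numerically different.
    s = scores - scores.max(-1, keepdims=True)
    p2 = np.exp(s) / np.exp(s).sum(-1, keepdims=True)

    out1, out2 = p1 @ V, p2 @ V
    print(np.abs(out1 - out2).max())  # typically on the order of 1e-7 in float32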
I like embeddings for natural language documents where your query terms are unlikely to be unique, and overall document direction is a good disambiguator.
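To make "overall document direction" concrete, a toy sketch with made-up 3-dimensional vectors standing in for real embeddings: two documents share a query keyword, but their overall direction separates them.

    import numpy as np

    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Pretend these came from an embedding model (values are invented).
    query      = np.array([0.9, 0.1, 0.2])  # "python performance tips"
    doc_coding = np.array([0.8, 0.2, 0.1])  # post about Python the language
    doc_snakes = np.array([0.1, 0.9, 0.3])  # article about pythons, the snakes

    print(cos(query, doc_coding))  # ~0.99: overall direction matches
    print(cos(query, doc_snakes))  # ~0.27: shared keyword, different direction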
You can't control it at the level of individual LLM requests and the orchestration of those requests. And that level of control is very valuable, practically required, for building a tool like this. Otherwise you just have a wrapper over another big program and can barely do anything interesting or useful to make it actually work better.
What can't you do, exactly? You can send Claude arbitrary user prompts (with arbitrary custom system prompts) and get text back. You can then feed those text responses into whatever larger system you want.
You don't get a simple request/response paradigm with Claude Code: one message from the user kicks off a loop that usually makes many inner LLM requests, interleaved with other business logic, and ends in some user-visible output plus a bunch of less visible effects (filesystem changes, etc.). You control the input to the outer loop, and hooks let you do some limited things inside it, but a lot happens within that loop that you have no say over.
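Schematically, the outer-loop shape looks something like this (a rough sketch of the pattern, not Claude Code's actual implementation; llm, tools, and the response shape are hypothetical interfaces):

    # One user message drives many inner LLM calls plus tool side effects
    # before anything comes back to the caller.
    def agent_turn(user_message, llm, tools, history):
        history.append({"role": "user", "content": user_message})
        while True:
            response = llm(history)           # inner LLM request, one of many
            history.append(response)
            if not response.tool_calls:       # model decided it's done
                return response.text          # the only part you directly see
            for call in response.tool_calls:  # file edits, shell commands, etc.
                result = tools[call.name](call.arguments)
                history.append({"role": "tool", "content": result})
            # Hooks let you observe or veto some of these steps, but the
            # loop's control flow itself is not yours to change.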
A simple example: can you arbitrarily manipulate the historical context of a given request to the LLM? It's sometimes useful to do that. Another: can you build a programmatic flow that makes 3 different LLM requests, then uses an LLM judge to contrast and combine them into a single best answer (sketched below)? Sure, you could write a prompt that asks for that, but it won't yield equivalent results.
These are just examples; the point is that you don't get fine-grained control.
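For the second example, a minimal sketch against the raw messages API using the anthropic Python SDK (the model id, prompts, and judging strategy are placeholders I'm assuming, not anything Claude Code does):

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    MODEL = "claude-sonnet-4-20250514"  # placeholder model id

    def ask(prompt, system="You are a helpful assistant.", temperature=1.0):
        msg = client.messages.create(
            model=MODEL,
            max_tokens=1024,
            system=system,
            temperature=temperature,
            messages=[{"role": "user", "content": prompt}],
        )
        return msg.content[0].text

    question = "Explain the tradeoffs of optimistic vs pessimistic locking."

    # Three independent attempts at the same question...
    candidates = [ask(question) for _ in range(3)]

    # ...then an LLM judge contrasts and combines them into one final answer.
    numbered = "\n\n".join(f"Answer {i + 1}:\n{c}" for i, c in enumerate(candidates))
    final = ask(
        f"Question: {question}\n\n{numbered}\n\n"
        "Compare these answers and write a single best combined answer.",
        system="You are a careful judge. Keep the strongest parts of each answer.",
        temperature=0.0,
    )
    print(final)

None of this is possible from inside Claude Code's loop; you'd have to approximate it with prompting, which isn't the same thing.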
It has to be suited for human consumption too, though.
I wonder if this has any real benefit over just doing very simple HTML wireframing with highly constrained CSS, which is readily renderable for human consumption. I guess pure text makes it easier to ignore many stylistic factors, since they're harder (if not impossible) to represent. On the other hand, LLMs surely have far more training data on HTML/CSS, and I'd expect them to easily follow instructions to produce HTML/CSS for a mockup/wireframe.
It kind of makes sense if you relate it to ASCII art, which itself is very often not actually ASCII, for similar reasons. The naming evokes that concept, for me at least. Naming is hard in general; I'm sure they went with the name they thought worked best.
I agree that "TUI" is a better fit, though. But not TUI-driven development; more like TUI-driven design, followed by using the textual design as a spec (i.e. spec-driven development) to drive the GUI implementation via coding agents.