We don't necessarily need to replace the versions humans use - though some of the changes might well make tools better for humans too. Most of the tools I add for my coding agent are attempts at coaxing it to avoid doing things like the "head" example in the article.
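For context, here's a minimal sketch of what one of those tools might look like. Everything in it is illustrative, not from the article: the registry, the `tool` decorator, and the `read_file_head` name are assumptions standing in for whatever agent framework you actually use. The point is that an explicit truncation notice tells the agent its view is incomplete, which piping through `head` never does.

```python
from pathlib import Path
from typing import Callable

# Hypothetical tool registry; a real agent framework would have its own.
TOOLS: dict[str, Callable[..., str]] = {}

def tool(fn):
    """Register a function so the agent can call it instead of shelling out."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def read_file_head(path: str, max_lines: int = 100) -> str:
    """Return up to max_lines of a file, with an explicit truncation notice.

    Unlike `head`, this makes truncation visible to the agent, so it
    knows the output is partial rather than mistaking it for the whole file.
    """
    lines = Path(path).read_text().splitlines()
    shown = lines[:max_lines]
    notice = ""
    if len(lines) > len(shown):
        notice = f"\n[truncated: showing {len(shown)} of {len(lines)} lines]"
    return "\n".join(shown) + notice
```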
That’s just evidence that these sophisticated next-token predictors are not good enough yet. The world should not bend over backwards to accommodate a new tool; the new tool needs to adapt to the world, or be used only in situations where it is appropriate. This is one of the problems with calling LLMs AI: a language model lacks understanding.
Many of us have actual work to get done and no interest in conforming to purity tests for their own sake. Nobody is erasing the existing versions of these tools by making adaptations better suited to this use.