> Even with TypeScript, Claude will happily break basic business logic to make tests pass.
It's my understanding that LLMs change the code to meet a goal, and if you prompt them with vague instructions such as "make tests pass" or "fix tests", LLMs in general apply the minimum necessary and sufficient changes to any code that allows their goal to be met. If you don't explicitly instruct them, they can't and won't tell apart project code from test code. So they will change your project code to make tests work.
This is not a bug. Changing project code to make tests pass is a fundamental approach to refactoring projects, and the whole basis of TDD. If that's not what you want, you need to prompt them accordingly.
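To make "prompt them accordingly" a bit more concrete, here is a rough sketch of a guard you could run over the list of files an assistant touched, so that "fix the tests" can't silently become "rewrite the project code". Everything in it is an assumption for illustration: the file-naming convention, the `Scope` type, and the `outOfScope` helper are hypothetical and not part of any tool mentioned in this thread.

```typescript
// Assumed convention (hypothetical): test files end in .test.ts/.spec.ts(x)
// or live under a __tests__/ directory. Adjust to your own project layout.
const TEST_FILE = /(\.(test|spec)\.tsx?$)|((^|\/)__tests__\/)/;

function isTestFile(path: string): boolean {
  return TEST_FILE.test(path);
}

// Which side of the codebase the assistant was told it may modify.
type Scope = "project-only" | "tests-only";

// Returns the changed paths that fall outside the allowed scope.
function outOfScope(changed: string[], scope: Scope): string[] {
  return changed.filter((path) =>
    scope === "project-only" ? isTestFile(path) : !isTestFile(path)
  );
}

// Example: the instruction was "fix the tests", but project code changed too.
const violations = outOfScope(["src/cart.ts", "src/cart.test.ts"], "tests-only");
console.log(violations); // [ 'src/cart.ts' ]
```

A check like this doesn't tell you which side is actually wrong; it only flags when the change strays from the scope you stated in the prompt.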
> It's my understanding that LLMs change the code to meet a goal
I assume in this case you mean a broader conventional application, of which an LLM algorithm is a smaller-but-notable piece?
LLMs themselves have no goals beyond predicting new words for a document that "fit" the older words. It may turn 2+2 into 2+2=4, but it's not actually doing math with the goal of making both sides equal.
> It's my understanding that LLMs change the code to meet a goal, and if you prompt them with vague instructions such as "make tests pass" or "fix tests", LLMs in general apply the minimum necessary and sufficient changes to any code that allows their goal to be met.
Do you mean not just LLMs, but agents? Is this not avoided by narrowing your scope and just using the chat interface, which also may not produce what you're hoping for but at least can't muck about in your existing code?
I told it to add a feature and to update the tests. It added the feature, and then removed it because it made the tests fail lol. I know I can make it work, I did, that's not the point.
Fixing bugs is also changing project code to make tests pass. The assistant is pretty good at knowing which side to change when it’s working from documentation that describes the correct behavior.
It's entirely possible to have specifications somewhere between "vague hand-wavy descriptions" and source code. But it's really not my job to defend AI against all the people who want it to be completely useless, seem to need it to be so, really. I just use it, it works a lot of the time, doesn't work other times, and that's that. Results carry more weight than opinions.
It's not a problem. It's in fact the core trait of vibe coding. The primary work a developer does in vibe coding tasks is providing the necessary and sufficient context. Hence the inception of the term "context engineering". A vibe coder basically lays out requirements and constraints that drive LLMs to write code. That's the bulk of their task: they shift away from writing the low-level "how" to instead write down the high-level "what".
> The whole point is having the LLM figure out what you want from vague hand-wavy descriptions instead of precise specification.
No. The prompts are as elaborate as you want them to be. I, for example, use prompt files with the project's ubiquitous language and requirements, not to mention test suites used for acceptance tests. You can half-ass your code as much as you can half-ass your prompts.
Speaking of TypeScript, every time I feed a hard type problem to LLMs they just can't do it. Sometimes I find out it's a TS limitation or just not implemented yet, but that won't stop us from wasting 40 minutes together.
We are building a tool specifically for TypeScript developers. We launched a couple of months ago, and I'd really appreciate it if you gave it a try and gave me feedback; people seem to really like using it. http://charlielabs.ai - thank yooou!!! :)
I’m currently doing research on this exact problem. Would you care to share an example of an advanced typing issue that you’ve seen LLMs struggle with?
When I vibe coded with GitHub Copilot in TypeScript, it kept using "any" even though those variables had clear interfaces already defined elsewhere in the code. This drove me crazy, as I had to go in and manually fix all of them. The only thing that helps a bit is me screaming "DO NOT EVER USE 'any' TYPE". I can't understand why it would do this.
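For anyone who hasn't run into this, a minimal sketch of the difference being described. The `User` interface and both functions are made up for illustration; they stand in for an interface "already defined somewhere in the code" and are not taken from any real project.

```typescript
// Hypothetical interface that already exists in the codebase.
interface User {
  id: number;
  name: string;
}

// The pattern being complained about: `any` switches off type checking,
// so a typo like `user.nmae` would compile and only fail at runtime.
function labelLoose(user: any): string {
  return `${user.id}: ${user.name}`;
}

// What was wanted: reuse the existing interface so wrong shapes
// and misspelled properties are rejected at compile time.
function labelTyped(user: User): string {
  return `${user.id}: ${user.name}`;
}

const alice: User = { id: 1, name: "Alice" };
console.log(labelTyped(alice)); // prints "1: Alice"
```

Rather than relying on shouting in the prompt, one option is to let tooling enforce it: the typescript-eslint rule `@typescript-eslint/no-explicit-any` flags explicit `any` annotations like the one in `labelLoose`, so the assistant's output fails lint until it uses the real type.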