Even with typescript Claude will happily break basic business logic to make test...

motorest · 2025-07-20T05:08:37 1752988117

> Even with typescript Claude will happily break basic business logic to make tests pass.

It's my understanding that LLMs change the code to meet a goal, and if you prompt them with vague instructions such as "make tests pass" or "fix tests", LLMs in general apply the minimum necessary and sufficient changes to any code that allows their goal to be met. If you don't explicitly instruct them, they can't and won't tell apart project code from test code. So they will change your project code to make tests work.

This is not a bug. Changing project code to make tests pass is a fundamental approach to refactoring projects, and the whole basis of TDD. If that's not what you want, you need to prompt them accordingly.

DecoySalamander · 2025-07-20T10:50:19 1753008619

> This is not a bug

It's not a bug if we're talking about a mischievous jinn granting wishes instead of a productivity tool.

desdenova · 2025-07-20T13:30:22 1753018222

A jinn granting wishes is the best analogy for LLMs I've seen so far.

Sharlin · 2025-07-20T13:58:39 1753019919

Except that LLMs aren't mischievous. They're just stupid.

didgeoridoo · 2025-07-20T11:41:07 1753011667

Can’t it be both?

Terr_ · 2025-07-20T07:11:18 1752995478

> It's my understanding that LLMs change the code to meet a goal

I assume in this case you mean a broader conventional application, of which an LLM algorithm is a smaller-but-notable piece?

LLMs themselves have no goals beyond predicting new words for a document that "fit" the older words. It may turn 2+2 into 2+2=4, but it's not actually doing math with the goal of making both sides equal.

motorest · 2025-07-20T08:39:25 1753000765

> I assume in this case you mean a broader conventional application, of which an LLM algorithm is a smaller-but-notable piece?

Not necessarily. If you prompt an LLM to limit changes to some projects or components, it complies with the request.

bitwize · 2025-07-20T16:53:45 1753030425

> It's my understanding that LLMs change the code to meet a goal, and if you prompt them with vague instructions such as "make tests pass" or "fix tests", LLMs in general apply the minimum necessary and sufficient changes to any code that allows their goal to be met.

"I'm Mr. Meeseeks! Look at meeee!"

brailsafe · 2025-07-20T09:33:49 1753004029

Do you mean not just LLMs, but agents? Is this not avoided by narrowing your scope and just using the chat interface, which also may not produce what you're hoping for, but at least can't muck about in your existing code?

winrid · 2025-07-20T23:59:48 1753055988

I told it to add a feature and to update the tests. It added the feature, and then removed it because it made the tests fail lol. I know I can make it work, I did, that's not the point.

chuckadams · 2025-07-20T05:20:31 1752988831

Fixing bugs is also changing project code to make tests pass. The assistant is pretty good at knowing which side to change when it’s working from documentation that describes the correct behavior.

desdenova · 2025-07-20T13:33:23 1753018403

That's the main problem with vibe coding.

The whole point is having the LLM figure out what you want from vague hand-wavy descriptions instead of precise specification.

You don't need an LLM to parse a precise specification, you have a compiler for that.

chuckadams · 2025-07-21T13:13:10 1753103590

It's entirely possible to have specifications somewhere between "vague hand-wavy descriptions" and source code. But it's really not my job to defend AI against all the people who want it to be completely useless, seem to need it to be so, really. I just use it, it works a lot of the time, doesn't work other times, and that's that. Results carry more weight than opinions.

motorest · 2025-07-20T14:53:11 1753023191

> That's the main problem with vibe coding.

It's not a problem. It's in fact the core trait of vibe coding. The primary work a developer does in vibe coding tasks is providing the necessary and sufficient context. Hence the inception of the term "context engineering". A vibe coder basically lays out requirements and constraints that drive LLMs to write code. That's the bulk of their task: they shift away from writing the low-level "how" to instead write down the high-level "what".

> The whole point is having the LLM figure out what you want from vague hand-wavy descriptions instead of precise specification.

No. The prompts are as elaborate as you want them to be. I, for example, use prompt files with the project's ubiquitous language and requirements, not to mention test suites used for acceptance tests. You can half-ass your code as much as you can half-ass your prompts.

immibis · 2025-07-21T03:14:24 1753067664

Sounds like a compiler with extra steps.

bapak · 2025-07-20T10:11:51 1753006311

Speaking of TypeScript, every time I feed a hard type problem to LLMs they just can't do it. Sometimes I find out it's a TS limitation or just not implemented yet, but that won't stop us from wasting 40 minutes together.

neom · 2025-07-20T12:13:47 1753013627

We are building a tool specifically for typescript developers, just launched a couple of months ago and I'd really appreciate if you gave it a try and provided me with feedback, people seem to really like using it. http://charlielabs.ai - thank yooou!!! :)

rybosome · 2025-07-20T11:40:08 1753011608

I’m currently doing research on this exact problem. Would you care to share an example of an advanced typing issue that you’ve seen LLMs struggle with?

rs186 · 2025-07-20T13:45:17 1753019117

When I vibe coded with GitHub Copilot in TypeScript, it kept using "any" even though those variables had clear interfaces already defined somewhere in the code. This drove me crazy, as I had to go in and manually fix all those things. The only thing that helped a bit was me screaming "DO NOT EVER USE 'any' TYPE". I can't understand why it would do this.
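
To make the complaint concrete, here is a minimal sketch of why `any` hurts when an interface already exists. Every name in it (`Order`, `totalLoose`, `totalTyped`) is hypothetical and made up for illustration; it is not taken from anyone's project or from Copilot's actual output.

```typescript
// Hypothetical interface, standing in for one "already defined somewhere in the code".
interface Order {
  id: string;
  quantity: number;
  unitPriceCents: number;
}

// With `any`, the compiler stops checking property names, so this typo compiles.
function totalLoose(order: any): number {
  return order.quantity * order.unitPriceCent; // typo: the property is undefined, so the result is NaN
}

// With the existing interface, the same typo would be rejected at compile time.
function totalTyped(order: Order): number {
  return order.quantity * order.unitPriceCents;
}

const order: Order = { id: "o-1", quantity: 3, unitPriceCents: 250 };
const loose = totalLoose(order); // NaN at runtime, no compile error
const typed = totalTyped(order); // 750
```

If shouting in the prompt is not enough, a lint rule such as typescript-eslint's `no-explicit-any` can at least turn each occurrence into a failing check, assuming the project already runs ESLint.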

CalRobert · 2025-07-20T04:59:57 1752987597

That seems like the tests don’t work?

winrid · 2025-07-21T00:01:33 1753056093

It made the tests fail with the new feature and then removed the feature it just added to make them pass.

paffdragon · 2025-07-20T08:33:13 1753000393

Or at least they don't cover business logic if they pass while breaking it.
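
A minimal sketch of that situation, with an invented business rule (10% off for orders of 10 or more items) and hypothetical function names. It only shows how a test can stay green while the rule it is supposed to protect is deleted.

```typescript
// Hypothetical business rule: orders of 10 or more items get 10% off.
function priceCents(quantity: number, unitCents: number): number {
  const subtotal = quantity * unitCents;
  return quantity >= 10 ? Math.round(subtotal * 0.9) : subtotal;
}

// A variant in which the discount has been removed, e.g. by an overeager edit.
function priceCentsBroken(quantity: number, unitCents: number): number {
  return quantity * unitCents;
}

type PriceFn = (quantity: number, unitCents: number) => number;

// Weak test: only checks a small order, so it never exercises the discount.
function weakTest(fn: PriceFn): boolean {
  return fn(2, 100) === 200;
}

// Stronger test: also pins both sides of the discount boundary.
function strongTest(fn: PriceFn): boolean {
  return fn(2, 100) === 200 && fn(9, 100) === 900 && fn(10, 100) === 900;
}

console.log(weakTest(priceCents), weakTest(priceCentsBroken));     // true true
console.log(strongTest(priceCents), strongTest(priceCentsBroken)); // true false
```

The weak test cannot tell the two implementations apart, so "make the tests pass" is satisfied by either one. Only the boundary assertions make the business rule something the tests actually defend.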