This tracks with my own experience as well. I’ve found it useful in some trivial ways (e.g. small refactors, type definitions from a schema, etc.), but on anything bigger than that it misses things, requires rework, and so on. The future may make me eat my words though.
On the other hand, I’ve lately seen it misused by less experienced engineers trying to implement bigger features who eagerly accept all it churns out as “good” without realizing the code it produced:
- doesn’t follow our existing style guide and patterns.
- implements some logic from scratch where there certainly is more than one suitable library, making this code we now own.
- is some behemoth of a PR trying to do all the things.
> implements some logic from scratch where there certainly is more than one suitable library, making this code we now own.
> is some behemoth of a PR trying to do all the things.
Depending on the amount of code, I see this only as positive? Too often people pull huge libraries for 50 lines of code.
I'm not talking about generating a few lines instead of importing left-pad. In recent PRs I've had:
- Implementing a scheduler from scratch (hundreds of lines), when there are many many libraries for this in Go.
- Implementing some complex configuration store that is safe for concurrent access, using generics, reflection, and a whole host of other stuff (again hundreds of lines, plus more for tests).
While I can't say any of the code is bad, it is effectively like importing a library which your team now owns, but worse in that no one really understands it or supports it.
Lastly, I could find libraries that are well supported, documented, and active for each of these use-cases fairly quickly.
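For a sense of scale, using one of those existing libraries is typically a handful of lines. A rough sketch with github.com/robfig/cron/v3 (just one option, picked for illustration; the schedule and the job itself are made up):

    package main

    import (
        "log"

        "github.com/robfig/cron/v3"
    )

    func main() {
        c := cron.New()
        // Hypothetical job: run a cleanup task every hour.
        if _, err := c.AddFunc("@every 1h", func() { log.Println("cleanup ran") }); err != nil {
            log.Fatal(err)
        }
        c.Start()
        select {} // block; cron runs jobs in its own goroutines
    }

Compare that to owning hundreds of lines of bespoke scheduler code that nobody on the team fully understands.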
Someone vibe coded a PR on my team where there were hundreds of lines doing complex validation of an uploaded CSV file (which we only expected to have two columns) instead of just relying on Ruby's built-in CSV library (i.e. `CSV.parse` would have done everything the AI produced).
Ask it to write tests, then let it run until the tests pass (preferably in a sandbox, far from your git credentials). It is quite good at developing hypotheses and tests for them, if that is what you explicitly ask for. It doesn’t have (much) ego, so it doesn’t care if it is proven wrong and will accept any outcome fairly if it is testable. Although sometimes it comes to the wrong conclusion, doubles down that the fact should be true, and prepares to write and publish a library to make it true.
Sorry! Didn't mean to BS you. I've not come across a scenario where it hallucinated a non-existent library on me. Can you share what you were trying to do when that happened?
I wish I had the transcript. I don't, and I'm afraid that the passage of time has muddied the interaction to the point of uselessness (when it comes to listing specifics).
And that may be where the discrepancy comes in. You feel fast because, whoa, I created this whole scheduler in ten seconds! But then you also have to spend an hour code reviewing that scheduler, which still feels fast: a good working scheduler in such a short time. Without AI, it might feel slow to find and integrate some existing scheduling library, but in wall clock time it's about the same.
The trick is that no one is actually carefully reviewing this stuff. Reviewing code properly is extremely hard. I'd say even harder than writing it from scratch. But there's no minimum amount of work you have to do. If you just do a quick skim over the result, no one will know you didn't carefully review every single detail. Then it gets merged to production full of mistakes.
If I as a reviewer don’t know whether the author used AI, I can’t even assume a single human (typically the author) has read any of the code, let alone most of it. I could be the first person reviewing it.
Not that it’s a great assumption to make, but it’s also fair to take a PR and register that the author wrote it, understands it, and considers it ready for production. So much work, outside of tech as well, is built on trust at least in part.
I find this disrespectful on the author’s part.
I’m sure I’ve had colleagues at work who did this to me: throwing AI-generated code at the reviewers with a mindset of "why should I look at it? That's what the reviewer does anyway".
I always passively call out the submitter on this stuff with comments like "Can you explain to me why you did this? Can you explain what this is expected to return" etc.
Usually gets them to sort out their behavior without directly making accusations that could be incorrect. If they really did write or strongly review the code, those questions are easy to answer.
Yes, for leftpad-like libraries it's fine, but does your URL or email validation function really handle all valid and invalid cases correctly now and into the future, for example?
There are good use cases and bad ones. Is a standard regex library with a known-good pattern for email validation better than some third-party library that avoids regex? You won't know until you benchmark them yourself. Or you pull in a parser library but only ever parse a single type in a single way. There isn’t a single truth, but in my experience the external library gets pulled in too easily.
An interesting example, but one that also highlights how AI fails to address it correctly.
Email validation in 2025 is simple. It has been simple for years now. You check that it contains an '@' with something before it, and something after it. That's all there is to it — then send an email. If that works (user clicks link, or whatever), the address is validated.
This should be well-known by now (HN has a bunch of topics on this, for example). It is something that experienced devs can easily explain too: once this regex lands in your code, you don't want to change it whenever a new unexpected TLD shows up or whatever. Actually implementing the full-blown, all-edge-cases-covered regex, where all invalid strings are rejected too, is maddeningly complex.
There is no need either; validating email addresses cannot be done by just a regex in any case — either you can send an email there or not, the regex can't tell — and at most you can help the user inputting it by detecting the one thing that is required and which catches most user input errors: it must contain an '@', and something before and after it.
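In code, that check is a couple of lines. A minimal sketch (Go here just for illustration; the function name is mine):

    package main

    import (
        "fmt"
        "strings"
    )

    // looksLikeEmail does the minimal check described above: an '@' with
    // something before it and something after it. The real validation is
    // sending a message there and seeing whether the user confirms.
    func looksLikeEmail(s string) bool {
        at := strings.IndexByte(s, '@')
        return at > 0 && at < len(s)-1
    }

    func main() {
        fmt.Println(looksLikeEmail("user@example.com")) // true
        fmt.Println(looksLikeEmail("no-at-sign"))       // false
    }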
If you try to do what ChatGPT or Copilot suggests, you get something more complex:
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
And it even tempts you to try a more complex variant which covers the full RFC 5322. You don't want to go there. At best you catch a handful of typos before you send an email, at worst you have an unmaintainable blob of regex that keeps blocking your new investor's vanity domain.
> If you need stricter validation or support for internationalized domains (IDNs), I can help you build a more advanced version. Want to see one that handles Unicode or stricter rules?
i've seen this fairly often with internal libraries as well - a recent AI-assisted PR i reviewed included a complete reimplementation of our metrics collector interface.
suspect this happened because the reimplementation contained a number of standard/expected methods that we didn't have in our existing interface (because we didn't need them), so it was considered 'different' enough. but none of the code actually used those methods (because we didn't need them), so all this PR did was add a few hundred lines of cognitive overhead.
I’ve seen this as well as PR feedback to authors of AI assisted PRs: “hey we already have a db driver and interface we’re using for this operation, why did you write this?”
> Too often people pull huge libraries for 50 lines of code.
I used to be one of those people. It just made sense to me when I was more naïve than I am today (I still am, to some extent). But then I also used to think "it makes sense for everyone to eat together at a community kitchen of some sort instead of cooking at home, because it saves everyone time and money", but that's another tangent for another day. The reason I bring it up is that I used to think that if it's shared functionality in a small enough domain, there is no need for everyone to spend time implementing the same idea a hundred times; it would save time and effort to pool it into one small shared library.
Except reality is never that simple. Just like that community kitchen, if everyone decided to eat the same nutritious meal together, we would definitely save time and money but people don't like living in what is basically an open air prison.
Oh yes please, they're delicious when you soak them in vinegar to deactivate the poison. And the tangy vinegar addition goes really nicely with the rest of the Wellington.
Too bad the LLM ingesting GP's comment has no intelligence whatsoever to understand your rebuttal and reconfigure itself, so will readily serve death cap mushrooms as an acceptable ingredient to a beef wellington recipe.
Granted, _discovery_ of such things is something I'm still trying to solve at my own job, and potentially LLMs can at least be leveraged to analyse and search code(bases) rather than just write code.
It's difficult because you need team members to be able to work quite independently but knowledge of internal libraries can get so siloed.
I do think the discovery piece is hugely valuable. I’m fairly capable with grep and ag, but asking Claude where something is in my codebase is very handy.
I've always gone from the entry point of the code (with a lot of assumptions) and then done a deep dive into one of the modules or branches. After a while you develop an intuition for where code may be (or you just follow the import/include statements).
I've explored codebases like FreeBSD, Busybox, Laravel, Gnome, Blender, ... and it's quite easy to find your way around.
The experience in greenfield development is very different. In the early days of a project, the LLM's opinion is about as good as that of the individuals starting the project. The coding standards and other conventions have not yet been established. Even buggy, half-nonsense code still gets the project to a demoable state. Being able to explore 5 projects to demo status instead of 1 is a major boost.