> 2. Reviews are not just about the code's potential mechanics, but inferring and comparing the intent and approach of the writer. For LLMs, that ranges between non-existent and schizoid, and writing it yourself skips that cost.
With humans you can be reasonably sure they've followed through with a mostly consistent level of care and thought. LLMs will just outright lie to make their jobs easier in one section while generating high quality code in another.
I've had to do a 'git reset --hard' after trying out Claude Code and spending $20. It always seems great at first, but it just becomes nonsense on larger changes. Maybe chain-of-thought models do better though.
It's like cutting and pasting from Stack Overflow, if SO didn't have a voting system to give you some hope that the top answer at least works and wasn't hallucinated by someone who didn't understand the question.
I asked Gemini for the lyrics of a song that I knew was on all the main lyrics sites. It gave me the lyrics to a different song with the same title. On the second try, it hallucinated a batch of lyrics. Third time, I gave it a link to the correct lyrics, and it "lied" and said it had consulted that page to get it right but gave me another wrong set.
It did manage to find me a decent recipe for chicken salad, but I certainly didn't make it without checking to make sure the ingredients and ratios looked reasonable. I wouldn't use code from one of these things without closely inspecting every line, which makes it a pointless exercise.
I'm pretty sure Gemini (and likely other models too) has been deliberately engineered to avoid outputting exact lyrics, because the LLM labs know that the music industry is extremely litigious.
I'm surprised it didn't outright reject your request to be honest.
I wondered if it'd been banned from looking at those sites. If that's commonly known (I've only started dabbling in this stuff, so I wasn't aware of that), it's interesting that it didn't just tell me it couldn't do that, instead of lying and giving false info.
I did the exact same today!
It started out reasonable, but as you iterate on the commits/PR it becomes complete crap. And expensive too, for very little value.