
> So, AIs are overeager junior developers at best, and not the magical programmer replacements they are advertised as.

This may be a quick quip or a rant. But the things we say have a way of reinforcing how we think. So I suggest refining until what we say cuts to the core of the matter. The claim above is a false dichotomy. Let's put aside advertisements and hype. Trying to map between AI capabilities and human ones is complicated. There is high quality writing on this to be found. I recommend reading literature reviews on evals.



[flagged]


Don’t be a dismissive dick; that’s not appropriate for this forum. The above post is clearly trying to engage thoughtfully and offers genuinely good advice.


The above post produces some vague philosophical statements, and equally vague "just google it" claims.


I'm thinking you might be the kind of person who requires very direct feedback. Your flagged comment was unkind and unhelpful. Your follow-up response seems to suggest that you feel justified in being rude?

You also mischaracterize my comment two levels up. It didn’t wave you away by saying “just google it”. It said — perhaps not directly enough — that your comment was off track and gave you some ideas to consider and directions to explore.


> There is high quality writing on this to be found. I recommend reading literature reviews on evals.

This is, quite literally, "just google it".

And yes, I prefer direct feedback, not vague philosophical and pseudo-philosophical statements and vague references. I'm sure there's high quality writing to be found on this, too.


We have very different ideas of what "literal" means. You _interpreted_ what I wrote as "just Google it". I didn't say those words verbatim _nor_ do I mean that. Use a search engine if you want to find some high-quality papers. Or use Google Scholar. Or go straight to Arxiv. Or ask people on a forum.

> not vague philosophical and pseudo-philosophical statements and vague references

If you stop being so uncharitable, more people might be inclined to engage you. Try to interpret what I wrote as constructive criticism.

Shall we get back to the object level? You wrote:

> AIs are overeager junior developers at best

Again, I'm saying this isn't a good framing. I'm asking you to consider you might be wrong. You don't need to hunker down. You don't need to counter-attack. Instead, you could do more reading and research.


> We have very different ideas of what "literal" means. You _interpreted_ what I wrote as "just Google it". I didn't say those words verbatim _nor_ do I mean that. Use a search engine if you want to find some high-quality papers. Or use Google Scholar. Or go straight to Arxiv. Or ask people on a forum.

A.k.a. "I will make some vague references to the literature; go Google it".

> Instead, you could do more reading and research.

Instead of a vague "just google it" and vague ad hominems, you could actually provide constructive feedback.


Want to try a conversational reset? I'll start.

My disagreement with the claim "AIs are overeager junior developers at best" largely has to do with understanding what is happening under the hood, as well as personal experience. Like many people, I have interacted for thousands of hours with ChatGPT, Claude, Gemini, and others, though my interaction patterns may be unusual -- not sure. I would characterize them as: (a) set expectations with a detailed prelude; (b) frame problems carefully; (c) trust nothing; (d) push back relentlessly; (e) require 'thinking out loud'; (f) resist bundled solutions; (g) actively guide design and problem-solving dialogues; (h) actively mitigate sycophancy, overconfidence, and hallucination.
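To make pattern (a) concrete: here is a hypothetical sketch of the kind of prelude I mean, written as plain string composition in Python. The function name, wording, and rule list are my own invention for illustration -- this is not any vendor's API, and no model is called here:

```python
# Sketch: compose a session prelude that encodes the interaction
# patterns above as explicit ground rules. Pure string building.

EXPECTATIONS = [
    "State your assumptions before answering.",           # (a) set expectations
    "Restate the problem in your own words first.",       # (b) frame carefully
    "Flag anything you cannot verify.",                   # (c) trust nothing
    "Expect pushback; defend or revise, don't appease.",  # (d)/(h) anti-sycophancy
    "Show your reasoning step by step.",                  # (e) think out loud
    "Offer options, not one bundled solution.",           # (f) resist bundling
]

def build_prelude(task: str) -> str:
    """Frame the task and set ground rules before the dialogue starts."""
    rules = "\n".join(f"- {rule}" for rule in EXPECTATIONS)
    return f"Ground rules for this session:\n{rules}\n\nTask: {task}"
```

For example, `build_prelude("Review this diff for concurrency bugs")` yields a prompt that front-loads the expectations before the task itself, which is the whole point of the prelude.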

I've guided some junior / less experienced developers using many of the same patterns above. More or less, they can be summarized as "be more methodical". While I've found considerable variation in the quality of responses from LLMs, I would not characterize this variation as being anywhere close to that of a junior developer. I grant that I adjust my interaction patterns considerably to improve the quality of the experience.

LLMs vary across dimensions of intelligence and capability. Here's my current assessment -- somewhat off the cuff, but I have put thought into it: (1) LLM recall is superhuman. (2) Contextual awareness is mixed, sometimes unpredictably bad. Getting sufficient context is hard, but IMO this is less a failure of the LLM or RAG than a consequence of its lack of embodiment in a particular work setting. (3) Speed is generally superhuman. (4) Synthesis is often superhuman. (5) Ready-to-go, high-quality, all-in-one software solutions are not there yet. (6) Failure modes are painful; e.g. going in circles or waffling.

I should also ask: what do you mean by "overeager"? I would guess you are referring to the tendency of many LLMs to offer solutions to problems despite lacking a way to validate their answers, perhaps even hallucinating API calls that don't exist?



