Hacker News new | past | comments | ask | show | jobs | submit login

I think this is mostly the fault of RLHF over-indexing on pleasing the user rather than being right.

You can system prompt them to mitigate this to some degree. Explicitly tell it that it is the coding expert and to push back if it thinks the user is wrong or the task is flawed, it is better to be unsure than to bullshit, etc.




This is surprisingly hard to mitigate with system prompts because not being opinionated is ingrained so deeply in (presumably) post-training




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: