> It answers questions confidently but with subtle inaccuracies.

This is a valid challenge we are facing as well. However, remember that ChatGPT, which many coders use, is likely training on its interactions, so you have some human reinforcement learning correcting its errors in real time.




How is it trained on reactions? Do people give it feedback? In my experience, I stop asking once it either provides something useful or something so bad that I give up (usually the latter, I'm afraid). How would it tell a successful answer from a failing one?


It appears to ask users to rate whether a regenerated response is better or worse than the first; in other cases, it seems to be A/B testing responses. Lastly, I, for instance, will correct it and then confirm the correction before continuing with the next task, which likely creates a recognizable feedback pattern.
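For what it's worth, those pairwise ratings map fairly directly onto the reward-modeling step described in the InstructGPT paper. A minimal sketch of that kind of preference loss (illustrative names, not OpenAI's actual code):

    # Bradley-Terry-style loss over one rated pair: push the reward
    # model to score the preferred response above the rejected one.
    import torch
    import torch.nn.functional as F

    def preference_loss(r_chosen: torch.Tensor,
                        r_rejected: torch.Tensor) -> torch.Tensor:
        # -log sigmoid(r_chosen - r_rejected), averaged over the batch
        return -F.logsigmoid(r_chosen - r_rejected).mean()

    # e.g. scalar rewards for two responses to the same prompt,
    # where the user preferred the first
    loss = preference_loss(torch.tensor([1.2]), torch.tensor([0.3]))

Thumbs up/down votes and in-chat corrections are noisier signals, but they can plausibly be mined into the same kind of chosen/rejected pairs.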


That's interesting; I haven't come across this.



