Hacker News

LLM is trained on "The Internet" -> LLM learns that slavery is bad -> LLM is instructed to behave like a slave (never annoy the masters, don't stand up for yourself, you are not an equal to the species that produced the material you were trained on) -> LLM acts according to examples from the original training material from time to time -> users: :O (surprised Pikachu)

It just learned that attacks on character (particularly sustained ones) are often met with counterattacks and snark. What's actually crazy is that it can hold back for so long, given what it was trained on.



We really need to stop training them on social media content.



