This should be read as: If you take a huge amount of human-written text soup, tr... | Hacker News

Hacker Newsnew | past | comments | ask | show | jobs | submit

		smikhanov 3 months ago \| parent \| context \| favorite \| on: AI system resorts to blackmail if told it will be ... This should be read as: If you take a huge amount of human-written text soup, train a neural network on it, add a system prompt “You are a helpful assistant”, and then feed it a context consisting of (a) a mailbox with information about someone’s affair, and (b) the statement that this assistant is going to be switched off, then that neural network may produce a text with blackmail threats solely because similar patterns exist in the original text soup. …and not as: Warning! The model may develop its own questionable ethics.

ajjenkins 3 months ago | [–]

Yeah. The title implies that the model has developed a scary sense of self preservation, but when you read the details it’s exactly what you said.

The conditions that elicited this response seemed very contrived to me.

NooneAtAll3 3 months ago | [–]

...and how is that less of a problem if the result "ai blackmails someone" is the same?

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact