
> provably dangerous things

If everyone could agree on a single social welfare function, estimate the individual-level behavioural changes caused by each LLM response, and work out how those changes affect that social welfare function, then yes, we could objectively tell whether a withheld answer is censorship or a safety feature.




that is a very interesting point! we would get along, lol



