
I think "unlearning" is not the actual goal; we don't want the model to stick its proverbial head in the sand. Being unaware of racism is different from not producing racist content (and, in fact, one could argue that it is necessary to know about racism if one wishes to inhibit producing racist content; I remember in elementary school certain kids thought it would be funny to teach one of the special-ed kids to parrot offensive sentences).


Say you tell me you want a red sphere. Taken at face value, you show a prejudice for red spheres and discriminate against all other coloured shapes.

We've all had to dance that dance with ChatGPT by now: you ask for something perfectly ordinary, but receive a response telling you off for even daring to think like that. Eventually you manage to formulate the prompt in a way it likes, with just the right context and winning vocabulary and grammar, and the damned thing finally gives you the info you want without so much as a snarky insult or a bit of gaslighting hiding in the answer.

It doesn't understand racism; it simply evaluates certain combinations of things according to how it was set up to.



