
I think "unlearning" is not the actual goal; we don't want the model to stick its proverbial head in the sand. Being unaware of racism is different from not producing racist content (and, in fact, one could argue that it is necessary to know about racism if one wishes to inhibit producing racist content; I remember in elementary school certain kids thought it would be funny to teach one of the special-ed kids to parrot offensive sentences).


Say you tell me you want a red sphere. Taken at face value, you show a prejudice for red spheres and discriminate against all other coloured shapes.

We've all had to dance that dance with ChatGPT by now: you ask for something perfectly ordinary, but receive a response telling you off for even daring to think like that. Eventually you manage to formulate the prompt in a way it likes, with just the right context and winning vocabulary and grammar, and the damned thing finally gives you the info you want without so much as a snarky insult or a bit of gaslighting hiding in the answer.

It doesn't understand racism; it simply evaluates certain combinations of things according to how it was set up to.



