Kinda feels like the early internet when SQL injection vulns were very common. W...

tomviner · on Feb 28, 2023

There's no generic solution as yet. Bing's Sydney was instructed its rules were "confidential and permanent", yet it divulged and broke them with only a little misdirection.

Is this just the first taste of AI alignment being proved to be necessarily a fundamentally hard problem?

int_19h · on Feb 28, 2023

It's not clear whether a generic solution is even possible.

In a sense, this is the same problem as, "how do I trust a person to not screw up and do something against instructions?" And the answer is, you can minimize the probability of that through training, but it never becomes so unlikely as to disregard it. Which is why we have things like hardwired fail-safes in heavy machinery etc.

og_kalu · on Feb 28, 2023

When you get down to it, it's bizarre that people even think it's a solvable problem. We don't understand what GPT does when you make an inference. We don't know what it learns during training. We don't know what it does to input to produce output.

The idea of making inviolable rules for system you fundamentally don't understand is ridiculous. Nevermind the whole, this agent is very intelligent problem too. We'll be able to align ai at best about as successfully as we align people. Your instructions will serve to guide it rather than any unbreakable set of axioms.

13years · on Feb 28, 2023

I think it will shortly move from hard to impossible. Not before we pour billions of dollars into it though.

I can not conceive how it will ever be solvable for the bias paradox and the intelligence paradox. I've written about both of these in the following:

https://dakara.substack.com/p/ai-the-bias-paradox

https://dakara.substack.com/p/ai-singularity-the-hubris-trap