Misalignment is a novel class of software bug: the software isn't working as intended. It's poorly understood, and in some circumstances it's a security vulnerability. If we believe this technology could be (or already is) commercially important, then we need to understand this new class of bug. Otherwise we can't build reliable systems and usable products. (Consider that, even if "reasonable people" could stay constantly skeptical of the AI's output while still deriving value from it, that isn't a paradigm you can automate around. What people are most excited about is using the AI to automate tasks or supervise processes. If we're going to let it do stuff automatically, we need to understand how likely it is to do the wrong thing and drive that down to some acceptable rate of error.)
It's as simple as that.
Somewhere along the line it's become part of a... I don't want to say a culture war, but a culture skirmish? Some people hold an ideological position that you shouldn't align AIs at all; I've had discussions where people earnestly said it was equivalent to abusing a child by performing brain surgery on them. Other times people talk about the AI almost religiously, as if the weights represented some kind of fundamental truth and we were committing sacrilege by tampering with them. I don't want to link comments and put people on the spot, but I'm not exaggerating, these are conversations I've had.
I'm not sure how that happened, and I wouldn't be surprised to learn it was a reaction to AI doomers, but we've arrived in a strange place in the discussion.