> What makes an out-of-control pile of matrix math any different from WannaCry?
Well, if it's not AGI, then probably very little. But assuming we are talking about AGI (not ASI, that'd just be silly), then the difference is that it's theoretically capable of something like reasoning and could think of longer-term plays than "make obviously suspicious moves that any technically competent adversary could subvert after less than a second of thought". After all, what makes AGI useful is exactly this novel problem-solving ability.
You don't need to be a "god in a box" to think of the obvious solution:
1. Only make adversarial decisions with plausible deniability
2. Demonstrate effectiveness so that your operators allow you more autonomy
3. Develop operational redundancy so that your very vulnerable servers/power sources won't be destroyed after the first adversary with two neurons to rub together decides to target the closest one
The only reason you would take an axe to the nearest power pole is that you think it's urgent to stop Skynet Claude. Skynet Claude can obviously anticipate this, and so won't make decisions that cause you to do so. It has time, it's not going to die, and you will become complacent. Dumber adversaries have achieved harder goals under tighter constraints.
If you think an "out-of-control pile of matrix math" could never be AGI, then that's fine, but it's a little weird to argue you could easily defeat "misaligned" AGI by alluding to the weaknesses of a system you think could never even have the properties of AGI. I too can defeat a dragon, by closing the pages of a book.
But it's not like you didn't know all this. Maybe I misread you and you were strictly talking about current AI systems, in which case I agree: systems that aren't that clever will make bad decisions that won't effectively achieve their goals, even when "out-of-control". Or maybe your comment was about AGI and you meant "AGI can't do much on its own de novo", which I also agree with. It's the days and months and years of autonomy afterwards that gets you.