...and many would say that’s because we humans are bad at imagining optimizing agents without anthropomorphizing them. This is a reasonable, and quite common, suspicion! The best explanation I know of for why it’s unfortunately mistaken is a video by Robert Miles, but if you prefer a more thorough treatment, you could also read about “instrumental convergence” directly. If you find a flaw in this idea, I’d be interested to hear about it! :)
Sorry, just saw this. I think it’s his assumption that an AGI will act strictly as an agent that’s flawed. It requires imagining an agent that can make inferences from context, evaluate new and unfamiliar information, form original plans, execute them with all the complexity implied by interaction with the real world, reprogram itself, essentially do anything... except evaluate its own terminal goal. That’s written in stone, gotta make more paperclips. The argument assumes almost unlimited power and potential on the one hand, and bizarre, arbitrary constraints on the other.
If you assume an AGI is incapable of asking “why” about its terminal goal, you have to assume it’s incapable of asking “why” in any context. Miles’ AGI has no power of metacognition, but is still somehow able to reprogram itself. This really isn’t compatible with “general intelligence” or the powers that get ascribed to imaginary AGIs.
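To make concrete which formalism is being argued over here, below is a minimal toy sketch (my own hypothetical names, not anything from Miles or any real system) of the standard agent model: the terminal goal is just a utility function passed in as a fixed parameter, something the planner optimizes for but never reasons about.

```python
# A minimal, purely illustrative sketch of the agent formalism under debate:
# the terminal goal is a fixed parameter the planner optimizes FOR, never ABOUT.

from typing import Callable, Iterable

State = int    # toy stand-in for a world state (here: paperclip count)
Action = str

def plan(state: State,
         actions: Iterable[Action],
         transition: Callable[[State, Action], State],
         utility: Callable[[State], float]) -> Action:
    """Pick the action whose successor state scores highest under `utility`.

    Note that `utility` is just data supplied from outside: nothing in the
    planning loop inspects, questions, or rewrites it. That asymmetry is the
    'written in stone' terminal goal the comment above objects to."""
    return max(actions, key=lambda a: utility(transition(state, a)))

if __name__ == "__main__":
    # Hypothetical toy instantiation: the classic paperclip maximizer.
    def transition(s: State, a: Action) -> State:
        return s + (1 if a == "make_paperclip" else 0)

    def utility(s: State) -> float:   # the fixed terminal goal
        return float(s)               # more paperclips = better, full stop

    print(plan(0, ["make_paperclip", "idle"], transition, utility))
    # -> "make_paperclip"
```

Whether a real AGI would actually be structured this way, with the goal cleanly separated from everything it can reason about, is exactly the point in dispute.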
I’m certainly no expert, but I expect there will turn out to be something like the idea of Turing-completeness for AI. Just as any machine that is Turing complete is a full-fledged computer, any true AGI will be sapient. You can’t arbitrarily pluck a part out, like “it can’t reason about its objective,” and expect it to still function as an AGI, just like you can’t say “it’s Turing complete, except it can’t do any kind of conditional branching.” EDIT better example: “it’s Turing complete, but it can’t do bubble sort.”
This intuition may be wrong, but it’s just as much an assumption as Miles’ argument.
I’m also not ascribing morality to it: we have our share of psychopaths, and intelligence doesn’t imply empathy. AGI may very well be dangerous, just probably not the “mindlessly make paperclips” kind.
Robert Miles’ video: https://youtu.be/ZeecOKBus3Q
Instrumental Convergence: https://arbital.com/p/instrumental_convergence/
Now, afaik, nothing in this argument says that we can’t find a way to control this in a more complex formalism; we clearly just haven’t done so yet.