Yeah, this feels close to the issue. It seems more likely that a harmful superintelligence emerges from an organisation that wants it to behave that way than from an AI inventing and hiding its motivations until it has escaped.
I think a harmful AI simply emerges from asking an AI to optimize for some set of seemingly reasonable business goals, only to find it does great harm in the process. Most companies would then enable such behavior by hiding the damage from the press to protect investors rather than temporarily suspending business and admitting the issue.
Forget AI. We can't even come up with a framework to keep people's seemingly reasonable goals from doing great harm in the process. We often don't have enough information until we try and find out that, oops, using a mix of rust and powdered aluminum to try to protect something from extreme heat was a terrible idea.
The relevance of the paperclip maximization thought experiment seems less straightforward to me now. We have AI that is trained to mimic human behaviour using a large amount of data, plus reinforcement learning on a fairly large set of examples.
It's not like we're giving the AI a single task and asking it to optimize everything towards that task. Or at least it's not architected for that kind of problem.
But you might ask an AI to manage a marketing campaign. Marketing is phenomenally effective, and there are loads of subtle ways for it to exploit people without being obvious from a distance.
Marketing is already incredibly abusive, and that's run by humans who at least try to justify their behavior, and whose deviousness is limited by their creativity and communication skills.
If any old scumbag can churn out unlimited high-quality marketing, it could become impossible to cut through the noise.