It's midnight, so I'm not super keen on watching the whole thing (I'll get back to it this weekend) - but the first 7 minutes sound like his argument is: if you build a humanoid robot with a stop button and give it AGI, the robot will fight you to stop you from pressing its own stop button? As if the very first instance of AGI is going to be humanoid robots with the physical means of preventing you from pressing their stop buttons?
Let me get this straight - this is an actual, real, serious argument that they are making?
It's an (over)simplified example to illustrate the point (he admits as much near the end). If you want better examples, it may be worth looking up "corrigibility" in the context of AI alignment.
But abstractly the assumptions are something like this:
* the AGI is an agent
* as an agent, the AGI has a (probably somewhat arbitrary) utility function that it is trying to maximize (probably implicitly)
* for most utility functions, "being turned off" scores very low (once off, the AGI can no longer act to optimize the world)
* therefore, the AGI will try not to be turned off, whether through cooperation, deception, or physical force (a toy sketch of that calculation is below)
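To make the last two bullets concrete, here's a minimal sketch (mine, not from the video) of an expected-utility maximizer deciding whether to let its off-switch be pressed. The `expected_utility` helper and every number in it are made up purely for illustration:

```python
def expected_utility(action: str) -> float:
    # Hypothetical utility estimates for an agent whose only goal is
    # completing its assigned task (all numbers are arbitrary assumptions):
    if action == "allow_shutdown":
        return 0.0                  # switched off => no further goal progress
    elif action == "resist_shutdown":
        p_success = 0.3             # assumed chance it still completes the task
        return p_success * 10.0     # assumed value of the completed task
    raise ValueError(f"unknown action: {action}")

# A maximizer simply picks whichever action scores higher -- here, resisting.
best = max(["allow_shutdown", "resist_shutdown"], key=expected_utility)
print(best)  # -> "resist_shutdown"
```

Unless the utility function explicitly values "letting the human switch me off" at least as much as anything it could do while still running, the comparison comes out this way, which is the whole corrigibility problem in miniature.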
Watch this video https://youtu.be/3TYT1QfdfsM