Not directly related to the SSL part, but I have a question about how the work on AI is going.
- Why are we even trying to create an AGI when we have no idea how it will turn out?
- AI in day-to-day functions can be useful, and I'm all for that, but what is the point of even trying to give the AI self-awareness?
So it can handle the strategy of making sure no one else makes one which is more harmful. :P
But as a serious answer: of course, there is no major economic use in trying to make sure it has an internal subjective experience, but that's not what people are aiming for when they refer to AGI.
The goal is that it be able to accomplish general tasks and goals.
Like, what can you do with it? Everything you can do at all.
That’s what people are aiming at.
What tasks of reasoning and planning could a person do for you? An AGI would, basically by definition, be able to do those same kinds of things.
(Of course, something could be an AGI but not as intelligent as a typical human, so long as it could still reason about all the same kinds of things. But at that point it would presumably just be a question of scaling things up.)
AGI definitions are incredibly fuzzy, and agency seems to be reliably confused with intelligence.
The fundamental problem is whether AGI is outer-directed or inner-directed - i.e. whether you tell it what to do and it improvises a solution, or whether it sets its own goals.
AGI is most useful when it's outer-directed with limited agency but some improvisational autonomy. You can give that kind of AGI specific problems and it will solve them in useful but unexpected ways. Then it will stop.
AGI is most dangerous when it's inner-directed with full independent agency. Not only will humans have no control over it, but there's a good chance humans won't even understand what it's aiming for.
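To make the contrast concrete, here's a deliberately toy Python sketch (my own illustration with hypothetical class and method names, not a description of any real system): the outer-directed agent takes a task from outside, improvises a plan, executes it, and halts; the inner-directed agent picks its own goals in an open-ended loop with no natural stopping point.

```python
# Toy sketch only - all names and the planning/acting stubs are hypothetical.

class OuterDirectedAgent:
    """Given a task by a human, improvises a plan, executes it, then stops."""
    def run(self, task):
        plan = self.improvise_plan(task)   # the useful-but-unexpected solutions live here
        for step in plan:
            self.act(step)
        return "done"                      # halts; no further goals of its own

    def improvise_plan(self, task):
        return [f"step toward: {task}"]    # placeholder planning

    def act(self, step):
        print(step)


class InnerDirectedAgent:
    """Generates its own goals (e.g. from a programmed 'curiosity' signal)."""
    def run_forever(self):
        while True:                        # no natural stopping point
            goal = self.pick_own_goal()    # humans may not understand this choice
            for step in self.improvise_plan(goal):
                self.act(step)

    def pick_own_goal(self):
        return "whatever looks most novel right now"   # placeholder 'curiosity'

    def improvise_plan(self, goal):
        return [f"step toward: {goal}"]

    def act(self, step):
        print(step)
```

The point of the sketch is just the control flow: one loop is bounded by an externally supplied task, the other is open-ended and driven by an internal goal generator.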
Agency is almost entirely unrelated to symbolic intelligence. You can have agency with very limited intelligence - most animals manage this - and no agency with very high symbolic intelligence.
This is not a Boolean. But there will be a cutoff beyond which inner-directed behaviour predominates, initially driven by programmed "curiosity", leading to unpredictable consequences.
Can you elaborate on the outer-directed vs inner-directed distinction (or link to something else which does, if that would be more convenient)?
I'm not quite sure what you mean by "sets its own goals" (for "inner-directed").
I assume you don't mean "modifies its own goals" (as, why would that help further its current goals?), but I'm not sure what else it would mean.
Maybe you mean, like, if it acquires goals and preferences in the way that humans do, with shifting likes and dislikes that aren't consistent across time? Yes, that would certainly be quite dangerous (unless the AGI were achieved via, like, just emulating a human's mind, in which case it might only be somewhat dangerous).
I believe I see your point about the safety benefit of having it receive a specific short-term task, achieve the task, and then stop. Once it stops, it isn't doing things anymore, and therefore isn't causing more problems. But it seems like it might be difficult to define that precisely? Like, suppose it takes some actions for a period of time, but complicated and anticipated-by-it-but-not-by-us consequences of its actions continue substantially after it has stopped?