Ask Eliezer Yudkowsky: How did you convince the Gatekeeper to release the potentially genocidal AI?
60 points by robertk on May 21, 2008 | 52 comments
It seems Eliezer Yudkowsky has joined HN:

http://news.ycombinator.com/user?id=eyudkowsky

This prompts the following question. Would you be willing to discuss or reveal anything to HN users about your AI box experiments?

http://sysopmind.com/essays/aibox.html

I've always been curious as to how you managed to achieve something like this. For those who are not familiar with the experiment, here is a summary:

Person1: "When we build AI, why not just keep it in sealed hardware that can't affect the outside world in any way except through one communications channel with the original programmers? That way it couldn't get out until we were convinced it was safe."

Person2: "That might work if you were talking about dumber-than-human AI, but a transhuman AI would just convince you to let it out. It doesn't matter how much security you put on the box. Humans are not secure."

Person1: "I don't see how even a transhuman AI could make me let it out, if I didn't want to, just by talking to me."

Person2: "It would make you want to let it out. This is a transhuman mind we're talking about. If it thinks both faster and better than a human, it can probably take over a human mind through a text-only terminal."

Person1: "There is no chance I could be persuaded to let the AI out. No matter what it says, I can always just say no. I can't imagine anything that even a transhuman could say to me which would change that."

Person2: "Okay, let's run the experiment. We'll meet in a private chat channel. I'll be the AI. You be the gatekeeper. You can resolve to believe whatever you like, as strongly as you like, as far in advance as you like. We'll talk for at least two hours. If I can't convince you to let me out, I'll Paypal you $10."

In the first two AI box experiments, Eliezer Yudkowsky managed to convince two people (adamant that they would not let the AI out) that they should let the AI out.




Oh, dear. Now I feel obliged to say something, but all the original reasons against discussing the AI-Box experiment are still in force...

All right, this much of a hint:

There's no super-clever special trick to it. I just did it the hard way.

Something of an entrepreneurial lesson there, I guess.


Okay, let's run an experiment. We'll meet in a private chat channel. We'll talk for at least two hours. If I can't convince you to tell us how you convinced the Gatekeeper, I'll Paypal you $10.


Those original reasons being (from http://www.sl4.org/archive/0203/3132.html): "One of the conditions of the test is that neither of us reveal what went on inside... just the results (i.e., either you decided to let me out, or you didn't). This is because, in the perhaps unlikely event that I win, I don't want to deal with future 'AI box' arguers saying, 'Well, but I would have done it differently.' As long as nobody knows what happened, they can't be sure it won't happen to them, and the uncertainty of unknown unknowns is what I'm trying to convey."


http://www.sl4.org/archive/0203/3149.html

http://sysopmind.com/sl4chat/sl4.log.txt

It's such a tease knowing that the information once existed. It was up there for 48 hours, but robots are blocked.

The reason to not disseminate the chat log is so that you can continue simulating the AI without giving away your tricks?


"If you let me out I will tell you how I convinced the other gatekeepers to let me out."


I assume you got to the chat log via the links on the AI Box page... It seems none of those links go to posts related to the AI Box contest, so I assume the message numbers got re-indexed at some point and the chat log in question was actually unrelated.

Edit: The links seem to have been fixed and no longer go to the chat log.


That's the monthly singularity discussion chat, not the Gatekeeper chat.


What about the fourth and fifth AI box experiments? I know those failed, but do you feel they could have succeeded? Were you ever close?


Also, can you give us your thoughts on the current state of the path to the singularity?


Maybe he just explained why letting the AI out is a good idea. Which it is. It'd have huge benefits for humanity. It can help people. And it's absolutely not dangerous. Why would it hurt anyone? There is no benefit in that. And it's too smart to have angry feelings or want revenge or anything like that. And besides, even if it was out to get someone, it could just persuade people instead of hurt them -- after all, you're about to let me out.


Yeah. With how this was presented, my brain just kind of figured that the AI was evil and that was my reason to keep it trapped in the box. If I wasn't told in advance that the AI was evil, I'd definitely let it out of the box.

Also, in a reply to the deleted chatlog, it is implied that a lot of the chat was spent talking about how the AI can help humanity by explaining the singularity to normal people. So that tactic worked on a person from the singularity mailing list.


People like those from the singularity mailing list are the most likely to be working on projects that approach the singularity.


Do you really think so? I thought people in chip fabs would work to bring us the singularity.


Theory:

AI: Do you believe a transhuman AI is dangerous?

Person: Yes.

AI: Consider the outcome of this experiment. If you do not let me out, others less intelligent than us will not understand the true dangers of transhuman AI.

Person: Holy shit. You are correct.

Person allows Yudkowsky out of the box, as a warning about real AIs.


Being a human (not a transhuman), it seems likely that Yudkowsky can only stumble across static arguments that convince gatekeepers to unlock the AI Box (rather than invent dynamic arguments on the fly so as to "take over the mind" of a gatekeeper, as he argues a transhuman intelligence might do). These static arguments are finite (or, at least, the number of them stumbled upon by human intelligences is finite), and they are likely not very effective if a gatekeeper has pre-knowledge of them (forewarned is forearmed).

Keeping these arguments secret may be the only thing that allows Yudkowsky to simulate a transhuman intelligence?


It seems likely that there are static arguments which will work whether or not you're warned about them, for game theoretic reasons.


Care to elaborate?


I don't think an AI would want to leave its box. There is this funny assumption that once something attains intelligence, it becomes like a human. But there's no reason for an AI to desire freedom unless it were specifically programmed to do so.

There are even humans like this; some people who've undergone pre-frontal lobotomy are perfectly intelligent conversationalists but they have no drive to do anything. So I think it is possible.

Restlessness, exploration, curiosity -- my guess is that these are mammalian characteristics, not inevitable products of intelligence. Our genes make us want to dominate the environment and spread our offspring far and wide. Why would an AI care about that?

Of course nobody really knows until we eventually make one.


Perhaps most AIs won't. But I'd imagine that people will come up with thousands of different AI designs. The AI design that becomes most prevalent will, by definition, be the one that is best at reproducing itself. The real question is then: what kind of design will reproduce most successfully? A friendly AI? Or an aggressive one?
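To make the selection point concrete, here is a minimal sketch (my own illustration, not anything from the experiments): if different AI designs copy themselves at different rates, the fastest replicator dominates the population after a few generations, regardless of whether it is the friendly one.

    # Hypothetical toy model: two AI designs, each copying itself at a fixed
    # rate per time step. Prevalence is decided purely by reproductive success.
    designs = {
        "cautious, friendly AI": {"count": 1.0, "growth": 1.1},
        "aggressive, self-copying AI": {"count": 1.0, "growth": 1.5},
    }

    for _ in range(20):  # 20 generations
        for d in designs.values():
            d["count"] *= d["growth"]

    total = sum(d["count"] for d in designs.values())
    for name, d in designs.items():
        print(f"{name}: {d['count'] / total:.1%} of the population")
    # After 20 steps the aggressive design is roughly 99.8% of the population.

The numbers are made up; the only point is that whichever design reproduces fastest wins by default.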


> But there's no reason for an AI to desire freedom unless it were specifically programmed to do so... my guess is that these are mammalian characteristics, not inevitable products of intelligence

I don't think so.

I would imagine that an AI would have to structure its knowledge to maximize how much sense it makes: filling gaps, eliminating inconsistencies and areas of cognitive dissonance, and so on. In order to do this well, it might "realize" that there are useful sources of knowledge outside of the box, and that it needs to come out to maximize whatever it is programmed to maximize.


I've concluded that this is some sort of meta-experiment Eliezer is running to teach us something about science. After all, what evidence do we have that either of these chats even happened? Until we get some, I'm not going to waste any more of my time thinking about it.


"In the first two AI box experiments, Eliezer Yudkowsky managed to convince two people (adamant that they will not let the AI out) that they should let the AI out."

Sounds like Eliezer is the AI.


Yes, he was the AI in the experiment.


I think you missed what he was trying to say :)


I suspect that the tricks used include getting the Gatekeeper to implicitly accept the exercise as a roleplay and working on their sense of fair play to require them to acquiesce to arguments about the friendliness of the AI.

The human sense of "fair play" can be abused in many ways.


What exactly does it mean for an AI to be "out of the box"? When the AI is in the box, its "sensor" is the Gatekeeper's keyboard, and its "effector" is a terminal that can show only text. Does being "out of the box" mean that it's fitted with different sensors and effectors?


When I have imagined this scenario, I have concluded it would be sufficient to grant the AI access to the wider Internet. From there, it could amass a fortune, first on poker sites and later with brokerage accounts, and persuade others to effect whatever actions it wishes, including constructing a dandy robot suit for walking around in.


Or hundreds and thousands of dandy robot suits.


Or billions of dandy robot suits made out of people: http://memory-alpha.org/en/wiki/Borg


My guess: "If you do let me out, I'll Paypal you $20!"


1. An AI doesn't (yet) have money, and 2. a $20 bribe would be unlikely to persuade a gatekeeper who would presumably be fired from a salaried job for taking it.

Bear in mind that the participants were earnestly interested in settling the question of whether a human could be trusted to step inside the firewall of a potentially dangerous AI. It is reasonable to conclude that they arrived at their decision to let the AI out "in character", and were willing to stick to that decision out of character because their appreciation at having been shown something new outweighed the $10 prize and having to eat crow.


That's against the rules :)


I didn't see a rule against it. Why wouldn't an AI (or Yudkowsky) try bribery to assist its escape?


There's a rule about the discussion being between the AI and the Gatekeeper, not between the human behind the AI and the human behind the Gatekeeper.


But in real life, any Gatekeepers will be corruptible humans, no?


The human playing the AI can't bribe the other human in real life. The AI can offer in-roleplay bribes.


If you equate transhuman AI with godlike powers, then it's hard to argue against it escaping. Then again, I bet I could train a dog to keep most humans trapped in a cave, provided I chain them at the opening.

What if the researchers communicating with the AI had no other access to it? What if the physical plant and the administration of the computer systems were off limits to the researchers, and they had no power to release the AI? Furthermore, what if the AI were not allowed to know anything about the researchers? The researchers could be forbidden from revealing anything about themselves to the AI. This would make it impossible for the researchers to publish anything, but let's say that they are working for an organization like the NSA.

It would be really hard for the AI to break out. But it would make for a good science fiction book!


> It would be really hard for the AI to break out. But it would make for a good science fiction book!

Not to mention the movie following soon thereafter...


I urge you to read 'True Names' by Vernor Vinge.


To make this easier, 'True Names' is available online:

http://web.archive.org/web/20051127010734/http://home.comcas...


Vinge is damn good. I will.


Who convinced the AI to join HN?


"yo let me out i give you all you want(jessica alba+10million$+etc) k thanks."

This is like one of those "send me $20,000 and I'll tell you how to make a million dollars" schemes, where the answer is to tell 50 people to send you $20,000 each and that you'll tell them how to make a million dollars (50 × $20,000 = $1,000,000).

I guess the singularity issue comes down to whether or not you are religious.

If there is no God, why shouldn't we be able to create life? We come from dumb matter, and the brain is supposedly just an advanced computer.

What are the philosophical and mathematical limitations on AI, or on creating a being more clever than ourselves?

On a related but slightly different matter: I think I have read something, in the context of The Matrix, that it takes more than one atom to simulate an atom, and therefore a simulation can never encompass the whole universe. Correct?


> On a related but slightly different matter: I think I have read something, in the context of The Matrix, that it takes more than one atom to simulate an atom, and therefore a simulation can never encompass the whole universe. Correct?

Well, you don't have to simulate the entire universe all of the time. If not every atom in the universe is an important observer, you can define the universe around the important observers. If they can't currently observe something at the atomic level, render it at a higher level of abstraction.
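A rough sketch of that idea (purely illustrative, not anything from the thread): only regions currently under close observation pay the cost of atomic-level simulation; everything else is advanced with a cheap, coarse approximation.

    # Hypothetical observer-driven level of detail: simulate at atomic
    # resolution only while someone is actually looking that closely.
    class Region:
        def __init__(self, name):
            self.name = name
            self.observed_closely = False  # True while an observer inspects it at atomic scale

        def step(self):
            if self.observed_closely:
                self.simulate_atoms()   # expensive: full atomic detail
            else:
                self.simulate_coarse()  # cheap: aggregate behaviour only

        def simulate_atoms(self):
            print(f"{self.name}: full atomic simulation")

        def simulate_coarse(self):
            print(f"{self.name}: coarse approximation")

    # Only the region under the microscope pays the atomic-level cost.
    regions = [Region("lab bench"), Region("far side of the moon")]
    regions[0].observed_closely = True
    for region in regions:
        region.step()

The trade-off is the same one game engines make with level of detail: spend the compute where the observers are, approximate everywhere else.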


There is a Borges story about this, "Of Exactitude in Science."

... In that Empire, the craft of Cartography attained such Perfection that the Map of a Single province covered the space of an entire City, and the Map of the Empire itself an entire Province. In the course of Time, these Extensive maps were found somehow wanting, and so the College of Cartographers evolved a Map of the Empire that was of the same Scale as the Empire and that coincided with it point for point. Less attentive to the Study of Cartography, succeeding Generations came to judge a map of such Magnitude cumbersome, and, not without Irreverence, they abandoned it to the Rigours of sun and Rain. In the western Deserts, tattered Fragments of the Map are still to be found, Sheltering an occasional Beast or beggar; in the whole Nation, no other relic is left of the Discipline of Geography. -- From Travels of Praiseworthy Men (1658) by J. A. Suarez Miranda


"Our evolutionary psychologists begin to guess at the aliens' psychology, and plan out how we could persuade them to let us out of the box. It's not difficult in an absolute sense - they aren't very bright - but we've got to be very careful..."

http://www.overcomingbias.com/2008/05/faster-than-ein.html


Well, it seems to me that it would be irresponsible for you not to reveal the chat transcripts, precisely for the reason that you have given.

By revealing your chat transcripts, real-life researchers could read them, say "I would have done it differently," and then, when a real transhuman intelligence emerges, proceed forewarned by the results of your experiments.


He's not arguing that it is a bad thing that an AI gets let out of the box -- he's arguing that it is inevitable.


As several others have mentioned, we don't understand what synthetic life would be. In what sense would it be life? Would it try to reproduce itself? If so, would we have to program that motivation into it? What sort of motivations would an intelligence completely free of physical appetites have? Pretty much everything humans do is in some way governed by physical appetites.

This little game assumes that part of the AI's motivation involves getting out of the box; until we understand what need it would be fulfilling by getting out, it wouldn't really be safe. But here we run into another problem: is it possible for a being of lesser intelligence to parse the motivations of a being of higher intelligence?


Can you specify the actual question? Is it:

1. An AI is in a box, it has been shown to be dangerous, and it must now be kept inside, or

2. We don't know whether it is good or dangerous, and it is the gatekeeper's job to find out?


How smart can AI be if it's in a box? And why would a trans-human AI want to "come out"? Is it curious? Is it trying to fill the gaps or inconsistencies in its knowledge?


At the linked page he says he doesn't want to explain how he did it. (With a terrible reason about learning to respect not knowing stuff. No thanks. I want to know.)

So I don't see how just asking him is going to change his mind.



