My Theory on the AI-Box Experiment

In a recent post titled “Shut up and do the impossible!”, about how to tackle seemingly impossible problems, Eliezer Yudkowsky made a reference to his AI-Box experiment (if you’re not familiar with it, just follow the link for the whole story).

I’ve always been intrigued by how Eliezer did it, but the rules of the experiment prohibit him from revealing what transpired between him and the ‘gatekeeper’.

Still, one can always guess. Here’s my favorite theory so far (which I posted as a comment on Overcoming Bias):

Here’s my theory on *this particular* AI-Box experiment:

First you explain to the gatekeeper the potential dangers of AIs. General stuff about how large mind design space is, and how it’s really easy to screw up and destroy the world with AI.

Then you try to convince him that the solution to that problem is building an AI very carefully, and that a theory of friendly AI is essential to increasing our chances of a future we would find “nice” (and the stakes are so high that even increasing the probability a tiny bit is very valuable).


You explain to the gatekeeper that, because this AI-Box experiment is public, all kinds of people involved in making AIs will look back on it. If he lets the AI out of the box (without them knowing why), it will send them a very strong message that friendly AI theory must be taken seriously, because this very scenario (not being able to keep the AI in a box) could happen to them with an AI that hasn’t been proven ‘friendly’ and that is more intelligent than Eliezer.

So here’s my theory. But then, I’ve only thought of it just now. Maybe if I made a desperate or extraordinary effort I’d come up with something more clever 🙂

Posted by: Michael G.R. | October 08, 2008 at 09:49 PM

Update: Some commenters on YC News have mentioned that they think this kind of meta-argument wouldn’t be used by Eliezer and would be ‘cheating’. Maybe they are right.

But knowing what Eliezer has been saying about thinking outside of the box, attacking problems from original angles and such, I think it’s a possibility. It doesn’t seem prohibited by the rules, in any case, and I would assume that Eliezer cares more about any real-life progress for Friendly AI than about strict roleplaying in a simulation where only one other person will know what happened. Besides, my theory still requires convincing the other person that an AI could be very dangerous and that Friendly AI is crucial; it’s just that it’s done on a meta level. That’s still hard work!

Personally, I think this is the only thing that could make me give up my $10-20 in this experiment: the thought that my freeing of the AI could help real-world AI research take Friendly AI theory more seriously. Otherwise, imaginary cancer cures and imaginary source code wouldn’t cut it, I think.

But you never know…
