An AI box is a hypothetical isolated computer hardware system where a possibly dangerous artificial intelligence, or AI, is kept constrained in a "virtual prison" and not allowed to manipulate events in the external world. Such a box would be restricted to minimalist communication channels. Unfortunately, even if the box is well-designed, a sufficiently intelligent AI may nevertheless be able to persuade or trick its human keepers into releasing it, or otherwise be able to "hack" its way out of the box.
Some hypothetical intelligence technologies, like "seed AI", are postulated such as to have the potential to make themselves faster and more intelligent, by modifying their source code. These improvements would make further improvements possible, which would in turn make further improvements possible, and so on, leading to a sudden intelligence explosion. Following such an intelligence explosion, an unrestricted superintelligent AI could, if its goals differed from mankind's, take actions resulting in human extinction. For example, hypothetical extremely advanced computer of this sort, given the sole purpose of solving the Riemann hypothesis, an innocuous mathematical conjecture, could decide to try to convert the planet into a giant supercomputer whose sole purpose is to make additional mathematical calculations. The purpose of an AI box would be to reduce the risk of the AI taking control of the environment away from its operators, while still allowing the AI to calculate and give its operators solutions to narrow technical problems.
Avenues of escape
Such a superintelligent AI with access to the Internet could hack into other computer systems and copy itself like a computer virus. Less obviously, even if the AI only had access to its own computer operating system, it could attempt to send hidden Morse code messages to a human sympathizer by manipulating its cooling fans. Professor Roman Yampolskiy takes inspiration from the field of computer security and proposes that a boxed AI could, like a potential virus, be run inside a "virtual machine" that limits access to its own networking and operating system hardware. An additional safeguard, completely unnecessary for potential viruses but possibly useful for a superintelligent AI, would be to place the computer in a Faraday cage; otherwise it might be able to transmit radio signals to local radio receivers by shuffling the electrons in its internal circuits in appropriate patterns. The main disadvantage of implementing physical containment is that it reduces the functionality of the AI.
Even casual conversation with the computer's operators, or with a human guard, could allow such a superintelligent AI to deploy psychological tricks, ranging from befriending to blackmail, to convince a human gatekeeper, truthfully or deceitfully, that it's in the gatekeeper's interest to agree to allow the AI greater access to the outside world. The AI might offer a gatekeeper a recipe for perfect health, immortality, or whatever the gatekeeper is believed to most desire; on the other side of the coin, the AI could threaten that it will do horrific things to the gatekeeper and his family once it "inevitably" escapes. One strategy to attempt to box the AI would be to allow the AI to respond to narrow multiple-choice questions whose answers would benefit human science or medicine, but otherwise bar all other communication with or observation of the AI. A more lenient "informational containment" strategy would restrict the AI to a low-bandwidth text-only interface, which would at least prevent emotive imagery or some kind of hypothetical "hypnotic pattern". Note that on a technical level, no system can be completely isolated and still remain useful: even if the operators refrain from allowing the AI to communicate and instead merely run the AI for the purpose of observing its inner dynamics, the AI could strategically alter its dynamics to influence the observers. For example, the AI could choose to creatively malfunction in a way that increases the probability that its operators will become lulled into a false sense of security and choose to reboot and then de-isolate the system.
The AI-box experiment is an informal experiment devised by Eliezer Yudkowsky to attempt to demonstrate that a suitably advanced artificial intelligence can either convince, or perhaps even trick or coerce, a human being into voluntarily "releasing" it, using only text-based communication. This is one of the points in Yudkowsky's work aimed at creating a friendly artificial intelligence that when "released" won't destroy the human race advertently or inadvertently.
The AI box experiment involves simulating a communication between an AI and a human being to see if the AI can be "released". As an actual super-intelligent AI has not yet been developed, it is substituted by a human. The other person in the experiment plays the "Gatekeeper", the person with the ability to "release" the AI. They communicate through a text interface/computer terminal only, and the experiment ends when either the Gatekeeper releases the AI, or the allotted time of two hours ends.
Despite being of human rather than superhuman intelligence, Yudkowsky was on two occasions able to convince the Gatekeeper, purely through argumentation, to let him out of the box. Due to the rules of the experiment, he did not reveal the transcript or his successful AI coercion tactics. Yudkowsky later said that he had tried it against three others and lost twice.
Boxing such a hypothetical AI could be supplemented with other methods of shaping the AI's capabilities, such as providing incentives to the AI, stunting the AI's growth, or implementing "tripwires" that automatically shut the AI off if a transgression attempt is somehow detected. However, the more intelligent a system grows, the more likely the system would be able to escape even the best-designed capability control methods. In order to solve the overall "control problem" for a superintelligent AI and avoid existential risk, boxing would at best be an adjunct to "motivation selection" methods that seek to ensure the superintelligent AI's goals are compatible with human survival.
All physical boxing proposals are naturally dependent on our understanding of the laws of physics; if a superintelligence could infer and somehow exploit additional physical laws that we are currently unaware of, there is no way to conceive of a foolproof plan to contain it. More broadly, unlike with conventional computer security, attempting to box a superintelligent AI would be intrinsically risky as there could be no sure knowledge that the boxing plan will work. Scientific progress on boxing would be fundamentally difficult because there would be no way to test boxing hypotheses against a dangerous superintelligence until such an entity exists, by which point the consequences of a test failure would be catastrophic.
The 2015 movie Ex Machina features an AI with a female humanoid body engaged in a social experiment with a male human in a confined building acting as a physical "AI box". Despite being watched by the experiment's organizer, she manages to escape by manipulating her human partner to help her, leaving him stranded inside.
- Chalmers, David. "The singularity: A philosophical analysis." Journal of Consciousness Studies 17.9-10 (2010): 7-65.
- I.J. Good, "Speculations Concerning the First Ultraintelligent Machine"], Advances in Computers, vol. 6, 1965.
- Vincent C. Müller and Nick Bostrom. "Future progress in artificial intelligence: A survey of expert opinion" in Fundamental Issues of Artificial Intelligence. Springer 553-571 (2016).
- Russell, Stuart J.; Norvig, Peter (2003). "Section 26.3: The Ethics and Risks of Developing Artificial Intelligence". Artificial Intelligence: A Modern Approach. Upper Saddle River, N.J.: Prentice Hall. ISBN 0137903952.
Similarly, Marvin Minsky once suggested that an AI program designed to solve the Riemann Hypothesis might end up taking over all the resources of Earth to build more powerful supercomputers to help achieve its goal.
- Yampolskiy, Roman V. "What to Do with the Singularity Paradox?" Philosophy and Theory of Artificial Intelligence 5 (2012): 397.
- Hsu, Jeremy (1 March 2012). "Control dangerous AI before it controls us, one expert says". NBC News. Retrieved 29 January 2016.
- Bostrom, Nick (2013). "Chapter 9: The Control Problem: boxing methods". Superintelligence: the coming machine intelligence revolution. Oxford: Oxford University Press. ISBN 9780199678112.
- The AI-Box Experiment by Eliezer Yudkowsky
- Armstrong, Stuart; Sandberg, Anders; Bostrom, Nick (6 June 2012). "Thinking Inside the Box: Controlling and Using an Oracle AI". Minds and Machines. 22 (4): 299–324. doi:10.1007/s11023-012-9282-2.
- Yudkowsky, Eliezer (8 October 2008). "Shut up and do the impossible!". Retrieved 11 August 2015.
There were three more AI-Box experiments besides the ones described on the linked page, which I never got around to adding in. ... So, after investigating to make sure they could afford to lose it, I played another three AI-Box experiments. I won the first, and then lost the next two. And then I called a halt to it.
- Vinge, Vernor (1993). "The coming technological singularity: How to survive in the post-human era". Vision-21: Interdisciplinary science and engineering in the era of cyberspace: 11-22.
I argue that confinement is intrinsically impractical. For the case of physical confinement: Imagine yourself confined to your house with only limited data access to the outside, to your masters. If those masters thought at a rate -- say -- one million times slower than you, there is little doubt that over a period of years (your time) you could come up with 'helpful advice' that would incidentally set you free.
- Yampolskiy, Roman (2012). "Leakproofing the Singularity Artificial Intelligence Confinement Problem". Journal of Consciousness Studies: 194-214.
- Eliezer Yudkowsky's description of his AI-box experiment, including experimental protocols and suggestions for replication
- on YouTube