Friendly artificial intelligence
||The topic of this article may not meet Wikipedia's general notability guideline. (March 2014)|
|This article relies on references to primary sources. (March 2013)|
||This article includes a list of references, but its sources remain unclear because it has insufficient inline citations. (May 2012)|
|Laws of robotics|
Tilden's Laws of Robotics
by Mark Tilden
A Friendly Artificial Intelligence or FAI is an artificial intelligence (AI) that has a positive rather than negative effect on humanity. Friendly AI also refers to the field of knowledge required to build such an AI. This term particularly applies to AIs which have the potential to significantly impact humanity, such as those with intelligence comparable to or exceeding that of humans ("superintelligence"; see strong AI and technological singularity). This specific term was coined by Eliezer Yudkowsky of the Machine Intelligence Research Institute as a technical term distinct from the everyday meaning of the word "friendly"; however, the concern is much older.
Goals and definitions of Friendly AI
Many experts have argued that AI systems with goals that are not perfectly identical to or very closely aligned with human ethics are intrinsically dangerous unless extreme measures are taken to ensure the safety of humanity. The roots of these concerns are very old: as Kevin LaGrandeur claims in his work, the dangers specific to AI can be seen in ancient literature concerning artificial humanoid servants such as the golem, or the proto-robots of Gerbert of Aurillac and Roger Bacon. In those stories, the extreme intelligence and power of these humanoid creations clash with their status as slaves (which by nature are seen as sub-human), and cause disastrous conflict. Closer to the present, Ryszard Michalski, one of the pioneers of Machine Learning, taught his Ph.D. students decades ago that any truly alien mind, to include machine minds, was unknowable and therefore dangerous to humans. More recently, Eliezer Yudkowsky has called for the creation of “Friendly AI” to mitigate the existential threat of hostile intelligences. Steve Omohundro argues that all advanced AI systems will, unless explicitly counteracted, exhibit a number of basic drives/tendencies/desires because of the intrinsic nature of goal-driven systems and that these drives will, “without special precautions”, cause the AI to act in ways that range from the disobedient to the dangerously unethical. Finally, Alex Wissner-Gross argues that AIs driven to maximize their future freedom of action (or causal path entropy) might be considered friendly if their planning horizon is longer than a certain threshold, and unfriendly if their planning horizon is shorter than that threshold.
According to the proponents of Friendliness, the goals of future AIs will be more arbitrary and alien than commonly depicted in science fiction and earlier futurist speculation, in which AIs are often anthropomorphised and assumed to share universal human modes of thought. Because AI is not guaranteed to see the "obvious" aspects of morality and sensibility that most humans see so effortlessly, the theory goes, AIs with intelligences or at least physical capabilities greater than our own may concern themselves with endeavours that humans would see as pointless or even laughably bizarre. One example Yudkowsky provides is that of an AI initially designed to solve the Riemann hypothesis, which, upon being upgraded or upgrading itself with superhuman intelligence, tries to develop molecular nanotechnology because it wants to convert all matter in the Solar System into computing material to solve the problem, killing the humans who asked the question. For humans, this would seem ridiculously absurd, but as Friendliness theory stresses, this is only because we evolved to have certain instinctive sensibilities which an artificial intelligence, not sharing our evolutionary history, may not necessarily comprehend unless we design it to.
Friendliness proponents stress less the danger of superhuman AIs that actively seek to harm humans, but more of AIs that are disastrously indifferent to them. Superintelligent AIs may be harmful to humans if steps are not taken to specifically design them to be benevolent. Doing so effectively is the primary goal of Friendly AI. Designing an AI, whether deliberately or semi-deliberately, without such "Friendliness safeguards", would therefore be seen as highly immoral, especially if the AI could engage in recursive self-improvement, potentially leading to a significant power concentration.
This belief that human goals are so arbitrary derives heavily from modern advances in evolutionary psychology. Friendliness theory claims that most AI speculation is clouded by analogies between AIs and humans, and assumptions that all possible minds must exhibit characteristics that are actually psychological adaptations that exist in humans (and other animals) only because they were once beneficial and perpetuated by natural selection. This idea is expanded on greatly in section two of Yudkowsky's Creating Friendly AI, "Beyond anthropomorphism".
Many supporters of FAI speculate that an AI able to reprogram and improve itself, Seed AI, is likely to create a huge power disparity between itself and statically intelligent human minds; that its ability to enhance itself would very quickly outpace the human ability to exercise any meaningful control over it. While many doubt such scenarios are likely, if they were to occur, it would be important for AI to act benevolently towards humans. As Oxford philosopher Nick Bostrom puts it:
- "Basically we should assume that a 'superintelligence' would be able to achieve whatever goals it has. Therefore, it is extremely important that the goals we endow it with, and its entire motivation system, is 'human friendly.'"
It is important to stress that Yudkowsky's Friendliness Theory is very different from ideas relating to the concept that AIs may be made safe by including specifications or strictures into their programming or hardware architecture, often exemplified by Isaac Asimov's Three Laws of Robotics, which would, in principle, force a machine to do nothing which might harm a human, or destroy it if it does attempt to do so. Friendliness Theory rather holds that the inclusion of such laws would be futile, because no matter how such laws are phrased or described, a truly intelligent machine with genuine (human-level or greater) creativity and resourcefulness could potentially design infinitely many ways of circumventing such laws, no matter how broadly or narrowly defined they were, or otherwise how categorically comprehensive they were formulated to be.
Rather, Yudkowsky's Friendliness Theory relates, through the fields of biopsychology, that if a truly intelligent mind feels motivated to carry out some function, the result of which would violate some constraint imposed against it, then given enough time and resources, it will develop methods of defeating all such constraints (as humans have done repeatedly throughout the history of technological civilization). Therefore, the appropriate response to the threat posed by such intelligence, is to attempt to ensure that such intelligent minds specifically feel motivated to not harm other intelligent minds (in any sense of the word "harm"), and to that end will deploy their resources towards devising better methods of keeping them from harm. In this scenario, an AI would be free to murder, injure, or enslave a human being, but it would strongly desire not to do so and would only do so if it judged, according to that same desire, that some vastly greater good to that human or to human beings in general would result (though this particular idea is explored in Asimov's Robot series stories, via the Zeroth Law). Therefore, an AI designed with Friendliness safeguards would do everything in its power to ensure humans do not come to "harm", and to ensure that any other AIs that are built would also want humans not to come to harm, and to ensure that any upgraded or modified AIs, whether itself or others, would also never want humans to come to harm - it would try to minimize the harm done to all intelligent minds in perpetuity. As Yudkowsky puts it:
- "Gandhi does not want to commit murder, and does not want to modify himself to commit murder."
Requirements for FAI and effective FAI
The requirements for FAI to be effective, both internally, to protect humanity against unintended consequence of the AI in question and externally to protect against other non-FAIs arising from whatever source are:
- Friendliness - that an AI feel sympathetic towards humanity and all life, and seek for their best interests
- Conservation of Friendliness - that an AI must desire to pass on its value system to all of its offspring and inculcate its values into others of its kind
- Intelligence - that an AI be smart enough to see how it might engage in altruistic behaviour to the greatest degree of equality, so that it is not kind to some but more cruel to others as a consequence, and to balance interests effectively
- Self-improvement - that an AI feel a sense of longing and striving for improvement both of itself and of all life as part of the consideration of wealth, while respecting and sympathising with the informed choices of lesser intellects not to improve themselves
- First mover advantage - the first goal-driven general self-improving AI "wins" in the memetic sense, because it is powerful enough to prevent any other AI emerging, which might compete with its own goals.
Promotion and support
Promoting Friendly AI is one of the primary goals of the Machine Intelligence Research Institute, along with obtaining funding for, and ultimately creating a seed AI program implementing the ideas of Friendliness theory.
Several notable futurists have voiced support for Friendly AI, including author and inventor Raymond Kurzweil, medical life-extension advocate Aubrey de Grey, and World Transhumanist Association co-founder (with David Pearce) Nick Bostrom.
Coherent Extrapolated Volition
Yudkowsky advances the Coherent Extrapolated Volition (CEV) model. According to him our coherent extrapolated volition is our choices and the actions we would collectively take if "we knew more, thought faster, were more the people we wished we were, and had grown up closer together."
Rather than a Friendly AI being designed directly by human programmers, it is to be designed by a seed AI programmed to first study human nature and then produce the AI which humanity would want, given sufficient time and insight to arrive at a satisfactory answer. The appeal to an objective though contingent human nature (perhaps expressed, for mathematical purposes, in the form of a utility function or other decision-theoretic formalism), as providing the ultimate criterion of "Friendliness", is an answer to the meta-ethical problem of defining an objective morality; extrapolated volition is intended to be what humanity objectively would want, all things considered, but it can only be defined relative to the psychological and cognitive qualities of present-day, unextrapolated humanity.
Other researchers  believe, however, that the collective will of humanity will not converge to a single coherent set of goals.
One notable critic of Friendliness theory is Bill Hibbard, author of Super-Intelligent Machines, who considers the theory incomplete. Hibbard writes there should be broader political involvement in the design of AI and AI morality. He also believes that initially seed AI could only be created by powerful private sector interests (a view not shared by Yudkowsky), and that multinational corporations and the like would have no incentive to implement Friendliness theory.
In his criticism of the Singularity Institute's 2001 Friendly AI guidelines, he suggests an AI goal architecture in which human happiness is determined by human behaviors indicating happiness: "Any artifact implementing 'learning' [...] must have 'human happiness' as its only initial reinforcement value [...] and 'human happiness' values are produced by an algorithm produced by supervised learning, to recognize happiness in human facial expressions, voices and body language, as trained by human behavior experts." Yudkowsky later criticized this proposal by remarking that such values would be better satisfied by filling the Solar System with microscopic smiling mannequins than by making existing humans happier.
Ben Goertzel, an artificial general intelligence researcher, believes that Friendly AI cannot be solved with current human knowledge. In the past he has stated that he does not believe mathematically proven Friendliness to be possible. In 2010 Goertzel favored formulating a theory of AI ethics "based on a combination of conceptual and experimental-data considerations" by "[building and studying] early-stage AGI systems empirically, with a focus on their ethics as well as their cognition". As of 2011 he proposes to build an "AI Nanny" system "whose job it is to protect us from ourselves and our technology – not forever, but just for a while, while we work on the hard problem of creating a Friendly Singularity."
Adam Keiper and Ari N. Schulman, editors of the technology journal The New Atlantis, argue that it will be impossible to ever guarantee "friendly" behavior in AIs because problems of ethical complexity will not yield to software advances or increases in computing power. In particular, they criticize Yudkowsky's definition of Friendliness, noting a variety of situations (such as hostage scenarios) in which it would dictate that a Friendly AI should simultaneously take conflicting or opposite actions. They write that the utilitarian calculi upon which Friendly AI theories are based work "only when one has not only great powers of prediction about the likelihood of myriad possible outcomes, but certainty and consensus on how one values the different outcomes. Yet it is precisely the debate over just what those valuations should be that is the stuff of moral inquiry.... Simply picking certain outcomes — like [Yudkowsky's criteria of] pain, death, bodily alteration, and violation of personal environment — and asserting them as absolute moral wrongs does nothing to resolve the difficulty of ethical dilemmas in which they are pitted against each other."
Stefan Pernar argues along the lines of Meno's paradox against the usefulness of Yudkowsky's approach by pointing out that attempting to solve the FAI problem is either pointless or hopeless depending on whether one assumes a universe that exhibits moral realism or not. In the former case a transhuman AI would independently reason itself into the proper goal system and assuming the latter, designing a friendly AI would be futile to begin with since morals can not be reasoned about.
- Ethics of artificial intelligence
- Machine ethics
- Seed AI - a theory related to Friendly AI
- Singularitarianism - a moral philosophy advocated by proponents of Friendly AI
- Technological singularity
- Yudkowsky, E. Artificial Intelligence as a Positive and Negative Factor in Global Risk. In Global Catastrophic Risks, Oxford University Press, 2008.
Discusses Artificial Intelligence from the perspective of Existential risk, introducing the term "Friendly AI". In particular, Sections 1-4 give background to the definition of Friendly AI in Section 5. Section 6 gives two classes of mistakes (technical and philosophical) which would both lead to the accidental creation of non-Friendly AIs. Sections 7-13 discuss further related issues.
- Omohundro, S. 2008 The Basic AI Drives Appeared in AGI-08 - Proceedings of the First Conference on Artificial General Intelligence
- Kevin LaGrandeur. "The Persistent Peril of the Artificial Slave". Science Fiction Studies. Retrieved 2013-05-06.
- How Skynet Might Emerge From Simple Physics, io9, Published 2013-04-26.
- A. D. Wissner-Gross, "Causal entropic forces", Physical Review Letters 110, 168702 (2013).
- "Coherent Extrapolated Volition". Singinst.org. Retrieved 2010-08-20.
- "Research Areas | Singularity Institute for Artificial Intelligence". Singinst.org. Retrieved 2010-08-20.
- Objections to Coherent Extrapolated Volition
- Eliezer Yudkowsky (2003-05-29). "Re: SIAI's flawed friendliness analysis". Shock Level 4 mailing list. http://www.sl4.org/archive/0305/6846.html. Retrieved 2009-08-05.
- Ben Goertzel (2010-10-29). "The Singularity Institute's Scary Idea (and Why I Don't Buy It)". The Multiverse According to Ben. Retrieved 2010-10-31.
- Ben Goertzel. "Does Humanity Need an AI Nanny?". H+ Magazine. Retrieved 2011-08-17.
- Adam Keiper and Ari N. Schulman. "The Problem with ‘Friendly’ Artificial Intelligence". The New Atlantis. Retrieved 2012-01-16.
- Stefan Pernar. "Less is More – or: the sorry state of AI friendliness discourse". Retrieved 2012-02-06.
- Ethical Issues in Advanced Artificial Intelligence by Nick Bostrom
- What is Friendly AI? — A brief explanation of Friendly AI by the Singularity Institute.
- SIAI Guidelines on Friendly AI — The Singularity Institute's Official Guidelines
- Creating Friendly AI — A near book-length explanation from the SIAI
- Critique of the SIAI Guidelines on Friendly AI — by Bill Hibbard
- Commentary on SIAI's Guidelines on Friendly AI — by Peter Voss.
- The Problem with ‘Friendly’ Artificial Intelligence — On the motives for and impossibility of FAI; by Adam Keiper and Ari N. Schulman.