Wizard of Oz experiment
||This article includes a list of references, but its sources remain unclear because it has insufficient inline citations. (May 2013)|
In the field of human-computer interaction, a Wizard of Oz experiment is a research experiment in which subjects interact with a computer system that subjects believe to be autonomous, but which is actually being operated or partially operated by an unseen human being.
The phrase Wizard of Oz (originally OZ Paradigm) has come into common usage in the fields of experimental psychology, human factors, ergonomics, linguistics, and usability engineering to describe a testing or iterative design methodology wherein an experimenter (the “wizard”), in a laboratory setting, simulates the behavior of a theoretical intelligent computer application (often by going into another room and intercepting all communications between participant and system). Sometimes this is done with the participant’s a-priori knowledge and sometimes it is a low-level deceit employed to manage the participant’s expectations and encourage natural behaviors.
For example, a test participant may think he or she is communicating with a computer using a speech interface, when the participant’s words are actually being secretly entered into the computer by a person in another room (the “wizard”) and processed as a text stream, rather than as an audio stream. The missing system functionality that the wizard provides may be implemented in later versions of the system (or may even be speculative capabilities that current-day systems do not have), but its precise details are generally considered irrelevant to the study. In testing situations, the goal of such experiments may be to observe the use and effectiveness of a proposed user interface by the test participants, rather than to measure the quality of an entire system.
John F. (“Jeff”) Kelley coined the phrases “Wizard of OZ” and “OZ Paradigm” for this purpose circa 1980 to describe the method he developed during his dissertation work at Johns Hopkins University. (His dissertation advisor was the late professor Alphonse Chapanis, the “Godfather of Human Factors and Engineering Psychology”.) Amusingly enough, in addition to some one-way mirrors and such, there literally was a blackout curtain separating Jeff, as the “Wizard”, from view by the participant during the study.
The “Experimenter-in-the-Loop” technique had been pioneered at Chapanis’ Communications Research Lab at Johns Hopkins as early as 1975 (J. F. Kelley arrived in 1978). W. Randolph Ford used the experimenter-in-the-loop technique with his innovative CHECKBOOK program wherein he obtained language samples in a naturalistic setting. In Ford’s method, a preliminary version of the natural language processing system would be placed in front of the user. When the user entered a syntax that was not recognized, they would receive a “Could you rephrase that?” prompt from the software. After the session, the algorithms for processing the newly obtained samples would be created or enhanced and another session would take place. This approach led to the eventual development of his natural language processing technique, "Multi-Stage Pattern Reduction". Dr. Ford's recollection was that Dr. Kelley did in fact coin the phrase "Wizard of Oz Paradigm" but that the technique had been employed in at least two separate studies before Dr. Kelley had started conducting studies at the Johns Hopkins Telecommunications Lab.
In that employment the experimenter (the “Wizard”) sat at a terminal in an adjacent room separated by a one-way mirror so the subject could be observed. Every input from the user was processed correctly by a combination of software processing and real-time experimenter intervention. As the process was repeated in subsequent sessions, more and more software components were added so that the experimenter had less and less to do during each session until asymptote was reached on phrase/word dictionary growth and the experimenter could “go get a cup of coffee” during the session (which at this point was a cross-validation of the final system’s unattended performance).
A final point: Dr. Kelley's recollection of the coinage of the term is backed up by that of the late professor Al Chapanis. In their 1985 University of Michigan technical report, Green and Wei-Haas state the following: The first appearance of the "Wizard of Oz" name in print was in Jeff Kelley's thesis (Kelley, 1983a, 1983b, 1984a). It is thought the name was coined in response to a question at a graduate seminar at Hopkins (Chapanis, 1984; Kelley, 1984b). "What happens if the subject sees the experimenter [behind the "curtain" in an adjacent room acting as the computer]?" Kelley answered: "Well, that's just like what happened to Dorothy in the Wizard of Oz." And so the name stuck. (Cited by permission.)
There is also a passing reference to planned use of the "Wizard of Oz experiments" in a 1982 proceedings paper by Ford and Smith.
One fact, presented in Kelley's dissertation, about the etymology of the term in this context: Dr. Kelley did originally have a definition for the “OZ” acronym (aside from the obvious parallels with the 1900 book The Wonderful Wizard of Oz by L. Frank Baum). “Offline Zero” was a reference to the fact that an experimenter (the “Wizard”) was interpreting the users’ inputs in real time during the simulation phase.
Similar experimental setups had occasionally been used earlier, but without the "wizard of oz" name. Design researcher Nigel Cross conducted studies in the 1960s with "simulated" computer-aided design systems where the purported simulator was actually a human operator, using text and graphical communication via CCTV. As he explained, "All that the user perceives of the system is this remote-access console, and the remainder is a black box to him. ... one may as well fill the black box with people as with machinery. Doing so provides a comparatively cheap simulator, with the remarkable advantages of the human operator's flexibility, memory, and intelligence, and which can be reprogrammed to give a wide range of computer roles merely by changing the rules of operation. It sometimes lacks the real computer's speed and accuracy, but a team of experts working simultaneously can compensate to a sufficient degree to provide an acceptable simulation." Cross later referred to this as a kind of Reverse Turing test.
The Wizard of OZ method (unlike the eponymous “wizard” in the film) is very powerful. In its original application, Dr. Kelley was able to create a simple keyboard-input natural language recognition system that far exceeded the recognition rates of any of the far more complex systems of the day.
The thinking current among many computer scientists and linguists at the time was that, in order for a computer to be able to “understand” natural language enough to be able to assist in useful tasks, the software would have to be attached to a formidable “dictionary” having a large number of categories for each word. The categories would enable a very complex parsing algorithm to unravel the ambiguities inherent in naturally produced language. The daunting task of creating such a dictionary led many to believe that computers simply would never truly “understand” language until they could be “raised” and “experience life” as humans, since humans seem to apply a life’s worth of experiences to the interpretation of language.
The key enabling factor for the first use of the OZ method was that the system was designed to work in a single context (calendar-keeping), which constrained the complexity of language encountered from users to the extent where a simple language processing model was sufficient to meet the goals of the application. The processing model was a two-pass keyword/keyphrase matching approach, based loosely on the algorithms employed in Weizenbaum’s famous Eliza program. By inducing participants to generate language samples in the context of solving an actual task (using a computer that they believed actually understood what they were typing), the variety and complexity of the lexical structures gathered was greatly reduced and simple keyword matching algorithms could be developed to address the actual language collected.
This first use of OZ was in the context of an iterative design approach. In the early development sessions, the experimenter simulated the system in toto, performing all the database queries and composing all the responses to the participants by hand. As the process matured, the experimenter was able to replace human interventions, piece by piece, with newly created developed code (which, at each phase, was designed to accurately process all the inputs that were generated in preceding steps). By the end of the process, the experimenter was able to observe the sessions in a “hands-off” mode (and measure the recognition rates of the completed program).
OZ was important because it addressed the obvious criticism:
- Who can afford to use an iterative method to build a separate natural language system (dictionaries, syntax) for each new context? Wouldn’t you be forever adding new structures and algorithms to handle each new batch of inputs?
The answer turned out to be:
By using an empirical approach like OZ, anyone can afford to do this; Dr. Kelley’s dictionary and syntax growth reached asymptote (achieving from 86% to 97% recognition rates, depending on the measurements employed) after only 16 experimental trials and the resulting program, with dictionaries, was less than 300k of code.
In the 23 years that followed initial publication, the OZ method has been employed in a wide variety of settings, notably in the prototyping and usability testing of proposed user interface designs in advance of having actual application software in place.
The name of the experiment comes from The Wonderful Wizard of Oz story, in which an ordinary man hides behind a curtain and pretends, through the use of “amplifying” technology, to be a powerful wizard.
In David Lodge’s novel Small World, a university lecturer in English literature is introduced to a computer program named ELIZA, which he believes is capable of conducting a coherent conversation with him. It transpires that a computer lecturer is operating the computer and providing all the responses. The original ELIZA program, providing basic but often surprisingly human-like responses to questions, was written at MIT by Joseph Weizenbaum in the 1960s.
Here are some of the original (and subsequent) references on the subject (the method has been picked up in many research domains, and there are numerous subsequent references, only a few of which are listed here).
Summary of the technical aspects of the work:
Kelley, J. F., “CAL – A Natural Language program developed with the OZ Paradigm: Implications for Supercomputing Systems”. First International Conference on Supercomputing Systems (St. Petersburg, Florida, 16–20 December 1985), New York, ACM, pp. 238–248.
Brief description of the method:
Kelley, J. F., “An empirical methodology for writing user-friendly natural language computer applications”. Proceedings of ACM SIG-CHI ’83 Human Factors in Computing systems (Boston, 12–15 December 1983), New York, ACM, pp. 193-196. 
The best description of the method:
Kelley, J. F., “An iterative design methodology for user-friendly natural language office information applications”. ACM Transactions on Office Information Systems, March 1984, 2:1, pp. 26–41. 
The unpublished dissertation itself:
Kelley, J. F., “Natural Language and computers: Six empirical steps for writing an easy-to-use computer application”. Unpublished doctoral dissertation, the Johns Hopkins University, 1983. (Item 8321592 can be obtained from University Microfilms International; 300 North Zeeb Road; Ann Arbor; Michigan; 48106; USA.)
Subsequent references and implementations (a sampling of approximately 30 years of citations):
Schieben, A., et al. 2009, "The theater-system technique: agile designing and testing of system behavior and interaction, applied to highly automated vehicles." In Proceedings of the 1st International Conference on Automotive User Interfaces and Interactive Vehicular Applications (Essen, Germany, 2009). ACM Press, New York, USA. 
Akers, D. 2006. Wizard of Oz for participatory design: inventing a gestural interface for 3D selection of neural pathway estimates. In CHI ’06 Extended Abstracts on Human Factors in Computing Systems (Montréal, Québec, Canada, April 22–27, 2006). CHI ’06. ACM Press, New York, USA, 454-459. 
Höysniemi, J., Hämäläinen, P., and Turkki, L. 2004. Wizard of Oz prototyping of computer vision based action games for children. In Proceeding of the 2004 Conference on interaction Design and Children: Building A Community (Maryland, June 1–03, 2004). IDC ’04. ACM Press, New York, USA, 27-34. 
Molin, L. 2004. Wizard-of-Oz prototyping for co-operative interaction design of graphical user interfaces. In Proceedings of the Third Nordic Conference on Human-Computer interaction (Tampere, Finland, October 23–27, 2004). NordiCHI ’04, vol. 82. ACM Press, New York, USA, 425-428. 
Lai, J., and Yankelovich, N. 2003. Conversational speech interfaces. In the Human-Computer interaction Handbook: Fundamentals, Evolving Technologies and Emerging Applications, J. A. Jacko and A. Sears, Eds. [clarification needed] Human Factors And Ergonomics. Lawrence Erlbaum Associates, Mahwah, New Jersey, USA, 698-713.
Gleicher, M. L., Heck, R. M., and Wallick, M. N. 2002. A framework for virtual videography. In Proceedings of the 2nd international Symposium on Smart Graphics (Hawthorne, New York, June 11–13, 2002). SMARTGRAPH ’02, vol. 24. ACM Press, New York, USA, 9-16. 
Klemmer, S. R., Sinha, A. K., Chen, J., Landay, J. A., Aboobaker, N., and Wang, A. 2000. Suede: a Wizard of Oz prototyping tool for speech user interfaces. In Proceedings of the 13th Annual ACM Symposium on User interface Software and Technology (San Diego, California, United States, November 06–08, 2000). UIST ’00. ACM Press, New York, USA, 1-10. 
Hewett, Thomas T. (et al.), “Curricula for Human-Computer Interaction”, ACM SIGCHI, 1992, 1996, Chapter 2. 
Piernot, P. P., Felciano, R. M., Stancel, R., Marsh, J., and Yvon, M. 1995. Designing the PenPal: blending hardware and software in a user-interface for children. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Denver, Colorado, United States, May 7–11, 1995). I. R. Katz, R. Mack, L. Marks, M. B. Rosson, and J. Nielsen, Eds. [clarification needed] Conference on Human Factors in Computing Systems. ACM Press/Addison-Wesley Publishing Co., New York, USA, 511-518. 
Prager, J. M., Lamberti, D. M., Gardner, D. L., and Balzac, S. R. 1990. REASON: an intelligent user assistant for interactive environments. IBM Syst. J. 29, 1 (Jan. 1990), 141-164.
Dahlbäck, N. and Jönsson, A. 1989. Empirical studies of discourse representations for natural language interfaces. In Proceedings of the Fourth Conference on European Chapter of the Association For Computational Linguistics (Manchester, England, April 10–12, 1989). European Chapter Meeting of the ACL. Association for Computational Linguistics, Morristown, New Jersey, USA, 291-298. 
Carroll, J., and Aaronson, A. 1988. Learning by doing with simulated intelligent help. Commun. ACM 31, 9 (Aug. 1988), 1064-1079. 
Gould, J. D. and Lewis, C. 1985. Designing for usability: key principles and what designers think. Commun. ACM 28, 3 (Mar. 1985), 300-311. 
Green, P., and Wei-Haas, L. 1985. The Rapid Development of User Interfaces: Experience with the Wizard of OZ Method. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, volume 29, number 5, 1985, pp. 470 – 474 (5). 
Embley, D. W., and Kimbrell, R. E. 1985. A scheme-driven natural language query translator. In Proceedings of the 1985 ACM Thirteenth Annual Conference on Computer Science (New Orleans, Louisiana, United States). CSC ’85. ACM Press, New York, USA, 292-297. 
Good, M. D., Whiteside, J. A., Wixon, D. R., and Jones, S. J. 1984. Building a user-derived interface. Commun. ACM 27, 10 (Oct. 1984), 1032-1043.