Multiple choice is a form of assessment in which respondents are asked to select the best possible answer (or answers) from a list of choices. A respondent who guesses at random on an item with four options has a 25 percent chance of selecting the correct one. Finding the right answer from multiple choices can be automated using multiple choice question answering systems. The multiple choice format is most frequently used in educational testing, in market research, and in elections, when a person chooses between multiple candidates, parties, or policies.
Although E. L. Thorndike developed an early multiple choice test, Frederick J. Kelly was the first to use such items as part of a large scale assessment. While Director of the Training School at Kansas State Normal School (now Emporia State University) in 1915, he developed and administered the Kansas Silent Reading Test. Soon after, Kelly became the third Dean of the College of Education at the University of Kansas. The first all multiple choice, large scale assessment was the Army Alpha, used to assess the intelligence and more specifically the aptitudes of World War I military recruits. Multiple choice testing is particularly popular in the United States.
The items of a multiple choice test are often colloquially referred to as "questions," but this is a misnomer because many items are not phrased as questions. For example, they can be presented as incomplete statements, analogies, or mathematical equations. Thus, the more general term "item" is a more appropriate label. Items are stored in an item bank.
Multiple choice items consist of a stem and a set of options. The stem is the beginning part of the item that presents the item as a problem to be solved, a question asked of the respondent, or an incomplete statement to be completed, as well as any other relevant information. The options are the possible answers that the examinee can choose from, with the correct answer called the key and the incorrect answers called distractors. Only one answer can be keyed as correct. This contrasts with multiple response items, in which more than one answer may be keyed as correct.
Usually, a correct answer earns a set number of points toward the total mark, and an incorrect answer earns nothing. However, tests may also award partial credit for unanswered questions or penalize students for incorrect answers, to discourage guessing. For example, the SAT formerly removed a quarter point from the test taker's score for each incorrect answer.
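The effect of such a penalty can be checked with a short expected-value calculation. The sketch below is illustrative only: the function name `expected_guess_score` and the five-option item are assumptions, chosen because a quarter-point penalty exactly cancels the expected gain from a blind guess when an item has five options.

```python
from fractions import Fraction

def expected_guess_score(num_options, penalty, reward=1):
    """Expected points from one blind guess on an item with a single key.

    num_options: number of options, exactly one of which is correct.
    penalty: points deducted for a wrong answer (given as a positive number).
    reward: points earned for a correct answer.
    """
    p_correct = Fraction(1, num_options)
    return p_correct * reward - (1 - p_correct) * penalty

# Hypothetical five-option item with a quarter-point penalty.
print(expected_guess_score(5, Fraction(1, 4)))  # 0
# Same item without a penalty: guessing has a positive expected value.
print(expected_guess_score(5, 0))               # 1/5
```

With no penalty, a blind guess is always worth attempting. With the penalty set to 1/(k - 1) for k options, a blind guess gains nothing on average.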
For advanced items, such as an applied knowledge item, the stem can consist of multiple parts. The stem can include extended or ancillary material such as a vignette, a case study, a graph, a table, or a detailed description which has multiple elements to it. Anything may be included as long as it is necessary to ensure the utmost validity and authenticity of the item. The stem ends with a lead-in question explaining how the respondent must answer. In medical multiple choice items, a lead-in question may ask "What is the most likely diagnosis?" or "What pathogen is the most likely cause?" in reference to a case study that was previously presented.
If a=1 and b=2, what is a+b?
In the equation , solve for x.
Ideally, a multiple choice question should be asked as a "stem", with plausible options, for example:
The IT capital of India is
A well written multiple-choice question avoids obviously wrong or silly distractors (such as Mexico in the example above), so that the question makes sense when read with each of the distractors as well as with the correct answer. It is good practice to avoid "All of the above" or "None of the above" answers. If "All of the above" is used, then technically the student is correct no matter which option they select.
A more difficult and well-written multiple choice question is as follows:
Consider the following:
- I. An eight-by-eight chessboard.
- II. An eight-by-eight chessboard with two opposite corners removed.
- III. An eight-by-eight chessboard with all four corners removed.
Which of these can be tiled by two-by-one dominoes (with no overlaps or gaps, and every domino contained within the board)?
- I only
- II only
- I and II only
- I and III only
- I, II, and III
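The reasoning this item demands can also be checked by computation. The sketch below is not part of the original item: the function name `can_tile` and the (row, column) coordinate convention are hypothetical. It relies on the fact that every domino covers one dark and one light cell, so a tiling exists exactly when the dark cells can be perfectly matched to adjacent light cells.

```python
def can_tile(removed, size=8):
    """Return True if a size-by-size board minus `removed` cells can be
    covered exactly by two-by-one dominoes.

    A tiling is a perfect matching between dark and light cells that share
    an edge; it is searched for here with simple augmenting paths.
    """
    cells = {(r, c) for r in range(size) for c in range(size)} - set(removed)
    dark = [cell for cell in cells if (cell[0] + cell[1]) % 2 == 0]
    if 2 * len(dark) != len(cells):
        return False  # unequal colour counts rule out any tiling

    def neighbours(cell):
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nxt in cells:
                yield nxt

    partner = {}  # light cell -> dark cell currently paired with it

    def try_place(dark_cell, seen):
        """Pair `dark_cell` with a light neighbour, re-pairing others if needed."""
        for light_cell in neighbours(dark_cell):
            if light_cell in seen:
                continue
            seen.add(light_cell)
            if light_cell not in partner or try_place(partner[light_cell], seen):
                partner[light_cell] = dark_cell
                return True
        return False

    return all(try_place(d, set()) for d in dark)

boards = {
    "I": [],
    "II": [(0, 0), (7, 7)],
    "III": [(0, 0), (0, 7), (7, 0), (7, 7)],
}
for name, removed in boards.items():
    print(name, can_tile(removed))
```

Under these assumptions the check reports that boards I and III can be tiled and board II cannot. Removing two opposite corners leaves 30 cells of one colour and 32 of the other, which is why the item rewards reasoning over recall.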
There are several advantages to multiple choice tests. If item writers are well trained and items are quality assured, multiple choice testing can be a very effective assessment technique. If students are instructed on the way in which the item format works and myths surrounding the tests are corrected, they will perform better on the test. On many assessments, reliability has been shown to improve with larger numbers of items on a test, and with good sampling and care over case specificity, overall test reliability can be further increased.
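One standard way to estimate how reliability grows with test length is the Spearman-Brown prediction formula. The text above does not name it, so its use here is an assumption, and it presumes that the added items are parallel to the existing ones. The function name and the starting reliability of 0.60 are illustrative values only.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after lengthening a test by `length_factor`
    with items parallel to the existing ones (Spearman-Brown formula)."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Hypothetical test with reliability 0.60, kept as is, doubled, and quadrupled.
for factor in (1, 2, 4):
    print(factor, round(spearman_brown(0.60, factor), 3))
```

The predicted reliability rises with each increase in length, but by smaller and smaller amounts, which is consistent with the gains described above.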
Multiple choice tests often require less time to administer for a given amount of material than would tests requiring written responses. This results in a more comprehensive evaluation of the candidate's extent of knowledge. Even greater efficiency can be created by the use of online examination delivery software. This increase in efficiency can offset the advantages offered by free-response items. That is, if free-response items provide twice as much information but take four times as long to complete, multiple-choice items present a better measurement tool.
Multiple choice questions lend themselves to the development of objective assessment items, but without author training, questions can be subjective in nature. Because this style of test does not require a teacher to interpret answers, test-takers are graded purely on their selections, creating a lower likelihood of teacher bias in the results. Factors irrelevant to the assessed material (such as handwriting and clarity of presentation) do not come into play in a multiple-choice assessment, and so the candidate is graded purely on their knowledge of the topic. Finally, if test-takers are aware of how to use answer sheets or online examination tick boxes, their responses can be read unambiguously. Overall, multiple choice tests are the strongest predictors of overall student performance compared with other forms of evaluations, such as in-class participation, case exams, written assignments, and simulation games.
The most serious disadvantage is the limited types of knowledge that can be assessed by multiple choice tests. Multiple choice tests are best adapted for testing well-defined or lower-order skills. Problem-solving and higher-order reasoning skills are better assessed through short-answer and essay tests. However, multiple choice tests are often chosen, not because of the type of knowledge being assessed, but because they are more affordable for testing a large number of students. This is especially true in the United States where multiple choice tests are the preferred form of high-stakes testing.
Another disadvantage of multiple choice tests is possible ambiguity in the examinee's interpretation of the item. Failing to interpret information as the test maker intended can result in an "incorrect" response, even if the taker's response is potentially valid. The term "multiple guess" has been used to describe this scenario because test-takers may attempt to guess rather than determine the correct answer. A free response test allows the test taker to make an argument for their viewpoint and potentially receive credit.
In addition, even if students have some knowledge of a question, they receive no credit for knowing that information if they select the wrong answer and the item is scored dichotomously. By contrast, free response questions may allow an examinee to demonstrate partial understanding of the subject and receive partial credit. However, if more questions are asked on a particular subject area or topic, creating a larger sample, then the examinee's level of knowledge of that topic will be reflected more accurately in the number of correct answers and in the final result.
Another disadvantage of multiple choice examinations is that a student who is incapable of answering a particular question can simply select a random answer and still have a chance of receiving a mark for it. It is common practice for students with no time left to give all remaining questions random answers in the hope that they will get at least some of them right. Some exams, such as the Australian Mathematics Competition and the SAT, have used systems to negate this, in these cases by making it no more beneficial to choose a random answer than to give none.
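How far blind guessing can carry a student is a binomial calculation. The sketch below assumes independent, uniformly random guesses and one keyed option per item. The function name, the 20-item test, and the target of 10 correct answers are hypothetical values chosen for illustration.

```python
from math import comb

def prob_at_least(k, n, num_options):
    """Probability of at least k correct answers out of n items when every
    item is answered by an independent, uniformly random guess."""
    p = 1 / num_options
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance of reaching 10 or more correct on 20 four-option items by guessing.
print(f"{prob_at_least(10, 20, 4):.4f}")
# Chance of getting at least one of the 20 items right by guessing.
print(f"{prob_at_least(1, 20, 4):.4f}")
```

Under these assumptions a guesser is very likely to pick up a few marks, but the chance of reaching half marks on the whole test is only on the order of one percent.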
Another system of negating the effects of random selection is formula scoring, in which a score is proportionally reduced based on the number of incorrect responses and the number of possible choices. In this method, the score is reduced by W/(c – 1), where W is the number of wrong responses on the test and c is the average number of possible choices for all questions on the test. All exams scored with the three-parameter model of item response theory also account for guessing. Moreover, guessing is usually not a great issue, since the odds of a student receiving significant marks by guessing are very low when four or more selections are available.
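The formula-scoring deduction described above can be sketched as follows. The text gives only the deduction term W/(c – 1). Subtracting it from the number of right answers R is the usual convention and is assumed here, as are the function name and the example figures.

```python
def formula_score(num_right, num_wrong, avg_choices):
    """Formula score R - W / (c - 1).

    num_right: number of correct responses (R).
    num_wrong: number of incorrect responses (W); omitted items are not counted.
    avg_choices: average number of options per item (c), which must exceed 1.
    """
    if avg_choices <= 1:
        raise ValueError("avg_choices must be greater than 1")
    return num_right - num_wrong / (avg_choices - 1)

# Hypothetical 40-item test with four options per item:
# 28 right, 9 wrong, 3 left blank.
print(formula_score(28, 9, 4))   # 25.0
# A blind guesser who gets the expected 10 right and 30 wrong scores zero.
print(formula_score(10, 30, 4))  # 0.0
```

Blank items cost nothing under this rule, so a student who guesses at random on every item gains nothing on average compared with leaving them unanswered.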
Additionally, questions phrased ambiguously may confuse test-takers. It is generally accepted that multiple choice questions allow for only one answer, where the one answer may encapsulate a collection of previous options. However, some test creators are unaware of this and might expect the student to select multiple answers without giving explicit permission or providing the trailing encapsulation options. Of course, untrained test developers are a threat to validity regardless of the item format.
Critics such as the philosopher and education proponent Jacques Derrida have said that, while the demand for dispensing and checking basic knowledge is valid, there are other means of responding to this need than resorting to crib sheets.
Despite being sometimes contested, the format remains popular due to its utility, reliability, and cost effectiveness.
The theory that students should trust their first instinct and stay with their initial answer on a multiple choice test is a myth worth dispelling. Researchers have found that although some people believe that changing answers is bad, it generally results in a higher test score. The data across twenty separate studies indicate that the percentage of "right to wrong" changes is 20.2%, whereas the percentage of "wrong to right" changes is 57.8%, nearly triple. Changing from "right to wrong" may be more painful and memorable (Von Restorff effect), but it is probably a good idea to change an answer after additional reflection indicates that a better choice could be made. In fact, a person's initial attraction to a particular answer choice could well derive from the surface plausibility that the test writer has intentionally built into a distractor (or incorrect answer choice). Test item writers are instructed to make their distractors plausible yet clearly incorrect. A test taker's first-instinct attraction to a distractor is thus often a reaction that probably should be revised in light of a careful consideration of each of the answer choices. Some test takers for some examination subjects might have accurate first instincts about a particular test item, but that does not mean that all test takers should trust their first instinct.
Notable multiple-choice examinations
- AIEEE in India
- ARRT registry test for student radiologic technologists
- Australian Mathematics Competition
- F = ma
- IB Diploma Programme science subject exams
- IIT-JEE in India, which had, until 2006, a high-stakes phase after the initial MCQ based screening phase.
- Multistate Bar Examination
- PLAB (Professional and Linguistic Assessments Board) test for non-EEA medical graduates to practise in the UK
- Concept inventory
- Extended matching items
- Objective test
- Test (student assessment)
- Closed-ended question
- "Versatile question answering systems: seeing in synthesis", Mittal et al., IJIIDS, 5(2), 119-142, 2011.
- Mathews, J: "Just Whose Idea Was All This Testing?", The Washington Post, November 14, 2006.
- Phelps, Richard (Fall 1996), "Are US Students the Most Heavily Tested on Earth?", Educational Measurement: Issues and Practice 15 (3): 19–27, doi:10.1111/j.1745-3992.1996.tb00819.x
- Kehoe, Jerard (1995). Writing multiple-choice test items Practical Assessment, Research & Evaluation, 4(9). Retrieved February 12, 2008.
- Item Writing Manual by the National Board of Medical Examiners
- Beckert, L., Wilkinson, T. J., & Sainsbury, R. (2003). A needs-based study and examination skills course improves students' performance Medical Education 37 (5), 424–428. doi:10.1046/j.1365-2923.2003.01499.x
- Steven M Downing (2004) Reliability: on the reproducibility of assessment data Medical Education 38 (9), 1006–1012. doi:10.1111/j.1365-2929.2004.01932.x
- DePalma, Anthony (1 November 1990). "Revisions Adopted in College Entrance Tests". New York Times. Retrieved 22 August 2012.
- Bontis, N., Hardie, T., & Serenko, A. (2009). Techniques for assessing skills and knowledge in a business strategy classroom International Journal of Teaching and Case Studies, 2, 2, 162-180.
- Jacques Derrida (1990) pp.334-5 Once Again from the Top: Of the Right to Philosophy, interview with Robert Maggiori for Libération, November 15, 1990, republished in Points (1995).
- Benjamin, L. T., Cavell, T. A., & Shallenberger, W. R. (1984). Staying with the initial answers on objective tests: Is it a myth? Teaching of Psychology, 11, 133-141.