A concept inventory is a criterion-referenced test designed to help determine whether a student has an accurate working knowledge of a specific set of concepts. Historically, concept inventories have taken the form of multiple-choice tests in order to aid interpretability and facilitate administration in large classes. Unlike a typical, teacher-authored multiple-choice test, the questions and response choices on concept inventories are the subject of extensive research. The aims of the research include ascertaining (a) the range of what individuals think a particular question is asking and (b) the most common responses to the questions. Concept inventories are evaluated to ensure test reliability and validity. In its final form, each question includes one correct answer and several distractors.
Ideally, a score on a criterion-referenced test reflects the amount of content knowledge a student has mastered. Criterion-referenced tests differ from norm-referenced tests in that (in theory) the former are not used to compare an individual's score to the scores of the group. Ordinarily, the purpose of a criterion-referenced test is to ascertain whether a student has mastered a predetermined amount of content knowledge; upon obtaining a test score that is at or above a cutoff score, the student can move on to study the body of content knowledge that follows next in a learning sequence. In general, items with difficulty values (the percentage of students answering the item correctly) between 30% and 70% are best able to provide information about student understanding.
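The item difficulty check described above can be sketched in a few lines of Python. This is a minimal illustration, assuming that item difficulty is computed in the usual classical-test-theory way, as the fraction of students who choose the keyed answer. The function names, item labels, answer key, and responses are all hypothetical and are not taken from any published inventory.

```python
from typing import Dict, List


def item_difficulty(
    responses: List[Dict[str, str]], answer_key: Dict[str, str]
) -> Dict[str, float]:
    """Return, for each item, the fraction of students who chose the keyed answer."""
    n_students = len(responses)
    if n_students == 0:
        raise ValueError("need at least one student response")
    return {
        item: sum(1 for r in responses if r.get(item) == correct) / n_students
        for item, correct in answer_key.items()
    }


def in_informative_band(p: float, low: float = 0.30, high: float = 0.70) -> bool:
    """True if a difficulty value lies in the 30%-70% range cited in the text."""
    return low <= p <= high


# Hypothetical answer key and student responses, invented for illustration.
answer_key = {"Q1": "B", "Q2": "D", "Q3": "A"}
responses = [
    {"Q1": "B", "Q2": "D", "Q3": "C"},
    {"Q1": "B", "Q2": "A", "Q3": "C"},
    {"Q1": "B", "Q2": "D", "Q3": "C"},
    {"Q1": "B", "Q2": "C", "Q3": "A"},
]

difficulty = item_difficulty(responses, answer_key)
flags = {item: in_informative_band(p) for item, p in difficulty.items()}

for item, p in difficulty.items():
    print(f"{item}: difficulty={p:.2f}, within 30%-70% band: {flags[item]}")
```

In this invented data set every student answers Q1 correctly and only one answers Q3 correctly, so only Q2 falls inside the band. An item outside the band is not necessarily flawed; the check simply marks items that may say less about differences in student understanding.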
The distractors are incorrect or irrelevant answers that are usually (but not always) based on students' commonly held misconceptions. Test developers often research student misconceptions by examining students' responses to open-ended essay questions and conducting "think-aloud" interviews with students. The distractors chosen by students help researchers understand student thinking and give instructors insights into students' prior knowledge (and, sometimes, firmly held beliefs). This foundation in research underlies instrument construction and design, and plays a role in helping educators obtain clues about students' ideas, scientific misconceptions, and didaskalogenic (that is, teacher-induced) confusions and conceptual lacunae that interfere with learning.
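One simple way to look at which distractors students choose is to tally the incorrect options selected for a single item. The sketch below is illustrative only: the function name, the option labels, and the response data are hypothetical, and a real analysis would tie each distractor back to the misconception it was written to capture.

```python
from collections import Counter
from typing import List, Tuple


def distractor_tally(choices: List[str], correct: str) -> List[Tuple[str, int]]:
    """Count how often each incorrect option was chosen, most frequent first."""
    counts = Counter(choices)
    counts.pop(correct, None)  # drop the keyed answer, leaving only distractors
    return counts.most_common()


# Hypothetical responses to one item whose keyed answer is "A".
choices = ["A", "C", "C", "B", "C", "A", "D"]
tally = distractor_tally(choices, correct="A")

for option, count in tally:
    print(f"distractor {option}: chosen {count} time(s)")
```

Here option C attracts most of the incorrect responses, which would suggest that the idea behind that distractor is the one most worth probing in follow-up interviews or class discussion.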
Concept inventories in use
Concept inventories are education-related diagnostic tests. In 1985 Halloun and Hestenes introduced a "multiple-choice mechanics diagnostic test" to examine students' concepts about motion. It evaluates student understanding of basic concepts in classical (macroscopic) mechanics. A little later, another concept inventory, the Force Concept Inventory (FCI), was developed. The FCI was designed to assess student understanding of the Newtonian concepts of force. Hestenes (1998) found that "nearly 80% of the [students completing introductory college physics courses] could state Newton's Third Law at the beginning of the course. FCI data showed that less than 15% of them fully understood it at the end". These results have been replicated in a number of studies involving students at a range of institutions (see the references below). That said, there remain questions as to what exactly the FCI measures. Results from Hake (1998) using the FCI have led to greater recognition in the science education community of the importance of students' "interactive engagement" with the materials to be mastered.
Since the development of the FCI, other physics instruments have been developed. These include the Force and Motion Conceptual Evaluation developed by Thornton and Sokoloff and the Brief Electricity and Magnetism Assessment developed by Ding et al. For a discussion of how a number of concept inventories were developed, see Beichner (1994). Information about physics concept tests can be found at the NC State Physics Education Research Group website (see the external links below).
In addition to physics, concept inventories have been developed in statistics, chemistry, astronomy, basic biology, natural selection, genetics, engineering, and geoscience.
In many areas, foundational scientific concepts transcend disciplinary boundaries. An example of an inventory that assesses knowledge of such concepts is an instrument developed by Odom and Barrow (1995) to evaluate understanding of diffusion and osmosis. In addition, there are non-multiple-choice conceptual instruments, such as the essay-based approach suggested by Wright et al. (1998), the essay and oral exams used by Nehm and Schonfeld (2008), and the instrument used by Cooper et al. (2012) to measure student understanding of Lewis structures in chemistry.
Caveats associated with concept inventory use
Some concept inventories are problematic. The concepts tested may not be fundamental or important in a particular discipline, the concepts involved may not be explicitly taught in a class or curriculum, or answering a question correctly may require only a superficial understanding of a topic. It is therefore possible to either over-estimate or under-estimate student content mastery. Concept inventories designed to identify trends in student thinking may not be useful in monitoring learning gains that result from pedagogical interventions, and disciplinary mastery may not be the variable measured by a particular instrument. Users should be careful to ensure that concept inventories are actually testing conceptual understanding, rather than test-taking ability, language skills, or other abilities that can influence test performance.
The use of multiple-choice exams as concept inventories is not without controversy. The very structure of multiple-choice concept inventories raises questions about the extent to which complex and often nuanced situations and ideas must be simplified or clarified to produce unambiguous responses. For example, a multiple-choice exam designed to assess knowledge of key concepts in natural selection does not meet a number of standards of quality control. One problem with the exam is that the two members of each of several pairs of parallel items, with each pair designed to measure exactly one key concept in natural selection, sometimes have very different levels of difficulty. Another problem is that the multiple-choice exam overestimates knowledge of natural selection as reflected in student performance on a diagnostic essay exam and a diagnostic oral exam, two instruments with reasonably good construct validity. Although scoring concept inventories in the form of essay or oral exams is labor-intensive, costly, and difficult to implement with large numbers of students, such exams can offer a more realistic appraisal of the actual levels of students' conceptual mastery as well as their misconceptions. Recently, however, computer technology has been developed that can score essay responses on concept inventories in biology and other domains (Nehm, Ha, & Mayfield, 2011), promising to facilitate the scoring of concept inventories organized as (transcribed) oral exams as well as essays.
See also
- Authentic assessment
- Classical test theory
- Confidence-based learning
- Construct validity
- Constructive alignment
- Criterion-referenced test
- Educational assessment
- Item response theory
- Norm-referenced test
- Rubrics for assessment
- Standards-based education reform
- Standardized test
- Standards-based assessment
References
- Adams, W. K. & Wieman, C. E. 2010. Development and Validation of Instruments to Measure Learning of Expert-Like Thinking. International Journal of Science Education, 1-24. iFirst, doi:10.1080/09500693.2010.512369
- Treagust, D.F. 1988. Development and use of diagnostic tests to evaluate students' misconceptions in science. International Journal of Science Education 10: 159-169
- Halloun, I. A., & Hestenes, D. (1985). Common sense concepts about motion. American Journal of Physics, 53, 1043-1055
- Hestenes D, Wells M, Swackhamer G 1992 Force concept inventory. The Physics Teacher 30: 141-166.
- Hestenes D 1998. Am. J. Phys. 66:465
- Huffman, D. & P. Heller. 1995. What does the force concept inventory actually measure? The Physics Teacher 33: 138-143; Heller, P. & D. Huffman. 1995.
- Hake, R. R. 1998. Interactive-engagement versus traditional methods: a six-thousand-student survey of mechanics test data for introductory physics courses. Am. J. Physics 66, 64-74.
- Redish page. Visited Feb. 14, 2011
- Thornton RK, Sokoloff DR (1998) Assessing student learning of Newton's laws: The Force and Motion Conceptual Evaluation and Evaluation of Active Learning Laboratory and Lecture Curricula. Amer J Physics 66: 338-352.
- Ding, L, Chabay, R, Sherwood, B, & Beichner, R (2006). Evaluating an electricity and magnetism assessment tool: Brief Electricity and Magnetism Assessment (BEMA). Phys. Rev. ST Physics Ed. Research 2, 7 pages.
- Beichner, R. Testing student interpretation of kinematics graphs, Am. J. Phys., 62, 750-762, (1994).
- Allen, K (2006) The Statistics Concept Inventory: Development and Analysis of a Cognitive Assessment Instrument in Statistics. Doctoral dissertation, The University of Oklahoma. 
- The Chemical Concepts Inventory. Visited Feb. 14, 2011
- Wright et al., 1998. J. Chem. Ed. 75: 986-992
- Astronomy Diagnostic Test (ADT) Version 2.0. Visited Feb. 14, 2011
- Garvin-Doxas, K. & M.W. Klymkowsky. 2008. Understanding randomness and its impact on student learning: lessons learned from building the Biology Concept Inventory (BCI). CBE Life Sci Educ. 7:227-33. doi: 10.1187/cbe.07-08-0063.
- D'Avanzo, C. 2008. Biology concept inventories: overview, status, and next steps. BioScience 58: 1079-85
- D'Avanzo C, Anderson CW, Griffith A, Merrill J. 2010. Thinking like a biologist: Using diagnostic questions to help students reason with biological principles. (17 January 2010; www.biodqc.org/)
- Wilson CD, Anderson CW, Heidemann M, Merrill JE, Merritt BW, Richmond G, Sibley DF, Parker JM. 2007. Assessing students' ability to trace matter in dynamic systems in cell biology. CBE Life Sciences Education 5: 323-331.
- Anderson DL, Fisher KM, Norman GJ (2002) Development and evaluation of the conceptual inventory natural selection. Journal of Research In Science Teaching 39: 952-978. 
- Nehm R & Schonfeld IS (2008). Measuring knowledge of natural selection: A comparison of the C.I.N.S., an open-response instrument, and an oral interview. Journal of Research in Science Teaching, 45, 1131-1160. 
- Nehm R & Schonfeld IS (2010). The future of natural selection knowledge measurement: A reply to Anderson et al. (2010). Journal of Research in Science Teaching, 47, 358-362. 
- Smith MK, Wood WB, Knight JK (2008) The Genetics Concept Assessment: A New Concept Inventory for Gauging Student Understanding of Genetics. CBE Life Sci Educ 7: 422-430.
- Concept Inventory Assessment Instruments for Engineering Science. Visited Feb. 14, 2011. 
- Libarkin, J.C., Ward, E.M.G., Anderson, S.W., Kortemeyer, G., Raeburn, S.P., 2011, Revisiting the Geoscience Concept Inventory: A call to the community: GSA Today, v. 21, n. 8, p. 26-28. 
- Odom AL, Barrow LH 1995 Development and application of a two-tier diagnostic test measuring college biology students' understanding of diffusion and osmosis after a course of instruction. Journal of Research In Science Teaching 32: 45-61.
- Cooper, M. M.; Underwood, S. M.; Hilley, C. Z., 2012. Development and validation of the Implicit Information from Lewis Structures Instrument (IILSI): Do students connect structures with properties? Chem. Educ. Res. Pract. 13, 195-200.
- Nehm, R.H., Ha, M., Mayfield, E. (in press). Transforming Biology Assessment with Machine Learning: Automated Scoring of Written Evolutionary Explanations. Journal of Science Education and Technology. 
External links
- Biology Concept Inventory
- Bio-Diagnostic Question Clusters
- Classroom Concepts and Diagnostic Tests
- Diagnostic Question Clusters in Biology
- Evolution Assessment
- Force Concept Inventory
- Molecular Life Sciences Concept Inventory
- Thinking Like a Biologist