A concept inventory is a criterion-referenced test designed to evaluate whether a student has an accurate working knowledge of a specific set of concepts. To ensure interpretability, it is common to have multiple items that address a single idea. Typically, concept inventories are organized as multiple-choice tests in order to ensure that they are scored in a reproducible manner, a feature that also facilitates administration in large classes. Unlike a typical, teacher-made multiple-choice test, questions and response choices on concept inventories are the subject of extensive research. The aims of the research include ascertaining (a) the range of what individuals think a particular question is asking and (b) the most common responses to the questions. Concept inventories are evaluated to ensure test reliability and validity. In its final form, each question includes one correct answer and several distractors. The distractors are incorrect answers that are usually (but not always) based on students' commonly held misconceptions.
Ideally, a score on a criterion-referenced test reflects the amount of content knowledge a student has mastered. Criterion-referenced tests differ from norm-referenced tests in that (in theory) the former are not used to compare an individual's score with the scores of the group. Ordinarily, the purpose of a criterion-referenced test is to ascertain whether a student has mastered a predetermined amount of content knowledge; a student who scores at or above a cutoff score can move on to the body of content knowledge that follows next in a learning sequence. In general, items with difficulty values (the percentage of students who answer the item correctly) between 30% and 70% are best able to provide information about student understanding.
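The 30% to 70% guideline can be illustrated with a short calculation. The following is a minimal sketch, assuming item difficulty is computed in the classical way as the proportion of students who answer an item correctly; the function names and the sample score matrix are hypothetical and are not taken from any published inventory.

```python
from typing import List

def item_difficulty(responses: List[List[int]]) -> List[float]:
    """Return, for each item, the proportion of students who answered correctly.

    `responses` holds one row per student and one column per item,
    with 1 for a correct answer and 0 for an incorrect one.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_students for i in range(n_items)]

def flag_items(difficulties: List[float], low: float = 0.30, high: float = 0.70) -> List[int]:
    """Return the indices of items whose difficulty falls outside [low, high]."""
    return [i for i, p in enumerate(difficulties) if not (low <= p <= high)]

# Hypothetical data: five students, four items.
scores = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 0, 0, 0],
]

difficulties = item_difficulty(scores)
flagged = flag_items(difficulties)
print(difficulties)  # proportion correct per item
print(flagged)       # items that are too easy or too hard under the guideline
```

In this illustrative data set, the first item is answered correctly by every student and the third by only one student in five, so both fall outside the guideline and reveal little about differences in understanding.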
Distractors are often based on ideas commonly held by students, as determined by years of research on misconceptions. Test developers often research student misconceptions by examining students' responses to open-ended essay questions and conducting "think-aloud" interviews with students. The distractors chosen by students help researchers understand student thinking and give instructors insights into students' prior knowledge (and, sometimes, firmly held beliefs). This foundation in research underlies instrument construction and design, and helps educators obtain clues about students' ideas, scientific misconceptions, and didaskalogenic (that is, teacher-induced) confusions and conceptual lacunae that interfere with learning.
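One simple way to see which distractors students choose is to tally the incorrect responses to a single item. The sketch below is illustrative only: the function name, the answer labels, and the sample responses are assumptions, and real distractor analysis usually also draws on interviews and written explanations.

```python
from collections import Counter
from typing import List, Tuple

def distractor_counts(choices: List[str], correct: str) -> List[Tuple[str, int]]:
    """Count how often each incorrect option was chosen, most frequent first."""
    counts = Counter(choices)
    counts.pop(correct, None)  # drop the correct answer, keep only distractors
    return counts.most_common()

# Hypothetical responses of eight students to one item whose correct answer is "A".
responses = ["B", "A", "B", "C", "B", "D", "A", "B"]

ranked = distractor_counts(responses, correct="A")
print(ranked)
```

A distractor that attracts a large share of responses, such as option "B" in this made-up example, points to an idea that is worth probing further in class or in interviews.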
Concept inventories in use
The first concept inventory was developed in 1985. It covers the understanding of basic concepts in classical mechanics. Hestenes, Halloun, Wells, and Swackhamer developed the first of the concept inventories to be widely disseminated, the Force Concept Inventory (FCI). The FCI was designed to assess student understanding of the Newtonian concepts of force. Hestenes (1998) reported that "nearly 80% of the [students completing introductory college physics courses] could state Newton's Third Law at the beginning of the course. FCI data showed that less than 15% of them fully understood it at the end". These results have been replicated in a number of studies involving students at a range of institutions (see the sources listed below), and have led to greater recognition in the physics education research community of the importance of students' "active engagement" with the materials to be mastered.
Since the development of the FCI, other physics instruments have been developed. These include the Force and Motion Conceptual Evaluation developed by Thornton and Sokoloff and the Brief Electricity and Magnetism Assessment developed by Ding et al. For a discussion of how a number of concept inventories were developed, see Beichner (1994). Information about physics concept tests can be found at the NC State Physics Education Research Group website (see the external links below).
In addition to physics, concept inventories have been developed in statistics, chemistry, astronomy, basic biology, natural selection, genetics, engineering, and geoscience.
In many areas, foundational scientific concepts transcend disciplinary boundaries. An example of an inventory that assesses knowledge of such concepts is an instrument developed by Odom and Barrow (1995) to evaluate understanding of diffusion and osmosis. In addition, there are non-multiple-choice conceptual instruments, such as the essay-based approach suggested by Wright et al. (1998) and the essay and oral exams used by Nehm and Schonfeld (2008).
The data collected with a concept inventory (CI) are useful for measuring student learning only when the CI is itself valid and reliable. All users are cautioned to carefully review papers for measures of validity and reliability before employing any CI to measure student learning.
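As an illustration of what a reported reliability measure can look like, the sketch below computes the Kuder-Richardson Formula 20 (KR-20), a common internal-consistency coefficient for items scored as right or wrong. The choice of KR-20 is an assumption made for this example; the text above does not name a specific coefficient, and published inventories may report other statistics. The function name and the sample data are hypothetical.

```python
from statistics import pvariance
from typing import List

def kr20(responses: List[List[int]]) -> float:
    """Kuder-Richardson Formula 20 for dichotomously scored items.

    `responses` holds one row per student and one column per item (1 or 0).
    KR-20 = k / (k - 1) * (1 - sum(p_i * q_i) / variance of total scores),
    where p_i is the proportion correct on item i and q_i = 1 - p_i.
    """
    n_students = len(responses)
    k = len(responses[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")

    totals = [sum(row) for row in responses]
    total_variance = pvariance(totals)  # population variance of total scores
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    pq_sum = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n_students
        pq_sum += p * (1 - p)

    return (k / (k - 1)) * (1 - pq_sum / total_variance)

# Hypothetical data: five students, four items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

reliability = kr20(scores)
print(round(reliability, 3))
```

A coefficient like this speaks only to internal consistency; it says nothing about validity, which has to be established separately.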
Caveats associated with concept inventory use
Some concept inventories are problematic. Some inventories created by scientists do not align with best practices in scale development. Concept inventories created to simply diagnose student ideas may not be viable as research-quality measures of conceptual understanding. Users should be careful to ensure that concept inventories are actually testing conceptual understanding, rather than test-taking ability, language skills, or other abilities that can influence test performance.
The use of multiple-choice exams as concept inventories is not without controversy. The very structure of multiple-choice concept inventories raises questions about the extent to which complex and often nuanced situations and ideas must be simplified or clarified to produce unambiguous responses. For example, a multiple-choice exam designed to assess knowledge of key concepts in natural selection does not meet a number of standards of quality control. One problem with the exam is that the two members of each of several pairs of parallel items, with each pair designed to measure exactly one key concept in natural selection, sometimes have very different levels of difficulty. Another problem is that the multiple-choice exam overestimates knowledge of natural selection relative to student performance on a diagnostic essay exam and a diagnostic oral exam, two instruments with reasonably good construct validity. Although scoring concept inventories in the form of essay or oral exams is labor intensive, costly, and difficult to implement with large numbers of students, such exams can offer a more realistic appraisal of the actual levels of students' conceptual mastery as well as their misconceptions. Recently, however, computer technology has been developed that can score essay responses on concept inventories in biology and other domains (Nehm, Ha, & Mayfield, 2011), promising to facilitate the scoring of concept inventories organized as (transcribed) oral exams as well as essays.
See also
- Authentic assessment
- Classical test theory
- Confidence-based learning
- Construct validity
- Constructive alignment
- Criterion-referenced test
- Educational assessment
- Item response theory
- Norm-referenced test
- Rubrics for assessment
- Standards-based education reform
- Standardized test
- Standards-based assessment
References
- "Development and Validation of Instruments to Measure Learning of Expert-Like Thinking." W. K. Adams & C. E. Wieman, 2010. International Journal of Science Education, 1-24. iFirst, doi:10.1080/09500693.2010.512369
- Halloun, I. A., & Hestenes, D. Common sense concepts about motion (1985). American Journal of Physics, 53, 1043-1055.
- Hestenes D, Wells M, Swackhamer G 1992 Force concept inventory. The Physics Teacher 30: 141-166.
- Hestenes D 1998. Am. J. Phys. 66:465
- Redish page. Visited Feb. 14, 2011
- Thornton RK, Sokoloff DR (1998) Assessing student learning of Newton's laws: The Force and Motion Conceptual Evaluation and Evaluation of Active Learning Laboratory and Lecture Curricula. Amer J Physics 66: 338-352.
- Ding, L, Chabay, R, Sherwood, B, & Beichner, R (2006). Evaluating an electricity and magnetism assessment tool: Brief Electricity and Magnetism Assessment (BEMA). Phys. Rev. ST Physics Ed. Research 2, 7 pages.
- Beichner, R. Testing student interpretation of kinematics graphs, Am. J. Phys., 62, 750-762, (1994).
- Allen, K (2006) The Statistics Concept Inventory: Development and Analysis of a Cognitive Assessment Instrument in Statistics. Doctoral dissertation, The University of Oklahoma. 
- The Chemical Concepts Inventory. Visited Feb. 14, 2011
- Wright et al., 1998. J. Chem. Ed. 75: 986-992
- Astronomy Diagnostic Test (ADT) Version 2.0. Visited Feb. 14, 2011
- D'Avanzo. C. 2008. Biology concept inventories: overview, status, and next steps. BioScience 58: 1079-85
- D'Avanzo C, Anderson CW, Griffith A, Merrill J. 2010. Thinking like a biologist: Using diagnostic questions to help students reason with biological principles. (17 January 2010; www.biodqc.org/)
- Wilson CD, Anderson CW, Heidemann M, Merrill JE, Merritt BW, Richmond G, Sibley DF, Parker JM. 2007. Assessing students' ability to trace matter in dynamic systems in cell biology. CBE Life Sciences Education 5: 323-331.
- Anderson DL, Fisher KM, Norman GJ (2002) Development and evaluation of the conceptual inventory of natural selection. Journal of Research in Science Teaching 39: 952-978.
- Nehm R & Schonfeld IS (2008). Measuring knowledge of natural selection: A comparison of the C.I.N.S., an open-response instrument, and an oral interview. Journal of Research in Science Teaching, 45, 1131-1160. 
- Nehm R & Schonfeld IS (2010). The future of natural selection knowledge measurement: A reply to Anderson et al. (2010). Journal of Research in Science Teaching, 47, 358-362. 
- Smith MK, Wood WB, Knight JK (2008) The Genetics Concept Assessment: A New Concept Inventory for Gauging Student Understanding of Genetics. CBE Life Sci Educ 7(4): 422-430.
- Concept Inventory Assessment Instruments for Engineering Science. Visited Feb. 14, 2011. 
- Libarkin, J.C., Ward, E.M.G., Anderson, S.W., Kortemeyer, G., Raeburn, S.P., 2011, Revisiting the Geoscience Concept Inventory: A call to the community: GSA Today, v. 21, n. 8, p. 26-28. 
- Odom AL, Barrow LH 1995 Development and application of a two-tier diagnostic test measuring college biology students' understanding of diffusion and osmosis after a course of instruction. Journal of Research In Science Teaching 32: 45-61.
- Nehm, R.H., Ha, M., Mayfield, E. (in press). Transforming Biology Assessment with Machine Learning: Automated Scoring of Written Evolutionary Explanations. Journal of Science Education and Technology. 
External links
- Basic Biology
- Bio-Diagnostic Question Clusters
- Classroom Concepts and Diagnostic Tests
- Diagnostic Question Clusters in Biology
- Evolution Assessment
- Force Concept Inventory
- Molecular Life Sciences Concept Inventory
- Thinking Like a Biologist