A norm-referenced test (NRT) is a test, assessment, or evaluation that estimates the position of the tested individual in a predefined population with respect to the trait being measured. The estimate is derived from an analysis of test scores, and possibly other relevant data, from a sample drawn from that population. That is, this type of test identifies whether the test taker performed better or worse than other test takers, not whether the test taker knows more or less material than is necessary for a given purpose.
The term normative assessment refers to the process of comparing one test-taker to his or her peers.
Norm-referenced assessment can be contrasted with criterion-referenced assessment and ipsative assessment. In a criterion-referenced assessment, the score shows how well test takers performed on a given task, not how that performance compares with that of other test takers; in an ipsative assessment, test takers are compared with their own previous performance.
The same test can be interpreted in either a norm-referenced or a criterion-referenced way.
Robert Glaser originally coined the terms norm-referenced test and criterion-referenced test.
Many college entrance exams and nationally used school tests are norm-referenced. The SAT, Graduate Record Examination (GRE), and Wechsler Intelligence Scale for Children (WISC) compare individual student performance to the performance of a normative sample. Test takers cannot "fail" a norm-referenced test, as each test taker receives a score that compares the individual to others who have taken the test, usually expressed as a percentile. This is useful when there is a wide range of acceptable scores, and the goal is to find out who performs better.
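As a rough illustration of how a percentile score can be derived from a normative sample, the following Python sketch counts how much of the sample scored below a given raw score. The function name `percentile_rank`, the half-weight convention for ties, and the sample data are assumptions made for this example; published tests use their own norming procedures.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_sample):
    """Percent of the normative sample scoring below `score`.

    Scores tied with `score` are counted as half below, which is one
    common convention (an assumption here, not a rule from the article).
    """
    ordered = sorted(norm_sample)
    below = bisect_left(ordered, score)           # strictly lower scores
    ties = bisect_right(ordered, score) - below   # scores equal to `score`
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical normative sample of raw scores, invented for illustration.
norm_sample = [42, 47, 51, 51, 55, 58, 60, 63, 67, 72]

print(percentile_rank(58, norm_sample))  # 55.0
```

Note that the function never returns "pass" or "fail": every raw score maps to a position relative to the sample.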
IQ tests are norm-referenced tests, because their goal is to see which test takers are more intelligent than others. The median IQ is set to 100, and all test takers are ranked up or down in comparison to that level.
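One way to picture this scaling is to map a test taker's percentile onto a normal curve centered on 100. The sketch below assumes a standard deviation of 15 points, a convention used by several IQ scales but not stated in the text above; the function name `rank_to_iq` is likewise hypothetical.

```python
from statistics import NormalDist


def rank_to_iq(percentile, center=100.0, sd=15.0):
    """Map a percentile (strictly between 0 and 100) to an IQ-style score.

    Assumes scores follow a normal curve with the given center and an
    assumed standard deviation of 15.
    """
    return NormalDist(mu=center, sigma=sd).inv_cdf(percentile / 100.0)


print(round(rank_to_iq(50.0)))  # 100: the median test taker
print(round(rank_to_iq(84.0)))  # roughly one standard deviation above the median
```

The score carries no meaning on its own; it only restates where the test taker ranks within the reference group.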
Theater auditions and job interviews are norm-referenced tests, because their goal is to identify the best candidate compared to the other candidates, not to determine how many of the candidates meet a fixed list of standards.
As alternatives to normative testing, tests can be ipsative assessments or criterion-referenced assessments.
In an ipsative assessment, an individual's performance is compared only with that individual's previous performances. For example, a person on a weight-loss diet is judged by how his current weight compares to his own previous weight, rather than how his weight compares to an ideal or to another person's weight.
A test is criterion-referenced when the performance is judged according to the expected or desired behavior. Tests that judge the test taker based on a set standard (e.g., everyone should be able to run one kilometer in less than five minutes) are criterion-referenced tests. The goal of a criterion-referenced test is to find out whether the individual can run as fast as the test giver wants, not to find out whether the individual is faster or slower than the other runners. Standards-based education reform focuses on criterion-referenced testing. Most everyday tests and quizzes taken in school, as well as most state achievement tests and high school graduation examinations, are criterion-referenced. In this model, it is possible for all test takers to pass or for all test takers to fail.
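The three interpretations can be applied to the same raw results. The Python sketch below reuses the one-kilometer-in-five-minutes standard from the running example; the runner names and times are invented, and the three function names are hypothetical labels for this illustration.

```python
CUTOFF_SECONDS = 300  # one kilometer in less than five minutes

# Hypothetical runners: (previous time, current time) in seconds.
times = {
    "Ana": (330, 310),
    "Ben": (320, 325),
    "Cho": (350, 305),
}


def criterion_referenced(current):
    """True if the runner meets the fixed standard."""
    return current < CUTOFF_SECONDS


def norm_referenced(name):
    """Rank within the group by current time, 1 = fastest."""
    ordered = sorted(times, key=lambda n: times[n][1])
    return ordered.index(name) + 1


def ipsative(previous, current):
    """Seconds gained over the runner's own previous time (negative = slower)."""
    return previous - current


for name, (previous, current) in times.items():
    print(name, criterion_referenced(current), norm_referenced(name),
          ipsative(previous, current))
```

With this data all three runners miss the fixed standard, yet the norm-referenced reading still produces a first place (Cho), and the ipsative reading shows that Ana and Cho improved while Ben did not.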
Advantages and limitations
The primary advantage of norm-referenced tests is that they can provide information on how an individual's performance on the test compares to others in the reference group.
A serious limitation of norm-referenced tests is that the reference group may not represent the current population of interest. As noted by the Oregon Research Institute's International Personality Item Pool website, "One should be very wary of using canned 'norms' because it isn't obvious that one could ever find a population of which one's present sample is a representative subset. Most 'norms' are misleading, and therefore they should not be used. Far more defensible are local norms, which one develops oneself. For example, if one wants to give feedback to members of a class of students, one should relate the score of each individual to the means and standard deviations derived from the class itself. To maximize informativeness, one can provide the students with the frequency distribution for each scale, based on these local norms, and the individuals can then find (and circle) their own scores on these relevant distributions."
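The local-norms procedure described in the quotation can be sketched in a few lines of Python. The class scores below are invented, the names `local_z` and `frequency_table` are hypothetical, and the use of the population standard deviation (`pstdev`) is an assumption based on treating the class as the entire reference group.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical scores for one class on one scale.
class_scores = [12, 15, 15, 18, 20, 20, 20, 23, 25, 32]


def local_z(score, scores):
    """Standard score relative to the class's own mean and standard deviation."""
    return (score - mean(scores)) / pstdev(scores)


def frequency_table(scores):
    """Sorted (score, count) pairs: the class's frequency distribution."""
    counts = Counter(scores)
    return [(value, counts[value]) for value in sorted(counts)]


my_score = 25
print(f"z = {local_z(my_score, class_scores):.2f}")
for value, count in frequency_table(class_scores):
    marker = "  <- your score" if value == my_score else ""
    print(f"{value:>3} {'#' * count}{marker}")
```

The marker line plays the role of the student circling their own score on the printed distribution.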
Norm-referencing does not ensure that a test is valid (i.e. that it measures the construct it is intended to measure).
Another disadvantage of norm-referenced tests is that they cannot measure the progress of the population as a whole, only where individuals fall within it. Measuring the success of, for instance, an educational reform program that seeks to raise the achievement of all students requires measuring against a fixed goal instead.
With a norm-referenced test, grade level was traditionally defined by the middle 50 percent of scores. By contrast, the National Children's Reading Foundation believes that it is essential to ensure that virtually all children read at or above grade level by third grade, a goal that cannot be achieved with a norm-referenced definition of grade level.
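A small numerical sketch makes the point concrete. In the hypothetical cohort below, every student gains ten points after a reform; the scores, the cutoff of 65, and the function names are all invented for illustration.

```python
def ranks(scores):
    """Percent of the cohort scoring strictly below each score."""
    n = len(scores)
    return [100.0 * sum(other < s for other in scores) / n for s in scores]


def share_meeting(scores, cutoff):
    """Fraction of the cohort at or above a fixed cutoff."""
    return sum(s >= cutoff for s in scores) / len(scores)


before = [48, 55, 61, 67, 74]        # hypothetical cohort before a reform
after = [s + 10 for s in before]     # every student improves by 10 points

print(ranks(before) == ranks(after))  # True: relative standing is unchanged
print(share_meeting(before, 65), share_meeting(after, 65))  # 0.4 0.8
```

The within-cohort ranks are identical before and after, so a purely norm-referenced report would show no change, while the fixed cutoff registers the improvement.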
Norms do not automatically imply a standard. A norm-referenced test does not seek to enforce any expectation of what test takers should know or be able to do. It measures the test takers' current level by comparing the test takers to their peers. A rank-based system produces only data that tell which students perform at an average level, which students do better, and which students do worse. It does not identify which test takers are able to correctly perform the tasks at a level that would be acceptable for employment or further education.
See also
- Curved grading
- Macabre constant
- Concept inventory
- Educational assessment
- Standardized test: all individuals are given the same test under the same conditions; used for both norm-referenced and criterion-referenced tests.
References
- Cronbach, L. J. (1970). Essentials of psychological testing (3rd ed.). New York: Harper & Row.
- Glaser, R. (1963). "Instructional technology and the measurement of learning outcomes". American Psychologist. 18: 510–522. doi:10.1037/h0049294.
- "PDF presentation" (PDF). Archived from the original (PDF) on 2015-09-24. Retrieved 2006-07-21.
- Fairtest.org: Times on Testing: "criterion referenced" tests measure students against a fixed yardstick, not against each other.
- Illinois Learning Standards. Archived from the original on 2010-04-14. Retrieved 2010-04-14.
- Oregon Research Institute, IPIP website, http://ipip.ori.org/newNorms.htm
- NCTM: News & Media: Assessment Issues (Newsbulletin April 2004) "by definition, half of the nation's students are below grade level at any particular moment"
- National Children's Reading Foundation website. Archived 2007-03-11 at the Wayback Machine.