A norm-referenced test (NRT) is a type of test, assessment, or evaluation which yields an estimate of the position of the tested individual in a predefined population, with respect to the trait being measured. The estimate is derived from the analysis of test scores and possibly other relevant data from a sample drawn from the population. That is, this type of test identifies whether the test taker performed better or worse than other test takers, not whether the test taker knows either more or less material than is necessary for a given purpose.
The term normative assessment refers to the process of comparing one test-taker to his or her peers.
Norm-referenced assessment can be contrasted with criterion-referenced assessment and ipsative assessment. In a criterion-referenced assessment, the score shows whether or not test takers performed well or poorly on a given task, not how that compares to other test takers; in an ipsative system, test takers are compared to previous performance.
By contrast, a test is criterion-referenced when provision is made for translating the test score into a statement about the behavior to be expected of a person with that score. The same test can be used in both ways. Robert Glaser originally coined the terms norm-referenced test and criterion-referenced test.
Standards-based education reform is based on the belief that public education should establish what every student should know and be able to do. Students should be tested against a fixed yardstick, rather than against each other or sorted into a mathematical bell curve.
Education officials believe that, by requiring every student to pass these new, higher standards, all students will achieve a diploma that prepares them for success in the 21st century.
Most state achievement tests are criterion-referenced. In other words, a predetermined level of acceptable performance is developed, and students pass or fail depending on whether they reach this level. Tests that set goals for students based on the average student's performance are norm-referenced tests. Tests that set goals for students based on a set standard (e.g., 80 words spelled correctly) are criterion-referenced tests.
Many college entrance exams and nationally used school tests are norm-referenced. The SAT, Graduate Record Examination (GRE), and Wechsler Intelligence Scale for Children (WISC) compare individual student performance to the performance of a normative sample. Test takers cannot "fail" a norm-referenced test, as each test taker receives a score that compares the individual to others who have taken the test, usually expressed as a percentile. This is useful when there is a wide range of acceptable scores that is different for each college.
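The difference between the two interpretations can be illustrated with a small Python sketch that reads the same raw score both ways. The norm sample, the cut score of 80, and the function names `percentile_rank` and `meets_criterion` are hypothetical and chosen only for illustration. The percentile definition used here (the share of the norm sample scoring strictly below the test taker) is one common convention; real testing programs may define it differently.

```python
from bisect import bisect_left


def percentile_rank(score, norm_sample):
    """Norm-referenced reading: percentage of the norm sample
    that scored strictly below `score`."""
    ordered = sorted(norm_sample)
    below = bisect_left(ordered, score)  # count of scores < score
    return 100.0 * below / len(ordered)


def meets_criterion(score, cut_score):
    """Criterion-referenced reading: did the score reach the fixed standard?"""
    return score >= cut_score


# Hypothetical norm sample and cut score (assumptions, not real test data).
norm_sample = [52, 60, 65, 70, 74, 78, 81, 85, 90, 95]
CUT_SCORE = 80

for raw in (70, 85):
    print(raw, percentile_rank(raw, norm_sample), meets_criterion(raw, CUT_SCORE))
```

In this sketch a raw score of 70 is reported as "ahead of 30 percent of the norm sample" under the norm-referenced reading and as "did not meet the standard" under the criterion-referenced reading. Only the second reading has a notion of failing.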
By contrast, nearly two-thirds of US high school students will be required to pass a criterion-referenced high school graduation examination. One high fixed score is set at a level adequate for university admission whether the high school graduate is college bound or not. Each state gives its own test and sets its own passing level, with states like Massachusetts showing very high pass rates, while in Washington State, even average students are failing, as well as 80 percent of some minority groups. This practice is opposed by many in the education community such as Alfie Kohn as unfair to groups and individuals who score lower than others.
Advantages and limitations
An obvious disadvantage of norm-referenced tests is that they cannot measure the progress of the population as a whole, only where individuals fall within the whole. Thus, only measurement against a fixed goal can be used to measure the success of an educational reform program that seeks to raise the achievement of all students against new standards that seek to assess skills beyond choosing among multiple choices. However, while this is attractive in theory, in practice the bar has often been moved in the face of excessive failure rates, and improvement sometimes occurs simply because of familiarity with and teaching to the same test.
With a norm-referenced test, grade level was traditionally set at the level set by the middle 50 percent of scores. By contrast, the National Children's Reading Foundation believes that it is essential to ensure that virtually all children read at or above grade level by third grade, a goal which cannot be achieved with a norm-referenced definition of grade level.
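A short Python sketch can make this point concrete. It assumes, as a simplification, that the norm-referenced "grade level" is the median of the current scores, and it uses hypothetical scores and a hypothetical fixed standard of 70. The helper `share_below` is likewise an illustrative name. Every student's score is then raised by 10 points to mimic improvement across the whole population.

```python
from statistics import median


def share_below(scores, threshold):
    """Fraction of scores strictly below the threshold."""
    return sum(s < threshold for s in scores) / len(scores)


# Hypothetical scores before and after a uniform 10-point improvement.
before = [40, 50, 55, 60, 65, 70, 75, 80, 90, 95]
after = [s + 10 for s in before]

FIXED_STANDARD = 70  # hypothetical criterion-referenced cut score

# Norm-referenced: "grade level" moves with the population (assumed: the median).
norm_before = share_below(before, median(before))
norm_after = share_below(after, median(after))

# Criterion-referenced: the yardstick stays where it was.
crit_before = share_below(before, FIXED_STANDARD)
crit_after = share_below(after, FIXED_STANDARD)

print(norm_before, norm_after)  # share below the moving median
print(crit_before, crit_after)  # share below the fixed standard
```

With these numbers the share of students below the median-based grade level is 0.5 both before and after the improvement, while the share below the fixed standard falls from 0.5 to 0.3. Under the assumed definition, the norm-referenced figure cannot register a gain made by the whole population.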
Advantages to this type of assessment include that students and teachers know what to expect from the test and just how the test will be conducted and graded. Likewise, all schools will conduct the exam in the same manner, reducing such inaccuracies as time differences or environmental differences that may cause distractions to the students. This also makes these assessments fairly accurate as far as results are concerned, a major advantage for a test.
Critics of criterion-referenced tests point out that judges set bookmarks around items of varying difficulty without considering whether the items actually are compliant with grade level content standards or are developmentally appropriate. Thus, the original 1997 sample problems published for the WASL 4th grade mathematics contained items that were difficult for college educated adults, or easily solved with 10th grade level methods such as similar triangles.
The difficulty level of items themselves and the cut-scores to determine passing levels are also changed from year to year. Pass rates also vary greatly from the 4th to the 7th and 10th grade graduation tests in some states.
One of the limitations of No Child Left Behind is that each state can choose or construct its own test, which cannot be compared to any other state. A Rand study of Kentucky results found indications of artificial inflation of pass rates which were not reflected in increasing scores in other tests such as the NAEP or SAT given to the same student populations over the same time.
Graduation test standards are typically set at a level appropriate for native-born applicants to four-year universities. An unusual side effect is that while colleges often admit immigrants with very strong math skills who may be deficient in English, there is no such leeway in high school graduation tests, which usually require passing all sections, including language. Thus, it is not unusual for institutions like the University of Washington to admit strong Asian American or Latino students who did not pass the writing portion of the state WASL test, but such students would not even receive a diploma once the testing requirement is in place.
Although tests such as the WASL are intended as a minimal bar for high school, 27 percent of 10th graders applying for Running Start in Washington State failed the math portion of the WASL. These students applied to take college-level courses in high school, and achieve at a much higher level than average students. The same study concluded the level of difficulty was comparable to, or greater than, that of tests intended to place students already admitted to the college.
A norm-referenced test has none of these problems because it does not seek to enforce any expectation of what all students should know or be able to do other than what actual students demonstrate. Present levels of performance and inequity are taken as fact, not as defects to be removed by a redesigned system. Goals of student performance are not raised every year until all are proficient. Scores are not required to show continuous improvement through Total Quality Management systems. A disadvantage is that norm-referenced assessments measure students' current level against where their peers currently are, rather than against the level that students should be at.
A rank-based system produces only data that tell which students perform at an average level, which students do better, and which students do worse, contradicting fundamental beliefs, whether optimistic or simply unfounded, that all will perform at one uniformly high level in a standards-based system if enough incentives and punishments are put into place. This difference in beliefs underlies the most significant differences between a traditional and a standards-based education system.
- IQ tests are norm-referenced tests, because their goal is to see which test taker is more intelligent than the other test takers.
- Theater auditions and job interviews are norm-referenced tests, because their goal is to identify the best candidate compared to the other candidates, not to determine how many of the candidates meet a fixed list of standards.
- Concept inventory
- Criterion-referenced assessment
- Educational assessment
- Ipsative assessment
- Standardized test
- Cronbach, L. J. (1970). Essentials of psychological testing (3rd ed.). New York: Harper & Row.
- Glaser, R. (1963). "Instructional technology and the measurement of learning outcomes". American Psychologist 18: 510–522.
- Illinois Learning Standards
- Fairtest.org: Times on Testing. "'criterion referenced' tests measure students against a fixed yardstick, not against each other."
- By the Numbers: Rising Student Achievement in Washington State, by Terry Bergeson. "She continues her pledge... to ensure all students achieve a diploma that prepares them for success in the 21st century."
- NCTM: News & Media: Assessment Issues (Newsbulletin, April 2004). "by definition, half of the nation's students are below grade level at any particular moment"
- National Children's Reading Foundation website
- House Bill Report HB 2087. "A number of critics... continue to assert that the mathematics WASL is not developmentally appropriate for fourth grade students."
- Prof Don Orlich, Washington State University
- Shaw, Linda. "Panel lowers bar for passing parts of WASL". Seattle Times, May 11, 2004. "A blue-ribbon panel voted unanimously yesterday to lower the passing bar in reading and math for the fourth- and seventh-grade exam, and in reading on the 10th-grade test"
- Shaw, Linda. "Study: Math in 7th-grade WASL is hard". Seattle Times, December 6, 2002. "Those of you who failed the math section ... last spring had a harder test than your counterparts in the fourth or 10th grades."
- New Jersey Department of Education. "But we already have tests in New Jersey, why have another test? Our statewide test is an assessment that only New Jersey students take. No comparisons should be made to other states, or to the nation as a whole."
- Test-Based Accountability Systems (Rand). "NAEP data are particularly important.... Taken together, these trends suggest appreciable inflation of gains on KIRIS."
- Pavelchek, Dave; Stern, Paul; Olson, Dennis. Relationship of the Washington Assessment of Student Learning (WASL) and Placement Tests Used at Community and Technical Colleges. Social & Economic Sciences Research Center, Puget Sound Office, WSU. "The average difficulty ratings for WASL test questions fall in the middle of the range of difficulty ratings for the college placement tests."