|WikiProject Statistics||(Rated C-class, High-importance)|
Why limited to psychology?
This is an article putatively about validity, the statistical concept. Why does the first sentence limit the discussion to the domain of psychology? If there are distinct validity concerns in the realm of psychology, shouldn't they be dealt with in a subsection? The overall article should presumably be as domain-independent as possible. —Preceding unsigned comment added by 18.104.22.168 (talk) 21:00, 3 May 2009 (UTC)
Moved note re:Criterion validity
Anon 22.214.171.124 left the following note in the article space for Criterion validity, red-linked from this article. Since the note is more appropriate for a talk page, and that article doesn't exist yet, I'm moving the note here before deleting the article.
- Criterion validity is not the right terminology. It ought to be "criterion related validity" which means the validity is actually of the predictor using the specific criterion. It is safer to term it "predictive validity" of a measure.
As far as I know, "A test can be reliable, but not valid" is the right description. Imagine a clock that shows 3:00 pm every day: it reads the same every day, so it is reliable. However, if you try to measure your weight with a clock, you have a reliable measurement without validity.
Validity and reliability
The graphic illustrating the relationship between validity and reliability is incorrectly labeled. It is actually showing the relationship between precision and accuracy. — Preceding unsigned comment added by 126.96.36.199 (talk) 22:19, 30 May 2013 (UTC)
The article states that "A valid measure must be reliable, but a reliable measure need not be valid", but Earl Babbie's 'The Practice of Social Research', 10th edition, p. 145 has a graph that implies that a valid measure does not have to be reliable. Can anybody elaborate on this? --Piotr Konieczny aka Prokonsul Piotrus Talk 18:47, 16 October 2005 (UTC)
- What is meant is that if measurements of a person's weight are to be valid (i.e. they actually measure weight), they must be reliable. They cannot change from instrument to instrument, etc. However, an instrument can give consistent measurements, and hence be reliable, yet not measure what it is supposed to measure. Thus, it would be 'reliable' but not valid. This is a common argument. The problem with it is that the distinction between validity and reliability is blurry at best. It is taken for granted that validity means something measures the trait or attribute it purports to measure. Generally, people implicitly take it that something is only reliable if it measures what it purports to, and if so, the statement you cited ceases to make sense. It depends on how reliability is defined, in precise terms. Holon 02:00, 1 April 2006 (UTC)
- I'm not sure I agree with you. Most textbooks and articles in both psychology (e.g. D. Borsboom et al., "The concept of validity", Psychological Review, 111(4), pp. 1061-71) and the social sciences (e.g. King, Keohane and Verba's well-known textbook) define validity and reliability clearly as separate concepts. Neither one necessarily implies the other. The classical explanation of this view is that of a rifle pointed at a target; a rifle aimed exactly at the bull's eye represents a valid measurement. But it may still be off because of random errors (imprecision). On the other hand, another rifle may be very precise (reliable) but pointed somewhere else completely, and thus invalid if you want to hit the bull's eye. A more statistical formulation is that unreliability is about random error, while invalidity is about systematic error. Your statement that "validity means something measures the trait or attribute it purports to measure" is indeed common, and can be expanded with "but not necessarily with perfect precision". 188.8.131.52 16:02, 4 November 2006 (UTC)
- Appealing to "most textbooks" is not going to get you anywhere, since most textbooks have yet to incorporate the 1999 AERA, APA, NCME Standards for Educational and Psychological Testing. Simply put, reliability means consistency. Cronbach's coefficient is actually a "coefficient of internal consistency". Inter-rater reliability actually concerns how consistent rating is from rater to rater. G-theory also concerns isolating sources of inconsistency.
- Validity is the degree to which evidence supports the interpretations of test scores required for specific purposes (AERA, APA, NCME, 1999). You cannot validly interpret test scores if the consistency of the results is unknown. In other words, if there is no evidence of reliability, you cannot know if the scores mean what you need them to mean. Did participant A really perform lower (or possess less of the target trait) than participant B? Or was the difference only because rater A gives consistently lower scores than rater B? You could find out by analyzing the inter-rater reliability (consistency), but without that evidence, you cannot know that the scores reflect the target of measurement.
- Per the 1999 Standards, reliability is most certainly a validity concern. It fits under the "Evidence Based on [the Test's] Internal Structure."
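The rater question raised above can be illustrated with a small numerical sketch. The scores are made up for illustration, and the helper names `mean_difference` and `pearson_r` are hypothetical, not taken from the Standards. It shows two raters who order the participants identically while one of them scores consistently lower, so comparing raw scores across raters would mislead.

```python
from math import sqrt
from statistics import mean


def mean_difference(xs, ys):
    """Average of the paired differences xs[i] - ys[i]."""
    return mean(x - y for x, y in zip(xs, ys))


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: both raters score the same five participants.
rater_a = [12, 15, 9, 18, 14]
rater_b = [14, 17, 11, 20, 16]  # rater B is always 2 points higher

offset = mean_difference(rater_b, rater_a)  # systematic gap between raters
agreement = pearson_r(rater_a, rater_b)     # consistency of the ordering

print(f"mean offset (B - A): {offset}")
print(f"correlation: {agreement:.3f}")
```

With these made-up numbers the raters agree perfectly on the ordering, yet a participant scored only by rater A would look 2 points weaker than the same performance scored by rater B.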
Reliability and validity are related but independent. They are analogous to the engineering terms precision and accuracy respectively. An analog wristwatch that does not work is accurate (valid) twice a day to as many decimal places as you can measure. But it lacks precision (reliability). A watch that is always 10 minutes fast is never accurate but is very precise. These terms are well defined and accepted in engineering.
The problem comes in when mapping these concepts into social science, because the terms acquire linguistic uncertainty from colloquial usage. In everyday usage, for example, a reliable person is always on time. Using the scientific definition of reliability, a person who is always 10 minutes late is also reliable.
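The watch example can be put in numbers. This is a minimal sketch, assuming error is measured in minutes as displayed time minus true time; the function name `bias_and_spread` and the sample errors are hypothetical. The mean error (bias) corresponds to the accuracy side of the analogy, and the standard deviation of the error (spread) to the precision side.

```python
from statistics import mean, pstdev


def bias_and_spread(errors):
    """Return (mean error, population standard deviation of the error)."""
    return mean(errors), pstdev(errors)


# Watch that is always 10 minutes fast: the same error at every check.
fast_watch = [10, 10, 10, 10]

# Hypothetical watch that wanders around the correct time.
drifting_watch = [-5, 5, -3, 3]

for name, errors in [("always fast", fast_watch), ("drifting", drifting_watch)]:
    bias, spread = bias_and_spread(errors)
    print(f"{name}: bias={bias}, spread={spread:.2f}")
```

The always-fast watch has a large bias and no spread (precise but not accurate), while the drifting watch has no bias on average but a nonzero spread.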
Needs Rewrite for Clarity
- I would like to see this expanded using this outline as a guide:
I. Validity
   A. Internal
   B. External
   C. Statistical Conclusion
   D. Construct
      i. Intentional
      ii. Representation
         a. Face
         b. Content
      iii. Observation
         a. Predictive
         b. Criterion
         c. Concurrent
         d. Convergent
The article confuses two main objects of validity, namely (1) a test, and (2) a (quasi-)experiment. In case (1), validity is about the psychometric properties of the test, and for this case the APA Standards apply. In case (2), validity is about the validity of the causal inferences, and there the Cook & Campbell terminology applies. These are entirely different concepts of validity, and by mixing them into one list the accuracy of the article is compromised. Within the psychometric validity family, the main concepts are Content, Criterion and Construct validity. Within the causal validity family, the main concepts are Statistical, Internal, Construct and External. What contributes to the confusion is that both families contain the concept Construct validity. However, these are actually two different concepts of construct validity. E.g. if you construct a test and use it in an experiment, then an expert analysis of the content of the test items contributes to the Content validity of the test, which in turn contributes to the Construct validity of the experiment. The expert analysis does not contribute to the Construct validity of the test, however. JulesEllis (talk) 14:50, 20 March 2008 (UTC)
- I agree that this is in severe need of work. In fact, this is the first time I've checked the article on validity, and I'm quite embarrassed by its state given that Popham (2008) rightly observed that there is no more important issue in modern assessment.
- JulesEllis: The "Holy Trinity" (Guion, 1980) of validity (Content, Criterion and Construct) is inconsistent with the APA/AERA/NCME 1999 standards, which inherited a lot of Messick's (1995) framework. I'm going to put some work in on this, but I'll leave this talk comment up to get some feedback before I start.
- Validity is the degree to which evidence and theory support the interpretations required for specific uses of test results (AERA, APA, NCME, 1999). Nitko & Brookhart (2005) similarly define it as the "soundness of the interpretations and uses made of test results." The modern view is that validity is a unitary concept, but that the evidence used to support it may be categorized. For example, what used to be "criterion validity" is now simply "evidence of relation to other variables."
- Also, validity is not a property of the test or the test results, but of the interpretations and uses made of the test results. Unfortunately, the shortcut phraseology persists (e.g. "this test is not valid"), but looking for such speech is an easy way to identify which version of validity theory (classical or modern) the speaker follows.
- The distinction is more than semantic; it is extremely useful to practitioners. No longer is validity a checklist of necessary activities, but your specific uses and interpretations of the test result guide which types of validity evidence you will gather.
- Over the next few days I'll post drafts here in the talk before I move them to the article. Please provide feedback.
So I commented almost a year ago and I haven't made any changes. I'm looking at this article wondering why it's even here. I'm no deletionist, but there are already articles on validity in research design and now there is one on test validity, plus articles on all the little test "validities". So why do we have this one here? Jmbrowne (talk) 02:53, 5 December 2009 (UTC)
- If a citation were provided for the concept, we could determine whether it belongs here. I don't see "incremental" listed in the AERA, APA, NCME standards, so I assume it's something from the "Statistical conclusion validity" section? The plurality of validity is tiresome, anachronistic, and useless. I say we trim what's here rather than add anything new. —Preceding unsigned comment added by Jmbrowne (talk • contribs) 14:46, 12 December 2009 (UTC)
First line vandalism
There's a little vandalism going on in the first line there...someone who knows what they're doing ought to take care of that... — Preceding unsigned comment added by 184.108.40.206 (talk) 00:57, 12 September 2011 (UTC)