External validity

External validity is the validity of generalized (causal) inferences in scientific studies, usually based on experiments as experimental validity.[1] In other words, it is the extent to which the results of a study can be generalized to other situations and to other people.[2] For example, inferences based on comparative psychotherapy studies often employ specific samples (e.g. volunteers, highly depressed, no comorbidity). If psychotherapy is found effective for these sample patients, will it also be effective for non-volunteers or the mildly depressed or patients with concurrent other disorders?

Threats to external validity

"A threat to external validity is an explanation of how you might be wrong in making a generalization."[3] Generally, generalizability is limited when the cause (i.e. the independent variable) depends on other factors; therefore, all threats to external validity interact with the independent variable.

  • Aptitude–treatment Interaction: The sample may have certain features that may interact with the independent variable, limiting generalizability. For example, inferences based on comparative psychotherapy studies often employ specific samples (e.g. volunteers, highly depressed, no comorbidity). If psychotherapy is found effective for these sample patients, will it also be effective for non-volunteers or the mildly depressed or patients with concurrent other disorders?
  • Situation: All situational specifics of a study (e.g. treatment conditions, time, location, lighting, noise, treatment administration, investigator, timing, scope and extent of measurement, etc.) potentially limit generalizability.
  • Pre-test effects: If cause-effect relationships can only be found when pre-tests are carried out, then this also limits the generality of the findings.
  • Post-test effects: If cause-effect relationships can only be found when post-tests are carried out, then this also limits the generality of the findings.
  • Reactivity (placebo, novelty, and Hawthorne effects): If cause-effect relationships are found, they might not generalize to other settings or situations if the effects occurred only as a result of studying the situation.
  • Rosenthal effects: Inferences about cause-consequence relationships may not be generalizable to other investigators or researchers.
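
The interaction idea behind these threats can be illustrated with a little arithmetic. The following Python sketch uses purely hypothetical numbers: the subgroup effects, the subgroup shares, and the function name `average_effect` are illustrative assumptions, not results from any study. It shows that if a treatment works better for highly depressed patients than for mildly depressed ones, the average effect observed in a sample dominated by highly depressed volunteers can overstate the effect to be expected in a population with a different mix.

```python
def average_effect(subgroup_effects, subgroup_shares):
    """Average treatment effect for a group with the given subgroup mix.

    Both arguments are dicts keyed by subgroup name; shares must sum to 1.
    """
    if set(subgroup_effects) != set(subgroup_shares):
        raise ValueError("effects and shares must cover the same subgroups")
    if abs(sum(subgroup_shares.values()) - 1.0) > 1e-9:
        raise ValueError("subgroup shares must sum to 1")
    return sum(subgroup_effects[g] * subgroup_shares[g] for g in subgroup_effects)


# Hypothetical effect of therapy (arbitrary units) within each subgroup.
effects = {"highly depressed": 0.8, "mildly depressed": 0.2}

# Hypothetical mixes: the study sample over-represents the highly depressed.
sample_mix = {"highly depressed": 0.9, "mildly depressed": 0.1}
population_mix = {"highly depressed": 0.3, "mildly depressed": 0.7}

sample_effect = average_effect(effects, sample_mix)          # about 0.74
population_effect = average_effect(effects, population_mix)  # about 0.38
print(f"sample: {sample_effect:.2f}, population: {population_effect:.2f}")
```

If the treatment effect were the same in every subgroup, the two averages would coincide, which is one way of seeing why threats to external validity are described as interactions with the independent variable.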

External, internal, and ecological validity

In many studies and research designs, there may be a "trade-off" between internal validity and external validity: when measures are taken or procedures implemented to increase internal validity, these measures may also limit the generalizability of the findings. This situation has led many researchers to call for "ecologically valid" experiments, by which they mean that experimental procedures should resemble "real-world" conditions. They criticize the lack of ecological validity in many laboratory-based studies, which focus on artificially controlled and constricted environments. External validity and ecological validity are closely related in the sense that causal inferences based on ecologically valid research designs often allow for higher degrees of generalizability than those obtained in an artificially produced lab environment. However, this is not always the case: some findings produced in ecologically valid research settings may hardly be generalizable, and some findings produced in highly controlled settings may claim near-universal external validity. Thus, external and ecological validity are independent: a study may possess external validity but not ecological validity, and vice versa.

Qualitative research

Within the qualitative research paradigm, external validity is replaced by the concept of transferability. Transferability is the ability of research results to transfer to situations with similar parameters, populations and characteristics.[4]

External validity in experiments

The experimental method has drawbacks. In gaining enough control over the situation to randomly assign people to conditions and rule out the effects of extraneous variables, the experimenter can make the situation somewhat artificial and distant from real life. There are two kinds of generalizability at issue:

  1. The extent to which we can generalize from the situation constructed by an experimenter to real-life situations (generalizability across situations),[5] and
  2. The extent to which we can generalize from the people who participated in the experiment to people in general (generalizability across people).[6]

Generalizability across situations

Psychology experiments conducted in universities are often criticized for taking place in artificial situations whose results cannot be generalized to real life.[7] To address this problem, social psychologists attempt to increase the generalizability of their results by making their studies as realistic as possible. However, there is more than one way that an experiment can be realistic:

  1. Mundane realism: the extent to which an experiment is similar to situations that occur frequently in everyday life.[8] By this standard many experiments are decidedly unreal, because people are placed in situations they would rarely encounter in everyday life.
  2. Psychological realism: the extent to which the psychological processes triggered in an experiment are similar to psychological processes that occur in everyday life.[9]

Of the two, it is more important to ensure that a study is high in psychological realism.

Psychological realism is heightened if people find themselves engrossed in a real event. To accomplish this, experimenters often tell the participants a cover story, a false description of the study's purpose. If, however, the experimenters were to tell the participants the true purpose of the experiment, for example that it tests how people respond to an emergency, the procedure would be low in psychological realism. In everyday life, no one knows when emergencies are going to occur, and people do not have time to plan responses to them. Participants who had been forewarned would therefore experience psychological processes that differ widely from those of a real emergency, reducing the psychological realism of the study.[10]

People do not always know why they do what they do, or what they will do until it happens. Therefore, describing an experimental situation to participants and then asking them to respond normally will produce responses that are suspect. We cannot depend on people's predictions about what they would do in a hypothetical situation; we can only find out what people will really do when we construct a situation that triggers the same psychological processes as occur in the real world.

Generalizability across people

Social psychologists study the way in which people in general are susceptible to social influence. Several experiments have documented an interesting, unexpected example of social influence, whereby the mere knowledge that others were present reduced the likelihood that people helped.

The only way to be certain that the results of an experiment represent the behaviour of a particular population is to ensure that participants are randomly selected from that population. Unlike survey samples, however, samples in experiments usually cannot be randomly selected, because it is impractical and expensive to select random samples for social psychology experiments. It is difficult enough to convince a random sample of people to agree to answer a few questions over the telephone as part of a political poll, and such polls can cost thousands of dollars to conduct.

Many researchers address this problem by studying basic psychological processes that make people susceptible to social influence, assuming that these processes are so fundamental that they are universally shared. Some social psychological processes do vary across cultures, and in those cases diverse samples of people have to be studied.[11]
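
Random selection from a population, as discussed in this section, is easily confused with random assignment to conditions, which serves internal rather than external validity. The following Python sketch shows both steps side by side. The population of numbered IDs, the condition names, and the function names `random_sample` and `random_assignment` are hypothetical and chosen only for illustration.

```python
import random


def random_sample(population, n, seed=None):
    """Randomly select n participants from the population (external validity)."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def random_assignment(participants, conditions, seed=None):
    """Randomly assign participants to conditions (internal validity).

    Participants are shuffled and then dealt out to the conditions in turn,
    so group sizes differ by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, person in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(person)
    return groups


# Hypothetical population of 1000 people, identified by number.
population = list(range(1000))
sample = random_sample(population, 40, seed=1)
groups = random_assignment(sample, ["treatment", "control"], seed=2)
print({condition: len(members) for condition, members in groups.items()})
```

In practice, experiments often skip the first step and use a convenience sample, which is the source of the generalizability concern described above; the second step can still be carried out on whatever sample is available.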

Replications

The ultimate test of an experiment's external validity is replication: conducting the study over again, generally with different subject populations or in different settings. Researchers will often use different methods to see if they still get the same results.

When many studies of one problem are conducted, the results can vary. Several studies might find an effect of the number of bystanders on helping behaviour, whereas a few do not. To make sense of this, a statistical technique called meta-analysis averages the results of two or more studies to see if the effect of an independent variable is reliable. A meta-analysis essentially tells us the probability that the findings across many studies are attributable to chance rather than to the independent variable. If an independent variable is found to have an effect in only one of 20 studies, the meta-analysis will tell us that that one study was an exception and that, on average, the independent variable is not influencing the dependent variable. If an independent variable has an effect in most of the studies, the meta-analysis is likely to tell us that, on average, it does influence the dependent variable.
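
The averaging step can be sketched in a few lines of Python. The example below uses one common form of meta-analysis, a fixed-effect, inverse-variance weighted average with a two-sided normal test. This is an illustrative choice, not necessarily the procedure used in the studies cited here, and all effect sizes and standard errors are hypothetical.

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Pool per-study effect estimates using inverse-variance weights.

    Returns the pooled effect, its standard error, the z statistic, and a
    two-sided p-value under a standard normal approximation.
    """
    if len(effects) != len(std_errors) or not effects:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|)
    return pooled, pooled_se, z, p_value


# Hypothetical case 1: an effect appears in only one of 20 studies.
one_in_twenty = [0.5] + [0.0] * 19
# Hypothetical case 2: an effect appears in 16 of 20 studies.
most_studies = [0.4] * 16 + [0.0] * 4
std_errors = [0.2] * 20  # assumed equal precision in every study

for label, effects in [("1 of 20", one_in_twenty), ("16 of 20", most_studies)]:
    pooled, pooled_se, z, p_value = fixed_effect_meta(effects, std_errors)
    print(f"{label}: pooled effect = {pooled:.3f}, p = {p_value:.4f}")
```

With these made-up inputs, the pooled effect in the first case is small and not statistically distinguishable from zero, while in the second case it is clearly nonzero, mirroring the two scenarios described above. A random-effects model would be a reasonable alternative when the studies are expected to differ in their true effects.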

There can be reliable phenomena that are not limited to the laboratory. For example, increasing the number of bystanders has been found to inhibit helping behaviour with many kinds of people, including children, university students, and future ministers;[12] in Israel;[13] in small towns and large cities in the U.S.;[14] in a variety of settings, such as psychology laboratories, city streets, and subway trains;[15] and with a variety of types of emergencies, such as seizures, potential fires, fights, and accidents,[16] as well as with less serious events, such as having a flat tire.[17] Many of these replications have been conducted in real-life settings where people could not possibly have known that an experiment was being conducted.

The basic dilemma of the social psychologist

When conducting experiments in psychology, there is always a trade-off between internal and external validity—that is between

  1. having enough control over the situation to ensure that no extraneous variables are influencing the results and to randomly assign people to conditions, and
  2. ensuring that the results can be generalized to everyday life.

One way to increase external validity is by conducting field experiments. In a field experiment, people's behaviour is studied outside the laboratory, in its natural setting. A field experiment is identical in design to a laboratory experiment, except that it is conducted in a real-life setting. The participants in a field experiment are unaware that the events they experience are in fact an experiment. The external validity of such an experiment is high because it is taking place in the real world, with real people who are more diverse than a typical university student sample.

Latane and Darley (1970) tested their hypothesis about group size and bystander intervention in a convenience store outside New York City. Two "robbers", acting with the full knowledge and permission of the cashier and manager of the store, waited until there were either one or two other customers at the checkout counter. Then they asked the cashier to name the most expensive beer the store carried. The cashier answered the question and then said he would have to check in the back to see how much of that brand was in stock. While the cashier was gone, the robbers picked up a case of beer in the front of the store, put the beer in their car, and drove off.

Because the robbers were intimidating, no one attempted to intervene directly to stop the theft. But when the cashier returned, fewer people reported the theft when there was another customer in the store as a witness than when they were alone. Real-life behaviour can best be captured by doing field studies, but it is difficult to control all extraneous variables in such studies.

The trade-off between internal and external validity is referred to as the basic dilemma of the social psychologist.[18]

In an attempt to maximize both internal and external validity, Wendy Josephson conducted a study on the relation between television violence and aggressive behaviour. In this study, boys in grades 2 and 3 from 13 schools in Winnipeg watched either a violent or a nonviolent television show. Internal validity was achieved by controlling the television show the participants watched. Josephson ensured that the violent and nonviolent shows were equivalent in terms of excitement, liking, and physiological arousal. This level of control ensured that any differences in subsequent behaviour between the two groups were because of differences in violent content, rather than other variables that might be associated with violent programming, such as excitement. Internal validity was further enhanced by random assignment of participants to either the violent or nonviolent condition. External validity was maximized by having the participants play floor hockey in their school gymnasium after they had finished viewing the television segment.[19]

The observers recorded instances of aggression after the boys had viewed the shows. The observers were unaware whether the boys had seen the violent or the nonviolent show. To make the observation as natural as possible, participants were told that observers would be doing "play by plays" just the way they do in real hockey games. The observers spoke into microphones, noted the number on a child's jersey, and recorded the kind of aggression that occurred. The results indicated that exposure to violent programming did increase aggression—but only among boys who were predisposed toward aggression.[20]

Internal and external validity are rarely both captured in a single experiment. Some social psychologists opt first for internal validity, conducting laboratory experiments in which people are randomly assigned to different conditions and all extraneous variables are controlled. Others prefer external validity to control, conducting most of their research in field studies. And many do both. Taken together, both types of studies meet the requirements of the perfect experiment. Through replication, researchers can study a given research question with maximal internal and external validity.[21]

Notes

  1. Mitchell, M., & Jolley, J. (2001). Research design explained (4th ed.). New York: Harcourt.
  2. Aronson, E., Wilson, T. D., Akert, R. M., & Fehr, B. (2007). Social psychology (4th ed.). Toronto, ON: Pearson Education.
  3. Trochim, William M. The Research Methods Knowledge Base, 2nd Edition.
  4. Lincoln, Y. S., & Guba, E. G. (1986). But is it rigorous? Trustworthiness and authenticity in naturalistic evaluation. In D. D. Williams (Ed.), Naturalistic evaluation (pp. 73-84). New Directions for Program Evaluation, 30. San Francisco, CA: Jossey-Bass.
  5. Aronson, E., Wilson, T. D., Akert, R. M., & Fehr, B. (2007). Social psychology (4th ed.). Toronto, ON: Pearson Education.
  6. Aronson, E., Wilson, T. D., Akert, R. M., & Fehr, B. (2007). Social psychology (4th ed.). Toronto, ON: Pearson Education.
  7. Aronson, E., & Carlsmith, J. M. (1968). Experimentation in social psychology. In G. Lindzey & E. Aronson (Eds.), The handbook of social psychology (Vol. 2, pp. 1-79). Reading, MA: Addison-Wesley.
  8. Aronson, E., & Carlsmith, J. M. (1968). Experimentation in social psychology. In G. Lindzey & E. Aronson (Eds.), The handbook of social psychology (Vol. 2, pp. 1-79). Reading, MA: Addison-Wesley.
  9. Aronson, E., Wilson, T. D., & Brewer, M. (1998). Experimental methods. In D. Gilbert, S. Fiske, & G. Lindzey (Eds.), The handbook of social psychology (4th ed., Vol. 1, pp. 99-142). New York: Random House.
  10. Aronson, E., Wilson, T. D., Akert, R. M., & Fehr, B. (2007). Social psychology (4th ed.). Toronto, ON: Pearson Education.
  11. Darley, J. M., & Batson, C. D. (1973). From Jerusalem to Jericho: A study of situational and dispositional variables in helping behaviour. Journal of Personality and Social Psychology, 27, 100-108.
  12. Darley, J. M., & Batson, C. D. (1973). From Jerusalem to Jericho: A study of situational and dispositional variables in helping behaviour. Journal of Personality and Social Psychology, 27, 100-108.
  13. Schwartz, S. H., & Gottlieb, A. (1976). Bystander reactions to a violent theft: Crime in Jerusalem. Journal of Personality and Social Psychology, 34, 1188-1199.
  14. Latane, B., & Dabbs, J. M. (1975). Sex, group size, and helping in three cities. Sociometry, 38, 108-194.
  15. Harrison, J. A., & Wells, R. B. (1991). Bystander effects on male helping behaviour: Social comparison and diffusion of responsibility. Representative Research in Social Psychology, 96, 187-192.
  16. Latane, B., & Darley, J. M. (1968). Group inhibition of bystander intervention. Journal of Personality and Social Psychology, 10, 215-221.
  17. Hurley, D., & Allen, B. P. (1974). The effect of the number of people present in a nonemergency situation. Journal of Social Psychology, 92, 27-29.
  18. Aronson, E., & Carlsmith, J. M. (1968). Experimentation in social psychology. In G. Lindzey & E. Aronson (Eds.), The handbook of social psychology (Vol. 2, pp. 1-79). Reading, MA: Addison-Wesley.
  19. Josephson, W. (1987). Television violence and children's aggression: Testing the priming, social script, and disinhibition prediction. Journal of Personality and Social Psychology, 53, 882-890.
  20. Josephson, W. (1987). Television violence and children's aggression: Testing the priming, social script, and disinhibition prediction. Journal of Personality and Social Psychology, 53, 882-890.
  21. Latane, B., & Darley, J. M. (1970). The unresponsive bystander: Why doesn't he help? Englewood Cliffs, NJ: Prentice Hall.