Theoretical behaviorism

----


Experimental psychologist B. F. Skinner proposed [[Radical_behaviorism|radical behaviorism]]<ref>Skinner, B. F. (1945). The operational analysis of psychological terms. ''Psychological Review'', 52, 270-277, 291-294; Schneider, S. M. & Morris, E. K. (1987). A history of the term Radical Behaviorism: From Watson to Skinner. ''The Behavior Analyst'', 10, 27-39.</ref> as an alternative to the then-prevailing, more mentalistic philosophies of mind and behavior. His new philosophy accompanied a novel program of experimental research based on the study of individual organisms rather than groups. Two books, ''The Behavior of Organisms'' (1938) and ''Schedules of Reinforcement'' (1957, with C. B. Ferster), describe the methods of ''operant conditioning'' and present much experimental data.

The new experimental methods were very successful, producing hundreds of papers reporting many new phenomena. But radical behaviorism was criticized as a philosophy for neglecting the processes that interpose between stimulus and response.<ref>Staddon, J. E. R. (1973). On the notion of cause, with applications to behaviorism. ''Behaviorism'', 1, 25-63; Staddon, J. (1993). ''Behaviorism: Mind, Mechanism and Society''. London: Duckworth.</ref> The movement was in fact actively hostile to theory; but experimental success overshadowed theoretical weakness.


==Skinner’s operant conditioning==
Two inventions, the [[Operant_conditioning_chamber|Skinner box]] and the cumulative recorder, were central to Skinner’s new experimental method. The Skinner box facilitated long-term (many weeks rather than a few days) automated experiments on learned behavior in individual organisms. The cumulative recorder showed real-time data as opposed to the group averages commonly used by other researchers.


Skinner was able to demonstrate rapid learning in individual animals. The method was to present a small reward (now called a ''reinforcement'') right after the animal made a desired response. The process could begin with approximations to the target behavior, a technique Skinner called ''shaping by successive approximations''. The process as a whole he termed '[[Operant_conditioning|operant conditioning]]', a re-naming of what was already known as ‘instrumental learning’.


Skinner recognized that behavior must occur spontaneously before it can be reinforced. He called behavior that is both spontaneous and reinforcible ''emitted behavior'', in contrast to the elicited, reflex-like behavior of classical (Pavlovian) conditioning (''respondent'' behavior). The operant-respondent distinction turned out to be flawed, but the idea that behavior must be emitted before it can be rewarded is critical.


The operant method showed that a given response, be it lever-pressing by a rat or key-pecking by a pigeon, need not be reinforced on every occasion. Responding can be maintained by various ''partial-reinforcement'' schedules. For example, if only the first response after 30 s is reinforced (a ''fixed-interval-30 s'' schedule), pigeons, rats and humans will soon learn to wait for 15 s or so after each reinforcement before resuming responding, yielding the familiar ''scalloped'' cumulative record. If reinforcement is intermittent and unpredictable, responding may be maintained for many hours in the absence of further reinforcement.
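
The pause-and-run pattern can be sketched in a few lines of code. The simulation below is a minimal illustration only: the post-reinforcement pause of half the interval and the constant response rate thereafter are assumptions for exposition, not estimates from data.

<syntaxhighlight lang="python">
# Illustrative sketch of behavior on a fixed-interval (FI) 30 s schedule.
# The pause-then-respond rule and all parameters are assumptions, not fits.
import random

FI = 30.0      # first response after 30 s is reinforced
PAUSE = 0.5    # assumed post-reinforcement pause, as a fraction of FI
RATE = 1.0     # assumed response rate once responding resumes (resp/s)

def simulate(session_time=300.0, dt=0.1):
    t, since_rft, cum, record = 0.0, 0.0, 0, []
    while t < session_time:
        if since_rft >= PAUSE * FI and random.random() < RATE * dt:
            cum += 1                      # a response is emitted
            if since_rft >= FI:           # it collects the reinforcer...
                since_rft = 0.0           # ...and the interval clock resets
        record.append((t, cum))           # the cumulative record
        t += dt
        since_rft += dt
    return record

# The record is flat for ~15 s after each reinforcement, then rises:
# the pause-and-run pattern behind the "scalloped" cumulative record.
for t, cum in simulate()[::100]:
    print(f"{t:6.1f} s   {cum:4d} responses")
</syntaxhighlight>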


Experiment soon revealed hitherto unsuspected regularities: the stable cumulative records associated with different schedules. Most important, these stable patterns could be recovered after exposure to another schedule. The typical scallop pattern on an FI schedule, for example, would reappear after a few days on second exposure after an intervening experience with another procedure. Behavior in condition A would be the same after different prior experiences B, C, D, etc.


Learning is, almost by definition, irreversible. The effect of treatment X will therefore be different if preceded by treatment A than if it is preceded by B. Two learning treatments cannot be meaningfully compared successively in the same subject. Most learning psychologists therefore assumed that learning must be studied by comparing groups of subjects. In fact, the [[Replication_crisis|group method]], and its associated statistical analyses, has turned out to have unsuspected pitfalls - problems which are avoided by the single-subject approach.


On the other hand, the fact that behavior under a given reinforcement schedule is recoverable, the same no matter what the preceding treatment, suggested to Skinner and his followers that learning – operant conditioning – ''can'' be studied in single subjects. Neither averaging across individuals nor comparison between groups is required. Since the individual, not the group, is the target of all psychological investigation, and since there were known to be serious problems inferring the properties of individuals from group averages,<ref>See, for example, Estes, W. K. (1956). The problem of inference from curves based on group data. ''Psychological Bulletin'', 53, 134-140. For a review, see Staddon, J. ''Scientific Method: How Science Works, Fails to Work, and Pretends to Work'' (Psychology Press, 2018).</ref> Skinner’s method seemed to provide a powerful technique for understanding the effects of reward and punishment on the behavior of individual organisms.


''Rate of response'' is visible as the slope of a cumulative record. As a subject learns a typical operant task, the slope of the record, the rate, increases: "The rate at which a response is emitted in such a situation comes close to our preconception of the learning process. As the organism learns, the rate rises."<ref>Skinner, B. F. (1950). Are theories of learning necessary? ''Psychological Review'', 57, 193-216.</ref>[https://zourpri.files.wordpress.com/2014/01/are-theories-of-learning-necessary.pdf] Skinner continued:


:"It is no accident that rate of responding is successful as a datum...If we are to predict behavior (and possibly to control it), we must deal with probability of response...Rate of responding is not a ‘measure’ of probability but it is the only appropriate datum in a formulation in these terms."
:"It is no accident that rate of responding is successful as a datum...If we are to predict behavior (and possibly to control it), we must deal with ''probability of response''...Rate of responding is not a ‘measure’ of probability but it is the only appropriate datum in a formulation in these terms."

So, response rate was used by most operant conditioners as a measure of response strength/probability. Researchers noticed that if reinforcement is available only at random times (a ''random-interval – RI'' – schedule, one kind of variable-interval, VI), a procedure which ensures that reinforcement probability is essentially constant, subjects adapt by responding at a constant rate which is positively related to reinforcement rate.

In this way, average response rate became the standard dependent variable for operant psychology.<ref>Skinner was not happy at the abandonment of cumulative records that followed: Skinner, B. F. (1976). EDITORIAL: Farewell my LOVELY! ''Journal of the Experimental Analysis of Behavior'', 25(2), p. 218. The order enabled by averaging – if not across subjects, within subject – overcame Skinner’s objection to a retreat from real-time data.</ref>
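
The sense in which a random-interval schedule holds reinforcement probability constant can be made concrete in code. In this minimal sketch the arming probability, the subject's response probability, and the time step are all illustrative assumptions:

<syntaxhighlight lang="python">
# Sketch of a random-interval (RI) schedule: in each time step the feeder
# is "armed" with a small constant probability, and the next response
# collects any armed reinforcer. All parameters are illustrative.
import random

def simulate_ri(p_arm=0.02, p_resp=0.5, dt=1.0, steps=3600):
    armed, responses, rfts = False, 0, 0
    for _ in range(steps):
        armed = armed or random.random() < p_arm   # schedule arms itself
        if random.random() < p_resp:               # assumed steady responder
            responses += 1
            if armed:                              # response is reinforced
                rfts += 1
                armed = False
    return responses, rfts

resp, rft = simulate_ri()
print(f"{resp} responses, {rft} reinforcements in one hour")
# Because the arming probability is constant in time, the obtained
# reinforcement rate is nearly independent of when responses occur.
</syntaxhighlight>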


==Discriminative stimuli==
By inventing new concepts, and re-naming several old ones, Skinner created a distinctive terminology that helped to define a new and self-contained movement: the experimental analysis of behavior, also known as behavior analysis or operant conditioning.


With the sole exception of the three-term contingency, these ideas were summarized by Skinner in an influential 1950 paper,<ref>Skinner, B. F. (1950). Are theories of learning necessary? ''Psychological Review'', 57, 193-216.</ref> ''Are theories of learning necessary?'' He defined ''theory'' as "any explanation of an observed fact which appeals to events taking place somewhere else, at some other level of observation, described in different terms, and measured, if at all, in different dimensions." This definition would rule out many well-accepted theories in other areas of science. The temperature of a liquid, for example, is directly related to molecular motion, yet it is not clear that the “dimensions” of temperature are the same as those of the kinetic energy of molecules. The spectra of hot elements – the red flame of lithium, for example – can be derived from the element’s atomic properties; again, the atomic properties that underlie emission spectra do not have the same dimensions as wavelength. It cannot be right to rule out theories like these.<ref>There is something in physics called [[Dimensional_analysis|dimensional analysis]], which says that the dimensions (typically mass, length and time) on both sides of an equation must match. But it is not clear that this was Skinner’s meaning for “dimension.”</ref>


Skinner argued that learning theories are for the most part impediments to scientific advance: "Much useless experimentation results from theories, and much energy and skill are absorbed by them", although he also conceded that "It would be foolhardy to deny the achievements of theories of this sort in the history of science." ("This sort" refers to a previous paragraph in which he attempts to distinguish between "postulates", "theorems", and "theories".) He admits, in a widely cited phrase, that there is a "need for a formal representation of the data reduced to a minimal number of terms", but at the end of his article says that "We do not seem to be ready for theory in this sense." By the 1990s, the field was ready.


==Problems with atheoretical behaviorism==


The shaky epistemological basis for Skinner’s anti-theory argument was overshadowed by the very compelling experimental examples he described in the rest of the 1950 article. His novel method produced strikingly orderly real-time patterns of behavior in individual organisms. He proceeded to use these data to identify what he called "controlling variables", those aspects of the training procedure responsible for the observed patterns: "the independent variables of which probability of response is a function." When we know the controlling variables, he argued, theory is unnecessary.

Defending his idea that response probability is the correct dependent variable for learning psychology, he showed that the alternative favored by reflex-type theorists, ''latency'', did not behave in the appropriate way. Motivated and unmotivated animals show the same modal response latency in a suitable task. Motivated animals do not respond sooner, as they should if latency is an adequate measure of response strength. As hunger motivation is reduced, however, latencies become more ''variable'' (what Skinner referred to as a "scattering of latencies"). The increase in variability as motivation is reduced is consistent with the selectionist view of learning espoused by theoretical behaviorism.<ref>See, for example, Staddon, J. E. R., & Simmelhag, V. (1971). The “superstition” experiment: A reexamination of its implications for the principles of adaptive behavior. ''Psychological Review'', 78, 3-43.</ref>
In another experiment, arguing against the inhibition theory of extinction, Skinner showed that well-trained pigeons forget little even after a lapse of four years between successive exposures to a task. He also showed that the pattern of a cumulative record in extinction is related to the pattern built up during training. He attributed the difference between extinction of a periodic vs. an aperiodic schedule to ''novelty'' and dissipation of emotional responses. He described the method that would later be used by Guttman and Kalish <ref>Guttman, N., & Kalish, H. I. (1956). Discriminability and stimulus generalization. ''Journal of Experimental Psychology'', 51, 79-88.</ref> to measure stimulus generalization.


Skinner’s examples were striking. His conclusion was persuasive. Many readers came to accept his claim that theories of learning – not just the poor theories then current but perhaps all learning theories – are in fact impediments to progress in scientific psychology.

But atheoretical behaviorism was criticized by theorists on several points, which can be illustrated by re-visiting three of Skinner’s examples: response rate as a dependent variable, extinction and memory, and the operant-respondent distinction. These problems led to the theory and philosophy of theoretical behaviorism.


==Response rate==


Skinner wrote in the 1950 paper:
:"Rate of responding appears to be the only datum that varies significantly and in the expected direction under conditions which are relevant to the “learning process.” We may, therefore, be tempted to accept it as our long-sought for measure of strength of bond, excitatory potential, etc. Once in possession of an effective datum, however, we may feel little need for any theoretical construct of this sort"<ref> Skinner 1950, op. cit. p.198.</ref>.
:"Rate of responding appears to be the only datum that varies significantly and in the expected direction under conditions which are relevant to the “learning process.” We may, therefore, be tempted to accept it as our long-sought for measure of strength of bond, excitatory potential, etc. Once in possession of an effective datum, however, we may feel little need for any theoretical construct of this sort"<ref> Skinner 1950, op. cit. p.198.</ref>.
This suggests that response rate might be an independent indicator of something like 'strength of learning'. Skinner wasn’t sure, but succumbed to temptation and settled on response rate as the most useful response measure.


A nagging problem for the idea that response rate is always valid as a measure of response strength is that rate of response can itself be controlled by the appropriate contingencies of reinforcement. For example, animals will learn, albeit with some difficulty, to space their pecks or lever presses 10 s apart (a ''spaced-responding'' schedule) if that is a condition for reinforcement<ref>See Staddon, J. E. R. (2016). ''Adaptive Behavior and Learning'', 2nd edition. Cambridge University Press, for an experimental review, and other references.</ref> – even though the ‘natural’ rate for an equally rewarding schedule that lacks the spaced-responding requirement is much higher, perhaps 60 pecks per minute. Since rate of response on a spaced-responding schedule, computed over the typical period of 30 min or so, is low, probability of response, hence response ‘strength’, must also be low, according to one reading of Skinner – lower than on, say, a variable-interval schedule dispensing reinforcements at the same rate. This is clearly wrong. Response rate, ''per se'', is not an adequate measure of response strength.
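
The arithmetic behind this objection can be laid out explicitly; the figures below simply restate the illustrative numbers in the text, not experimental data:

<syntaxhighlight lang="python">
# Spaced-responding (DRL) 10 s vs. a matched variable-interval schedule.
# Numbers follow the illustrative figures in the text, not real data.
drl_resp_per_min = 60 / 10          # a well-adapted subject: ~6 resp/min
drl_rft_per_min = drl_resp_per_min  # every properly spaced response pays off

vi_resp_per_min = 60                # 'natural' rate: ~60 pecks/min
vi_rft_per_min = 6                  # schedule set to match the DRL payoff

print(drl_resp_per_min / vi_resp_per_min)   # 0.1
# Equal reinforcement rates, a tenfold difference in response rate:
# rate alone cannot serve as a universal index of response "strength".
</syntaxhighlight>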


Skinner never wrote explicitly about this problem. But an obvious response is that the spaced-responding schedule involves the explicit discrimination of time. Response probability, and hence response strength, is high at some times and low at others: high close to 10 s after each response and low in between, for example. Much the same is true of a fixed-interval schedule. This is of course true. But it makes response rate less attractive as a universal measure of response strength. Indeed, perhaps ''time'' should take over as the appropriate dependent variable.<ref>See, for example, Williams, D. A., Lawson, C., Cook, R., Johns, K. W. and Mather, A. A. (2008). Timed excitatory conditioning under zero and negative contingencies. ''Journal of Experimental Psychology: Animal Behavior Processes'', 34(1), 94–105.</ref> Perhaps the question should be not “How does schedule X affect response rate?” but “How does schedule X affect the temporal location of behavior?”


Using time as a dependent measure also avoids a problem that is rarely addressed: over what time period (minutes? hours?) should rates be computed, and why? In operant-conditioning experiments, rates are usually computed over intervals of 30 min or more. The choice of denominator is justified not by any theoretical rationale, but by the orderly functional relations that result. In Skinner’s [http://www.bfskinner.org/newtestsite/wp-content/uploads/2014/02/ScienceHumanBehavior.pdf own words]: “Science … is an attempt to discover order, to show that certain events stand in lawful relations to other events.” Order was its own justification for Skinner.
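
A hypothetical session makes the denominator problem concrete (the session composition below is assumed purely for illustration):

<syntaxhighlight lang="python">
# Hypothetical 30-min session: the subject pauses for the first 20 min,
# then responds steadily at 60 responses/min for the final 10 min.
responses = 600

rate_over_session = responses / 30       # 20 responses/min
rate_while_running = responses / 10      # 60 responses/min
print(rate_over_session, rate_while_running)
# Neither denominator is dictated by theory; the conventional
# whole-session rate is justified only by the orderly functions it yields.
</syntaxhighlight>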


==Extinction and Memory==


Skinner never mentioned the word ''memory'' in the 1950 article, and rarely afterwards. But he did discuss ''spontaneous recovery,'' a paradoxical property of experimental extinction: After sufficient training, an organism responds. If reinforcement is withdrawn, responding ceases (extinction), usually within a single experimental session. But the next day, returned to the apparatus, the animal begins to respond again. Since we know (argued Skinner) that little or no forgetting should occur from one day to the next, this recovery of the extinguished response, an apparent forgetting of the extinction on the previous day, needs explaining.
Until Skinner’s paper, the standard explanation for spontaneous recovery was that during the extinction session ''inhibition'' builds up, but by the next day it has dissipated, so responding recovers, at least for a while.<ref>Of course, unless inhibition is given some testable properties, this is little more than a re-description of the facts.</ref> But Skinner had already shown that mere passage of time has little effect on level of responding – although we will have reason to question that in a moment. So perhaps some other variables are operating. In the 1950 paper, Skinner proposed two: ''emotion'' and ''novelty'':


:"When we fail to reinforce a response that has previously been reinforced, we not only initiate a process of extinction, we set up an ''emotional response.''...The pigeon coos in an identifiable pattern, moves rapidly about the cage, defecates, or flaps its wings rapidly in a squatting position that suggests treading (mating) behavior. This ''competes'' with the response of striking a key and is perhaps enough to account for the decline in rate in early extinction…Whatever its nature, the effect of this variable is eliminated through ''adaptation''." [my emphases]
:"When we fail to reinforce a response that has previously been reinforced, we not only initiate a process of extinction, we set up an ''emotional response.''...The pigeon coos in an identifiable pattern, moves rapidly about the cage, defecates, or flaps its wings rapidly in a squatting position that suggests treading (mating) behavior. This ''competes'' with the response of striking a key and is perhaps enough to account for the decline in rate in early extinction…Whatever its nature, the effect of this variable is eliminated through ''adaptation''." [emphases added]


Skinner said no more than this about “emotion,” but his description is interesting for two reasons. First, it involves ''observation'', actually watching the (pigeon, rat) subjects. This practice soon fell out of fashion in behavior analysis. Yet direct observation of behavior was later to prove critical in shedding new light on Skinner’s theoretical approach. Second, he might have said something more about ''competition'', which is apparently also involved. As it is, emotion is unsatisfactory as an explanation because the new process he invokes to explain its dissipation, ''adaptation'',<ref>Emotion, which competes with the learned behavior and adapts with time, may seem to many readers hard to distinguish from the reactive inhibition that Skinner was criticizing.</ref> cannot be independently measured.


But novelty is the variable Skinner thought most important: “Maximal responding during extinction is obtained only when the conditions under which the response was reinforced are precisely reproduced.” First Skinner describes ''stimulus generalization'', the decline in responding in the presence of stimuli different from the training stimulus. Then he goes on:

:"Something very much like this must go on during extinction. Let us suppose that all responses to a key have been reinforced and that each has been followed by a short period of eating. When we extinguish the behavior, we create a situation in which responses are not reinforced, in which no eating takes place, and in which there are probably new emotional responses. The situation could easily be as novel as a red triangle after a yellow [his earlier example of stimulus generalization]. If so, it could explain the decline in rate during extinction."
:"Something very much like this must go on during extinction. Let us suppose that all responses to a key have been reinforced and that each has been followed by a short period of eating. When we extinguish the behavior, we create a situation in which responses are not reinforced, in which no eating takes place, and in which there are probably new emotional responses. The situation could easily be as novel as a red triangle after a yellow [his earlier example of stimulus generalization]. If so, it could explain the decline in rate during extinction."


Novelty, as subsequently measured precisely in the stimulus generalization experiments of [https://www.researchgate.net/publication/6973129_The_legacy_of_Guttman_and_Kalish_1956_Twenty-five_years_of_research_on_stimulus_generalization Guttman and Kalish] and many others, is the real explanation for spontaneous recovery, said Skinner. But again, this is an incomplete account, because we cannot measure the stimulus in this case. In regular stimulus generalization, to a color or a shape, for example, both the physical stimulus properties and the effects of changes on responding can be measured objectively. Not so in the case of extinction, the case that Skinner is attempting to explain. How exactly should ‘novelty’ be measured or manipulated? Something more was needed: a ''theory of memory'', perhaps?
A relevant theory was in fact available. At the end of the previous century, Adolf Jost proposed two memory laws<ref>Jost, A. (1897). Die Assoziationsfestigkeit in ihrer Abhängigkeit von der Verteilung der Wiederholungen. Zeitschrift fűr Psychologie und Physiologie der Sinnesorgane, 14, 436-472.</ref>, the second of which is: Given two associations (equivalently, habits, memories, operants) of the same strength, but of different ages, the older one will fall off less rapidly with time. Jost’s law implies that the strength of a habit does not decay exponentially, by the same fixed fraction each day, because if it did, the relative strength of two memories would not change with lapse of time.


On the other hand, suppose that the decline in strength of a habit, V<sub>i</sub>, after time t<sub>i</sub>, is not exponential but hyperbolic, like this:
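
A standard hyperbolic form (one possibility among many, with ''k'' > 0 an assumed rate parameter and V<sub>i</sub>(0) the initial strength; as noted below, any similar monotonically decreasing function would serve) is:

:<math>V_i = \frac{V_i(0)}{1 + k\,t_i} \qquad \text{(Equation 1)}</math>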
Equation 1 is hyperbolic, but many other monotonic, decreasing functions will do as well to model Jost’s Law.


Jost’s Law explains spontaneous recovery. Since the first extinction session is necessarily more recent than the many days of conditioning that preceded it, the associated behavior should lose more strength from one day to the next than the earlier conditioning. At the end of the first day of extinction, responding ceases, which means that the strengths of the two memories, for responding and for not responding, must be roughly equal. Thereafter both memories decline in strength, the newer faster than the older. On the next day, therefore, the older tendency – to respond – must gain (according to Jost) over the more recent one (not responding), hence: spontaneous recovery. With continued extinction sessions, the "respond" memory continues to decline, but the "not-respond" tendency is strengthened day by day.
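
A minimal numerical sketch shows how two hyperbolically decaying memory traces yield spontaneous recovery; the trace ages, strengths, and decay parameter are illustrative assumptions, not values from the literature:

<syntaxhighlight lang="python">
# Two competing memory traces with hyperbolic decay V(t) = V0 / (1 + k*t):
# an old "respond" trace (many training sessions) and a new "not-respond"
# trace (one extinction session). All numbers are illustrative.
K = 1.0

def strength(v0, age_days):
    return v0 / (1 + K * age_days)

# Assume that when responding ceases, at the end of the extinction day,
# the two tendencies are equal (strength 1.0); the respond trace is then
# 10 days old, the extinction trace 0 days old. Back out their V0 values:
v0_respond = 1.0 * (1 + K * 10)
v0_extinct = 1.0 * (1 + K * 0)

for d in range(4):   # days after the extinction session
    r = strength(v0_respond, 10 + d)
    e = strength(v0_extinct, 0 + d)
    tag = "spontaneous recovery" if r > e else "no responding"
    print(f"day +{d}: respond={r:.2f}  not-respond={e:.2f}  -> {tag}")
# The older trace loses strength more slowly (Jost's second law), so by
# the next day "respond" again exceeds "not-respond".
</syntaxhighlight>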
A model like this makes predictions about the effects of different delays before returning the animal to the apparatus, and about the effects of different amounts of training on subsequent extinction. If the second extinction session follows closely on the first, for example, recovery should be less. In other words, the theory draws attention to historical variables as possibly involved in recovery after extinction, a useful advance over the “novelty” idea, which looks only at a contemporary cause, and one that is difficult to measure objectively.


A Jost’s Law account implies that memories compete in some way. The competition idea also speaks to the apparent contradiction between the very slow decay of well-learned operant behavior demonstrated in Skinner’s 4-year experiment and the apparent rapid forgetting of an extinction experience illustrated by spontaneous recovery. The key is competition between memories. In the absence of any competition, a long-learned habit may persist for a long time, as Skinner’s pigeons showed. But when the competition is weak – just one extinction session – memory for many earlier conditioning sessions reasserts itself, and responding recovers until more extinction experience has accumulated.


[[Hyperbolic_discounting|Hyperbolic discounting]] is a phenomenon much studied by behavioral economists with both human and animal subjects. In a choice situation, subjects usually prefer a reward of size 2 after a delay of 10 s, say, over a reward of size 5 after a delay of 20 s, even though the rate of return is better for the larger, later reward. This contradicts the standard exponential discounting assumption, which assumes that rate of return is key.
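
A diagnostic difference between the two discount functions is ''preference reversal'': adding the same delay to both options can flip a hyperbolic discounter's preference, but never an exponential discounter's. The amounts, delays, and discount parameters in this sketch are illustrative assumptions:

<syntaxhighlight lang="python">
# Hyperbolic vs. exponential discounting of two delayed rewards.
# All amounts, delays, and discount parameters are illustrative.
import math

def hyperbolic(amount, delay, k=1.0):
    return amount / (1 + k * delay)

def exponential(amount, delay, k=0.2):
    return amount * math.exp(-k * delay)

small, s_delay = 2, 1     # smaller-sooner option
large, l_delay = 5, 11    # larger-later option

for added in (0, 10):     # add a common delay to both options
    h = ("smaller-sooner" if hyperbolic(small, s_delay + added)
         > hyperbolic(large, l_delay + added) else "larger-later")
    e = ("smaller-sooner" if exponential(small, s_delay + added)
         > exponential(large, l_delay + added) else "larger-later")
    print(f"+{added:2d} s: hyperbolic -> {h}, exponential -> {e}")
# Hyperbolic: smaller-sooner at +0 s (2/2 > 5/12) but larger-later at
# +10 s (2/12 < 5/22). Exponential: the ratio of the two values is
# unchanged by a common added delay, so preference never reverses.
</syntaxhighlight>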
It is tempting to relate hyperbolic memory decay to hyperbolic discounting in choice experiments and there may be some theoretical link. But also involved is the fact that organisms typically time their responses to be [https://dukespace.lib.duke.edu/dspace/bitstream/handle/10161/3387/WynneStaddon1988.pdf?sequence=1&isAllowed=y proportional to the expected time of a reward.] There is also some evidence that the larger the anticipated reward, the [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1347824/ sooner animals will respond.] Offered a choice, therefore, between two stimuli, one signaling a small reward after 5 s vs. one more than twice the size after 10 s, preference will be a balance between the tendency to wait a time proportional to the expected delay (which favors the smaller, sooner reward) and an opposite tendency to respond sooner if the expected reward is larger. The experimental evidence seems to suggest that the latter effect is smaller than the former. Animals are likely to respond sooner to the shorter delay, even if the associated reward size is smaller – and even if the overall rate of reward associated with the smaller choice is less than for the larger.
These two accounts are not in conflict: ''discounting'' is a functional, economic account; the timing account represents the potential underlying mechanism.


==The Operant-Respondent Distinction==


Ivan Petrovich Pavlov (1849-1936) was not a psychologist. His pioneering work on conditioned responses like salivation (typically by a dog, following several pairings between a buzzer or a metronome, say, and the delivery of food) was physiology, not psychology. The focus of Pavlov, and many who followed him, was on the reflex-like behavior maintained by ''classical'' or [[Classical_conditioning|Pavlovian]] or, in Skinner’s terms, ''respondent'' conditioning. Pavlov found that the conditioned response was most rapidly obtained if the food followed closely on the stimulus (temporal contiguity). Subsequent work by [http://www.scholarpedia.org/article/Rescorla-Wagner_model Robert Rescorla] and others showed that the key was prediction: the signaling stimulus need not be close in time to the reinforcer so long as it is closer than any other signal (''relative proximity''). Skinner justified his term ''respondent'' by pointing out that conditioned responses like salivation are products of the autonomic (involuntary), not the somatic (skeletal, voluntary) nervous system. Operant behavior, he thought, depended on the somatic system.

But the field might have developed very differently if Pavlov and the very many others who followed him had looked either at weaker conditioning or conditioning in an unrestrained animal.


A story<ref>Lorenz, K. (1969). Innate bases of learning. In K. Pribram (Ed.) On the biology of learning. NY: Harcourt Brace; see also H. M. Jenkins, .F. J. Barrera, C. Ireland, & B. Woodside (1978) Signal-centered action patterns of dogs in appetitive classical conditioning. ''Learning and Motivation,'' 9, 272-296.</ref> recounted by the great ethologist Konrad Lorenz provides a clue:
:"My late friend Howard Liddell told me about an unpublished experiment he did while working as a guest in Pavlov’s laboratory. It consisted simply in freeing from its harness a dog that had been conditioned to salivate at the acceleration in the beat of a metronome. The dog at once ran to the machine, wagged its tail at it, tried to jump up to it, barked, and so on; in other words, it showed as clearly as possible the whole system of behavior patterns serving, in a number of ''Canidae'', to beg food from a conspecific. It is, in fact, this whole system that is being conditioned in the classical experiment."
:"My late friend Howard Liddell told me about an unpublished experiment he did while working as a guest in Pavlov’s laboratory. It consisted simply in freeing from its harness a dog that had been conditioned to salivate at the acceleration in the beat of a metronome. The dog at once ran to the machine, wagged its tail at it, tried to jump up to it, barked, and so on; in other words, it showed as clearly as possible the whole system of behavior patterns serving, in a number of ''Canidae'', to beg food from a conspecific. It is, in fact, this whole system that is being conditioned in the classical experiment."


This rich repertoire of operant behavior [https://books.google.com/books?id=UPWTCwAAQBAJ&pg=PA119&lpg=PA119&dq=karl+zener+classical&source=bl&ots=hpcmpqa_MU&sig=xzuZqrM0TTCG5z4LA4UCui3pFfs&hl=en&sa=X&ved=0ahUKEwj03LLpt67cAhWCVN8KHRHnDjQQ6AEIZjAF#v=onepage&q=karl%20zener%20classical&f=false will appear] even if the conditioned stimulus is too long to produce much salivation but is sufficiently predictive to allow the dog to anticipate food.


Another sign that something was wrong with the rigid dichotomy between operant and respondent was provided by a pair of experiments: a very influential [https://psych.hanover.edu/classes/Learning/papers/Skinner%20Superstion%20%281948%20orig%29.pdf short paper] "'Superstition' in the pigeon" by Skinner and a much longer [https://www.researchgate.net/publication/232491501_The_Superstition_Experiment_A_Reexamination_of_its_Implications_for_the_Principles_of_Adaptive_Behavior experimental and theoretical paper] by Staddon and Simmelhag many years later.
Here is what Skinner did in 1948: Hungry pigeons were placed in a box and given brief access to food at fixed periods – 15 s for some animals, longer periods for others. This is ''temporal conditioning'' (a ''fixed-time'' – FT – schedule in operant terminology), which is a Pavlovian procedure, since the animal’s behavior has no effect on food delivery.


Despite the absence of an operant contingency, all the animals developed vigorous stereotyped, apparently operant, activities in between feeder operations. Skinner attributed this behavior to accidental contiguity between some spontaneous behavior by the pigeon and the delivery of food. He called it ''adventitious reinforcement''. Since these conjunctions were accidental, not causal, Skinner termed the activities “superstitious” and likened them to human superstitions.<ref>Indeed, he presented the experiment as a test of the adventitious reinforcement hypothesis. This seems to be the only time, in any publication, that Skinner described an experiment as a test of a hypothesis. See the video snippet: https://www.youtube.com/watch?v=7XbH78wscGw in which biologist Richard Dawkins, long a foe of religion, shows a pigeon in a Skinner box. He slightly mis-describes the ‘superstition’ experiment but then correctly explains Skinner’s (mistaken) adventitious reinforcement explanation. “Humans can be no better than pigeons,” Dawkins concludes.</ref>


More than 20 years after the superstition paper, Staddon and Simmelhag repeated Skinner’s experiment, and observed the pigeons’ behavior second-by-second in each interfood interval from the very beginning of training. Their aim was atheoretical. They were simply curious: let’s see what happens, in detail, and let’s see whether the interfood interval has to be constant (as in Skinner’s experiment) or can be variable.


# The activities that develop are of two kinds: ''interim activities'' that occur in the first two-thirds or so of the fixed interfood interval, and a single ''terminal response'' that occurs during the last third of the interval. Both interim and terminal responses tend to occur at a higher rate the shorter the interval, and at a lower rate the longer it is.<ref>A later account identified a third class, ''facultative activities'', that seem to be unrelated to the food schedule and also occur in the middle of the interval: See Staddon, J. E. R. (1977). Schedule-induced behavior. In W. K. Honig & J. E. R. Staddon (Eds.), ''Handbook of operant behavior''. Englewood Cliffs, N.J.: Prentice-Hall.</ref>
# The terminal response is either pecking or a stereotyped pacing activity related to it; the terminal response does not differ from animal to animal in the capricious way implied by Skinner’s account.
# Terminal pecking often appeared suddenly after a little training. It did not develop following an accidental conjunction with food as the adventitious reinforcement hypothesis implies. Interim activities are rarely contiguous with food, so also cannot be explained by adventitious reinforcement<ref>Unless the mechanism of operant reinforcement allows some behaviors to be more “reinforcible” than others, in which case a more-reinforcible behavior relatively remote from food might overtake a less reinforcible one in just the manner observed in “instinctive drift”. Killeen and Pellón have developed this idea into an integrated model of conditioning and schedule induction: Killeen, P. R & Pellón, R. (2013) Adjunctive behaviors are operants. ''Learning and Behavior,'' 41:1, 1-24. For a related analysis, see also Staddon, J. E. R., & Zhang, Y. (1989) Response selection in operant learning. ''Behavioural Processes'', 20, 189-97, especially Figure 5: http://psycrit.com/w/File:StaddonZhang1989.pdf</ref>.


In short, Skinner’s account is wrong. The superstitious behavior he observed was not the result of happenstance, accidental contiguity between an emitted behavior and response-independent food.
This experiment, and an earlier one<ref>Brown, P. L., & Jenkins, H. M. (1968). Auto-shaping of the pigeon's key-peck. ''Journal of the Experimental Analysis of Behavior'', 11, 1-8. See also Williams, D. R., & Williams, H. (1969). Auto-maintenance in the pigeon: sustained pecking despite contingent nonreinforcement. ''Journal of the Experimental Analysis of Behavior'', 12, 511-520.</ref> showing that untrained pigeons will learn to peck in a Pavlovian procedure where an intermittent 7-s light (conditioned stimulus: CS) ends with free food (unconditioned stimulus: US; Brown and Jenkins termed this effect ''autoshaping''), showed that Skinner’s dichotomy between operant (somatic) and respondent (autonomic) behavior does not hold, since pecking – the prototypical operant response – can behave just like salivation, the prototypical respondent.

These results demanded a revision of the standard framework for the study of operant conditioning. If pecking is both an operant and a respondent, if salivation (for example) can be classically but not operantly conditioned, and if supposedly [https://psychclassics.yorku.ca/Breland/misbehavior.htm ‘instinctive’ activities can supersede] already learned operant behavior, the sharp distinction between classical and operant conditioning becomes untenable.


==Selection and variation==


Beginning in the early 1950s, people began to point out the similarities between the learning process and evolution through variation and selection<ref>For example, Pringle, J. W. S. (1951). On the parallel between learning and evolution. ''Behaviour'', 3, 174-215. Russell, W. M. S. (1961) Evolutionary concepts in behavioral science: III. The evolution of behavior in the individual animal, and the principle of combinatorial selection. ''General Systems'', 6, 51-92.</ref>. Recently, models explicitly analogous to gene mutation and selection by reinforcement have successfully duplicated many operant conditioning phenomena<ref>McDowell, J. J. (2013). A quantitative evolutionary theory of adaptive behavior dynamics. ''Psychological Review'', Vol. 120, No. 4, 731–750. See also Edelman, G. ''Neural Darwinism'' https://www.webofstories.com/play/gerald.edelman/37</ref>. Skinner’s idea of ''emitted'' behavior fits quite naturally into a Darwinian scheme. Behavior varies; a variant that is contiguous with reward is strengthened and thus increases in frequency.


Unlike Darwin, Skinner had little to say about the causes and types of variation. He left the impression that variation is unstructured, ‘random’. On the other hand, observations like Zener's and Liddell’s show that the repertoire from which reinforcement selects is very far from random. It is different for food than for sex or social reward, for example. Lorenz, an ethologist, identified the dog’s behavior as a particular instinctive pattern. A cognitive psychologist might say that the dog is showing an ''expectation'' of food. A theoretical behaviorist account is that the conditioning process causes the conditioned stimulus to evoke a particular behavioral ''repertoire''. The emitted behavior to which that repertoire gives rise is not at all random.


==The Repertoire==


The composition of the repertoire will depend on the animal’s training (learning the signal properties of the metronome), its motivational state, and its species. Anticipation of food will lead to a different repertoire than anticipation of electric shock. Food → vigorous activity, tail-wagging, etc. Electric shock → ‘freezing,’ crouching – suppression of all activity. Indeed, ''conditioned suppression'' is the name for the shock-anticipation procedure used by Estes and Skinner<ref>Estes, W. K., & Skinner, B. F. Some quantitative properties of anxiety. ''Journal of Experimental Psychology'', 1941, 29, 390-400. It is surprising that no one seemed to be troubled by the fact that the suppression response, used to study Pavlovian (respondent) conditioning in rats, is skeletal not autonomic − in apparent violation of Skinner’s criterion for the operant-respondent distinction.</ref> and others to establish the necessary and sufficient conditions for respondent conditioning.


The idea of a repertoire implies that some behaviors are potential, lying in wait, but ready to occur if the active behavior goes unrewarded. The stronger the animal’s motivation and the better the predictive properties of the stimulus (how close, how big is the reward?), the more restricted the repertoire is likely to be. In the limit, if the stimulus (as in the autoshaping experiments) or the interfood interval (as in the superstition experiment) is very short, the pigeon’s repertoire may be limited to a single response: pecking. But if the situation is not too “hot,” the repertoire will be larger, although perhaps less vigorous.


In addition to the active behavior, at any moment, a repertoire comprises latent or covert activities that can occur. This idea of a latent response was in fact suggested by [[William_James_Lectures|Skinner himself]] in the same year that he published the “superstition” experiment:


:"Our basic datum…is the probability that a response will be emitted…We recognize …that … every response may be conceived of as having at any moment an assignable probability of emission... A ''latent'' response with a certain probability of emission is not directly observed. It is a scientific construct. But it can be given a respectable status, and it enormously increases our analytical power…. It is assumed that the strength of a response ''must reach a certain value'' before the response will be emitted. This value is called the ''threshold''." [My emphases]
:"Our basic datum…is the probability that a response will be emitted…We recognize …that … every response may be conceived of as having at any moment an assignable probability of emission... A ''latent'' response with a certain probability of emission is not directly observed. It is a scientific construct. But it can be given a respectable status, and it enormously increases our analytical power…. It is assumed that the strength of a response ''must reach a certain value'' before the response will be emitted. This value is called the ''threshold''." [emphases added]
Skinner was writing about language and never extended the idea to the operant behavior of non-human animals. But his proposal is different from theoretical behaviorism in only one respect: for the ThB hypothesis, the ''threshold'' is simply competition from other latent/silent responses.
The idea that any predictive relation between a stimulus and a reward creates an expectation – equivalently, a repertoire of potential actions – implies that a repertoire will be created, and conditioning will occur, even if the dog doesn’t salivate. Imagine a conditioning situation in which the CS is just a bit too long to yield conditioning, as measured by, say, salivation, or an auto-shaped key-peck. So long as the CS is still predictive (e.g., signals a shorter time to the US than other signals) the animal may still form an expectation, and develop a repertoire. Members of the repertoire will be available as candidates for operant conditioning, which is to say selection by temporal contiguity. But the repertoire itself, active response excepted, will be covert and may not reveal itself at once. If the animal, like Pavlov’s dog, is restrained, for example, its behavioral potential is necessarily limited. But freed from restraint, the dog shows at once the wide range of activities induced by a stimulus that signals imminent food.


Emitted responses can be induced in other ways. An unexpected reward will at once elicit a range of food-related activities, for example. Similarity of a new situation to one associated with food or a mate will similarly elicit a historically relevant repertoire.
''Extinction'', cessation of an established reinforcement schedule, shows the effects of relaxing selection. When reinforcement is withdrawn, the selection process ceases, and the trained response declines. But observation, and Skinner’s "scattering of latencies", show that other activities, suppressed by the training schedule, now occur again. This is the normal increase in variability when selection is relaxed, whether natural selection or selection by a reinforcement schedule.<ref>See Staddon & Simmelhag (1971). ''op. cit''. p. 23 ''et seq.''</ref> Extinction usually leads to more variable behavior.

==Teaching==

Operant learning involves both selection and variation, but almost all experimental research has been on selection: the effect of contingencies of reinforcement on behavior. Unfortunately, behavior analysis has treated ''teaching'' in the same way. Operant reinforcement and punishment are an appropriate way to deal with behaviors that have a non-zero ''operant level'' – things pupils already do, like sitting still and not fidgeting, talking to one another during class, and bullying: disciplinary matters. The teacher’s task is to increase the level for some (paying attention, doing chores, polite behavior, etc.) and reduce it for others (fidgeting, bullying, distracting other pupils). Contingencies of reinforcement do have some application here.

But most teaching is an effort to get pupils to grasp something for the first time, to get new behavior, not an effort to change the rate of occurrence of something already known. The most difficult kind of teaching, imparting new knowledge and skills, is much more about ''variation'', the source of a pupil’s repertoire, than about selection, changing the strength of an [http://dukespace.lib.duke.edu/dspace/handle/10161/5119 existing behavior]. A 2018 [http://www.economist.com/printedition/2017-07-22 review], one of many, suggests that simply rewarding answers to multiple-choice tests, Skinner’s original teaching-machine approach, based on his "shaping" procedure, is not an adequate way to foster much learning. If a task or a skill can be broken down in advance into small, programmable items – if it can be ''shaped'', in the jargon of operant conditioning – then Skinner’s method may be useful. But if creativity is required, if the end-point cannot be defined in advance, then it is to ''variation'' and the sources of the student’s repertoire that the teacher’s attention should be directed.

Many writers have described how their schooldays, perhaps at a boarding school where control by the educational environment can be very strong, provided them with an environment that fostered study, creativity and critical thinking<ref>See for example, Richard Dawkins’ moving account of his own public school, Oundle: https://www.theguardian.com/books/2002/jul/06/schools.news and Alan Macfarlane, The image of the good imperial education. In ''The character of human institutions: Robin Fox and the rise of biosocial science''. Michael Egan (Ed.) Transaction Pub. NJ, 2014.</ref>. Creativity is not an operant. Creativity is a property of a ''repertoire'' of potential operant behavior. Unscientific and anecdotal as they are, these first-person accounts nevertheless hint at what is needed if education is not to become mere schedule control.
The emitted repertoire is set by processes usually studied under the rubric of classical conditioning. The repertoire depends on what the subject can expect (predictive stimulus-stimulus relations he has experienced in a given situation), on his motivation (hunger, thirst, sex, fear, etc.), and on what kind of organism he is. But the organism doesn’t begin with nothing. Even without conditioning, a sheepdog, for example, knows (more or less) what a sheep is and what needs to be done about it even before he sees one. A novice herder will herd children or geese if sheep are not available. A puppy let off leash, and perhaps after some exploration, will return to his human companion. Unless distracted, the dog will follow his master (or mistress!) and food reward has little to do with this habit. Much of the adult repertoire already exists in rudimentary form, needing only a little training to mature (at least in most dogs!). Humans come with repertoires like this that can be expanded (or contracted!) and directed in ways known to great teachers but still not codified by science. From the point of view of theoretical behaviorism, education would benefit if less attention were paid to selection and much more to behavioral variation.


==Summary==
Treating classical (respondent) conditioning and operant conditioning as different processes has taught us much about the necessary and sufficient conditions for conditioning to occur. But it has also, says ThB, led learning psychology into a blind alley. Learning researchers were misled by Pavlov’s genius and the neurophysiological differences between typical classically conditioned responses and typical responses conditioned operantly. Salivation and lever pressing are obviously very different.
Theoretical behaviorism treats classical and operant conditioning as parts of the same psychological process. Classical conditioning detects correlations between environmental features and something of value, positive or negative, to the organism. This correlation induces<ref>Segal, E. F (1970) Induction and the provenance of operants. In R. M Gilbert & J. R. Millenson (Eds.) ''Reinforcement: Behavioral Analyses'' Academic Press.</ref> a repertoire from which operant conditioning can select. If the correlation is very strong and the unconditioned stimulus is imminent, then the induced repertoire may be limited – to pecking (in a hungry pigeon) or to salivation (in a restrained dog). Selection, in the sense of a response contingency, may be unnecessary. The result may look like a reflex, but isn’t, although restricted behavioral options and extreme motivation may make it appear so.
If the correlation is weaker, some ‘expectation’ may still be formed and the repertoire may comprise many responses. Operant reinforcement must select from this pool. If there is no reinforcement, the behaviors that comprise the repertoire will occur one after another, back and forth, each time weaker and weaker. Eventually vigorous activity may cease altogether, leaving a passive behavioral residue.
The old [https://www.verywellmind.com/what-is-the-yerkes-dodson-law-2796027 Yerkes-Dodson law] (1908) shows that learning is fastest at intermediate levels of motivation, which suggests that the size of the repertoire is then at its maximum. As the organism learns, behavior adapts, reinforcement rate increases, and the repertoire shrinks to a class of responses defined by their consequences and controlled by a class of stimuli that are a reliable signal of the contingencies. This is Skinner’s three-term operant. Another name for it is ''state'' – not internal state or physiological state or even mental state, but state as repertoire controlled, in the well-trained organism, by identifiable stimuli under certain motivational conditions. (For the computational details on state as ''equivalent history'', see ''The New Behaviorism''<ref>Staddon, J. (2014). ''The New Behaviorism'' (2nd edition) Philadelphia, PA: Psychology Press, and also Staddon, J. E. R. (2017). Simply too many notes. ''The Behavior Analyst'', 40(1), 101-106.</ref> but these details are not necessary to see the need to add state to stimulus and response to arrive at an accurate picture of the behaving organism.)
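A toy computation may make ''state as equivalent history'' concrete. The sketch below (Python) is purely illustrative and is not drawn from the theoretical-behaviorism literature: it assumes a deliberately simple model in which the only trace a history leaves is the set of stimuli that have been paired with reward, so any two histories leaving the same trace are, by definition, one state.

 # Illustrative sketch (hypothetical model): a "state" as an equivalence
 # class of histories. A history is a sequence of (stimulus, rewarded)
 # events; here only the set of reward-paired stimuli matters for future
 # behavior, so histories differing in order or repetition can share a state.
 def state_of(history):
     """Map a history to its state: the set of reward-paired stimuli."""
     return frozenset(stimulus for stimulus, rewarded in history if rewarded)

 h1 = [("tone", True), ("light", False), ("tone", True)]
 h2 = [("light", False), ("tone", True)]
 print(state_of(h1) == state_of(h2))  # True: two histories, one state

Any model with richer dynamics would partition histories differently; the point is only that ''state'' names the partition of equivalent histories, not anything inside the organism.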


Theoretical behaviorism repeals Skinner’s proscription of theory. The “ism” is unfortunate because ThB is not a rigid ideology that rules things out. It is theoretical but eclectic. It does require that ideas be testable by a third party. But in that sense it is just. . . science. Concepts like memory and expectation are perfectly acceptable, just so long as they can be given some explanatory and predictive meaning.

Revision as of 11:10, 6 October 2018

  • Comment: Please revise your draft to remove the editorializing. Rhetorical questions (e.g. "Which suggests that response rate is an independent indicator of…what? The learning process?") aren't appropriate for an encyclopedia. Take some time to read through Philosophy of mind, Attachment theory, and the other featured/good articles in WikiProject Psychology for examples of strong articles written in an encyclopedic tone. Please submit the draft again after your revisions. Thanks! — Newslinger talk 15:43, 9 August 2018 (UTC)
  • Comment: The material copied from elsewhere is correctly in quotation marks or block quotes, but is not adequately attributed. WP:MINREF requires inline citations for all direct quotations. Worldbruce (talk) 04:02, 27 July 2018 (UTC)


Operant Conditioning


Skinner was able to demonstrate rapid learning in individual animals. The method was to present small rewards (now called reinforcements) right after the animal made a desired response. The process could begin with approximations to the target behavior. Skinner called the technique shaping by successive approximations. The process as a whole he termed 'operant conditioning', a re-naming of what was already known as ‘instrumental learning’.

Skinner recognized that behavior must occur spontaneously before it can be reinforced. He called behavior that is both spontaneous and reinforcible emitted behavior, in contrast to the elicited, reflex-like behavior of classical (Pavlovian) conditioning (respondent behavior). The operant-respondent distinction turned out to be flawed, but the idea that behavior must be emitted before it can be rewarded is critical.

The operant method showed that a given response, be it lever-pressing by a rat or key-pecking by a pigeon, need not be reinforced on every occasion. Responding can be maintained by various partial-reinforcement schedules. For example, if only the first response after 30 s is reinforced (a fixed-interval-30 s schedule), pigeons, rats and humans will soon learn to wait for 15 s or so after each reinforcement before resuming responding, yielding the familiar scalloped cumulative record. If reinforcement is intermittent and unpredictable, responding may be maintained for many hours in the absence of further reinforcement.
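The fixed-interval contingency is simple enough to state as code. The following sketch (Python, illustrative only; the function name and example times are invented for this article, not taken from any source) reinforces only the first response at least 30 s after the previous reinforcement:

 # Sketch of a fixed-interval (FI 30-s) schedule: only the first response
 # emitted at least `interval` seconds after the previous reinforcement is
 # reinforced; earlier responses have no scheduled consequence.
 def fixed_interval(interval, response_times):
     reinforced, last = [], 0.0
     for t in sorted(response_times):
         if t - last >= interval:
             reinforced.append(t)
             last = t
     return reinforced

 print(fixed_interval(30.0, [10, 25, 31, 40, 55, 62, 90]))  # [31, 62]

The post-reinforcement pause, and hence the scallop, is the animal's adaptation to this rule, not part of the rule itself.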

Experiment soon revealed hitherto unsuspected regularities: the stable cumulative records associated with different schedules. Most important, these stable patterns could be recovered after exposure to another schedule. The typical scallop pattern on an FI schedule, for example, would reappear after a few days on second exposure after an intervening experience with another procedure. Behavior in condition A would be the same after different prior experiences B, C, D, etc.

Learning is, almost by definition, irreversible. The effect of treatment X will therefore be different if preceded by treatment A than if it is preceded by B. Two learning treatments cannot be meaningfully compared successively in the same subject. Most learning psychologists therefore assumed that learning must be studied by comparing groups of subjects. In fact, the group method, and its associated statistical analyses, has turned out to have unsuspected pitfalls – problems which are avoided by the single-subject approach.

On the other hand, the fact that behavior under a given reinforcement schedule is recoverable, the same no matter what the preceding treatment, suggested to Skinner and his followers that learning – operant conditioning – can be studied in single subjects. Neither averaging across individuals nor comparisons between groups is required. Since the individual, not the group, is the target of all psychological investigation, and since there were known to be serious problems inferring the properties of individuals from group averages [3], Skinner’s method seemed to provide a powerful technique for understanding the effects of reward and punishment on the behavior of individual organisms.

Rate of response is visible as the slope of a cumulative record. As a subject learns a typical operant task, the slope of the record, the rate, increases: "The rate at which a response is emitted in such a situation comes close to our preconception of the learning process. As the organism learns, the rate rises." [4][1] Skinner continued:

"It is no accident that rate of responding is successful as a datum...If we are to predict behavior (and possibly to control it), we must deal with probability of response...Rate of responding is not a ‘measure’ of probability but it is the only appropriate datum in a formulation in these terms."

So, response rate was used by most operant conditioners as a measure of response strength/probability. Researchers noticed that if reinforcement is available only at random times (a random-interval – RI – schedule, one kind of variable-interval, VI), a procedure which ensures that reinforcement probability is essentially constant, subjects adapt by responding at a constant rate which is positively related to reinforcement rate.
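A random-interval schedule is usually programmed by "arming" reinforcement with a small constant probability in each short time step; the next response then collects it. The sketch below (Python) is an illustration under that common arrangement; the names and parameter values are invented:

 # Sketch of random-interval (RI) arming: in each second, reinforcement
 # becomes available with constant probability p, so the programmed
 # reinforcement rate is roughly p per second no matter what the animal does.
 import random

 def arming_times(p, total_seconds):
     """Seconds at which reinforcement is armed (awaiting the next response)."""
     return [t for t in range(total_seconds) if random.random() < p]

 print(len(arming_times(1/60, 3600)))  # about 60 armings per hour, on average

Because armings occur at unpredictable times and wait for a response, responding much faster than the arming rate adds little, and a steady moderate rate is the stable adaptation.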

In this way, average response rate became the standard dependent variable for operant psychology[5].

Discriminative stimuli

Pigeons, rats, and people can easily be trained to respond differentially in the presence of different stimuli, depending on consequences. If a hungry pigeon, confronted with two adjacent pecking keys, is paid off with bits of grain only for pecking the red, and not the green, key, he will soon learn to peck only the red; and similarly if the payoffs are reversed. Skinner called discriminations like this examples of stimulus control.

Skinner went on to propose the three-term contingency as a behavioral unit incorporating stimulus, response and reinforcement. The idea is that reinforcing a response in the presence of a given stimulus establishes control by the stimulus over the pattern of behavior established by the prevailing reinforcement schedule. Skinner called this unit the operant; it is his word for what might previously have been called a habit, and it resembles the theoretical behaviorist notion of state.

By inventing new concepts, and re-naming several old ones, Skinner created a distinctive terminology that helped to define a new and self-contained movement: the experimental analysis of behavior, aka behavior analysis aka operant conditioning.

With the sole exception of the three-term contingency, these ideas were summarized by Skinner in an influential 1950 paper[6], "Are theories of learning necessary?" He defined theory as "any explanation of an observed fact which appeals to events taking place somewhere else, at some other level of observation, described in different terms, and measured, if at all, in different dimensions." This definition would rule out many well-accepted theories in other areas of science. The temperature of a liquid, for example, is directly related to the motion of its molecules, yet it is not clear that the "dimensions" of temperature are the same as those of molecular kinetic energy. The spectra of hot elements – the red flame of lithium, for example – can be derived from the element's atomic properties; again, the atomic properties that underlie emission spectra do not have the same dimensions as wavelength. It cannot be right to rule out theories like this[7].

Skinner argued that learning theories are for the most part impediments to scientific advance: "Much useless experimentation results from theories, and much energy and skill are absorbed by them", although he also conceded that "It would be foolhardy to deny the achievements of theories of this sort in the history of science." ("This sort" refers to a previous paragraph in which he attempts to distinguish between "postulates", "theorems", and "theories".) He admits, in a widely cited phrase, the "need for a formal representation of the data reduced to a minimal number of terms" but at the end of his article says that "We do not seem to be ready for theory in this sense." By the 1990s, the field was ready.

Problems with atheoretical behaviorism

The shaky epistemological basis for Skinner’s anti-theory argument was overshadowed by the very compelling experimental examples he described in the rest of the 1950 article. His novel method produced strikingly orderly real-time patterns of behavior in individual organisms. He proceeded to use these data to identify what he called "controlling variables", those aspects of the training procedure responsible for the observed patterns: "the independent variables of which probability of response is a function." When we know the controlling variables, he argued, theory is unnecessary.

Defending his idea that response probability is the correct dependent variable for learning psychology, he showed that the alternative favored by reflex-type theorists, latency, did not behave in the appropriate way. Motivated and unmotivated animals show the same modal response latency in a suitable task. Motivated animals do not respond sooner, as they should if latency is an adequate measure of response strength. As hunger motivation is reduced, however, latencies become more variable (what Skinner referred to as a "scattering of latencies"). The increase in variability as motivation is reduced is consistent with the selectionist view of learning espoused by theoretical behaviorism [8].

In another experiment, arguing against the inhibition theory of extinction, Skinner showed that well-trained pigeons forget little even after a lapse of four years between successive exposures to a task. He also showed that the pattern of a cumulative record in extinction is related to the pattern built up during training. He attributed the difference between extinction of a periodic vs. an aperiodic schedule to novelty and dissipation of emotional responses. He described the method that would later be used by Guttman and Kalish [9] to measure stimulus generalization.

Skinner’s examples were striking. His conclusion was persuasive. Many readers came to accept his claim that theories of learning – not just the poor theories then current but perhaps all learning theories – are in fact impediments to progress in scientific psychology.

But atheoretical behaviorism was criticized by theorists on several points, which can be illustrated by re-visiting three of Skinner’s examples: response rate as a dependent variable, extinction and memory, and the operant-respondent distinction. These problems led to the theory and philosophy of theoretical behaviorism.

Response rate

Skinner wrote in the 1950 paper:

"Rate of responding appears to be the only datum that varies significantly and in the expected direction under conditions which are relevant to the “learning process.” We may, therefore, be tempted to accept it as our long-sought for measure of strength of bond, excitatory potential, etc. Once in possession of an effective datum, however, we may feel little need for any theoretical construct of this sort"[10].

This suggests that response rate might be an independent indicator of something like 'strength of learning'. Skinner wasn’t sure, but succumbed to temptation and settled on response rate as the most useful response measure.

A nagging problem for the idea that response rate is always valid as a measure of response strength is that rate of response can itself be controlled by the appropriate contingencies of reinforcement. For example, animals will learn, albeit with some difficulty, to space their pecks or lever presses 10-s apart (a spaced-responding schedule) if that is a condition for reinforcement[11] – even though the ‘natural’ rate for an equally rewarding schedule that lacks the spaced-responding requirement is much higher, perhaps 60 pecks per minute. Since rate of response, over the typical period of 30-min or so, on a spaced-responding schedule is low, probability of response, hence response 'strength', must also be low, according to one reading of Skinner – lower than on, say, a variable-interval schedule dispensing reinforcements at the same rate. This is clearly wrong. Response rate, per se, is not an adequate measure of response strength.
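The spaced-responding contingency (often called a DRL schedule, for differential reinforcement of low rate) makes the point vividly. A minimal sketch (Python; invented names and times, offered only as an illustration of the contingency described above):

 # Sketch of a spaced-responding (DRL 10-s) contingency: a response is
 # reinforced only if at least `min_irt` seconds have passed since the
 # previous response; a premature response resets the clock.
 def spaced_responding(min_irt, response_times):
     reinforced, prev = [], None
     for t in sorted(response_times):
         if prev is not None and t - prev >= min_irt:
             reinforced.append(t)
         prev = t
     return reinforced

 # Six responses per minute (one every 10 s) collect five reinforcers;
 # a far 'stronger' sixty per minute collect none.
 print(len(spaced_responding(10, list(range(0, 60, 10)))))  # 5
 print(len(spaced_responding(10, list(range(0, 60, 1)))))   # 0

The well-adapted animal responds slowly, so a low rate here signals strong, well-timed learning, not weak learning.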

Skinner never wrote explicitly about this problem. But an obvious response is that the spaced-responding schedule involves the explicit discrimination of time. Response probability, hence, response strength is high at some times and low at others. It is high close to 10 s after each response and low in between, for example. Much the same is true of a fixed-interval schedule. This is of course true. But it makes response rate less attractive as a universal measure of response strength. Indeed, perhaps time should take over as the appropriate dependent variable[12]? Perhaps the question should be not “how does schedule X affect response rate?” but, “How does schedule X affect the temporal location of behavior?”

Using time as a dependent measure also avoids a problem that is rarely addressed: over what time period (minutes? hours?) should rates be computed? In operant-conditioning experiments, rates are usually computed over intervals of 30-min or more. The choice of denominator is justified not by any theoretical rationale, but by the orderly functional relations that result. In Skinner’s [2] own words: “Science is...an attempt to discover order, to show that certain events stand in lawful relations to other events.” Order was its own justification for Skinner.

Extinction and Memory

Skinner never mentioned the word memory in the 1950 article, and rarely afterwards. But he did discuss spontaneous recovery, a paradoxical property of experimental extinction: After sufficient training, an organism responds. If reinforcement is withdrawn, responding ceases (extinction), usually within a single experimental session. But the next day, returned to the apparatus, the animal begins to respond again. Since we know (argued Skinner) that little or no forgetting should occur from one day to the next, this recovery of the extinguished response, an apparent forgetting of the extinction on the previous day, needs explaining.

Until Skinner’s paper, the standard explanation for spontaneous recovery was that during the extinction session, inhibition builds up, but by the next day it has dissipated so responding recovers, at least for a while.[13] But Skinner had already shown that mere passage of time has little effect on level of responding – although we will have reason to question that in a moment. So perhaps some other variables are operating? In the 1950 paper, Skinner proposed two: emotion and novelty:

"When we fail to reinforce a response that has previously been reinforced, we not only initiate a process of extinction, we set up an emotional response....The pigeon coos in an identifiable pattern, moves rapidly about the cage, defecates, or flaps its wings rapidly in a squatting position that suggests treading (mating) behavior. This competes with the response of striking a key and is perhaps enough to account for the decline in rate in early extinction…Whatever its nature, the effect of this variable is eliminated through adaptation." [emphases added]

Skinner said no more than this about “emotion,” but his description is interesting for two reasons. First, it involves observation, actually watching the (pigeon, rat) subjects. This practice soon fell out of fashion in behavior analysis. Yet direct observation of behavior was later to prove critical in shedding new light on Skinner’s theoretical approach. Second, he might have said something more about competition, which is apparently also involved. As it is, emotion is unsatisfactory as an explanation because the new process he invokes to explain its dissipation, adaptation[14], cannot be independently measured.

But novelty is the variable Skinner thought most important: “Maximal responding during extinction is obtained only when the conditions under which the response was reinforced are precisely reproduced.” First Skinner describes stimulus generalization, the decline in responding in the presence of stimuli different from the training stimulus. Then he goes on:

"Something very much like this must go on during extinction. Let us suppose that all responses to a key have been reinforced and that each has been followed by a short period of eating. When we extinguish the behavior, we create a situation in which responses are not reinforced, in which no eating takes place, and in which there are probably new emotional responses. The situation could easily be as novel as a red triangle after a yellow [his earlier example of stimulus generalization]. If so, it could explain the decline in rate during extinction."

Novelty, as subsequently measured precisely in the stimulus generalization experiments of Guttman and Kalish and many others, is the real explanation for spontaneous recovery, said Skinner. But again, this is an incomplete account, because we cannot measure the stimulus in this case. In regular stimulus generalization, to a color or a shape, for example, both physical stimulus properties and the effects of changes on responding can be measured objectively. Not so in the case of extinction, the case that Skinner is attempting to explain. How exactly should ‘novelty’ be measured or manipulated? Something more was needed: a theory of memory, perhaps?

A relevant theory was in fact available. At the end of the nineteenth century, Adolf Jost proposed two memory laws[15], the second of which is: Given two associations (equivalently, habits, memories, operants) of the same strength, but of different ages, the older one will fall off less rapidly with time. Jost’s law implies that the strength of a habit does not decay exponentially, by the same fixed fraction each day, because if it did, the relative strength of two memories would not change with lapse of time.

On the other hand, suppose that the decline in strength of a habit, Vi, after time ti, is not exponential but hyperbolic, like this:

Vi = K/(A + ti) .... (1)

where K/A is the salience – strength – of habit i at time zero and A is a parameter representing the memory decay rate. If we look at the rate of change of Vi with time we get

dVi/dt = -K/(A + ti)² .... (2)

Now suppose that, at a particular time after learning, two memories with ages t1 and t2 (t1 > t2) happen to have equal strengths. Substituting Eq. 1 into Eq. 2 gives dVi/dt = -Vi/(A + ti), so when the strengths are equal the rate of decline is greater for the smaller ti – that is, for the more recent memory. Equation 1 is hyperbolic, but many other monotonically decreasing functions would model Jost’s Law equally well.

Jost’s Law explains spontaneous recovery. Since the first extinction session is necessarily more recent than the many days of conditioning that preceded it, the associated behavior should lose more strength from one day to the next than the earlier conditioning. At the end of the first day of extinction, responding ceases, which means that the strengths of the two memories, for responding and for not responding, must be roughly equal. Thereafter both memories decline in strength, the new faster than the old. On the next day, therefore, the older tendency – to respond – must gain (according to Jost) over the more recent one (not responding), hence: spontaneous recovery. With continued extinction sessions, the "respond" memory continues to decline, but the "not-respond" tendency is strengthened day by day.
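The arithmetic is easy to check. A minimal simulation (Python; the parameter values are arbitrary choices for illustration, not fitted to any data) tracks two hyperbolically decaying strengths, per Eq. 1, that are equal at the end of the first extinction session:

 # Sketch of Jost's-Law spontaneous recovery with hyperbolic decay,
 # V = K/(A + age). The 'respond' memory is old (21 days); the
 # 'not-respond' (extinction) memory is one day old. K values are chosen
 # so the two strengths are equal at the moment extinction ends.
 A = 5.0                                   # arbitrary decay parameter
 age_respond, age_extinct = 21.0, 1.0      # ages in days at that moment
 K_respond = 1.0 * (A + age_respond)       # makes V_respond = 1.0 now
 K_extinct = 1.0 * (A + age_extinct)       # makes V_extinct = 1.0 now

 def strength(K, age):
     return K / (A + age)                  # Eq. 1

 for wait in range(4):                     # days since extinction ended
     v_r = strength(K_respond, age_respond + wait)
     v_e = strength(K_extinct, age_extinct + wait)
     print(f"+{wait} d: respond={v_r:.3f} not-respond={v_e:.3f}")

By the next day the older "respond" strength (0.963) exceeds the newer "not-respond" strength (0.857): the response recovers, exactly the pattern described above.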

A model like this could make predictions about the effects of different delays before returning the animal to the apparatus, and of different amounts of training, on subsequent extinction. If the second extinction session follows closely on the first, recovery should be less, for example. In other words, the theory draws attention to historical variables as possibly involved in recovery after extinction, a useful advance over the “novelty” idea, which looks only at a contemporary cause, and one that is difficult to measure objectively.

A Jost’s Law account implies that memories compete in some way. The competition idea also speaks to the apparent contradiction between the very slow decay of well-learned operant behavior demonstrated in Skinner’s 4-year experiment and the apparent rapid forgetting of an extinction experience illustrated by spontaneous recovery. The key is competition between memories. In the absence of any competition, a long-learned habit may persist for a long time, as Skinner’s pigeons showed. But when the competition is weak – just one extinction session – memory for many earlier conditioning sessions reasserts itself, and responding recovers until more extinction experience has accumulated.

Hyperbolic discounting is a phenomenon much studied by behavioral economists with both human and animal subjects. In a choice situation, subjects usually prefer a reward of size 2 after a delay of 10-s, say, over a reward of size 5 after a delay of 20-s, even though the rate of return is better for the larger, later reward. This contradicts the standard exponential discounting assumption, which assumes that rate of return is key.
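The signature difference between the two discounting rules is that adding the same delay to both options can reverse a hyperbolic preference but never an exponential one. A minimal sketch (Python; the amounts, delays and discount parameters are invented for illustration, not taken from any experiment):

 import math

 # Hyperbolic value, V = A/(1 + k*delay): falls steeply at short delays.
 def hyperbolic(amount, delay, k=1.0):
     return amount / (1.0 + k * delay)

 # Exponential value, V = A*exp(-r*delay): loses a fixed fraction per second.
 def exponential(amount, delay, r=0.1):
     return amount * math.exp(-r * delay)

 # Smaller-sooner (2 units at 2 s) vs larger-later (5 units at 12 s),
 # with a common extra delay added to both options.
 for extra in (0, 30):
     h = hyperbolic(2, 2 + extra) > hyperbolic(5, 12 + extra)
     e = exponential(2, 2 + extra) > exponential(5, 12 + extra)
     print(f"extra={extra:2d}s  hyperbolic prefers smaller: {h}, exponential prefers smaller: {e}")

With no added delay both rules prefer the smaller, sooner reward; after a common 30-s delay the hyperbolic preference reverses while the exponential one does not.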

It is tempting to relate hyperbolic memory decay to hyperbolic discounting in choice experiments and there may be some theoretical link. But also involved is the fact that organisms typically time their responses to be proportional to the expected time of a reward. There is also some evidence that the larger the anticipated reward, the sooner animals will respond. Offered a choice, therefore, between two stimuli, one signaling a small reward after 5-s vs one more than twice the size after 10-s, preference will be a balance between the tendency to wait a time proportional to the expected delay (which favors the smaller, sooner reward) and an opposite tendency to respond sooner if the expected reward is larger. The experimental evidence seems to suggest that the latter effect is smaller than the former. Animals are likely to respond sooner to the shorter delay, even if the associated reward size is smaller – and even if the overall rate of reward associated with the smaller choice is less than for the larger.

These two accounts are not in conflict: discounting is a functional, economic account; the timing account represents the potential underlying mechanism.

The operant-respondent distinction

Ivan Petrovich Pavlov (1849-1936) was not a psychologist. His pioneering work on conditioned responses like salivation (typically by a dog, following several pairings of a buzzer or a metronome, say, with the delivery of food) was physiology, not psychology. The focus of Pavlov, and of many who followed him, was on the reflex-like behavior maintained by classical or Pavlovian or, in Skinner’s terms, respondent conditioning. Pavlov found that the conditioned response was most rapidly obtained if the food followed closely on the stimulus (temporal contiguity). Subsequent work by Robert Rescorla and others showed that the key is prediction: the signaling stimulus need not be close in time to the reinforcer so long as it is closer than any other signal (relative proximity). Skinner justified his term respondent by pointing out that conditioned responses like salivation are products of the autonomic (involuntary), not the somatic (skeletal), nervous system. Operant behavior, he thought, depended on the somatic system.

But after Pavlov, researchers began to pay attention to responses other than salivation that might occur during a conditioning experiment. An early experiment by Karl Zener[16] showed that salivation is not the only activity that classically conditioned dogs exhibit. A story[17] recounted by the great ethologist Konrad Lorenz provides a clue:

"My late friend Howard Liddell told me about an unpublished experiment he did while working as a guest in Pavlov’s laboratory. It consisted simply in freeing from its harness a dog that had been conditioned to salivate at the acceleration in the beat of a metronome. The dog at once ran to the machine, wagged its tail at it, tried to jump up to it, barked, and so on; in other words, it showed as clearly as possible the whole system of behavior patterns serving, in a number of Canidae, to beg food from a conspecific. It is, in fact, this whole system that is being conditioned in the classical experiment."

This rich repertoire of operant behavior may appear even if the conditioned stimulus is too long to produce much salivation, so long as it is sufficiently predictive to allow the dog to anticipate food.

Another sign that something was wrong with the rigid dichotomy between operant and respondent was provided by a pair of experiments: a very influential short paper, "'Superstition' in the pigeon", by Skinner, and a much longer experimental and theoretical paper by Staddon and Simmelhag many years later. Here is what Skinner did in 1948: hungry pigeons were placed in a box and given brief access to food at fixed periods – every 15 s for some animals, at longer periods for others. This is temporal conditioning (a fixed-time, FT, schedule in operant terminology), a Pavlovian procedure, since the animal’s behavior has no effect on food delivery.

Despite the absence of any operant contingency, all the animals developed vigorous, stereotyped, apparently operant activities in between feeder operations. Skinner attributed this behavior to accidental contiguity between some spontaneous behavior of the pigeon and the delivery of food. He called it adventitious reinforcement. Since these conjunctions were accidental, not causal, Skinner termed the activities “superstitious” and likened them to human superstitions.[18]
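
Skinner’s hypothesis describes a simple positive-feedback process, sketched below. The response names, strengths, and update rule are all hypothetical, for illustration only; as described next, this is not what actually happens.

<syntaxhighlight lang="python">
import random

# Sketch of Skinner's adventitious-reinforcement hypothesis: whatever
# behavior happens to be in progress when response-independent food
# arrives is strengthened, and so becomes still more likely to be in
# progress at the next delivery. Responses, strengths, and the update
# rule are hypothetical.

random.seed(1)
strengths = {"peck": 1.0, "turn": 1.0, "wing-flap": 1.0}

def emitted_behavior():
    """Pick a response with probability proportional to its strength."""
    r = random.uniform(0, sum(strengths.values()))
    for response, s in strengths.items():
        r -= s
        if r <= 0:
            return response
    return response

for delivery in range(200):        # 200 fixed-time food deliveries
    behavior = emitted_behavior()  # behavior contiguous with food...
    strengths[behavior] += 0.5     # ...is "accidentally" reinforced

print(strengths)  # typically one arbitrary activity comes to dominate
</syntaxhighlight>

The positive feedback means that which activity wins depends on early accidents, so different animals should develop different “superstitions”. As the following experiment showed, that is not what pigeons actually do.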

More than 20 years after the superstition paper, Staddon and Simmelhag repeated Skinner’s experiment, observing the pigeons’ behavior second by second in each interfood interval from the very beginning of training. Their aim was atheoretical; they were simply curious: what happens, in detail, and does the interfood interval have to be constant (as in Skinner’s experiment), or can it be variable?

It turns out that variable intervals work fine; a variable-time schedule also induces “superstitious” activity. But Staddon and Simmelhag also found three things that differ from Skinner’s account:

  1. The activities that develop are of two kinds: interim activities, which occur in the first two-thirds or so of the fixed interfood interval, and a single terminal response, which occurs during the last third of the interval. Both interim and terminal responses tend to occur at a higher rate the shorter the interval,[19] not the reverse.
  2. The terminal response is either pecking or a stereotyped pacing activity related to it; it does not differ from animal to animal in the capricious way implied by Skinner’s account.
  3. Terminal pecking often appeared suddenly after a little training; it did not develop following an accidental conjunction with food, as the adventitious-reinforcement hypothesis implies. Interim activities are rarely contiguous with food, so they too cannot be explained by adventitious reinforcement.[20]

In short, Skinner’s account is wrong. The superstitious behavior he observed was not the result of happenstance, accidental contiguity between an emitted behavior and response-independent food. This experiment, and an earlier one[21] showing that untrained pigeons will learn to peck in a Pavlovian procedure in which an intermittent 7-s light (the conditioned stimulus, CS) ends with free food (the unconditioned stimulus, US; Brown and Jenkins termed this effect autoshaping), showed that Skinner’s dichotomy between operant (somatic) and respondent (autonomic) behavior does not hold, since pecking – the prototypical operant response – can behave just like salivation, the prototypical respondent.

These results demanded a revision of the standard framework for the study of operant conditioning. If pecking is both an operant and a respondent, if salivation (for example) can be classically but not operantly conditioned, and if supposedly ‘instinctive’ activities can supersede already-learned operant behavior, the sharp distinction between classical and operant conditioning becomes untenable.

Selection and variation

Beginning in the early 1950s, researchers began to point out the similarities between the learning process and evolution through variation and selection.[22] More recently, models explicitly analogous to gene mutation and selection by reinforcement have successfully duplicated many operant conditioning phenomena.[23] Skinner’s idea of emitted behavior fits quite naturally into a Darwinian scheme: behavior varies; a variant that is contiguous with reward is strengthened and thus increases in frequency.
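
The analogy can be made concrete with a toy sketch, given below. It is not any published model (cf. McDowell[23]); every detail here – the population of variants, the rewarded “target” value, the mutation size – is assumed purely for illustration.

<syntaxhighlight lang="python">
import random

# Toy variation-and-selection model of operant learning. Not a
# published model; population, rewarded "target" value, and mutation
# size are all assumed for illustration.

random.seed(0)
TARGET = 0.8                  # behavioral variant the environment rewards
population = [random.random() for _ in range(50)]  # behavior variants

for generation in range(500):
    emitted = random.choice(population)     # a variant is emitted
    if abs(emitted - TARGET) < 0.1:         # contiguous with reward?
        # Selection: the reinforced variant is copied over a random member.
        population[random.randrange(len(population))] = emitted
    # Variation ("mutation"): one random member drifts slightly.
    i = random.randrange(len(population))
    population[i] += random.gauss(0.0, 0.05)

near = sum(abs(b - TARGET) < 0.1 for b in population)
print(f"{near}/{len(population)} variants now near the rewarded value")
</syntaxhighlight>

Reinforced variants increase in frequency, so emitted behavior drifts toward the rewarded form, while mutation maintains the variability on which selection depends.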

Unlike Darwin, Skinner had little to say about the causes and types of variation. He left the impression that variation is unstructured, ‘random’. On the other hand, observations like Zener’s and Liddell’s show that the repertoire from which reinforcement selects is very far from random: it is different for food than for sex or social reward, for example. Lorenz, an ethologist, identified the dog’s behavior as a particular instinctive pattern. A cognitive psychologist might say that the dog is showing an expectation of food. The theoretical behaviorist account is that the conditioning process causes the conditioned stimulus to evoke a particular behavioral repertoire. The emitted behavior to which that repertoire gives rise is not at all random.

The repertoire

The composition of the repertoire will depend on the animal’s training (learning the signal properties of the metronome, for example), motivational state, and species. Anticipation of food will lead to a different repertoire than anticipation of electric shock: food → vigorous activity, tail-wagging, etc.; electric shock → ‘freezing,’ crouching, suppression of all activity. Indeed, conditioned suppression is the name for the shock-anticipation procedure used by Estes and Skinner[24] and others to establish the necessary and sufficient conditions for respondent conditioning.

The idea of a repertoire implies that some behaviors are potential: lying in wait, but ready to occur if the active behavior goes unrewarded. The stronger the animal’s motivation and the better the predictive properties of the stimulus (how close, and how big, the reward?), the more restricted the repertoire is likely to be. In the limit, if the stimulus (as in the autoshaping experiments) or the interfood interval (as in the superstition experiment) is very short, the pigeon’s repertoire may be limited to a single response: pecking. But if the situation is not too “hot”, the repertoire will be larger, although perhaps less vigorous.

At any moment, then, a repertoire comprises, in addition to the active behavior, latent or covert activities that can occur. This idea of a latent response was in fact suggested by Skinner himself in the same year that he published the “superstition” experiment:

"Our basic datum…is the probability that a response will be emitted…We recognize …that … every response may be conceived of as having at any moment an assignable probability of emission... A latent response with a certain probability of emission is not directly observed. It is a scientific construct. But it can be given a respectable status, and it enormously increases our analytical power…. It is assumed that the strength of a response must reach a certain value before the response will be emitted. This value is called the threshold." [emphases added]

Skinner was writing about language and never extended the idea to the operant behavior of non-human animals. But his proposal differs from theoretical behaviorism in only one respect: on the ThB hypothesis, the threshold is simply competition from other latent (silent) responses.
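
That one-sentence difference is easily expressed computationally. The sketch below is a hypothetical winner-take-all reading of the idea, not a published ThB model: every member of the repertoire has a strength, only the strongest is emitted, and each response’s “threshold” is simply the strength of its strongest competitor.

<syntaxhighlight lang="python">
# Hypothetical winner-take-all reading of the repertoire idea: every
# member has a strength; only the strongest is emitted; the rest stay
# latent. A response's "threshold" is just its strongest competitor.

repertoire = {"peck": 0.9, "turn": 0.4, "groom": 0.3}

def emitted(rep):
    """The overt response is the strongest member of the repertoire."""
    return max(rep, key=rep.get)

print(emitted(repertoire))  # -> peck; 'turn' and 'groom' remain latent

# If the active response weakens (goes unrewarded, say), a formerly
# latent response crosses its "threshold" and takes over.
repertoire["peck"] = 0.2
print(emitted(repertoire))  # -> turn
</syntaxhighlight>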

The idea that any predictive relation between a stimulus and a reward creates an expectation – equivalently, a repertoire of potential actions – implies that a repertoire will be created, that conditioning will occur, even if the dog doesn’t salivate. Imagine a conditioning situation in which the CS is just a bit too long to yield conditioning as measured by, say, salivation or an auto-shaped key-peck. So long as the CS is still predictive (e.g., it signals a shorter time to the US than other signals), the animal may still form an expectation and develop a repertoire. Members of the repertoire will be available as candidates for operant conditioning, which is to say selection by temporal contiguity. But the repertoire itself, the active response excepted, will be covert and may not reveal itself at once. If the animal, like Pavlov’s dog, is restrained, its behavioral options are necessarily limited. But freed from restraint, the dog shows at once the wide range of activities induced by a stimulus that signals imminent food.

Emitted responses can be induced in other ways. An unexpected reward will at once elicit a range of food-related activities, for example. Similarity of a new situation to one associated with food or a mate will similarly elicit a historically relevant repertoire.

Extinction, the cessation of an established reinforcement schedule, shows the effects of relaxing selection. When reinforcement is withdrawn, the selection process ceases and the trained response declines. But observation, and Skinner’s "scattering of latencies", show that other activities, suppressed by the training schedule, now occur again. This is the normal increase in variability when selection is relaxed, whether natural selection or selection by a reinforcement schedule.[25] Extinction usually leads to more variable behavior.
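
Continuing the winner-take-all sketch above, relaxed selection can be represented as decay of the trained response’s strength; as it approaches the strengths of its suppressed competitors, emitted behavior becomes more variable. Again, all numbers are illustrative assumptions.

<syntaxhighlight lang="python">
import random

# Extinction as relaxed selection, continuing the repertoire sketch:
# the trained response decays session by session, competitors
# re-emerge, and emitted behavior becomes more variable. All numbers
# are illustrative.

random.seed(2)
repertoire = {"peck": 3.0, "turn": 0.5, "groom": 0.5}

def emit(rep, noise=0.3):
    """Strongest response wins, with moment-to-moment noise."""
    return max(rep, key=lambda resp: rep[resp] + random.gauss(0, noise))

for session in range(5):
    distinct = {emit(repertoire) for _ in range(20)}
    print(f"session {session}: {len(distinct)} distinct responses emitted")
    repertoire["peck"] *= 0.5   # no reward: the trained response decays
</syntaxhighlight>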

Summary

Treating classical (respondent) conditioning and operant conditioning as different processes has taught us much about the necessary and sufficient conditions for conditioning to occur. But it has also, says ThB, led learning psychology into a blind alley. Learning researchers were misled by Pavlov’s genius and by the neurophysiological differences between typical classically conditioned responses and typical operantly conditioned responses. Salivation and lever pressing are obviously very different.

Theoretical behaviorism treats classical and operant conditioning as parts of the same psychological process. Classical conditioning detects correlations between environmental features and something of value, positive or negative, to the organism. This correlation induces[26] a repertoire from which operant conditioning can select. If the correlation is very strong and the unconditioned stimulus is imminent, the induced repertoire may be limited – to pecking (in a hungry pigeon) or to salivation (in a restrained dog). Selection, in the sense of a response contingency, may then be unnecessary. The result may look like a reflex, but is not, although restricted behavioral options and extreme motivation may make it appear so.

If the selection is weaker, some ‘expectation’ may still be formed and the repertoire may comprise many responses. Operant reinforcement must select from this pool. If there is no reinforcement, the behaviors that make up the repertoire will occur one after another, back and forth, each time more weakly. Eventually vigorous activity may cease altogether, leaving a passive behavioral residue.

The old Yerkes-Dodson law (1908) shows that learning is fastest at intermediate levels of motivation, which suggests that the size of the repertoire is then at its maximum. As the organism learns, behavior adapts, reinforcement rate increases, and the repertoire shrinks to a class of responses defined by their consequences and controlled by a class of stimuli that reliably signal the contingencies. This is Skinner’s three-term operant. Another name for it is state – not internal state or physiological state or even mental state, but state as a repertoire controlled, in the well-trained organism, by identifiable stimuli under certain motivational conditions. (For the computational details of state as equivalent history, see The New Behaviorism;[27] these details are not necessary to see the need to add state to stimulus and response to arrive at an accurate picture of the behaving organism.)
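
The sense in which state is equivalent history can be shown with a minimal sketch: if behavior depends on the past only through, say, an exponentially weighted average of reinforcements, then any two histories yielding the same average are behaviorally equivalent, and that average is the state. The averaging rule below is an illustrative assumption, not the model developed in the cited references.

<syntaxhighlight lang="python">
# Minimal sketch of "state as equivalent history": one variable,
# updated trial by trial, stands in for the whole reinforcement
# history. The exponential-averaging rule is an illustrative
# assumption, not the model in the cited references.

def state_after(history, decay=0.5):
    """Compress a reinforcement history (1 = reinforced trial) into a state."""
    v = 0.0
    for outcome in history:
        v = decay * v + (1 - decay) * outcome
    return v

h1 = [1, 0]     # reinforced, then not
h2 = [0, 1, 0]  # a different history
print(state_after(h1), state_after(h2))  # -> 0.25 0.25: the same state

# The two histories are "equivalent": they leave the model organism in
# the same state, so its subsequent behavior is predicted to be the same.
</syntaxhighlight>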

Theoretical behaviorism repeals Skinner’s proscription of theory. The “ism” is unfortunate, because ThB is not a rigid ideology that rules things out. It is theoretical but eclectic. It does require that ideas be testable by a third party, but in that sense it is just... science. Concepts like memory and expectation are perfectly acceptable, so long as they can be given some explanatory and predictive meaning.

The selection/variation view of learning implies that there is no sharp distinction between classical and operant conditioning. Operant reinforcement selects from a repertoire, just as Skinner argued. But that repertoire comes from somewhere; it has causes. One of them is the stimulus-stimulus correlations detected by the processes labeled classical conditioning. Classical and operant conditioning are a team, but one process (the setting of a repertoire by classical conditioning) always limits the other (selection from that repertoire by response-reinforcer contiguity). Autoshaping, superstitious behavior, memory, and expectation are problems for Skinner’s radical behaviorism. Theoretical behaviorism offers a solution.

References

  1. ^ Skinner, B. F. (1945). The operational analysis of psychological terms. Psychological Review, 52, 270-277, 291-294; Schneider, S. M & Morris, E. K. (1987) A history of the term Radical Behaviorism: from Watson to Skinner The Behavior Analyst 1987, 10, 27-39
  2. ^ Staddon, J. E. R. (1973). On the notion of cause, with applications to behaviorism. Behaviorism, 1, 25-63; Staddon, J. (1993) Behaviorism: Mind, Mechanism and Society. London: Duckworth.
  3. ^ See, for example Estes, W. K. (1956). The problem of inference from curves based on group data. Psychological Bulletin, 53, 134-140. For a review, see Staddon, J. Scientific Method: How Science Works, Fails to Work, and Pretends to Work (Psychology Press, 2018).
  4. ^ Skinner, B. F. (1950). Are theories of learning necessary? Psychological Review,57, 193-216.
  5. ^ Skinner was not happy at the abandonment of cumulative records that followed: Skinner, B. F. (1976). EDITORIAL: Farewell my LOVELY! Journal of the Experimental Analysis of Behavior, 25(2), p.218. The order enabled by averaging – if not across subjects, within subject – overcame Skinner’s objection to a retreat from real-time data.
  6. ^ Skinner, B. F. (1950). Are theories of learning necessary? Psychological Review, 57, 193-216.
  7. ^ There is something in physics called dimensional analysis, which says that the dimensions (typically mass, length and time) on both sides of an equation must match. But it is not clear that this was Skinner’s meaning for “dimension.”
  8. ^ See, for example, Staddon, J. E. R., & Simmelhag, V. (1971). The “superstition” experiment: A reexamination of its implications for the principles of adaptive behavior. Psychological Review, 78, 3-43.
  9. ^ Guttman, N., & Kalish, H. I. (1956). Discriminability and stimulus generalization. Journal of Experimental Psychology, 51, 79-88.
  10. ^ Skinner 1950, op. cit. p.198.
  11. ^ See Staddon, J. E. R. (2016). Adaptive Behavior and Learning, 2nd edition. Cambridge University Press, for an experimental review, and other references.
  12. ^ See, for example, Williams, D. A., Lawson, C., Cook, R., Johns, K. W and Mather, A. A. (2008). Timed excitatory conditioning under zero and negative contingencies. Journal of Experimental Psychology: Animal Behavior Processes, 34(1), 94–105.
  13. ^ Of course, unless inhibition is given some testable properties, this is little more than re-description of the facts.
  14. ^ Emotion, which competes with the learned behavior and adapts with time may seem to many readers hard to distinguish from the reactive inhibition that Skinner was criticizing.
  15. ^ Jost, A. (1897). Die Assoziationsfestigkeit in ihrer Abhängigkeit von der Verteilung der Wiederholungen. Zeitschrift fűr Psychologie und Physiologie der Sinnesorgane, 14, 436-472.
  16. ^ Zener, K. (1937) The Significance of Behavior Accompanying Conditioned Salivary Secretion for Theories of the Conditioned Response. American Journal of Psychology, Vol. 50, No. 1/4, pp. 384-403.
  17. ^ Lorenz, K. (1969). Innate bases of learning. In K. Pribram (Ed.) On the biology of learning. NY: Harcourt Brace; see also H. M. Jenkins, F. J. Barrera, C. Ireland, & B. Woodside (1978) Signal-centered action patterns of dogs in appetitive classical conditioning. Learning and Motivation, 9, 272-296.
  18. ^ Indeed, he presented the experiment as a test of the adventitious reinforcement hypothesis. This seems to be the only time, in any publication, that Skinner described an experiment as a test of a hypothesis. See the video snippet: https://www.youtube.com/watch?v=7XbH78wscGw in which biologist Richard Dawkins, long a foe of religion, shows a pigeon in a Skinner box. He slightly mis-describes the ‘superstition’ experiment but then correctly explains Skinner’s (mistaken) adventitious reinforcement explanation. “Humans can be no better than pigeons,” Dawkins concludes.
  19. ^ A later account identified a third class, facultative activities, that seem to be unrelated to the food schedule and also occur in the middle of the interval: See Staddon, J. E. R. (1977). Schedule-induced behavior. In W. K. Honig & J. E. R. Staddon (Eds.), Handbook of operant behavior. Englewood Cliffs, N.J.: Prentice-Hall
  20. ^ Unless the mechanism of operant reinforcement allows some behaviors to be more “reinforcible” than others, in which case a more-reinforcible behavior relatively remote from food might overtake a less reinforcible one in just the manner observed in “instinctive drift”. Killeen and Pellón have developed this idea into an integrated model of conditioning and schedule induction: Killeen, P. R & Pellón, R. (2013) Adjunctive behaviors are operants. Learning and Behavior, 41:1, 1-24. For a related analysis, see also Staddon, J. E. R., & Zhang, Y. (1989) Response selection in operant learning. Behavioural Processes, 20,189-97, especially Figure 5: http://psycrit.com/w/File:StaddonZhang1989.pdf
  21. ^ Brown, P. L., & Jenkins, H. M. (1968). Auto-shaping of the pigeon's key-peck. Journal of the Experimental Analysis of Behavior, 11, 1-8. See also Williams, D. R., & Williams, H. (1969). Auto-maintenance in the pigeon: sustained pecking despite contingent nonreinforcement. Journal of the Experimental Analysis of Behavior, 12, 511-520.
  22. ^ For example, Pringle, J. W. S. (1951). On the parallel between learning and evolution. Behaviour, 3, 174-215. Russell, W. M. S. (1961) Evolutionary concepts in behavioral science: III. The evolution of behavior in the individual animal, and the principle of combinatorial selection. General Systems, 6, 51-92.
  23. ^ McDowell, J. J. (2013). A quantitative evolutionary theory of adaptive behavior dynamics. Psychological Review, Vol. 120, No. 4, 731–750. See also Edelman, G. Neural Darwinism https://www.webofstories.com/play/gerald.edelman/37;jsessionid=4B59A75EAF082B9FF369CB6D98C19671
  24. ^ Estes, W. K., & Skinner, B. F. Some quantitative properties of anxiety. Journal of Experimental Psychology, 1941, 29, 390-400. It is surprising that no one seemed to be troubled by the fact that the suppression response, used to study Pavlovian (respondent) conditioning in rats, is skeletal, not autonomic – in apparent violation of Skinner’s criterion for the operant-respondent distinction.
  25. ^ See Staddon & Simmelhag (1971). op. cit. p. 23 et seq.
  26. ^ Segal, E. F. (1970) Induction and the provenance of operants. In R. M. Gilbert & J. R. Millenson (Eds.) Reinforcement: Behavioral Analyses. Academic Press.
  27. ^ Staddon, J. (2014). The New Behaviorism (2nd edition) Philadelphia, PA: Psychology Press, and also Staddon, J. E. R. (2017). Simply too many notes. The Behavior Analyst, 40(1), 101-106.