Operant conditioning (or instrumental conditioning) is a type of learning in which an individual's behavior is modified by its consequences; the behavior may change in form, frequency, or strength. The term was coined by B. F. Skinner in 1937. The word operant can be described as "an item of behavior that is initially spontaneous, rather than a response to a prior stimulus, but whose consequences may reinforce or inhibit recurrence of that behavior".
Operant conditioning is distinguished from classical conditioning (or respondent conditioning) in that operant conditioning deals with the modification of "voluntary behaviour" or operant behaviour. Operant behavior operates on the environment and is maintained by its consequences, while classical conditioning deals with the conditioning of reflexive (reflex) behaviours which are elicited by antecedent conditions. Behaviours conditioned via a classical conditioning procedure are not maintained by consequences.
Reinforcement, punishment, and extinction 
Reinforcement and punishment, the core tools of operant conditioning, are either positive (delivered following a response), or negative (withdrawn following a response). This creates a total of four basic consequences, with the addition of a fifth procedure known as extinction (i.e. no change in consequences following a response).
It is important to note that actors are not spoken of as being reinforced, punished, or extinguished; it is the actions that are reinforced, punished, or extinguished. Additionally, reinforcement, punishment, and extinction are not terms whose use is restricted to the laboratory. Naturally occurring consequences can also be said to reinforce, punish, or extinguish behavior and are not always delivered by people.
- Reinforcement is a consequence that causes a behavior to occur with greater frequency.
- Punishment is a consequence that causes a behavior to occur with less frequency.
- Extinction is caused by the lack of any consequence following a behavior. When a behavior is inconsequential (i.e., producing neither favorable nor unfavorable consequences) it will occur less frequently. When a previously reinforced behavior is no longer reinforced with either positive or negative reinforcement, it leads to a decline in that behavior.
Four contexts of operant conditioning 
Here the terms positive and negative are not used in their popular sense, but rather: positive refers to addition, and negative refers to subtraction.
What is added or subtracted may be either reinforcement or punishment. Hence positive punishment is sometimes a confusing term, as it denotes the "addition" of a stimulus or increase in the intensity of a stimulus that is aversive (such as spanking or an electric shock). The four procedures are:
- Positive reinforcement (Reinforcement): occurs when a behavior (response) is followed by a stimulus that is appetitive or rewarding, increasing the frequency of that behavior. In the Skinner box experiment, a stimulus such as food or a sugar solution can be delivered when the rat engages in a target behavior, such as pressing a lever.
- Negative reinforcement (Escape): occurs when a behavior (response) is followed by the removal of an aversive stimulus, thereby increasing that behavior's frequency. In the Skinner box experiment, negative reinforcement can be a loud noise continuously sounding inside the rat's cage until it engages in the target behavior, such as pressing a lever, upon which the loud noise is removed.
- Positive punishment (Punishment) (also called "Punishment by contingent stimulation"): occurs when a behavior (response) is followed by an aversive stimulus, such as a shock or loud noise, resulting in a decrease in that behavior.
- Negative punishment (Penalty) (also called "Punishment by contingent withdrawal"): occurs when a behavior (response) is followed by the removal of a stimulus, such as taking away a child's toy following an undesired behavior, resulting in a decrease in that behavior.
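The four procedures above form a two-by-two table: whether a stimulus is added or removed after the response, and whether the behavior then becomes more or less frequent. The following is a minimal sketch of that table; the function name `classify_procedure` and the string labels are illustrative choices, not standard terminology from any library.

```python
def classify_procedure(stimulus_change: str, behavior_change: str) -> str:
    """Name the operant procedure for a consequence and its effect.

    stimulus_change: "added" or "removed" (what follows the response)
    behavior_change: "increases" or "decreases" (later frequency of the response)
    """
    table = {
        ("added", "increases"): "positive reinforcement",
        ("removed", "increases"): "negative reinforcement",
        ("added", "decreases"): "positive punishment",
        ("removed", "decreases"): "negative punishment",
    }
    key = (stimulus_change, behavior_change)
    if key not in table:
        raise ValueError(f"unrecognized combination: {key}")
    return table[key]


# A loud noise stops when the rat presses the lever, and pressing becomes more frequent.
print(classify_procedure("removed", "increases"))
# A toy is taken away after an undesired behavior, and the behavior becomes less frequent.
print(classify_procedure("removed", "decreases"))
```

Note that the classification depends on the observed change in behavior, not on whether the stimulus seems pleasant or unpleasant.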
Operant conditioning to change human behavior 
- State goal (aims for the study)
- Monitor behavior (log conditions)
- Reinforce desired behavior (give reward for proper behavior)
- Reduce incentives to perform undesirable behavior
- Avoidance learning is a type of learning in which a certain behavior results in the cessation of an aversive stimulus. For example, performing the behavior of shielding one's eyes when in the sunlight (or going outdoors) will help avoid the aversive stimulation of having light in one's eyes.
- Extinction occurs when a behavior (response) that had previously been reinforced is no longer effective. In the Skinner box experiment, this is the rat pushing the lever and being rewarded with a food pellet several times, and then pushing the lever again and never receiving a food pellet again. Eventually the rat would cease pushing the lever.
- Noncontingent reinforcement refers to delivery of reinforcing stimuli regardless of the organism's (aberrant) behavior. The idea is that the target behavior decreases because it is no longer necessary to receive the reinforcement. This typically entails time-based delivery of stimuli identified as maintaining aberrant behavior, which serves to decrease the rate of the target behavior. As no measured behavior is identified as being strengthened, there is controversy surrounding the use of the term noncontingent "reinforcement".
- Token economy is an exchange system using the principles of operant conditioning where a token is given as a reward for a desired behaviour. Tokens may later be exchanged for a desired prize or rewards such as power, prestige, goods or services.
- Shaping is a form of operant conditioning in which the increasingly accurate approximations of a desired response are reinforced.
- Chaining is an instructional procedure which involves reinforcing individual responses occurring in a sequence to form a complex behavior.
- Response cost is a form of punishment in which the removal of an appetitive stimulus follows a response, reducing the occurrence of that response.
- Discrimination, generalization and the importance of context.
- Learning takes place in contexts, not in the free range of any plausible situation.
- Most behaviour is under stimulus control, which develops when a particular response occurs only when an appropriate discriminative stimulus is present.
- Stimulus control, and its ability to foster stimulus discrimination and stimulus generalization, is effective even if the stimulus has no meaning to the respondent.
- Extinction: operant behaviour undergoes extinction when the reinforcements stop.
- Reinforcements occur only when the proper response has been made, and they don’t always occur even then. Behaviours don’t necessarily weaken and gradually extinguish because of this.
- How readily a behaviour extinguishes depends partly on how often reinforcement is received.
- Schedules of reinforcement: the pattern with which reinforcements appear is crucial.
- Interval schedules: based on the time intervals between reinforcements.
- Fixed interval schedule: reinforcers are presented at fixed time periods, provided that the appropriate response is made.
- Variable interval schedule: a behaviour is reinforced based on an average time that has expired since the last reinforcement.
- Ratio schedules: based on the ratio of responses to reinforcements.
- The special case of presenting reinforcement after each response is called continuous reinforcement.
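The schedules listed above can be sketched as rules that decide, for each response, whether a reinforcer is delivered. This is only an illustration: the function names are hypothetical, time is measured in arbitrary units, and the interval clock is assumed to start at time 0. The variable interval schedule is modelled by cycling through a supplied list of waits whose mean is the schedule value, which is a simplification of drawing the waits at random.

```python
from itertools import cycle


def fixed_ratio(response_times, ratio):
    """Reinforce every `ratio`-th response; ratio=1 is continuous reinforcement."""
    return [(i + 1) % ratio == 0 for i in range(len(response_times))]


def variable_interval(response_times, intervals):
    """Reinforce the first response made after the current wait has elapsed.

    The wait is measured from the last reinforcement (or from time 0).
    `intervals` lists the waits to cycle through (an assumption of this sketch).
    """
    waits = cycle(intervals)
    wait = next(waits)
    last_reinforcement = 0.0
    reinforced = []
    for t in response_times:
        if t - last_reinforcement >= wait:
            reinforced.append(True)
            last_reinforcement = t
            wait = next(waits)  # the next required wait may differ
        else:
            reinforced.append(False)
    return reinforced


def fixed_interval(response_times, interval):
    """A fixed interval schedule is the special case of one constant wait."""
    return variable_interval(response_times, [interval])


def reinforced_times(response_times, flags):
    """Return the times of the responses that were reinforced."""
    return [t for t, hit in zip(response_times, flags) if hit]


responses = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # one response per time unit
print(reinforced_times(responses, fixed_ratio(responses, 3)))
print(reinforced_times(responses, fixed_interval(responses, 4)))
print(reinforced_times(responses, variable_interval(responses, [2, 6])))
```

On an interval schedule, responses made before the interval has elapsed go unreinforced, whereas on a ratio schedule every response counts toward the next reinforcer.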
Thorndike's law of effect 
Operant conditioning, sometimes called instrumental learning, was first extensively studied by Edward L. Thorndike (1874–1949), who observed the behavior of cats trying to escape from home-made puzzle boxes. When first constrained in the boxes, the cats took a long time to escape. With experience, ineffective responses occurred less frequently and successful responses occurred more frequently, enabling the cats to escape in less time over successive trials. In his law of effect, Thorndike theorized that behaviors followed by satisfying consequences tend to be repeated and those that produce unpleasant consequences are less likely to be repeated. In short, some consequences strengthened behavior and some consequences weakened behavior. Thorndike produced the first known learning curves through this procedure.
B.F. Skinner (1904–1990) formulated a more detailed analysis of operant conditioning based on reinforcement, punishment, and extinction. Following the ideas of Ernst Mach, Skinner rejected Thorndike's mediating structures required by "satisfaction" and constructed a new conceptualization of behavior without any such references. So, while experimenting with some homemade feeding mechanisms, Skinner invented the operant conditioning chamber which allowed him to measure rate of response as a key dependent variable using a cumulative record of lever presses or key pecks.
Although the notion of operant conditioning was not unknown during his time, Skinner's method, which combined "automatic training" with constant reinforcement, was new. With the help of his colleagues and students, it led to the discovery of reinforcement schedules, a reinforcement schedule being described as "any procedure that delivers a reinforcer to an organism according to some well-defined rule"; an example is delivering food for pressing a lever. To show the effects of operant conditioning, Skinner created the Skinner box, or operant conditioning chamber. A rat or other suitably small animal is placed in a typical Skinner box and observed during learning trials that use operant conditioning principles.
In 1957, Skinner published Verbal Behavior, a theoretical extension of the work he had pioneered since 1938. This work extended the theory of operant conditioning to human behavior previously assigned to language, linguistics and other fields. Verbal Behavior is the logical extension of Skinner's ideas, in which he introduced new functional relationship categories such as intraverbals, autoclitics, mands, tacts and the controlling relationship of the audience. All of these relationships were based on operant conditioning and relied on no new mechanisms despite the introduction of new functional categories.
Biological correlates of operant conditioning 
The first scientific studies identifying neurons that responded in ways that suggested they encode conditioned stimuli came from work by Mahlon DeLong and by R.T. Richardson. They showed that nucleus basalis neurons, which release acetylcholine broadly throughout the cerebral cortex, are activated shortly after a conditioned stimulus, or after a primary reward if no conditioned stimulus exists. These neurons are equally active for positive and negative reinforcers, and have been demonstrated to cause plasticity in many cortical regions. Evidence also exists that dopamine is activated at similar times. There is considerable evidence that dopamine participates in both reinforcement and aversive learning. Dopamine pathways project much more densely onto frontal cortex regions. Cholinergic projections, in contrast, are dense even in the posterior cortical regions like the primary visual cortex. A study of patients with Parkinson's disease, a condition attributed to the insufficient action of dopamine, further illustrates the role of dopamine in positive reinforcement. It showed that while off their medication, patients learned more readily with aversive consequences than with positive reinforcement. Patients who were on their medication showed the opposite to be the case, positive reinforcement proving to be the more effective form of learning when the action of dopamine is high.
Factors that alter the effectiveness of consequences 
When using consequences to modify a response, the effectiveness of a consequence can be increased or decreased by various factors. These factors can apply to either reinforcing or punishing consequences.
- Satiation/Deprivation: The effectiveness of a consequence will be reduced if the individual's "appetite" for that source of stimulation has been satisfied. Inversely, the effectiveness of a consequence will increase as the individual becomes deprived of that stimulus. If someone is not hungry, food will not be an effective reinforcer for behavior. Satiation is generally only a potential problem with primary reinforcers, those that do not need to be learned such as food and water.
- Immediacy: After a response, how immediately a consequence is then felt determines the effectiveness of the consequence. More immediate feedback will be more effective than less immediate feedback. If someone's license plate is caught by a traffic camera for speeding and they receive a speeding ticket in the mail a week later, this consequence will not be very effective against speeding. But if someone is speeding and is caught in the act by an officer who pulls them over, then their speeding behavior is more likely to be affected.
- Contingency: If a consequence does not contingently (reliably, or consistently) follow the target response, its effectiveness upon the response is reduced. But if a consequence follows the response consistently after successive instances, its ability to modify the response is increased. The schedule of reinforcement, when consistent, leads to faster learning. When the schedule is variable the learning is slower. A behavior is more difficult to extinguish when it was learned under intermittent reinforcement and easier to extinguish when it was learned under a highly consistent schedule.
- Size: This is a "cost-benefit" determinant of whether a consequence will be effective. If the size, or amount, of the consequence is large enough to be worth the effort, the consequence will be more effective upon the behavior. An unusually large lottery jackpot, for example, might be enough to get someone to buy a one-dollar lottery ticket (or even multiple tickets). But if a lottery jackpot is small, the same person might not feel it to be worth the effort of driving out and finding a place to buy a ticket. In this example, it's also useful to note that "effort" is a punishing consequence. How these opposing expected consequences (reinforcing and punishing) balance out will determine whether the behavior is performed or not.
Most of these factors exist for biological reasons. The biological purpose of the Principle of Satiation is to maintain the organism's homeostasis. When an organism has been deprived of sugar, for example, the effectiveness of the taste of sugar as a reinforcer is high. However, as the organism reaches or exceeds its optimum blood-sugar levels, the taste of sugar becomes less effective, perhaps even aversive.
The Principles of Immediacy and Contingency exist for neurochemical reasons. When an organism experiences a reinforcing stimulus, dopamine pathways in the brain are activated. This network of pathways "releases a short pulse of dopamine onto many dendrites, thus broadcasting a rather global reinforcement signal to postsynaptic neurons." This results in the plasticity of these synapses allowing recently activated synapses to increase their sensitivity to efferent signals, hence increasing the probability of occurrence for the recent responses preceding the reinforcement. These responses are, statistically, the most likely to have been the behavior responsible for successfully achieving reinforcement. But when the application of reinforcement is either less immediate or less contingent (less consistent), the ability of dopamine to act upon the appropriate synapses is reduced.
Operant variability 
Operant variability is what allows a response to adapt to new situations. Operant behavior is distinguished from reflexes in that its response topography (the form of the response) is subject to slight variations from one performance to another. These slight variations can include small differences in the specific motions involved, differences in the amount of force applied, and small changes in the timing of the response. If a subject's history of reinforcement is consistent, such variations will remain stable because the same successful variations are more likely to be reinforced than less successful variations. However, behavioral variability can also be altered when subjected to certain controlling variables.
Avoidance learning 
Avoidance learning belongs to negative reinforcement schedules. The subject learns that a certain response will result in the termination or prevention of an aversive stimulus. There are two kinds of commonly used experimental settings: discriminated and free-operant avoidance learning.
Discriminated avoidance learning 
In discriminated avoidance learning, a novel stimulus such as a light or a tone is followed by an aversive stimulus such as a shock (CS-US, similar to classical conditioning). During the first trials (called escape-trials) the animal usually experiences both the CS (Conditioned Stimulus) and the US (Unconditioned Stimulus), showing the operant response to terminate the aversive US. During later trials, the animal will learn to perform the response already during the presentation of the CS thus preventing the aversive US from occurring. Such trials are called "avoidance trials."
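The distinction between escape trials and avoidance trials comes down to whether the response occurs before or after the aversive US begins. A minimal sketch, assuming each trial is summarized by a hypothetical `response_latency` (time from CS onset to the response) and a `cs_us_interval` (time from CS onset to US onset); a response made exactly at US onset is counted as escape here, which is an arbitrary choice.

```python
def classify_trial(response_latency, cs_us_interval):
    """Classify one discriminated-avoidance trial.

    response_latency: time from CS onset to the operant response,
                      or None if the animal never responded.
    cs_us_interval:   time from CS onset to the scheduled onset of the US.
    """
    if response_latency is None:
        return "no response"
    if response_latency < cs_us_interval:
        # The response came during the CS, so the US never occurred.
        return "avoidance"
    # The US had already started; the response terminated it.
    return "escape"


# Early in training the animal responds only after the shock has started;
# later it responds during the warning signal.
latencies = [14.0, 12.5, 11.0, 8.0, 4.5]
print([classify_trial(latency, cs_us_interval=10.0) for latency in latencies])
```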
Free-operant avoidance learning 
In this experimental arrangement, no discrete stimulus is used to signal the occurrence of the aversive stimulus. Rather, the aversive stimuli (mostly shocks) are presented without explicit warning stimuli. Two crucial time intervals determine the rate of avoidance learning. The first is the S-S interval (shock-shock interval): the amount of time that passes between successive presentations of the shock (unless the operant response is performed). The second is the R-S interval (response-shock interval), which specifies the length of time following an operant response during which no shocks will be delivered. Note that each time the organism performs the operant response, the R-S interval without shocks begins anew.
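The interplay of the two intervals can be made concrete with a small simulation that, given the times of the operant responses, computes when shocks would be delivered. This is a sketch under stated assumptions: the function and parameter names are hypothetical, the shock-shock clock is assumed to start at time 0, and a response made at the exact moment a shock is due is assumed to prevent it.

```python
def shock_times(response_times, ss_interval, rs_interval, session_length):
    """Return the times at which shocks are delivered in one session.

    ss_interval: time between successive shocks when no response occurs.
    rs_interval: shock-free time that begins anew after every response.
    """
    shocks = []
    next_shock = ss_interval  # assumption: the first shock is due one S-S interval in
    for t in sorted(response_times):
        if t >= session_length:
            break
        # Deliver every shock that falls due before this response.
        while next_shock < t:
            shocks.append(next_shock)
            next_shock += ss_interval
        # The response postpones the next shock by the R-S interval.
        next_shock = t + rs_interval
    # After the last response, shocks resume until the session ends.
    while next_shock < session_length:
        shocks.append(next_shock)
        next_shock += ss_interval
    return shocks


# No responding: a shock arrives every 5 time units.
print(shock_times([], ss_interval=5, rs_interval=20, session_length=60))
# Steady responding within the R-S interval postpones almost every shock.
print(shock_times([3, 20, 39], ss_interval=5, rs_interval=20, session_length=60))
```

In this sketch an animal that responds at least once per R-S interval receives no shocks at all, which is why the response rate is sensitive to the length of that interval.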
Two-process theory of avoidance 
This theory was originally proposed in order to explain discriminated avoidance learning, in which an organism learns to avoid an aversive stimulus by escaping from a signal for that stimulus. The theory assumes that two processes take place:
- a) Classical conditioning of fear.
- During the first trials of the training, the organism experiences the pairing of a CS with an aversive US. The theory assumes that during these trials an association develops between the CS and the US through classical conditioning and, because of the aversive nature of the US, the CS comes to elicit a conditioned emotional reaction (CER) – "fear."
- b) Reinforcement of the operant response by fear-reduction.
- As a result of the first process, the CS now signals fear; this unpleasant emotional reaction serves to motivate operant responses, and those responses that terminate the CS are reinforced by fear termination. Although, after this training, the organism no longer experiences the aversive US, the term "avoidance" may be something of a misnomer, because the theory does not say that the organism "avoids" the US in the sense of anticipating it, but rather that the organism escapes an aversive internal state that is caused by the CS.
Four term contingency 
Applied behavior analysis, which is the name of the discipline directly descended from Skinner's work, holds that behavior is explained in four terms: a conditioned stimulus (SC), a discriminative stimulus (Sd), a response (R), and a reinforcing stimulus (Srein or Sr for reinforcers, sometimes Save for aversive stimuli).
Operant hoarding 
Operant hoarding refers to the choice made by a rat, on a compound schedule called a multiple schedule, that maximizes its rate of reinforcement in an operant conditioning context. More specifically, rats were shown to have allowed food pellets to accumulate in a food tray by continuing to press a lever on a continuous reinforcement schedule instead of retrieving those pellets. Retrieval of the pellets always instituted a one-minute period of extinction during which no additional food pellets were available but those that had been accumulated earlier could be consumed. This finding appears to contradict the usual finding that rats behave impulsively in situations in which there is a choice between a smaller food object right away and a larger food object after some delay. See schedules of reinforcement.
Operant conditioning in economics 
Psychologists have become interested in economic constraints in the marketplace and schedule constraints on instrumental conditioning. One concept that encompasses both economics and instrumental conditioning is consumer demand. With consumer demand, the focus is on the price of the commodity and the amount purchased. The degree to which price influences consumption is defined as the elasticity of demand. Certain commodities are more elastic than others. A price change in certain foods can affect the amount bought, while gasoline and essentials seem to be less affected by price changes. In these examples, gasoline and essentials would be less elastic than foods like cake and candy. On a graph, the demand for a less elastic commodity would not be stretched out as far as that for a commodity whose consumption fluctuates greatly with price.
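Elasticity of demand is conventionally computed as the percentage change in quantity purchased divided by the percentage change in price. The sketch below uses that simple percentage-change form; the function names and all prices and quantities are hypothetical values chosen only to illustrate the contrast between an essential and a treat.

```python
def price_elasticity(price_before, price_after, qty_before, qty_after):
    """Percentage change in quantity divided by percentage change in price."""
    if price_before == 0 or qty_before == 0 or price_before == price_after:
        raise ValueError("need a nonzero starting point and a price change")
    pct_qty = (qty_after - qty_before) / qty_before
    pct_price = (price_after - price_before) / price_before
    return pct_qty / pct_price


def is_elastic(elasticity):
    """Demand is called elastic when consumption changes proportionally more than price."""
    return abs(elasticity) > 1


# Hypothetical figures: the same 10% price rise for two commodities.
gasoline = price_elasticity(10, 11, qty_before=100, qty_after=98)
cake = price_elasticity(10, 11, qty_before=100, qty_after=70)
print(round(gasoline, 2), is_elastic(gasoline))
print(round(cake, 2), is_elastic(cake))
```

In an operant setting the "price" is usually taken to be the response requirement, such as the number of lever presses per reinforcer, though that mapping is an interpretation rather than something the code enforces.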
Questions about the law of effect 
A number of observations seem to show that operant behavior can be established without reinforcement in the sense defined above. Most cited is the phenomenon of autoshaping (sometimes called "sign tracking"), in which a stimulus is repeatedly followed by reinforcement, and in consequence the animal begins to respond to the stimulus. For example, a response key is lighted and then food is presented. When this is repeated a few times a pigeon subject begins to peck the key even though food comes whether the bird pecks or not. Similarly, rats begin to handle small objects, such as a lever, when food is presented nearby. Strikingly, pigeons and rats persist in this behavior even when pecking the key or pressing the lever leads to less food (omission training).
These observations and others appear to contradict the law of effect, and they have prompted some researchers to propose new conceptualizations of operant reinforcement. A more general view is that autoshaping is an instance of classical conditioning; the autoshaping procedure has, in fact, become one of the most common ways to measure classical conditioning. In this view, many behaviors can be influenced by both classical contingencies (stimulus-reinforcement) and operant contingencies (response-reinforcement), and the experimenter’s task is to work out how these interact.
See also 
- Animal testing
- Applied behavior analysis
- Behaviorism (philosophies behind operant conditioning)
- Behavioral contrast
- Cognitivism (psychology) (theory of internal mechanisms without reference to behavior)
- Consumer demand tests (animals)
- Educational psychology
- Educational technology
- Experimental analysis of behavior
- Exposure therapy
- Jerzy Konorski
- Learned industriousness
- Matching law
- Negative (positive) contrast effect
- Preference tests (animals)
- Premack principle
- Reinforcement learning
- Reward system
- Social conditioning
- Spontaneous recovery