How Reinforcement Schedules Work
Operant conditioning is a learning process in which new behaviors are acquired and modified through their association with consequences. Reinforcing a behavior increases the likelihood it will occur again in the future, while punishing a behavior decreases the likelihood that it will be repeated. In operant conditioning, schedules of reinforcement are an important component of the learning process. When and how often we reinforce a behavior can have a dramatic impact on the strength and rate of the response.
So what exactly is a schedule of reinforcement and how does it work in the conditioning process? A schedule of reinforcement is basically a rule stating which instances of a behavior will be reinforced. In some cases, a behavior might be reinforced every time it occurs. Sometimes, a behavior might not be reinforced at all.
Either positive reinforcement or negative reinforcement might be used, depending on the situation. In both cases, the goal of reinforcement is always to strengthen the behavior and increase the likelihood that it will occur again in the future.
You can get a better feel for how reinforcement schedules operate by thinking about how learning takes place in both naturally occurring learning situations as well as more structured training situations. In real-world settings, behaviors are probably not going to be reinforced each and every time they occur. For situations where you are purposely trying to train and reinforce an action, such as in the classroom, in sports, or in animal training, you might opt to follow a specific reinforcement schedule.
As you’ll see below, some schedules are best suited to certain types of training situations. In some cases, training might call for starting out with one schedule and switching to another once the desired behavior has been taught.
The two types of reinforcement schedules are continuous reinforcement and partial reinforcement (with four variants).
In continuous reinforcement, the desired behavior is reinforced every single time it occurs. This schedule is best used during the initial stages of learning in order to create a strong association between the behavior and the response.
For example, imagine that you are trying to teach a dog to shake your hand. During the initial stages of learning, you would probably stick to a continuous reinforcement schedule in order to teach and establish the behavior. You might start by grabbing the animal’s paw, performing the shaking motion, saying “Shake,” and then offering a reward each and every time you perform this sequence of steps. Eventually, the dog will start to perform the action on his own, and you might opt to continue reinforcing every single correct response until the behavior is well established.
Once the response is firmly established, continuous reinforcement is usually switched to a partial reinforcement schedule.
In partial or intermittent reinforcement, the response is reinforced only part of the time. Learned behaviors are acquired more slowly with partial reinforcement, but the response is more resistant to extinction.
For example, think back to the dog you were training to shake. While you initially used a continuous schedule, reinforcing every single instance of the behavior may not always be realistic. Eventually, you might decide to switch to a partial schedule where you provide reinforcement after a certain number of responses have occurred or after a certain amount of time has elapsed.
There are four schedules of partial reinforcement:
Fixed-ratio schedules are those where a response is reinforced only after a specified number of responses. This schedule produces a high, steady rate of responding with only a brief pause after the delivery of the reinforcer. An example of a fixed-ratio schedule would be delivering a food pellet to a rat after it presses a bar five times.
Variable-ratio schedules occur when a response is reinforced after an unpredictable number of responses. This schedule creates a high, steady rate of responding. Gambling and lottery games are good examples of a reward based on a variable-ratio schedule. In a lab setting, this might involve delivering food pellets to a rat after one bar press, again after four bar presses, and a third pellet after two bar presses.
Fixed-interval schedules are those where the first response is rewarded only after a specified amount of time has elapsed. This schedule causes high amounts of responding near the end of the interval but much slower responding immediately after the delivery of the reinforcer. An example of this in a lab setting would be reinforcing a rat with a food pellet for the first bar press after a 30-second interval has elapsed.
Variable-interval schedules occur when a response is rewarded after an unpredictable amount of time has passed. This schedule produces a slow, steady rate of response. An example of this would be delivering a food pellet to a rat after the first bar press following a one-minute interval, another pellet for the first response following a five-minute interval, and a third food pellet for the first response following a three-minute interval.
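The four partial schedules above can be sketched as simple decision rules: each one answers the question “should this response be reinforced?” based on a response count or elapsed time. The sketch below is a minimal Python illustration under assumptions of my own — the function names and the idea of passing time in as a number are illustrative conveniences, not part of any standard psychology toolkit.

```python
import random

# Each factory returns a small function that decides whether the next
# response earns a reinforcer. Ratio schedules count responses; interval
# schedules watch elapsed time (passed in as a plain number here).

def fixed_ratio(n):
    # FR-n: reinforce every nth response (e.g. a pellet after 5 bar presses).
    count = 0
    def respond():
        nonlocal count
        count += 1
        return count % n == 0
    return respond

def variable_ratio(mean_n, rng=random.Random(0)):
    # VR: reinforce after an unpredictable number of responses, averaging mean_n.
    remaining = rng.randint(1, 2 * mean_n - 1)
    def respond():
        nonlocal remaining
        remaining -= 1
        if remaining <= 0:
            remaining = rng.randint(1, 2 * mean_n - 1)
            return True
        return False
    return respond

def fixed_interval(interval):
    # FI: reinforce the first response after `interval` time units have elapsed.
    next_time = interval
    def respond(t):
        nonlocal next_time
        if t >= next_time:
            next_time = t + interval
            return True
        return False
    return respond

def variable_interval(mean_interval, rng=random.Random(1)):
    # VI: reinforce the first response after an unpredictable delay.
    next_time = rng.uniform(0, 2 * mean_interval)
    def respond(t):
        nonlocal next_time
        if t >= next_time:
            next_time = t + rng.uniform(0, 2 * mean_interval)
            return True
        return False
    return respond

# Example: an FR-5 schedule reinforces the 5th and 10th presses.
fr = fixed_ratio(5)
presses = [fr() for _ in range(10)]
```

Running the FR-5 example, only every fifth response returns True, which is exactly why ratio schedules produce the fast, steady responding described above: the quicker the subject responds, the sooner the next reinforcer arrives.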
Deciding when to reinforce a behavior can depend on a number of factors. In cases where you are specifically trying to teach a new behavior, a continuous schedule is often a good choice.
Once the behavior has been learned, switching to a partial schedule is often preferable.
In daily life, partial schedules of reinforcement occur much more frequently than do continuous ones. For example, you probably do not receive some type of reward every time you show up to work on time. Instead, such rewards are usually doled out on a much less predictable partial reinforcement schedule. Not only are these schedules much more realistic and easier to implement, they also tend to produce higher response rates while being less susceptible to extinction.
Realistically, reinforcing a behavior every single time it occurs can be difficult and requires a great deal of attention and resources. Partial schedules not only tend to lead to behaviors that are more resistant to extinction, they also reduce the risk that the subject will become satiated. If the reinforcer being used is no longer desired or rewarding, the subject may stop performing the desired behavior.
For example, imagine that you are trying to teach a dog to sit. If you are using food as a reward, the dog might stop performing the action once he is full. In such instances, something like praise or attention might be a more effective reinforcer.
What could cause a person or animal to stop engaging in a previously conditioned behavior? Extinction is one explanation. In psychology, extinction refers to the gradual weakening of a conditioned response that results in the behavior decreasing or disappearing. In other words, the conditioned behavior eventually stops.
For example, imagine that you taught your dog to shake hands. Over time, the trick became less interesting. You stop rewarding the behavior and eventually stop asking your dog to shake. Eventually, the response becomes extinct, and your dog no longer displays the behavior.
In classical conditioning, when a conditioned stimulus is presented alone without an unconditioned stimulus, the conditioned response will eventually cease. For example, in Pavlov’s classic experiment, a dog was conditioned to salivate to the sound of a bell. When the bell was repeatedly presented without the presentation of food, the salivation response eventually became extinct.
In operant conditioning, extinction occurs when a response is no longer reinforced following a discriminative stimulus. B. F. Skinner described how he first observed this phenomenon:
“My first extinction curve showed up by accident. A rat was pressing the lever in an experiment on satiation when the pellet dispenser jammed. I was not there at the time, and when I returned I found a beautiful curve. The rat had gone on pressing although no pellets were received. . . . The change was more orderly than the extinction of a salivary reflex in Pavlov’s setting, and I was terribly excited. It was a Friday afternoon and there was no one in the laboratory who I could tell. All that weekend I crossed streets with particular care and avoided all unnecessary risks to protect my discovery from loss through my accidental death.”
Let’s take a closer look at a few more examples of extinction.
Imagine that a researcher has trained a lab rat to press a key to receive a food pellet. What happens when the researcher stops delivering the food? Extinction will not occur immediately, but it will occur over time. If the rat continues to press the key but does not get the pellet, the behavior will eventually dwindle until it disappears entirely.
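The gradual dwindling described here can be mimicked with a toy model — entirely my own simplification, not anything from the research literature — in which response strength weakens by a fixed fraction on each unreinforced trial:

```python
# Toy extinction model (an illustrative assumption, not an established
# formula): each unreinforced response weakens the behavior by `decay`.
def extinction_curve(strength=1.0, decay=0.2, trials=10):
    curve = []
    for _ in range(trials):
        curve.append(strength)
        strength *= (1 - decay)  # no pellet delivered -> strength weakens
    return curve
```

Plotting the returned values would give the smooth downward slope of a classic extinction curve: the behavior does not vanish at once, but fades with each response that goes unrewarded.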
Conditioned taste aversions can also be affected by extinction. Imagine that you ate some ice cream right before getting sick and throwing up. As a result, you developed a taste aversion to ice cream and avoided eating it, even though it was formerly one of your favorite foods.
One way to overcome this reluctance would be to expose yourself to ice cream, even if just the thought of eating it made you feel a little queasy. You might start by taking just a few small tastes over and over again. As you continued to eat the food without getting sick, your conditioned aversion would eventually diminish.
If the conditioned response is no longer displayed, does that really mean that it’s gone forever? In his research on classical conditioning, Pavlov found that when extinction occurs, it doesn’t mean that the subject returns to their unconditioned state. Allowing several hours or even days to elapse after a response has been extinguished can result in spontaneous recovery of the response. Spontaneous recovery refers to the sudden reappearance of a previously extinguished response.
In his research on operant conditioning, Skinner discovered that how and when a behavior is reinforced could influence how resistant it was to extinction. He found that a partial schedule of reinforcement (reinforcing a behavior only part of the time) helped reduce the chances of extinction. Rather than reinforcing the behavior each and every time it occurs, the reinforcement is given only after a certain amount of time has elapsed or a certain number of responses have occurred. This sort of partial schedule results in behavior that is stronger and more resistant to extinction.
A number of factors can influence how resistant a behavior is to extinction. The strength of the original conditioning can play an important role: the longer the conditioning has taken place and the greater the magnitude of the conditioned response, the more resistant the response is likely to be to extinction. Behaviors that are very well established may become almost impervious to extinction and may continue to be displayed even after the reinforcement has been removed altogether.
Some research has suggested that habituation may play a role in extinction as well. For example, repeated exposure to a conditioned stimulus may eventually lead you to become used to it, or habituated. Because you have become habituated to the conditioned stimulus, you are more likely to ignore it and it’s less likely to elicit a response, eventually leading to the extinction of the conditioned behavior.
Personality factors might also play a role in extinction. One study found that children who were more anxious were slower to habituate to a sound. As a result, their fear response to the sound was slower to become extinct than that of non-anxious children.