Operant conditioning is the learning rubric based on a contingency between a response of the organism and some event. That is, if the organism makes a specified response, then there will be a change in some contingent event. If a rat presses the bar, then a food pellet will drop into the food dish. If the child pulls the plug on the television, then the television will go off. In operant conditioning emphasis is on the rate of responding, that is, how often the response is made during some unit of time. If the contingent event results in an increase in the rate of responding, the event is called reinforcement. If the contingent event results in a decrease in the rate of responding, the event is called punishment.
The change in the contingent event may be an increase (positive) or a decrease (negative). This results in four combinations: positive reinforcement, negative reinforcement, positive punishment, and negative punishment. Positive reinforcement is an event whose onset or increase results in an increase in the rate of the response it is contingent on. If the rat presses the bar, there is an increase in the food in the dish; so the rate of bar pressing increases. Therefore, the food in this case is a positive reinforcement. Negative reinforcement is a contingent event whose offset or decrease results in an increase in the rate of response. If by pressing the bar the rat can turn off an electric shock on the grid floor, there will be an increase in the rate at which the rat presses the bar when the foot-shock is on. Here the offset of the foot-shock is negative reinforcement.
Positive punishment is a contingent event whose onset or increase results in a decrease in the rate of response. If when a rat presses a bar, he receives a foot-shock, the rate of bar pressing will probably decrease. If so, the onset of the foot-shock is positive punishment. Negative punishment is a contingent event whose offset or decrease results in a decrease in the rate of response. If when the rat presses the bar, food is taken away from the rat, the bar-pressing rate will probably decrease. Here the decrease in the food is negative punishment.
It can be seen that the onset of an event can have one effect, while the offset another. The onset of food can be positive reinforcement, while the offset is negative punishment. The onset of shock can be positive punishment, while the offset is negative reinforcement. The effect is due to whether the onset or offset is contingent on a particular response.
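The four combinations described above can be written as a small lookup. The following is only an illustrative sketch: the function name `classify_contingent_event` and the string labels are assumptions introduced here, not terminology from the text beyond the four category names themselves.

```python
def classify_contingent_event(event_change: str, rate_change: str) -> str:
    """Name the contingent event from two observations.

    event_change: "onset" (increase in the event) or "offset" (decrease).
    rate_change:  "increase" or "decrease" in the rate of the response
                  the event is contingent on.
    """
    # Onset/increase of the event is "positive"; offset/decrease is "negative".
    sign = {"onset": "positive", "offset": "negative"}[event_change]
    # A rising response rate defines reinforcement; a falling rate, punishment.
    kind = {"increase": "reinforcement", "decrease": "punishment"}[rate_change]
    return f"{sign} {kind}"


# The four examples from the text:
food_delivered = classify_contingent_event("onset", "increase")
shock_turned_off = classify_contingent_event("offset", "increase")
shock_delivered = classify_contingent_event("onset", "decrease")
food_removed = classify_contingent_event("offset", "decrease")
```

Note that the classification is empirical: the label depends on the observed change in response rate, not on any property of the food or shock taken by itself.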
Extinction in operant conditioning generally refers to a reduction in the rate of response due to the termination of the contingency between the response and reinforcement. If the rat were pressing the bar to receive food and a change was made so food was no longer contingent on pressing the bar (i.e., pressing the bar would not produce food), the rate of bar pressing would decrease. This decrease is called extinction.
There have been many attempts to formalize exactly what types of events will function as reinforcement. The simplest is the empirical approach which identifies an event as reinforcement if it does what reinforcements are defined to do, that is, increase the rate of the response. Many psychologists, however, have sought more general theoretical analyses of reinforcement. A few of these are given below.
Some theorists, mostly coming from the influence of Clark Hull (e.g., Hull, 1943), identify reinforcement as an event that reduces the need or drive. When an animal is deprived of food there is an increase in drive. This drive might be conceptualized as specific to food or more general and nonspecific. A nonspecific drive is one that receives input from a number of different sources such as hunger, thirst, and sex. The rat’s bar pressing for food is assumed to be reinforced because eating the food reduces the drive that was at least partially elicited by food deprivation. A main issue for this orientation is identifying the variables that elicit the original increases in drive.
Sheffield (1966a, 1966b), on the other hand, argues for a drive-induction theory of reinforcement. According to this theory, animals learn responses that arouse motivation. If a rat receives food for turning right in a maze but not for turning left, the consummatory response of eating becomes conditioned to the cues of the right side. Now when the animal approaches the choice point, the right side stimuli tend to elicit the consummatory response, but the rat can’t consume until it gets to the food. This consummatory stimulation without consummation is drive induction and motivates the rat to turn right, because turning right is the response that in the past preceded the consummatory response. Although Sheffield’s theory was at one time more general, now it is basically applied only to consummatory situations, as opposed to punishment situations, for example. Also the consummatory response may be a central response, that is, a response within the organism without overt behaviors.
Miller (1963) suggests there are one or more go-mechanisms activated by reinforcements such as drive reduction or the removal of discrepancy between intention and achievement. Activation of a go-mechanism is then postulated to intensify the ongoing responses to the present stimuli. The go-mechanism also becomes conditioned to the occurrence of the response so that future occurrences of the stimuli elicit both the response and the excitatory state. Similar to Miller’s orientation is a theory offered by Landauer (1969). According to Landauer a reinforcement is any event that strengthens response tendencies, such as contingent food or a CS-UCS pairing in respondent conditioning. The reinforcement then facilitates the consolidation of learning, where consolidation refers to “the creation of the lasting neural change which underlies learning.”
Quite a different approach to reinforcement is that suggested by Premack (1959). As originally stated the necessary and sufficient conditions for reinforcement were as follows: “Any response A will reinforce any other response B if and only if the independent rate of A is greater than that of B.” Thus, according to Premack, responses reinforce responses. To determine which responses will reinforce which responses it is necessary to measure their independent rates, the rates at which there is no contingency between the responses. From these independent rates it can be predicted that a response can reinforce any other response if the independent rate of the first is higher than that of the second. For example, if a hungry rat is put in an apparatus where he can eat food and press a bar (where pressing the bar does not yield anything), the rat’s independent rate of eating will be higher than the independent rate of bar pressing. Therefore, when eating food is made contingent on bar pressing, it will function as reinforcement.
Premack (1965) later expanded his principle of reinforcement to take into account positive and negative reinforcement. Now the principle is that if the onset or offset of one response is more probable than the onset or offset of another, the former will reinforce the latter positively if the superiority is for “on” probability and negatively if for the “off” probability. This more complex principle is illustrated in the reading by Hundt and Premack in which running in a wheel is the basis for both positive and negative reinforcement.
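Both statements of Premack's principle are comparisons, so they can be sketched as simple predicates. This is one simplified reading under stated assumptions: the function names, the use of a single probability per event, and the numbers in the example are all hypothetical and are not taken from Premack's papers.

```python
from typing import Optional


def premack_1959(independent_rate_a: float, independent_rate_b: float) -> bool:
    """Original statement: response A reinforces response B if and only if
    the independent rate of A is greater than that of B."""
    return independent_rate_a > independent_rate_b


def premack_1965(p_contingent: float, p_instrumental: float,
                 contingent_event: str) -> Optional[str]:
    """Expanded statement (simplified reading).

    p_contingent:     probability of the contingent event (an onset or offset).
    p_instrumental:   probability of the instrumental event.
    contingent_event: "on" if the contingent event is an onset, "off" if an offset.
    Returns the predicted kind of reinforcement, or None if none is predicted.
    """
    if p_contingent <= p_instrumental:
        return None  # no superiority, so no reinforcement is predicted
    return ("positive reinforcement" if contingent_event == "on"
            else "negative reinforcement")


# Hypothetical values: a hungry rat eats far more often than it presses a bar.
eating_reinforces_pressing = premack_1959(independent_rate_a=30.0,
                                          independent_rate_b=2.0)
```

The design point is that neither predicate refers to the kind of stimulus involved; only the relative rates or probabilities of the two responses matter.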
Although punishment is simply defined as a contingent event that results in a decrease in the frequency of a response, the mechanisms by which it produces this effect are greatly debated (cf. Campbell & Church, 1968; Church, 1963; Dunham, 1971). For example, the punishment may elicit an emotional response such as fear and/or some other response such as jumping back. These elicited responses, then, might become conditioned to the situation where the punishment occurred and/or any existing agent that administered the punishment. To the extent that these conditioned responses are elicited by the situation and are incompatible with the punished response, their occurrence may result in a decrease in the frequency of the punished response. Church (1963) refers to the “fear hypothesis” where the emphasis is on the conditioned fear and the “competing response hypothesis” where the emphasis is on the competing conditioned skeletal responses.
If the onset of an event is positive punishment, the offset will probably be negative reinforcement. Thus, if an animal is punished for response A and the punishment causes him to make response B, response B might be negatively reinforced by the offset of the aversive event. To the extent that responses A and B are incompatible, reinforcing B may result in a decrease in the frequency of A. For example, Church (1963) mentions the “escape hypothesis,” which emphasizes the escape response to the punishment which is negatively reinforced.
Dunham (1971) suggests two basic rules of punishment due to shock:
|That particular response in the organism’s repertoire which is most frequently associated with the onset of shock and/or predicts it within a shorter time than other responses will decrease in probability and remain below its operant baseline.|
|That particular response in the organism’s repertoire which is most frequently associated with the absence of the onset of shock and/or predicts the absence of it for a longer period of time than other responses will increase in probability and remain above its operant baseline.|
In the second reading Solomon discusses a wide range of effects punishment might have and some of the variables that determine the effect of punishment.
If a neutral stimulus is paired with a reinforcement, the neutral stimulus may acquire reinforcing properties. If it does, it is then called a secondary reinforcement or a conditioned reinforcement. If when a rat presses a bar he hears a click and receives a food pellet, the click by being paired with the food acquires conditioned reinforcing properties. The rat might now make a new response simply in order to hear the click. There are both positive conditioned reinforcements (e.g., approval and money) and negative conditioned reinforcements (e.g., criticism and fines).
For a while there were basically two theories of conditioned reinforcement: the S-S hypothesis and the discriminative stimulus hypothesis (cf. Hendry, 1969). The S-S hypothesis, as advocated by theorists such as Hull, suggested that one stimulus acquires reinforcing properties simply by being paired (occurring in close temporal contiguity) with a reinforcing stimulus. The discriminative stimulus hypothesis, suggested by Skinner, argues that only discriminative stimuli become conditioned reinforcers; the stimulus must gain discriminative control over a reinforced response in order to become a conditioned reinforcement. In other words, for a stimulus to acquire conditioned reinforcing properties it must be a cue to which the organism makes a response that is reinforced. For example, if when a red light came on the rat had learned to press the bar for food, the red light would become a conditioned reinforcement.
The discriminative stimulus hypothesis has not held up well because it appears that simple pairing of a stimulus with a reinforcement is often sufficient. And the article by Egger and Miller raises problems for the S-S hypothesis. Egger and Miller showed that an important variable in conditioned reinforcement is the amount of information the stimulus provides about the onset of the reinforcement. For example, if when a light comes on it is followed by a tone and then food, the light will become a stronger conditioned reinforcement than the tone, even though the tone is temporally closer to the food. For the light gives the most information about the food; when the light comes on food is known to follow. The tone, on the other hand, is redundant; it provides no information not already supplied by the light.
Operant conditioning, as discussed above, is concerned with contingent events. Reinforcement and punishment are also often dependent events; they don’t occur unless a specified response occurs. Many events, however, are nondependent; they occur almost regardless of the behavior of the organism. If the nondependent event is a reinforcement, the animal might “learn” to make whatever response he just happened to be doing when the nondependent event occurred. This is called superstitious behavior. For example, a device might just drop a food pellet into an operant unit at random intervals, independent of what the rat is doing. If the rat happens to be chewing on the food dish when food drops in, he might learn to chew on the dish in an attempt to get food.
If the nondependent event is aversive, such as inescapable shock, the animal might develop learned helplessness. The animal might learn there is no correlation between what he does and the onset or offset of the aversive event. This might result in the animal’s simply passively accepting whatever happens to him. The genesis and treatment of learned helplessness are discussed in the article by Seligman, Maier, and Geer.
Until fairly recently it was generally held that autonomic responses such as blood pressure and heart rate could be conditioned respondently, but not operantly. Then in a series of experiments, primarily under the influence of Neal Miller, it was found that many autonomic responses including heart rate, intestinal contractions, urine formation by the kidney, and specific brain waves can be operantly conditioned. In the fifth article Miller discusses this very important research and some of its implications for areas such as psychosomatic illnesses.
The various applications of operant conditioning procedures are impressively extensive. Included are such diverse areas as programmed instruction, design of communities, teaching people how to control their blood pressure, and training circus animals. Animals have been trained to perform a number of tasks. For example, Verhave (1966) trained two pigeons to inspect pills for a drug company, and the article by Skinner describes how pigeons were trained to fly missiles.
An important area of application of operant techniques is in the modification of human behavior (cf. Mikulas, 1972, 1974; Whaley & Malott, 1971). The basic strategy is to withhold reinforcement from undesirable behaviors or in some situations punish them, while simultaneously reinforcing approximations to desirable behaviors. In the seventh article Ayllon shows the applications of operant procedures in a mental hospital. Dealing with a forty-seven-year-old female schizophrenic, Ayllon decreased her food stealing by using negative punishment (withdrawal of a meal). The patient’s towel hoarding was decreased by flooding her with towels. This procedure, called “stimulus satiation,” decreases the reinforcing value of the towels by extinguishing their conditioned reinforcing value and/or by conditioning in aversive elements to the towels. Finally, the excessive amount of clothes the patient wore was gradually decreased with food reinforcement (although the procedure also has aspects of negative punishment, missing a meal).
ALAN G. HUNDT and DAVID PREMACK, University of Missouri
Rats were required to press a bar to activate a motor-driven wheel that forced them to run and subsequently to drink to turn off the wheel. Barpressing and licking increased, showing the onset and offset of running to be positively and negatively reinforcing, respectively. The experimental control of the offset of running, in contrast to the traditional control for onset only, served to demonstrate that since organisms stop such behaviors as they start, self-initiated behaviors will act as negative as well as positive reinforcers.
The traditional use of two kinds of events for positive and negative reinforcement, respectively, creates the impression that the environment of a species divides naturally into discrete classes of positive and negative events. In fact, this division results more from an experimenter convention than from a relation between the species and its environment. Specifically, it results from the fact that experimenters instrument only the onset of some behaviors and only the offset of others, rather than using both the onset and offset of any one behavior.
For example, although organisms both initiate and terminate eating, only initiation is used in reinforcement.
In the standard food reinforcement case, the organism is required to make
On the other side of the coin, only the organism’s tendency to turn off (for example) electric shock is used. But will organisms initiate contact with shock and other supposedly negative events? Recent work(1) shows that rats initiate contact with electric shock, and fail to do so only at “high” voltages. Except for the “high-intensity” cases, organisms apparently initiate and terminate responding for all stimuli to which they respond. That is, they not only initiate the traditional positives, and terminate the traditional negatives, but rather initiate and terminate both. Indeed, all free responding is highly discontinuous, there being apparently characteristic burst length and interburst length distributions for each behavior(2). Accordingly, to demonstrate the positive and negative capacities of one and the same event requires that there be experimental control of both onset and offset, not one or the other as has been the case.
Of the three cases for which we are currently attempting to establish control of both onset and offset, the one reported here is locomotion. Two findings aided the implementation of this case. (i) Rats choose to press a bar that causes a wheel to rotate and force themselves to run. That is, for the rat, the opportunity to force itself to run is reinforcing; the frequency of the bar-press is increased by such a contingency. (ii) The rat is able to drink while running. These findings led to the following procedure. The rat is placed in a modified Wahmann activity wheel that contains a bar and a drinkometer(3). The wheel is not free to move but is connected to a variable-speed motor. When the rat presses the bar, the motor is activated, the wheel rotates, and the rat is forced to run. It must continue running until it licks the drinkometer a predetermined number of times, which turns off the motor, stops the wheel, and allows the rat to stop. The rat thus both starts and stops running, the former by the bar-press, the latter by licking.
The base measure for the bar-press is the usual number of bar-presses when the bar-press does not turn on the wheel. The base measure for licking is the duration of licking when licks do not turn the wheel off. That is, in determining the base lick rate, the bar-press turns the wheel on, so that the rat runs, but drinking does not turn the wheel off; instead, the experimenter turns the wheel off after each 5-second interval of running. The base condition was designed as a control for the possibility that running might either induce licking or interfere with and reduce it. In fact, running tends to reduce drinking: the rat drinks most when running is totally precluded.(4) Because of this decremental relation, we used the 5-second running burst in the base condition; this value leads to a total duration of running per session (under 200 seconds) that is close to, but less than, the smallest amount of running found in any of the experimental conditions (see Fig. 1). Accordingly, increments in licking computed relative to this base err conservatively, that is, underestimate the increment.
Three female albino rats, about 180 days old, Sprague-Dawley strain, were used. They were maintained on free food and water. An additional question was answered by using a fixed-ratio schedule in conjunction with the “off” response. How does the “difficulty” of turning off a response affect the likelihood of its being turned on? All animals were trained with fixed-ratio lick requirements of 1, 3, 9, 19, and 13, in the order stated. That is, on different sessions the rat was required to complete a different, predetermined number of licks in order to turn the wheel off. On all sessions the drinking tube contained 8 percent sucrose by weight; sucrose was used to facilitate the drinking response. One bar-press always turned the wheel on. All sessions lasted 20 minutes and took place daily.
Figure 1 shows the principal results for one subject, results for the other two being the same in all essentials. Shown as a function of the fixed-ratio requirement on the off-response are (i) frequency of the on-response, (ii) duration of the off-response, (iii) duration of running. The onset of running clearly increased the frequency of the bar-presses. The base frequency or operant level was zero for all three rats, in contrast to the average of 20 bar-presses that occurred for the minimal offset requirement. Increase in the bar-press was less evident when the “off” requirement was high: in general, frequency of the on-response was inversely proportional to magnitude of the off requirement. Thus, the rat turned the wheel on only about twice per session when it took 19 licks to turn it off, and turned it on about 20 times per session when it took only one lick to turn it off.
The average duration of licking per session, shown in the broken line in Fig. 1, increased moderately with the off requirement. Comparison with the point to the left of the curve, which gives the duration of licking when licks had no effect upon the wheel (base duration), shows that offset of running increased the duration of licking at all values of the fixed ratio. Furthermore, the increase is entirely in instrumental licking. The rat does two kinds of licking in this situation, some when it is running, which is instrumental to turning off the wheel, and some when it is not running, which amounts to drinking-to-drink and which is the kind of drinking that occurs in the base condition. The curve for licking in Fig. 1 includes both kinds; if only instrumental licking were shown the curve would rise still more steeply. That is, since running reduces drinking, and running increased over the fixed ratio, drinking-to-drink actually declined over the same variable. Thus, the increase in purely instrumental licking is somewhat greater than is indicated by Fig. 1, particularly at the larger fixed-ratio values.
The total duration of running per session is shown by the dark line in Fig. 1. Interestingly, it increased with the magnitude of the off requirement despite the fact that the number of times the rat turned the wheel on decreased as a function of the same variable. This is accounted for by the fact that the average burst of running was far longer in the case of the 19-lick off requirement than in the case of the one-lick requirement: an average of 140 seconds versus 10 seconds. That is, a high off requirement led to a few extremely long bursts of running, whereas the low off requirement led to numerous short bursts of running, the total duration of running being notably greater for the large than for the small requirement. This difference did not result from the rat “trying” but failing to turn the wheel off in the case of the high off requirement. The time from the start of running to the occurrence of the first lick averaged only about 10 seconds for the one-lick requirement versus about 134 seconds for the 19-lick requirement. Thus, when faced with a large off requirement, the animal did not “try” and fail, but rather ran continuously for an unusually long period before even initiating the off response. This delay may amount to a fixed-ratio pause for the off-response, analogous to the classical increase in delay of the instrumental response that is produced by increasing the fixed ratio for the on-response.
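The apparent paradox, fewer onsets but more total running, follows from simple arithmetic: total running is roughly the number of bursts times the average burst length. The sketch below uses only the approximate figures quoted in the text (about 20 onsets of about 10 seconds, and about 2 onsets of about 140 seconds); the exact values plotted in Fig. 1 are not available here, so the totals are rough estimates, and the function name is an illustrative assumption.

```python
def total_running_seconds(bursts_per_session: int, mean_burst_seconds: float) -> float:
    """Approximate total running time as (number of bursts) x (mean burst length)."""
    return bursts_per_session * mean_burst_seconds


# Approximate figures quoted in the text, not values read from Fig. 1.
one_lick_total = total_running_seconds(bursts_per_session=20, mean_burst_seconds=10)
nineteen_lick_total = total_running_seconds(bursts_per_session=2, mean_burst_seconds=140)

# The 19-lick requirement yields fewer bursts but more total running.
more_running_with_high_requirement = nineteen_lick_total > one_lick_total
```

On these rough figures the one-lick condition gives about 200 seconds of running per session and the 19-lick condition about 280 seconds.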
That subjects would work to turn on and off the same stimulation had, prior to the present data, been shown only for intracranial self-stimulation.(5) Indeed, the first on-off reinforcement system was discovered with intracranial stimulation, on the basis of the originally puzzling finding that subjects would learn to escape but not to avoid the stimulation.(6) Not surprisingly, we find the same relation here: the rat can be trained to escape the already-moving wheel but not to avoid its onset. The formal parallel between the neural and behavioral evidence is thus increased by the present data. More important, since all self-initiated behaviors (for example, eating, drinking, and copulation, as well as running) are also self-terminated, it is likely that merely technical difficulties will impede showing that all such behaviors are on-off systems, capable of generating both positive and negative reinforcement.
These results indicate how a generalization that was stated originally for positive reinforcement may now be broadened to include negative reinforcement as well. Originally, the generalization read: for any pair of responses, the more probable one will reinforce the less probable one.(7) But this fails to distinguish between the onset and offset of an event. The generalization should now read: if the onset or offset of one response is more probable than the onset or offset of another, the former will reinforce the latter—positively, if the superiority is for “on” probability, and negatively, if for the “off” probability. Four reinforcement paradigms can be identified on the basis of the completed generalization: on-on, on-off, off-off, off-on, where the terms of each pair refer to the instrumental and contingent responses, respectively. Thus, the first two paradigms were instanced here by bar-press-run onset and lick-run offset, and represent positive and negative cases, respectively; the other two paradigms remain to be investigated.(8)
RICHARD L. SOLOMON, University of Pennsylvania
First, an introduction: I will attempt to achieve three goals today. (a) I will summarize some empirical generalizations and problems concerning the effects of punishment on behavior; (b) I will give some demonstrations of the advantages of a two-process learning theory for suggesting new procedures to be tried out in punishment experiments; and (c) finally, I shall take this opportunity today to decry some unscientific legends about punishment, and to do a little pontificating—a privilege that I might be denied in a journal such as the Psychological Review, which I edit!
Now, for a working definition of punishment: The definition of a punishment is not operationally simple, but some of its attributes are clear. A punishment is a noxious stimulus, one which will support, by its termination or omission, the growth of new escape or avoidance responses. It is one which the subject will reject, if given a choice between the punishment and no stimulus at all. Whether the data on the behavioral effects of such noxious stimuli will substantiate our commonsense view of what constitutes an effective punishment depends on a wide variety of conditions that I shall survey. Needless to say, most of
Let us first consider two sample experiments. Imagine a traditional alley runway, 6 feet long, with its delineated goal box and start box, and an electrifiable grid floor. In our first experiment, a rat is shocked in the start box and alley, but there is no shock in the goal box. We can quickly train the rat to run down the alley, if the shock commences as the start-box gate is raised and persists until the rat enters the goal box. This is escape training. If, however, we give the rat 5 seconds to reach the goal box after the start-box gate is raised, and only then do we apply the shock, the rat will usually learn to run quickly enough to avoid the shock entirely. This procedure is called avoidance training, and the resultant behavior change is called active avoidance learning. Note that the response required, either to terminate the shock or to remove the rat from the presence of the dangerous start box and alley, is well specified, while the behavior leading to the onset of these noxious stimulus conditions is left vague. It could be any item of behavior coming before the opening of the gate, and it would depend on what the rat happened to be doing when the experimenter raised the gate.
In our second sample experiment, we train a hungry rat to run to the goal box in order to obtain food. After performance appears to be asymptotic, we introduce a shock, both in the alley and goal box, and eliminate the food. The rat quickly stops running and spends its time in the start box. This procedure is called the punishment procedure, and the resultant learning-to-stay-in-the-start-box is called passive avoidance learning. Note that, while the behavior producing the punishment is well specified, the particular behavior terminating the punishment is left vague. It could be composed of any behavior that keeps the rat in the start box and out of the alley.
In the first experiment, we were teaching the rat what to do, while in the second experiment we were teaching him exactly what not to do; yet in each case, the criterion of learning was correlated with the rat’s receiving no shocks, in contrast to its previous experience of receiving several shocks in the same experimental setting. One cannot think adequately about punishment without considering what is known about the outcomes of both procedures. Yet most reviews of the aversive control of behavior emphasize active avoidance learning and ignore passive avoidance learning. I shall, in this talk, emphasize the similarities, rather than the differences, between active and passive avoidance learning. I shall point out that there is a rich store of knowledge of active avoidance learning which, when applied to the punishment procedure, increases our understanding of some of the puzzling and sometimes chaotic results obtained in punishment experiments.
But first, I would like to review some of the empirical generalities which appear to describe the outcomes of experiments on punishment and passive avoidance learning. For this purpose, I divide the evidence into 5 classes: (a) the effects of punishment on behavior previously established by rewards or positive reinforcement, (b) the effects of punishment on consummatory responses, (c) the effects of punishment on complex, sequential patterns of innate responses, (d) the effects of punishment on discrete reflexes, (e) the effects of punishment on responses previously established by punishment—or, if you will, the effects of punishment on active escape and avoidance responses. The effectiveness of punishment will be seen to differ greatly across these five classes of experiments. For convenience, I mean by effectiveness the degree to which a punishment procedure produces suppression of, or facilitates the extinction of, existing response patterns.
Now, let us look at punishment for instrumental responses or habits previously established by reward or positive reinforcers. First, the outcomes of punishment procedures applied to previously rewarded habits are strongly related to the intensity of the punishing agent. Sometimes intensity is independently defined and measured, as in the case of electric shock. Sometimes we have qualitative evaluations, as in the case of Maier’s (1949) rat bumping his nose on a locked door, or Masserman’s (Masserman & Pechtel, 1953) spider monkey being presented with a toy snake, or Skinner’s (1938) rat receiving a slap on the paw from a lever, or my dog receiving a swat from a rolled-up newspaper. As the intensity of shock applied to rats, cats, and dogs is increased from about 0.1 milliampere to 4 milliamperes, these orderly results can be obtained: (a) detection and arousal, wherein the punisher can be used as a cue, discriminative stimulus, response intensifier, or even as a secondary reinforcer; (b) temporary suppression, wherein punishment results in suppression of the punished response, followed by complete recovery, such that the subject later appears unaltered from his prepunished state; (c) partial suppression, wherein the subject always displays some lasting suppression of the punished response, without total recovery; and (d) finally, there is complete suppression, with no observable recovery. Any of these outcomes can be produced, other things being equal, by merely varying the intensity of the noxious stimulus used (Azrin & Holz, 1961), when we punish responses previously established by reward or positive reinforcement. No wonder different experimenters report incomparable outcomes. Azrin (1959) has produced a response-rate increase while operants are punished. Storms, Boroczi, and Broen (1962) have produced long-lasting suppression of operants in rats. Were punishment intensities different? Were punishment durations different? (Storms, Boroczi, & Broen, 1963, have shown albino rats to be more resistant to punishment than are hooded rats, and this is another source of discrepancy between experiments.)
But other variables are possibly as important as punishment intensity, and their operation can make it unnecessary to use intense punishers in order to produce the effective suppression of a response previously established by positive reinforcement. Here are some selected examples:
Proximity in time and space to the punished response determines to some extent the effectiveness of a punishment. There is a response-suppression gradient. This has been demonstrated in the runway (Brown, 1948; Karsh, 1962), in the lever box (Azrin, 1956), and in the shuttle box (Kamin, 1959). This phenomenon has been labeled the gradient of temporal delay of punishment.
The conceptualized strength of a response, as measured by its resistance to extinction after omission of positive reinforcement, predicts the effect of a punishment contingent upon the response. Strong responses, so defined, are more resistant to the suppressive effects of punishment. Thus, for example, the overtraining of a response, which often decreases ordinary resistance to experimental extinction, also increases the effectiveness of punishment (Karsh, 1962; Miller, 1960) as a response suppressor.
Adaptation to punishment can occur, and this decreases its effectiveness. New, intense punishers are better than old, intense punishers (Miller, 1960). Punishment intensity, if slowly increased, tends not to be as effective as in the case where it is introduced initially at its high-intensity value.
In general, resistance to extinction is decreased whenever a previously reinforced response is punished. However, if the subject is habituated to receiving shock together with positive reinforcement during reward training, the relationship can be reversed, and punishment during extinction can actually increase resistance to extinction (Holz & Azrin, 1961). Evidently, punishment, so employed, can functionally operate as a secondary reinforcer, or as a cue for reward, or as an arouser.
Punishments become extremely effective when the response-suppression period is tactically used as an aid to the reinforcement of new responses that are topographically incompatible with the punished one. When new instrumental acts are established which lead to the old goal (a new means to an old end), a punishment of very low intensity can have very long-lasting suppression effects. Whiting and Mowrer (1943) demonstrated this clearly. They first rewarded one route to food, then punished it. When the subjects ceased taking the punished route, they provided a new rewarded route. The old route was not traversed again. This reliable suppression effect also seems to be true of temporal, discriminative restraints on behavior. The suppression of urination in dogs, under the control of indoor stimuli, is extremely effective in housebreaking the dog, as long as urination is allowed to go unpunished under the control of outdoor stimuli. There is a valuable lesson here in the effective use of punishments in producing impulse control. A rewarded alternative, under discriminative control, makes passive avoidance training a potent behavioral influence. It can produce a highly reliable dog or child. In some preliminary observations of puppy training, we have noted that puppies raised in the lab, if punished by the swat of a newspaper for eating horsemeat, and rewarded for eating pellets, will starve themselves to death when only given the opportunity to eat the taboo horsemeat. They eagerly eat the pellets when they are available.
Finally, I should point out that the attributes of effective punishments vary across species and across stages in maturational development within species. A toy snake can frighten monkeys. It does not faze a rat. A loud noise terrified Watson’s little Albert. To us it is merely a Chinese gong.
I have sketchily reviewed some effects of punishment on instrumental acts established by positive reinforcers. We have seen that any result one might desire, from response enhancement and little or no suppression, to relatively complete suppression, can be obtained with our current knowledge of appropriate experimental conditions. Now let us look at the effects of punishment on consummatory acts. Here, the data are, to me, surprising. One would think that consummatory acts, often being of biological significance for the survival of the individual and the species, would be highly resistant to suppression by punishment. The contrary appears to be so. Male sexual behavior may be seriously suppressed by weak punishment (Beach, Conovitz, Steinberg, & Goldstein, 1956; Gantt, 1944). Eating in dogs and cats can be permanently suppressed by a moderate shock delivered through the feet or through the food dish itself (Lichtenstein, 1950; Masserman, 1943). Such suppression effects can lead to fatal self-starvation. A toy snake presented to a spider monkey while he is eating can result in self-starvation (Masserman & Pechtel, 1953).
The interference with consummatory responses by punishment needs a great deal of investigation. Punishment seems to be especially effective in breaking up this class of responses, and one can ask why, with some profit. Perhaps the intimate temporal connection between drive, incentive, and punishment results in drive or incentive becoming conditioned-stimulus (CS) patterns for aversive emotional reactions when consummatory acts are punished. Perhaps this interferes with vegetative activity: i.e., does it “kill the appetite” in a hungry subject? But, one may ask why the same punisher might not appear to be as effective when made contingent on an instrumental act as contrasted with a consummatory act. Perhaps the nature of operants is such that they are separated in time and space and response topography from consummatory behavior and positive incentive stimuli, so that appetitive reactions are not clearly present during punishment for operants. We do not know enough yet about such matters, and speculation about it is still fun.
Perhaps the most interesting parametric variation one can study, in experiments on the effects of punishment on consummatory acts, is the temporal order of rewards and punishments. If we hold hunger drive constant, shock-punishment intensity constant, and food-reward amounts constant, a huge differential effect can be obtained when we reverse the order of reward and punishment. If we train a cat to approach a food cup, its behavior in the experimental setting will become quite stereotyped. Then, if we introduce shock to the cat’s feet while it is eating, the cat will vocalize, retreat, and show fear reactions. It will be slow to recover its eating behavior in this situation. Indeed, as Masserman (1943) has shown, such a procedure is likely, if repeated a few times, to lead to self-starvation. Lichtenstein (1950) showed the same phenomenon in dogs. Contrast this outcome with that found when the temporal order of food and shock is reversed. We now use shock as a discriminative stimulus to signalize the availability of food. When the cat is performing well, the shock may produce eating with a latency of less than 5 seconds. The subject’s appetite does not seem to be disturbed. One cannot imagine a more dramatic difference than that induced by reversing the temporal order of reward and punishment (Holz & Azrin, 1962; Masserman, 1943).
Thus, the effects of punishment are partly determined by those events that directly precede it and those that directly follow it. A punishment is not just a punishment. It is an event in a temporal and spatial flow of stimulation and behavior, and its effects will be produced by its temporal and spatial point of insertion in that flow.
I have hastily surveyed some of the effects of punishment when it has been made contingent either on rewarded operants and instrumental acts or on consummatory acts. A third class of behaviors, closely related to consummatory acts, but yet a little different, are instinctive act sequences: the kinds of complex, innately governed behaviors which the ethologists study, such as nest building in birds. There has been little adequate experimentation, to my knowledge, on the effects of punishment on such innate behavior sequences. There are, however, some hints of interesting things to come. For example, sometimes frightening events will produce what the ethologists call displacement reactions—the expression of an inappropriate behavior pattern of an innate sort. We need to experiment with such phenomena in a systematic fashion. The best example I could find of this phenomenon is the imprinting of birds on moving objects, using the locomotor following response as an index. Moltz, Rosenblum, and Halikas (1959), in one experiment, and Kovach and Hess (1963; see also Hess, 1959a, 1959b) in another, have shown that the punishment of such imprinted behavior sometimes depresses its occurrence. However, if birds are punished prior to the presentation of an imprinted object, often the following response will be energized. It is hard to understand what this finding means, except that punishment can either arouse or inhibit such behavior, depending on the manner of presentation of punishment. The suggestion is that imprinting is partially a function of fear or distress. The effectiveness of punishment also is found to be related to the critical period for imprinting (Kovach & Hess, 1963).
However, the systematic study of known punishment parameters as they affect a wide variety of complex sequences of innate behaviors is yet to be carried out. It would appear to be a worthwhile enterprise, for it is the type of work which would enable us to make a new attack on the effects of experience on innate behavior patterns. Ultimately the outcomes of such experiments could affect psychoanalytic conceptions of the effects of trauma on impulses of an innate sort.3
A fourth class of behavior upon which punishment can be made contingent, is the simple, discrete reflex. For example, what might happen if a conditioned or an unconditioned knee jerk were punished? We are completely lacking in information on this point. Can subjects be trained to inhibit reflexes under aversive motivation? Or does such motivation sensitize and enhance reflexes? Some simple experiments are appropriate, but I was unable to find them in the published work I read.
A fifth class of behavior, upon which punishment can be made contingent, is behavior previously established by punishment procedures: in other words, the effect of passive avoidance training on existing, active avoidance learned responses. This use of punishment produces an unexpected outcome. In general, if the same noxious stimulus is used to punish a response as was used to establish it in the first place, the response becomes strengthened during initial applications of punishment. After several such events, however, the response may weaken, but not always. The similarity of the noxious stimulus used for active avoidance training to that used for punishment of the established avoidance response can be of great importance. For example, Carlsmith (1961) has shown that one can increase resistance to extinction by using the same noxious stimuli for both purposes and yet decrease resistance to extinction by using equally noxious, but discriminatively different, punishments. He trained some rats to run in order to avoid shock, then punished them during extinction by blowing a loud horn. He trained other rats to run in order to avoid the loud horn, then during extinction he punished them by shocking them for running. In two control groups, the punisher stimulus and training stimulus were the same. The groups which were trained and then punished by different noxious stimuli extinguished more rapidly during punishment than did the groups in which the active avoidance training unconditioned stimulus (US) was the same as the passive avoidance training US. Thus, punishment for responses established originally by punishment may be ineffective in eliminating the avoidance responses they are supposed to eliminate. Indeed, the punishment may strengthen the responses. We need to know more about this puzzling phenomenon. It is interesting to me that in Japan, Imada (1959) has been systematically exploring shock intensity as it affects this phenomenon.
Our quick survey of the effects of punishment on five classes of responses revealed a wide variety of discrepant phenomena. Thus, to predict in even the grossest way the action of punishment on a response, one has to know how that particular response was originally inserted in the subject’s response repertoire. Is the response an instrumental one which was strengthened by reward? Is it instead a consummatory response? Is it an innate sequential response pattern? Is it a discrete reflex? Was it originally established by means of punishment? Where, temporally, in a behavior sequence, was the punishment used? How intense was it? These are but a few of the relevant, critical questions, the answers to which are necessary in order for us to make reasonable predictions about the effects of punishment. Thus, to conclude, as some psychologists have, that the punishment procedure is typically either effective or ineffective, typically either a temporary suppressor or a permanent one, is to oversimplify irresponsibly a complex area of scientific knowledge, one still containing a myriad of intriguing problems for experimental attack.
Yet, the complexities involved in ascertaining the effects of punishment on behavior need not be a bar to useful speculation ultimately leading to experimentation of a fruitful sort. The complexities should, however, dictate a great deal of caution in making dogmatic statements about whether punishment is effective or ineffective as a behavioral influence, or whether it is good or bad. I do not wish to do that. I would like now to speculate about the data-oriented theories, rather than support or derogate the dogmas and the social philosophies dealing with punishment. I will get to the dogmas later.
Here is a theoretical approach that, for me, has high pragmatic value in stimulating new lines of experimentation. Many psychologists today consider the punishment procedure to be a special case of avoidance training, and the resultant learning processes to be theoretically identical in nature. Woodworth and Schlosberg (1954) distinguish the two training procedures, “punishment for action” from “punishment for inaction,” but assume that the same theoretical motive, a “positive incentive value of safety” can explain the learning produced by both procedures. Dinsmoor (1955) argues that the facts related to both procedures are well explained by simple stimulus-response (S-R) principles of avoidance learning. He says:
If we punish the subject for making a given response or sequence of responses—that is, apply aversive stimulation, like shock—the cues or discriminative stimuli for this response will correspond to the warning signals that are typically used in more direct studies of avoidance training. By his own response to these stimuli, the subject himself produces the punishing stimulus and pairs or correlates it with these signals. As a result, they too become aversive. In the meantime, any variations in the subject’s behavior that interfere or conflict with the chain of reactions leading to the punishment delay the occurrence of the final response and the receipt of the stimulation that follows it. These variations in behavior disrupt the discriminative stimulus pattern for the continuation of the punished chain, changing the current stimulation from an aversive to a nonaversive compound; they are conditioned, differentiated, and maintained by the reinforcing effects of the change in stimulation [p. 96].
The foci of the Dinsmoor analysis are the processes whereby: (a) discriminative stimuli become aversive, and (b) instrumental acts are reinforced. He stays at the quasi-descriptive level. He uses a peripheralistic, S-R analysis, in which response-produced proprioceptive stimuli and exteroceptive stimuli serve to hold behavior chains together. He rejects, as unnecessary, concepts such as fear or anxiety, in explaining the effectiveness of punishment. Mowrer (1960) also argues that the facts related to the two training procedures are explained by a common set of principles, but Mowrer’s principles are somewhat different than those of either Woodworth and Schlosberg, or Dinsmoor, cited above. Mowrer says:
In both instances, there is fear conditioning; and in both instances a way of behaving is found which eliminates or controls the fear. The only important distinction, it seems, is that the stimuli to which the fear gets connected are different. In so-called punishment, these stimuli are produced by (correlated with) the behavior, or response, which we wish to block; whereas, in so-called avoidance learning, the fear-arousing stimuli are not response-produced—they are, so to say, extrinsic rather than intrinsic, independent rather than response-dependent. But in both cases there is avoidance and in both cases there is its antithesis, punishment; hence the impropriety of referring to the one as “punishment” and to the other as “avoidance learning.” Obviously precision and clarity of understanding are better served by the alternative terms here suggested, namely, passive avoidance learning and active avoidance learning, respectively. . . . But, as we have seen, the two phenomena involve exactly the same basic principles of fear conditioning and of the reinforcement of whatever action (or inaction) eliminates the fear [pp. 31-32].
I like the simple beauty of each of the three unifying positions; what holds for punishment and its action on behavior should hold also for escape and avoidance training, and vice versa. Generalizations about one process should tell us something about the other. New experimental relationships discovered in the one experimental setting should tell us how to predict a new empirical event in the other experimental setting. A brief discussion of a few selected examples can illustrate this possibility.
APPLICATIONS OF THEORY
I use a case in point stemming from work done in our own laboratory. It gives us new hints about some hidden sources of effectiveness of punishment. Remember, for the sake of argument, that we are assuming many important similarities to exist between active and passive avoidance-learning processes. Therefore, we can look at active avoidance learning as a theoretical device to suggest to us new, unstudied variables pertaining to the effectiveness of punishment. Turner and I have recently published an extensive monograph (1962) on human traumatic avoidance learning. Our experiments showed that when a very reflexive, short-latency, skeletal response, such as a toe twitch, was used as an escape and avoidance response, grave difficulties in active avoidance learning were experienced by the subject. Experimental variations which tended to render the escape responses more emitted, more deliberate, more voluntary, more operant, or less reflexive, tended also to render the avoidance responses easier to learn. Thus, when a subject was required to move a knob in a slot in order to avoid shock, learning was rapid, in contrast to the many failures to learn with a toe-flexion avoidance response.
There are descriptions of this phenomenon already available in several published experiments on active avoidance learning, but their implications have not previously been noted. When Schlosberg (1934) used for the avoidance response a highly reflexive, short-latency, paw-flexion response in the rat, he found active avoidance learning to be unreliable, unstable, and quick to extinguish. Whenever the rats made active avoidance flexions, a decrement in response strength ensued. When the rats were shocked on several escape trials, the avoidance response tended to reappear for a few trials. Thus, learning to avoid was a tortuous, cyclical process, never exceeding 30% success. Contrast these results with the active avoidance training of nonreflexive, long-latency operants, such as rats running in Hunter’s (1935) circular maze. Hunter found that the occurrence of avoidance responses tended to produce more avoidance responses. Omission of shock seemed to reinforce the avoidance running response. Omission of shock seemed to extinguish the avoidance paw flexion. Clearly the operant-respondent distinction has predictive value in active avoidance learning.
The same trend can be detected in experiments using dogs as subjects. For example, Brogden (1949), using the forepaw-flexion response, found that meeting a 20/20 criterion of avoidance learning was quite difficult. He found that 30 dogs took from approximately 200-600 trials to reach the avoidance criterion. The response used was, in our language, highly reflexive; it was totally elicited by the shock on escape trials with a very short latency, approximately .3 second. Compare, if you will, the learning of active avoidance by dogs in the shuttle box with that found in the forelimb-flexion experiment. In the shuttle box, a large number of dogs were able to embark on their criterion trials after 5-15 active avoidance-training trials. Early escape response latencies were long. Resistance to extinction is, across these two types of avoidance responses, inversely related to trials needed for a subject to achieve criterion. Conditions leading to quick acquisition are, in this case, those conducive to slow extinction. Our conclusion, then, is that high-probability, short-latency, respondents are not as good as medium-probability, long-latency operants when they are required experimentally to function as active avoidance responses. This generalization seems to hold for rats, dogs, and college students.
How can we make the inferential leap from such findings in active avoidance training to possible variations in punishment experiments? It is relatively simple to generalize across the two kinds of experiments in the case of CS-US interval, US intensity, and CS duration. But the inferential steps are not as obvious in the case of the operant-respondent distinction. So I will trace out the logic in some detail. If one of the major effects of punishment is to motivate or elicit new behaviors, and reinforce them through removal of punishment, and thus, as Dinsmoor describes, establish avoidance responses incompatible with a punished response, how does the operant-respondent distinction logically enter? Here, Mowrer’s two-process avoidance-learning theory can suggest a possible answer. Suppose, for example, that a hungry rat has been trained to lever press for food and is performing at a stable rate. Now we make a short-duration, high-intensity pulse of shock contingent upon the bar press. The pulse elicits a startle pattern that produces a release of the lever in .2 second, and the shock is gone. The rat freezes for a few seconds, breathing heavily, and he urinates and defecates. It is our supposition that a conditioned emotional reaction (CER) is thereby established, with its major stimulus control coming from the sight of the bar, the touch of the bar, and proprioceptive stimuli aroused by the lever-press movements themselves. This is, as Dinsmoor describes it, the development of acquired aversiveness of stimuli; or, as Mowrer describes it, the acquisition of conditioned fear reactions. Therefore, Pavlovian conditioning variables should be the important ones in the development of this process. The reappearance of lever pressing in this punished rat would thus depend on the extinction of the CER and skeletal freezing. 
If no further shocks are administered, then the CER should extinguish according to the laws of Pavlovian extinction, and reappearance of the lever press should not take long, even if the shock-intensity level were high enough to have been able to produce active avoidance learning in another apparatus.
Two-process avoidance theory tells us that something very important for successful and durable response suppression was missing in the punishment procedure we just described. What was lacking in this punishment procedure was a good operant to allow us to reinforce a reliable avoidance response. Because the reaction to shock was a respondent, was highly reflexive, and was quick to occur, I am led to argue that the termination of shock will not reinforce it, nor will it lead to stable avoidance responses. This conclusion follows directly from our experiments on human avoidance learning. If the termination of shock is made contingent on the occurrence of an operant, especially an operant topographically incompatible with the lever press, an active avoidance learning process should then ensue. So I will now propose that we shock the rat until he huddles in a corner of the box. The rat will have learned to do something arbitrary whenever the controlling CSs reappear. Thus, the rat in the latter procedure, if he is to press the lever again, must undergo two extinction processes. The CER, established by the pairing of CS patterns and shock, must become weaker. Second, the learned huddling response must extinguish. This combination of requirements should make the effect of punishment more lasting, if my inferences are correct. Two problems must be solved by the subject, not one. The experiments needed to test these speculations are, it would appear, easy to design, and there is no reason why one should not be able to gather the requisite information in the near future.
I feel that there is much to be gained in carrying on theoretical games like this, with the major assumptions being (a) that active and passive avoidance learning are similar processes, ones in which the same variables have analogous effects, and (b) that two processes, the conditioning of fear reactions, and the reinforcement of operants incompatible with the punished response, may operate in punishment experiments.
There is another gain in playing theoretical games of this sort. One can use them to question the usual significance imputed to past findings. Take, for example, the extensive studies of Neal Miller (1959) and his students, and Brown (1948) and his students, on gradients of approach and avoidance in conflict situations. Our foregoing analysis of the role of the operant-respondent distinction puts to question one of their central assumptions: that the avoidance gradient is unconditionally steeper than is the approach gradient in approach-avoidance conflicts. In such experiments, the subject is typically trained while hungry to run down a short alley to obtain food. After the running is reliable, the subject is shocked, usually near the goal, in such a way that entering the goal box is discouraged temporarily. The subsequent behavior of the typical subject consists of remaining in the start box, making abortive approaches to the food box, showing hesitancy, oscillation, and various displacement activities, like grooming. Eventually, if shock is eliminated by the experimenter, the subject resumes running to food. The avoidance tendency is therefore thought to have extinguished sufficiently so that the magnitude of the conceptualized approach gradient exceeds that of the avoidance gradient at the goal box. The steepness of the avoidance gradient as a function of distance from the goal box is inferred from the behavior of the subject prior to the extinction of the avoidance tendencies. If the subject stays as far away from the goal box as possible, the avoidance gradient may be inferred to be either displaced upward, or if the subject slowly creeps up on the goal box from trial to trial, it may be inferred to be less steep than the approach gradient. Which alternative is more plausible? Miller and his collaborators very cleverly have shown that the latter alternative is a better interpretation.
The differential-steepness assumption appears to be substantiated by several studies by Miller and his collaborators (Miller & Murray, 1952; Murray & Berkun, 1955). They studied the displacement of conflicted approach responses along both spatial and color dimensions, and clearly showed that the approach responses generalized more readily than did the avoidance responses. Rats whose running in an alley had been completely suppressed by shock punishment showed recovery of running in a similar alley. Thus the inference made was that the avoidance gradient is steeper than is the approach gradient; avoidance tendencies weaken more rapidly with changes in the external environmental setting than do approach tendencies. On the basis of the analysis I made of the action of punishment, both as a US for the establishment of a Pavlovian CER and as a potent event for the reinforcement of instrumental escape and avoidance responses, it seems to me very likely that the approach-avoidance conflict experiments have been carried out in such a way as to produce inevitably the steeper avoidance gradients. In other words, these experiments from my particular viewpoint have been inadvertently biased, and they were not appropriate for testing hypotheses about the gradient slopes.
My argument is as follows: Typically, the subject in an approach-avoidance experiment is trained to perform a specific sequence of responses under reward incentive and appetitive drive conditions. He runs to food when hungry. In contrast, when the shock is introduced into the runway, it is usually placed near the goal, and no specific, long sequence of instrumental responses is required of the subject before the shock is terminated. Thus, the initial strengths of the approach and avoidance instrumental responses (which are in conflict) are not equated by analogous or symmetrical procedures. Miller has thoroughly and carefully discussed this, and has suggested that the avoidance gradient would not have as steep a slope if the shock were encountered by the rat early in the runway in the case where the whole runway is electrified. While this comment is probably correct, it does not go far enough, and I would like to elaborate on it. I would argue that if one wants to study the relative steepnesses of approach and avoidance responses in an unbiased way, the competing instrumental responses should be established in a symmetrical fashion. After learning to run down an alley to food, the subject should be shocked near the goal box or in it, and the shock should not be terminated until the subject has escaped all the way into the start box. Then one can argue that two conflicting instrumental responses have been established. First, the subject runs one way for food; now he runs the same distance in the opposite direction in order to escape shock. When he stays in the start box, he avoids shock entirely. Then the generalization or displacement of the approach and avoidance responses can be fairly studied.
I am arguing that we need instrumental-response balancing, as well as Pavlovian-conditioning balancing, in such conflict experiments, if the slopes of gradients are to be determined for a test of the differential-steepness assumption. Two-process avoidance-learning theory requires such a symmetrical test. In previous experiments, an aversive CER and its respondent motor pattern, not a well-reinforced avoidance response, has been pitted against a well-reinforced instrumental-approach response. Since the instrumental behavior of the subject is being used subsequently to test for the slope of the gradients, the usual asymmetrical procedure is, I think, not appropriate. My guess is that, if the symmetrical procedure I described is actually used, the slopes of the two gradients will be essentially the same, and the recovery of the subject from the effects of punishment will be seen to be nearly all-or-none. That is, the avoidance gradient, as extinction of the CER proceeds in time, will drop below the approach gradient, and this will hold all along the runway if the slopes of the two gradients are indeed the same. Using the test of displacement, subjects should stay in the starting area of a similar alley on initial tests and when they finally move forward they should go all the way to the goal box.
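The contrast between the two predictions can be illustrated with a toy numerical sketch. Everything in it is an assumption made for illustration: the linear form of the gradients, the parameter values, the rule that the subject advances only while approach strength exceeds avoidance strength, and the function names. None of it is taken from the experiments cited. Extinction of the CER is represented simply as a falling height of the avoidance gradient at the goal.

```python
def strength(height_at_goal, slope, distance):
    """Hypothetical linear gradient: tendency strength at a distance from the goal."""
    return max(0.0, height_at_goal - slope * distance)


def closest_approach(approach, avoidance, runway_length=10):
    """Return how near the goal the subject gets (0 = goal box, runway_length = start box).

    approach and avoidance are (height_at_goal, slope) pairs. The subject steps
    toward the goal only while approach strength exceeds avoidance strength
    at the next position.
    """
    position = runway_length
    while position > 0:
        next_position = position - 1
        if strength(*approach, next_position) <= strength(*avoidance, next_position):
            break  # avoidance matches or exceeds approach: the subject stops here
        position = next_position
    return position


def recovery_course(approach, avoidance_slope, avoidance_heights, runway_length=10):
    """Closest approach on successive tests as the avoidance height declines."""
    return [
        closest_approach(approach, (height, avoidance_slope), runway_length)
        for height in avoidance_heights
    ]


approach = (10.0, 0.5)                 # assumed approach gradient
extinction = [14, 12, 11, 10, 9]       # assumed decline of avoidance height

# Steeper avoidance gradient: the subject creeps toward the goal.
print(recovery_course(approach, 2.0, extinction))   # [3, 2, 1, 1, 0]

# Equal slopes: the subject stays in the start box, then goes all the way.
print(recovery_course(approach, 0.5, extinction))   # [10, 10, 10, 10, 0]
```

Under these assumptions, a steeper avoidance gradient yields gradual creeping toward the goal, whereas equal slopes yield the all-or-none recovery conjectured above, because the sign of the difference between the two gradients is then the same at every point along the runway.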
The outcomes of such experiments would be a matter of great interest to me, for, as you will read in a moment, I feel that the suppressive power of punishment over instrumental acts has been understated. The approach-avoidance conflict experiment is but one example among many wherein the outcome may have been inadvertently biased in the direction of showing reward-training influences to be superior, in some particular way, to punishment-training procedures. Now let us look more closely at this matter of bias.
Skinner, in 1938, described the effect of a short-duration slap on the paw on the extinction of lever pressing in the rat. Temporary suppression of lever-pressing rate was obtained. When the rate increased, it exceeded the usual extinction performance. The total number of responses before extinction occurred was not affected by the punishment for lever pressing. Estes (1944) obtained similar results, and attributed the temporary suppression to the establishment of a CER (anxiety) which dissipated rapidly. Tolman, Hall, and Bretnall (1932) had shown earlier that punishment could enhance maze learning by serving as a cue for correct, rewarded behavior. Skinner made these observations (on the seemingly ineffective nature of punishment as a response weakener) the basis for his advocacy of a positive reinforcement regime in his utopia, Walden Two.
In Walden Two, Skinner (1948), speaking through the words of Frazier, wrote: “We are now discovering at an untold cost in human suffering—that in the long run punishment doesn’t reduce the probability that an act will occur [p. 260].” No punishments would be used there, because they would produce poor behavioral control, he claimed.
During the decade following the publication of Walden Two, Skinner (1953) maintained his position concerning the effects of punishment on instrumental responses: Response suppression is but temporary, and the side effects, such as fear and neurotic and psychotic disturbances, are not worth the temporary advantages of the use of punishment. He said:
In the long run, punishment, unlike reinforcement, works to the disadvantage of both the punished organism and the punishing agency [p. 183].
The fact that punishment does not permanently reduce a tendency to respond is in agreement with Freud’s discovery of the surviving activity of what he called repressed wishes [p. 184].
Punishment, as we have seen, does not create a negative probability that a response will be made but rather a positive probability that incompatible behavior will occur [p. 222].
It must be said in Skinner’s defense, that in 1953 he devoted about 12 pages to the topic of punishment in his introductory textbook. Other texts had devoted but a few words to this topic.
In Bugelski’s (1956) words about the early work on punishment: “The purport of the experiments mentioned above appears to be to demonstrate that punishment is ineffective in eliminating behavior. This conclusion appears to win favor with various sentimentalists [p. 275].” Skinner (1961) summarized his position most recently in this way:
Ultimate advantages seem to be particularly easy to overlook in the control of behavior, where a quick though slight advantage may have undue weight. Thus, although we boast that the birch rod has been abandoned, most school children are still under aversive control—not because punishment is more effective in the long run, but because it yields immediate results. It is easier for the teacher to control the student by threatening punishment than by using positive reinforcement with its deferred, though more powerful, effects [p. 36.08, italics mine].
Skinner’s conclusions were drawn over a span of time when, just as is the case now, there was no conclusive evidence about the supposedly more powerful and long-lasting effects of positive reinforcement. I admire the humanitarian and kindly dispositions contained in such writings. But the scientific basis for the conclusions therein was shabby, because, even in 1938, there were conflicting data which demonstrated the great effectiveness of punishment in controlling instrumental behavior. For example, the widely cited experiments of Warden and Aylesworth (1927) showed that discrimination learning in the rat was more rapid and more stable when incorrect responses were punished with shock than when reward alone for the correct response was used. Later on, avoidance-training experiments in the 1940s and 1950s added impressive data on the long-lasting behavioral control exerted by noxious stimuli (Solomon & Brush, 1956). In spite of this empirical development, many writers of books in the field of learning now devote but a few lines to the problem of punishment, perhaps a reflection of the undesirability of trying to bring satisfying order out of seeming chaos. In this category are the recent books of Spence, Hull, and Kimble. An exception is Bugelski (1956) who devotes several pages to the complexities of this topic. Most contemporary introductory psychology texts devote but a paragraph or two to punishment as a scientific problem. Conspicuously, George Miller’s new book, Psychology, the Science of Mental Life, has no discussion of punishment in it.
The most exhaustive textbook treatment today is that of Deese (1958), and it is a thoughtful and objective evaluation, a singular event in this area of our science. The most exhaustive journal article is that by Church (1963), who has thoroughly summarized our knowledge of punishment. I am indebted to Church for letting me borrow freely from his fine essay in prepublication form. Without this assistance, the organization of this paper would have been much more difficult, indeed.
Perhaps one reason for the usual textbook relegation of the topic of punishment to the fringe of experimental psychology is the widespread belief that punishment is unimportant because it does not really weaken habits; that it pragmatically is a poor controller of behavior; that it is extremely cruel and unnecessary; and that it is a technique leading to neurosis and worse. This legend, and it is a legend without sufficient empirical basis, probably arose with Thorndike (1931). Punishment, in the time of Thorndike, used to be called punishment, not passive avoidance training. The term referred to the use of noxious stimuli for the avowed purpose of discouraging some selected kind of behavior. Thorndike (1931) came to the conclusion that punishment did not really accomplish its major purpose, the destruction or extinction of habits. In his book, Human Learning, he said:
Annoyers do not act on learning in general by weakening whatever connection they follow. If they do anything in learning, they do it indirectly, by informing the learner that such and such a response in such and such a situation brings distress, or by making the learner feel fear of a certain object, or by making him jump back from a certain place, or by some other definite and specific change which they produce in him [p. 46].
This argument is similar to that of Guthrie (1935), and of Wendt (1936), in explaining the extinction of instrumental acts and conditioned reflexes. They maintained that extinction was not the weakening of a habit, but the replacement of a habit by a new one, even though the new one might only be sitting still and doing very little.
When Thorndike claimed that the effects of punishment were indirect, he was emphasizing the power of punishment to evoke behavior other than that which produced the punishment; in much the same manner, Guthrie emphasized the extinction procedure as one arousing competing responses. The competing response theory of extinction today cannot yet be empirically chosen over other theories such as Pavlovian and Hullian inhibition theory, or the frustration theories of Amsel or Spence. The Thorndikian position on punishment is limited in the same way. It is difficult to designate the empirical criteria which would enable us to know, on those occasions when punishment for a response results in a weakening of performance of that response, whether a habit was indeed weakened or not. How can one tell whether competing responses have displaced the punished response, or whether the punished habit is itself weakened by punishment? Thorndike could not tell, and neither could Guthrie. Yet a legend was perpetuated. Perhaps the acceptance of the legend had something to do with the lack of concerted research on punishment from 1930 to 1955. For example, psychologists were not then particularly adventuresome in their search for experimentally effective punishments.
Or, in addition to the legend, perhaps a bit of softheartedness is partly responsible for limiting our inventiveness. (The Inquisitors, the Barbarians, and the Puritans could have given us some good hints! They did not have electric shock, but they had a variety of interesting ideas, which, regrettably, they often put to practice.) We clearly need to study new kinds of punishments in the laboratory. For most psychologists, a punishment in the laboratory means electric shock. A few enterprising experimenters have used air blasts, the presentation of an innate fear releaser, or a signal for the coming omission of reinforcement, as punishments. But we still do not know enough about using these stimuli in a controlled fashion to produce either behavior suppression, or a CER effect, or the facilitation of extinction. Many aversive states have gone unstudied. For example, conditioned nausea and vomiting is easy to produce, but it has not been used in the role of punishment. Even the brain stimulators, though they have since 1954 tickled brain areas that will instigate active escape learning, have not used this knowledge to study systematically the punishing effects of such stimulation on existing responses.
While the more humanitarian ones of us were bent on the discovery of new positive reinforcers, there was no such concerted effort on the part of the more brutal ones of us. Thus, for reasons that now completely escape me, some of us in the past were thrilled by the discovery that, under some limited conditions, either a light onset or a light termination could raise lever-pressing rate significantly, though trivially, above operant level. If one is looking for agents to help in the task of getting strong predictive power, and strong control of behavior, such discoveries seem not too exciting. Yet, in contrast, discoveries already have been made of the powerful aversive control of behavior. Clearly, we have been afraid of their implications. Humanitarian guilt and normal kindness are undoubtedly involved, as they should be. But I believe that one reason for our fear has been the widespread implication of the neurotic syndrome as a necessary outcome of all severe punishment procedures. A second reason has been the general acceptance of the behavioral phenomena of rigidity, inflexibility, or narrowed cognitive map, as necessary outcomes of experiments in which noxious stimuli have been used. I shall question both of these conclusions.
If one should feel that the Skinnerian generalizations about the inadequate effects of punishment on instrumental responses are tinged with a laudable, though thoroughly incorrect and unscientific, sentimentalism and softness, then, in contrast, one can find more than a lurid tinge in discussions of the effects of punishment on the emotional balance of the individual. When punishments are asserted to be ineffective controllers of instrumental behavior, they are, in contrast, often asserted to be devastating controllers of emotional reactions, leading to neurotic and psychotic symptoms, and to general pessimism, depressiveness, constriction of thinking, horrible psychosomatic diseases, and even death! This is somewhat of a paradox, I think. The convincing part of such generalizations is only their face validity. There are experiments, many of them carefully done, in which these neurotic outcomes were clearly observed. Gantt’s (1944) work on neurotic dogs, Masserman’s (1943) work on neurotic cats and monkeys, Brady’s (1958) recent work on ulcerous monkeys, Maier’s (1949) work on fixated rats, show some of the devastating consequences of the utilization of punishment to control behavior. The side effects are frightening, indeed, and should not be ignored! But there must be some rules, some principles, governing the appearance of such side effects, for they do not appear in all experiments involving the use of strong punishment or the elicitation of terror. In Yates’ (1962) new book, Frustration and Conflict, we find a thorough discussion of punishment as a creator of conflict. Major attention is paid to the instrumental-response outcomes of conflict due to punishment. Phenomena such as rigidity, fixation, regression, aggression, displacement, and primitivization are discussed. Yates accepts the definition of neurosis developed by Maier and by Mowrer: self-defeating behavior oriented toward no goal, yet compulsive in quality. 
The behavioral phenomena that reveal neuroses are said to be fixations, regressions, aggressions, or resignations. But we are not told the necessary or sufficient experimental conditions under which these dramatic phenomena emerge.
Anyone who has tried to train a rat in a T maze, using food reward for a correct response, and shock to the feet for an incorrect response, knows that there is a period of emotionality during early training, but that, thereafter, the rat, when the percentage of correct responses is high, looks like a hungry, well-motivated, happy rat, eager to get from his cage to the experimenter’s hand, and thence to the start box. Evidently, merely going through conflict is not a condition for neurosis. The rat is reliable, unswerving in his choices. Is he neurotic? Should this be called subservient resignation? Or a happy adjustment to an inevitable event? Is the behavior constricted? Is it a fixation, an evidence of behavioral rigidity? The criteria for answering such questions are vague today. Even if we should suggest some specific tests for rigidity, they lack face validity. For example, we might examine discrimination reversal as a test of rigidity. Do subjects who have received reward for the correct response, and punishment for the incorrect response, find it harder to reverse when the contingencies are reversed, as compared with subjects trained with reward alone? Or, we might try a transfer test, introducing our subject to a new maze, or to a new jumping stand. Would the previously punished subject generalize more readily than one not so punished? And if he did, would he then be less discriminating and thus neurotic? Or, would the previously punished subject generalize poorly and hesitantly, thus being too discriminating, and thus neurotic, too? What are the criteria for behavioral malfunction as a consequence of the use of punishment? When instrumental responses are used as the indicator, we are, alas, left in doubt!
The most convincing demonstrations of neurotic disturbances stemming from the use of punishment are seen in Masserman’s (Masserman & Pechtel, 1953) work with monkeys. But here the criterion for neurosis is not based on instrumental responding. Instead, it is based on emotionality expressed in consummatory acts and innate impulses. Masserman’s monkeys were frightened by a toy snake while they were eating. Feeding inhibition, shifts in food preferences, odd sexual behavior, tics, long periods of crying, were observed. Here, the criteria have a face validity that is hard to reject. Clearly, punishment was a dangerous and disruptive behavioral influence in Masserman’s experiments. Such findings are consonant with the Freudian position postulating the pervasive influences of traumatic experiences, permeating all phases of the affective existence of the individual, and persisting for long time periods.
To harmonize all of the considerations I have raised concerning the conditions leading to neurosis due to punishment is a formidable task. My guess at the moment is that neurotic disturbances arise often in those cases where consummatory behavior or instinctive behavior is punished, and punished under nondiscriminatory control. But this is merely a guess, and in order for it to be adequately tested, Masserman’s interesting procedures would have to be repeated, using discriminative stimuli to signalize when it is safe and not safe for the monkey. Such experiments should be carried out if we are to explore adequately the possible effects of punishment on emotionality. Another possibility is that the number of rewarded behavior alternatives in an otherwise punishing situation will determine the emotional aftereffects of punishments. We have seen that Whiting and Mowrer (1943) gave their rats a rewarding alternative, and the resulting behavior was highly reliable. Their rats remained easy to handle and eager to enter the experimental situation. One guess is that increasing the number of behavioral alternatives leading to a consummatory response will, in a situation where only one behavior alternative is being punished, result in reliable behavior and the absence of neurotic emotional manifestations. However, I suspect that matters cannot be that simple. If our animal subject is punished for Response A, and the punishment quickly elicits Response B, and then Response B is quickly rewarded, we have the stimulus contingencies for the establishment of a masochistic habit. Reward follows punishment quickly. Perhaps the subject would then persist in performing the punished Response A? Such questions need to be worked out empirically, and the important parameters must be identified. We are certainly in no position today to specify the necessary or sufficient conditions for experimental neurosis.
I have, in this talk, decried the stultifying effects of legends concerning punishment. To some extent, my tone was reflective of bias, and so I overstated some conclusions. Perhaps now it would be prudent to soften my claims.4 I must admit that all is not lost! Recently, I have noted a definite increase in good parametric studies of the effects of punishment on several kinds of behavior. For example, the pages of the Journal of the Experimental Analysis of Behavior have, in the last 5 years, become liberally sprinkled with reports of punishment experiments. This is a heartening development, and though it comes 20 years delayed, it is welcome.
I have covered a great deal of ground here, perhaps too much for the creation of a clear picture. The major points I have made are as follows: First, the effectiveness of punishment as a controller of instrumental behavior varies with a wide variety of known parameters. Some of these are: (a) intensity of the punishment stimulus, (b) whether the response being punished is an instrumental one or a consummatory one, (c) whether the response is instinctive or reflexive, (d) whether it was established originally by reward or by punishment, (e) whether or not the punishment is closely associated in time with the punished response, (f) the temporal arrangements of reward and punishment, (g) the strength of the response to be punished, (h) the familiarity of the subject with the punishment being used, (i) whether or not a reward alternative is offered during the behavior-suppression period induced by punishment, (j) whether a distinctive, incompatible avoidance response is strengthened by omission of punishment, (k) the age of the subject, and (l) the strain and species of the subject.
Second, I have tried to show the theoretical virtues of considering active and passive avoidance learning to be similar processes, and have shown the utility of a two-process learning theory. I have described some examples of the application of findings in active avoidance-learning experiments to the creation of new punishment experiments and to the reanalysis of approach-avoidance conflict experiments.
Third, I have questioned persisting legends concerning both the ineffectiveness of punishment as an agent for behavioral change as well as the inevitability of the neurotic outcome as a legacy of all punishment procedures.
Finally, I have indicated where new experimentation might be especially interesting or useful in furthering our understanding of the effects of punishment.
If there is one idea I would have you retain, it is this: Our laboratory knowledge of the effects of punishment on instrumental and emotional behavior is still rudimentary—much too rudimentary to make an intelligent choice among conflicting ideas about it. The polarized doctrines are probably inadequate and in error. The popularized Skinnerian position concerning the inadequacy of punishment in suppressing instrumental behavior is, if correct at all, only conditionally correct. The Freudian position, pointing to pain or trauma as an agent for the pervasive and long-lasting distortion of affective behavior is equally questionable, and only conditionally correct.
Happily, there is now growing attention being paid to the effects of punishment on behavior, and this new development will undoubtedly accelerate, because the complexity of our current knowledge, and the perplexity it engenders, are, I think, exciting and challenging.
|This is a slightly revised text of the author’s Presidential Address to the Eastern Psychological Association, New York City, April 1963. The research associated with this address was supported by Grant No. M-4202 from the United States Public Health Service.|
|Since the delivery of this address, several articles have appeared concerning the punishment intensity problem. See especially Karsh (1963), Appel (1963), and Walters and Rogers (1963). All these studies support the conclusion that shock intensity is a crucial variable, and high intensities produce lasting suppression effects.|
|Since the delivery of this address, an article has appeared on this specific problem. See Adler and Hogan (1963). The authors showed that the gill-extension response of Betta splendens could be conditioned to a previously neutral stimulus by a Pavlovian technique, and it could also be suppressed by electric-shock punishment. This is an important finding, because there are very few known cases where the same response can be both conditioned and trained. Here, the gill-extension response is typically elicited by a rival fish, and is usually interpreted to be aggressive or hostile in nature.|
|Presidential addresses sometimes produce statements that may be plausible at the moment, but on second thought may seem inappropriate. In contrast to my complaints about inadequate research on punishment and the nature of active and passive avoidance learning are Hebb’s (1960) recent remarks in his APA Presidential Address. He said: “The choice is whether to prosecute the attack, or to go on with the endless and trivial elaboration of the same set of basic experiments (on pain avoidance for example); trivial because they have added nothing to knowledge for some time, though the early work was of great value [p. 740].”|
Secondary Reinforcement in Rats
as a Function of Information Value and Reliability of the Stimulus1
DAVID EGGER and NEAL E. MILLER, Yale University
Although secondary reinforcement has been of major importance to behavior theory, especially in explanations of complex learning phenomena (e.g., Hull, 1943; Miller, 1951; Skinner, 1938), little is known about the conditions for its occurrence in any but the simplest situations. The first hypothesis explored in the experiments reported here is that in a situation in which there is more than one stimulus predicting primary reinforcement, e.g., food, the more informative stimulus will be the more effective secondary reinforcer. Further it is asserted that a necessary condition for establishing any stimulus as a secondary reinforcer is that the stimulus provide information about the occurrence of primary reinforcement; a redundant predictor of primary reinforcement should not acquire secondary reinforcement strength.
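A possible situation in which to test this hypothesis is the following: a short stimulus always precedes the delivery of food. But it is made essentially redundant by being overlapped by a longer stimulus, which begins earlier and also precedes the food. In the notation used below, the longer stimulus is S1 and the short one is S2 (Fig. 1).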
There is a way, however, to make S2 informative. If S1 occurs a number of times without S2, unaccompanied by the food pellet, and randomly interspersed with occurrences of the stimulus sequence shown at the bottom of Fig. 1, then S2, when it occurs, is no longer redundant; for now S2 is the only reliable predictor of food. Thus, it is predicted that for a group of rats who receive the stimulus sequence depicted in Fig. 1 interspersed with occurrences of S1 alone, S2 will be a considerably more effective secondary reinforcer than for the group of rats who receive only the stimulus sequence depicted in Fig. 1.
It should be noted that both groups will receive exactly the same number of pairings of S2 with food and in exactly the same immediate stimulus context, so that if a difference were found between the groups in the secondary reinforcing value of S2, it could not be due to simple patterning, stimulus-generalization decrement, or differences in association with food.
Our predicted results would be compatible with a strict interpretation of the drive-reduction hypothesis of reinforcement (Miller, 1959). Such a theoretical analysis is represented schematically in the upper portion of Fig. 1. According to the drive-reduction hypothesis, a stimulus acquires secondary reinforcing value by acquiring the ability to elicit a drive-reducing response. The left side of Fig. 1 illustrates that if most of the learnable drive already has been reduced by S1, little drive-reduction remains to be conditioned to S2. On the other hand, if S1 sometimes fails to predict food, some of the conditioned drive-reduction to it should extinguish. Hence, as is depicted on the right side of Fig. 1, more of the drive-reduction should occur to, and be conditioned to, S2.
From Fig. 1, one can also see that the drive-reduction analysis demands that the secondary reinforcing value of S1 should be greater when it is a reliable predictor (making S2 redundant) than when it is an unreliable predictor (making S2 informative). Thus we are led to our second hypothesis, namely, that in a situation in which a predictor of primary reinforcement exists which is both reliable and informative, this predictor should become a more effective secondary reinforcer than an unreliable predictor. Note that here we predict the opposite of a partial-reinforcement effect, which would be expected to increase the resistance to extinction of the unreliable predictor, that is, the stimulus which had been paired with food only part of the time. In any prolonged test for secondary reinforcement, this increased resistance to extinction should show up as a greater total secondary-reinforcing effect.
Subjects. The Ss were 88 male rats of the Sprague-Dawley strain who were approximately 90 days old at the beginning of their experimental training. Owing to deaths and equipment failures, the data from 4 Ss were lost, and the data from another 4, selected at random, were discarded in order to have equal sized groups for an analysis of variance. The Ss, fed once daily following the experimental session, were maintained at approximately 80 percent of their ad lib. weight.
Apparatus. The apparatus consisted of two identical Skinner boxes, 19 in. long, 8 in. wide, and 6¾ in. high (inside dimensions). The floors of the boxes consisted of six ½ in. diameter rods running parallel to the side containing the Plexiglas door. Each box was enclosed in a large, light-proof, sound-deadened crate into which a stream of air was piped for ventilation and masking noise. Inside each of the Skinner boxes were two lights, one located 2 in. above the food cup, another located in the middle of the long back wall, opposite the Plexiglas door. The food cup was in the center of the front, 8-in. wall; the bar, a bent steel strip 1½ in. wide, protruded ½ in. into the inner chamber of the box. The entire bar assembly was removable and, when withdrawn, its opening was sealed with a metal panel. The bar was located to the right of and slightly above the food cup. A downward force of at least 12 gm. on the bar activated a microswitch normally connected in the circuit of a Gerbrands feeder which delivered a standard .045-gm. Noyes pellet into the food cup. A loudspeaker was located 3 in. behind and slightly to the left of the front wall of the Skinner box. Both flashing lights (12 per sec.) and tones (750 cps) were used as stimuli.
Procedure. All training sessions lasted 25 min. per day. During the first three sessions, Ss were magazine-trained in the absence of the bar. Then the bar was inserted, and, for two sessions, each bar press was followed by a pellet of food. A few rats who did not spontaneously learn to press were given an extra remedial session during which bar pressing was “shaped.” Over the next four sessions the required ratio of responses to reinforcements was gradually increased to 4:1.
Then, for the subsequent five sessions, the bar was removed, and Ss were randomly assigned to Group A (for whom S2 was reliable but redundant) and Group B (for whom S2 was reliable and informative). Group A received the following sequence of events during each of its five “stimulus-training” sessions: once every 56 sec. on the average, a pellet of food was delivered into the food cup. The pellet was inevitably preceded by 2 sec. of S1 and 1½ sec. of S2. Both stimuli overlapped the delivery of the food pellet by ¼ sec., and both terminated together.
Group B also received this stimulus sequence immediately preceding the delivery of the food pellet. But in addition, Group B Ss received aperiodically, interspersed with the stimulus-food sequence, 2 sec. of S1 alone. The events for Group B occurred on the average of once every 30 sec.
For half the Ss in each group, S1 was a flashing light and S2 was a tone, and for the other half, the conditions were reversed: S1 was a tone and S2 was a flashing light.
During 5 days of such training, each group received 135 pairings of S1 and S2 with food, and Group B received in addition about 110 occurrences of S1 alone. Thus for both groups S2 was followed 100 percent of the time by food, while S1 was followed by food 100 percent of the time for Group A, but only 55 percent of the time for Group B.
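The percentages just given follow directly from the reported counts. As a minimal sketch, the proportion of a stimulus's occurrences that were followed by food can be computed as below; the function name `reliability` and the constant names are illustrative assumptions, and the count of 110 is the approximate figure given in the text.

```python
def reliability(pairings_with_food: int, occurrences_alone: int) -> float:
    """Proportion of a stimulus's occurrences that were followed by food."""
    total = pairings_with_food + occurrences_alone
    return pairings_with_food / total


# Counts reported in the text for the 5 days of stimulus training.
PAIRINGS = 135          # S1 and S2 followed by food (both groups)
S1_ALONE_GROUP_B = 110  # approximate number of S1-alone occurrences (Group B only)

s1_group_a = reliability(PAIRINGS, 0)                 # S1 never occurred alone
s1_group_b = reliability(PAIRINGS, S1_ALONE_GROUP_B)  # S1 sometimes occurred alone
s2_both_groups = reliability(PAIRINGS, 0)             # S2 never occurred alone

print(f"S1, Group A:     {s1_group_a:.0%}")
print(f"S1, Group B:     {s1_group_b:.0%}")
print(f"S2, both groups: {s2_both_groups:.0%}")
```

Only the reliability of S1 differs between the groups; the number and context of S2-food pairings are identical, which is the control the design depends on.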
The above description of training applies to all but 16 Ss, 8 Group B and 8 Group A. For these Ss, training was exactly as described above except that the stimulus-food pairings occurred for both groups on the average of one every 75 sec. instead of 56 sec., and Group B received a stimulus event on the average of once every 15 sec. instead of 30 sec., so that S1 was followed by food only 20 percent of the time for Group B. These 16 Ss were given seven 25-min. “stimulus-training” sessions. The data from these Ss were analyzed separately and not included in the overall analysis of variance.
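The 20 percent figure can also be recovered from the mean intervals alone. The sketch below assumes that events occur at steady long-run rates, so that the fraction of S1 events ending in food is the ratio of the pairing rate to the total event rate; the function name is hypothetical and the calculation is an approximation, not the authors' own.

```python
def fraction_followed_by_food(pairing_interval_s: float, event_interval_s: float) -> float:
    """Long-run fraction of S1 events that end in food.

    pairing_interval_s: mean seconds between stimulus-food pairings.
    event_interval_s:   mean seconds between all S1 events (paired or alone).
    """
    pairing_rate = 1.0 / pairing_interval_s
    event_rate = 1.0 / event_interval_s
    return pairing_rate / event_rate


# Main Group B: a pairing every 56 sec., some event every 30 sec.
main_group_b = fraction_followed_by_food(56, 30)
# The 16 separately analyzed Ss: a pairing every 75 sec., some event every 15 sec.
separate_group_b = fraction_followed_by_food(75, 15)

print(f"main Group B:                {main_group_b:.0%}")
print(f"separately analyzed Group B: {separate_group_b:.0%}")
```

For the separately analyzed Ss this gives 20 percent, as stated. For the main groups it gives roughly 54 percent, close to the 55 percent obtained from the reported counts, as expected when the intervals are averages and the count of S1-alone occurrences is approximate.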
Testing. On the day following the final stimulus-training session, Ss were tested as follows: the bar was reinserted and Test Session 1 began with each S pressing for food pellets on a fixed ratio of 3:1. The retraining presses continued until S had received 30 pellets. At this point the bar was disconnected and 10 min. of extinction ensued.
At the end of the 10 min., the bar was reconnected, not to the feeder, but to a timer which delivered on the same 3:1 schedule 1 sec. of whatever stimulus was being tested for secondary reinforcing strength. The test session continued until 25 min. had elapsed since the beginning of the extinction period, or until 10 min. after the first occurrence of a stimulus, whichever was longer.
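The timing rule for the test session can be stated compactly. The following is a sketch under stated assumptions: times are measured in minutes from the start of extinction, presses are counted from the moment the bar is reconnected, and a session in which no stimulus is ever earned ends at 25 min. (the text does not cover that case). All names are illustrative.

```python
from typing import Optional

EXTINCTION_MIN = 10.0     # bar disconnected; no stimulus can occur
MIN_SESSION_MIN = 25.0    # minimum length, timed from the start of extinction
POST_STIMULUS_MIN = 10.0  # guaranteed time after the first stimulus
RATIO = 3                 # presses required per 1-sec. stimulus (3:1 schedule)


def stimulus_delivered(press_number: int) -> bool:
    """True if this press (counted from reconnection, starting at 1) earns the stimulus."""
    return press_number > 0 and press_number % RATIO == 0


def session_end_min(first_stimulus_min: Optional[float]) -> float:
    """Time, in minutes from the start of extinction, at which the session ends."""
    if first_stimulus_min is None:
        # Assumption: with no stimulus ever earned, only the 25-min. rule applies.
        return MIN_SESSION_MIN
    if first_stimulus_min < EXTINCTION_MIN:
        raise ValueError("no stimulus can occur during the extinction period")
    # "Whichever was longer": the later of the two possible end points.
    return max(MIN_SESSION_MIN, first_stimulus_min + POST_STIMULUS_MIN)


for t in (11.0, 15.0, 18.5, None):
    print(t, "->", session_end_min(t))
```

Under this reading, the second criterion extends the session only for an S whose first stimulus came more than 15 min. after extinction began; every S therefore has a full 10-min. scoring period after its first stimulus.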
In the foregoing procedure, relearning following experimental extinction was used as the measure of secondary reinforcing strength on the assumption that it would be more rapid and less variable than would de novo learning of the skill of pressing the bar. A preliminary study had validated this technique showing that in such a test more bar presses would occur when followed by a stimulus previously associated with food than when the stimulus had not been associated with food.
After an interval of 1 day, Test Session 2 was conducted, identical to the first, except that this time the stimulus delivered following the 10-min. extinction period was the opposite from that tested in Test Session 1: for half of the Ss, S2 was tested in Test Session 1 and S1 was tested in Session 2; for the other half of the Ss, trained and tested subsequent to the first half, the stimuli were tested in the opposite order.
For Ss tested first with S2 and then with S1, Test Session 3 followed another intervening day, this time with Ss pressing for S2 again. Throughout the course of the 10-min. extinction and ensuing “pressing for stimuli” period, the cumulative total number of bar presses for each S was recorded each minute.
Response measures. The total number of bar presses in a 10-min. period following the first occurrence of the stimulus was the measure of secondary reinforcing strength. Since there were significant between-S and within-S correlations (rb = .53; rw = .34) of this measure with the total number of bar presses in the 10-min. extinction periods, this total number of bar presses in extinction was used as a control variable in analyses of covariance. (It should be noted that most of the bar presses during extinction occurred within the first 2–4 min. of the 10-min. extinction period.)
Furthermore, since it was found that in no case would analyses based only on data from Test Session 1 have led to any substantially different conclusions from those reported below, the means and results of analyses reported (unless otherwise noted) are based on combined data from Test Sessions 1 and 2.
Since by Test Session 3, there no longer appeared to be any differences between the experimental groups, the data from this session were not included in the final analyses.
Overall analysis. Neither of the hypotheses being tested depended upon the significance of the main effects of the overall analysis, but instead upon comparisons between the means shown in specific subcells of Table 1. The marginal entries in Table 1 give the overall means for Groups A and B (rows), and for S1 and S2 (columns). The overall mean for each group is based on data from 32 Ss each tested with S1 and with S2; the overall mean for each stimulus position is based on data from all 64 Ss. As seen from an inspection of Tables 1 and 2, Group A responded significantly more than Group B and the position of S1 was reliably more effective than that of S2.
It should be noted that although the groups were identically treated in all other respects, the 32 Ss tested with S1 first and S2 second were run subsequent to the 32 Ss tested with S2 first and with S1 second. No significant differences between these groups existed in the control variable, total presses in 10 min. of extinction. Nor did an analysis of covariance reveal any significant effects of order of testing (O), or of the interaction of order of testing with experimental group (G), or with stimulus position (P) (see Table 2).
Across all groups, the Ss responded more for the flashing lights than for the tones (F = 8.45; df = 1/55; P < .01, analysis of covariance).
Examination of the minute-by-minute response totals during the “pressing for stimuli” period revealed that the differences between groups tested at 10 min. had generally begun to appear after 3–5 min., and continued to increase out to 15 min., which was the longest period any S was permitted to bar press for stimuli during a given test session.
As expected from our hypotheses, the (P x G) interaction was highly significant (F = 17.71; df = 1/55; P < .001, analysis of covariance). Hence, we were justified in making within experimental group and stimulus position comparisons.
S2: Group B vs. Group A. On the basis of our first hypothesis, we expected that Group B Ss, for whom S2 was informative, should press more for S2 than Group A Ss, for whom S2 was redundant. The difference between the group means on the secondary reinforcing measure was in the predicted direction and significant beyond the .05 level (F = 4.03; df = 1/56). (The means are given in Table 1.) However, the effect was not statistically reliable in an analysis of covariance.
As mentioned above, 16 Ss, 8 each in Groups A and B, were trained with the number of occurrences of S1 alone for Group B increased so that 80 percent of the stimulus events for Group B were unaccompanied occurrences of S1. For these Ss, tested with S2 in Test Session 1, the means on the secondary reinforcing measure were in the predicted direction, 97.5 vs. 88.0, but the difference was short of statistical significance. However, when these data were analyzed in an analysis of covariance and combined by means of a critical ratio test with the data discussed above, the predicted effect was significant beyond the .05 level. (CR = 1.97 if the data from these 16 Ss are combined with those from the 64 Ss tested with S2 in Test Session 1 or Test Session 2; CR = 2.02 if the data are combined with those from the 32 Ss tested with S2 in Test Session 1 only.)
S1: Group A vs. Group B. Our second hypothesis predicted that S1 would be a more effective secondary reinforcer for Group A, for whom it was reliable and informative, than for Group B, for whom it was unreliable. This prediction was borne out by the data beyond the .001 level (F = 15.71; df = 1/55; analysis of covariance).
Group A: S1 vs. S2. As predicted from our first hypothesis, S1 was a much more effective secondary reinforcer than S2 for Group A. The difference between the means for these two stimulus positions, 115.1 vs. 65.8, was significant beyond the .001 level (F = 26.35; df = 1/27; analysis of covariance).
Pseudoconditioned and unconditioned control. Fourteen Ss, male albino rats, handled exactly as in the Main Experiment, were trained in groups of 7 Ss each with stimulus sequences identical to those of Groups A and B, except that the stimuli were never paired with the occurrence of food, which was delivered at least 10 sec. after the occurrence of the stimuli. The two different patterns of stimuli used in training had no effect upon the pseudoconditioned rate of bar pressing. The mean for the 14 Ss with both test sessions combined was 64.3. These 14 Ss bar pressed for the stimuli significantly less in both Test Session 1 (t = 3.41; df = 28; P < .005) and Test Session 2 (t = 2.72; df = 28; P < .02) than did the 16 Group A Ss bar pressing for the informative stimulus (S1) in each of the Main Experiment test sessions. Hence, in a group predicted to show a large secondary reinforcing effect, we did indeed find such an effect produced by our training procedure.
Eight Ss were exposed to the stimuli during training exactly as described above, except that the food pellets were eliminated entirely. The unconditioned rate of pressing for the stimuli was comparable to that of the pseudoconditioned group (M = 73.4).
The mean for the total group of pseudoconditioned and unconditioned Ss with both test sessions combined was 67.6, indicating that the secondary reinforcing value of the redundant stimulus for Group A of the Main Experiment (M = 65.8), once the unconditioned rate of pressing for stimuli is taken into account, was small, if not zero, as we predicted from our first hypothesis. The estimates of the pseudoconditioned and unconditioned scores may be somewhat high, however, since these Ss tended to have higher 10-min. extinction scores than did Ss of the Main Experiment.
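The combined control mean can be checked against the two group means reported above. The sketch below assumes, as an illustration, that the combined figure is simply the mean of the two groups weighted by their sizes (14 pseudoconditioned Ss, 8 unconditioned Ss); `pooled_mean` is a hypothetical helper name.

```python
def pooled_mean(groups):
    """Return the n-weighted mean of a list of (n, mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n

# (n, mean) for the pseudoconditioned and unconditioned control groups.
control_mean = pooled_mean([(14, 64.3), (8, 73.4)])
print(round(control_mean, 1))
```

Under that assumption the weighted mean agrees with the reported value of 67.6, which is the baseline against which the Group A mean of 65.8 for the redundant stimulus is compared.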
Activation control. To test whether the effects studied in the Main Experiment were related to secondary reinforcement or only to a possible activation effect of a stimulus formerly associated with food (Wyckoff, Sidowski, & Chambliss, 1958), 10 additional Ss were trained exactly as in the Main Experiment, 5 as in Group A and 5 as in Group B. However, during the testing of these Ss, the bar remained nonfunctional once it was disconnected from the feeder. Each S was tested at the same time as an identically trained S used in the Main Experiment. The yoked Activation Control S received only the stimuli earned by his Main Experiment partner. If the Main Experiment S pressed for a stimulus within 7½ sec. of a yoked Activation Control S’s response, the stimulus for the Activation Control S was delayed so that it was not delivered until 7½ sec. after his response. Hence spurious pairings of stimuli and pressing could not occur.
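The delay rule that kept stimuli from following the control animal's own presses can be sketched as follows. This is an illustration only: the function name, the use of seconds from session start, and the treatment of a single most recent control press are assumptions not spelled out in the report.

```python
def yoked_delivery_time(earned_at, last_control_press=None, guard=7.5):
    """Time (sec) at which the yoked control S receives a stimulus.

    earned_at: moment the Main Experiment partner earned the stimulus.
    last_control_press: most recent press by the control S, or None.
    guard: minimum gap (sec) required between a control press and a stimulus.
    """
    if last_control_press is None:
        return earned_at
    # Deliver on time unless that would fall within `guard` sec of the
    # control S's own press, in which case wait out the remainder.
    return max(earned_at, last_control_press + guard)

print(yoked_delivery_time(100.0, 96.0))
print(yoked_delivery_time(100.0, 90.0))
```

In the first call the control S pressed 4 sec before its partner earned a stimulus, so delivery is held until 103.5 sec; in the second the press was 10 sec earlier and the stimulus arrives on time at 100.0 sec.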
Thus, for these 10 Ss, any pressing which occurred during the retraining test period could have been due only to the activation effects of the stimuli plus remaining operant level; the possibility of secondary reinforcement was eliminated.
In Test Session 1, all 10 of the Activation Control Ss pressed less than did their secondary-reinforced partners (P < .002, binomial test, two-tailed). In Test Session 2, 9 out of 10 pressed less than did their yoked partners (P < .02, binomial test, two-tailed). Hence, we are quite certain that in the Main Experiment we were indeed studying secondary reinforcement.
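The binomial (sign) test used here asks how likely so lopsided a split of pairs would be if each control S were equally likely to press more or less than its partner. A minimal sketch follows, assuming the two-tailed probability is obtained by doubling the upper tail; the function name is illustrative.

```python
from math import comb

def two_tailed_sign_test(successes, n):
    """Two-tailed binomial probability of a split at least this extreme (p = 0.5)."""
    k = max(successes, n - successes)
    # Probability of k or more "successes" out of n fair coin flips.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)

print(two_tailed_sign_test(10, 10))
print(two_tailed_sign_test(9, 10))
```

For 10 of 10 pairs the doubled tail is 2/1024, about .00195, in line with the reported P < .002. For 9 of 10 the same calculation gives 22/1024, about .0215, so the reported bound of .02 for Test Session 2 may reflect rounding or a slightly different tail convention.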
Partial reinforcement effect control. In the Main Experiment we had found that in the presence of a reliable predictor (S2), training with partial reinforcement of S1 produced less total pressing for S1 as a secondary reinforcer than did 100 percent reinforcement. This confirmed our hypothesis but was opposite to the effect of increased resistance to extinction usually found with partial reinforcement. In order to see whether the presence of the reliable predictor was indeed the crucial factor, we ran two special control groups of 8 Ss each, one with the usual partial reinforcement procedure and one with 100 percent reinforcement. These groups were identical in all respects to those of the Main Experiment, except that the reliable predictor, S2, was omitted. When these groups were tested, the partial reinforcement group tended to press more for the stimuli than did the continuous reinforcement group (though the difference between the group means, 128.6 vs. 115.6, was not statistically significant). However, the difference between these two groups was in the opposite direction and significantly different (F = 5.71; df = 1/35; P < .025) from the difference found between Test Session 1 means of the 32 Ss of the Main Experiment tested with S1 during Test Session 1. Thus it appears that the presence of S2, the reliable predictor of food, did play the crucial role in determining the direction of the results obtained in our tests of the secondary reinforcing value of S1.
Our situation differed from those in which the effect of partial reinforcement on the establishment of secondary reinforcement has been studied (e.g., Klein, 1959; Zimmerman, 1957, 1959) in that during training all our Ss had a reliable predictor of food. The seemingly crucial importance of the presence or absence of a reliable predictor during training may help to explain the apparently conflicting results obtained from single-group vs. separate-group experimental designs in determining the effects of partial reinforcement on the strength of a secondary reinforcer (e.g., D’Amato, Lachman, & Kivy, 1958). It may be that partial reinforcement will increase resistance to extinction of a secondary reinforcer only if training occurs in the absence of a reliable predictor.
It should be noted that our formulation of the conditions necessary for the establishment of a secondary reinforcer is compatible with the well-known “discriminative stimulus hypothesis” of secondary reinforcement (Keller & Schoenfeld, 1950; Schoenfeld, Antonitis, & Bersh, 1950). Furthermore, our results with respect to S2: Group B vs. Group A could perhaps be considered analogous to those reported by Notterman (1951) in studies using rats as Ss in both a Skinner box and a straight alley.
Albino rats (N = 88, male) were trained to press a bar for food, then divided randomly into two groups and trained as follows for 135 trials in the same Skinner boxes with the bars removed: two stimuli, when paired, ended together and always preceded food. For Group A, the second, shorter stimulus (S2) was always redundant because the first stimulus (S1) had already given reliable information that food was to come. But for Group B, S2 was informative, because for them S1 also occurred sometimes alone without food.
After the training sessions, the bars were reinserted, bar pressing was retrained with food pellets, extinguished, and then retrained again, this time using 1 sec. of one of the training stimuli as a secondary reinforcer in place of the food. The total number of bar presses in 10 min. following the first occurrence of the secondary reinforcing stimulus was used as the measure of secondary reinforcing strength. The testing procedure was repeated after 48 hr. using the other training stimulus as secondary reinforcer, so that all Ss were tested with both stimuli in a balanced sequence.
Control experiments were run to provide baseline levels for pseudoconditioned and unconditioned rates of pressing, and for any activating effect of the stimuli.
As predicted, S2 was a stronger secondary reinforcer when it was informative than when it was redundant; S1 was a more effective secondary reinforcer than S2 in that group for which S2 was a redundant predictor of primary reinforcement. In addition, S1 was a more effective secondary reinforcer when it had been a reliable predictor of food.
A portion of the data reported in this paper was presented by Neal Miller in his Presidential Address to the American Psychological Association.
Learned Helplessness in the Dog
MARTIN E. P. SELIGMAN, Cornell University;
STEVEN F. MAIER, University of Pennsylvania; and
JAMES H. GEER, State University of New York at Stony Brook
Dogs given inescapable shock in a Pavlovian harness later seem to “give up” and passively accept traumatic shock in shuttlebox escape/avoidance training. A theoretical analysis of this phenomenon was presented. As predicted by this analysis, the failure to escape was alleviated by repeatedly compelling the dog to make the response which terminated shock. This maladaptive passive behavior in the face of trauma may be related to maladaptive passive behavior in humans. The importance of instrumental control over aversive events in the cause, prevention, and treatment of such behaviors was discussed.
This paper discusses a procedure that produces a striking behavior abnormality in dogs, outlines an analysis which predicts a method for eliminating the abnormality, and presents data which support the prediction. When a normal, naïve dog receives escape/avoidance training in a shuttlebox, the following behavior typically occurs: At the onset of electric shock, the dog runs frantically about, defecating, urinating, and howling, until it scrambles over the barrier and so escapes from shock. On the next trial, the dog, running and howling, crosses the barrier more quickly, and so on until efficient avoidance emerges. See Solomon and Wynne (1953) for a detailed description.
Overmier and Seligman (1967) have reported the behavior of dogs which had received inescapable shock while strapped in a Pavlovian harness 24 hr. before shuttlebox training. Typically, such a dog reacts initially to shock in the shuttlebox in the same manner as the naïve dog. However, in dramatic contrast to the naïve dog, it soon stops running and remains silent until shock terminates. The dog does not cross the barrier and escape from shock. Rather, it seems to “give up” and passively “accept” the shock. On succeeding trials, the dog continues to fail to make escape movements and thus takes 50 sec. of severe, pulsating shock on each trial. If the dog makes an escape or avoidance response, this does not reliably predict occurrence of future responses, as it does for the normal dog. Pretreated dogs occasionally escape or avoid by jumping the barrier and then revert to taking the shock. The behavior abnormality produced by prior inescapable shock is highly maladaptive: a naïve dog receives little shock in shuttlebox training because it escapes quickly and eventually avoids shock altogether. A dog previously exposed to inescapable shock, in contrast, may take unlimited shock without escaping or avoiding at all.
Aside from establishing the existence of this interference effect, the experiments of Overmier and Seligman (1967) and Seligman and Maier (1967) have pointed to the variables controlling this phenomenon. Three hypotheses concerning the necessary conditions under which this phenomenon occurs have been disconfirmed, and one has been confirmed.
Overmier and Seligman (1967) tested two hypotheses which had been advanced to explain similar phenomena: a competing-motor-response hypothesis (Carlson & Black, 1960) and an adaptation hypothesis (MacDonald, 1946). The competing-response hypothesis holds that, in the harness, the dog learned some motor response which alleviated shock. When placed in the shuttlebox, the dog performed this response, which was antagonistic to barrier jumping, and thus was retarded in its acquisition of barrier jumping. This hypothesis was tested in the following way: Dogs, whose skeleto-musculature was paralyzed by curare (eliminating the possibility of the execution of overt motor responses), received inescapable shock in the harness. These dogs subsequently failed to escape in the shuttlebox. Dogs, paralyzed by curare, but not given inescapable shock, escaped normally. These results disconfirmed the competing-response hypothesis. The adaptation hypothesis holds that the dogs adapted to shock in the harness and therefore were not motivated enough to escape shock in the shuttlebox. Overmier and Seligman (1967) found that dogs failed to escape in the shuttlebox, even when the shock intensity was increased to a point just below which some dogs are tetanized and thus physically prevented from jumping the barrier. These results are inconsistent with the adaptation hypothesis.
Seligman and Maier (1967) presented and tested an analysis of the phenomenon in terms of learned independence between shock termination and instrumental responding. Learning theory has traditionally stressed that two relationships between events produce learning: explicit contiguity (acquisition) and explicit dissociation (extinction). Seligman and Maier (1967) suggested that organisms are sensitive to a third relationship: independence between events. In particular, they proposed that, during inescapable shock in the harness, the dogs learned that shock termination occurred independently of their responses. Conventional learning theory allows that animals are sensitive to the conditional probability of shock termination given any specific response, and are also sensitive to the conditional probability of shock termination not given that response. In the special case in which these two probabilities are equal (independence), it is suggested that the animal integrates these two experiences. Thus, learning that shock termination is independent of a response reduces to learning that shock termination follows the response with a given probability, that shock termination occurs with a given probability if the response does not occur, and that these two probabilities do not differ. Such an integration could be called an expectation that shock termination is independent of responding. Seligman and Maier (1967) further proposed that one condition for the emission of active responses in the presence of electric shock is the expectation that responding leads to shock termination. In the absence of such an expectation, emitted responding should be less likely. When the dogs are subsequently placed in the shuttlebox, shock mediates the generalization of the initial learning to the new situation, and the probability of escape responding is thereby decreased.
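The notion of independence used in this analysis can be made concrete by comparing the two conditional probabilities. The sketch below is illustrative only: the function name and the trial records are hypothetical, and it assumes that both responding and not responding occur at least once in the record.

```python
def contingency(trials):
    """Compare P(shock ends | response) with P(shock ends | no response).

    trials: list of (responded, shock_ended) boolean pairs,
    one pair per observation interval.
    """
    with_response = [ended for responded, ended in trials if responded]
    without_response = [ended for responded, ended in trials if not responded]
    p_given_r = sum(with_response) / len(with_response)
    p_given_not_r = sum(without_response) / len(without_response)
    # A difference of zero means shock termination is independent of responding.
    return p_given_r, p_given_not_r, p_given_r - p_given_not_r

# Hypothetical records: in the first, responding always ends shock;
# in the second, shock ends half the time whatever the animal does.
escapable = [(True, True)] * 8 + [(False, False)] * 8
inescapable = ([(True, True)] * 4 + [(True, False)] * 4
               + [(False, True)] * 4 + [(False, False)] * 4)
print(contingency(escapable))
print(contingency(inescapable))
```

In the escapable record the two probabilities are 1.0 and 0.0; in the inescapable record both are 0.5 and their difference is zero, which is the special case the analysis identifies with the expectation that shock termination is independent of responding.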
This analysis was tested by varying the dogs’ control over shock termination in their initial experience with shock. For one group (Escape), pressing panels located about 3 in. from either side of their heads terminated shock. Another group (Yoked) received the identical shock, but shock termination occurred independently of its responses (since shock duration was determined by the responses of the Escape group). The Escape group escaped normally in the shuttlebox, while the Yoked group failed to escape in the shuttlebox. This result confirmed the hypothesis that the learning of independence of shock termination and instrumental responding is a necessary condition for the interference effect. It disconfirmed a punishment interpretation of interference to the effect that the dogs failed to escape in the shuttlebox because they had been punished in the harness by the onset of shock for active responding. This experiment equated the groups for punishment by the onset of shock; the groups differed only with respect to the independence and nonindependence of shock termination and the head-turning response. This theoretical analysis, as noted below, predicts that failure to escape shock should be eliminable by compelling the dog to respond in a situation in which its responses terminate shock. Repeated exposure to the response-relief contingency should replace the expectation that shock termination is independent of responding with the expectation that responding produces shock termination.
Learned “helplessness” was defined as the learning (or perception) of independence between the emitted responses of the organism and the presentation and/or withdrawal of aversive events. This term is not defined as the occurrence of a subjective feeling of helplessness (although such a possibility is not excluded), nor is it to be taken as a description of the appearance of the organism. Such learning seems to be a necessary condition for the occurrence of the interference effect. That such learning occurs, moreover, seems to be a necessary premise for any systematic explication of the concept of “hopelessness” advanced by Mowrer (1960, p. 197) and by Richter (1957), the concept of “helplessness” advanced by Cofer and Appley (1964, p. 452), and the concept of “external control of reinforcement” of Lefcourt (1966). Overmier and Seligman (1967) found that if 48 hr. elapsed between the inescapable shock in the harness and escape/avoidance training in the shuttlebox, dogs did not show the interference effect. Thus, although experience with inescapable trauma might be a necessary precondition for such maladaptive behavior, it was not a sufficient condition. However, Seligman and Maier (1967) found that the interference effect could be prolonged, perhaps indefinitely. If 24 hr. after inescapable shock in the harness the dog passively accepted shock in the shuttlebox, the dog again failed to escape after further rests of 168 hr. or longer. Thus, chronic failure to escape occurred when an additional experience with nonescaped shock followed the first experience.
Other work with infrahumans also suggests that lack of control (the independence of response and reinforcement) over the important events in an animal’s environment produces abnormal behavior. Richter (1957) reported that wild rats rapidly gave up swimming and drowned when placed in tanks of water from which there was no escape. If, however, the experimenter (E) repeatedly placed the rats in the tank and then took them out, or if E allowed them repeatedly to escape from his grasp, they swam for approximately 60 hr. before drowning. Richter concluded that loss of hope was responsible for the sudden deaths. Maier (1949) reported that rats showed positional fixations when they were given insoluble discrimination problems (problems in which the responses of the rat and the outcome are independent). Making the problems soluble, alone, did not break up these fixations. But the “therapeutic” technique of forcing the rats to jump to the nonfixated side when the problem was soluble eliminated the fixations. Liddell (1956) reported that inescapable shocks produced experimental “neurosis” in lambs. Masserman (1943, pp. 79–85) reported that cats which instrumentally controlled the presentation of food were less prone to experimental neurosis than cats which did not have such control.
The maladaptive failure of dogs to escape shock resembles some human behavior disorders in which individuals passively accept aversive events without attempting to resist or escape. Bettelheim (1960) described the reaction of certain prisoners to the Nazi concentration camps:
Prisoners who came to believe the repeated statements of the guards—that there was no hope for them, that they would never leave the camp except as a corpse—who came to feel that their environment was one over which they could exercise no influence whatsoever, these prisoners were in a literal sense, walking corpses. In the camps they were called “moslems” (Müselmänner) because of what we erroneously viewed as a fatalistic surrender to the environment, as Mohammedans are supposed to blandly accept their fate. . . they were people who were so deprived of affect, self-esteem, and every form of stimulation, so totally exhausted, both physically and emotionally, that they had given the environment total power over them [pp. 151—152].
Bleuler (1950, p. 40) described the passive behavior of some of his patients:
The sense of self-preservation is often reduced to zero. The patients do not bother anymore about whether they starve or not, whether they lie on a snowbank or on a red-hot oven. During a fire in the hospital, a number of patients had to be led out of the threatened area; they themselves would never have moved from their places; they would have allowed themselves to be burned or suffocated without showing an affective response.
It is suggested that an explanation which parallels the analysis of the interference effect in dogs may hold for such psychopathological behavior in humans. Consider an individual who has learned that his responses and the occurrence and withdrawal of traumatic events are independent. If a necessary condition for the initiation of responding is the expectation that his responses may control the trauma, such an individual should react passively in the face of trauma.
The time course of the interference effect found with dogs suggests that such human disorders may also be subject to temporal variables. Experience with traumatic inescapable shock produces interference with subsequent escape learning. This interference dissipates over time. Traumatic events must intervene if permanent failure to escape shock is to occur. This suggests that one traumatic experience may be sufficient to predispose an individual to future maladaptive behavior, producing, perhaps, a temporary disturbance which Wallace (1957) has called the “disaster syndrome.” In order for this experience to be translated into a chronic disorder, however, subsequent traumatic events may have to occur.
Because the interference effect in dogs and these forms of human psychopathology may be acquired in similar ways, information about the modification of the interference effect may lead to insights concerning the treatment of such psychopathological behavior in humans. Two categories of treatment could be attempted: prevention or “immunization” against the effects of future inescapable shock (proactive), or modification of maladaptive behavior after inescapable shock has had its effect (retroactive). Seligman and Maier (1967) reported that prior experience with escapable shock immunizes dogs against the effects of later inescapable shock. Thus, preventive steps have been shown to be effective.
The above analysis of the interference effect predicts that by exposing a dog to the contingent relationship of shock termination and its responses the interference effect established by prior exposure to unavoidable shock should be eliminated. This experiment reports an elimination of learned “helplessness” in dogs that had chronically failed to escape from traumatic shock. Such retroactive treatment resembles the traditional treatment of human psychopathology more than does the preventive procedure.
The Ss were four mongrel dogs. They weighed 25–29 lb., were 15–19 in. high at the shoulder, and were housed in individual cages with food and water freely available. Each dog chronically failed to escape shock (see Procedure) as a result of receiving inescapable shock in Experiment I of Seligman and Maier (1967).
The apparatus is described fully by Overmier and Seligman (1967). In brief, it consisted of two separate units: a Pavlovian harness, in which initial exposure to inescapable shock occurred, and a dog shuttlebox, in which escape/avoidance training and modification of the failure to escape were carried out.
The unit in which each S was exposed to inescapable shock was a rubberized cloth hammock located inside a shielded white sound-reducing cubicle. The hammock was constructed so that S’s legs hung down below his body through four holes. The S’s legs were secured in this position, and S was strapped into the hammock. The S’s head was held in position by panels placed on either side and a yoke between them across S’s neck. Shock was applied from a 500-VAC transformer through a fixed resistor of 20,000 ohms. The shock was applied to S through brass-plate electrodes coated with electrode paste and taped to the footpads of S’s hind feet. The shock intensity was 6.0 ma.
The unit in which S received escape/avoidance trials was a two-way shuttlebox with two black compartments separated by an adjustable barrier. Running along the upper part of the front of the shuttlebox were two one-way mirror windows, through which E could observe and which E could open. The barrier was set at S’s shoulder height. Each compartment was illuminated by two 50-w. and one 7½-w. lamps. The CS consisted of turning off the four 50-w. lamps, which resulted in a sharp decrease in illumination. The UCS was 4.5-ma. electric shock applied through the grid floors from a 500-VAC source. The polarity pattern of the grid bars was scrambled four times a second. Whenever S crossed from one side of the shuttlebox to the other, photocell beams were interrupted, and the trial was terminated. Latency of crossing was measured from CS onset to the nearest .01 sec. by an electric clock. Seventy decibels (SPL) white noise was present in both units.
Inescapable shock exposure. Each S was strapped into the harness and given 64 trials of inescapable shock. The shocks were presented in a sequence of trials of diminishing duration. The mean intershock interval was 90 sec. with a 60–120 sec. range. Each S received a total of 226 sec. of shock.
Instrumental escape/avoidance training. Twenty-four hours after inescapable shock exposure, Ss received 10 trials of instrumental escape/avoidance training in the shuttlebox. The onset of the CS (dimmed illumination) initiated each trial, and the CS remained on until trial termination. The CS-UCS onset interval was 10 sec. If S crossed to the other compartment during the interval, the CS terminated, and no shock was presented. If S did not cross during the CS-UCS interval, shock came on and remained on until S crossed. If no response occurred within 60 sec. of CS onset, the trial was automatically terminated, and a 60-sec. latency was recorded. The average intertrial interval was 90 sec. with a 60–120 sec. range.
All four Ss failed to escape shock on each of the 10 trials. Thus each S took 500 sec. of shock during the first escape/avoidance session.
Testing for chronic failure to escape. Seven days later, Ss were again placed in the shuttlebox and given 10 further escape/avoidance trials. Again, each S failed to escape shock on every trial (although one S avoided shock once, on the fifth trial). By this time, each S was failing to make any escape movements and was remaining silent during shock on every trial. Previous work has shown that when a dog remains silent and fails to make escape movements during shock, this reliably predicts that the dog will continue to fail to escape and avoid.
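The trial rules and the 500-sec. total can be traced with a short sketch. The names below are illustrative, and the handling of a crossing at exactly 10 or 60 sec. is an assumption, since the report does not specify those boundary cases.

```python
CS_UCS_INTERVAL = 10.0  # sec from CS onset to shock onset
TRIAL_LIMIT = 60.0      # sec after which the trial ends automatically

def score_trial(crossing_latency=None):
    """Return (outcome, recorded_latency, shock_seconds) for one trial.

    crossing_latency: sec from CS onset to barrier crossing,
    or None if S never crossed.
    """
    if crossing_latency is None or crossing_latency >= TRIAL_LIMIT:
        # No response: a 60-sec. latency is recorded and shock runs to the limit.
        return "failure", TRIAL_LIMIT, TRIAL_LIMIT - CS_UCS_INTERVAL
    if crossing_latency < CS_UCS_INTERVAL:
        # Crossing before shock onset avoids shock entirely.
        return "avoidance", crossing_latency, 0.0
    # Crossing after shock onset terminates the shock.
    return "escape", crossing_latency, crossing_latency - CS_UCS_INTERVAL

# Ten trials on which S never crossed the barrier.
session = [score_trial(None) for _ in range(10)]
total_shock = sum(shock for _, _, shock in session)
print(total_shock)  # 500.0
```

Each failed trial contributes 60 - 10 = 50 sec. of shock, so ten failures give the 500 sec. reported for every S.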
Treatment. The attempt at behavioral modification consisted of two distinct phases: all Ss received Phase I; if Phase I succeeded, as it did with one of the four dogs, no further treatment was given, and “recovery” (see Recovery section below) was begun. The other three Ss received Phase II following Phase I.
Phase I: no barrier, calling. At intervals ranging from 4 to 25 days following the demonstration that the interference was chronic, Ss were again placed in the shuttlebox. The escape/avoidance contingencies used previously remained in effect during Phase I and II trials. The barrier dividing the two sides of the shuttlebox (formerly set at shoulder height) was removed. Thus in order to escape or avoid, S had only to step over the remaining 5-in. high divider. In addition, E opened the observation window on the side of the shuttlebox opposite the side S was on and called to S (“Here, boy”) during shock and during the CS-UCS interval. The rationale for such treatment was to encourage S to make the appropriate response on its own, thus exposing itself to the response-reinforcement contingency. One S responded to this treatment and began to escape and avoid. The remaining Ss then received Phase II.
Phase II: forced escape/avoidance exposure. Phase II began when it was clear that Phase I would not produce escape and avoidance in the remaining three Ss since they remained silent and motionless during Phase I. The S was removed from the shuttlebox, and two long leashes were tied around its neck. The S was put back into the shuttlebox, and escape/avoidance trials continued. The end of each leash was brought out at opposite ends of the shuttlebox. Thus, two Es were able to drag S back and forth across the shuttlebox by pulling one of the leashes. Phase II consisted of pulling S across to the safe side on each trial during shock or during the CS-UCS interval. A maximum of 25 Phase II trials per day were given. The rationale for Phase II was to force S to expose himself to the response-reinforcement contingency. Such “directive therapy” continued until S began to respond without being pulled by E.
Recovery. Following Phase II (for three dogs) and Phase I (for the other dog), each S received further escape/avoidance trials. The barrier height was gradually increased over the course of 15 trials until shoulder height had been reached. Ten further escape/avoidance trials were then given. The last five of these recovery trials (with the barrier at shoulder height) were administered from 5 to 10 days following the first five trials with the barrier at this height. This tested the durability of the recovery.
Figure 1 presents the results of this study. It is clear that the procedures employed in Phases I and II of treatment were wholly successful in breaking up the maladaptive failure to escape and avoid shock. With the single exception of one S on one trial, the dogs had not escaped or avoided the intense shock prior to treatment. This is indicated by the mean percentage of escape or avoidance responses present at or near zero during the pretreatment phase. Following Phase I (no barrier, calling) and Phase II (forced escape/avoidance exposure) of treatment, posttreatment recovery trials without forcing or calling were given to determine the effectiveness of the treatment. All Ss escaped or avoided on every recovery trial. The behavior of one S was successfully modified by Phase I of treatment. After sporadic failures to escape shock during this phase, it began to escape and avoid reliably after 20 Phase I trials. With the barrier increased to shoulder height, it continued to avoid reliably. The other three dogs all responded to treatment in a fashion similar to one another: after failing to respond to Phase I, each of these dogs began to respond on its own after differing numbers of Phase II trials on which it had to be pulled to safety. One of the Phase II Ss required 20 forced exposures to escape and avoid in Phase II before it began to respond without being pulled; the other two required 35 and 50 such trials. During the course of Phase II trials, progressively less forceful pulls were required before S crossed to the safe side. With the barrier increased to shoulder height following Phase II, each S escaped and avoided efficiently. At this stage, the dogs responded like normal dogs at or near asymptotic avoidance performance.
The chronic failure of dogs to escape shock can be eliminated by physically compelling them to engage repeatedly in the response which terminates shock. Solomon, Kamin, and Wynne (1953) also attenuated maladaptive behavior in dogs by forcing them to expose themselves to the experimental contingencies. They reported that dogs continued to make avoidance responses long after shock was no longer present in the situation. A glass barrier, which prevented the dogs from making the response and forced them to “reality test,” attenuated the persistent responding somewhat. Such “directive therapy” also is similar to Maier and Klee’s (1945) report that abnormal positional fixations in rats were eliminated by forcing the rat to respond to the nonfixated side, and to Masserman’s (1943, pp. 76-77) report that “neurotic” feeding inhibition could be overcome by forcing the cat into close proximity with food.
Seligman and Maier (1967) suggested that during its initial experience with inescapable shock, S learns that its responses are independent of shock termination. They further suggested that this learning not only reduces the probability of response initiation to escape shock, but also inhibits the formation of the response-relief association if S does make an escape or avoidance response in the shuttlebox. That the dogs escaped and avoided at all after being forcibly exposed to the response-relief contingency confirmed the suggestion that they had initially learned that their responses were independent of shock termination and that this learning was contravened by forcible exposure to the contingency. The finding that so many forced exposures to the contingency were required before they responded on their own (before they “caught on”) confirmed the suggestion that the initial learning inhibited the formation of a response-relief association when the dog made a relief-producing response.
The perception of degree of control over the events in one’s life seems to be an important determinant of the behavior of human beings. Lefcourt (1966) has summarized extensive evidence which supports this view. Cromwell, Rosenthal, Shakow, and Kahn (1961), for example, reported that schizophrenics perceive reinforcement to be externally controlled (reinforcement occurs independently of their responses) to a greater extent than normals. Such evidence, along with the animal data cited above, suggests that lack of control over reinforcement may be of widespread importance in the development of psychopathology in both humans and infrahumans.
In conclusion, one might speculate that experience with traumatic events in which the individual can do nothing to eliminate or mitigate the trauma results in passive responding to future aversive events in humans. The findings of Seligman and Maier (1967) suggest that an individual might be immunized against the debilitating effects of uncontrollable trauma by having had prior experience with instrumental control over the traumatic events. Finally, the findings suggest that the pathological behavior resulting from inescapable trauma might be alleviated by repeated exposure of the individual to the trauma under conditions in which his responses were instrumental in obtaining relief. It has been demonstrated that normal escape/avoidance behavior can be produced in “passive” dogs by forcibly exposing them to relief-producing responses.
|This research was supported by grants to R. L. Solomon from the National Science Foundation (GB-2428) and the National Institute of Mental Health (MH-04202). The authors are grateful to him for his advice in the conduct and reporting of this experiment. The authors also thank J. P. Brady and J. Mecklenburger for their critical readings of the manuscript.|
|At the time this work was carried out, the first author was a National Science Foundation predoctoral fellow at the University of Pennsylvania.|
|National Institute of Mental Health predoctoral fellow.|
Learning of Visceral and Glandular Responses
NEAL E. MILLER, Yale University
There is a strong traditional belief in the inferiority of the autonomic nervous system and the visceral responses that it controls. The recent experiments disproving this belief have deep implications for theories of learning, for individual differences in autonomic responses, for the cause and the cure of abnormal psychosomatic symptoms, and possibly also for the understanding of normal homeostasis. Their success encourages investigators to try other unconventional types of training. Before describing these experiments, let me briefly sketch some elements in the history of the deeply entrenched, false belief in the gross inferiority of one major part of the nervous system.
HISTORICAL ROOTS AND
Since ancient times, reason and the voluntary responses of the skeletal muscles have been considered to be superior, while emotions and the presumably involuntary glandular and visceral responses have been considered to be inferior. This invidious
Many, though not all, psychiatrists have made an invidious distinction between the hysterical and other symptoms that are mediated by the cerebrospinal nervous system and the psychosomatic symptoms that are mediated by the autonomic nervous system. Whereas the former are supposed to be subject to a higher type of control that is symbolic, the latter are presumed to be only the direct physiological consequences of the type and intensity of the patient’s emotions.4
Similarly, students of learning have made a distinction between a lower form, called classical conditioning and thought to be involuntary, and a superior form variously called trial-and-error learning, operant conditioning, type II conditioning, or instrumental learning and believed to be responsible for voluntary behavior. In classical conditioning, the reinforcement must be by an unconditioned stimulus that already elicits the specific response to be learned; therefore, the possibilities are quite limited. In instrumental learning, the reinforcement, called a reward, has the property of strengthening any immediately preceding response.
Therefore, the possibilities for reinforcement are much greater; a given reward may reinforce any one of a number of different responses, and a given response may be reinforced by any one of a number of different rewards. Finally, the foregoing invidious distinctions have coalesced into the strong traditional belief that the superior type of instrumental learning involved in the superior voluntary behavior is possible only for skeletal responses mediated by the superior cerebrospinal nervous system, while, conversely, the inferior classical conditioning is the only kind possible for the inferior, presumably involuntary, visceral and emotional responses mediated by the inferior autonomic nervous system. Thus, in a recent summary generally considered authoritative, Kimble5 states the almost universal belief that “for autonomically mediated behavior, the evidence points unequivocally to the conclusion that such responses can be modified by classical, but not instrumental, training methods.” Upon examining the evidence, however, one finds that it consists only of failure to secure instrumental learning in two incompletely reported exploratory experiments and a vague allusion to the Russian literature.6 It is only against a cultural background of great prejudice that such weak evidence could lead to such a wrong conviction.
The belief that instrumental learning is possible only for the cerebrospinal system and, conversely, that the autonomic nervous system can be modified only by classical conditioning has been used as one of the strongest arguments for the notion that instrumental learning and classical conditioning are two basically different phenomena rather than different manifestations of the same phenomenon under different conditions. But for many years I have been impressed with the similarity between the laws of classical conditioning and those of instrumental learning, and with the fact that, in each of these two situations, some of the specific details of learning vary with the specific conditions of learning. Failing to see any clear-cut dichotomy, I have assumed that there is only one kind of learning.7 This assumption has logically demanded that instrumental training procedures be able to produce the learning of any visceral responses that could be acquired through classical conditioning procedures. Yet it was only a little over a dozen years ago that I began some experimental work on this problem and a somewhat shorter time ago that I first, in published articles,8 made specific sharp challenges to the traditional view that the instrumental learning of visceral responses is impossible.
One of the difficulties of investigating the instrumental learning of visceral responses stems from the fact that the responses that are the easiest to measure—namely, heart rate, vasomotor responses, and the galvanic skin response—are known to be affected by skeletal responses, such as exercise, breathing, and even tensing of certain muscles, such as those in the diaphragm. Thus, it is hard to rule out the possibility that, instead of directly learning a visceral response, the subject has learned a skeletal response the performance of which causes the visceral change being recorded.
One of the controls I planned to use was the paralysis of all skeletal responses through administration of curare, a drug which selectively blocks the motor end plates of skeletal muscles without eliminating consciousness in human subjects or the neural control of visceral responses, such as the beating of the heart. The muscles involved in breathing are paralyzed, so the subject’s breathing must be maintained through artificial respiration. Since it seemed unlikely that curarization and other rigorous control techniques would be easy to use with human subjects, I decided to concentrate first on experiments with animals.
Originally I thought that learning would be more difficult when the animal was paralyzed, under the influence of curare, and therefore I decided to postpone such experiments until ones on nonparalyzed animals had yielded some definitely promising results. This turned out to be a mistake because, as I found out much later, paralyzing the animal with curare not only greatly simplifies the problem of recording visceral responses without artifacts introduced by movement but also apparently makes it easier for the animal to learn, perhaps because paralysis of the skeletal muscles removes sources of variability and distraction. Also, in certain experiments I made the mistake of using rewards that induced strong unconditioned responses that interfered with instrumental learning.
One of the greatest difficulties, however, was the strength of the belief that instrumental learning of glandular and visceral responses is impossible. It was extremely difficult to get students to work on this problem, and when paid assistants were assigned to it, their attempts were so half-hearted that it soon became more economical to let them work on some other problem which they could attack with greater faith and enthusiasm. These difficulties and a few preliminary encouraging but inconclusive early results have been described elsewhere.9
SUCCESS WITH SALIVATION
The first clear-cut results were secured by Alfredo Carmona and me in an experiment on the salivation of dogs. Initial attempts to use food as a reward for hungry dogs were unsuccessful, partly because of strong and persistent unconditioned salivation elicited by the food. Therefore, we decided to use water as a reward for thirsty dogs. Preliminary observations showed that the water had no appreciable effects one way or the other on the bursts of spontaneous salivation. As an additional precaution, however, we used the experimental design of rewarding dogs in one group whenever they showed a burst of spontaneous salivation, so that they would be trained to increase salivation, and rewarding dogs in another group whenever there was a long interval between spontaneous bursts, so that they would be trained to decrease salivation. If the reward had any unconditioned effect, this effect might be classically conditioned to the experimental situation and therefore produce a change in salivation that was not a true instance of instrumental learning. But in classical conditioning the reinforcement must elicit the response that is to be acquired. Therefore, conditioning of a response elicited by the reward could produce either an increase or a decrease in salivation, depending upon the direction of the unconditioned response elicited by the reward, but it could not produce a change in one direction for one group and in the opposite direction for the other group. The same type of logic applies for any unlearned cumulative aftereffects of the reward; they could not be in opposite directions for the two groups. With instrumental learning, however, the reward can reinforce any response that immediately precedes it; therefore, the same reward can be used to produce either increases or decreases.
The results are presented in Fig. 1, which summarizes the effects of 40 days of training with one 45-minute training session per day. It may be seen that in this experiment the learning proceeded slowly. However, statistical analysis showed that each of the trends in the predicted rewarded direction was highly reliable.10
Since the changes in salivation for the two groups were in opposite directions, they cannot be attributed to classical conditioning. It was noted, however, that the group rewarded for increases seemed to be more aroused and active than the one rewarded for decreases. Conceivably, all we were doing was to change the level of activation of the dogs, and this change was, in turn, affecting the salivation. Although we did not observe any specific skeletal responses, such as chewing movements or panting, which might be expected to elicit salivation, it was difficult to be absolutely certain that such movements did not occur. Therefore, we decided to rule out such movements by paralyzing the dogs with curare, but we immediately found that curare had two effects which were disastrous for this experiment: it elicited such copious and continuous salivation that there were no changes in salivation to reward, and the salivation was so viscous that it almost immediately gummed up the recording apparatus.
In the meantime, Jay Trowill, working with me on this problem, was displaying great ingenuity, courage, and persistence in trying to produce instrumental learning of heart rate in rats that had been paralyzed by curare to prevent them from “cheating” by muscular exertion to speed up the heart or by relaxation to slow it down. As a result of preliminary testing, he selected a dose of curare (3.6 milligrams of d-tubocurarine chloride per kilogram, injected intraperitoneally) which produced deep paralysis for at least 3 hours, and a rate of artificial respiration (inspiration-expiration ratio 1:1; 70 breaths per minute; peak pressure reading, 20 cm-H2O) which maintained the heart at a constant and normal rate throughout this time.
In subsequent experiments, DiCara and I have obtained similar effects by starting with a smaller dose (1.2 milligrams per kilogram) and constantly infusing additional amounts of the drug, through intraperitoneal injection, at the rate of 1.2 milligrams per kilogram per hour, for the duration of the experiment. We have recorded, electromyographically, the response of the muscles, to determine that this dose does indeed produce a complete block of the action potentials, lasting for at least an hour after the end of infusion. We have found that if parameters of respiration and the face mask are adjusted carefully, the procedure not only maintains the heart rate of a 500-gram control animal constant but also maintains the vital signs of temperature, peripheral vasomotor responses, and the pCO2 of the blood constant.
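The per-kilogram doses above can be converted into absolute amounts for a single animal. The following is a minimal sketch of that arithmetic, assuming a hypothetical helper named `curare_schedule` and a hypothetical 3-hour session; only the 1.2 milligrams per kilogram priming dose, the 1.2 milligrams per kilogram per hour infusion rate, and the 500-gram body mass come from the text.

```python
def curare_schedule(body_mass_g, infusion_hours,
                    priming_mg_per_kg=1.2, infusion_mg_per_kg_per_hr=1.2):
    """Convert per-kilogram doses into absolute amounts for one animal.

    Returns (priming_mg, infusion_mg_per_hr, total_mg).
    """
    mass_kg = body_mass_g / 1000.0
    priming_mg = priming_mg_per_kg * mass_kg
    infusion_mg_per_hr = infusion_mg_per_kg_per_hr * mass_kg
    total_mg = priming_mg + infusion_mg_per_hr * infusion_hours
    return priming_mg, infusion_mg_per_hr, total_mg


# A 500-gram rat over a 3-hour session (the session length is an assumption).
priming, rate, total = curare_schedule(500, 3)
print(f"priming {priming:.2f} mg, infusion {rate:.2f} mg/hr, total {total:.2f} mg")
```

For a 500-gram animal this works out to a 0.6-milligram priming dose followed by 0.6 milligram per hour of infusion.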
Since there are not very many ways to reward an animal completely paralyzed by curare, Trowill and I decided to use direct electrical stimulation of rewarding areas of the brain. There were other technical difficulties to overcome, such as devising the automatic system for rewarding small changes in heart rate as recorded by the electrocardiogram. Nevertheless, Trowill at last succeeded in training his rats.11 Those rewarded for an increase in heart rate showed a statistically reliable increase, and those rewarded for a decrease in heart rate showed a statistically reliable decrease. The changes, however, were disappointingly small, averaging only 5 percent in each direction.
The next question was whether larger changes could be achieved by improving the technique of training. DiCara and I used the technique of shaping—in other words, of immediately rewarding first very small, and hence frequently occurring, changes in the correct direction and, as soon as these had been learned, requiring progressively larger changes as the criterion for reward. In this way, we were able to produce in 90 minutes of training changes averaging 20 percent in either direction.12
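The shaping procedure can be sketched as two small rules: reward any change that meets the current criterion, and raise the criterion once it is being met reliably. The sketch below is illustrative only. The step size, the window of recent trials, and the mastery level are assumptions, since the text does not report the values actually used.

```python
def reward_due(baseline_bpm, current_bpm, criterion_pct, direction):
    """Return True when heart rate has changed by at least criterion_pct
    (percent of baseline) in the rewarded direction."""
    change_pct = 100.0 * (current_bpm - baseline_bpm) / baseline_bpm
    if direction == "decrease":
        change_pct = -change_pct
    return change_pct >= criterion_pct


def next_criterion(criterion_pct, recent_rewards,
                   step_pct=2.0, window=10, mastery=0.8):
    """Raise the criterion once it has been met on most of the recent trials.

    step_pct, window, and mastery are hypothetical values chosen for illustration.
    """
    if len(recent_rewards) >= window:
        hit_rate = sum(recent_rewards[-window:]) / window
        if hit_rate >= mastery:
            return criterion_pct + step_pct
    return criterion_pct


# A rat rewarded for slowing its heart: a drop from 400 to 392 beats per minute
# is a 2 percent change and meets a 2 percent criterion.
print(reward_due(400, 392, 2.0, "decrease"))
print(next_criterion(2.0, [True] * 10))
```

Starting with a small criterion keeps rewards frequent early in training, which is the point of shaping; the larger changes are demanded only after the smaller ones have been learned.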
KEY PROPERTIES OF LEARNING: DISCRIMINATION AND RETENTION
Does the learning of visceral responses have the same properties as the learning of skeletal responses? One of the important characteristics of the instrumental learning of skeletal responses is that a discrimination can be learned, so that the responses are more likely to be made in the stimulus situations in which they are rewarded than in those in which they are not. After the training of the first few rats had convinced us that we could produce large changes in heart rate, DiCara and I gave all the rest of the rats in the experiment described above 45 minutes of additional training with the most difficult criterion. We did this in order to see whether they could learn to give a greater response during a “time-in” stimulus (the presence of a flashing light and a tone) which indicated that a response in the proper direction would be rewarded than during a “time-out” stimulus (absence of light and tone) which indicated that a correct response would not be rewarded.
Figure 2 shows the record of one of the rats given such training. Before the beginning of the special discrimination training it had slowed its heart from an initial rate of 350 beats per minute to a rate of 230 beats per minute. From the top record of Fig. 2 one can see that, at the beginning of the special discrimination training, there was no appreciable reduction in heart rate that was specifically associated with the time-in stimulus. Thus it took the rat considerable time after the onset of this stimulus to meet the criterion and get the reward. At the end of the discrimination training the heart rate during time-out remained approximately the same, but when the time-in light and tone came on, the heart slowed down and the criterion was promptly met. Although the other rats showed less change than this, by the end of the relatively short period of discrimination training their heart rate did change reliably (P < .001) in the predicted direction when the time-in stimulus came on. Thus, it is clear that instrumental visceral learning has at least one of the important properties of instrumental skeletal learning—namely, the ability to be brought under the control of a discriminative stimulus.
Another of the important properties of the instrumental learning of skeletal responses is that it is remembered. DiCara and I performed a special experiment to test the retention of learned changes in heart rate.13 Rats that had been given a single training session were returned to their home cages for 3 months without further training. When curarized again and returned to the experimental situation for nonreinforced test trials, rats in both the “increase” and the “decrease” groups showed good retention by exhibiting reliable changes in the direction rewarded in the earlier training.
ESCAPE AND AVOIDANCE LEARNING
Is visceral learning by any chance peculiarly limited to reinforcement by the unusual reward of direct electrical stimulation of the brain, or can it be reinforced by other rewards in the same way that skeletal learning can be? In order to answer this question, DiCara and I14 performed an experiment using the other of the two forms of thoroughly studied reward that can be conveniently used with rats which are paralyzed by curare—namely, the chance to avoid, or escape from, mild electric shock. A shock signal was turned on; after it had been on for 10 seconds it was accompanied by brief pulses of mild electric shock delivered to the rat’s tail. During the first 10 seconds the rat could turn off the shock signal and avoid the shock by making the correct response of changing its heart rate in the required direction by the required amount. If it did not make the correct response in time, the shocks continued to be delivered until the rat escaped them by making the correct response, which immediately turned off both the shock and the shock signal.
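The contingency just described can be expressed as a rule applied to a record of heart rate following signal onset. The sketch below is a hedged illustration: the sampling format, the function name, and the criterion expressed in beats per minute are assumptions, and only the 10-second interval between signal onset and the first shock pulse comes from the text.

```python
SIGNAL_TO_SHOCK_SEC = 10.0  # shock pulses begin 10 seconds after signal onset


def classify_shock_trial(samples, baseline_bpm, required_change_bpm, direction):
    """Label one trial from heart-rate samples taken after signal onset.

    samples: list of (seconds_since_signal_onset, bpm) pairs in time order.
    direction: "increase" or "decrease", the rewarded direction for the group.
    Returns (outcome, response_time); response_time is None if the
    criterion is never met within the record.
    """
    sign = 1.0 if direction == "increase" else -1.0
    for t, bpm in samples:
        if sign * (bpm - baseline_bpm) >= required_change_bpm:
            # A correct response before shock onset avoids the shock;
            # a later one escapes it and turns off the signal.
            outcome = "avoidance" if t < SIGNAL_TO_SHOCK_SEC else "escape"
            return outcome, t
    return "no response", None


record = [(2.0, 400), (6.0, 388), (12.0, 380)]
print(classify_shock_trial(record, 400, 10, "decrease"))
print(classify_shock_trial(record, 400, 15, "decrease"))
```

With the same record, raising the required change from 10 to 15 beats per minute turns an avoidance into an escape, which is how a stricter criterion makes the task harder.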
For one group of curarized rats, the correct response was an increase in heart rate; for the other group it was a decrease. After the rats had learned to make small responses in the proper direction, they were required to make larger ones. During this training the shock signals were randomly interspersed with an equal number of “safe” signals that were not followed by shock; the heart rate was also recorded during so-called blank trials—trials without any signals or shocks. For half of the rats the shock signal was a tone and the “safe” signal was a flashing light; for the other half the roles of these cues were reversed.
The results are shown in Fig. 3. Each of the 12 rats in this experiment changed its heart rate in the rewarded direction. As training progressed, the shock signal began to elicit a progressively greater change in the rewarded direction than the change recorded during the blank trials; this was a statistically reliable trend. Conversely, as training progressed, the “safe” signal came to elicit a statistically reliable change in the opposite direction, toward the initial base line. These results show learning when escape and avoidance are the rewards; this means that visceral responses in curarized rats can be reinforced by rewards other than direct electrical stimulation of the brain. These rats also discriminate between the shock and the “safe” signals. You will remember that, with noncurarized thirsty dogs, we were able to use yet another kind of reward, water, to produce learned changes in salivation.
TRANSFER TO NONCURARIZED STATE: MORE EVIDENCE AGAINST MEDIATION
In the experiments discussed above, paralysis of the skeletal muscles by curare ruled out the possibility that the subjects were learning the overt performance of skeletal responses which were indirectly eliciting the changes in the heart rate. It is barely conceivable, however, that the rats were learning to send out from the motor cortex central impulses which would have activated the muscles had they not been paralyzed. And it is barely conceivable that these central impulses affected heart rate by means either of inborn connections or of classically conditioned ones that had been acquired when previous exercise had been accompanied by an increase in heart rate and relaxation had been accompanied by a decrease. But, if the changes in heart rate were produced in this indirect way, we would expect that, during a subsequent test without curare, any rat that showed learned changes in heart rate would show the movements in the muscles that were no longer paralyzed. Furthermore, the problem of whether or not visceral responses learned under curarization carry over to the noncurarized state is of interest in its own right.
In order to answer this question, DiCara and I15 trained two groups of curarized rats to increase or decrease, respectively, their heart rate in order to avoid, or escape from, brief pulses of mild electric shock. When these rats were tested 2 weeks later in the noncurarized state, the habit was remembered. Statistically reliable increases in heart rate averaging 5 percent and decreases averaging 16 percent occurred. Immediately subsequent retraining without curare produced additional significant changes of heart rate in the rewarded direction, bringing the total overall increase to 11 percent and the decrease to 22 percent. While, at the beginning of the test in the noncurarized state, the two groups showed some differences in respiration and activity, these differences decreased until, by the end of the retraining, they were small and far from statistically reliable (t = 0.3 and 1.3, respectively). At the same time, the difference between the two groups with respect to heart rate was increasing, until it became large and thus extremely reliable (t = 8.6, d.f. = 12, P < .001).
In short, while greater changes in heart rate were being learned, the response was becoming more specific, involving smaller changes in respiration and muscular activity. This increase in specificity with additional training is another point of similarity with the instrumental learning of skeletal responses. Early in skeletal learning, the rewarded correct response is likely to be accompanied by many unnecessary movements. With additional training during which extraneous movements are not rewarded, they tend to drop out.
It is difficult to reconcile the foregoing results with the hypothesis that the differences in heart rate were mediated primarily by a difference in either respiration or amount of general activity. This is especially true in view of the research, summarized by Ehrlich and Malmo,16 which shows that muscular activity, to affect heart rate in the rat, must be rather vigorous.
While it is difficult to rule out completely the possibility that changes in heart rate are mediated by central impulses to skeletal muscles, the possibility of such mediation is much less attractive for other responses, such as intestinal contractions and the formation of urine by the kidney. Furthermore, if the learning of these different responses can be shown to be specific in enough visceral responses, one runs out of different skeletal movements each eliciting a specific different visceral response.17 Therefore, experiments were performed on the learning of a variety of different visceral responses and on the specificity of that learning. Each of these experiments was, of course, interesting in its own right, quite apart from any bearing on the problem of mediation.
INTESTINAL VERSUS CARDIAC
The purpose of our next experiment was to determine the specificity of visceral learning. If such learning has the same properties as the instrumental learning of skeletal responses, it should be possible to learn a specific visceral response independently of other ones. Furthermore, as we have just seen, we might expect to find that, the better the rewarded response is learned, the more specific is the learning. Banuazizi and I worked on this problem.18 First we had to discover another visceral response that could be conveniently recorded and rewarded. We decided on intestinal contractions, and recorded them in the curarized rat with a little balloon filled with water thrust approximately 4 centimeters beyond the anal sphincter. Changes of pressure in the balloon were transduced into electric voltages which produced a record on a polygraph and also activated an automatic mechanism for delivering the reward, which was electrical stimulation of the brain.
The results for the first rat trained, which was a typical one, are shown in Fig. 4. From the top record it may be seen that, during habituation, there were some spontaneous contractions. When the rat was rewarded by brain stimulation for keeping contractions below a certain amplitude for a certain time, the number of contractions was reduced and the base line was lowered. After the record showed a highly reliable change indicating that relaxation had been learned (Fig. 4, second record from the top), the conditions of training were reversed and the reward was delivered whenever the amplitude of contractions rose above a certain level. From the next record (Fig. 4, middle) it may be seen that this type of training increased the number of contractions and raised the base line. Finally (Fig. 4, two bottom records) the reward was discontinued and, as would be expected, the response continued for a while but gradually became extinguished, so that the activity eventually returned to approximately its original baseline level.
After studying a number of other rats in this way and convincing ourselves that the instrumental learning of intestinal responses was a possibility, we designed an experiment to test specificity. For all the rats of the experiment, both intestinal contractions and heart rate were recorded, but half the rats were rewarded for one of these responses and half were rewarded for the other response. Each of these two groups of rats was divided into two subgroups, rewarded, respectively, for increased and decreased response. The rats were completely paralyzed by curare, maintained on artificial respiration, and rewarded by electrical stimulation of the brain.
The results are shown in Figs. 5 and 6. In Fig. 5 it may be seen that the group rewarded for increases in intestinal contractions learned an increase, the group rewarded for decreases learned a decrease, but neither of these groups showed an appreciable change in heart rate. Conversely (Fig. 6), the group rewarded for increases in heart rate showed an increase, the group rewarded for decreases showed a decrease, but neither of these groups showed a change in intestinal contractions.
The fact that each type of response changed when it was rewarded rules out the interpretation that the failure to secure a change when that change was not rewarded could have been due to either a strong and stable homeostatic regulation of that response or an inability of our techniques to measure changes reliably under the particular conditions of our experiment.
Each of the 12 rats in the experiment showed statistically reliable changes in the rewarded direction; for 11 the changes were reliable beyond the P < .001 level, while for the 12th the changes were reliable only beyond the .05 level. A statistically reliable negative correlation showed that the better the rewarded visceral response was learned, the less change occurred in the other, nonrewarded response. This greater specificity with better learning is what we had expected. The results showed that visceral learning can be specific to an organ system, and they clearly ruled out the possibility of mediation by any single general factor, such as level of activation or central commands for either general activity or relaxation.
In an additional experiment, Banuazizi19 showed that either increases or decreases in intestinal contractions can be rewarded by avoidance of, or escape from, mild electric shocks, and that the intestinal responses can be discriminatively elicited by a specific stimulus associated with reinforcement.
Encouraged by these successes, DiCara and I decided to see whether or not the rate of urine formation by the kidney could be changed in the curarized rat rewarded by electrical stimulation of the brain.20 A catheter, permanently inserted, was used to prevent accumulation of urine by the bladder, and the rate of urine formation was measured by an electronic device for counting minute drops. In order to secure a rate of urine formation fast enough so that small changes could be promptly detected and rewarded, the rats were kept constantly loaded with water through infusion by way of a catheter permanently inserted in the jugular vein.
All of the seven rats rewarded when the intervals between times of urine-drop formation lengthened showed decreases in the rate of urine formation, and all of the seven rats rewarded when these intervals shortened showed increases in the rate of urine formation. For both groups the changes were highly reliable (P < .001).
In order to determine how the change in rate of urine formation was achieved, certain additional measures were taken. As the set of bars at left in Fig. 7 shows, the rate of filtration, measured by means of 14C-labeled inulin, increased when increases in the rate of urine formation were rewarded and decreased when decreases in the rate were rewarded. Plots of the correlations showed that the changes in the rates of filtration and urine formation were not related to changes in either blood pressure or heart rate.
The middle set of bars in Fig. 7 shows that the rats rewarded for increases in the rate of urine formation had an increased rate of renal blood flow, as measured by 3H-p-aminohippuric acid, and that those rewarded for decreases had a decreased rate of renal blood flow. Since these changes in blood flow were not accompanied by changes in general blood pressure or in heart rate, they must have been achieved by vasomotor changes of the renal arteries. That these vasomotor changes were at least somewhat specific is shown by the fact that vasomotor responses of the tail, as measured by a photoelectric plethysmograph, did not differ for the two groups of rats.
The set of bars at right in Fig. 7 shows that when decreases in rate of urine formation were rewarded, a more concentrated urine, having higher osmolarity, was formed. Since the slower passage of urine through the tubules would afford more opportunity for reabsorption of water, this higher concentration does not necessarily mean an increase in the secretion of antidiuretic hormone. When an increased rate of urine formation was rewarded, the urine did not become more diluted—that is, it showed no decrease in osmolarity; therefore, the increase in rate of urine formation observed in this experiment cannot be accounted for in terms of an inhibition of the secretion of antidiuretic hormone.
From the foregoing results it appears that the learned changes in urine formation in this experiment were produced primarily by changes in the rate of filtration, which, in turn, were produced primarily by changes in the rate of blood flow through the kidneys.
In the next experiment, Carmona, Demierre, and I used a photoelectric plethysmograph to measure changes, presumably in the amount of blood, in the stomach wall.21 In an operation performed under anesthesia, a small glass tube, painted black except for a small spot, was inserted into the rat’s stomach. The same tube was used to hold the stomach wall against a small glass window inserted through the body wall. The tube was left in that position. After the animal had recovered, a bundle of optical fibers could be slipped snugly into the glass tube so that the light beamed through it would shine out through the unpainted spot in the tube inside the stomach, pass through the stomach wall, and be recorded by a photocell on the other side of the glass window. Preliminary tests indicated that, as would be expected, when the amount of blood in the stomach wall increased, less light would pass through. Other tests showed that stomach contractions elicited by injections of insulin did not affect the amount of light transmitted.
In the main experiment we rewarded curarized rats by enabling them to avoid or escape from mild electric shocks. Some were rewarded when the amount of light that passed through the stomach wall increased, while others were rewarded when the amount decreased. Fourteen of the 15 rats showed changes in the rewarded direction. Thus, we demonstrated that the stomach wall, under the control of the autonomic nervous system, can be modified by instrumental learning. There is strong reason to believe that the learned changes were achieved by vasomotor responses affecting the amount of blood in the stomach wall or mucosa, or in both.
In another experiment, Carmona22 showed that stomach contractions can be either increased or decreased by instrumental learning.
It is obvious that learned changes in the blood supply of internal organs can affect their functioning—as, for example, the rate at which urine was formed by the kidneys was affected by changes in the amount of blood that flowed through them. Thus, such changes can produce psychosomatic symptoms. And if the learned changes in blood supply can be specific to a given organ, the symptom will occur in that organ rather than in another one.
PERIPHERAL VASOMOTOR RESPONSES
Having investigated the instrumental learning of internal vasomotor responses, we next studied the learning of peripheral ones. In the first experiment, the amount of blood in the tail of a curarized rat was measured by a photoelectric plethysmograph, and changes were rewarded by electrical stimulation of the brain.23 All of the four rats rewarded for vasoconstriction showed that response, and, at the same time, their average core temperature, measured rectally, decreased from 98.9° to 97.9°F. All of the four rats rewarded for vasodilatation showed that response and, at the same time, their average core temperature increased from 99.9° to 101°F. The vasomotor change for each individual rat was reliable beyond the P < .01 level, and the difference in change in temperature between the groups was reliable beyond the .01 level. The direction of the change in temperature was opposite to that which would be expected from the heat conservation caused by peripheral vasoconstriction or the heat loss caused by peripheral vasodilatation. The changes are in the direction which would be expected if the training had altered the rate of heat production, causing a change in temperature which, in turn, elicited the vasomotor response.
The next experiment was designed to try to determine the limits of the specificity of vasomotor learning. The pinnae of the rat’s ears were chosen because the blood vessels in them are believed to be innervated primarily, and perhaps exclusively, by the sympathetic branch of the autonomic nervous system, the branch that Cannon believed always fired nonspecifically as a unit.23 But Cannon’s experiments involved exposing cats to extremely strong emotion-evoking stimuli, such as barking dogs, and such stimuli will also evoke generalized activity throughout the skeletal musculature. Perhaps his results reflected the way in which sympathetic activity was elicited, rather than demonstrating any inherent inferiority of the sympathetic nervous system.
In order to test this interpretation, DiCara and I24 put photocells on both ears of the curarized rat and connected them to a bridge circuit so that only differences in the vasomotor responses of the two ears were rewarded by brain stimulation. We were somewhat surprised and greatly delighted to find that this experiment actually worked. The results are summarized in Fig. 8. Each of the six rats rewarded for relative vasodilatation of the left ear showed that response, while each of the six rats rewarded for relative vasodilatation of the right ear showed that response. Recordings from the right and left forepaws showed little if any change in vasomotor response.
It is clear that these results cannot be by-products of changes in either heart rate or blood pressure, as these would be expected to affect both ears equally. They show either that vasomotor responses mediated by the sympathetic nervous system are capable of much greater specificity than has previously been believed, or that the innervation of the blood vessels in the pinnae of the ears is not restricted almost exclusively to sympathetic-nervous system components, as has been believed, and involves functionally significant parasympathetic components. In any event, the changes in the blood flow certainly were surprisingly specific. Such changes in blood flow could account for specific psychosomatic symptoms.
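The logic of rewarding only the difference between the two ears can be sketched in a few lines. This is a minimal illustration, not a description of the actual bridge circuit: the function name, the arbitrary units, and the `threshold` parameter are all assumptions, since the text gives no circuit values. The point it shows is that a change common to both ears, such as one driven by heart rate or blood pressure, never earns a reward.

```python
def differential_reward(left, right, rewarded_side, threshold):
    """Return True when the rewarded ear is more dilated than the other ear.

    left, right   -- photocell readings for each ear (arbitrary units, hypothetical)
    rewarded_side -- "left" or "right"
    threshold     -- minimum difference that counts as a response (hypothetical)
    """
    if rewarded_side == "left":
        difference = left - right
    else:
        difference = right - left
    return difference > threshold


# Both ears dilate by the same amount: the difference is unchanged, so no reward.
print(differential_reward(3.0, 3.0, "left", 0.1))   # False
# Only the left ear dilates: the difference exceeds the threshold, so reward.
print(differential_reward(1.5, 1.0, "left", 0.1))   # True
```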
BLOOD PRESSURE INDEPENDENT OF HEART RATE
Although changes in blood pressure were not induced as by-products of rewarded changes in the rate of urine formation, another experiment on curarized rats showed that, when changes in systolic blood pressure are specifically reinforced, they can be learned.25 Blood pressure was recorded by means of a catheter permanently inserted into the aorta, and the reward was avoidance of, or escape from, mild electric shock. All seven rats rewarded for increases in blood pressure showed further increases, while all seven rewarded for decreases showed decreases, each of the changes, which were in opposite directions, being reliable beyond the P < .01 level. The increase was from 139 mm-Hg, which happens to be roughly comparable to the normal systolic blood pressure of an adult man, to 170 mm-Hg, which is on the borderline of abnormally high blood pressure in man.
Each experimental animal was “yoked” with a curarized partner, maintained on artificial respiration and having shock electrodes on its tail wired in series with electrodes on the tail of the experimental animal, so that it received exactly the same electric shocks and could do nothing to escape or avoid them. The yoked controls for both the increase-rewarded and the decrease-rewarded groups showed some elevation in blood pressure as an unconditioned effect of the shocks. By the end of training, in contrast to the large difference in the blood pressures of the two groups specifically rewarded for changes in opposite directions, there was no difference in blood pressure between the yoked control partners for these two groups. Furthermore, the increase in blood pressure in these control groups was reliably less (P < .01) than that in the group specifically rewarded for increases. Thus, it is clear that the reward for an increase in blood pressure produced an additional increase over and above the effects of the shocks per se, while the reward for a decrease was able to overcome the unconditioned increase elicited by the shocks.
For none of the four groups was there a significant change in heart rate or in temperature during training: there were no significant differences in these measures among the groups. Thus, the learned change was relatively specific to blood pressure.
TRANSFER FROM HEART RATE TO SKELETAL AVOIDANCE
Although visceral learning can be quite specific, especially if only a specific response is rewarded, as was the case in the experiment on the two ears, under some circumstances it can involve a more generalized effect.
In handling the rats that had just recovered from curarization, DiCara noticed that those that had been trained, through the avoidance or escape reward, to increase their heart rate were more likely to squirm, squeal, defecate, and show other responses indicating emotionality than were those that had been trained to reduce their heart rate. Could instrumental learning of heart-rate changes have some generalized effects, perhaps on the level of emotionality, which might affect the behavior in a different avoidance-learning situation? In order to look for such an effect, DiCara and Weiss26 used a modified shuttle avoidance apparatus. In this apparatus, when a danger signal is given, the rat must run from compartment A to compartment B. If he runs fast enough, he avoids the shock; if not, he must run to escape it. The next time the danger signal is given, the rat must run in the opposite direction, from B to A.
Other work had shown that learning in this apparatus is an inverted U-shaped function of the strength of the shocks, with shocks that are too strong eliciting emotional behavior instead of running. DiCara and Weiss trained their rats in this apparatus with a level of shock that is approximately optimum for naive rats of this strain. They found that the rats that had been rewarded for decreasing their heart rate learned well, but that those that had been rewarded for increasing their heart rate learned less well, as if their emotionality had been increased. The difference was statistically reliable (P < .001). This experiment clearly demonstrates that training a visceral response can affect the subsequent learning of a skeletal one, but additional work will be required to prove the hypothesis that training to increase heart rate increases emotionality.
VISCERAL LEARNING WITHOUT CURARE
Thus far, in all of the experiments except the one on teaching thirsty dogs to salivate, the initial training was given when the animal was under the influence of curare. All of the experiments, except the one on salivation, have produced surprisingly rapid learning—definitive results within 1 or 2 hours. Will learning in the normal, noncurarized state be easier, as we originally thought it should be, or will it be harder, as the experiment on the noncurarized dogs suggests? DiCara and I have started to get additional evidence on this problem. We have obtained clearcut evidence that rewarding (with the avoidance or escape reward) one group of freely moving rats for reducing heart rate and rewarding another group for increasing heart rate produces a difference between the two groups.27 That this difference was not due to the indirect effects of the overt performance of skeletal responses is shown by the fact that it persisted in subsequent tests during which the rats were paralyzed by curare. And, on subsequent retraining without curare, such differences in activity and respiration as were present earlier in training continued to decrease, while the differences in heart rate continued to increase. It seems extremely unlikely that, at the end of training, the highly reliable differences in heart rate (t = 7.2; P < .0001) can be explained by the highly unreliable differences in activity and respiration (t = .07 and 0.2, respectively).
Although the rats in this experiment showed some learning when they were trained initially in the noncurarized state, this learning was much poorer than that which we have seen in our other experiments on curarized rats. This is exactly the opposite of my original expectation, but seems plausible in the light of hindsight. My hunch is that paralysis by curare improved learning by eliminating sources of distraction and variability. The stimulus situation was kept more constant, and confusing visceral fluctuations induced indirectly by skeletal movements were eliminated.
LEARNED CHANGES IN BRAIN WAVES
Encouraged by success in the experiments on the instrumental learning of visceral responses, my colleagues and I have attempted to produce other unconventional types of learning. Electrodes placed on the skull or, better yet, touching the surface of the brain record summative effects of electrical activity over a considerable area of the brain. Such electrical effects are called brain waves, and the record of them is called an electroencephalogram. When the animal is aroused, the electroencephalogram consists of fast, low-voltage activity; when the animal is drowsy or sleeping normally, the electroencephalogram consists of considerably slower, higher-voltage activity. Carmona attempted to see whether this type of brain activity, and the state of arousal accompanying it, can be modified by direct reward of changes in the brain activity.28,29
The subjects of the first experiment were freely moving cats. In order to have a reward that was under complete control and that did not require the cat to move, Carmona used direct electrical stimulation of the medial forebrain bundle, which is a rewarding area of the brain. Such stimulation produced a slight lowering in the average voltage of the electroencephalogram and an increase in behavioral arousal. In order to provide a control for these and any other unlearned effects, he rewarded one group for changes in the direction of high-voltage activity and another group for changes in the direction of low-voltage activity.
Both groups learned. The cats rewarded for high-voltage activity showed more high-voltage slow waves and tended to sit like sphinxes, staring out into space. The cats rewarded for low-voltage activity showed much more low-voltage fast activity, and appeared to be aroused, pacing restlessly about, sniffing, and looking here and there. It was clear that this type of training had modified both the character of the electrical brain waves and the general level of the behavioral activity. It was not clear, however, whether the level of arousal of the brain was directly modified and hence modified the behavior; whether the animals learned specific items of behavior which, in turn, modified the arousal of the brain as reflected in the electroencephalogram; or whether both types of learning were occurring simultaneously.
In order to rule out the direct sensory consequences of changes in muscular tension, movement, and posture, Carmona performed the next experiment on rats that had been paralyzed by means of curare. The results, given in Fig. 9, show that both rewarded groups showed changes in the rewarded direction; that a subsequent nonrewarded rest increased the number of high-voltage responses in both groups; and that, when the conditions of reward were reversed, the direction of change in voltage was reversed.
At present we are trying to use similar techniques to modify the functions of a specific part of the vagal nucleus, by recording and specifically rewarding changes in the electrical activity there. Preliminary results suggest that this is possible. The next step is to investigate the visceral consequences of such modification. This kind of work may open up possibilities for modifying the activity of specific parts of the brain and the functions that they control. In some cases, directly rewarding brain activity may be a more convenient or more powerful technique than rewarding skeletal or visceral behavior. It also may be a new way to throw light on the functions of specific parts of the brain.30
HUMAN VISCERAL LEARNING
Another question is that of whether people are capable of instrumental learning of visceral responses. I believe that in this respect they are as smart as rats. But, as a recent critical review by Katkin and Murray31 points out, this has not yet been completely proved. These authors have comprehensively summarized the recent studies reporting successful use of instrumental training to modify human heart rate, vasomotor responses, and the galvanic skin response. Because of the difficulties in subjecting human subjects to the same rigorous controls, including deep paralysis by means of curare, that can be used with animal subjects, one of the most serious questions about the results of the human studies is whether the changes recorded represent the true instrumental learning of visceral responses or the unconscious learning of those skeletal responses that can produce visceral reactions. However, the able investigators who have courageously challenged the strong traditional belief in the inferiority of the autonomic nervous system with experiments at the more difficult but especially significant human level are developing ingenious controls, including demonstrations of the specificity of the visceral change, so that their cumulative results are becoming increasingly impressive.
POSSIBLE ROLE IN HOMEOSTASIS
The functional utility of instrumental learning by the cerebrospinal nervous system under the conditions that existed during mammalian evolution is obvious. The skeletal responses mediated by the cerebrospinal nervous system operate on the external environment, so that there is survival value in the ability to learn responses that bring rewards such as food, water, or escape from pain. The fact that the responses mediated by the autonomic nervous system do not have such direct action on the external environment was one of the reasons for believing that they are not subject to instrumental learning. Is the learning ability of the autonomic nervous system something that has no normal function other than that of providing my students with subject matter for publications? Is it a mere accidental by-product of the survival value of cerebrospinal learning, or does the instrumental learning of autonomically mediated responses have some adaptive function, such as helping to maintain that constancy of the internal environment called homeostasis?
In order for instrumental learning to function homeostatically, a deviation away from the optimum level will have to function as a drive to motivate learning, and a change toward the optimum level will have to function as a reward to reinforce the learning of the particular visceral response that produced the corrective change.
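The two requirements just stated can be written down concretely. The sketch below is only one possible formalization, assuming a single regulated quantity measured on a numeric scale; the function names `drive` and `reward` and the use of absolute deviation are illustrative choices, not anything specified in the text.

```python
def drive(level, optimum):
    """Strength of the drive: how far the level deviates from the optimum."""
    return abs(level - optimum)


def reward(level_before, level_after, optimum):
    """Reward produced by a response: the reduction in deviation it brings about.

    Positive when the response moved the level toward the optimum,
    negative when it moved the level further away.
    """
    return drive(level_before, optimum) - drive(level_after, optimum)


# A response that corrects a deficit is rewarded; one that worsens it is not.
print(reward(8, 9, 10))    # 1
print(reward(9, 8, 10))    # -1
# The same holds for an excess: moving down toward the optimum is rewarded.
print(reward(12, 11, 10))  # 1
```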
When a mammal has less than the optimum amount of water in his body, this deficiency serves as a drive of thirst to motivate learning; the overt consummatory response of drinking functions as a reward to reinforce the learning of the particular skeletal responses that were successful in securing the water that restored the optimum level. But is the consummatory response essential? Can restoration of an optimum level by a glandular response function as a reward?
In order to test for the possible rewarding effects of a glandular response, DiCara, Wolf, and I32 injected albino rats with antidiuretic hormone (ADH) if they chose one arm of a T-maze and with the isotonic saline vehicle if they chose the other, distinctively different, arm. The ADH permitted water to be reabsorbed in the kidney, so that a smaller volume of more concentrated urine was formed. Thus, for normal rats loaded in advance with H2O, the ADH interfered with the excess-water excretion required for the restoration of homeostasis, while the control injection of isotonic saline allowed the excess water to be excreted. And, indeed, such rats learned to select the side of the maze that assured them an injection of saline so that their glandular response could restore homeostasis.
Conversely, for rats with diabetes insipidus, loaded in advance with hypertonic NaCl, the homeostatic effects of the same two injections were reversed; the ADH, causing the urine to be more concentrated, helped the rats to get rid of the excess NaCl, while the isotonic saline vehicle did not. And, indeed, a group of rats of this kind learned the opposite choice of selecting the ADH side of the maze. As a further control on the effects of the ADH per se, normal rats which had not been given H2O or NaCl exhibited no learning. This experiment showed that an excess of either H2O or NaCl functions as a drive and that the return to the normal concentration produced by the appropriate response of a gland, the kidney, functions as a reward.
When we consider the results of this experiment together with those of our experiments showing that glandular and visceral responses can be instrumentally learned, we will expect the animal to learn those glandular and visceral responses mediated by the central nervous system that promptly restore homeostasis after any considerable deviation. Whether or not this theoretically possible learning has any practical significance will depend on whether or not the innate homeostatic mechanisms control the levels closely enough to prevent any deviations large enough to function as a drive from occurring. Even if the innate control should be accurate enough to preclude learning in most cases, there remains the intriguing possibility that, when pathology interferes with innate control, visceral learning is available as a supplementary mechanism.
IMPLICATIONS AND SPECULATIONS
We have seen how the instrumental learning of visceral responses suggests a new possible homeostatic mechanism worthy of further investigation. Such learning also shows that the autonomic nervous system is not as inferior as has been so widely and firmly believed. It removes one of the strongest arguments for the hypothesis that there are two fundamentally different mechanisms of learning, involving different parts of the nervous system.
Cause of Psychosomatic Symptoms
Similarly, evidence of the instrumental learning of visceral responses removes the main basis for assuming that the psychosomatic symptoms that involve the autonomic nervous system are fundamentally different from those functional symptoms, such as hysterical ones, that involve the cerebrospinal nervous system. Such evidence allows us to extend to psychosomatic symptoms the type of learning-theory analysis that Dollard and I7,33 have applied to other symptoms.
For example, suppose a child is terror-stricken at the thought of going to school in the morning because he is completely unprepared for an important examination. The strong fear elicits a variety of fluctuating autonomic symptoms, such as a queasy stomach at one time and pallor and faintness at another; at this point his mother, who is particularly concerned about cardiovascular symptoms, says, “You are sick and must stay home.” The child feels a great relief from fear, and this reward should reinforce the cardiovascular responses producing pallor and faintness. If such experiences are repeated frequently enough, the child, theoretically, should learn to respond with that kind of symptom. Similarly, another child whose mother ignored the vasomotor responses but was particularly concerned by signs of gastric distress would learn the latter type of symptom. I want to emphasize, however, that we need careful clinical research to determine how frequently, if at all, the social conditions sufficient for such theoretically possible learning of visceral symptoms actually occur. Since a given instrumental response can be reinforced by a considerable variety of rewards, and by one reward on one occasion and a different reward on another, the fact that glandular and visceral responses can be instrumentally learned opens up many new theoretical possibilities for the reinforcement of psychosomatic symptoms.
Furthermore, we do not yet know how severe a psychosomatic effect can be produced by learning. While none of the 40 rats rewarded for speeding up their heart rates have died in the course of training under curarization, 7 of the 40 rats rewarded for slowing down their heart rates have died. This statistically reliable difference (chi square = 5.6, P < .02) is highly suggestive, but it could mean that training to speed up the heart helped the rats resist the stress of curare rather than that the reward for slowing down the heart was strong enough to overcome innate regulatory mechanisms and induce sudden death. In either event the visceral learning had a vital effect. At present, DiCara and I are trying to see whether or not the learning of visceral responses can be carried far enough in the noncurarized animal to produce physical damage. We are also investigating the possibility that there may be a critical period in early infancy during which visceral learning has particularly intense and long-lasting effects.
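The reported statistic can be checked from the counts given above (0 deaths among 40 rats versus 7 among 40). The text does not say which form of the chi-square test was used, so the following is a sketch under an assumption: it applies the continuity-corrected (Yates) formula for a 2 x 2 table, which happens to reproduce the reported value of 5.6, whereas the uncorrected formula gives about 7.7. The function names are illustrative.

```python
import math


def yates_chi_square(a, b, c, d):
    """Continuity-corrected chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    # Shrink the observed cross-product difference by n/2, never below zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    numerator = n * corrected ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Rows: rewarded for speeding up, rewarded for slowing down.
# Columns: died, survived.
chi2 = yates_chi_square(0, 40, 7, 33)
p = p_value_df1(chi2)
print(round(chi2, 1))  # 5.6
print(p < 0.02)        # True
```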
Individual and Cultural Differences
It is possible that, in addition to producing psychosomatic symptoms in extreme cases, visceral learning can account for certain more benign individual and cultural differences. Lacey and Lacey34 have shown that a given individual may have a tendency, which is stable over a number of years, to respond to a variety of different stresses with the same profile of autonomic responses, while other individuals may have statistically reliable tendencies to respond with different profiles. It now seems possible that differential conditions of learning may account for at least some of these individual differences in patterns of autonomic response.
Conversely, such learning may account also for certain instances in which the same individual responds to the same stress in different ways. For example, a small boy who receives a severe bump in rough-and-tumble play may learn to inhibit the secretion of tears in this situation since his peer group will punish crying by calling it “sissy.” But the same small boy may burst into tears when he gets home to his mother, who will not punish weeping and may even reward tears with sympathy.
Similarly, it seems conceivable that different conditions of reward by a culture different from our own may be responsible for the fact that Homer’s adult heroes so often “let the big tears fall.” Indeed, a former colleague of mine, Herbert Barry III, has analyzed cross-cultural data and found that the amount of crying reported for children seems to be related to the way in which the society reacts to their tears.35
I have emphasized the possible role of learning in producing the observed individual differences in visceral responses to stress, which in extreme cases may result in one type of psychosomatic symptom in one person and a different type in another. Such learning does not, of course, exclude innate individual differences in the susceptibility of different organs. In fact, given social conditions under which any form of illness will be rewarded, the
The experimental work on animals has developed a powerful technique for using instrumental learning to modify glandular and visceral responses. The improved training technique consists of moment-to-moment recording of the visceral function and immediate reward, at first, of very small changes in the desired direction and then of progressively larger ones. The success of this technique suggests that it should be able to produce therapeutic changes. If the patient who is highly motivated to get rid of a symptom understands that a signal, such as a tone, indicates a change in the desired direction, that tone could serve as a powerful reward. Instruction to try to turn the tone on as often as possible and praise for success should increase the reward. As patients find that they can secure some control of the symptom, their motivation should be strengthened. Such a procedure should be well worth trying on any symptom, functional or organic, that is under neural control, that can be continuously monitored by modern instrumentation, and for which a given direction of change is clearly indicated medically—for example, cardiac arrhythmias, spastic colitis, asthma, and those cases of high blood pressure that are not essential compensation for kidney damage.37 The obvious cases to begin with are those in which drugs are ineffective or contraindicated. In the light of the fact that our animals learned so much better when under the influence of curare and transferred their training so well to the normal, nondrugged state, it should be worth while to try to use hypnotic suggestion to achieve similar results by enhancing the reward effect of the signal indicating a change in the desired direction, by producing relaxation and regular breathing, and by removing interference from skeletal responses and distraction by irrelevant cues.
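The training rule described above (reward very small changes first, then progressively larger ones) can be sketched as a rising criterion. This is a simplified illustration only: the function name `shape`, the fixed `step`, and the sample readings are assumptions, not details of the actual procedure.

```python
def shape(readings, baseline, step=1.0):
    """Reward each reading that meets the current criterion, then raise the criterion.

    readings: successive measurements of the visceral function (hypothetical units).
    baseline: level before training; the first criterion is one step above it.
    Returns the list of reward decisions and the final criterion.
    """
    criterion = baseline + step
    rewards = []
    for reading in readings:
        if reading >= criterion:
            rewards.append(True)   # immediate reward, e.g. the tone
            criterion += step      # demand a slightly larger change next time
        else:
            rewards.append(False)
    return rewards, criterion

rewards, final_criterion = shape([100, 101, 101, 102, 104], baseline=100)
print(rewards, final_criterion)
```

A reading that merely repeats an already rewarded level earns nothing, which is what drives the response progressively further in the desired direction.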
Engel and Melmon38 have reported encouraging results in the use of instrumental training to treat cardiac arrhythmias of organic origin. Randt, Korein, Carmona, and I have had some success in using the method described above to train epileptic patients in the laboratory to suppress, in one way or another, the abnormal paroxysmal spikes in their electroencephalogram. My colleagues and I are hoping to try learning therapy for other symptoms—for example, the rewarding of high-voltage electroencephalograms as a treatment for insomnia. While it is far too early to promise any cures, it certainly will be worth while to investigate thoroughly the therapeutic possibilities of improved instrumental training techniques.
B. F. SKINNER, Harvard University
This is the history of a crackpot idea, born on the wrong side of the tracks intellectually speaking, but eventually vindicated in a sort of middle class respectability. It is the story of a proposal to use living organisms to guide missiles—of a research program during World War II called “Project Pigeon” and a peacetime continuation at the Naval Research Laboratory called “ORCON,” from the words “organic control.” Both of these programs have now been declassified.
Man has always made use of the sensory capacities of animals, either because they are more acute than his own or more convenient. The watchdog probably hears better than his master and in any case listens while his master sleeps. As a detecting system the dog’s ear comes supplied with an alarm (the dog need not be taught to announce the presence of an intruder), but special forms of reporting are sometimes set up. The tracking behavior of the bloodhound and the pointing of the hunting dog are usually modified to make them more useful. Training is sometimes quite explicit. It is said that seagulls were used to detect submarines in the English Channel during World War I.
The British sent their own submarines through the Channel releasing food to the surface. Gulls could see the submarines from the air and learned to follow them, whether they were British or German. A flock of gulls, spotted from the shore, took on special significance. In the seeing-eye dog the repertoire of artificial signaling responses is so elaborate that it has the conventional character of the verbal interchange between man and man.
The detecting and signaling systems of lower organisms have a special advantage when used with explosive devices which can be guided toward the objects they are to destroy, whether by land, sea, or air. Homing systems for guided missiles have now been developed which sense and signal the position of a target by responding to visible or invisible radiation, noise, radar reflections, and so on. These have not always been available, and in any case a living organism has certain advantages. It is almost certainly cheaper and more compact and, in particular, is especially good at responding to patterns and those classes of patterns called “concepts.” The lower organism is not used because it is more sensitive than man—after all, the kamikaze did very well—but because it is readily expendable.
The ethical question of our right to convert a lower creature into an unwitting hero is a peacetime luxury. There were bigger questions to be answered in the late thirties. A group of men had come into power who promised, and eventually accomplished, the greatest mass murder in history. In 1939 the city of Warsaw was laid waste in an unprovoked bombing, and the airplane emerged as a new and horrible instrument of war against which only the feeblest defenses were available. Project Pigeon was conceived against that background. It began as a search for a homing device to be used in a surface-to-air guided missile as a defense against aircraft. As the balance between offensive and defensive weapons shifted, the direction was reversed, and the system was to be tested first in an air-to-ground missile called the “Pelican.” Its name is a useful reminder of the state of the missile art in America at that time. Its detecting and servomechanisms took up so much space that there was no room for explosives: hence the resemblance to the pelican “whose beak can hold more than its belly can.” My title is perhaps now clear.
At the University of Minnesota in the spring of 1940 the capacity of the pigeon to steer toward a target was tested with a moving hoist. The pigeon, held in a jacket and harnessed to a block, was immobilized except for its neck and head. It could eat grain from a dish and operate a control system by moving its head in appropriate directions. Movement of the head operated the motors of the hoist. The bird could ascend by lifting its head, descend by lowering it, and travel from side to side by moving appropriately. The whole system, mounted on wheels, was pushed across a room toward a bull’s-eye on the far wall. During the approach the pigeon raised or lowered itself and moved from side to side in such a way as to reach the wall in position to eat grain from the center of the bull’s-eye. The pigeon learned to reach any target within reach of the hoist, no matter what the starting position and during fairly rapid approaches.
The experiment was shown to John T. Tate, a physicist, then Dean of the Graduate School at the University of Minnesota, who brought it to the attention of R. C. Tolman, one of a group of scientists engaged in early defense activities. The result was the first of a long series of rejections. The proposal “did not warrant further development at the time.” The project was accordingly allowed to lapse. On December 7, 1941, the situation was suddenly restructured; and, on the following day, with the help of Keller Breland, then a graduate student at Minnesota, further work was planned. A simpler harnessing system could be used if the bomb were to rotate slowly during its descent, when the pigeon would need to steer in only one dimension: from side to side. We built an apparatus in which a harnessed pigeon was lowered toward a large revolving turntable across which a target was driven according to contacts made by the bird during its descent. It was not difficult to train a pigeon to “hit” small ship models during fairly rapid descents. We made a demonstration film showing hits on various kinds of targets, and two psychologists then engaged in the war effort in Washington, Charles Bray and Leonard Carmichael, undertook to look for government support. Tolman, then at the Office of Scientific Research
The project lapsed again and would probably have been abandoned if it had not been for a young man whose last name I have ungratefully forgotten, but whose first name—Victor—we hailed as a propitious sign. His subsequent history led us to refer to him as Vanquished; and this, as it turned out, was a more reliable omen. Victor walked into the Department of Psychology at Minnesota one day in the summer of 1942 looking for an animal psychologist. He had a scheme for installing dogs in antisubmarine torpedoes. The dogs were to respond to faint acoustic signals from the submarine and to steer the torpedo toward its goal. He wanted a statement from an animal psychologist as to its feasibility. He was understandably surprised to learn of our work with pigeons but seized upon it eagerly, and citing it in support of his contention that dogs could be trained to steer torpedoes he went to a number of companies in Minneapolis. His project was rejected by everyone he approached; but one company, General Mills, Inc., asked for more information about our work with pigeons. We described the project and presented the available data to Arthur D. Hyde, Vice-President in Charge of Research. The company was not looking for new products, but Hyde thought that it might, as a public service, develop the pigeon system to the point at which a governmental agency could be persuaded to take over.
Breland and I moved into the top floor of a flour mill in Minneapolis and with the help of Norman Guttman, who had joined the project, set to work on further improvements. It had been difficult to induce the pigeon to respond to the small angular displacement of a distant target. It would start working dangerously late in the descent. Its natural pursuit behavior was not appropriate to the characteristics of a likely missile. A new system was therefore designed. An image of the target was projected on a translucent screen as in a camera obscura. The pigeon, held near the screen, was reinforced for pecking at the image on the screen. The guiding signal was to be picked up from the point of contact of screen and beak.
In an early arrangement the screen was a translucent plastic plate forming the larger end of a truncated cone bearing a lens at the smaller end. The cone was mounted, lens down, in a gimbal bearing. An object within range threw its image on the translucent screen; and the pigeon, held vertically just above the plate, pecked the image. When a target was moved about within range of the lens, the cone continued to point to it. In another apparatus a translucent disk, free to tilt slightly on gimbal bearings, closed contacts operating motors which altered the position of a large field beneath the apparatus. Small cutouts of ships and other objects were placed on the field. The field was constantly in motion, and a target would go out of range unless the pigeon continued to control it. With this apparatus we began to study the pigeon’s reactions to various patterns and to develop sustained steady rates of responding through the use of appropriate schedules of reinforcement, the reinforcement being a few grains occasionally released onto the plate. By building up large extinction curves a target could be tracked continuously for a matter of minutes without reinforcement. We trained pigeons to follow a variety of land and sea targets, to neglect large patches intended to represent clouds of flak, to concentrate on one target while another was in view, and so on. We found that a pigeon could hold the missile on a particular street intersection in an aerial map of a city. The map which came most easily to hand was of a city which, in the interests of international relations, need not be identified. Through appropriate schedules of reinforcement it was possible to maintain longer uninterrupted runs than could conceivably be required by a missile.
We also undertook a more serious study of the pigeon’s behavior, with the help of W. K. Estes and Marion Breland who joined the project at this time. We ascertained optimal conditions of deprivation, investigated other kinds of deprivations, studied the effect of special reinforcements (for example, pigeons were said to find hemp seed particularly delectable), tested the effects of energizing drugs and increased oxygen pressures, and so on. We differentially reinforced the force of the pecking response and found that pigeons could be induced to peck so energetically that the base of the beak became inflamed. We investigated the effects of extremes of temperature, of changes in atmospheric pressure, of accelerations produced by an improvised centrifuge, of increased carbon dioxide pressure, of increased and prolonged vibration, and of noises such as pistol shots. (The birds could, of course, have been deafened to eliminate auditory distractions, but we found it easy to maintain steady behavior in spite of intense noises and many other distracting conditions using the simple process of adaptation.) We investigated optimal conditions for the quick development of discriminations and began to study the pigeon’s reactions to patterns, testing for induction from a test figure to the same figure inverted, to figures of different sizes and colors, and to figures against different grounds. A simple device using carbon paper to record the points at which a pigeon pecks a figure showed a promise which has never been properly exploited.
We made another demonstration film and renewed our contact with the Office of Scientific Research and Development. An observer was sent to Minneapolis, and on the strength of his report we were given an opportunity to present our case in Washington in February 1943. At that time we were offering a homing device capable of reporting with an on-off signal the orientation of a missile toward various visual patterns. The capacity to respond to pattern was, we felt, our strongest argument, but the fact that the device used only visible radiation (the same form of information available to the human bombardier) made it superior to the radio controlled missiles then under development because it was resistant to jamming. Our film had some effect. Other observers were sent to Minneapolis to see the demonstration itself. The pigeons, as usual, behaved beautifully. One of them held the supposed missile on a particular intersection of streets in the aerial map for five minutes although the target would have been lost if the pigeon had paused for a second or two. The observers returned to Washington, and two weeks later we were asked to supply data on (a) the population of pigeons in the United States (fortunately, the census bureau has some figures) and (b) the accuracy with which pigeons struck a point on a plate. There were many arbitrary conditions to be taken into account in measuring the latter, but we supplied possibly relevant data. At long last, in June 1943, the Office of Scientific Research and Development awarded a modest contract to General Mills, Inc. to “develop a homing device.”
At that time we were given some information about the missile the pigeons were to steer. The Pelican was a wing steered glider, still under development and not yet successfully steered by any homing device. It was being tested on a target in New Jersey consisting of a stirrup shaped pattern bulldozed out of the sandy soil near the coast. The white lines of the target stood out clearly against brown and green cover. Colored photographs were taken from various distances and at various angles, and the verisimilitude of the reproduction was checked by flying over the target and looking at its image in a portable camera obscura.
Because of security restrictions we were given only very rough specifications of the signal to be supplied to the controlling system in the Pelican. It was no longer to be simply on-off; if the missile was badly off target, an especially strong correcting signal was needed. This meant that the quadrant-contact system would no longer suffice. But further requirements were left mainly to our imagination. The General Mills engineers were equal to this difficult assignment. With what now seems like unbelievable speed, they designed and constructed a pneumatic pickup system giving a graded signal. A lens in the nose of the missile threw an image on a translucent plate within reach of the pigeon in a pressure sealed chamber. Four air valves resting against the edges of the plate were jarred open momentarily as the pigeon pecked. The valves at the right and left admitted air to chambers on opposite sides of one tambour, while the valves at the top and bottom admitted air to opposite sides of another. Air on all sides was exhausted by a Venturi cone on the side of the missile. When the missile was on target, the pigeon pecked the center of the plate, all valves admitted equal amounts of air, and the tambours remained in neutral positions. But if the image moved as little as a quarter of an inch off-center, corresponding to a very small angular displacement of the target, more air was admitted by the valves on one side, and the resulting displacement of the tambours sent appropriate correcting orders directly to the servosystem.
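The differential action of the two tambours can be illustrated with a small sketch. The linear relation between peck position and air admitted, the plate half-width, and the names `valve_air` and `tambour_signal` are all assumptions made for illustration; the text gives only the qualitative behavior of the pneumatic system.

```python
def valve_air(peck_x, peck_y, half_width=1.0):
    """Air admitted by each of the four edge valves for one peck.

    Hypothetical linear model: a valve admits more air the closer the peck
    lands to its edge. Coordinates are measured from the plate center.
    """
    # Keep the peck on the plate.
    x = max(-half_width, min(half_width, peck_x))
    y = max(-half_width, min(half_width, peck_y))
    return {
        "left": (half_width - x) / (2 * half_width),
        "right": (half_width + x) / (2 * half_width),
        "bottom": (half_width - y) / (2 * half_width),
        "top": (half_width + y) / (2 * half_width),
    }

def tambour_signal(peck_x, peck_y, half_width=1.0):
    """Net displacement of the two tambours: (horizontal, vertical).

    Each tambour responds to the difference between opposite valves, so a
    centered peck gives (0.0, 0.0) and an off-center peck gives a graded
    correction whose sign shows the direction of the error.
    """
    air = valve_air(peck_x, peck_y, half_width)
    return air["right"] - air["left"], air["top"] - air["bottom"]

print(tambour_signal(0.0, 0.0))                   # on target
print(tambour_signal(0.25, 0.0, half_width=4.0))  # slightly right of center
```

Because each tambour sees only a difference, equal air on both sides cancels, and the size of the correction grows with the error, which is the graded signal the on-off quadrant contacts could not supply.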
The device required no materials in short supply, was relatively foolproof, and delivered a graded signal. It had another advantage. By this time we had begun to realize that a pigeon was more easily controlled than a physical scientist serving on a committee. It was very difficult to convince the latter that the former was an orderly system. We therefore multiplied the probability of success by designing a multiple bird unit. There was adequate space in the nose of the Pelican for three pigeons each with its own lens and plate. A net signal could easily be generated. The majority vote of three pigeons offered an excellent guarantee against momentary pauses and aberrations. (We later worked out a system in which the majority took on a more characteristically democratic function. When a missile is falling toward two ships at sea, for example, there is no guarantee that all three pigeons will steer toward the same ship. But at least two must agree, and the third can then be punished for his minority opinion. Under proper contingencies of reinforcement a punished bird will shift immediately to the majority view. When all three are working on one ship, any defection is immediately punished and corrected.)
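One way to picture the majority vote is to take the median of the three steering signals and to flag the bird that disagrees with the other two. This is only a sketch: the text does not say how the net signal was actually formed, and the names `net_signal` and `dissenter` and the `tolerance` value are hypothetical.

```python
def net_signal(signals):
    """Median of three steering signals; a single aberrant bird cannot move it far."""
    if len(signals) != 3:
        raise ValueError("expected exactly three signals")
    return sorted(signals)[1]

def dissenter(signals, tolerance=0.2):
    """Index of the one bird that disagrees with two agreeing birds, else None.

    Two birds 'agree' when their signals differ by no more than tolerance
    (an arbitrary value chosen for this illustration).
    """
    for i, value in enumerate(signals):
        others = [s for j, s in enumerate(signals) if j != i]
        pair_agrees = abs(others[0] - others[1]) <= tolerance
        stands_apart = all(abs(value - o) > tolerance for o in others)
        if pair_agrees and stands_apart:
            return i
    return None

signals = [0.10, 0.12, -0.90]   # third bird is steering toward another ship
print(net_signal(signals), dissenter(signals))
```

In this picture the flagged bird is the one whose responses would go unreinforced until it returned to the majority target.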
The arrangement in the nose of the Pelican is shown in Figure 3. Three systems of lenses and mirrors, shown at the left, throw images of the target area on the three translucent plates shown in the center. The ballistic valves resting against the edges of these plates and the tubes connecting them with the manifolds leading to the controlling tambours may be seen. A pigeon is being placed in the pressurized chamber at the right.
The General Mills engineers also built a simulator (a sort of Link trainer for pigeons) designed to have the steering characteristics of the Pelican, in so far as these had been communicated to us. Like the wing steered Pelican, the simulator tilted and turned side to side. When the three-bird nose was attached to it, the pigeons could be put in full control (the “loop could be closed”) and the adequacy of the signal tested under pursuit conditions. Targets were moved back and forth across the far wall of a room at prescribed speeds and in given patterns of oscillation, and the tracking response of the whole unit was studied quantitatively.
Meanwhile we continued our intensive study of the behavior of the pigeons. Looking ahead to combat use we designed methods for the mass production of trained birds and for handling large groups of trained subjects. We were proposing to train certain birds for certain classes of targets, such as ships at sea, while special squads were to be trained on special targets, photographs of which were to be obtained through reconnaissance. A large crew of pigeons would then be waiting for assignment, but we developed harnessing and training techniques which should have solved such problems quite easily.
In a multiple unit trainer each box contains a jacketed pigeon held at an angle of 45° to the horizontal and perpendicular to an 8” x 8” translucent screen. A target area is projected on each screen. Two beams of light intersect at the point to be struck. All on-target responses of the pigeon are reported by the interruption of the crossed beams and by contact with the translucent screen. Only a four-inch, disk shaped portion of the field is visible to the pigeon at any time, but the boxes move slowly about the field, giving the pigeon an opportunity to respond to the target in all positions. The positions of all reinforcements are recorded to reveal any weak areas. A variable-ratio schedule is used to build sustained, rapid responding.
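A variable-ratio schedule reinforces a response after an unpredictable number of responses that averages to a set ratio. The sketch below is one simple way to realize such a schedule, not a description of the trainer's mechanism; the class name, the uniform draw, and the mean ratio of 10 are assumptions, since the text gives no values.

```python
import random

class VariableRatioSchedule:
    """Reinforce after a random number of responses whose mean is mean_ratio."""

    def __init__(self, mean_ratio, rng=None):
        if mean_ratio < 1:
            raise ValueError("mean_ratio must be at least 1")
        self.mean_ratio = mean_ratio
        self.rng = rng or random.Random()
        self._remaining = self._draw()

    def _draw(self):
        # Uniform on 1 .. 2*mean_ratio - 1, so the average requirement is mean_ratio.
        return self.rng.randint(1, 2 * self.mean_ratio - 1)

    def respond(self):
        """Register one on-target peck; return True if it is reinforced."""
        self._remaining -= 1
        if self._remaining == 0:
            self._remaining = self._draw()
            return True
        return False

schedule = VariableRatioSchedule(mean_ratio=10, rng=random.Random(0))
reinforced = sum(schedule.respond() for _ in range(10_000))
print(reinforced)  # close to 1,000, i.e. about one reinforcement per ten pecks
```

Because the bird cannot tell which peck will pay off, every peck is as likely as any other to be reinforced, which is why schedules of this kind sustain steady, rapid responding.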
By December 1943, less than six months after the contract was awarded, we were ready to report to the Office of Scientific Research and Development. Observers visited the laboratory and watched the simulator follow a target about a room under the control of a team of three birds. They also reviewed our tracking data. The only questions which arose were the inevitable consequence of our lack of information about the signal required to steer the Pelican. For example, we had had to make certain arbitrary decisions in compromising between sensitivity of signal and its integration or smoothness. A high vacuum produced quick, rather erratic movements of the tambours, while a lower vacuum gave a sluggish but smooth signal. As it turned out, we had not chosen the best values in collecting our data, and in January 1944 the Office of Scientific Research and Development refused to extend the General Mills contract. The reasons given seemed to be due to misunderstandings or, rather, to lack of communication. We had already collected further data with new settings of the instruments, and these were submitted in a request for reconsideration.
We were given one more chance. We took our new data to the radiation lab at the Massachusetts Institute of Technology where they were examined by the servospecialists working on the Pelican controls. To our surprise the scientist whose task it was to predict the usefulness of the pigeon signal argued that our data were inconsistent with respect to phase lag and certain other characteristics of the signal. According to his equations, our device could not possibly yield the signals we reported. We knew, of course, that it had done so. We examined the supposed inconsistency and traced it, or so we thought, to a certain nonlinearity in our system. In pecking an image near the edge of the plate, the pigeon strikes a more glancing blow; hence the air admitted at the valves is not linearly proportional to the displacement of the target. This could be corrected in several ways: for example, by using a lens to distort radial distances. It was our understanding that in any case the signal was adequate to control the Pelican. Indeed, one servo authority, upon looking at graphs of the performance of the simulator, exclaimed: “This is better than radar!”
Two days later, encouraged by our meeting at MIT, we reached the summit. We were to present our case briefly to a committee of the country’s top scientists. The hearing began with a brief report by the scientist who had discovered the “inconsistency” in our data, and to our surprise he still regarded it as unresolved. He predicted that the signal we reported would cause the missile to “hunt” wildly and lose the target. But his prediction should have applied as well to the closed loop simulator. Fortunately another scientist was present who had seen the simulator performing under excellent control and who could confirm our report of the facts. But reality was no match for mathematics.
The basic difficulty, of course, lay in convincing a dozen distinguished physical scientists that the behavior of a pigeon could be adequately controlled. We had hoped to score on this point by bringing with us a demonstration. A small black box had a round translucent window in one end. A slide projector placed some distance away threw on the window an image of the New Jersey target. In the box, of course, was a pigeon, which, incidentally, had at that time been harnessed for 35 hours. Our intention was to let each member of the committee observe the response to the target by looking down a small tube; but time was not available for individual observation, and we were asked to take the top off the box. The translucent screen was flooded with so much light that the target was barely visible, and the peering scientists offered conditions much more unfamiliar and threatening than those likely to be encountered in a missile. In spite of this the pigeon behaved perfectly, pecking steadily and energetically at the image of the target as it moved about on the plate. One scientist with an experimental turn of mind intercepted the beam from the projector. The pigeon stopped instantly. When the image again appeared, pecking began within a fraction of a second and continued at a steady rate.
It was a perfect performance, but it had just the wrong effect. One can talk about phase lag in pursuit behavior and discuss mathematical predictions of hunting without reflecting too closely upon what is inside the black box. But the spectacle of a living pigeon carrying out its assignment, no matter how beautifully, simply reminded the committee of how utterly fantastic our proposal was. I will not say that the meeting was marked by unrestrained merriment, for the merriment was restrained. But it was there, and it was obvious that our case was lost.
Hyde closed our presentation with a brief summary: we were offering a homing device, unusually resistant to jamming, capable of reacting to a wide variety of target patterns, requiring no materials in short supply, and so simple to build that production could be started in 30 days. He thanked the committee, and we left. As the door closed behind us, he said to me: “Why don’t you go out and get drunk!”
Official word soon came: “Further prosecution of this project would seriously delay others which in the minds of the Division would have more immediate promise of combat application.” Possibly the reference was to a particular combat application at Hiroshima a year and a half later, when it looked for a while as if the need for accurate bombing had been eliminated for all time. In any case we had to show, for all our trouble, only a loftful of curiously useless equipment and a few dozen pigeons with a strange interest in a feature of the New Jersey coast. The equipment was scrapped, but 30 of the pigeons were kept to see how long they would retain the appropriate behavior.
In the years which followed there were faint signs of life. Winston Churchill’s personal scientific advisor, Lord Cherwell, learned of the project and “regretted its demise.” A scientist who had had some contact with the project during the war, and who evidently assumed that its classified status was not to be taken seriously, made a good story out of it for the Atlantic Monthly, names being changed to protect the innocent. Other uses of animals began to be described. The author of the Atlantic Monthly story also published an account of the “incendiary bats.” Thousands of bats were to be released over an enemy city, each carrying a small incendiary time bomb. The bats would take refuge, as is their custom, under eaves and in other out-of-the-way places; and shortly afterwards thousands of small fires would break out practically simultaneously. The scheme was never used because it was feared that it would be mistaken for germ warfare and might lead to retaliation in kind.
Another story circulating at the time told how the Russians trained dogs to blow up tanks. I have described the technique elsewhere (Skinner, 1956). A Swedish proposal to use seals to achieve the same end with submarines was not successful. The seals were to be trained to approach submarines to obtain fish attached to the sides. They were then to be released carrying magnetic mines in the vicinity of hostile submarines. The required training was apparently never achieved. I cannot vouch for the authenticity of probably the most fantastic story of this sort, but it ought to be recorded. The Russians were said to have trained sea lions to cut mine cables. A complicated device attached to the sea lion included a motor driven cable-cutter, a tank full of small fish, and a device which released a few fish into a muzzle covering the sea lion’s head. In order to eat, the sea lion had to find a mine cable and swim along side it so that the cutter was automatically triggered, at which point a few fish were released from the tank into the muzzle. When a given number of cables had been cut, both the energy of the cutting mechanism and the supply of fish were exhausted, and the sea lion received a special stimulus upon which it returned to its home base for special reinforcement and reloading.
The story of our own venture has a happy ending. With the discovery of German accomplishments in the field of guided missiles, feasible homing systems suddenly became very important. Franklin V. Taylor of the Naval Research Laboratory in Washington, D.C. heard about our project and asked for further details. As a psychologist Taylor appreciated the special capacity of living organisms to respond to visual patterns and was aware of recent advances in the control of behavior. More important, he was a skillful practitioner in a kind of control which our project had conspicuously lacked: he knew how to approach the people who determine the direction of research. He showed our demonstration film so often that it was completely worn out, but to good effect, for support was eventually found for a thorough investigation of “organic control” under the general title ORCON. Taylor also enlisted the support of engineers in obtaining a more effective report of the pigeon’s behavior. The translucent plate upon which the image of the target was thrown had a semiconducting surface, and the tip of the bird’s beak was covered with a gold electrode. A single contact with the plate sent an immediate report of the location of the target to the controlling mechanism. The work which went into this system contributed to the so-called Pick-off Display Converter developed as part of the Naval Data Handling System for human observers. It is no longer necessary for the radar operator to give a verbal report of the location of a pip on the screen. Like the pigeon, he has only to touch the pip with a special contact. (He holds the contact in his hand.)
At the Naval Research Laboratory in Washington the responses of pigeons were studied in detail. Average peck rate, average error rate, average hit rate, and so on were recorded under various conditions. The tracking behavior of the pigeon was analyzed with methods similar to those employed with human operators. Pattern perception was studied, including generalization from one pattern to another. A simulator was constructed in which the pigeon controlled an image projected by a moving picture film of an actual target: for example, a ship at sea as seen from a plane approaching at 600 miles per hour.
The publications from the Naval Research Laboratory which report this work (Chernikoff & Newlin, 1951; Conklin, Newlin, Taylor, & Tipton, 1953; Searle & Stafford, 1950; Taylor, 1949; White, 1952) provide a serious evaluation of the possibilities of organic control. Although in simulated tests a single pigeon occasionally loses a target, its tracking characteristics are surprisingly good. A three- or seven-bird unit with the same individual consistency should yield a signal with a reliability which is at least of the order of magnitude shown by other phases of guided missiles in their present stage of development. Moreover, in the seven years which have followed the last of these reports, a great deal of relevant information has been acquired. The color vision of the pigeon is now thoroughly understood; its generalization along single properties of a stimulus has been recorded and analyzed; and the maintenance of behavior through scheduling of reinforcement has been drastically improved, particularly in the development of techniques for pacing responses for less erratic and steadier signals (Skinner, 1957). Tests made with the birds salvaged from the old Project Pigeon showed that even after six years of inactivity a pigeon will immediately and correctly strike a target to which it has been conditioned and will continue to respond for some time without reinforcement.
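The claim about three- and seven-bird units can be made concrete with a rough calculation. The sketch below rests on assumptions the text does not state: that the unit's signal follows the majority of its birds, that the birds err independently, and that each bird tracks correctly with the same probability. The function name and the figure of 0.9 are hypothetical and chosen only for illustration.

```python
from math import comb


def majority_reliability(n_birds, p_correct):
    """Probability that a majority of n_birds independent birds is correct.

    Assumes an odd number of birds, each correct with probability p_correct.
    """
    if n_birds % 2 == 0:
        raise ValueError("use an odd number of birds so a majority always exists")
    needed = n_birds // 2 + 1  # smallest number of birds that forms a majority
    return sum(
        comb(n_birds, k) * p_correct**k * (1 - p_correct) ** (n_birds - k)
        for k in range(needed, n_birds + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 7):
        print(n, round(majority_reliability(n, 0.9), 4))
```

Under these assumptions, adding birds raises the reliability of the combined signal whenever a single bird is right more often than not. If the birds' errors were correlated, the gain would be smaller.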
The use of living organisms in guiding missiles is, it seems fair to say, no longer a crackpot idea. A pigeon is an extraordinarily subtle and complex mechanism capable of performances which at the moment can be equalled by electronic equipment only of vastly greater weight and size, and it can be put to reliable use through the principles which have emerged from an experimental analysis of its behavior. But this vindication of our original proposal is perhaps the least important result. Something happened during the brief life of Project Pigeon which it has taken a long time to appreciate. The practical task before us created a new attitude toward the behavior of organisms. We had to maximize the probability that a given form of behavior would occur at a given time. We could not enjoy the luxury of observing one variable while allowing others to change in what we hoped was a random fashion. We had to discover all relevant variables and submit them to experimental control whenever possible. We were no doubt under exceptional pressure, but vigorous scientific research usually makes comparable demands. Psychologists have too often yielded to the temptation to be content with hypothetical processes and intervening variables rather than press for rigorous experimental control. It is often intellectual laziness rather than necessity which recommends the a posteriori statistical treatment of variation. Our task forced us to emphasize prior experimental control, and its success in revealing orderly processes gave us an exciting glimpse of the superiority of laboratory practice over verbal (including some kinds of mathematical) explanation.
THE CRACKPOT IDEA
If I were to conclude that crackpot ideas are to be encouraged, I should probably be told that psychology has already had more than its share of them. If it has, they have been entertained by the wrong people. Reacting against the excesses of psychological quackery, psychologists have developed an enormous concern for scientific respectability. They constantly warn their students against questionable facts and unsupported theories. As a result the usual PhD thesis is a model of compulsive cautiousness, advancing only the most timid conclusions thoroughly hedged about with qualifications. But it is just the man capable of displaying such admirable caution who needs a touch of uncontrolled speculation. Possibly a generous exposure to psychological science fiction would help. Project Pigeon might be said to support that view. Except with respect to its avowed goal, it was, as I see it, highly productive; and this was in large measure because my colleagues and I knew that, in the eyes of the world, we were crazy.
One virtue in crackpot ideas is that they breed rapidly and their progeny show extraordinary mutations. Everyone is talking about teaching machines nowadays, but Sidney Pressey can tell you what it was like to have a crackpot idea in that field 40 years ago. His self-testing devices and self-scoring test forms now need no defense, and psychomotor training devices have also achieved a substantial respectability. This did not, however, prepare the way for devices to be used in verbal instructions—that is, in the kinds of teaching which are the principal concern of our schools and colleges. (I can quote official opinion to that effect from high places.) Even five short years ago that kind of instruction by machine was still in the crackpot category. Now, there is a direct genetic connection between teaching machines and Project Pigeon. We had been forced to consider the mass education of pigeons. True, the scrap of wisdom we imparted to each was indeed small, but the required changes in behavior were similar to those which must be brought about in vaster quantities in human students. The techniques of shaping behavior and of bringing it under stimulus control which can be traced, as I have suggested elsewhere (Skinner, 1958), to a memorable episode on the top floor of that flour mill in Minneapolis needed only a detailed reformulation of verbal behavior to be directly applicable to education.
I am sure there is more to come. In the year which followed the termination of Project Pigeon I wrote Walden Two (Skinner, 1948), a utopian picture of a properly engineered society. Some psychotherapists might argue that I was suffering from personal rejection and simply retreated to a fantasied world where everything went according to plan, where there never was heard a discouraging word. But another explanation is, I think, equally plausible. That piece of science fiction was a declaration of confidence in a technology of behavior. Call it a crackpot idea if you will; it is one in which I have never lost faith. I still believe that the same kind of wide-ranging speculation about human affairs, supported by studies of compensating rigor, will make a substantial contribution toward that world of the future in which, among other things, there will be no need for guided missiles.
Intensive Treatment of Psychotic Behaviour by Stimulus Satiation and Food Reinforcement
T. AYLLON, Anna State Hospital, Illinois
This investigation demonstrates that extensive and effective behavioural modification is feasible without costly and lengthy psychotherapeutic treatment. In addition, the often heard notion that another undesirable type of behaviour will replace the original problem behaviour is not supported by the findings to date.
Until recently, the effective control of behaviour was limited to the animal laboratory. The extension of this control to human behaviour was made when Lindsley successfully adapted the methodology of operant conditioning to the study of psychotic behaviour (Lindsley, 1956). Following Lindsley’s point of departure other investigators have shown that, in its essentials, the behaviour of mentally defective individuals (Orlando and Bijou, 1960), stutterers (Flanagan, Goldiamond and Azrin, 1958), mental patients (Hutchinson and Azrin, 1961), autistic (Ferster and DeMyer, 1961), and normal children (Bijou, 1961; Azrin and Lindsley, 1956) is subject to the same controls.
Despite the obvious implications of this research for applied settings there has been a conspicuous lag between the research findings and their application. The greatest limitation to the direct application of laboratory principles has been the absence of control over the subjects’ environment. Recently, however, a series of applications in a regulated psychiatric setting has clearly demonstrated the possibilities of behavioural modification (Ayllon and Michael, 1959; Ayllon and Haughton, 1962). Some of the behaviour studied has included repetitive and highly stereotyped responses such as complaining, pacing, refusal to eat, hoarding and many others.
What follows is a demonstration of behaviour techniques for the intensive individual treatment of psychotic behaviour. Specific pathological behaviour patterns of a single patient were treated by manipulating the patient’s environment.
The Experimental Ward and Control Over the Reinforcement
This investigation was conducted in a mental hospital ward, the characteristics of which have been described elsewhere (Ayllon and Haughton, 1962). Briefly, this was a female ward to which only authorized personnel were allowed access. The ward staff was made up of psychiatric nurses and untrained aides who carried out the environmental manipulations under the direction of the experimenter. Using a time-sample technique, patients were observed daily every 30 minutes from 7:00 a.m. to 11:00 p.m. The dining room was the only place where food was available and entrance to the dining room could be regulated. Water was freely available at a drinking fountain on the ward. None of the patients had ground passes or jobs outside the ward.
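The time-sample schedule described above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and it assumes that both the 7:00 a.m. and the 11:00 p.m. observations are included and that each observation records simply whether the target behaviour was seen.

```python
from datetime import datetime, timedelta


def observation_times(start="07:00", end="23:00", step_minutes=30):
    """List the clock times of one day's observations, both endpoints included."""
    fmt = "%H:%M"
    current = datetime.strptime(start, fmt)
    last = datetime.strptime(end, fmt)
    times = []
    while current <= last:
        times.append(current.strftime(fmt))
        current += timedelta(minutes=step_minutes)
    return times


def proportion_observed(samples):
    """Fraction of time samples in which the target behaviour was recorded."""
    if not samples:
        raise ValueError("no samples recorded")
    return sum(1 for seen in samples if seen) / len(samples)


if __name__ == "__main__":
    times = observation_times()
    print(len(times), times[0], times[-1])
```

With both endpoints included, the schedule yields 33 observations per patient per day. The proportion of samples in which a behaviour appears then serves as an estimate of how much of the day it occupies.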
The patient was a 47-year-old female diagnosed as a chronic schizophrenic. The patient had been hospitalized for 9 years. Upon studying the patient’s behaviour on the ward, it became apparent that the nursing staff spent considerable time caring for her. In particular, there were three aspects of her behaviour which seemed to defy solution. The first was stealing food. The second was the hoarding of the ward’s towels in her room. The third undesirable aspect of her behaviour consisted in her wearing excessive clothing, e.g., a half-dozen dresses, several pairs of stockings, sweaters, and so on.
In order to modify the patient’s behaviour systematically, each of these three types of behaviour (stealing food, hoarding, and excessive dressing) was treated separately.
Control of Stealing Food by Food Withdrawal
The patient had weighed over 250 pounds for many years. She ate the usual tray of food served to all patients, but, in addition, she stole food from the food counter and from other patients. Because the medical staff regarded her excessive weight as detrimental to her health, a special diet had been prescribed for her. However, the patient refused to diet and continued stealing food. In an effort to discourage the patient from stealing, the ward nurses had spent considerable time trying to persuade her to stop stealing food. As a last resort, the nurses would force her to return the stolen food.
To determine the extent of food stealing, nurses were instructed to record all behaviour associated with eating in the dining room. This record, taken for nearly a month, showed that the patient stole food during two thirds of all meals.
The traditional methods previously used to stop the patient from stealing food were discontinued. No longer were persuasion, coaxing, or coercion used.
The patient was assigned to a table in the dining room, and no other patients were allowed to sit with her. Nurses removed the patient from the dining room when she approached a table other than her own, or when she picked up unauthorized food from the dining room counter. In effect, this procedure resulted in the patient missing a meal whenever she attempted to steal food.
Figure 1 shows that when withdrawal of positive reinforcement (i.e. meal) was made dependent upon the patient’s ‘stealing’, this response was eliminated in two weeks. Because the patient no longer stole food, she ate only the diet prescribed for her. The effective control of the stealing response is also indicated by the gradual reduction in the patient’s body weight. At no time during the patient’s 9 years of hospitalization had she weighed less than 230 pounds. Figure 2 shows that at the conclusion of this treatment her weight stabilized at 180 pounds or 17 percent loss from her original weight. At this time, the patient’s physical condition was regarded as excellent.
A principle used in the laboratory shows that the strength of a response may be weakened by the removal of positive reinforcement following the response (Ferster, 1958). In this case, the response was food-stealing and the reinforcer was access to meals. When the patient stole food she was removed from the dining room and missed her meal.
After one year of this treatment, two occasions of food stealing occurred. The first occasion, occurring after one year of not stealing food, took the nurses by surprise and, therefore, the patient ‘got away’ with it. The second occasion occurred shortly thereafter. This time, however, the controlling consequences were in force. The patient missed that meal and did not steal again to the conclusion of this investigation.
Because the patient was not informed or warned of the consequences that followed stealing, the nurses regarded the procedure as unlikely to have much effect on the patient’s behaviour. The implicit belief that verbal instructions are indispensable for learning is part of present day psychiatric lore. In keeping with this notion, prior to this behaviour treatment, the nurses had tried to persuade the patient to co-operate in dieting. Because there were strong medical reasons for her losing weight, the patient’s refusal to follow a prescribed diet was regarded as further evidence of her mental illness.
Control of One Form of Hoarding Behaviour through Stimulus Satiation
During the 9 years of hospitalization, the patient collected large numbers of towels and stored them in her room. Although many efforts had been made to discourage hoarding, this behaviour continued unaltered. The only recourse for the nursing staff was to take away the patient’s towels about twice a week.
To determine the degree of hoarding behaviour, the towels in her room were counted three times a week, when the patient was not in her room. This count showed that the number of towels kept in her room ranged from 19 to 29 despite the fact that during this time the nurses continued recovering their towel supply from the patient’s room.
The routine removal of the towels from the patient’s room was discontinued. Instead, a programme of stimulus satiation was carried out by the nurses. Intermittently, throughout the day, the nurses took a towel to the patient when she was in her room and simply handed it to her without any comment. The first week she was given an average of 7 towels daily, and by the third week this number was increased to 60.
The technique of satiation eliminated the towel hoarding. Figure 3 shows the mean number of towels per count found in the patient’s room. When the number of towels kept in her room reached the 625 mark, she started taking a few of them out. Thereafter, no more towels were given to her. During the next 12 months the mean number of towels found in her room was 1.5 per week.
The procedure used to reduce the amount of towel hoarding bears resemblance to satiation of a reinforcer. A reinforcer loses its effect when an excessive amount of that reinforcer is made available. Accordingly, the response maintained by that reinforcer is weakened. In this application, the towels constituted the reinforcing stimuli. When the number of towels in her room reached 625, continuing to give her towels seemed to make their collection aversive. The patient then proceeded to rid herself of the towels until she had virtually none.
During the first few weeks of satiation, the patient was observed patting her cheeks with a few towels, apparently enjoying them. Later, the patient was observed spending much of her time folding and stacking the approximately 600 towels in her room. A variety of remarks were made by the patient regarding receipt of towels. All verbal statements made by the patient were recorded by the nurse. The following represent typical remarks made during this experiment. First week: As the nurse entered the patient’s room carrying a towel, the patient would smile and say, “Oh, you found it for me, thank you.” Second week: When the number of towels given to patient increased rapidly, she told the nurses, “Don’t give me no more towels. I’ve got enough.” Third week: “Take them towels away. . . . I can’t sit here all night and fold towels.” Fourth and fifth weeks: “Get these dirty towels out of here.” Sixth week: After she had started taking the towels out of her room, she remarked to the nurse, “I can’t drag any more of these towels, I just can’t do it.”
The quality of these remarks suggests that the initial effect of giving towels to the patient was reinforcing. However, as the towels increased they ceased to be reinforcing, and presumably became aversive.
The ward nurses, who had undergone a three year training in psychiatric nursing, found it difficult to reconcile the procedure in this experiment with their psychiatric orientation. Most nurses subscribed to the popular psychiatric view which regards hoarding behaviour as a reflection of a deep ‘need’ for love and security. Presumably, no ‘real’ behavioural change was possible without meeting the patient’s ‘needs’ first. Even after the patient discontinued hoarding towels in her room, some nurses predicted that the change would not last and that worse behaviour would replace it. Using a time-sampling technique the patient was under continuous observation for over a year after the termination of the satiation programme. Not once during this period did the patient return to hoarding towels. Furthermore, no other behaviour problem replaced hoarding.
Control of an Additional Form of Hoarding through Food Reinforcement
Shortly after the patient had been admitted to the hospital she wore an excessive amount of clothing which included several sweaters, shawls, dresses, undergarments and stockings. The clothing also included sheets and towels wrapped around her body, and a turban-like head-dress made up of several towels. In addition, the patient carried two to three cups in one hand while holding a bundle of miscellaneous clothing and a large purse in the other.
To determine the amount of clothing worn by the patient, she was weighed before each meal over a period of two weeks. By subtracting her actual body weight from that recorded when she was dressed, the weight of her clothing was obtained.
The response required for reinforcement was stepping on a scale and meeting a predetermined weight. The requirement for reinforcement consisted of meeting a single weight (i.e. her body weight plus a specified number of pounds of clothing). Initially she was given an allowance of 23 pounds over her current body weight. This allowance represented a 2 pound reduction from her usual clothing weight. When the patient exceeded the weight requirement, the nurse stated in a matter-of-fact manner, “Sorry, you weigh too much, you’ll have to weigh less.” Failure to meet the required weight resulted in the patient missing the meal at which she was being weighed. Sometimes, in an effort to meet the requirement, the patient discarded more clothing than was required. When this occurred the requirement was adjusted at the next weighing-time to correspond to the limit set by the patient on the preceding occasion.
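The weighing rule just described can be written out as a small sketch. The function and parameter names are hypothetical, and the body weight of 200 pounds in the example is invented for illustration. The sketch covers only the rule stated above (meal access depends on meeting the allowance, and the allowance drops to whatever lower limit the patient herself sets); it does not model any further reductions of the requirement introduced by the experimenter.

```python
def weigh_in(dressed_weight, body_weight, allowance):
    """Apply the weighing rule once and return (meal_granted, next_allowance).

    dressed_weight: reading on the scale before the meal, in pounds
    body_weight:    the patient's current body weight, in pounds
    allowance:      pounds of clothing currently permitted
    """
    clothing = dressed_weight - body_weight
    if clothing > allowance:
        # Over the limit: the meal is missed and the requirement is unchanged.
        return False, allowance
    # Requirement met: the meal is granted. If she shed more than required,
    # the lower clothing weight becomes the limit at the next weighing.
    return True, min(allowance, clothing)


if __name__ == "__main__":
    body = 200        # hypothetical body weight
    allowance = 23    # initial allowance reported in the text
    for dressed in (225, 220, 222, 218):
        granted, allowance = weigh_in(dressed, body, allowance)
        print(dressed, granted, allowance)
```

Because the allowance can only move downward under this rule, every occasion on which the patient overshoots the requirement tightens the criterion for all later meals.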
When food reinforcement is made dependent upon the removal of superfluous clothing the response increases in frequency. Figure 4 shows that the patient gradually shed her clothing to meet the more demanding weight requirement until she dressed normally. At the conclusion of this experiment her clothes weighed 3 pounds compared to the 25 pounds she wore before this treatment.
Some verbal shaping was done in order to encourage the patient to leave the cups and bundles she carried with her. Nurses stopped her at the dining room and said, “Sorry, no things are allowed in the dining room.” No mention of clothing or specific items was made to avoid focusing undue attention upon them. Within a week, the patient typically stepped on the scale without her bundle and assorted objects. When her weight was over the limit, the patient was informed that she weighed “too much”. She then proceeded to take off a few clothes, stepped on the scale again, and upon meeting the weight requirement, gained access to the dining room.
According to the principle of reinforcement a class of responses is strengthened when it is followed by reinforcement. A reinforcer is such when it results in a response increase. In this application the removal of excessive clothing constituted the response and the reinforcer was food (i.e. access to meals). When the patient met the weight requirement she was reinforced by being given access to meals.
At the start of this experiment, the patient missed a few meals because she failed to meet the weight requirement, but soon thereafter she gradually discarded her superfluous clothing. First, she left behind odd items she had carried in her arms, such as bundles, cups and handbags. Next she took off the elaborate headgear and assorted “capes” or shawls she had worn over her shoulders. Although she had worn 18 pairs of stockings at one time, she eventually shed these also.
During the initial part of this experiment, the patient showed some emotional behaviour, e.g. crying, shouting and throwing chairs around. Because nurses were instructed to “ignore” this emotional behaviour, the patient obtained no sympathy or attention from them. The withholding of social reinforcement for emotional behaviour quickly led to its elimination.
At the conclusion of this behaviour treatment, the patient typically stepped on the scale wearing a dress, undergarments, a pair of stockings and a pair of light shoes. One of the behavioural changes concomitant with the current environmental manipulation was that as the patient began dressing normally she started to participate in small social events in the hospital. This was particularly new to the patient as she had previously remained seclusive spending most of the time in her room.
About this time the patient’s parents came to visit her and insisted on taking her home for a visit. This was the first time during the patient’s 9 years of hospitalization that her parents had asked to take her out. They remarked that previously they had not been interested in taking her out because the patient’s excessive dressing in addition to her weight made her look like a “circus freak”.
The research presented here was conducted under nearly ideal conditions. The variables manipulated (i.e. towels and food) were under full experimental control. Using a time-sample technique the patient was observed daily every 30 minutes from 7:00 a.m. to 11:00 p.m. Nurses and aides carried out these observations which were later analysed in terms of gross behaviour categories. These observations were in force for over a year during which time these three experiments were conducted. The results of these observations indicate that none of the three pathological behaviour patterns (i.e. food stealing, hoarding and excessive dressing) exhibited by the patient were replaced by any undesirable behaviour.
The patient displayed some emotional behaviour in each experiment, but each time it subsided when social reinforcement (i.e. attention) was not forthcoming. The patient did not become violent or seclusive as a consequence of these experiments. Instead, she became socially more accessible to patients and staff. She did not achieve a great deal of social success but she did begin to participate actively in social functions.
A frequent problem encountered in mental hospitals is overeating. In general this problem is solved by prescribing a reduction diet. Many patients, however, refuse to take a reduction diet and continue overeating. When confronted with this behaviour, psychiatric workers generally resort to two types of explanations.
One explanation of overeating points out that only with the active and sincere cooperation of the patient can weight reduction be accomplished. When the patient refuses to cooperate he is regarded as showing more signs of mental illness and all hopes of eliminating overeating come to an end.
Another type of explanation holds that overeating is not the behaviour to be concerned with. Instead, attention is focused on the psychological ‘needs’ of the patient. These ‘needs’ are said to be the cause of the observable behaviour, overeating. Therefore the emphasis is on the removal of the cause and not on the symptom or behaviour itself. Whatever theoretical merit these explanations may have, it is unfortunate that they fail to suggest practical ways of treating the behaviour itself. As a consequence, the patient continues to overeat, often to the detriment of his health.
The current psychiatric emphasis on the resolution of the mental conflict that is presumably at the basis of the symptoms is perhaps misplaced. What seems to have been forgotten is that behaviour problems such as those reported here prevent the patient from being considered for discharge not only by the hospital personnel but also by the patient’s relatives. Indeed, as far as the patient’s relatives are concerned, the index of improvement or deterioration is the readily observable behaviour and not a detailed account of the mechanics of the mental apparatus.
Many individuals are admitted to mental hospitals because of one or more specific behaviour difficulties and not always because of a generalized ‘mental’ disturbance. For example, an individual may go into a mental hospital because he has refused to eat for several days, or because he talks to himself incessantly. If the goal of therapy were behavioural rehabilitation, these problems would be treated and normal eating and normal talking reinstated. However, the current emphasis in psychotherapy is on ‘mental-conflict resolution’ and little or no attention is given to dealing directly with the behavioural problems which prevent the patient from returning to the community.
This report is based, in part, on a two-year research project (1959-1961), conducted by the author at the Saskatchewan Hospital, Weyburn, Saskatchewan, Canada, and supported by a grant from the Commonwealth Fund. Grateful acknowledgment is due to H. Osmond and I. Clancey of the Saskatchewan Hospital. The author also thanks E. Haughton who assisted in the conduct of this investigation, and N. Azrin and W. Holtz for their critical reading of the manuscript.
As used in this paper, ‘nurse’ is a generic term including all those who actually work on the ward (attendants, aides, psychiatric and registered nurses).