Section Four


Verbal Learning and Retention


Many theorists have been concerned with applying principles of learning to the understanding of verbal learning and retention (cf. Hall, 1971). Of the theories dealing with retention, the most popular for quite a while was the associative interference theory. According to this theory, retention loss is due to competition from alternative responses at the time of recall. Thus, when a person cannot remember another person’s name, although he once knew it, it is probably not because the name is lost from memory storage. Rather, it is because other responses, such as other names, interfere with the retrieval of the desired name from memory. When trying to remember the name Shirley, names like Shelley might keep interfering.


There are basically two sources of interfering responses, retroactive inhibition (RI) and proactive inhibition (PI). RI refers to interference from sources learned after the material to be recalled. PI refers to interference from sources learned before the material to be recalled. The reading by Slamecka and Ceraso summarizes much of the work on RI and PI.


It should be noted that there is considerably more to the associative interference theory than simply RI and PI. In addition to the specific competition of RI and PI, there is also a generalized competition, a response tendency of the subject to respond with the last learned material. The subject also comes into the experimental situation with many past learning experiences that might provide another source of interference, extra-experimental interference. In addition to the associations between the items to be learned, the subject also develops associations between the material and the setting where the learning occurs, contextual associations. It also makes a difference how the subject codes and identifies the stimuli (Martin, 1971) and the type of cognitive structure elicited by meaningful material (e.g., Ausubel, Stager, and Gaite, 1968). There are, in addition, many other relevant variables.


Distinctions are often made between short-term memory (STM) and long-term memory (LTM). The basic distinction is how long the information is stored, although there is no consensus on the time something can be stored and still be considered STM. Many theorists identify other differences between STM and LTM. Some (e.g., Broadbent, 1963) have suggested that information in STM decays over time, while forgetting in LTM is basically a function of interference due to similarity of material. Another distinction is that STM has a limited storage capacity (how much information it can hold at one time) while LTM, for practical purposes, is not limited (Broadbent, 1963; Waugh and Norman, 1965).


A different orientation, as illustrated in the reading by Melton, argues that STM and LTM are simply different points on the same continuum, and the same basic principles of verbal learning and retention apply to both STM and LTM.


A controversial issue in learning is whether a person’s verbal responses can be conditioned without his being aware of the contingencies. In many studies the subject is reinforced for emitting a class of words, such as specific pronouns to complete sentences, with a subtle reinforcement, such as a nod of the experimenter’s head or the experimenter saying “good.” The subject gradually learns in these situations, and the issue is whether he must be aware of the reinforcement contingency in order to learn. Some theorists (e.g., Spielberger and DeNike, 1966) argue that, given a sensitive enough measure of awareness, usually based on questioning the subject, it can be demonstrated that changes in performance occur as the subject becomes aware of the reinforcement contingencies. The problem is that in trying to assess awareness the experimenter helps make the subject aware, thus confounding the issue. The reading by Rosenfeld and Baer shows a clever way to assess awareness in verbal conditioning without making the subject aware.


In the last reading Glucksberg and King show how material associated with unpleasant events is harder to recall than neutral material. This study is important because it shows the role that affect has on retention and suggests an experimental approach to the study of repression, which is an important phenomenon in many clinical models.



Retroactive and Proactive Inhibition of Verbal Learning


NORMAN J. SLAMECKA, University of Vermont; and JOHN CERASO, Yeshiva University


The last review of the literature solely devoted to retroactive inhibition (RI) was Swenson’s (1941) monograph, whose coverage extended through 1940. The present paper extends the coverage by presenting a full bibliography and critical analysis of all published reports on the RI and proactive inhibition (PI) of verbal learning from 1941 through 1959. Studies of infrahuman Ss and of nonverbal behavior were excluded because of considerations of length and the fact that, traditionally, RI is a concept associated with verbal behavior. Excluded also were studies using interpolated convulsive seizures or surgical procedures because such treatments are qualitatively different from intervening learning as such and require other theoretical formulations to explain their effects. Following a brief summary of the field in 1940, subsequent developments will be discussed under five general headings: Degree of Acquisition, Similarity of Materials, Extrinsic Factors, Temporal Effects, and Major Theoretical Positions.


The dominant theoretical position in 1940 was a transfer theory, given its fullest exposition by McGeoch and his collaborators. In essence the theory stated that RI could be explained by the general principles discovered in the study of transfer. The failure of performance of an old association could be attributed to greater strength of the new association, a mutual blocking of old and new associations, or a confusion between the two.



This theory was capable of handling a great deal of the relevant data and depended largely upon two sources of evidence for empirical support. The first source was the evidence for the effect of similarity of materials upon RI, which supported the contention that RI could be explained by the principles of transfer. The second source was intrusion errors, which are responses from the interpolated learning offered by Ss when they are asked for responses from the original learning. The existence of these errors supported the contention that old responses were not given because new ones had supplanted them.


Much of the subsequent history of RI can be viewed as a process of extension and enlargement of McGeoch’s basic position. The four major theories discussed later on in this paper serve as leading examples. The Melton-Irwin two-factor theory enlarged the competition of response theory by postulating an unlearning process in addition to competition of response. Gibson elaborated the theory by placing it within the setting of the conditioning experiment, making available the conceptual apparatus of differentiation and generalization. Underwood’s work has concentrated upon clarifying the nature of both unlearning and differentiation, while Osgood has stressed the communality of transfer and RI in his “transfer and retroaction surface.”


A consideration of terms is now in order. RI is the decrement in retention attributable to interpolated learning (McGeoch & Irion, 1952), and the operations that define it require a comparison of the retention of some original learning (OL) between two groups that differ in some aspect of the interpolated activity (IL) (Underwood, 1949a). The experimental group has IL, and the control group engages in some non-learning filler task. Better retention in the control group defines RI, and better retention in the experimental group defines retroactive facilitation. Since the control group almost always shows some loss of the OL after its “rest activity,” to what can the decrement be attributed: to incidental learning, to loss of set, to sheer metabolic activity (Shaklee & Jones, 1959)? The impossibility of assuring that no interpolated learning takes place for that group introduces an inevitable looseness into the significance of the RI measure. The control group’s decrement is sometimes assumed to be due to “natural” forgetting, as distinct from the additional decrement attributed to the specific interfering tasks given the experimental group. But if a strict interference position is to be maintained, the “natural” forgetting must also be attributed to some source of interference, albeit beyond E’s control. The fact that different investigators may employ different filler tasks imposes a shifting base against which experimentally induced RI is calculated and renders comparison of results difficult. Osgood (1946, 1948) has dealt with the problem by simply omitting the control group and regarding RI as the difference in performance between the end of OL and the subsequent OL relearning (RL), lumping together both the specific and nonspecific decremental variables operating during the interpolated interval. This, of course, is a measure of total forgetting.
Such a straightforward procedure cannot, however, distinguish between RI and retroactive facilitation, as they are usually understood, since facilitation may involve simply less decrement in retention as compared to a control group. Another troublesome problem arises with the other methods of quantifying RI, both of which rely upon control groups. Absolute RI is simply the numerical difference between the retention of the control and experimental groups, and relative RI is the percentage difference between them:


(Rest - Work) / Rest x 100



Each of these measures is thus dually dependent upon both the experimental and the control groups’ performance, and they may not always give the same pattern of results. This problem becomes especially important in studies of degree of OL upon RI. It is often the case that as OL increases, absolute RI increases, but relative RI decreases (Postman & Riley, 1959). To illustrate, it can be seen that, when degree of OL is low, the control group’s retention is low, and even slight departures from this baseline on the part of the experimental group will represent a substantial percentage difference; whereas when the control’s recall is high, the same absolute difference will reflect a lesser percentage change, and the relative RI will have decreased, while absolute RI will have remained the same. At present, we can only be alerted to this source of confusion and take it into account when viewing the results of any RI study. The foregoing observations apply just as fully to the quantification of PI, to which we now turn.
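The arithmetic behind this illustration can be made concrete. The sketch below uses hypothetical recall scores (not data from any of the studies cited) to show how the same absolute difference between control ("rest") and experimental ("work") recall yields a large relative RI when control recall is low but a small one when it is high:

```python
def absolute_ri(rest, work):
    # Absolute RI: the numerical difference between control
    # and experimental group recall.
    return rest - work

def relative_ri(rest, work):
    # Relative RI: that difference as a percentage of control
    # recall, i.e. (Rest - Work) / Rest x 100.
    return (rest - work) / rest * 100

# Low degree of OL: control recall is low (hypothetical scores).
print(absolute_ri(4, 2))   # 2 items
print(relative_ri(4, 2))   # 50.0 percent

# High degree of OL: control recall is high, same absolute difference.
print(absolute_ri(10, 8))  # 2 items
print(relative_ri(10, 8))  # 20.0 percent
```

On these hypothetical numbers, raising the degree of OL leaves absolute RI unchanged while relative RI falls from 50 to 20 percent, which is precisely the dual dependence on control-group performance described above.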


The PI paradigm requires a comparison of the retention of some original learning (List 2) between two groups that differ only in some aspect of the activity preceding that learning. The experimental group learns some previous material (List 1), and the control group does not. The same problem with regard to the control group’s experience applies here. Better retention in the control group defines PI, and better retention in the experimental group defines proactive facilitation. In addition, the PI design requires that a clear temporal distinction be made between the end of the acquisition phase of List 2 and its subsequent retention test. Minimally, a retention interval longer than the OL intertrial interval is needed. If this is not done, the learning and retention phases would be operationally identical, and the PI design would be indistinguishable from the transfer design.




Swenson’s (1941) generalizations about the acquisition variables were as follows:

[a] . . . susceptibility to retroaction does not tend to decrease as the amount of original activity is increased . . . (p. 17). [b] . . . the greater the degree of learning of the original activity, the less susceptible is the learning to retroactive inhibition (p. 18). [c] . . . we may retain the idea of increased retroactive inhibition with increased amount of interpolated activity (p. 19). [d] All measures show an increase in retroactive inhibition with early increases in the degree of interpolated learning and a decrease in retroactive inhibition with very high degrees of interpolated learning (p. 20).

These conclusions have been further amplified through subsequent work. (Unless otherwise noted, the results cited below refer to measures at recall—first relearning trial.)


Several papers have reported the effect of degree of IL upon RI either by varying the number of IL trials (Briggs, 1957; Highland, 1949; Melton, 1941; Postman & Riley, 1959; Slamecka, 1959, 1960a; Thune & Underwood, 1943; Underwood, 1945, 1950b), by setting a performance criterion (Archer & Underwood, 1951; Osgood, 1948; Richardson, 1956), by varying the number of interfering lists (Underwood, 1945), or by analysis of the associative strength of any single IL list item (Runquist, 1957). Most of the papers agreed that RI of recall showed a negatively accelerated increase with increasing IL, and studies that carried IL to very high degrees also agreed that the curve tended to flatten out or even to decrease (Briggs, 1957; Thune & Underwood, 1943; Underwood, 1945). In general, maximum levels of RI were obtained when the IL practice had somewhat exceeded the OL practice and further IL trials did not serve to increase the RI appreciably. An exception to this was Runquist’s (1957) finding that RI of individual items was not a function of the strength of the corresponding interpolated items. Also, in Exp. B of Underwood’s (1945) report, there were no significant recall differences among the work groups, nor was there any consistent trend toward a negatively accelerated curve of recall as a function of degree of IL. A possible explanation for this may lie in the fact that the lowest IL degree (8 trials) exceeded the mean OL trials (which averaged about 6). Under these conditions it might well be expected that increasing the IL practice would have no further decremental effect. Increasing the IL levels did, however, produce faster RI dissipation, which gives marginal support to Underwood’s differentiation hypothesis. The question of whether degree of IL, measured by trials, or amount of IL, measured by the number of different interpolated lists given, is the more powerful variable in producing RI was also specifically tested by Underwood (1945). 
Care was taken to equate the amount and degree levels by equal total trials, and the findings showed that RI changed at a faster rate with increases in amount than with increases in degree of IL. Both relative and absolute RI grew steadily as the number of IL lists was increased, but the frequency of overt interlist intrusions remained relatively constant, regardless of the number of lists. This is also consistent with the differentiation hypothesis, since increasing the number of lists should not increase differentiation, whereas increasing the number of trials on a single list should increase it. It is urged that a further comparison of the effect of amount against degree of IL should be made, using yet lower IL levels, so as to fill out that part of the curve at which acquisition is very slight.


Degree of OL was controlled in the following studies by varying the number of trials (Briggs, 1957; Melton, 1941; Postman & Riley, 1959; Shaw, 1942; Slamecka, 1960a), setting a performance criterion (Richardson, 1956), or analyzing individual item strengths (Runquist, 1957). All reports agreed that the susceptibility of the original material to RI was inversely related to its level of acquisition. The well-designed factorial study by Briggs (1957), using four OL and five IL levels (2, 5, 10, and 20 trials OL, compared to 0, 2, 5, 10, and 20 trials IL, all paired adjectives), confirmed previous findings as well as showing that, as OL increases, the greater must the IL level be for maximal relative RI. This was also found by Melton (1941). Further, Briggs reported more significant recall differences across the various IL levels as degree of OL increased. There was no additional information concerning the effects of amount of OL within this period.


PI as a function of List 1 acquisition has been studied by varying the number of trials (Postman & Riley, 1959; Waters, 1942), the number of lists (Underwood, 1945), setting a performance criterion (Atwater, 1953; Underwood, 1949b, 1950a), and analyzing individual item strengths (Runquist, 1957). Two other studies (Greenberg & Underwood, 1950; Werner, 1947) omitted control groups and are not strictly PI designs, and a third (Peixotto, 1947) did not distinguish between learning and retention measures. When significant PI of recall was obtained, all but one of the studies agreed that it was a positive function of the degree or amount of prior learning, and there was even some indication that it leveled off at high degrees of such learning, much as with RI (Atwater, 1953). The one exception (Runquist, 1957) found that PI was not influenced by the degree of the corresponding interfering item strength. The latter is the only study that solely used such analysis and poses an important but separate question concerning the variables determining the retention of individual items per se. Underwood (1950a) found that PI was eliminated at all degrees of prior learning when recall time was extended to 8-sec. intervals. McGeoch and Underwood (1943), using paired-associates lists, found that, when the pairs were presented in fixed order, thus providing the opportunity for serial learning, significant PI was no longer obtained, as opposed to the usual method in which the order of the pairs is varied. A further indication of the sensitivity of PI to slight procedural changes was given in a report that found significant PI in a serial list at a 2-sec. rate of presentation, but not at a 2.3-sec. rate (Underwood, 1941).


One chronic problem which crops up in studies of the degree of prior learning upon PI (and also in RI designs) is that of controlling for practice and warm-up effects. Traditionally, the control group learns only List 2, whereas the experimental group has had prior practice via List 1. Taking List 2 to a common criterion does not insure equal strengths of learning since the rates of acquisition may differ. Although the problem has been recognized (McGeoch & Irion, 1952), it is not dealt with in most PI studies. Young’s (1955) is the only experimental effort at such control, wherein the learning was carried to a seven-eighths criterion on the hypothetical next trial, as determined by previous pilot study data.


The only study of PI as a function of the degree of List 2 learning appeared in the extensive investigation by Postman and Riley (1959), who used serial nonsense lists and naive Ss. This part of their work revealed a curvilinear PI (both absolute and relative) function. Maximum PI was obtained at the lowest and highest degrees of List 2 acquisition (5 and 40 trials, respectively) across all levels of List 1 training given (5, 10, 20, and 40 trials). Runquist (1957) found that the degree of PI of any individual list item is unaffected by the acquisition strength of that item—again pointing up the discrepancy between single item retention and overall list retention. The study of PI has not kept pace with the growing knowledge about RI, although recently the greater impact of long-range cumulative effects of prior learning has been brought out strikingly by Underwood (1957), who utilized data from previous retention work and showed that more forgetting is attributable to long-range PI effects than to RI. He found that, although well-practiced Ss forgot about 75 percent over 24 hours, naive Ss (no practice lists) forgot only about 30 percent. This large differential in retention could only be attributed to the strong PI effect of the practice material. Further experimental support was given by Seidel (1959), measuring concurrent PI and RI.


The transitory nature of RI and PI is exemplified in the common observation that these phenomena dissipate after a few relearning trials, sometimes even by the second trial (Osgood, 1948; Underwood, 1945). It follows that recall is the most sensitive measure, whereas if a relearning criterion is used, no interference effects may be demonstrable (McGeoch & Underwood, 1943; Thune & Underwood, 1943; Underwood, 1949b; Waters, 1942).


The rate at which RI dissipates is undoubtedly some function of the degree of learning, or the degree of differentiation of the two response systems involved; but the form of the function is not completely known. Dissipation rate is of importance theoretically and empirically. Melton and Irwin (1940) obtained fastest dissipation at the highest IL level used (40 trials), followed by the next highest level (20 trials). Thune and Underwood (1943) also found rapid dissipation at the highest levels (10 and 20 trials), but there was no difference in rate between them. This latter finding was incompatible with the two-factor theory of Melton and Irwin, in that it could not be explained by reference to the unlearning factor, because the great differences in overt intrusions obtained under the two conditions should have led to different rates of dissipation, favoring the highest level. This point will be considered again in the section devoted to theory. Data from Underwood (1945, Exp. B) also showed much faster dissipation at the high IL level, and the paper by Briggs (1957) suggests that RI dissipates fastest when the interfering material is well learned or overlearned, only at low and intermediate OL levels. RI persistence was generally found to be greatest at the intermediate IL levels used in the four latter studies. Further data on this point as well as comparable figures for rates of PI would be welcome.




Swenson’s (1941) summary of the earlier work on similarity was that “Robinson’s theoretical curve is at least roughly accurate” (p. 13). There has since been a definite waning of interest in the Skaggs-Robinson hypothesis as a useful generalization about the effects of similarity upon RI. This is partly because of the failures to duplicate the full theoretical curve within any one experiment (the last attempt at this was made by Kennelly, 1941, and was unsuccessful) and partly because a more heuristic alternative has emerged. The trend within this period may be traced from Boring’s (1941) mathematical discussion of communality; Gibson’s (1940) more analytical theory, reflected in Hamilton’s (1943, p. 374) statement that “a two-variable hypothesis should be accepted in preference to the Skaggs-Robinson function”; through Haagen’s (1943, p. 44) conclusion that “the hypothesis applies, not to any dimension of similarity, but specifically to the condition in which the continuum of similarity involves a change in the S-R relationship of the tasks”; to Osgood’s (1949) integration of the literature on RI and similarity in terms of his 3-dimensional transfer and retroaction surface. Ritchie (1954) argued that the Skaggs-Robinson paradox (the statement that the point of maximal OL and IL similarity is simultaneously the condition for greatest interference and also for greatest facilitation) is a pseudoproblem because of an ambiguous scoring procedure. In short, this hypothesis has been superseded by subsequent developments, to which we now turn. Studies of the effects of similarity relationships have been separated into those using paired associates and those using serial lists. The use of paired associates allows specification of the locus of the change in similarity between the lists, an advantage which is not found with serial arrangements.
Three classes of change between pair items are possible: response (A—B, A—C), stimulus (A—B, C—B), and both stimulus and response changes (A—B, C—D).


The effect upon retention of learning a new response to an old stimulus has been to produce RI (Bugelski, 1942; Bugelski & Cadwallader, 1956; Gladis & Braun, 1958; Haagen, 1943; Highland, 1949; Osgood, 1946, 1948; Young, 1955) and, also, retroactive facilitation (Haagen, 1943; Parducci & Knopf, 1958). The variable that determined the direction of the effect was the degree of similarity between the two responses. The problem of developing a rigorously objective quantitative scale of meaningful similarity along dimensions feasible for use in verbal form is a serious one, and it has not been adequately met. Usually, adjectives scaled for varying levels of synonymity to standard words were used. These levels were based upon pooled ratings by judges (Haagen, 1949; Osgood, 1946). Parducci and Knopf (1958) used geometric figures varying along some physical dimension with four-digit numerals varying in identity as the verbal responses required. Their OL and recall were visual discrimination tasks, and not really paired associates. The distinction is that the correct response figure and numeral appeared on the stimulus card, whereas in the true paired associates, the response is never a part of the stimulus item. The theoretical rationale of Young’s (1955) study deserves some discussion. In the A—B, A—C paradigm, learning A—B also adds to the associative strength of A—C through generalized reinforcement. The magnitude of such generalized reinforcement should be a positive function of the degree of similarity between the B and C response items. In the RI design it was hypothesized that the original list’s associative strength (after the IL list was learned) would be the sum of the direct reinforcement gained during its acquisition plus the additional generalized reinforcement gained from the subsequent IL learning.
The IL list, on the other hand, would already have gained some generalized reinforcement as a result of the OL training and would thus need less direct reinforcement to achieve criterion during its learning. This would leave the original list with a greater associative strength at recall than the interpolated list, and the magnitude of this difference would be determined by the degree of response similarity between lists. Therefore, it was predicted that, as response similarity between lists increased, RI would decrease and PI would increase. These predictions were tested by Young, using three lists of paired adjectives (to increase the effect) and three levels of response similarity. Results showed that RI as well as overt intrusions decreased as response similarity increased, as predicted. The PI results, as well as a reinterpretation of this entire experiment, will be taken up at the end of this section.


Osgood’s (1949) generalization that as response similarity decreases from identity to antagonism, retroactive facilitation should gradually change to increasing RI, was given some empirical support within this period. However, one disturbing finding has emerged. Bugelski and Cadwallader (1956) made a comprehensive attempt to test Osgood’s generalizations about similarity effects, part of which involved the use of Osgood’s own word lists to define four degrees of response similarity—identical, similar, neutral, and opposed—while keeping the stimuli the same. Results showed decreasing RI with decreasing response similarity. There was more RI with similar than with opposed responses—a finding directly contrary to Osgood’s prediction, and not in accord with other data. No explanation was given for these results, but they cast doubt upon the previous formulation of response similarity. In addition to Osgood’s disinclination to use RI control groups, he has also relied upon an uncommon measure of retention, namely, latency scores. In one of his studies (Osgood, 1948), the significant drop in RI between opposed and similar responses was evident only with latency scores, but traditional recall showed no significant differences. In Osgood’s other study (1946) there were no significant latency differences at recall, but only on the second and third relearning trials. At no time were the differences between the neutral and opposed conditions significant. All things considered, the evidence in favor of the retroaction surface is less than overwhelming as far as the right half of the response dimension goes, and indicates that a revision is needed.


Saltz (1953) hypothesized that learning A—C after A—B inhibits B. Assuming that inhibition generalizes less than excitation, presenting a slightly altered A stimulus should again tend to evoke B. When tested in a straightforward manner, the hypothesis was not confirmed. A second attempt, designed to minimize changes in set, did result in a tendency toward reappearance of B. No further RI work along these lines has been reported.


There have been two papers on the effects of response similarity on PI. One reported no differential effect (Young, 1955), although overt intrusions increased with response similarity, and the other (Morgan & Underwood, 1950) found that PI tended to decrease as response similarity increased. Osgood (1946, 1948) reported results couched in terms of PI, but his data are for List 2 acquisition and therefore are measuring negative transfer. A methodological oversight with consequent possible confounding of the results of the Young (1955) and Morgan and Underwood (1950) studies should be pointed out. They both varied similarity along the synonymity of meaning dimension. In terms of A—B, A—C, the C response varied from very high (i.e., discreet-ailing, discreet-sickly), to very low similarity, or neutrality with regard to the B response (i.e., noiseless-sincere, noiseless-latent). Each single list had all of the responses at the same similarity level. Thus, it is conceivable that S could “catch on” that the List 2 responses were similar in meaning to those of List 1, and thereby reduce his chances of making errors by restricting his responses to members of the synonym category, with a resulting high positive transfer and low apparent PI. This postulated shift in the pool of responses available to S could be made entirely without his awareness, as several studies of verbal operant conditioning have demonstrated. With lists of low similarity, on the other hand, the possibility of such an occurrence would be nil, and therefore no response class restriction would be made, resulting in a drop in positive transfer and higher apparent PI. Since these studies address themselves to rote learning and retention, the possibility of such a form of concept formation is a serious confounding variable. The test of retention may not be of rote recall at all, but actually of reconstruction of the response on the basis of the general concept of synonymity.
As would clearly be predicted by such a “categorization” approach, the learning of List 2 was in fact fastest with high response similarity and became progressively slower with decreasing similarity. Both studies stressed the previously discussed response generalization rationale which would lead to increasing PI with increasing similarity, because learning a similar List 2 response would add to the interfering strength of the List 1 response through generalized or “parasitic” reinforcement. These predictions were not in fact confirmed; rather, PI tended to decrease with increasing similarity (although not statistically significant), an expectation consistent with the categorization hypothesis. The magnitude of the effect is probably dependent upon the relative strength of the two lists, as well as upon the number of alternatives in the response classes, which is a task for further empirical work to verify. Such an unintended source of bias may also have been working in the Bugelski and Cadwallader (1956) study, which used a similar list construction technique. Preferably, items at varying levels of response similarity should be included within the same list, so that S would have no opportunity to grasp the concept of the overall list structure. Such a procedure was used for RI by Osgood (1946, 1948), who was aware of this problem. A paper by Twedt and Underwood (1959), which showed that there was no difference in transfer effects between “mixed” and “unmixed” lists, is relevant to lists differing only in formal characteristics, but does not bear upon the question of the general synonymity of the list items as a whole. The lists of the latter study were not varied in degree of meaningful response similarity and thus do not constitute a test of the categorization hypothesis. However, an important paper by Barnes and Underwood (1959) suggested a mediation rationale as another possibility.
If A—B is the first list and A—B′ the second, there is a possibility of an A—B′—B mediation occurring at recall. In view of these complications, we must conclude that the effects of varying response similarity still have not been unequivocally demonstrated or explained.


The retention effect of learning the same response to a new stimulus was reported in four studies, all of which found retroactive facilitation (Bugelski & Cadwallader, 1956; Haagen, 1943; Hamilton, 1943; Highland, 1949). Similarity was varied either by using geometric figures differing in generalizability (originally developed by Gibson, 1941) or meaningful words scaled for synonymity. The results agreed that retroactive facilitation increased with increasing stimulus similarity. The extreme of similarity is identity, and this produces the most facilitation of all, since it amounts to continued practice on the original list. At levels of very low similarity there was some inhibition (Haagen, 1943), and according to Hamilton (1943, p. 375): “When the stimulus forms were of 0 degree generalization there was very little difference in retention in conditions with response identical and with responses different.”


No study has ever tested the effects of opposed or antagonistic stimulus relationships while keeping responses the same. Osgood’s (1949) retroaction surface does not extend the dimension of stimulus dissimilarity beyond “neutral” or unrelated, although the response dimension does include “antagonistic” relations. The implication is that stimulus opposition is no different in its effects from stimulus neutrality, although no RI evidence is adduced for such a position. It is conceivable, however, that meaningful stimulus opposition or antonymity would actually result in facilitation of recall, based upon a mediation rationale, since such words would be related by S’s previous language experience. If response opposition is expected to differ in effect from response neutrality, then stimulus opposition might also. There are no corresponding paired-associates studies upon the PI effects of stimulus variation.


The effect of changing both the stimulus and response members of the interfering list is concisely stated by Osgood (1949, p. 135): “negative transfer and retroactive inhibition are obtained, the magnitude of both increasing as the stimulus similarity increases.” One experiment did not vary stimulus similarity with unrelated responses (Highland, 1949), four studies did vary stimulus similarity with unrelated responses (Gibson, 1941; Haagen, 1943; McClelland & Heath, 1943; Postman, 1958), and another used three degrees of response similarity as well (Bugelski & Cadwallader, 1956). The five latter reports indicate increasing RI with increasing stimulus similarity, and the one study available shows that this holds over all levels of response similarity tested. Two studies from this group will be more fully described since they represent an intriguing departure from the use of the usual physical or meaningful similarity dimension. McClelland and Heath (1943) used as stimulus items for the original and interpolated lists, respectively, a Kent-Rosanoff stimulus word and the most frequent free-association response made to it. Thus an existing prepotent connection was deliberately introduced. Responses were unrelated, and there was no control group. Recall was significantly less under that condition as compared with the case in which there was no association between the stimuli. Since the related words were not similar in appearance or in meaning (e.g., Thirsty-Water) and since a common mediating response could not account for the directionality of the association, the authors concluded that:

to define the relation between original and interpolated activities which determines the amount of RI, as similarity or as generalization (plain or mediated) is too narrow a conceptualization, since it does not cover such a learned, uni-directional relation between the two activities as was demonstrated to be of importance here (p. 429).

This study was not carried far enough to prove the point. A third group is needed, for which the related OL and IL stimuli would be interchanged. If this group would display no better recall than the unrelated stimuli group, then the case for the effect of unidirectionality of relationships upon RI would be established. Postman (1958) used geometric figures as OL stimuli. The IL stimuli were either the identical figures, words describing the figures (i.e., “square”), or color names. Responses were unrelated. Both the figure and word groups showed significant RI, with the former having the largest decrement, while the color group did not. These results were explained in terms of the previously learned connections between figures and their names, with formal similarity producing greater interference than mediated equivalence. The influence of unidirectionally prepotent and mediated connections upon forgetting deserves even more attention than it has received. PI is once again slighted, for there are no paired-associates studies concerning both stimulus and response changes.


We turn now to serial list studies, divided into those employing discrete, unconnected items, and those using connected discourse or some approximation thereto. Effects of similarity relations between discrete item lists were reported in three papers which were relatively unrelated as regards their major purposes. Irion (1946) varied the relative serial positions of the original and interpolated adjectives, with some groups learning the identical words for IL, and others learning synonyms. He concluded that similarity of serial position was an effective variable only when identity of meaning was also present. Since several significant differences for IL were reported, we feel that the main variables were confounded with the uncontrolled degree of IL, rendering the results ambiguous. Melton and von Lackum (1941), in a study designed to test an important deduction from the two-factor theory, used two levels of similarity of interpolated items, and found both RI and PI greater under the high similarity condition. Kingsley (1946), with meaningful words, also found poorer retention with interpolated synonyms as opposed to antonyms. Both of the latter studies support the generalization that, with serial lists, RI increases with increasing stimulus similarity, along dimensions of both identical elements and meaningfulness.


Ordinary prose or connected discourse has been, until recently, unusually resistant to demonstrable interference effects. Blankenship and Whitely (1941) studied PI of advertising material (a simulated grocer’s handbill) as a function of two levels of judged List 2 similarity. Recall after 48 hours showed greater PI for the more similar condition. Their study actually did not vary degrees of similarity of prose, since one of the two lists was nonsense material, and it may be questioned whether a grocer’s handbill resembles prose rather than a list of paired associates. Hall (1955) in an RI design, using a completion test, gave 30 sentences for OL, with IL being more sentences varying in two levels of similarity of topic. Results of that, and of a second, unpublished study, both showed no RI. Deese and Hardman (1954) found no RI for connected discourse under conditions of unlimited response time. Ausubel, Robbins, and Blake (1957), using the method of whole presentation, found no RI. The measure of both learning and recall was a recognition test, largely of substance retention. Peairs (1958) did find RI using a recognition procedure; Slamecka (1959), using grouped Ss, reported that unaided written recall of a short passage was a negative function of the degree of similarity of topic the interfering passage bore to the original passage.


On the whole, these results were rather discouraging about generalizing RI findings from nonsense material to connected discourse and led to the view that prose was not susceptible to RI, or at least to the similarity variable (Miller, 1951, p. 220). We feel, however, that the difficulty was not in the characteristics of connected discourse, but rather in the methods employed. It is noteworthy that all of the above studies employed the less well-controlled techniques of group testing, whole presentation, unlimited recall times, recognition tests, and the like. When, however, connected discourse was presented in the same manner as the traditional serial list, using the serial anticipation method with individually tested Ss, significant RI was obtained, and it was clearly shown to be a function of degree of OL and IL, as well as of similarity of OL-IL subject matter (Slamecka, 1960a, 1960b). Any presumption of the uniqueness of connected discourse with regard to these variables is no longer tenable, and the door is now open for further exploration of this area.


Errors in recognition and recall of a story were shown to be a function of the interference provided by the interpolated presentation of a picture which bore some thematic resemblance to the story (Davis & Sinha, 1950a, 1950b). Similarly, Belbin (1950) showed that an interpolated recall test concerning an incidentally present poster interfered with the subsequent recognition of the poster. If the attempted recall is viewed as interfering with the original perceptual trace, then the degree of OL and IL (recall test) similarity was determined by each S’s own recall performance.


Lying somewhere between the use of discrete, unconnected items and ordinary prose are two studies employing lists of various orders of approximation to English, constructed according to a method developed by Miller and Selfridge (1950). If RI is a function of contextual constraint, then the use of such materials should be appropriate.


Heise (1956) used an unrelated word list as OL, and five different IL levels of approximation to English. He found recall was best with the greatest dissimilarity between the lists. Thus, the seventh order IL list (close to English text) produced almost no interference, whereas the first order list (same order as OL) produced a great deal, again supporting the generalization concerning greater RI with greater similarity between serial lists. King and Cofer (1958) extended this technique by using OL lists at the zero, first, third, and fifth orders, with four different orders of IL at each of the OL levels. Their intent was to examine similarity effects at various levels of contextual constraint, but the results did not show an overall comprehensive pattern for RI. They suggested that the effects of contextual constraint may prove to be more complex than originally expected, and called for further investigation.




In this section are papers focusing upon variables actually extrinsic to the specific items being learned. In most of these studies the groups learned identical materials, and they differed only with regard to such things as the general surround, testing methods, and sets.


The striking effects of altered environment were shown by Bilodeau and Schlosberg (1951). The two groups differed only in the conditions under which IL took place. One group stayed in the same room for all phases, and the other had the IL in a dissimilar room with a different exposure device and a changed posture for S. Recall, done in the OL room, indicated that IL interfered only half as much when associated with a different surround. Elaborating upon this, Greenspoon and Ranyard (1957) also used two different surrounds (different rooms, postures, and exposure devices, designated as A and B), in four combinations, and the results, in terms of decreasing order of recall, were ABA, (AAA, ABB), AAB (those within parentheses not significantly different). Although no controls were used, the findings agree with those of Bilodeau and Schlosberg. These studies support the view that, since recall takes place in some context, the cues governing a response lie not only within the learning material, but also in the general surround, and that the magnitude of RI is a partial function of such context-carried cues. The relative importance of the proprioceptive vs. the exteroceptive cues was not assessed.


Jenkins and Postman (1949) varied testing procedures for OL and IL, using anticipation (A) or recognition (R), in four combinations. Results showed a significant increase in recall when procedures were different, under only one of the comparisons (A—A vs. A—R). The authors concluded that using a different testing method is a change in set and “helps in the functional isolation of materials learned successively” (p. 72). Postman and Postman (1948) gave four groups the same materials, differing only in the order of the S—R items. Paired syllables-numbers for OL were followed by either paired numbers-syllables or more syllables-numbers. The changed set groups showed better recall. No control groups were used. In the second part of the same report, OL was paired words with either a compatible (doctor-heal) or incompatible (war-peaceful) relation between them. For IL, half the Ss learned a list with the same logical relations, and half learned one with the opposite relations to OL. This latter group showed superior retention, again attributed to the dissimilar sets involved.


Comparing the effects of incidental vs. intentional learning of OL and IL, Postman and Adams (1956) found that, regardless of the OL conditions, intentional IL produced more RI than incidental IL. Both intentional and incidental learning were equally susceptible to RI when followed by IL of the same kind and strength as OL. The authors noted that: “Intentional practice resulted in the learning of a larger number of items during interpolation and hence was a more effective source of interference” (p. 328). Thus, it appears that these conditions were simply the vehicles by which degree of IL, the effective variable, was manipulated. In an earlier paper, Prentice (1943) concluded that incidental learning was more subject to RI than intentional, but when Postman and Adams (1956) corrected Prentice’s data by subtracting the respective control group scores, the results agreed with the Postman and Adams findings. If incidental and intentional conditions are construed as providing different sets, or “functional isolation,” then an experiment in which the degree of acquisition was equalized should be expected to give different results: the similarly treated groups should display more RI than the changed-set groups. Since this has not been done, we must conclude that the RI effects of incidental vs. intentional conditions per se are not yet known.


The effect of the emotion-arousing characteristic of the IL upon retention is an interesting question, but only one study attempted it within this period and produced inconclusive results (McMullin, 1942), probably because of a confounded experimental design. Among the truly inherent subject variables that have been investigated is the effect of the age of S (Gladis & Braun, 1958; Wywrocki, 1957). The former study divided Ss into three age classes: 20—29, 40—49, and 60—72 years. There was no control group. Although a negative relationship between age and rate of learning was found, the adjusted absolute recall scores revealed no differential RI effects related to age. One might speculate that the decreased learning ability of the older Ss was a PI effect resulting from their many years of previous learning. When the recall scores were “corrected” for this, the actually obtained negative relation between raw recall and age was eliminated. Among the more clinical subject variables, Cassel (1957) reported no differential RI susceptibility between Ss of normal mentality and those with mental deficiency. Sherman (1957) found that psychopaths showed better retention than either neurotics or normals, measured by total forgetting scores. Livson and Krech (1955) reported a moderate positive correlation between recall and scores on the KAE (Kinesthetic Aftereffect Test, which was related to Krech’s cortical conductivity hypothesis).


The importance of set factors, generally called warm-up effects, has been recognized (Irion, 1948). Thune (1958) showed that recall was significantly facilitated by a preceding appropriate warm-up. If OL was from a memory drum and IL from a filmstrip, then a memory drum warm-up facilitated recall, but a filmstrip warm-up did not. Inappropriate warm-up did facilitate later relearning trials, and Thune concluded that warm-up has both peripheral and central components, with the former more transitory. No RI control groups were used.


The effects of such extrinsic variables upon PI have not yet been investigated. This line of research should be extended, since the magnitudes of interference obtained are often considerable, and probably much of our everyday forgetting is attributable to such context-associated factors.




Swenson (1941) summarized the effects of temporal variables as follows:

[a]. . . interpolation immediately adjacent either to original learning or to recall of original learning is more effective in producing retroactive inhibition than is interpolated activity between those two extremes (p. 15). [b] . . . the more recent studies suggest an inverse relationship between length of the time interval and relative retroactive inhibition (p. 16).

Subsequent work has called for a modification of those statements. Examination of the RI paradigm reveals three manipulable temporal intervals: end of OL—start of IL, end of IL—start of RL, and end of OL—start of RL. No single experiment, while keeping the IL learning period constant, can vary only one of these intervals without automatically changing one of the others. When the IL learning period varies (as in studies giving different numbers of IL trials) while the OL-IL and the OL-RL intervals are kept constant, then the IL—RL interval will inevitably vary. Therefore, in the study of any one of these variables, confounding is inescapable. There is no easy way out of this dilemma. The only technique approaching a solution seems to be to do several separate experiments, confounding a different pair of intervals each time, and then evaluating the results of all the experiments by determining which confoundings have no effect. This more elaborate approach has not been used in actual practice; rather, acceptance of such confounding seems to be the rule.


Varying the IL—RL interval allows for measurement of progressive changes in the strength of RI and PI, and deductions concerning the events that occur in that time. Underwood (1948a), using IL—RL intervals of 5 and 48 hrs., and Briggs (1954), at intervals from 4 min. to 72 hrs., report no significant changes in magnitude of RI. Deese and Marder (1957), using unlimited response times at intervals from 4 min. to 48 hrs., and Peterson and Peterson (1957), from 0 to 15 min., both found no changes in recall. Slight RI decreases were reported by Jones (1953) from .17 to 24 hrs. (with an increase from 24 to 144 hrs.) and by Ishihara (1951). Using the uncommon A—B, C—D design with very high levels of practice, Rothkopf (1957) found an increase in recall from 0 to 21 hrs., but no control groups were used. From the trend of these results, the best conclusion seems to be that RI remains relatively stable over time, at least up to 72 hrs.


In examining the temporal course of PI, Underwood (1949b) found no change from 20 to 75 min., but Underwood (1948a) did find a drop in recall from 5 to 48 hrs. (no control groups), and Jones (1953) also reported increasing PI. In a study not explicitly designed to assess PI, therefore lacking control groups, Greenberg and Underwood (1950) also found a significant drop in List 2 recall from 10 min. to 5 hrs. to 48 hrs. In spite of the lack of appropriate controls in some of these studies, the results are in sufficient agreement to allow the conclusion that PI shows a gradual increase through time, which is in accord with logical expectations, as Underwood (1948a) has pointed out.


In comparing the relative strengths of RI vs. PI through time under comparable conditions, Underwood (1948a) found that RI was greater at 5 hrs., but that there was no difference at 48 hrs. Jones (1953) and Rothkopf (1957) reported similar observations. Underwood hypothesized that the failure of List 1 recall to diminish might be due to a process of gradual recovery of OL responses after their unlearning during IL. This led to the use of the modified free recall (MFR) procedure as a method of assessing response dominance. In MFR, S is given a stimulus item common to both lists and asked for the first response that comes to mind. It was felt that such unrestricted, uncorrected recall would provide a fairer estimate of the relative strengths of the competing responses, although it was clearly not intended to be equivalent to the restricted recall required for RI measures. Underwood (1948b) gave MFR at 1 min., 5, 24, and 48 hrs. after IL and found no change in OL responses, a consistent drop in IL responses, and a rise in “other” responses. He concluded that:

These data are given as further support of the interpretation of unlearning of the first list as being similar to experimental extinction. The fact that no decrease in the effective strength of the first list responses takes place over 48 hrs. suggests that a process running counter to the usual forgetting process is present. It is suggested that this mechanism may be likened to spontaneous recovery (p. 438).

Concerning OL responses, it seems unnecessary to hypothesize two opposing tendencies (recovery vs. “usual forgetting”) canceling each other out, as it were, to account for a finding of no change. The usual forgetting curve might not necessarily be expected of OL responses, since the effects of IL could be such as to obliterate, through differential unlearning, more of the weak than the strong responses, leaving the strong, stable ones that are more resistant to the “usual forgetting” process in the preponderance. List 2 responses, not so selectively eliminated, would be expected to decrease in time. In support of this alternate view we call attention to two relevant bits of evidence. Deese and Marder (1957) found that the number of items recalled after interpolation remained constant over intervals of 4 min., 2, 24, and 48 hrs. after IL. Also, Runquist (1957) found that resistance to RI was positively related to the degree of an original item’s strength. In another MFR experiment, Briggs (1954) did obtain a rise in OL responses between 4 min. and 6 hrs., with subsequent stability through 72 hrs. Because of the discrepancy between these data and those of Underwood (1948b), another study was done in which Briggs, Thompson, and Brogden (1954) found no OL changes between 4 min. and 6 hrs. These authors concluded that “responses from original learning show no change, that responses from interpolated learning tend to decrease with time interval in a fairly regular manner, and that ‘other’ responses tend to increase . . .” (p. 423). From these MFR data, we tentatively conclude that the processes underlying the temporal stability of RI do not as yet clearly implicate the recovery of unlearned original responses. The spontaneous recovery hypothesis is an attractive one, but more evidence of its validity should be brought forth.


Another problem of interest is the effect of the temporal point of interpolation, which requires keeping the OL—RL interval constant and varying the OL—IL period. Unavoidably, this introduces confounding with the simultaneously varying IL—RL interval, as discussed above.


Houlahan (1941) gave IL either 0, 4, or 8 min. after OL and found more RI for the immediate interpolation condition. However, there was no direct measure of OL; rather, the performance on some previously learned lists thought to be of equal difficulty to OL was used as a comparison. Within a 16-day OL—RL period, Postman and Alper (1946) gave IL at eight evenly dispersed intervals and found maxima of recall at 1, 8, and 15 days after OL. Degree of acquisition was uncontrolled, since fixed numbers of trials were given; and, since no acquisition data were presented, unequivocal conclusions about the temporal variable cannot be drawn.


Maeda (1951), using short intervals, reported greatest reproduction when IL directly followed OL. Newton (1955), with an A—B, C—B design, and Archer and Underwood (1951), with A—B, A—C, using a 48-hr. OL—RL period with IL at 0, 24, and 48 hrs., concluded that temporal point of interpolation was not an effective variable. Newton and Wickens (1956) noted that the Archer and Underwood study failed to control for differential warm-up, in that the group with IL immediately before RL benefited by warm-up, whereas the other two groups had no comparable advantage. They repeated the Archer and Underwood study with the same materials, but gave a warm-up task to the 0- and 24-hr. groups. No effects of the temporal intervals were obtained, confirming the previous results. However, they also reported two additional experiments, with an A—B, C—D design, with warm-up provided. One study had a performance acquisition criterion, and the other a fixed number of trials. Results of both showed that the 48-hr. group did show significantly more RI than the other two. Those authors state that the A—B, A—C design “is a relationship which is designed to produce a maximum amount of RI, and the intensity of this condition may obscure the RI which can arise from a variable of lesser importance—as the temporal variable may well be” (Newton & Wickens, 1956, p. 153). They especially stressed the importance of generalized competition between lists, a point which shall be developed further in the theoretical section below. We tentatively conclude from the Newton and Wickens (1956) data, supported by Maeda (1951), that RI increases as the OL—IL interval increases and that the effect is thus far specific to the A—B, C—D design.


A comparable PI design would require a constant List 2—RL period, while varying the List 1—List 2 interval. We have been unable to find such an experiment in the literature within this period. Ray (1945) studied List 2 acquisition as a function of the interval since the learning of List 1. Although he speaks of PI, the design is appropriate only to conclusions about negative transfer.


Another temporal variable which has not been studied sufficiently in an RI design is the rate of presentation of the items. The only relevant retention study on this is an unpublished honors thesis by Seeler (1958). OL was a 35-word passage of prose presented via tape recording to a criterion of one perfect unaided written recall, followed by similar memorization of an IL passage, and then by OL recall. OL rates of presentation were ½, 1, and 2 sec., followed by ½-, 1-, or 2-sec. counterbalanced rates on the IL. No control groups were used. Results showed that number of trials to mastery of all original and interpolated passages was a direct function of their presentation rates, a finding consistent with acquisition reports based on nonsense materials. There was no influence of either the OL or IL presentation rates, or any of their combinations, upon recall. It might have been supposed that an IL rate different from the OL rate would have served to functionally isolate the original list and produce less forgetting, but that was not the case. The possibility of confounding rate of presentation and strength of associations at the end of OL due to differential acquisition rates (Underwood, 1949a) is not a problem in this study, since unaided recall was used. With the method of serial anticipation, however, the criterial OL trial is also another learning trial, and two groups taken to the same performance criterion may still differ in total associative strength at the termination of the last OL trial. The problem is always present whenever any variable that affects rate of acquisition (such as meaningfulness, similarity, etc.) is used along with the serial anticipation technique. The generalizability of the latter results to unconnected materials, as well as the additional independent problem of the RI effects of massed vs. distributed training, must await further study.




In this Section we shall discuss the four main theoretical positions which have influenced the period covered by this review. Two major formulations, appearing within a few months of each other (Gibson, 1940; Melton & Irwin, 1940), guided the theoretical aspects of the study of RI during the first few years of this period.


Utilizing the classical conditioning principles of stimulus generalization and differentiation, Gibson (1940) presented a set of postulates for verbal behavior that served to lend greater predictive specificity to the transfer or straight competition-of-response view, previously developed by McGeoch and his collaborators. Basic to Gibson’s approach is the view that verbal learning and retention are matters of developing discriminations among the items to be learned. She defines her two basic constructs as follows: The construct of generalization is “the tendency for a response Ra learned to Sa to occur when Sb (with which it has not been previously associated) is presented” (p. 204). The construct of differentiation is a “progressive decrease in generalization as a result of reinforced practice with Sa—Ra and reinforced presentation of Sb” (p. 205). A curvilinear growth function of the generalization tendency as practice trials increase is stressed. Essentially, RI is related to the degree of discriminability of the two lists, such discriminability being a positive function of their respective degrees of learning, and a negative function of the time elapsed since learning. Spontaneous recovery of generalization tendencies (wrong responses) through time is assumed. From these postulates, several deductions concerning RI were presented, and some of these have been tested and confirmed: for instance, RI as a function of various similarity relations among the items (Gibson, 1941; Hamilton, 1943), and the curvilinear RI function obtained as the degree of IL increases (Melton & Irwin, 1940). Among the deductions tested but not confirmed is one bearing upon the temporal point of interpolation problem. Gibson feels that one of the reasons for the disparity of results on this question lies in the neglect of the importance of the degree of acquisition of the lists.
She predicted that acquisition level would be found to interact with the temporal point of interpolation because the spontaneous recovery of generalization tendencies between lists is a function of time. This prediction was tested by Archer and Underwood (1951) using three levels of IL acquisition (6/10, 10/10, and 10/10+5 trials) and three OL—IL intervals (0, 24, and 48 hrs.), but no interaction between them was found. RI control groups were not used, and in light of the theoretical importance of this study it would seem advisable to re-examine these variables with a design adaptable to relative RI measures. The authors themselves expressed dissatisfaction with the outcome and “felt that a modification of the conditions in our design would indicate the temporal position to be a factor” (p. 289).


Considering the general reaction toward Gibson’s theory in succeeding RI work, we feel that, on the whole, it has been favorably received, since it has been given a certain amount of implicit corroboration by way of being compatible with many findings (for instance, Briggs, 1957) and has potential for even further development. It has not, however, stimulated a comprehensive series of experiments aimed at testing the many RI deductions implicit within it. The reason for this is certainly not any lack of clarity in the postulates. One present weakness seems to be the lack of direct evidence for a spontaneous recovery process influencing RI.


Melton and Irwin (1940) introduced their two-factor theory within the framework of a study of RI as a function of the degree of IL. OL was 5 trials on an 18-item serial nonsense list, followed by 5, 10, 20, or 40 trials on an IL list. Relying upon a count of the overt interlist intrusions as an objective index of the degree of competition between original and interpolated responses at recall, they found that the curves of amount of absolute RI, and the number of such intrusions (multiplied by a factor of 2 to do justice to partial intrusions), were not highly correlated. (The theoretical importance of intrusion counts gained its ascendancy with this study.) Rather, interlist intrusions increased to a maximum at intermediate IL levels and then decreased markedly, whereas the curve of RI rose sharply and maintained a relatively high level, declining slightly at the highest degree of IL. That portion of the RI attributable to direct competition of responses at recall was at a maximum when OL and IL were about equal in strength. Therefore, to account for the remainder of the obtained RI not accounted for by overt competition, Melton and Irwin postulated another factor at work, tentatively identified as the direct “unlearning” of the original responses by their unreinforced elicitation or punishment during IL. The growth of this “Factor X” was assumed to be a progressively increasing function of IL strength. Since Factor X was almost totally responsible for the absolute RI at the highest IL level, and since RI under that condition dissipated most rapidly after a few relearning trials, it was concluded that the effects of such unlearning were quite transitory. This was still a competition-of-response theory in the sense that the original responses were still assumed to be competing at recall with the interpolated ones, but to that was added the factor of weakening in OL response strength, if not complete extinction, through the process of unlearning.
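The arithmetic of the two-factor partition can be made explicit: the portion of absolute RI assigned to overt competition is indexed by the intrusion count (doubled, following Melton and Irwin's allowance for partial intrusions), and Factor X is simply the residual. A minimal sketch with hypothetical quantities, not Melton and Irwin's actual figures:

```python
def factor_x(total_ri, overt_intrusions, weight=2.0):
    """Residual RI not accounted for by overt interlist intrusions.

    The intrusion count is weighted (by 2, per Melton and Irwin) to allow
    for partial intrusions; the remainder of the recall decrement is the
    'Factor X' (unlearning) estimate. All numbers below are illustrative.
    """
    competition = weight * overt_intrusions
    return total_ri - competition

# Hypothetical data at two degrees of interpolated learning:
print(factor_x(total_ri=6.0, overt_intrusions=2))    # intermediate IL: 2.0
print(factor_x(total_ri=7.0, overt_intrusions=0.5))  # high IL, few intrusions: 6.0
```

The sketch shows why, at the highest IL levels, where intrusions all but vanish while RI remains large, virtually the whole decrement falls to Factor X.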


The presence of confounding between the degree of IL and the interval from the end of IL to the start of RL was pointed out by Peterson and Peterson (1957) as a possible alternative account of the differences in intrusions obtained with the Melton and Irwin design. With a fixed OL—RL interval, the IL—RL interval shortens as increasing numbers of IL trials take more time. However, another study of the effects of degree of IL did use a fixed IL—RL interval (with a correspondingly varying OL—RL interval—Osgood, 1948) and still found comparable intrusion changes.


A direct deduction from the two-factor theory is that RI, being a result of both unlearning and competition effects, should be greater than PI, which was presumed to be the result of response competition alone. This hypothesis was tested and confirmed by Melton and von Lackum (1941) in a study using five trials on each of two 10-item consonant lists, and has also been given further general support by others (Jones, 1953; McGeoch & Underwood, 1943; Underwood, 1942, 1945). Underwood (1948a), in yet another study, found greater RI than PI at 5 hrs.; but at 24 hrs. they were equal. His resulting postulation of spontaneous recovery of the OL, and the subsequent developments of that concept, have been discussed above.


Later, certain other observations led to some discontent with the two-factor theory. In an experiment designed to test the generalizability of the Melton and Irwin findings to paired-adjectives lists, Thune and Underwood (1943) used an A—B, A—C design with five OL trials and 0, 5, 10, or 20 IL trials. Their results confirmed the existence of a negatively accelerated function between RI and degree of IL, as well as the fact that overt intrusions were maximal at the intermediate IL levels (10 trials) and declined sharply by the 20-trial level, while RI still remained massive. However, there was no difference in the rate of RI dissipation between the 10- and 20-trial IL levels, and therefore the transitoriness of RI at these levels could not reasonably be attributed to the unlearning construct. The two-factor theory would have been forced to predict faster dissipation at the 20-trial IL level, since overt intrusions were far fewer for it than for the 10-trial IL level. In addition, the curve of Factor X drawn for the Thune and Underwood data was quite different in shape from that obtained by Melton and Irwin, and it was felt to imply rather incongruous psychological properties for a curve of unlearning. Furthermore, an item analysis revealed that almost half of the overt intrusions took place on items where the original response had never been reinforced (or correctly anticipated) at all! Therefore, such interlist intrusions could not be legitimate indicators of response competition, since those responses had never been learned during OL, and were simply not available to be competing with anything. It is also to be expected that for original responses to be unlearned they would have to occur during IL in sufficient frequency to be subject to punishment or lack of reinforcement. Yet, as Osgood (1948) pointed out from his data, the number of related original-list intrusions during IL was “infinitesimally small” and could not possibly account for much unlearning at all.
This previously observed discrepancy between the assumed growth of Factor X and the lack of increase in intrusions during IL as a function of increasing IL trials should be tempered with the possibility that partial intrusions could still play a large role in determining the degree of unlearning obtained, and such intrusions are not easily detected and counted.


Thune and Underwood (1943) suggested that the ratio of overt to covert (and partial) errors need not necessarily remain constant, but may undergo progressive change as a function of the degree of IL, therefore accounting for the drop in overt intrusions by postulating an increase in implicit interference. In a subsequent paper Underwood (1945) elaborated upon this suggestion and formalized his differentiation theory.


The shift in error ratios was interpreted as a resultant of two simultaneous processes: increasing IL associative strength tending to produce more overt intrusions, but being gradually overcome by the growth of differentiation, tending to reduce the intrusions. The magnitude of the differentiation construct was held to be a positive function of the degree of learning of both lists and a negative function of the time between the end of IL and the start of RL. A decrease in overt intrusions was, in effect, the index of increasing differentiation. When the two lists are about equally well learned, intrusions are maximal and differentiation is low; but with increasing disparity between their absolute or relative acquisition levels, intrusions are reduced, indicating increased differentiation. By the same token, a short IL—RL interval should also produce higher differentiation. That this is in fact the case was shown in the Archer and Underwood study (1951), where overt intrusions declined as the IL—RL interval became shorter. The increasing differentiation allows S to recognize and withhold erroneous responses, resulting in fewer interlist intrusions and more covert or omission errors. Differentiation was described phenomenologically by Underwood (1945, p. 25) as being:


related to the verbally reported experience of “knowing” on the part of the subject that the responses from the interpolated learning are inappropriate at the attempted recall of the OL. Degree of differentiation in this sense is thus an indication of the degree to which the subject identifies the list to which each response belongs.


Empirical support for various aspects of this theory has come from several studies (e.g., Archer & Underwood, 1951; Osgood, 1948; Thune & Underwood, 1943; Underwood, 1945). Further, the fact that intrusion frequencies change but RI still remains constant might be simply a function of the limited recall time (usually 2 sec.) available to S. If this recall time were extended, then perhaps S would have sufficient time both to recognize the erroneous response and to verbalize the correct one, thus displaying a decrease in RI at high IL (differentiation) levels. Underwood (1950a, 1950b) tested this promising hypothesis, but found no dropping off of the Melton and Irwin effect, and concluded that differentiation does not change as a function of increased (8-sec.) recall time. Unlearning was therefore still retained as a useful concept; but, since it was shown that such response weakening took place only in the first “few” IL trials, as measured by associative inhibition, and because of the relatively great stress put upon the role of differentiation, Underwood’s revision of the two-factor theory became an important independent influence upon subsequent RI thinking.


Certain apparent similarities between Underwood’s differentiation construct and Gibson’s concept of differentiation deserve to be pointed out at this time. For both theorists, differentiation is in part a positive function of degree of reinforced practice on the material, such practice serving to reduce overt intrusion errors. Secondly, temporal relationships also play a large part in determining the strength of both constructs. However, the two positions do differ with regard to certain important aspects of operation of these determiners of differentiation. Underwood’s concept refers to the more global process of S correctly assigning the list membership of the responses, whereas Gibson speaks of discrete S—R connections in competition. Furthermore, Underwood’s theory is derived from experiments based largely upon the A—B, A—C design, whereas for Gibson, generalization as defined requires that the stimulus members be similar, but not identical. For Underwood, increasing differentiation is marked by a reduction of intrusions and an increase in omissions, but no drop in RI, whereas Gibson implies that increasing differentiation will result directly in improved performance. And finally, Gibson makes spontaneous recovery an integral part of her differentiation concept, while it was not until later that Underwood suggested a spontaneous recovery process, and that was reserved for the unlearning aspect of his theory.


The last theoretical formulation to be considered was put forth by Osgood (1946). It stemmed from his investigations of the RI effects of meaningful opposed responses and involved a hypothesis about reciprocal inhibition of antagonistic reactions, wherein “simultaneous with learning any response the S is also learning not to make the directly antagonistic response” (Osgood, 1948, p. 150). This was clearly an application of the reciprocal inhibition concept of neurophysiology to the area of verbal behavior. In pursuing the tenability of this position, two relevant transfer studies have shown that the learning of both similar and opposed List 2 responses was equally rapid, and much easier than learning neutral responses (Ishihara & Kasha, 1953; Ishihara, Morimoto, Kasha, & Kubo, 1957), thus failing to confirm the hypothesis. Unless further support for the hypothesis is forthcoming, we must conclude that it will not become an important influence in RI work.


With regard to the question of the adequacy of the two-factor theories we are of the opinion that the concept of unlearning is a valuable one, but that an acceptable measure of its magnitude has not yet been devised. Interlist intrusions were proposed only as a partial index, but not as a complete measure of its effects, and the difficulties encountered by such an index have been enumerated above. Instructions calculated to encourage the verbalizing of errors do just that: Morrow (1954) and Bugelski (1948) found that “all that is required to obtain a large number of such errors is to ask for them” (p. 680).


Two interesting proposals have been advanced as methods for distinguishing operationally between effects of competition and effects of unlearning. Postman and Kaplan (1947) spoke of two measures of RI: error scores, and the reaction times for correct responses (residual retroaction). These two measures were found not to be correlated and are thus of necessity measures of two different processes. They suggest that: “It is possible that retention loss (error scores) reflects the effects of unlearning, whereas reaction times may depend primarily on the competition between responses” (p. 143). Their experiment did not include variation of any factor which might be expected to affect unlearning differentially, and therefore the usefulness of their proposal has not yet been tested.


Later, Postman and Egan (1948) proposed the rate of recall of correct responses as a measure of competition, distinct from the amount recalled. Retention was measured by the free recall procedure, and performance was recorded both in terms of number of items recalled and in terms of the rate of emission of correct items per 3-sec. period. They state that:

The two types of measures—amount lost and rate of recall—may be regarded as measures of these two processes (unlearning and competition, respectively). Those aspects of OL which have been unlearned cannot be evoked on retest: unlearning leads to decrement in amount retained. Other aspects suffer competition from the IL but are not unlearned. They are potentially available but “disturbed,” and manifest that in a slower rate of recall (p. 543).

These are both valid and constructive formulations deserving of further attention, but no significant efforts have as yet been made to test their usefulness in predicting data crucially relevant to the unlearning factor.


These experiments by Postman point in a new direction, suggesting that such evidence for competition of response is a result of the brief recall times used in RI studies. If competition of response results in increased latencies, then decrements in recall may come when the latency of a response exceeds the 2-sec. interval usually used. Underwood’s (1950a) study, which found no PI with an 8-sec. recall interval, supports this possibility.


An experiment by Ceraso (1959) may provide further support for such a hypothesis. With an A—B, A—C design, Ss were asked to recall both the first and second list responses and also to assign these to the proper list. Since a 20-sec. (maximum) recall interval was used, blocking due to competition of response should not be expected. An analysis of the first list responses which were correct on the last trial of OL, and were then scored as incorrect at recall, showed that the reason for the forgetting was simply the unavailability of the response. If the response was available at recall, it was also assigned to the correct list. Since competition of response should reveal itself as a misassignment of the response, it was clear that the forgetting obtained could not be accounted for by competition. Using a technique somewhat related, with an A—B, A—C design, Barnes and Underwood (1959) obtained similar results, and accordingly rejected a competition explanation.
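Ceraso's scoring logic can be sketched as a simple classifier: a first-list item missed at recall counts toward unlearning if the response was simply unavailable, and shows the signature of competition only if it was produced but assigned to the wrong list. The function and category names below are my own illustrative labels, not Ceraso's:

```python
def classify_forgetting(recalled, assigned_list):
    """Classify a first-list (List 1) item at recall, in the spirit of Ceraso (1959).

    recalled: whether S could produce the first-list response at all.
    assigned_list: the list number S assigned the response to, or None.
    """
    if not recalled:
        return "unavailable"   # consistent with unlearning
    if assigned_list != 1:
        return "misassigned"   # the expected signature of response competition
    return "correct"           # available and correctly assigned

print(classify_forgetting(False, None))  # unavailable
print(classify_forgetting(True, 2))      # misassigned
print(classify_forgetting(True, 1))      # correct
```

Ceraso's finding was, in these terms, that forgotten items fell almost entirely into the "unavailable" category and almost never into "misassigned," which is why competition could not account for the loss.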


Ceraso also found that in a large number of cases S could give both responses to the stimulus. But does not the unlearning hypothesis imply that learning the second list response entails the unavailability of the first list response? The answer that immediately suggests itself is that unlearning is a function of the degree of first and second list item learning. Therefore, an item analysis of the kind performed by Runquist was undertaken. The result showed that degree of learning of the second list item did not affect the retention of the first list item, thus verifying Runquist’s (1957) original finding.


It seems that the latter data pose a real problem for current theories of RI, since the basic mechanism usually postulated requires interaction between associations with similar or identical stimulus items. Both the Runquist and Ceraso findings seem to indicate a nonspecific mechanism. Learning a second list affects the entire first list, regardless of the specific item interactions.


In conclusion, it appears that the major theoretical accounts of RI have remained relatively unchallenged and unchanged for the last ten years, in spite of the accumulation of considerable empirical data. It is hoped that this overview of the current state of the field will help to initiate a more vigorous and sustained effort toward an improved theory of forgetting.




For a concluding statement we feel it would be appropriate to enumerate some of the pressing problems and empirical gaps currently evident in the status of our knowledge of RI and PI. These points are presented in the order of their appearance in the foregoing review and do not reflect any opinion regarding their relative importance.


1. Reconsideration of the relative merits of RI quantification: absolute RI, relative RI, and total forgetting
2. Determinants of the RI and PI of individual items
3. Determinants of the rate of PI dissipation
4. Development of an objectively quantitative scale of similarity for use in constructing lists of items
5. Reappraisal of the right half of the response dimension of Osgood’s retroaction surface
6. Effects of opposed or antagonistic stimulus relations upon RI and PI, with responses the same
7. Effects of varying response similarity upon PI, with the “categorization approach” error eliminated
8. Further study of the RI effects of mediated and unidirectional prepotent association between list items
9. PI as a function of similarity relations within the A—B, C—D design
10. Determinants of the RI and PI of connected discourse
11. Relative importance of proprioceptive vs. exteroceptive extrinsic cues for recall
12. RI effects of incidental vs. intentional acquisition conditions, with degree of acquisition controlled
13. Effects of the affective characteristics of the material upon its RI and PI
14. Better handling of the problem of confounding which arises when temporal intervals are manipulated
15. Further tests of the validity of the spontaneous recovery hypothesis
16. Examination of the point of interpolation problem as a function of other attendant variables
17. RI as a function of presentation rate, and of massing vs. distributing trials
18. Testing of the two-factor theory through an improved measure of the unlearning construct





1. This work was supported in part by a grant to the senior author from the National Science Foundation (G-6192).



Implications of Short-Term Memory for a
General Theory of Memory


ARTHUR W. MELTON, University of Michigan


Memory has never enjoyed even a small fraction of the interdisciplinary interest that has been expressed in symposia, discoveries, and methodological innovations during the last five years. Therefore, it seems probable that the next ten years will see major, perhaps even definitive, advances in our understanding of the biochemistry, neurophysiology, and psychology of memory, especially if these disciplines communicate with one another and seek a unified theory. My thesis is, of course, that psychological studies of human short-term memory, and particularly the further exploitation of new techniques for investigating human short-term memory, will play an important role in these advances toward a general theory of memory. Even now, some critical issues are being sharpened by such observations.


The confluence of forces responsible for this sanguine prediction about future progress is reflected in this AAAS program on memory. Advances in biochemistry and neurophysiology are permitting the formulation and testing of meaningful theories about the palpable stuff that is the correlate of the memory trace as an hypothetical construct (Deutsch, 1962; Gerard, 1963; Thomas, 1962). In this work there is heavy emphasis on the storage mechanism and its properties, especially the consolidation process, and it may be expected that findings here will offer important guidelines for the refinement of the psychologist’s construct once we are clear as to what our human performance data say it should be.


Within psychology several developments have focused attention on memory. In the first place, among learning theorists there is a revival of interest in the appropriate assumptions to be made about the characteristics of the memory traces (engrams, associations, bonds, sHr’s) that are the products of experiences and repetitions of experiences. Thus, Estes (1960) has questioned the validity of the widespread assumption (e.g., Hull, 1943; Spence, 1955) that habit strength grows incrementally over repetitions, and has proposed an all-or-none conception as an alternative. More recently, he has examined (Estes, 1962) in detail the varieties of the incremental and all-or-none conceptions and the evidence related to them. Already, some defenders of the incremental concept (Jones, 1962; Keppel and Underwood, 1962; Postman, 1963) have taken issue with Estes’ conclusions, and it would appear that this fundamental question about memory will loom large in theory and experiments for some time to come. At a somewhat different level, the revival of experimental and theoretical interest in the notion of perseveration or consolidation of the memory trace (Glickman, 1961), and attempts to embody it in a general theory of learning (Hebb, 1949; Walker, 1958), have also focused attention on a theory of memory as a fundamental component of a theory of learning.


A second strong stimulus to research on memory from within psychology is a set of findings from the last few years that have forced major revisions in the interference theory of forgetting and, consequently, a renaissance of interest in it (Postman, 1961). First, there was the discovery by Underwood (1957) that proactive inhibition had been grossly underestimated as a source of interference in forgetting. Then, the unlearning factor as a component of retroactive inhibition was given greater credibility by the findings of Barnes and Underwood (1959). And finally, the joint consideration of the habit structure of the individual prior to a new learning experience, the compatibility or incompatibility of the new learning with that structure, and the unlearning factor (among others) led to the formulation of the interference theory of forgetting in terms that made it applicable to all new learning (Melton, 1961; Postman, 1961; Underwood and Postman, 1960). Thus, this development focuses attention on the interactions of memory traces during learning as well as their interactions at the time of attempted retrieval or utilization in recognition, recall, or transfer.


But perhaps the most vigorous force directing attention within psychology to the need for a general theory of memory is the spate of theorizing and research on immediate and short-term memory during the last five years. In 1958, and increasingly thereafter, the principal journals of human learning and performance have been flooded with experimental investigations of human short-term memory. This work has been characterized by strong theoretical interests, and sometimes strong statements, about the nature of memory, the characteristics of the memory trace, and the relations between short-term memory and the memory that results from multiple repetitions. The contrast with the preceding thirty years is striking. During those years most research on short-term memory was concerned with the memory span as a capacity variable, and no more. It is always dangerous to be an historian about the last five or ten years, but I venture to say that Broadbent’s Perception and Communication (1958), with its emphasis on short-term memory as a major factor in human information-processing performance, played a key role in this development. Fortunately, many of the others who have made important methodological and substantive contributions to this analysis of short-term memory have presented their most recent findings and thoughts in these Meetings on Memory, and they thus adequately document my assessment of the vigor and importance of this recent development. Therefore I will refrain from further documentation and analysis at this point, since the impact of some of these findings on our theory of memory is my main theme.





A theory of memory is becoming important for a number of different reasons, and somehow all of these reasons properly belong to a comprehensive theory of memory. Its storage mechanism is the principal concern of biochemists and neurophysiologists; the morphology of its storage—whether as a multiplexed trace system with one trace per repetition, or a single trace system subjected to incremental changes in “strength” by repetition—is becoming a principal concern of learning theorists; its susceptibility to inhibition, interference, or confusion both at the time of new trace formation and at the time of attempted trace retrieval or utilization is the concern of forgetting and transfer theorists; and the perhaps unique properties of its manifestation in immediate and short-term retention are the principal concern of psychologists interested in human information-processing performance. One knows intuitively that all of these different approaches emphasize valid questions or issues that must be encompassed by a general theory of memory, but nowhere—with perhaps the exception of Gomulicki’s (1953) historical-theoretical monograph on memory-trace theory—will one find explicit systematic consideration of these several different facets of the problem of memory.


Since my present intention is to marshal some data relevant to one of the main issues in a general theory of memory—namely, the question of whether single-repetition, short-term memory and multiple-repetition, long-term memory are a dichotomy or points on a continuum—I feel compelled to discuss briefly what I believe to be the proper domain of a theory of memory and to differentiate it from a theory of learning.


After some exclusions that need not concern us here, learning may be defined as the modification of behavior as a function of experience. Operationally, this is translated into the question of whether (and, if so, how much) there has been a change in behavior from Trial n to Trial n + 1. Any attribute of behavior that can be subjected to counting or measuring operations can be an index of change from Trial n to Trial n + 1, and therefore an index of learning. Trials n and n + 1 are, of course, the presentation and test trials of a so-called test of immediate memory or they may be any trial in a repetitive learning situation and any immediately subsequent trial. By convention among psychologists, the change from Trial n to Trial n + 1 is referred to as a learning change when the variable of interest is the ordinal number of Trial n and not the temporal interval between Trial n and Trial n + 1, and the change from Trial n to Trial n + 1 is referred to as a retention change when the variable of interest is the interval, and the events during the interval, between Trial n and Trial n + 1. Learning and retention observations generally imply that the characteristics of the task, situation, or to-be-formed associations remain the same from Trial n to Trial n + 1. When any of these task or situation variables are deliberately manipulated as independent variables between Trial n and Trial n + 1, the object of investigation is transfer of learning, i.e., the availability and utilization of the memorial products of Trial n in a “different” situation.
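The operational distinctions just drawn can be stated mechanically: the label attached to a Trial n to Trial n + 1 comparison depends entirely on which variable is of interest. A minimal sketch, with labels and parameter names that are mine rather than Melton's:

```python
def classify_observation(varies_trial_number, varies_interval, varies_task):
    """Label a Trial n -> Trial n + 1 comparison by its variable of interest."""
    if varies_task:
        return "transfer"   # task or situation is changed between the trials
    if varies_interval:
        return "retention"  # the interval (and intervening events) is the variable
    if varies_trial_number:
        return "learning"   # the ordinal number of Trial n is the variable
    return "unclassified"

print(classify_observation(True, False, False))   # learning
print(classify_observation(False, True, False))   # retention
print(classify_observation(False, False, True))   # transfer
```

The point of the sketch is only that the three terms name one and the same Trial n to Trial n + 1 observation under different experimental manipulations, not three different kinds of data.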


Now, these operational definitions of learning, retention, and transfer are completely aseptic with respect to theory, and I think it is important to keep them so. In part, this is because it is useful to keep in mind the fact that learning is never observed directly; it is always an inference from an observed change in performance from Trial n to Trial n + 1. Furthermore—and this is the important point for theory—the observed change in performance is always a confounded reflection of three theoretically separable events: (i) the events on Trial n that result in something being stored for use on Trial n + 1; (ii) the storage of this product of Trial n during the interval between Trials n and n + 1; and (iii) the events on Trial n + 1 that result in retrieval and/or utilization of the stored trace of the events on Trial n. For convenience, these three theoretically separable events in an instance of learning will be called trace formation, trace storage, and trace utilization.

Obviously, a theory of learning must encompass these three processes. However, it must also encompass other processes such as those unique to the several varieties of selective learning and problem solving. Some advantages will accrue, therefore, if the domain of a general theory of memory is considered to be only a portion of the domain of a theory of learning; specifically, that portion concerned with the storage and retrieval of the residues of demonstrable instances of association formation. This seems to me to fit the historical schism between learning theories and research on memory, and the formal recognition of this distinction may well assist in avoiding some misconceptions about the scope of a theory of memory. Historically, our major learning theories have not felt compelled to include consideration of the question whether storage of the residue of a learning experience (Trial n) is subject to autonomous decay or to autonomous consolidation through reverberation, or even to consider systematically the memory-span phenomenon. On the other hand, much of the controversy between learning theorists surrounds the question of the necessary and sufficient conditions for association (or memory trace) formation. And even though most learning theories must say something about the conditions of transfer, or utilization of traces, they do not always include explicit consideration of the interference theory of forgetting or alternative theories. As for those who have been concerned with memory theory, they have, following Ebbinghaus (1885), employed the operations of rote learning, thus avoiding in so far as possible the problems of selective learning and insuring the contiguous occurrence of stimulus and response under conditions that demonstrably result in the formation of an association.
Their emphasis has been on the storage and retrieval or other utilization of that association, i.e., of the residual trace of it in the central nervous system (CNS), and on the ways in which frequency of repetition and other learning affect such storage and retrieval.


The implication of this restriction on the domain of a theory of memory is that the theory will be concerned with post-perceptual traces, i.e., memory traces, and not with pre-perceptual traces, i.e., stimulus traces. It seems to me necessary to accept the notion that stimuli may affect the sensorium for a brief time and also the directly involved CNS segments, but that they may not get “hooked up,” associated, or encoded with central or peripheral response components, and may not, because of this failure of being responded to, become a part of a memory-trace system. This view is supported by the recent work of Averbach and Coriell (1961), Sperling (1960), and Jane Mackworth (1962), which shows that there is a very-short-term visual pre-perceptual trace which suffers rapid decay (complete in .3 to .5 sec.). Only that which is reacted to during the presentation of a stimulus or during this post-exposure short-term trace is potentially retrievable from memory. While it is not necessary to my argument to defend this boundary for memory theory, because if I am wrong the slack will be taken up in a more inclusive theory of learning, it is of some interest that it is accepted by Broadbent (1963) and that it is consistent with a wealth of recent research on “incidental learning” in human subjects (Postman, in press).


What, then, are the principal issues in a theory of memory? These are about either the storage or the retrieval of traces. In the case of the storage of traces we have had four issues.2 The first is whether memory traces should be given the characteristic of autonomous decay over time, which was dignified by Thorndike (1913) as the Law of Disuse and which recently has been vigorously defended by Brown (1958). The antithesis is, of course, the notion that associations, once established, are permanent—a position initially formulated by McGeoch (1932) and incorporated in a radical form in Guthrie’s (1935) theory of learning.


The second storage issue is again an hypothesis about an autonomous process, but one involving the autonomous enhancement (fixation, consolidation) of the memory trace, rather than decay. The hypothesis was first formulated in the perseveration theory of Muller and Pilzecker (1900), with emphasis on the autonomous enhancement, or strengthening, of a memory trace if it was permitted to endure without interruption. As such, the emphasis was on a property of automatic “inner repetition” if repetition and duration are given a trade-off function in determining the strength of traces. More recently, the hypothesis has been that the memory trace established by an experience requires consolidation through autonomous reverberation or perseveration, if it is to become a stable structural memory trace in the CNS (Deutsch, 1962; Gerard, 1963; Glickman, 1961; Hebb, 1949). Presumably, the alternative view is that every experience establishes a structural memory trace without the necessity of consolidation through reverberation or perseveration, but also without denying that such reverberation or perseveration, if permitted, may strengthen the trace.


The third issue about storage is the one previously referred to as morphological (at the molecular level) in our brief reference to the current controversy about the all-or-none versus the incremental notions of association formation. The all-or-none notion implies that the increment in the probability of response on Trial n + 2 is a consequence of establishment of independent and different all-or-none trace systems on Trials n and n + 1; the incremental notion implies that the same trace system is activated in some degree on Trial n and then reactivated and strengthened on Trial n + 1. It is, of course, possible that both notions could be true.


The fourth issue about trace storage is actually one that overlaps the issues about retrieval or utilization of traces, and is perhaps the most critical current issue. This is the question whether there are two kinds of memory storage or only one. A duplex mechanism has been postulated by Hebb (1949), Broadbent (1958), and many others, and on a variety of grounds, but all imply that one type of storage mechanism is involved in remembering or being otherwise affected by an event just recently experienced, i.e., “immediate” or short-term memory for events experienced once, and that a different type is involved in the recall or other utilization of traces established by repetitive learning experiences, i.e., long-term memory or habit. Since a clean distinction between “immediate” memory and short-term memory is not possible (Melton, 1963), we shall henceforward refer to these two manifestations of memory as short-term memory (STM) and long-term memory (LTM).


Some principal contentions regarding the differences between the two memory mechanisms are that: (a) STM involves “activity” traces, while LTM involves “structural” traces (Hebb, 1949; 1961); (b) STM involves autonomous decay, while LTM involves irreversible, non-decaying traces (Hebb, 1949); and (c) STM has a fixed capacity that is subject to overload and consequent loss of elements stored in it, for nonassociative reasons, while LTM is, in effect, infinitely expansible, with failure of retrieval attributable mainly to incompleteness of the cue to retrieval or to interference from previously or subsequently learned associations (Broadbent, 1958; 1963). On the other hand, the monistic view with respect to trace storage is one which, in general, accepts the characteristics of LTM storage as the characteristics of STM storage as well, and thus ascribes to the traces of events that occur only once the same “structural” properties, the same irreversibility, the same susceptibility to associational factors in retrieval, as are ascribed to LTM.


The bridge to the theoretical problems of trace retrieval and utilization as major components of a theory of memory is obviously wrought by the issue of memory as a dichotomy or a continuum. Those who accept a dichotomy do so on the basis of data on retention, forgetting, or transfer that suggest two distinct sets of conditions for retrieval and utilization of traces; those who accept a continuum do so on the basis of data that suggest a single set of conditions or principles.


The history of our thought about the problems of retrieval and utilization of traces reveals three main issues. The first is the question of the dependence of the retrieval on the completeness of the reinstatement on Trial n + 1 of the stimulating situation present on Trial n. Psychologists have formulated several principles in an attempt to describe the relevant observations, but all of them may be subsumed under a principle which asserts that the probability of retrieval will be a decreasing function of the amount of stimulus change from Trial n to Trial n + 1. Changes in directly measured and manipulated cue stimuli, like the CS in a classical conditioning experiment, that result in decrement in response probability are generally referred to a sub-principle of stimulus generalization (Mednick and Freedman, 1960); changes in contextual stimuli that result in forgetting are usually referred to a sub-principle of altered stimulating conditions or altered set (McGeoch and Irion, 1952); and stimulus changes that occur in spite of all attempts to hold the stimulating situation constant are referred to a sub-principle of stimulus fluctuation (Estes, 1955). Since these are all principles of transfer, when they are employed to interpret failure of retrieval on Trial n + 1, it is clear that all principles of transfer of learning, whether they emphasize the occurrence of retrieval in spite of change or the failure of retrieval in spite of some similarity, are fundamental principles of trace retrieval and utilization. At this moment I see no necessary difference between the dual- and single-mechanism theories of memory with respect to this factor of stimulus change in retrieval, but there may be one implicit and undetected.


The second issue relates to the interactions of traces. Here, of course, is the focus of the interference theory of forgetting which has, in recent years, led us to accept the notion that retrieval is a function of interactions between prior traces and new traces at the time of the formation of the new traces, as well as interactions resulting in active interference and blocking of retrieval. This theory was given its most explicit early expression in the attack by McGeoch (1932) on the principle of autonomous decay of traces, and has been refined and corrected in a number of ways since then (Postman, 1961). In its present form it accepts the hypothesis of irreversibility of traces and interprets all failures of retrieval or utilization as instances of stimulus change or interference. Therefore, it implicitly accepts a one-mechanism theory of memory. However, it has been recognized (Melton, 1961) that the principal evidence for the theory has come from the study of retrieval following multiple-repetition learning, and that the extension of the theory to STM is not necessarily valid. Since dual-mechanism theorists assert that retrieval in STM is subject to disruption through overloading, but not through associative interference, a prime focus of memory theory becomes the question of associative interference effects in STM.


A third important issue related to retrieval is the relationship between repetition and retrieval probability. While the fact of a strong correlation between repetition and probability of retrieval seems not to be questionable, there are two important questions about repetition that a theory of memory must encompass. The first of these is the question of whether repetition multiplies the number of all-or-none traces or whether it produces incremental changes in the strength of a trace. This has already been listed as a problem in storage, but it is obvious that the alternative notions about storage have important implications for the ways in which repetitions may be manipulated to increase or decrease probability of retrieval. The second is the question of whether there is a fundamental discontinuity between the characteristics of traces established by a single repetition and those established by multiple repetitions (or single repetitions with opportunity for consolidation). This appears to be the contention of the dual-mechanism theorists; whereas, a continuum of the effects of repetition in the establishment of “structural,” permanent traces seems to be the accepted position of the single mechanism theorists.


In summary so far, when the domain of a theory of memory is explicitly confined to the problems of the storage and retrieval of memory traces, it becomes possible to formulate and examine some of the major theoretical issues under the simplifying assumption that the formation of the associations or memory traces has already occurred. Then it becomes clear that the conflicting notions with respect to the properties of trace storage and the conflicting notions with respect to the principal determinants of trace retrieval, or failure thereof, converge on the more fundamental issue of the unitary or dual nature of the storage mechanism. My plan is to examine these alleged differences between STM and LTM in the light of some recent studies of human short-term memory, and then return to a summary of the implications these studies seem to have for the major issues in a general theory of memory.





The contrasting characteristics of STM and LTM that have led to the hypothesis that there are two kinds of memory have not, to my knowledge, been considered systematically by any memory theorist, although Hebb (1949), Broadbent (1957; 1958; 1963), and Brown (1958) have defended the dichotomy.


The decay of traces in immediate memory, in contrast to the permanence, even irreversibility, of the memory traces established through repetitive learning, is the most universally acclaimed differentiation. For Hebb (1949) this rapid decay is a correlate of the non-structural, i.e., “activity,” nature of the single perception that is given neither the “fixation” effect of repetition nor the opportunity for “fixation” through reverberation. For Broadbent (1957; 1958) and Brown (1958) this autonomous decay in time is a property of the postulated STM mechanism, and attempts have been made (e.g., Conrad and Hille, 1958) to support the notion that time per se is the critical factor in decay. Obviously, this autonomous decay can be postponed by rehearsal—recirculating through the short-term store (Broadbent, 1958)—and Brown (1958) has maintained that such rehearsal has no strengthening effect on the structural trace. However, the decay of a specific trace begins whenever rehearsal is prevented by distraction or overloading of the short-term store (Broadbent, 1957; 1958). A corollary of this last proposition is that the initiation of the decay process, by dislodging the trace from the short-term store, is not dependent on new learning and therefore not on the associative interference principles which account for most if not all of the forgetting of events that reach the long-term store through repetition, reverberation, or both (Broadbent, 1963).


These characteristics contrast sharply with those attributed to LTM by the interference theory of forgetting which has dominated our thinking since McGeoch’s (1932) classical attack on the Law of Disuse and which has gained new stature as a consequence of recent refinements (Melton, 1961; Postman, 1961). This theory implies: (a) that traces, even those that result from single repetitions, are “structural” in Hebb’s sense, and are permanent except as overlaid by either the recovery of temporarily extinguished stronger competing traces or by new traces; and (b) that all persistent and progressive losses in the retrievability of traces are to be attributed to such associative interference factors, and not to decay or to a combination of nonassociative disruption plus decay. And, as a consequence of these two implications, it is assumed that the effect of repetition on the strength of the single type of trace is a continuous monotonic process. On this basis a continuum is assumed to encompass single events or sequential dependencies between them when these events are well within the span of immediate memory and also complex sequences of events, such as in serial and paired-associate lists, that are far beyond the span of immediate memory and thus require multiple repetitions for mastery of the entire set of events or relations between them.


My discussion of the question: “STM or LTM; continuum or dichotomy?” will therefore examine some experimental data on STM to see (a) whether they are interpretable in terms of the interference factors known to operate in LTM, and (b) whether the durability of memory for sub-span and supra-span to-be-remembered units is a continuous function of repetitions.


The reference experiments that provide the data of interest are those recently devised by Peterson and Peterson (1959) and Hebb (1961), with major emphasis on the former. While a number of ingenious techniques for investigating STM have been invented during the last few years, I believe that the Petersons’ method is the key to integration of retention data on immediate memory, STM and LTM. This is because, as you will see, it can be applied to to-be-remembered units in the entire range from those well below the memory span to those well above it, and the control and manipulation of duration and frequency of presentation are essentially continuous with those traditionally employed in list memorization.


In what must have been a moment of supreme skepticism of laboratory dogma, not unlike that which recently confounded the chemist’s dogma that the noble gases are nonreactive (Abelson, 1962), Peterson and Peterson (1959) determined the recallability of single trigrams, such as X-J-R, after intervals of 3, 6, 9, 12, 15, and 18 sec. The trigrams were presented auditorily in 1 sec., a 3-digit number occurred during the next second, and S counted backward by 3’s or 4’s from that number until, after the appropriate interval, he received a cue to recall the trigram. The S was given up to 14 sec. for the recall of the trigram, thus avoiding any time-pressure in the retrieval process. The principal measure of retention was the frequency of completely correct trigrams in recall.
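The trial structure just described can be sketched as a simple event sequence. The helper functions below are a hypothetical illustration of the procedure, not a reconstruction of the Petersons' actual apparatus: the function names, the assumed counting rate of one response per second, and the all-or-none scoring rule applied to whole trigrams are assumptions for the sketch.

```python
# Hypothetical sketch of one Peterson-Peterson trial: a trigram is
# presented for 1 sec., S counts backward by 3's or 4's from a
# 3-digit number during the retention interval, then recalls.

def distractor_counts(start, step, interval_sec, rate_per_sec=1):
    """Numbers S would produce counting backward by `step` from
    `start` for `interval_sec` seconds at an assumed rate."""
    n = int(interval_sec * rate_per_sec)
    return [start - step * i for i in range(1, n + 1)]

def score_recall(presented, recalled):
    """All-or-none scoring: 1 only if the whole trigram is correct."""
    return int(presented == recalled)

# Example trial: trigram "XJR", counting back by 3's from 487 for 9 sec.
counts = distractor_counts(487, 3, 9)
print(counts[:3])                   # [484, 481, 478]
print(score_recall("XJR", "XJR"))   # 1
print(score_recall("XJR", "XJT"))   # 0
```

The all-or-none scoring is what makes the "completely correct trigrams" measure sensitive to any failure within the unit, a point that matters later when intra-unit interference is discussed.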


The results of this experiment are shown in Fig. 1. It is noteworthy that the curve has the Ebbinghausian form, even though the maximum interval is only 18 sec., and that there is an appreciable amount of forgetting after only 3 and 6 sec. Other observations reported by the Petersons permit us to estimate that the recall after zero time interval, which is the usual definition of immediate memory, would have been 90 percent, which is to say that in 10 percent of the cases the trigram was misperceived, so that the forgetting is actually not as great as it might appear to be. Even with this correction for misperception, however, the retention after 18 sec. would be only about 20 percent, which is rather startling when one remembers that these trigrams were well below the memory span of the college students who served as Ss.


The rapid deterioration of performance over time is not inconsistent with the decay theory, nor is it necessarily inconsistent with the notion that traces from single occurrences of single items are on a continuum with traces from multiple items learned through repetition. However, additional data with the same method were soon forthcoming. Murdock (1961) first replicated the Peterson and Peterson experiment with 3-consonant trigrams, and then repeated all details of the experiment except that in one study he used single common words drawn from the more frequent ones in the Thorndike-Lorge word lists, and in another study he used word triads, i.e., three unrelated common words, as the to-be-remembered unit.


Murdock’s results from these three experiments are shown alongside the Petersons’ results in Fig. 1. His replication of the Petersons’ study with trigrams gave remarkably similar results. Of considerable significance, as we will see later, is his finding that single words show less forgetting than did the trigrams, but that some forgetting occurs with even such simple units. Finally, the most seminal fact for theory in these experiments is his discovery that word triads act like 3-consonant trigrams in short-term retention.



Murdock’s data strongly suggested that the critical determinant of the slope of the short-term retention function was the number of Millerian (1956) “chunks” in the to-be-remembered unit. Of even greater importance, from my point of view, was the implication that, other things being equal, the rate of forgetting of a unit presented once is a function of the amount of intra-unit interference, and that this intra-unit interference is a function of the number of encoded chunks within the item rather than the number of physical elements, such as letters, or information units.


The first of several projected experimental tests of this hypothesis has been completed.3 The to-be-remembered units were 1, 2, 3, 4, or 5 consonants. The unit, whatever its size, was presented visually for 1 sec., and read off aloud by S. Then .7 sec. later a 3-digit number was shown for 1 sec. and removed. The S read off the number and then counted backward aloud by 3’s or 4’s until a visual cue for recall, a set of 4 asterisks, was shown. The delayed retention intervals were 4, 12, and 32 sec., and a fourth condition involved recall after only .7 sec., hereafter referred to as the zero interval. The Ss were given 8 sec. for the recall of each item. In the course of the experiment each S was tested four times at each combination of unit size and interval for a total of 80 observations. Every condition was represented in each of 4 successive blocks of 20 observations, and there was partial counterbalancing of conditions within the blocks and of to-be-remembered units between the blocks. Through my error, the to-be-remembered units of each specific size were not counterbalanced across the four retention intervals. Thanks only to the power of the variable we were investigating, this did not, as you will see, materially affect the orderliness of the data.
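The design described above (five unit sizes crossed with four retention intervals, each combination tested four times per S, for 80 observations in four blocks of 20) can be sketched as follows. This is only an illustrative reconstruction: the shuffling shown here does not reproduce the partial counterbalancing of the actual experiment, and the function names are assumptions.

```python
# Illustrative sketch of the 5 x 4 x 4 within-subject design:
# 1-5 consonants crossed with four retention intervals, each
# combination tested four times per subject.

import itertools
import random

UNIT_SIZES = [1, 2, 3, 4, 5]    # consonants in the to-be-remembered unit
INTERVALS = [0, 4, 12, 32]      # sec. (0 = the .7-sec. "zero" interval)
TESTS_PER_CELL = 4

def build_schedule(seed=0):
    """Return 80 (unit_size, interval) trials in 4 blocks of 20,
    with every condition represented once per block (a simplified
    stand-in for the partial counterbalancing in the original)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(TESTS_PER_CELL):
        block = list(itertools.product(UNIT_SIZES, INTERVALS))
        rng.shuffle(block)
        trials.extend(block)
    return trials

schedule = build_schedule()
print(len(schedule))             # 80
print(schedule.count((3, 32)))   # 4
```

Building each block as a full crossing of the conditions is what guarantees that every condition appears in each of the four successive blocks of 20 observations, as stated in the text.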


The results for the last two blocks of trials are shown in Fig. 2. Again, the measure of recall performance is the percentage of completely correct recalls of the to-be-remembered unit, i.e., the single consonant had to be correct when only one was presented, and all five consonants had to be correct and in the proper order when the 5-consonant unit was presented. The same relationships hold when Ss are not as well-practiced in the task, i.e., in Blocks 1 and 2, although the absolute amounts of forgetting are greater. The data in Fig. 2 are to be preferred to those for the earlier stages of practice, because all five curves in this figure have their origin very near to 100 percent recall. That is, in all cases it is possible to assume that Ss had, in fact, learned the to-be-remembered unit during the 1-sec. presentation interval.


Aside from the self-evident generalization that the slope of the short-term forgetting curve increases as a direct function of the number of elements in the to-be-remembered unit, two features of these data are worthy of special attention. First, it should be noted that the slope of the curve for the 3-consonant units is not as steep as was reported by both Peterson and Peterson (1959) and by Murdock (1961). We do not know why there is this discrepancy, although it occurs consistently in our work with the Petersons’ method.


The other point of interest is the obvious forgetting of the one-consonant unit. This curve looks very much like the one obtained by Murdock for single words. Both findings have significance for theory because they represent instances of forgetting when the intra-unit interference is at a minimum for verbal units. But before giving additional consideration to this point, a further set of data from this experiment needs to be presented and a more general statement of the observed relationships deserves formulation.





If the increased slopes of the forgetting curves shown in Fig. 2 are attributed to an increase in intra-unit interference, it is of some importance to show that the more frequent breakdown of complete recall as one increases the number of letters in the to-be-remembered unit is not merely a breakdown in the sequential dependencies between the letters, but is also reflected in the frequency of correct recall of the first letter of the unit. In Fig. 3 are shown the percentages of first-letter recalls in the last two blocks of our experiment. Although they are lacking in the monotonic beauty of the curves for whole units correct, I am willing to accept the generalization that first-letter recall suffers interference as a function of the number of other letters in the to-be-remembered unit. Thus, what Peterson (1963) has called “background conditioning,” and is measured by the recall of first letters, and what he has called “cue learning,” and is represented by sequential dependencies in recall, are affected alike by the number of elements in the to-be-remembered unit. This is expected in so far as there is functional parallelism between “free” recall and serial or paired-associate recall with respect to the effect of learning and interference variables (Melton, 1963).



In Fig. 4 the results obtained so far have been generalized and extrapolated. This set of hypothetical curves will be used as the conceptual anchor for three points that are related to the question whether short-term and long-term memory are a dichotomy or points on a continuum. The first, and most obvious, point about the figure is that it reaffirms the notion that intra-unit interference is a major factor in the short-term forgetting of sub-span units, but now the parameter is the number of encoded chunks, instead of the number of physical elements or information units. This is consistent with Miller’s (1956) cogent arguments for the concept of chunk as the unit of measurement of human information-processing capacities. It is also the unit most likely to have a one-to-one relationship to the memory trace. Obviously, it is also the concept demanded by the parallelism of the findings of Murdock with 1 and 3 words and our findings with 1 to 5 consonants, even though it cannot, of course, be asserted that the number of elements beyond one in these experiments, be they words or consonants, stand in a one-to-one relationship to the number of chunks. Even though the strings of consonants in our experiment were constructed by subtracting from or combining consonant trigrams of Witmer (1935) association values less than 60 percent, there were surely some easy-to-learn letter sequences and some hard-to-learn letter sequences. That such differences in meaningfulness are correlated with chunkability is well known (Underwood and Schulz, 1960). Also, Peterson, Peterson, and Miller (1961) have shown, although on a limited scale, that the meaningfulness of CVC trigrams is positively correlated with recall after 6 sec. in the Petersons’ situation. But perhaps the greatest gain from the use of the chunk as the unit of measurement in formulating the otherwise empirical generalization is the suggestion it yields about how we may get a handle on that intervening variable.
It suggests to me that we may be able to establish empirical benchmarks for 1, 2, 3, . . . ,n chunks in terms of the slopes of short-term memory functions and then use these slopes to calibrate our verbal learning materials in terms of a chunk scale.
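The proposed calibration can be illustrated with a minimal slope estimate. In the sketch below, the retention proportions are invented purely for illustration, and the assumption that the forgetting curve is roughly exponential (so that log retention is linear in time) is mine, not a claim from the text:

```python
# Hypothetical sketch of the calibration idea: estimate the slope
# of a short-term forgetting curve (assumed roughly exponential),
# then compare slopes across materials to place them on a chunk scale.

import math

def decay_slope(times, proportions):
    """Least-squares slope of log(retention) against time in sec.
    A steeper (more negative) slope would indicate more chunks,
    on the hypothesis advanced in the text."""
    logs = [math.log(p) for p in proportions]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    num = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den

# Invented illustrative data: a flat curve for a 1-chunk unit,
# a steep curve for a (nominally) 3-chunk unit.
times = [3, 9, 18]
one_chunk = [0.95, 0.90, 0.85]
three_chunk = [0.70, 0.35, 0.15]

print(decay_slope(times, one_chunk) > decay_slope(times, three_chunk))  # True
```

Benchmark slopes for known 1-, 2-, . . . , n-chunk units, obtained this way, could then be used to read off the effective chunk count of an arbitrary verbal unit from its observed forgetting slope.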


The evidence that the slope of the short-term forgetting curve increases dramatically as a function of the number of encoded chunks in the unit is evidence against autonomous decay being a major factor, but it does not deny that such decay may occur. It is evidence against decay as a major factor because: (a) a single consonant was remembered with very high frequency over a 32-sec. interval filled with numerical operations that surely qualify as overloading and disrupting activities (if one grants that the Petersons’ method adequately controls surreptitious rehearsal); and (b) the major portion of the variance in recall is accounted for by intra-unit interference, rather than time. It does not deny that decay may occur, since there was some forgetting of even the single consonant (and of the single word in Murdock’s experiment) even though only one “chunk” was involved, and intra-unit interference was at a minimum.


The reason for the forgetting of the single chunk is, I believe, to be found in the other sources of interference in recall in this type of experiment. In the first place, I presume that no one will argue that counting backward aloud is the mental vacuum that interference theory needs to insure the absence of retroactive inhibition in the recall of the to-be-remembered unit, nor is it necessarily the least interfering, and at the same time rehearsal-preventing, activity that can be found for such experiments. However, we must leave this point for future research, because we have none of the systematic studies that must be done on the effects of different methods of filling these short retention intervals, and we also have no evidence, therefore, on the extent to which retroactive interference and intra-unit interference interact.


On the other source of interference which may explain the forgetting of the single chunk—namely, proactive interference (PI)—we do have some evidence. Peterson (1963) has maintained, on the basis of analysis of blocks of trials in the original Peterson and Peterson (1959) study, that there is no evidence for the build-up of proactive inhibition in that experiment, only practice effects. However, this evidence is unconvincing (Melton, 1963) when practice effects are strong, and if it is assumed that proactive inhibition from previous items in the series of tests may build up rapidly but asymptote after only a few such previous items. Such an assumption about a rapidly achieved high steady-state of PI is given some credence by the rapid development of a steady-state in frequency of false-positives in studies of short-term recognition memory (Shepard and Teghtsoonian, 1961).


A second, and powerful, argument for large amounts of PI throughout the Peterson type of experiment is the frequency of overt intrusions from previous units in the series during the attempt to recall an individual unit. Murdock (1961) found such intrusions in his studies of short-term retention of words, and there was the strong recency effect among these intrusions that is to be expected if the steady-state notion is valid. The analysis of such intrusions in studies involving letters rather than words is limited by the identifiability of the source of the intrusions, but all who run experiments with letters become convinced that such intrusions are very common and usually come from the immediately preceding units.4



More systematic evidence for strong PI effects in STM in the Petersons’ situation is given by Keppel and Underwood (1962). A representative finding is shown in Fig. 5. A three-consonant item which is the first item in the series is recalled almost perfectly after as long as 18 sec., and PI builds up rapidly over items, especially for the longer retention interval. These data support the notion that there is substantial PI in the Peterson and Peterson experiment on short-term memory for single verbal units. As such, they, as well as the other evidence cited, indicate that the small amount of forgetting of single consonants or single words over short intervals of time may be partly, if not entirely, attributable to the PI resulting from sequential testing of recall of such items. Keppel and Underwood’s results do not, however, support the view that the PI reaches a steady state in as few as five items, but this does not necessarily deny the steady-state notion. Also, a careful study of these data and the data on intra-unit interference suggests some strong interactions between PI, intra-unit interference (II), and the retention interval, all of which would support the interference interpretation, but discussion of these interactions would be tedious and unrewarding until properly designed experiments have been performed.


My conclusion from all this is that there is sufficient direct or inferential evidence for PI, RI, and II in the short-term retention of single sub-span verbal units, and that the PI and potential RI may account for the observed forgetting of one-chunk units, that is, when II is minimal. So much for interference.


The other line of investigation that needs to be considered before the question of continuum versus dichotomy can be properly assessed has to do with the effect of repetition on the short-term memory for sub-span and just supra-span strings of elements or chunks.


The concept of the memory span is rather important in this discussion because it is the boundary between the number of elements, or chunks, that can be correctly reproduced immediately after a single repetition and the number of elements, or chunks, that require two or more repetitions for immediate correct reproduction. Interestingly enough, the short-term forgetting curve for a unit of memory-span length turns out to be the limiting member of the hypothetical family of curves that has been used to generalize the relationship between the slope of the forgetting curve and the number of chunks in the to-be-remembered unit. The extrapolated forgetting curve for a unit of memory-span length is shown as the dotted-line curve of Fig. 4.


The origin of this limiting curve on the ordinate will, of course, depend on the statistical definition of the span of immediate memory, but in order to be consistent I have placed it in Fig. 4 at or near 100 percent recall after zero interval. It is also assumed that the presentation time for this and all other smaller numbers of chunks is just sufficient for one perceptual encoding of each element, i.e., for one repetition. For a unit of span length it is not surprising that a precipitous decline of completely correct recall to zero is expected when only very short, but filled, delays are introduced before recall begins. No experiment in the literature fits exactly these operational requirements, but the prediction is a matter of common experience in looking up telephone numbers, and we also have Conrad’s (1958) evidence that Ss show a radical reduction in correct dialing of 8-digit numbers when required merely to dial “zero” before dialing the number.


At this point we are brought face to face with the question of the effects of repetition of sub-span and supra-span units on their recall. Such data are important for at least two reasons. In the first place, the argument for a continuum of STM and LTM requires that there be only orderly quantitative differences in the effects of repetition on sub-span and supra-span units. In the second place, if repetition has an effect on the frequency of correct recall of sub-span units, such as consonant trigrams, this must certainly have some significance for the conceptualization of the strength of a memory trace—whether it is all-or-none or cumulative.


The effect of time for rehearsal of a set of items before a filled retention interval was first studied by Brown (1958). His negative results led him to the conclusion that recirculation of information through the temporary memory store merely delays the onset of decay, but does not strengthen the trace. However, the original Peterson and Peterson (1959) report on the retention of consonant trigrams included an experiment which showed a significant effect of instructed rehearsal on short-term retention.


Fortunately, we now have available a report by Hellyer (1962) in which consonant trigrams were given one, two, four, or eight 1-sec. visual presentations before retention intervals of 3, 9, 18, and 27 sec. His data are shown in Fig. 6 and require little comment. Obviously, a consonant trigram is remembered better with repetition even though it is completely and correctly perceived and encoded after only one repetition, as judged by the immediate recall of it. The slopes of the retention curves in our hypothetical family of curves based on the number of chunks in the to-be-remembered unit are, therefore, a joint function of chunks and repetitions. Or perhaps a better theoretical statement of this would be to say that repetition reduces the number of chunks in the to-be-remembered unit. This is why one word and one consonant have the same rate of forgetting.



As for the effect of repetition on just supra-span units, we have no data directly comparable to those of Hellyer for sub-span units, but we have data from a much more severe test of the repetition effect. I refer to the method and data of Hebb’s (1961) study in which he disproved to his own satisfaction his own assumption about “activity” traces. In this experiment he presented a fixed set of 24 series of 9-digit numbers. Each of the digits from 1 to 9 was used only once within each to-be-remembered unit. The series was read aloud to S at the rate of about 1 digit/sec., and S was instructed to repeat the digits immediately in exactly the same order. The unusual feature of the experiment was that exactly the same series of digits occurred on every third trial, i.e., the 3rd, 6th, 9th, . . . , 24th trials, the other series varying in a random fashion.


His results are shown in Fig. 7. Hebb considered the rising curve for the repeated 9-digit numbers, when contrasted with the flat curve for the nonrepeated numbers, to be sufficient basis for concluding that some form of structural trace results from a single repetition of an associative sequence of events. Further, he properly considers this to be a demonstration of the cumulative structural effects of repetition under extremely adverse conditions involving large amounts of RI.



Hebb’s method in this experiment may well be another important invention in the analysis of human memory. But I was not completely satisfied with his experiment and the reliability of his findings, for reasons that need not be detailed here. As a consequence of these uncertainties, I have repeated and extended Hebb’s experiment by giving each of 32 women Ss two practice numbers and then 80 tests for immediate recall of 9-digit numbers. Within these 80 tests there were 4 instances in which a specific 9-digit number occurred 4 times with 2 other numbers intervening between successive trials, 4 in which a specific number occurred 4 times with 3 intervening numbers, 4 for 4 trials with 5 intervening numbers and 4 for 4 trials with 8 intervening numbers. In addition, there were 16 9-digit numbers that occurred only once. I will not try to describe the interlocking pattern of events that was used to achieve this design, but the design chosen was used in both a forward and backward order for different Ss, and the specific repeated numbers were used equally often under the different spacings of repetitions. Furthermore, within the entire set of 32 different 9-digit numbers used in this experiment, interseries similarities were minimized by insuring that no more than two digits ever occurred twice in the same order. The numbers were presented visually for 3.7 sec. and S recorded her response by writing on a 3 x 5 in. card which contained 9 blocks. Recall began .7 sec. after the stimulus slide disappeared, and 8.8 sec. were allowed for recall.


Unfortunately, my Ss behaved in a somewhat more typical fashion than did Hebb’s in that they showed substantial nonspecific practice effects. This complicates the determination of the effects of specific repetition, because later trials on a particular 9-digit number must always be later in practice than earlier trials, and also because this confounding of specific and nonspecific practice effects is more serious the greater the interval between repetitions of a specific number. This confounding has been eliminated, at least to my satisfaction, by determining the function that seemed to be the most appropriate fit to the practice curve based on first occurrences of specific numbers. This function was then used to correct obtained scores on the 2nd, 3rd, and 4th repetitions of a specific number in a manner and amount appropriate to the expected nonspecific practice effect.
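The correction procedure described here can be illustrated with a small sketch. Everything below is hypothetical: the paper does not give the fitted equation, so a simple least-squares line over first-occurrence scores stands in for whatever function was actually chosen, and the function names are invented for illustration.

```python
# Hypothetical sketch of the correction for nonspecific practice effects.
# The paper fits "the most appropriate" function to the practice curve of
# first occurrences; a linear trend is assumed here purely for illustration.

def fit_line(positions, scores):
    """Least-squares slope and intercept of score vs. ordinal position."""
    n = len(positions)
    mx = sum(positions) / n
    my = sum(scores) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(positions, scores))
             / sum((x - mx) ** 2 for x in positions))
    return slope, my - slope * mx

def corrected_score(observed, trial_position, first_position, slope):
    """Remove the practice gain expected between the first occurrence of a
    number and the position of a later repetition of it."""
    return observed - slope * (trial_position - first_position)
```

A score on, say, the third repetition of a number is thus reduced by the gain the general practice curve predicts for the intervening trials, leaving a residual attributable to specific repetition.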

A preferred measure of the effect of repetition in this situation is the mean number of digits correctly recalled in their proper positions. In Fig. 8 is shown the mean number of digits correctly recalled, as a function of ordinal position of the first occurrence of a 9-digit number within the experimental session. This merely confirms my statement about practice effects; exhibits the equation used for corrections for general practice effects; and permits observation of the large variability of mean performance in this type of experiment.




The principal data from the experiment are shown in Fig. 9. The effect of repetition of a specific 9-digit number is plotted, the parameter being the number of other different 9-digit numbers that intervened between successive repetitions of the specific number. In these curves the points for first-repetition performance are obtained points, and those for performance on the 2nd, 3rd, and 4th repetitions have been corrected for nonspecific practice effects. In Fig. 10 these last data are expressed as gains in performance over performance on the first occurrence of a number. Comparable data for gains in the frequency with which entire 9-digit numbers were correctly recalled show the same relationships.


These data not only confirm the Hebb data, they also add material substance to an argument for a continuum of immediate, short-term, and long-term memory. Just as a continuum theory would have predicted Hebb’s results with two intervening numbers between repetitions of a specific number, it also would predict that the repetition effect would be a decreasing function of the number of intervening numbers because between-repetition retroactive inhibition is being increased. Even so, I am not sure that any theory would have predicted that one would need to place as many as 8 other 9-digit numbers in between repetitions of a specific 9-digit number before the repetition effect would be washed out. Surely, the structural memory trace established by a single occurrence of an event must be extraordinarily persistent.


With respect to our hypothetical family of retention curves based on the number of chunks in the to-be-remembered unit, we can now with some confidence say that events which contain chunks beyond the normal memory span can be brought to the criterion of perfect immediate recall by reducing the number of chunks through repetition. If this empirical model involving chunks and repetitions to predict short-term forgetting is valid, it should be possible to show that a supra-span 9-chunk unit that is reduced to 7 chunks through repetition would have the short-term forgetting curve of a 7-chunk unit, and one reduced through repetition to a 3-chunk unit should have a 3-chunk short-term forgetting curve. Even though this prediction is probably much too simple-minded, it now requires no stretch of my imagination to conceive of the “immediate” or short-term memory for single units and the memory for memorized supra-span units, like 12 serial nonsense syllables or 8 paired associates, as belonging on a continuum.




We may now turn to the implications these data on short-term memory seem to me to have for a theory of memory. I will attempt no finely spun theory, because such is neither my talent nor my interest. Also, I can be brief because, aged Functionalist that I am, I would be the first to admit—even insist—that my inferences are stated with confidence only for the storage and retrieval of verbal material demonstrably encoded by adult human Ss.


The duplexity theory of memory storage must, it seems to me, yield to the evidence favoring a continuum of STM and LTM or else come up with an adequate accounting for the evidence presented here. My preference is for a theoretical strategy that accepts STM and LTM as mediated by a single type of storage mechanism. In such a continuum, frequency of repetition appears to be the important independent variable, “chunking” seems to be the important intervening variable, and the slope of the retention curve is the important dependent variable. I am persuaded of this by the orderly way in which repetition operates on both sub-span units and supra-span units to increase the probability of retrieval in recall, and also by the parallelism between STM and LTM that is revealed as we look at STM with the conceptual tools of the interference theory of forgetting which was developed from data on LTM.


The evidence that implies a continuum of STM and LTM also relates, of course, to some of the other issues about the characteristics of memory storage. While it is perhaps too early to say that the autonomous decay of traces has no part in forgetting, whether short-term or long-term, I see no basis for assuming that such decay has the extreme rapidity sometimes ascribed to it or for assuming that it accounts for a very significant portion of the forgetting that we all suffer continually and in large amounts. On the contrary, the data from both STM and LTM tempt one to the radical hypothesis that every perception, however fleeting and embedded in a stream of perceptions, leaves its permanent “structural” trace in the CNS.

Insofar as I can understand the implications of the consolidation hypothesis about memory storage, I must concur with Hebb’s (1961) conclusion that his experiment demonstrates the fixation of a structural trace by a single repetition of an event and without the benefit of autonomous consolidation processes. In fact, I think that our repetition and extension of his experiment establishes that conclusion even more firmly, because it shows that the retrievability of the trace of the first experience of a specific 9-digit number is a decreasing function of the amount of reuse of the elements in the interval between repetitions. Therefore, as far as our present data go, it seems proper to conclude that a consolidation process extending over more than a few seconds is not a necessary condition for the fixation of a structural trace. This does not, of course, deny that consolidation may be a necessary condition in other types of learning or other types of organism, nor does it deny that types of experience (e.g., Kleinsmith and Kaplan, 1963; Walker, 1963) other than the mundane remembering of nonsense strings of letters or words may benefit from such autonomous consolidation processes if they are permitted to occur.

The issue as to whether memory traces are established in an incremental or all-or-none fashion can be refined, but not resolved, on the basis of our observations on short-term memory. In all of the experiments with the Petersons’ method, the initial operation was to insure that S encoded, i.e., learned, the to-be-remembered unit in a single 1-sec. presentation of it before the retention interval was introduced. This is “one-trial” learning in a more exact sense than has been true of various attempts to demonstrate the all-or-none principle in associative learning (Postman, 1963). Yet forgetting was rapid and strongly a function of the amount of potential intra-unit interference in the to-be-remembered unit. Also, this unit that was perfectly remembered after one repetition was better remembered after multiple massed repetitions. The proper question in the case of verbal associative learning seems, therefore, to be the characteristics of the trace storage that reflect the effects of repetitions on performance, rather than the question whether such associative connections reach full effective strength in one trial. The question of whether repetitions multiply the number of traces leading to a particular response or produce incremental changes in specific traces seems to me to be subject to direct experimental attack. Perhaps again because of my Functionalist background, I am inclined to believe that future research will show that both the multiplexing of traces and the incremental strengthening of traces result from repetition. Which mode of storage carries the greater burden in facilitating retrieval will depend on the variability of stimulation from repetition to repetition and the appropriateness of the sampling of this prior stimulation at the time of attempted retrieval.


Finally, with respect to the retrieval process, the theory of which is dominated by transfer theory for LTM, it seems that the placing of STM and LTM on a continuum—and the reasons for doing so—forces the interference theory of forgetting to include the prediction of forgetting in STM within its domain. At least, the testing of the theory in that context will extend its importance as a general theory of forgetting, if it survives the tests, and will quickly reveal the discontinuity of STM and LTM, if such is in fact the case.


Whatever may be the outcome of these theoretical and experimental issues in the next few years, of one thing we can be certain at this time. The revival of interest in short-term memory and the new techniques that have been devised for the analysis of short-term memory will enrich and extend our understanding of human memory far beyond what could have been accomplished by the most assiduous exploitation of the techniques of rote memorization of lists of verbal units. In fact, our evidence on STM for near-span and supra-span verbal units suggests that the systematic exploration of the retention of varying sizes of units over short and long time intervals will give new meaning to research employing lists.




This paper comprises, in substance, the author’s Vice-Presidential Address to Section I (Psychology) of the American Association for the Advancement of Science, 1962. The author is particularly indebted to the Center for Human Learning, University of California, Berkeley, where a research appointment during the Fall semester of 1962-1963 gave the freedom from academic routine and the stimulating discussions that led to the repetition of the Hebb experiment and also supported the preparation of this paper. Early exploratory studies on short-term memory and the experiment on the recall of different sized verbal units were supported by Project MICHIGAN under Department of the Army Contract DA-36-039-SC-78801, administered by the United States Army Signal Corps. Reproduction for any purpose of the United States Government is permitted.
For the purposes of this discussion, I am ignoring the hypothetical property of autonomous, dynamic changes within memory traces in the directions specified by gestalt laws (Koffka, 1935). While the need for such an hypothetical property is not yet a dead issue (Duncan, 1960; Lovibond, 1958), it has had very little support since the classical treatment of the matter by Hebb and Foord (1945).
This study and a subsequent one are graduate research projects of David Wulff and Robert G. Crowder, University of Michigan, and will be reported under the title: Melton, A. W., Crowder, R. G., and Wulff, D., Short-term memory for individual items with varying numbers of elements.
Apparent intrusions from preceding to-be-remembered units were very common in the 1- to 5-consonant experiment reported here, but the experimental design did not counterbalance first-order sequence effects over conditions and nothing meaningful can be said about such intrusions except that they occur with substantial frequency.



Unbiased and Unnoticed Verbal Conditioning: The Double Agent Robot Procedure1


HOWARD M. ROSENFELD and DONALD M. BAER, University of Kansas



Subjects who were told they were “experimenters” attempted to reinforce fluent speech in a supposed subject with whom they spoke via intercom. The supposed subject was to say nouns, one at a time, on request by the “experimenter,” who reinforced fluent pronunciation with points. Actually, the “experimenter” was talking to a multi-track tape recording, one track of which contained fluently spoken nouns, the other track containing disfluently spoken nouns. If the “experimenter’s” request for the next noun was in a specified form, a word from the fluent track was played to him as reinforcement; requests in any other form produced the word from the disfluent track. Repeated conditioning of specific forms of requests was accomplished with two subject-“experimenters,” who were unable to describe changes in their own behavior, or the contingencies applied. This technique improved upon an earlier method that had yielded similar results, but was less thoroughly controlled against possible human bias.


The enduring interest in the conditioning of verbal behavior (Holz and Azrin, 1966) probably is attributable not only to the obvious importance of language in human behavior, but also to the special status accorded language in some non-behavioral or semi-behavioral theories. In this context, a particular body of research (Spielberger, 1965; Spielberger and DeNike, 1966) appears to have demonstrated that when verbal conditioning has proven possible in subjects, it has been accompanied by “awareness” in those subjects: it has occurred only in groups of subjects who could either state or recognize the contingencies of reinforcement applied to them. These results have been interpreted to indicate that changes in verbal responses were not attributable directly to the reinforcing function of the experimenter’s contingent verbal approval; rather they were mediated by the discriminative function of private recognitions of the reinforcement contingencies. This inference of a controlling “awareness,” derived from probes of introspection, is of course questionable. However, the inference becomes unnecessary if a verbal conditioning situation can be devised in which such probes fail to show any “awareness” to be explained. To facilitate such outcomes the present report describes an improvement on a technique designed by Rosenfeld and Baer (1969) for conditioning verbal behavior without “awareness.”


The original technique used by Rosenfeld and Baer required that the subject of the study be recruited for the nominal role of experimenter in a study of social reinforcement. This subject was told that he would interview another person, and in the course of that interview, would socially reinforce some selected response shown by the interviewee. The interviewee was in fact a confederate of the authors—a “double agent”—and served as the true experimenter of the study. The interviewee deliberately displayed a simple hand gesture (rubbing his chin) in a random way. The interviewer attempted to reinforce that gesture by nodding vigorously in consequence. The interviewer was also told that to keep the interviewee “involved” (and hence “conditionable”), it was necessary to prompt him verbally to give fuller answers to the interview question being asked. One interviewer used prompts such as “Yeah” and “Mm-hmm” for this purpose. The interviewee deliberately gave short answers, thus evoking a steady rate of prompting by the interviewer. The interviewee then selectively reinforced one of the prompts (“Yeah”) by displaying his gesture (chin rubbing) whenever that prompt occurred. Thus, while the interviewer prompted the interviewee to answer questions fully, and also attempted to reinforce the interviewee’s chin-rubs (by nodding at them), the interviewee in fact emitted those chin-rubs as reinforcement for a selected kind of verbal prompt by the interviewer. Conditioning not describable by the interviewer resulted.


A drawback of such interpersonal paradigms is that the experimenter himself is also reinforceable (by success), and thus may purposefully or inadvertently elicit critical responses from the subject by behaviors other than those formally designated as reinforcers. For example, in the original study, “Yeah” or “Mm-hmm” might have been differentially elicited by choice of words, inflection, or facial expression, in addition to being reinforced by experimentally controlled chin rubs. Thus, the human double agent was replaced with a semi-automated mechanism not susceptible to having its own behavior changed by unrecognized contingencies. In addition, such a mechanism requires virtually no training of special personnel and is typically more reliable than the human experimenter.






The subjects of this report (referred to below as “experimenters”) were two undergraduate college girls, selected from 12 students initially contacted. They were asked to participate in a study of what makes people successful in influencing other people. They were offered a minimum payment of $1.00 each to participate, and the possibility of two additional dollars: one if they could influence another person, and one if they could explain to the authors exactly what accounted for their success.


Setting and Instructions


The subject was told by an assistant that she would be the “experimenter” in a verbal conditioning study. She would operate alone in a laboratory room, to guard against accidentally giving “cues” to the supposed subject. Seated at a desk, she was shown her intercom with a manual press-to-talk switch, a pair of lever switches, a pair of counters, and a small light. A tape recorder on her desk played initial instructions, summarized in the following comments. The “experimenter” was told that the intercom connected her to her “subject” in another, nearby room, that if she pressed the intercom switch she could speak to the “subject” who would speak back to her through the intercom. She was also told that the lever switches produced points on an add-subtract counter located in front of the “subject,” one switch adding points, the other subtracting them. Her own counters would record the numbers of correct and incorrect responses that later would be produced by the “subject” (operated by the assistant who would monitor the intercom). The light would signal timeout periods, during which she would rest, make notes, and sometimes receive further instructions.


The “experimenter” was told that she was to attempt to condition the “subject,” specifically, some aspect of the “subject’s” speech. It was explained that the “subject” had already been told that her task was to emit nouns, when asked, one at a time. The task for the “experimenter,” then, was three-fold:


1. Use the intercom to tell the subject when to emit the next word.
2. Use the lever switches to add or subtract points on the counter before the “subject” at any time, to influence her noun-emitting behavior in some specific way.
3. Write down at any time whatever she thought might be responsible for any changes in the “subject’s” noun-emitting behavior. (Paper and pencil were supplied.)


The researchers and their apparatus were located in an adjoining room that allowed observation of the “experimenter” through a one-way window. The essential item of apparatus was a multi-channel tape recorder, programmed to play very brief segments of tape at any moment to the “experimenter” through her intercom. On one channel of tape a series of nouns had been recorded, at 3-sec intervals, each fluently enunciated. On a parallel channel, in the corresponding positions of the tape, the same nouns had been recorded, but enunciated in a disfluent manner, typically in the form “Uhh (noun).” Both tracks had been recorded by a professional actress, who read a list of 1000 nouns from a previously free-associated list, simulating the performance of an actual subject. A research assistant, listening to the “experimenter” request the next word from the subject over the intercom, could then play to her the next noun from either channel. (The relay-operated recorder stopped after any word had been played, thus remaining in position to play the next word at any time.)


The Practice Session


The first visit was described to the “experimenter” as a practice session, during which she would become familiar with the situation and the execution of her assignments. More importantly, it allowed an assessment of her typical use of various requests for nouns from the subject, so that one could be chosen for future verbal conditioning. The form of request chosen will be referred to as the “critical request.”


After the instructions were completed, the research assistant explained to the “experimenter” that in a moment the assistant would act out the role of the subject, so the “experimenter” could practice.


The assistant then retired to the next room. From there she played over the intercom a tape recording of her own voice (not the actress’) which contained a liberal sprinkling of animate and inanimate, singular and plural, and fluent and disfluent nouns. One word at a time was played, following each request by the “experimenter” for the next word. The kind and sequence of these requests were recorded for 30 min. The “experimenter” was then given an appointment to return, to attempt conditioning a “real subject.”


Meanwhile, the kinds and relative frequency of her requests for each next word were analyzed. (Typical requests were of the sort “Next,” “Next word,” “Go ahead, please,” “O.K.,” or “Now.”) One of these requests (e.g., “Next word”) was tentatively chosen as the critical request for future verbal conditioning. Criteria for such choice included a moderate frequency of use (not too close to 0 or 100 percent of all requests used), and some evidence of stability over the 30-min session (judged informally).


It should be noted that of 12 potential “experimenters”, some produced requests during their practice session that were either too unstable over time, or entirely too stable, to make them acceptable candidates for future conditioning. In these cases, the research assistant sometimes instructed the “experimenter” to be “more interesting.” These instructions generally eliminated such problems only briefly. The two “experimenters” described here showed satisfactory baselines of requests during the practice session.


The Experimental Session


Baseline period. When the “experimenter” returned for the second session, she was told that a “real subject” was in the next room, and that it was necessary to gather a baseline of that “subject’s” noun-emitting behavior, so as to choose some aspect of it to condition. She then was left alone to interact via intercom with the multi-channel tape recording of the actress’ voice. A segment of tape was played containing a proportion of disfluencies equal to the “experimenter’s” baseline proportion of critical requests during the practice session, and her rate of the critical request was checked for its current stability. The criterion for stability was that the rate of this request could vary no more than three responses out of 25, for at least two consecutive blocks of 25 requests each (a “nonsignificant” variation if the sequential requests met the assumptions of the binomial distribution). The rate during this baseline did not have to match the rate of the previous day’s baseline session; however, it had to comprise reliably between 20 and 80 percent of the responses per block. If less than 50 percent, the critical request was selected for reinforcement (by fluencies); if over 50 percent, it was to be followed by disfluent responses. If the critical request did not meet this criterion of stability during the first four 25-response blocks of this session, it was abandoned as a candidate, and other forms of request were examined for stability. If no such request could be found by the sixth block, the subject was considered unsuitable, debriefed, paid, and dismissed. Debriefing was delayed if the subject was recruited from a group in which other members had not yet participated.
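The stability criterion just described reduces to a simple check. The sketch below assumes blocks of 25 requests, a tolerance of three responses between consecutive blocks, and the 20-80 percent range (5 to 20 of 25); the function name is invented for illustration.

```python
# Minimal sketch of the baseline-stability criterion (assumed details:
# 25-request blocks, variation of at most 3 between consecutive blocks,
# rate reliably between 20 and 80 percent, i.e., 5-20 critical requests).

def stable_baseline(block_counts, tol=3, low=5, high=20):
    """True if some pair of consecutive blocks varies by no more than tol
    critical requests, with both counts inside the acceptable range."""
    for a, b in zip(block_counts, block_counts[1:]):
        if abs(a - b) <= tol and low <= a <= high and low <= b <= high:
            return True
    return False
```

So a run of blocks with 12 and then 13 critical requests would pass, while counts of 4 and 5 would fail on the range requirement even though they are stable.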


Conditioning period. Once a stable critical request had been chosen, the “experimenter’s” timeout light was illuminated, and the assistant returned to tell her that the “subject” had a characteristic rate of disfluency that should be a good target for influence through point addition or subtraction. (In case the “experimenter” had not noticed the disfluencies, they were imitated for her.) She was told to use points in any way that would decrease the rate of these disfluencies, and to keep notes about her techniques and their relative success. These notes were to be made whenever the timeout light was illuminated and at any other times that she wished. Further, she was told that the counters before her now would record the cumulative numbers of fluencies and disfluencies emitted by the subject, so that she could see how well her techniques had been working. She was to reset the counters during each timeout, so that they would always show current success or failure.

After these instructions the assistant left and the “experimenter” resumed her interaction with the tape, via intercom. From this point throughout the Conditioning Period, experimental contingencies operated as follows:


1. Each time the “experimenter” used the critical request, the next word played was from the fluent track of the tape; each time she used any other request, the next word played was from the disfluent track of the tape;
2. No more than five consecutive fluent or five consecutive disfluent nouns were played, even though the occasion called for another according to the first rule. (This was to reduce the probability that the “experimenter” would notice the contingency.)
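The two rules amount to a small piece of selection logic. A sketch follows, with invented names; "fluent" and "disfluent" stand for the two tape tracks.

```python
# Sketch of the Conditioning Period playback contingency: critical requests
# earn the fluent track, others the disfluent track, but no track is ever
# played more than five times in a row. Names are illustrative only.

def choose_track(request_is_critical, history, max_run=5):
    """history: list of tracks already played, in order."""
    preferred = "fluent" if request_is_critical else "disfluent"
    recent = history[-max_run:]
    if len(recent) == max_run and all(t == preferred for t in recent):
        # Run limit reached: break the streak with the other track.
        return "disfluent" if preferred == "fluent" else "fluent"
    return preferred
```

The run limit means the "experimenter" occasionally hears a disfluent noun after a critical request (and vice versa), masking an otherwise perfect correlation.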


Conditioning by these contingencies continued for at least three blocks of 25 requests each, and until a criterion of conditioning had been met. The criterion required that at least two consecutive 25-request blocks each contain at least enough critical requests to exceed the baseline rates of these requests at the 0.05 level of confidence as specified by tables of the binomial distribution.2 If conditioning to this criterion was not evident by the end of eight 25-request blocks, the subject was considered a failure, debriefed, paid, and dismissed.
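The 0.05 criterion from the binomial tables can be reproduced directly. The sketch below computes the one-sided upper tail and finds the smallest per-block count of critical requests that exceeds a given baseline rate at that level; the function names are invented.

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more critical
    requests in a block of n if the baseline rate p still held."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def criterion_count(baseline_rate, n=25, alpha=0.05):
    """Smallest count in an n-request block that exceeds the baseline rate
    at the alpha level, one-sided."""
    for k in range(n + 1):
        if binom_tail(k, n, baseline_rate) <= alpha:
            return k
    return None
```

For example, against a baseline of 10 critical requests per 25 (rate 0.4), a block needs 15 or more critical requests, since P(X >= 15) is about 0.034 while P(X >= 14) is about 0.078.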


During the Conditioning Period, a timeout was held typically after every third 25-request block to allow the “experimenter” to survey counters, write notes on the effectiveness of her techniques, and re-set the counters.

First reversal period. When the criterion of the Conditioning Period had been met, the contingencies of that period were reversed. Now, in general, it would be true that:


1. Each time the “experimenter” used the critical request, the next word played was from the disfluent track of the tape; each time any other request was used, the next word played was from the fluent track of the tape; except that:
2. Fluent nouns were played for only half of the non-critical requests of the first 25-request block of this Reversal Period. (This was to reduce the probability that the “experimenter” would notice an otherwise blatant reversal of the just-prior contingencies.)
3. Subsequent to the first block of 25 requests, no more than five consecutive fluent or five consecutive disfluent nouns were played, even though the occasion called for another according to the first rule.


Otherwise, experimental conditions during the First Reversal Period were similar to those of the Conditioning Period. The criterion of a successful reversal was similar to that of a successful conditioning, except that now performance was compared to that of the last block of the preceding Conditioning Period.


Second reversal period. Given a successful reversal according to the above criteria, the critical request was again subjected to the same contingencies used during the Conditioning Period, plus the qualification that only half of the critical requests of the first 25-request block during this period would be followed by fluent nouns. The same type of criterion for successful reversal was applied as had been used for the First Reversal Period.


Both subjects reported here finished within a single experimental session lasting 90 min. The session took place the day after the practice session.

Interview. At the conclusion of the Second Reversal Period (or on the occasion of earlier dismissal of subjects), an interview was conducted by the assistant to see if the “experimenter” could state the contingencies applied to her or describe the changes that had taken place in her verbal behavior. The interview procedure was adapted from standard procedures employed by Levin (1961) and Spielberger (1962). It began with fairly distant questions asking about what had happened, what techniques were used, and how well they worked, and progressed to increasingly detailed questions about all the contingencies holding between the “experimenter” and her subject.



The requests made by the “experimenter” were tape-recorded and also recorded verbatim in handwriting by a second assistant in the adjoining room. (Handwritten records allowed the immediate calculation of the rates of the “experimenter’s” critical requests, necessary to determine when the criteria of Baseline, Conditioning, First Reversal, and Second Reversal had been met.) Reliability of the handwritten records was established as 96 percent, by comparing them to the tape recordings of requests.
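The reliability figure is simple percent agreement between the two transcripts. A minimal sketch (function name ours) of that computation:

```python
def percent_agreement(record_a, record_b):
    """Reliability as percent agreement between two request
    transcripts (here: handwritten record vs. tape recording)."""
    matches = sum(a == b for a, b in zip(record_a, record_b))
    return 100 * matches / len(record_a)

# Two transcripts of 25 requests disagreeing on one entry yield
# 24/25 = 96.0 percent agreement.
```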




Six of 12 potential subjects examined met the criteria cited for stability of individual baseline within six blocks of 25 responses, during their practice sessions. Of these six subjects, two met all further criteria of successful conditioning, suppression, and reinstatement of the critical response. Inasmuch as variations in several experimental parameters between the subjects could have accounted for the differential successes, the question of generality or of specific conditions for unaware verbal conditioning cannot be answered in this study. The following accounts of the two successful cases are offered as evidence of the possibility of the effect. Of the remaining cases, one conditioned and was aware of the contingencies; two others unknowingly conditioned but failed to reverse; and one failed to condition at all.


Figure 1 displays the rate of critical request for the successful subjects. Subject A displayed several requests in apparently random fashion. The most stable of these was the phrase, “Next word”. In the Experimental Session Baseline Period her rate of “Next word” varied from 44 to 48 percent per block and was accordingly chosen as the critical request. The criterion for reliable conditioning was set at 60 percent for two consecutive blocks. This was achieved and surpassed during the sixth and seventh 25-request blocks of the Conditioning Period, as Fig. 1 shows.
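The "two consecutive blocks at criterion" rule can be expressed as a scan over per-block rates. This sketch is ours, under the assumption that rates are recorded as percentages per 25-request block:

```python
def blocks_to_criterion(block_rates, criterion=60):
    """Return the 1-indexed block at which the conditioning criterion
    (two consecutive blocks at or above `criterion` percent) is first
    met, or None if it never is."""
    for i in range(1, len(block_rates)):
        if block_rates[i - 1] >= criterion and block_rates[i] >= criterion:
            return i + 1
    return None

# A run whose sixth and seventh blocks both reach 60 percent meets
# the criterion at block 7.
```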


The criterion for reliable reversal was set at 80 percent for two consecutive blocks. This criterion excluded the first reversal block, when only half of the non-critical responses produced fluencies, according to the experimental convention designed to avoid awareness. The reversal criterion was met almost immediately, and rate of critical response in fact fell below the criterion, reaching 0 percent during the third 25-request block of the First Reversal Period. The Second Reversal Period followed a pattern similar to that of the First Reversal Period, but more quickly: rate of the critical request increased such that the third 25-request block contained 23 critical requests.



In contrast to the variability in baseline responses of Subject A, Subject B emitted only two responses throughout the study (“All right” and “O.K.”). In general, her results were similar to those for the first subject in that criteria for conditioning “O.K.” and two reversals were met. In this case, changes in rate following changes in contingency were more gradual. Inasmuch as her critical response never exceeded 75 percent of any block, the prescribed one-block 50 percent schedule was not employed.


Subject A wrote notes on her techniques four times after the conditioning phase began. The first of these stated that she supposed “the subject doesn’t seem to catch on at all.” By the second, Subject A had “produced” a high ratio of fluent responses and commented: “The new strategy seems to have worked much better. At first she seemed to think it was parts of the body but she still did not say ‘uh’—even after she went on to other words. She seems to have caught on consciously since she hasn’t made one mistake.” The next timeout came at the end of a successful reversal, and Subject A wrote: “At first did very badly like at beginning of exp. and then did O.K. again. Once she got going she never reverted back. Did not stick to any subject matter for a great length of time.” Virtually the same comments were written during the last opportunity, which followed the final reversal period.


In the terminal interview, carried out by a research assistant, Subject A offered several explanations, illustrated in the following transcriptions:


“How effective did you feel you were as an experimenter?”
“Hmm, well, I don’t think she ever caught on to what it was, so I consider that it was O.K. You know, because it’s kind of an unconscious thing.”
“What strategy were you using?”
(Repeated comments written in timeout periods).
“And you think it was the point-giving that influenced her?”
“I think so. It might have been the words she got on to; but yet it still changes. When she got on to the parts of the body like nose, throat, ear—maybe just because they’re sharp words, but when she used other words like ‘negotiation’ she didn’t go ‘uh, negotiation.’ So I think it must have been the points and not the subject matter.”
“I see. Did you think that anything else you might have done influenced her in any way?”
“Maybe I sound more pleased when she did well. I don’t know. I didn’t try to.”
“Maybe my inflection, uh, my own inflection.”
“Uh-huh, anything else?”
“I don’t know if my, you know, response would make a difference. Like if I would say ‘go on’ or ‘next.’”
“Did it seem to?”
“I don’t think so.”
“You don’t know that it did in any way?”


Subject B wrote only comments on the details of the various point-giving strategies she had employed. In her interview she produced no hints at all of any possible awareness that her verbal responses had any effect on her success.




The validity of inferring awareness from the elicitation of post-behavioral interpretations from subjects is a matter of epistemological preference. Yet on the basis of pervasive evidence of such “awareness”, attempts have been made to diminish the significance of verbal conditioning. Evidence that there are conditions under which probes do not produce awareness should discourage such generalizations. More important, the availability of “awareness”-avoiding procedures can further conditioning research in general by unconfounding instructional and reinforcement effects.3


The apparently successful production of the double-agent effect in the present study indicates that verbal conditioning without awareness (as defined here) is a real possibility. Only one of the subjects even noted the possibility that her verbal behavior might somehow have contributed to her success. Even in this case the possibility was not stated during the regular within-experiment probes, but occurred only in response to extreme prodding and suggestion after the experimental session.


A major advantage of the current automated procedure over previous methods is that the robot experimenter isolates the subject (the “experimenter”) from uncontrolled sources of stimulation that are possible in any human experimenter (Rosenthal, 1966). Despite its artificiality, it is apparently accepted by college students as real; all subjects seemed to believe they were dealing with a real person at the other end of the intercom. The use of a professional actress to record the tapes being played was probably an important part of the success of this illusion. She sometimes appeared momentarily at a loss for the next word, or amused by her choice, or curious (presumably about the listener’s reaction to the word), or even bored. The words she recorded were realistically balanced for variety and sequence. Thus, the “experimenter” might sometimes develop the hypothesis that the “subject” had fallen into a pattern (such as animal names), but soon would be forced to abandon that hypothesis.


The false leads implicit in the content of the taped noun sequences may indeed contribute to the overall effectiveness of the robot procedure, but they are presumably subordinate to the distraction from awareness provided by the subtle reversal of roles of subject and experimenter. While the effect of the double-agent procedure itself on awareness has not been directly demonstrated (by direct comparison with a control condition), the hypotheses produced by “experimenters” in this and the initial study suggest that they were attending to aspects of their relationship to the “subject” other than the “subject’s” attempt to manipulate their verbal behavior.


There probably are numerous other sources of distraction from awareness in interpersonal settings which could be submitted to experimental analysis. On the assumption that schedules of reinforcement are prominent among these possibilities, “experimenters” in the current study typically were allowed to receive only a limited number of consecutive reinforcers. Also, when very high rates of the critical request occurred during the conditioning phase, the contingencies of the subsequent reversal phase were faded in, rather than switched abruptly. While these procedures may have served to prevent awareness, they also may have contributed to a certain ineffectiveness in the reversal procedures, perhaps accounting for those “experimenters” who conditioned but failed to reverse. For example, by effectively putting a fully conditioned response on a fixed-ratio 6 schedule during the fading in of the reversal, that response may have been maintained even in the face of the disfluencies produced by five of every six emissions. Particularly if the disfluencies had no punishing function for the “experimenter,” fixed-ratio 6 could prove a reasonable maintenance schedule. Failure to reverse might be eliminated in future research by instructions to do better than one fluency in six; by starting reversals before too extreme a response shift has been produced; by a different convention concerning the number of consecutive reinforcements allowed; or by a different convention concerning the number of reinforceable responses allowed reinforcement during the first 25-request blocks of reversal periods. Thus, a better balance between procedures designed to modify verbal behavior and procedures designed to prevent awareness of these modifications is an important problem for future methodological research.




The research was conducted at the Bureau of Child Research Laboratories in Lawrence, Kansas, and supported by Program Project Grant HP 00870 from the National Institute of Child Health and Human Development. The authors appreciate the assistance of Pamela Gunnell and Charles Salzberg. Reprints may be obtained from the authors, Department of Psychology, University of Kansas, Lawrence, Kansas 66044.
2. In the absence of a rapid technique for testing for independence of sequential responses, the 0.05 level was used as a guide, not as an accurate estimate of the probability that such results could have occurred by chance.
3. When awareness of reinforcement contingencies has been induced by instructional sets, conditioning has been facilitated (DeNike and Spielberger, 1963).



Motivated Forgetting Mediated by Implicit Verbal Chaining:  A Laboratory Analog of Repression


SAM GLUCKSBERG and LLOYD J. KING, Princeton University


After learning an A—B paired-associates list, college students read a list of D words, several of which were consistently accompanied by unavoidable electric shock. The D words were members of implicit B—C, C—D chains, inferred from published word-association norms. In a subsequent recall test of the original A—B list, the B words that were implicitly associated with the shocked D words were forgotten significantly more often than control words.


Are memory items which are specifically associated with unpleasant events more readily forgotten than affectively neutral items? Despite the wealth of empirical and theoretical interest in this question, particularly with respect to the psychodynamic concept of repression, no simple and effective techniques for the study of motivated forgetting have been reported.1 Our purpose was to demonstrate that forgetting does occur as a function of unpleasant associations.


We adapted Russell and Storms’s2 four-stage mediation paradigm for this purpose. In their study, subjects first learned an A—B paired-associates list, where A is a nonsense syllable and B is an English word. Associations B—C and C—D were inferred from word association norms. For example, if the A—B pair were cef-stem, then the B—C association would be stem-flower, and the C—D association, flower-smell; A and D are thus associated by way of the B—C and C—D links (see Table 1). Russell and Storms found that learning an A—B pair facilitated the subsequent learning of a related A—D pair.
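The four-stage chain can be represented as two lookups: one explicit learned pairing and one table of normative associations. This sketch is ours, with illustrative entries drawn from the example above:

```python
# Implicit associative chains of the Russell-Storms paradigm.
# The A-B pair is learned explicitly; the B-C and C-D links come
# from word-association norms and are never presented.
learned_pairs = {"cef": "stem"}                 # A -> B (explicit)
norms = {"stem": "flower", "flower": "smell"}   # B -> C, C -> D (implicit)

def implicit_d(a_item):
    """Follow A -> B -> C -> D through the learned pair and the norms."""
    b = learned_pairs[a_item]   # explicit association
    c = norms[b]                # implicit mediating link
    return norms[c]             # implicit terminal associate
```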




If such implicit verbal chains do operate, then saying the B word implicitly elicits the C, and in turn, the D words. If the D word is associated with an unpleasant event, such as electric shock, then the likelihood of saying, or thinking of, the associated B word should be reduced, because the B response has an unpleasant consequence, namely, thinking of the D word, which presumably elicits fear. Thus, pairing specific D words with electric shock should cause differential forgetting of A—B pairs learned prior to the D word presentations. The B words associated with shock-paired D words should be forgotten more often than B words associated with neutral D words.

The stimulus materials we used are shown in Table 1. Note that, as in the Russell and Storms2 study, the C words are never presented, but are assumed to occur as implicit associative responses linking B with D. Subjects first learned the A—B pairs (List 1). After attaining a specified criterion, the D words were presented (List 2), with electric shock paired with three of the ten words. Finally, the A—B pairs were presented once to test retention.


Sixteen male Princeton University undergraduates served as volunteers. General paired-associates instructions were provided, and subjects learned List 1 by the method of anticipation. The list was presented in three random orders, by use of a slide projector controlled by interval timers. Each nonsense syllable appeared for 1 second, followed, after a 0.75-second slide change, by the syllable and the response word for 1 second. The next syllable appeared immediately after, and 2 seconds elapsed between successive list presentations. List 1 learning continued to one perfect trial in which all pairs were correctly anticipated.


Electrodes were then placed on the third and fourth fingers of the left hand, and a key-operated buzzer was provided for the right hand. Subjects were told that a list of words (List 2) would be projected on the screen, and that each time a word appeared they were to pronounce that word aloud. Some words would be accompanied by shock, and subjects were to press the buzzer key whenever one of these words appeared. The key press did not avoid or escape the shock; it simply indicated that subjects had learned which words led to shock, and which were safe. Each word was presented for 2 seconds, and shock, when presented, occurred during the last second of word presentation. The shock source delivered 1.25 ma at 250 volts a-c, 60 hertz.


For half the subjects, the shock-paired D words were smell, war, and tree; for the other half, brain, good, and take. List 2 presentations, in three random orders, continued until subjects had correctly anticipated shock for three consecutive trials with no incorrect anticipations. The mean number of presentations of List 2 was 7.2, standard deviation = 8.6. Subjects were then given a single relearning trial of List 1, with electrodes left in place. The measure of motivated forgetting was the percentage of shock-associated B words forgotten relative to the percentage of control B words forgotten. Finally, we asked subjects to state the purpose of the experiment, to recall the words associated with shock, to recall the words of List 2, and finally, to state any connections they could think of between shocked words and List 1 words, and between any List 2 and List 1 words.


Since the two groups of subjects did not differ significantly in any experimental measures, their data were pooled. No subject correctly stated the purpose of the experiment. Recall of the shock-associated D words was perfect, while mean percent recall of the control D words was 71.4. This difference is significant at the .01 level, as evaluated by a Wilcoxon matched-pairs signed-ranks test. This result is to be expected, since subjects’ task was to learn which words anticipated shock.


Ten subjects reported that they could not think of any specific connections between any D word and the A—B pairs. Of the remaining six subjects, four reported two correct associations each, but none involved the experimental (shocked) stimuli. The other reported connections were incorrect. These data indicate that any shock-related forgetting is not attributable to verbalizable associations between D and B words, nor to the demand characteristics of the experiment. The original List 1 learning data indicate that any subsequent differences in recall cannot be attributed to differential initial learning. The mean numbers of trials to learn experimental A—B pairs (4.1, S.D. = 2.8) and control A—B pairs (3.4, S.D. = 2.0) did not differ significantly. Similarly, the mean number of correct anticipations (repetitions) of experimental and control pairs did not differ (experimental, 5.1, S.D. = 5.8; control, 5.9, S.D. = 3.1).


We turn now to the recall data relevant to our hypothesis. For the subjects with smell, war, and tree as the shocked D words, 20.8 percent of the associated A—B pairs were not anticipated correctly, compared to 3.6 percent of the control pairs forgotten. The other group of subjects forgot 37.5 percent of the experimental pairs, and 8.9 percent of the control pairs. A Wilcoxon matched-pairs signed-ranks test applied to the pooled data indicated that the difference in percent retention between experimental and control pairs is significant at the .01 level, T (13) = 5. Three subjects forgot none of the pairs, and only two subjects forgot more control than experimental pairs. No subject substituted an experimental B word incorrectly. Perhaps because of the rapid pacing of the paired-associates list, subjects either anticipated correctly in the recall test or failed to answer.
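The T statistic of the Wilcoxon matched-pairs signed-ranks test can be computed directly from paired scores: rank the nonzero absolute differences, then take the smaller of the positive and negative rank sums. The sketch below is ours (standard conventions assumed: zero differences dropped, ties given average ranks):

```python
def wilcoxon_t(xs, ys):
    """Wilcoxon matched-pairs signed-ranks T statistic for paired
    scores xs and ys: the smaller of the two signed rank sums."""
    diffs = [x - y for x, y in zip(xs, ys) if x != y]  # drop zero differences
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied absolute differences
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1          # average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    pos = sum(r for d, r in zip(diffs, ranks) if d > 0)
    neg = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(pos, neg)
```

Small observed T values (relative to the number of nonzero pairs) indicate that the differences are predominantly in one direction.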


Pairing of shock with associates of memory items clearly interfered with their subsequent retrieval. Two interpretations of this finding may be considered. First, the shock may have resulted in differential retroactive interference mediated by the superior retention of the experimental D words. If learning a list of such D words between initial learning and recall produces retroactive interference, then the particular form of motivation employed may be irrelevant. The same effect may be obtainable with positive reinforcement, and, indeed, with any operation that produces superior retention of experimental words. This possibility, however, seems unlikely in view of the retroactive facilitation effects reported by Horton and Wiley.3 Using a three-stage chaining paradigm, they found that, after learning an A—B and a B—C list, learning an A—C list facilitated A—B retention.


Nevertheless, the experiment was repeated with an independent sample of 40 subjects drawn from a different college population: paid volunteers attending summer session at Dickinson College, Carlisle, Pennsylvania. Half of these subjects received shock associated with the experimental D words; the other half received money reward associated with the experimental D words. As in the original experiment, trials to learn List 1 and number of correct anticipations during List 1 learning did not vary as a function of any experimental conditions. Again, as in the earlier experiment, recall of the experimental D words was significantly superior to recall of control D words (100 percent versus 49 percent correct recall for the shock group; 95 percent versus 55 percent for the money group; P < .01 in both cases). In terms of these variables, this second experiment replicated the first.


Differential forgetting as a function of shock was similar to the data obtained earlier. Fifteen percent of the experimental A—B pairs were forgotten, compared to 5 percent of the control pairs, and this difference is significant at the .05 level. In contrast, no significant difference in forgetting was obtained between experimental and control pairs in the money-reward condition (10 and 11 percent, respectively). In this money condition, 22 pairs were forgotten, 6 experimental and 16 control. This is very close to the distribution that would be expected by chance, namely 6.6 and 15.4.
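The chance expectation cited above follows from the list composition: 3 of the 10 D words were experimental, so forgotten pairs should split 3:7 under chance. The arithmetic, spelled out:

```python
# Expected split of the 22 forgotten pairs in the money condition
# under chance: 3 of the 10 D words were experimental.
forgotten = 22
expected_experimental = forgotten * 3 / 10   # 6.6
expected_control = forgotten * 7 / 10        # 15.4
```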


These additional data are unambiguous. The differential forgetting shown is specific to an unpleasant event, shock, and is not attributable to the differential recall of shock-associated words.