{"title": "A solvable connectionist model of immediate recall of ordered lists", "book": "Advances in Neural Information Processing Systems", "page_first": 51, "page_last": 58, "abstract": null, "full_text": "A solvable connectionist model of \nimmediate recall of ordered lists \n\nNeil Burgess \n\nDepartment of Anatomy, University College London \n\nLondon WC1E 6BT, England \n(e-mail: n.burgessOucl.ac.uk) \n\nAbstract \n\nA model of short-term memory for serially ordered lists of verbal \nstimuli is proposed as an implementation of the 'articulatory loop' \nthought to mediate this type of memory (Baddeley, 1986). The \nmodel predicts the presence of a repeatable time-varying 'context' \nsignal coding the timing of items' presentation in addition to a \nstore of phonological information and a process of serial rehearsal. \nItems are associated with context nodes and phonemes by Hebbian \nconnections showing both short and long term plasticity. Items are \nactivated by phonemic input during presentation and reactivated \nby context and phonemic feedback during output. Serial selection \nof items occurs via a winner-take-all interaction amongst items, \nwith the winner subsequently receiving decaying inhibition. An \napproximate analysis of error probabilities due to Gaussian noise \nduring output is presented. The model provides an explanatory \naccount of the probability of error as a function of serial position, \nlist length, word length, phonemic similarity, temporal grouping, \nitem and list familiarity, and is proposed as the starting point for \na model of rehearsal and vocabulary acquisition. \n\n1 \n\nIntroduction \n\nShort-term memory for serially ordered lists of pronounceable stimuli is well de(cid:173)\nscribed, at a crude level, by the idea of an 'articulatory loop' (AL). This postulates \nthat information is phonologically encoded and decays within 2 seconds unless re(cid:173)\nfreshed by serial rehearsal, see (Baddeley, 1986). 
It successfully accounts for: (i) the linear relationship between memory span s (the number of items s such that 50% of lists of s items are correctly recalled) and articulation rate r (the number of items that can be said per second), in which s ≈ 2r + c, where r varies as a function of the items, language and development; (ii) the fact that span is lower for lists of phonemically similar items than phonemically distinct ones; (iii) the fact that unattended speech and articulatory distractor tasks (e.g. saying blah-blah-blah ...) both reduce memory span. Recent evidence suggests that the AL plays a role in the learning of new words both during development and during recovery after brain traumas, see e.g. (Gathercole & Baddeley, 1993). Positron emission tomography studies indicate that the phonological store is localised in the left supramarginal gyrus, whereas subvocal rehearsal involves Broca's area and some of the motor areas involved in speech planning and production (Paulesu et al., 1993). \nHowever, the detail of the types of errors committed is not addressed by the AL idea. Principally: (iv) the majority of errors are 'order errors' rather than 'item errors', and tend to involve transpositions of neighbouring or phonemically similar items; (v) the probability of correctly recalling a list as a function of list length is a sigmoid; (vi) the probability of correctly recalling an item as a function of its serial position in the list (the 'serial position curve') has a bowed shape; (vii) span increases with the familiarity of the items used, specifically the c in s ≈ 2r + c can increase from 0 to 2.5 (see (Hulme et al., 1991)), and also increases if a list has been previously presented (the 'Hebb effect'); (viii) 'position specific intrusions' occur, in which an item from a previous list is recalled at the same position in the current list. 
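As a toy illustration of point (i), the span/articulation-rate relation s ≈ 2r + c can be sketched in code (the function name and defaults here are my own, not from the paper):

```python
# Hypothetical illustration of the articulatory-loop span relation s ~ 2r + c.
def predicted_span(rate, c=0.0):
    """Predicted memory span for items articulated at `rate` items/second.

    `c` reflects item familiarity; the text reports it ranging from 0 to 2.5.
    """
    return 2.0 * rate + c

# Faster articulation (e.g. shorter words) predicts a larger span,
# and familiar items shift the whole line upward via c.
assert predicted_span(2.0) > predicted_span(1.0)
assert predicted_span(1.5, c=2.5) > predicted_span(1.5, c=0.0)
```

This is only the crude AL-level description; the model below is needed to account for points (iv)-(viii).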
Taken together, these data impose strong functional constraints on any neural mechanism implementing the AL. \nMost models showing serial behaviour rely on some form of 'chaining' mechanism which associates previous states to successive states, via recurrent connections of various types. Chaining of item or phoneme representations generates errors that are incompatible with human data, particularly (iv) above, see (Burgess & Hitch, 1992; Henson, 1994). Here items are maintained in serial order by association to a repeatable time-varying signal (which is suggested by position specific intrusions and is referred to below as 'context'), and by the recovery from suppression involved in the selection process - a modification of the 'competitive queuing' model for speech production (Houghton, 1990). The characteristics of STM for serially ordered items arise due to the way that context and phoneme information prompts the selection of each item. \n\n2 The model \n\nThe model consists of 3 layers of artificial neurons representing context, phonemes and items respectively, connected by Hebbian connections with long and short term plasticity, see Fig. 1. There is a winner-take-all (WTA) interaction between item nodes: at each time step the item with the greatest input is given activation 1, and the others 0. The winner at the end of each time step receives a decaying inhibition that prevents it from being selected twice consecutively. \nDuring presentation, phoneme nodes are activated by acoustic or (translated) visual input, activation in the context layer follows the pattern shown in Fig. 1, and item nodes receive input from phoneme nodes via connections w_ij. 
Figure 1: A) Context states as a function of serial position t; filled circles are active nodes, empty circles are inactive nodes. B) The architecture of the model. Full lines are connections with short and long term plasticity; dashed lines are routes by which information enters the model. \n\nConnections W_ij(t) learn the association between the context state and the winning item, and w_ij(t) and w'_ji(t) learn the association with the active phonemes. During recall, the context layer is re-activated as in presentation, and activation spreads to the item layer (via W_ij(t)) where one item wins and activates its phonemes (via w'_ji(t)). The item that now wins, given both context and phoneme inputs, is output, and then suppressed. \nAs described so far, the model makes no errors. Errors occur when Gaussian noise is added to items' activations during the selection of the winning item to be output. Errors are likely when there are many items with similar activation levels due to decay of connection weights and inhibition since presentation. Items may then be selected in the wrong order, and performance will decrease with the time taken to present or recall a list. \n\n2.1 Learning and familiarity \n\nConnection weights have both long and short term plasticity: W_ij(t) (and similarly w_ij(t) and w'_ji(t)) have an incremental long term component W^l_ij(t), and a one-shot short term component W^s_ij(t) which decays by a factor Δ per second. The net weight of the connection is the sum of the two components: W_ij(t) = W^l_ij(t) + W^s_ij(t). 
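A minimal sketch (my own Python, not the paper's implementation) of such a two-component weight, assuming the short-term part is set in one shot and the long-term part grows by small increments; the saturation of the long-term component via a shrinking learning rate is omitted for brevity:

```python
# Sketch of a Hebbian connection with a slow long-term component and a
# one-shot short-term component that decays by a factor `delta` per second.
class HebbianWeight:
    def __init__(self, delta=0.5):
        self.w_long = 0.0    # incremental long-term component W^l
        self.w_short = 0.0   # one-shot short-term component W^s
        self.delta = delta   # per-second decay factor for W^s

    @property
    def value(self):
        # net weight is the sum of the two components
        return self.w_long + self.w_short

    def learn(self, pre, post, eps=0.1):
        hebb = pre * post
        if hebb > self.value:      # one-shot short-term learning
            self.w_short = hebb
        if hebb > 0:               # slow long-term increment
            # (in the paper eps shrinks as |W^l| grows, so W^l saturates)
            self.w_long += eps * hebb

    def decay(self, seconds):
        self.w_short *= self.delta ** seconds
```

After two seconds with delta = 0.5, a short-term trace of 1.0 has decayed to 0.25 while the long-term trace persists, which is the mechanism behind the familiarity effects described next.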
\nLearning occurs according to: \n\nW^s_ij(t) = c_j(t)a_i(t) if c_j(t)a_i(t) > W_ij(t), unchanged otherwise; \nW^l_ij(t) = W^l_ij(t) + ε c_j(t)a_i(t) if c_j(t)a_i(t) > 0, unchanged otherwise,   (1) \n\nwhere c_j(t) and a_i(t) are the pre- and post-connection activations, and ε decreases with |W^l_ij(t)| so that the long term component saturates at some maximum value. These modifiable connection weights are never negative. \nAn item's 'familiarity' is reflected by the size of the long term components w^l_ij and w'^l_ji of the weights storing the association with its phonemes. These components increase with each (error-free) presentation or recall of the item. For lists of totally unfamiliar items, the item nodes are completely interchangeable, having only the short-term connections w^s_ij to phoneme nodes that are learned at presentation. In contrast, the presentation of a familiar item leads to the selection of a particular item node (due to the weights w^l_ij) and, during output, this item will activate its phonemes more strongly due to the weights w'^l_ji. Unfamiliar items that are phonemically similar to a familiar item will tend to be represented by the familiar item node, and can take advantage of its long-term item-phoneme weights w'^l_ji. \nPresentation of a list leads to an increase in the long term component of the context-item association. Thus, if the same list is presented more than once its recall improves, and position specific intrusions from previous lists may also occur. Notice that only weights to or from an item winning at presentation or output are increased. \n\n3 Details \n\nThere are n_w items per list, n_p phonemes per item, and a phoneme takes time l_p seconds to present or recall. At time t, item node i has activation a_i(t), context node i has activation c_i(t), C_t is the set of n_c context nodes active at time t, phoneme node i has activation b_i(t) and P_i is the set of n_p phonemes comprising item i. 
\nContext nodes have activation 0 or √(3/2n_c), and phonemes take activation 0 or 1/√n_p, so W_ij(t) ≤ √(3/2n_c) and w^s_ij(t) = w'^s_ji(t) ≤ 1/√n_p, see (1). This sets the relative effect that the context and phoneme layers have on items' activation, and ensures that items of neither few nor many phonemes are favoured, see (Burgess & Hitch, 1992). The long-term components of the phoneme-item weights w^l_ij(t) and w'^l_ji(t) are 0.45/√n_p for familiar items, and 0.15/√n_p for unfamiliar items (chosen to match the data in Fig. 3B). The long-term components of the context-item weights W^l_ij(t) increase by 0.15/√n_c for each of the first few presentations or recalls of a list. \nApart from the WTA interaction, each item node i has input: \n\nh_i(t) = E_i(t) + I_i(t) + η_i,   (2) \n\nwhere I_i(t) < 0 is a decaying inhibition imposed following an item's selection at presentation or output (see below), η_i is a (0, σ) Gaussian random variable added at output only, and E_i(t) is the excitatory input to the item from the phoneme layer during presentation and from the context and phoneme layers during recall: \n\nE_i(t) = Σ_j w_ij(t)b_j(t) during presentation; E_i(t) = Σ_j W_ij(t)c_j(t) + Σ_j w_ij(t)b_j(t) during recall.   (3) \n\nDuring recall, phoneme nodes are activated according to b_i(t) = Σ_j w'_ji(t)a_j(t). \nOne time step refers to the presentation or recall of an item and has duration n_p l_p. The variable t increases by 1 per time step, and refers to both time and serial position. Short term connection weights and the inhibition I_i(t) decay by a factor Δ per second, or Δ^{n_p l_p} per time step. \nThe algorithm is as follows; rehearsal corresponds to repeating the recall phase. \nPresentation \n0. Set activations, inhibitions and short term weights to zero, t = 1. \n1. Set the context layer to state C_t: c_i(t) = √(3/2n_c) if i ∈ C_t; c_i(t) = 0 otherwise. \n2. Input items, i.e. set the phoneme layer to state P_t: b_i(t) = 1/√n_p if i ∈ P_t; b_i(t) = 0 otherwise. \n3. Select the winning item, i.e. a_k(t) = 1 where h_k(t) = max_i{h_i(t)}; a_i(t) = 0 for i ≠ k. \n4. Learning, i.e. increment all connection weights according to (1). \n5. Decay, i.e. multiply the short-term connection weights W^s_ij(t), w^s_ij(t) and w'^s_ji(t), and the inhibitions I_i(t), by a factor Δ^{n_p l_p}. \n6. Inhibit winner, i.e. set I_k(t) = -2, where k is the item selected in 3. \n7. t → t + 1, go to 1. \nRecall \n0. t = 1. \n1. Set the context layer to state C_t, as above. \n2. Set all phoneme activations to zero. \n3. Select the winning item, as above. \n4. Output, i.e. activate the phonemes via w'_ji(t), then select the winning item (in the presence of noise). \n5. Learning, as above. \n6. Decay, as above. \n7. Inhibit winner, i.e. set I_k(t) = -2, where k is the item selected in 4. \n8. t → t + 1, go to 1. \n\n4 Analysis \n\nThe output of the model, averaged over many trials, depends on (i) the activation values of all items at the output step for each time t and, (ii) given these activations and the noise level, the probability of each item being the winner. Estimation is necessary since there is no simple exact expression for (ii), and (i) depends on which items were output prior to time t. \nI define γ(t, i) to be the time elapsed, by output at time t, since item i was last selected (at presentation or output), i.e. in the absence of errors: \n\nγ(t, i) = { (t - i) l_p n_p; (n_w - (i - t)) l_p n_p }   if i