{"title": "Capacity and Information Efficiency of a Brain-like Associative Net", "book": "Advances in Neural Information Processing Systems", "page_first": 513, "page_last": 520, "abstract": null, "full_text": "Capacity and Information Efficiency of a \n\nBrain-like Associative Net \n\nBruce Graham and David Willshaw \n\nCentre for Cognitive Science, University of Edinburgh \n\n2 Buccleuch Place, Edinburgh, EH8 9LW, UK \n\nEmail: bruce@cns.ed.ac.uk&david@cns.ed.ac.uk \n\nAbstract \n\nWe have determined the capacity and information efficiency of an \nassociative net configured in a brain-like way with partial connec(cid:173)\ntivity and noisy input cues. Recall theory was used to calculate \nthe capacity when pattern recall is achieved using a winners-take(cid:173)\nall strategy. Transforming the dendritic sum according to input \nactivity and unit usage can greatly increase the capacity of the \nassociative net under these conditions. For moderately sparse pat(cid:173)\nterns, maximum information efficiency is achieved with very low \nconnectivity levels (~ 10%). This corresponds to the level of con(cid:173)\nnectivity commonly seen in the brain and invites speculation that \nthe brain is connected in the most information efficient way. \n\n1 \n\nINTRODUCTION \n\nStandard network associative memories become more plausible as models of asso(cid:173)\nciative memory in the brain if they incorporate (1) partial connectivity, (2) sparse \nactivity and (3) recall from noisy cues. In this paper we consider the capacity of \na binary associative net (Willshaw, Buneman, & Longuet-Higgins, 1969; Willshaw, \n1971; Buckingham, 1991) containing these features. While the associative net is \na very simple model of associative memory, its behaviour as a storage device is \nnot trivial and yet it is tractable to theoretical analysis. 
We are able to calculate the capacity of the net in different configurations and with different pattern recall strategies. Here we consider the capacity as a function of connectivity level when winners-take-all recall is used. \n\nThe associative net is a heteroassociative memory in which pairs of binary patterns are stored by altering the connection weights between input and output units via a Hebbian learning rule. After pattern storage, an output pattern is recalled by presenting a previously stored input pattern on the input units. Which output units become active during recall is determined by applying a threshold of activation to measurements that each output unit makes of the input cue pattern. The most commonly used measurement is the weighted sum of the inputs, or dendritic sum. Amongst the simpler thresholding strategies is the winners-take-all (WTA) approach, which chooses the required number of output units with the highest dendritic sums to be active. This works well when the net is fully connected (each input unit is connected to every output unit) and input cues are noise-free. However, recall performance deteriorates rapidly if the net is partially connected (each input unit is connected to only some of the output units) and cues are noisy. \n\nMarr (1971) recognised that when an associative net is only partially connected, another useful measurement for threshold setting is the total input activity (the sum of the inputs, regardless of the connection weights). The ratio of the dendritic sum to the input activity can be a better discriminator of which output units should be active than the dendritic sum alone. 
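As a concrete illustration, the clipped Hebbian storage rule and the two recall measurements just described (the dendritic sum, and Marr's ratio of dendritic sum to input activity) can be sketched as a toy NumPy reimplementation. All names and the small dimensions below are our own, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (the net analysed later in the paper is far larger)
NA, MA = 200, 20    # input units, active inputs per pattern
NB, MB = 100, 10    # output units, active outputs per pattern
Z = 0.5             # connectivity: fraction of inputs seen by each output

C = (rng.random((NA, NB)) < Z).astype(int)  # fixed partial-connectivity mask
W = np.zeros((NA, NB), dtype=int)           # binary weights, start at zero

def store(x, y):
    """Clipped Hebbian storage: weight -> 1 where input and output co-active."""
    W[np.ix_(x.astype(bool), y.astype(bool))] = 1

def wta_recall(cue, normalise=False):
    """Winners-take-all: activate the MB output units with the highest
    dendritic sum d, or (Marr-style) the highest ratio d/a."""
    d = cue @ (W * C)          # dendritic sums over existing connections only
    a = cue @ C                # input activity: live connections from the cue
    score = d / np.maximum(a, 1) if normalise else d
    out = np.zeros(NB, dtype=int)
    out[np.argsort(score)[-MB:]] = 1
    return out

# Store a few random pattern pairs, then recall the first from its cue
pairs = []
for _ in range(5):
    x = np.zeros(NA, dtype=int); x[rng.choice(NA, MA, replace=False)] = 1
    y = np.zeros(NB, dtype=int); y[rng.choice(NB, MB, replace=False)] = 1
    store(x, y); pairs.append((x, y))

recalled = wta_recall(pairs[0][0], normalise=True)
errors = int(np.sum(recalled != pairs[0][1]))
print("recall errors:", errors)
```

With a noise-free cue and so few stored patterns, the normalised (d/a) recall is essentially perfect here; the point of the paper is what happens when many patterns are stored and the cue is noisy.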
Buckingham and Willshaw (1993) showed that differences in unit usage (the number of patterns in which an output unit is active during storage) cause variations in the dendritic sums that make accurate recall difficult when the input cues are noisy. They incorporated both input activity and unit usage measurements into a recall strategy that minimised the number of errors in the output pattern by setting the activity threshold on a unit-by-unit basis. This is a rather more complex threshold-setting strategy than a simple winners-take-all. \n\nWe have previously demonstrated via computer simulations (Graham & Willshaw, 1994) that the WTA threshold strategy can achieve the same recall performance as this minimisation approach if the dendritic sums are transformed by certain functions of the input activity and unit usage before a threshold is applied. Here we calculate the capacity of the associative net when WTA recall is used with three different functions of the dendritic sums: (1) the pure dendritic sums, (2) the sums modified by input activity and (3) the sums modified by input activity and unit usage. The results show that up to four times the capacity can be obtained by transforming the dendritic sums by a function of both input activity and unit usage. This increase in capacity was obtained without a loss of information efficiency. For the moderately sparse patterns used, WTA recall is most information efficient at low levels of connectivity (~ 10%), as is the minimisation approach to threshold setting (Buckingham, 1991). This connectivity range is similar to that commonly seen in the brain. \n\n2 NOTATION AND OPERATION \n\nThe associative net consists of N_B binary output units, each connected to a proportion Z of the N_A binary input units. Pairs of binary patterns are stored in the net. 
Input and output patterns contain M_A and M_B active units, respectively (activity level a = M/N << 1). All connection weights start at zero. On presentation to the net of a pattern pair for storage, the connection weight between an active input unit and an active output unit is set to 1. During recall an input cue pattern is presented on the input units. The input cue is a noisy version of a previously stored input pattern in which a fraction, s, of the M_A active units do not come from the stored pattern. A thresholding strategy is applied to the output units to determine which of them should be active. Those that should be active in response to the input cue will be called high units, and those that should be inactive will be called low units. We consider winners-take-all (WTA) thresholding strategies that choose to be active the M_B output units with the highest values of three functions of the dendritic sum, d, the input activity, a, and the unit usage, r. These functions are listed in Table 1. The normalised strategy deals with partial connectivity. The transformed strategy reduces variations in the dendritic sums due to differences in unit usage; this function minimises the variance of the low unit dendritic sums with respect to the unit usage (Graham & Willshaw, 1994). \n\nTable 1: WTA Strategies \n\nWTA Strategy | Function \nBasic | d \nNormalised | d' = d/a \nTransformed | d* = 1 - (1 - d/a)^{1/r} \n\n3 RECALL THEORY \n\nThe capacity of the associative net is defined to be the number of pattern pairs that can be stored before there is one bit in error in a recalled output pattern. This cannot be calculated analytically for the net configuration under study. However, it can be determined numerically for the WTA recall strategy by calculating the recall response for different numbers of stored patterns, R, until the minimum value of R is found for which a recall error occurs. 
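The three scoring functions of Table 1 are cheap to compute per output unit. A minimal sketch (the function name and interface are ours; only the formulas come from the paper):

```python
import numpy as np

def wta_score(d, a, r, strategy):
    """Score one output unit from its dendritic sum d, input activity a
    and unit usage r, per Table 1. WTA recall then activates the M_B
    output units with the highest scores."""
    d, a, r = (np.asarray(v, dtype=float) for v in (d, a, r))
    if strategy == "basic":
        return d
    dn = d / a                        # normalised sum d' = d/a, in [0, 1]
    if strategy == "normalised":
        return dn
    if strategy == "transformed":     # d* = 1 - (1 - d/a)^(1/r)
        return 1.0 - (1.0 - dn) ** (1.0 / r)
    raise ValueError(strategy)

# A heavily used unit (r = 8) with d/a = 0.9 scores well below a lightly
# used unit (r = 1) with the same normalised sum, once usage is factored out
print(wta_score(9, 10, 8, "transformed"), wta_score(9, 10, 1, "transformed"))
```

Note that for r = 1 the transformed score reduces exactly to the normalised score d/a, so the transformation only changes the ranking among units whose usages differ.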
The WTA recall response can be calculated theoretically using expressions for the distributions of the dendritic sums of low and high output units. The probability that the dendritic sum of a low or high output unit has a particular value x is, respectively (Buckingham & Willshaw, 1993; Buckingham, 1991), \n\nP(d_l = x) = \\sum_{r=0}^{R} \\binom{R}{r} a_B^r (1 - a_B)^{R-r} \\binom{M_A}{x} (Z \\rho[r])^x (1 - Z \\rho[r])^{M_A - x}   (1) \n\nP(d_h = x) = \\sum_{r=0}^{R-1} \\binom{R-1}{r} a_B^r (1 - a_B)^{R-1-r} \\binom{M_A}{x} (Z \\mu[r+1])^x (1 - Z \\mu[r+1])^{M_A - x}   (2) \n\nwhere \\rho[r] and \\mu[r] are the probabilities that an arbitrarily selected active input is on a connection with weight 1. For a low unit, \\rho[r] = 1 - (1 - a_A)^r. For a high unit a good approximation for \\mu is \\mu[r+1] ~ g + s \\rho[r] = 1 - s (1 - a_A)^r, where g and s are the probabilities that a particular active input in the cue pattern is genuine (belongs to the stored pattern) or spurious, respectively (g + s = 1) (Buckingham & Willshaw, 1993). The basic WTA response is calculated using these distributions by finding the threshold, T, that gives the required M_B expected active output units: \n\nM_B P(d_h \\geq T) + (N_B - M_B) P(d_l \\geq T) = M_B   (3) \n\nThe expected numbers of false positive and false negative errors in the response are then \n\nE_{fp} = (N_B - M_B) P(d_l \\geq T),   E_{fn} = M_B (1 - P(d_h \\geq T))   (4) \n\nThe actual distributions of the normalised dendritic sums are the distributions of d/a. For the purposes of calculating the normalised WTA response, it is possible to use the basic distributions for the situation where every unit has the mean input activity, a_m = M_A Z. In this case the low and high unit distributions are approximately \n\nP(d'_l = x) = \\sum_{r=0}^{R} \\binom{R}{r} a_B^r (1 - a_B)^{R-r} \\binom{a_m}{x} (\\rho[r])^x (1 - \\rho[r])^{a_m - x}   (5) \n\nP(d'_h = x) = \\sum_{r=0}^{R-1} \\binom{R-1}{r} a_B^r (1 - a_B)^{R-1-r} \\binom{a_m}{x} (\\mu[r+1])^x (1 - \\mu[r+1])^{a_m - x}   (6) \n\nDue to the nonlinear transformation used, it is not possible to calculate the transformed distributions as simple sums of binomials, so the following approach is used to generate the transformed WTA response. 
For a given transformed threshold, T*, and for each possible value of unit usage, r, an equivalent normalised threshold is calculated via \n\nT'[r] = a_m (1 - (1 - T*)^r)   (7) \n\nThe transformed cumulative probabilities can then be calculated from the normalised distributions: \n\nP(d*_l \\geq T*) = \\sum_{r=0}^{R} \\binom{R}{r} a_B^r (1 - a_B)^{R-r} P(d'_l \\geq T'[r])   (8) \n\nP(d*_h \\geq T*) = \\sum_{r=0}^{R-1} \\binom{R-1}{r} a_B^r (1 - a_B)^{R-1-r} P(d'_h \\geq T'[r+1])   (9) \n\nThe normalised and transformed WTA responses are calculated in the same manner as the basic response, using the appropriate probability distributions. \n\n[Figure 1: Capacity Versus Connectivity. (a) 0% noise; (b) 40% noise.] \n\n4 RESULTS \n\nExtensive simulations were previously carried out of WTA recall from a large associative net with the following specifications (Graham & Willshaw, 1994): N_A = 48000, M_A = 1440, N_B = 6144, M_B = 180. Agreement between the simulations and the theoretical recall described above is extremely good, indicating that the approximations used in the theory are valid. Here we use the theoretical recall to calculate capacity results for this large associative net that are not easily obtained via simulations. All the results shown have been generated using the theory described in the previous section. 
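The distributions of Eqs 1-2 and the threshold search of Eqs 3-4 are straightforward to evaluate numerically. A sketch of the basic WTA response, assuming SciPy is available (function and variable names are ours):

```python
import numpy as np
from scipy.stats import binom

def dsum_dists(R, NA, MA, NB, MB, Z, s):
    """Return P(d_l = x) and P(d_h = x) for x = 0..M_A (Eqs 1-2)."""
    aA, aB = MA / NA, MB / NB
    x = np.arange(MA + 1)
    Pl = np.zeros(MA + 1)
    Ph = np.zeros(MA + 1)
    for r in range(R + 1):                 # low unit: usage r ~ Bin(R, aB)
        rho = 1.0 - (1.0 - aA) ** r
        Pl += binom.pmf(r, R, aB) * binom.pmf(x, MA, Z * rho)
    for r in range(R):                     # high unit: usage is r + 1
        mu = 1.0 - s * (1.0 - aA) ** r     # mu[r+1] ~ g + s*rho[r]
        Ph += binom.pmf(r, R - 1, aB) * binom.pmf(x, MA, Z * mu)
    return x, Pl, Ph

def wta_errors(R, NA, MA, NB, MB, Z, s):
    """Basic WTA response: pick the threshold T giving M_B expected active
    units (Eq 3), then return the expected number of errors (Eq 4)."""
    x, Pl, Ph = dsum_dists(R, NA, MA, NB, MB, Z, s)
    Sl = np.cumsum(Pl[::-1])[::-1]         # Sl[T] = P(d_l >= T)
    Sh = np.cumsum(Ph[::-1])[::-1]         # Sh[T] = P(d_h >= T)
    active = MB * Sh + (NB - MB) * Sl
    T = int(np.argmin(np.abs(active - MB)))
    return (NB - MB) * Sl[T] + MB * (1.0 - Sh[T])

# Small net: expected recall errors grow with the number of stored patterns
print(wta_errors(R=20, NA=1000, MA=100, NB=500, MB=50, Z=0.5, s=0.2))
```

The capacity procedure described in the text corresponds to increasing R until this expected-error count first reaches one bit.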
\n\nFigure 1 shows the capacity as a function of connectivity for the different WTA strategies when there is no noise in the input cue, or 40% noise in the cue (legend: B = basic WTA, N = normalised WTA, T = transformed WTA; for clarity, individual data points are omitted). With no noise in the cue the normalised and transformed methods perform identically, so only the normalised results are shown. Figure 1(a) highlights the effectiveness of normalising the dendritic sums against input activity when the net is partially connected. Figure 1(b) shows the effect of noise on capacity. The capacity of each recall strategy at a given connectivity level is much reduced compared to the noise-free case. However, for connectivities greater than 10% the capacity of the transformed WTA is now much greater than that of either the normalised or basic WTA. \n\nThe relative capacities of the different strategies are shown in Figure 2 (legend: N/B = ratio of normalised to basic capacity, T/B = ratio of transformed to basic, T/N = ratio of transformed to normalised). In the noise-free case (Figure 2(a)), at low levels of connectivity the relative capacity is distorted because the basic capacity drops to near zero, so that even low normalised capacities are relatively very large. For most connectivity levels (10-90%) the normalised WTA provides 2-4 times the capacity of the basic WTA. In the noisy case (Figure 2(b)), the normalised capacity is only up to 1.5 times the basic capacity over this range of connectivities. The transformed WTA, however, provides 3 to nearly 4 times the basic capacity and 2.5 to nearly 3 times the normalised capacity for connectivities greater than 10%. \n\nThe capacities can be interpreted in information theoretic terms by considering the information efficiency of the net. 
This is the ratio of the amount of information that can be retrieved from the net to the number of bits of storage available, and is given by \\eta_0 = R_0 I_0 / (Z N_A N_B), where R_0 is the capacity, I_0 is the amount of information contained in an output pattern and Z N_A N_B is the number of weights, or bits of storage required (Willshaw et al., 1969; Buckingham & Willshaw, 1992). Information efficiency as a function of connectivity is shown in Figure 3. There is a distinct peak in information efficiency for each of the recall strategies at some low level of connectivity. The peak information efficiencies and the efficiencies at full connectivity are summarised in Table 2. The greatest contrast between full and partial connectivity is seen with the normalised WTA and noise-free cues. At 1% connectivity the normalised WTA is nearly 14 times more efficient than at full connectivity. In absolute terms, however, the normalised capacity is only 694 at 1% connectivity, compared with 5122 at full connectivity. The peak efficiency of 53% obtained by the normalised WTA approaches the theoretical maximum of approximately 69% for a fully connected net (Willshaw et al., 1969). \n\n5 DISCUSSION \n\nPrevious simulations (Graham & Willshaw, 1994) have shown that, when the input cues are noisy, the recall performance of the winners-take-all thresholding strategy applied to the partially connected associative net is greatly improved if the dendritic sums of the output units are transformed by functions of input activity and unit usage. We have confirmed and extended these results here by calculating the theoretical capacity of the associative net as a function of connectivity. \n\nFor the moderately sparse patterns used, all of the recall strategies are most information efficient at very low levels of connectivity (~ 10%). However, the optimum connectivity level is dependent on the pattern coding rate. 
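The efficiency figures quoted above follow from \eta_0 = R_0 I_0 / (Z N_A N_B). The text does not spell out the formula for I_0; taking I_0 = M_B log2(N_B) bits per output pattern (our assumption, not the paper's stated definition) reproduces the quoted normalised-WTA values:

```python
from math import log2

def info_efficiency(R0, NA, NB, MB, Z):
    """eta_0 = R0 * I0 / (Z * NA * NB). I0 = MB * log2(NB) is an assumed
    per-pattern information measure; it recovers the figures in the text."""
    I0 = MB * log2(NB)
    return R0 * I0 / (Z * NA * NB)

# Normalised WTA, noise-free cues (capacities quoted in the text)
print(round(100 * info_efficiency(5122, 48000, 6144, 180, Z=1.0), 1))   # 3.9
print(round(100 * info_efficiency(694, 48000, 6144, 180, Z=0.01), 1))   # 53.3
```

The ratio of the two results, 53.3/3.9, is indeed "nearly 14 times", matching the comparison between 1% and full connectivity.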
Extending the analysis of Willshaw et al. (1969) to a partially connected net using normalised WTA recall shows that maximum information efficiency is obtained when Z M_A = log2(N_B). So for input coding rates higher than log2(N_B), a partially connected net is most information efficient. For the input coding rate used here, this relationship gives an optimum connectivity level of 0.87%, very close to the 1% obtained from the recall theory. \n\n[Figure 2: Relative Capacity Versus Connectivity. (a) 0% noise; (b) 40% noise. Legend: N/B, T/B, T/N = capacity ratios.] \n\nComparing the peak efficiencies across the different strategies for the noisy cue case, the normalised WTA is about twice as efficient as the basic WTA, while the transformed WTA is three times as efficient. This comparison does not include the 
cost of storing input activity and unit usage information. If one bit of storage per connection is required for the input activity, and another bit for the unit usage, then the information efficiency of the normalised WTA is halved, and the information efficiency of the transformed WTA is reduced by two thirds. This results in all the strategies having about the same peak efficiency. However, the absolute capacities of the different strategies at their peak efficiencies are 183, 237 and 741 for the basic, normalised and transformed WTA, respectively. So, at the same level of efficiency, the transformed WTA delivers four times the capacity of the basic WTA. \n\n[Figure 3: Information Efficiency Versus Connectivity. (a) 0% noise; (b) 40% noise. Legend: B = basic, N = normalised, T = transformed WTA.] \n\nTable 2: Information Efficiency \n\nWTA Strategy | 0% noise: \\eta_0 at peak (%) | Z at peak (%) | \\eta_0 at Z=1 (%) | 40% noise: \\eta_0 at peak (%) | Z at peak (%) | \\eta_0 at Z=1 (%) \nBasic | 6.1 | 4 | 3.9 | 2.0 | 7 | 0.8 \nNormalised | 53.3 | 1 | 3.9 | 3.6 | 5 | 0.8 \nTransformed | 53.3 | 1 | 3.9 | 5.7 | 10 | 2.3 \n\nIn conclusion, numerical calculations of the capacity of the associative net show that it is most information efficient at a very low level of connectivity when moderately sparse patterns are stored. Including input activity and unit usage information in the recall calculations results in a four-fold increase in storage capacity without loss of efficiency. \n\nAcknowledgements \n\nTo the Medical Research Council for financial support under Programme Grant PG 9119632. \n\nReferences \n\nBuckingham, J., & Willshaw, D. (1992). Performance characteristics of the associative net. Network, 3, 407-414. \n\nBuckingham, J., & Willshaw, D. (1993). On setting unit thresholds in an incompletely connected associative net. 
Network, 4, 441-459. \n\nBuckingham, J. (1991). Delicate nets, faint recollections: a study of partially connected associative network memories. Ph.D. thesis, University of Edinburgh. \n\nGraham, B., & Willshaw, D. (1994). Improving recall from an associative memory. Biol. Cybern., in press. \n\nMarr, D. (1971). Simple memory: a theory for archicortex. Phil. Trans. Roy. Soc. Lond. B, 262, 23-81. \n\nShepherd, G. (Ed.). (1990). The Synaptic Organization of the Brain (Third edition). Oxford University Press, New York, Oxford. \n\nWillshaw, D. (1971). Models of distributed associative memory. Ph.D. thesis, University of Edinburgh. \n\nWillshaw, D., Buneman, O., & Longuet-Higgins, H. (1969). Non-holographic associative memory. Nature, 222, 960-962. \n", "award": [], "sourceid": 968, "authors": [{"given_name": "Bruce", "family_name": "Graham", "institution": null}, {"given_name": "David", "family_name": "Willshaw", "institution": null}]}