{"title": "Phase Diagram and Storage Capacity of Sequence-Storing Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 211, "page_last": 217, "abstract": null, "full_text": "Phase Diagram and Storage Capacity of Sequence-Storing Neural Networks\n\nA. D\u00fcring\nDept. of Physics\nOxford University\nOxford OX1 3NP\nUnited Kingdom\na.during1@physics.oxford.ac.uk\n\nA. C. C. Coolen\nDept. of Mathematics\nKing's College\nLondon WC2R 2LS\nUnited Kingdom\ntcoolen@mth.kcl.ac.uk\n\nD. Sherrington\nDept. of Physics\nOxford University\nOxford OX1 3NP\nUnited Kingdom\nd.sherrington1@physics.oxford.ac.uk\n\nAbstract\n\nWe solve the dynamics of Hopfield-type neural networks which store sequences of patterns, close to saturation. The asymmetry of the interaction matrix in such models leads to violation of detailed balance, ruling out an equilibrium statistical mechanical analysis. Using generating functional methods we derive exact closed equations for dynamical order parameters, viz. the sequence overlap and correlation and response functions, in the limit of an infinite system size. We calculate the time translation invariant solutions of these equations, describing stationary limit-cycles, which leads to a phase diagram. The effective retarded self-interaction usually appearing in symmetric models is here found to vanish, which causes a significantly enlarged storage capacity of $\\alpha_c \\approx 0.269$, compared to $\\alpha_c \\approx 0.139$ for Hopfield networks storing static patterns. Our results are tested against extensive computer simulations and excellent agreement is found.\n\n\f212\n\nA. D\u00fcring, A. C. C. Coolen and D. 
Sherrington\n\n1 INTRODUCTION AND DEFINITIONS\n\nWe consider a system of $N$ neurons $\\sigma(t) = \\{\\sigma_i(t) = \\pm 1\\}$, which can change their states collectively at discrete times (parallel dynamics). Each neuron changes its state with a probability $p_i(t) = \\frac{1}{2}[1 - \\tanh \\beta \\sigma_i(t) [\\sum_j J_{ij} \\sigma_j(t) + \\theta_i(t)]]$, so that the transition matrix is\n\n$$W[\\sigma(s+1)|\\sigma(s)] = \\prod_{i=1}^{N} \\exp\\Big\\{\\beta \\sigma_i(s+1) \\Big[\\sum_{j=1}^{N} J_{ij} \\sigma_j(s) + \\theta_i(s)\\Big] - \\ln 2\\cosh\\Big(\\beta \\Big[\\sum_{j=1}^{N} J_{ij} \\sigma_j(s) + \\theta_i(s)\\Big]\\Big)\\Big\\} \\qquad (1)$$\n\nwith the (non-symmetric) interaction strengths $J_{ij}$ chosen as\n\n$$J_{ij} = \\frac{1}{N} \\sum_{\\mu=1}^{p} \\xi_i^{\\mu+1} \\xi_j^{\\mu}. \\qquad (2)$$\n\nThe $\\xi_i^{\\mu}$ represent components of an ordered sequence of patterns to be stored$^1$. The gain parameter $\\beta$ can be interpreted as an inverse temperature governing the noise level in the dynamics (1), and the number of patterns is assumed to scale as $N$, i.e. $p = \\alpha N$. Had the interaction matrix been chosen symmetric, the model would be accessible to methods originally developed for the equilibrium statistical mechanical analysis of physical spin systems and related models [1, 2], in particular the replica method. For the nonsymmetric interaction matrix proposed here this is ruled out, and no exact solution exists to our knowledge, although both models were first mentioned at the same time and an approximate solution compatible with the numerical evidence at the time has been provided by Amari [3]. The difficulty for the analysis is that a system with the interactions (2) never reaches equilibrium in the thermodynamic sense, so that equilibrium methods are not applicable. One therefore has to apply dynamical methods and give a dynamical meaning to the notion of the recall state. 
Consequently, we will in this paper employ the dynamical method of path integrals, pioneered for spin glasses by de Dominicis [4] and applied to the Hopfield model by Rieger et al. [5].\n\nWe point out that our choice of parallel dynamics for the problem of sequence recall is deliberate in that simple sequential dynamics will not lead to stable recall of a sequence. This is due to the fact that the number of updates of a single neuron per time unit is not a constant for sequential dynamics. Schemes for using delayed asymmetric interactions combined with sequential updates have been proposed (see e.g. [6] for a review), but are outside the scope of this paper.\n\nOur analysis starts with the introduction of a generating functional $Z[\\psi]$ of the form\n\n$$Z[\\psi] = \\sum_{\\sigma(0) \\ldots \\sigma(t)} p[\\sigma(0), \\ldots, \\sigma(t)] \\, e^{-i \\sum_{s<t} \\sigma(s) \\cdot \\psi(s)}, \\qquad (3)$$\n\nwhich depends on real fields $\\{\\psi_i(t)\\}$. These fields play a formal role only, allowing for the identification of interesting order parameters, such as\n\n$$m_i(s) = \\langle \\sigma_i(s) \\rangle = i \\lim_{\\psi \\to 0} \\frac{\\partial Z[\\psi]}{\\partial \\psi_i(s)}$$\n\n$$G_{ij}(s, s') = \\frac{\\partial}{\\partial \\theta_j(s')} \\langle \\sigma_i(s) \\rangle = i \\lim_{\\psi \\to 0} \\frac{\\partial^2 Z[\\psi]}{\\partial \\psi_i(s) \\, \\partial \\theta_j(s')}$$\n\n$$C_{ij}(s, s') = \\langle \\sigma_i(s) \\sigma_j(s') \\rangle = -\\lim_{\\psi \\to 0} \\frac{\\partial^2 Z[\\psi]}{\\partial \\psi_i(s) \\, \\partial \\psi_j(s')}$$\n\nfor the average activation, response and correlation functions, respectively. Since this functional involves the probability $p[\\sigma(0), \\ldots, \\sigma(t)]$ of finding a 'path' of neuron activations $\\{\\sigma(0), \\ldots, \\sigma(t)\\}$, the task of the analysis is to express this probability in terms of the macroscopic order parameters themselves to arrive at a set of closed macroscopic equations.\n\n$^1$ Upper (pattern) indices are understood to be taken modulo $p$ unless otherwise stated. 
\n\nThe first step in rewriting the path probability is to realise that (1) describes a one-step Markov process and the path probability is therefore just the product of the single-time transition probabilities, weighted by the probability of the initial state: $p[\\sigma(0), \\ldots, \\sigma(t)] = p(\\sigma(0)) \\prod_{s=0}^{t-1} W[\\sigma(s+1)|\\sigma(s)]$. Furthermore, we will in the course of the analysis frequently isolate interesting variables by introducing appropriate $\\delta$-functions, such as\n\n$$1 = \\int dh_i(t) \\, \\delta\\Big[h_i(t) - \\sum_j J_{ij} \\sigma_j(t) - \\theta_i(t)\\Big].$$\n\nThe variable $h_i(t)$ can be interpreted as the local field (or presynaptic potential) at site $i$ and time $t$, and the introduction of these variables transforms $Z[\\psi]$ into\n\n$$Z[\\psi] = \\sum_{\\sigma(0) \\ldots \\sigma(t)} p(\\sigma(0)) \\int \\frac{dh \\, d\\hat{h}}{(2\\pi)^{Nt}} \\prod_{s=0}^{t-1} \\Big[ e^{\\beta \\sigma(s+1) \\cdot h(s) - \\sum_i \\ln 2\\cosh(\\beta h_i(s))} \\cdots \\Big] \\qquad (4)$$\n\nThis expression is the last general form of $Z[\\psi]$ we consider. To proceed with the analysis, we have to make a specific ansatz for the system behaviour.\n\n2 DYNAMIC MEAN FIELD THEORY\n\nAs sequence recall is the mode of operation we are most interested in, we make the ansatz that, for large systems, the network state has an overlap of order $O(N^0)$ with the pattern $\\xi^s$ at time $s$, and that all other patterns overlap with it to order $O(N^{-1/2})$ at most. Accordingly, we introduce the macroscopic order parameters for the condensed pattern, $m(s) = N^{-1} \\sum_i \\xi_i^s \\sigma_i(s)$, and for the quantity $k(s) = N^{-1} \\sum_i \\xi_i^{s+1} \\hat{h}_i(s)$, and their noncondensed equivalents $y^\\mu(s) = N^{-1/2} \\sum_i \\xi_i^\\mu \\sigma_i(s)$ and $x^\\mu(s) = N^{-1/2} \\sum_i \\xi_i^{\\mu+1} \\hat{h}_i(s)$ ($\\mu \\neq s$), where the scaling ansatz is reflected in the normalisation constants. Introducing these objects using $\\delta$-functions, as with the local fields $h_i(s)$, removes the product of two patterns in the last line of eq. (4), so that the exponent will be linear in the pattern bits. 
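The condensed overlap m(s) defined above can also be measured directly in a small simulation. The following is a minimal sketch, not the authors' simulation code: it assumes zero external fields, a cyclic sequence of random patterns, and a start exactly in pattern 0. The function name recall_overlaps and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def recall_overlaps(n_neurons=400, n_patterns=20, beta=10.0, steps=8, seed=0):
    # Return the condensed overlaps m(s) for s = 0..steps (hypothetical helper).
    rng = random.Random(seed)
    # Random +1/-1 patterns xi[mu][i]; pattern indices are taken modulo n_patterns.
    xi = [[rng.choice((-1, 1)) for _ in range(n_neurons)]
          for _ in range(n_patterns)]
    sigma = list(xi[0])  # assumption: start the network exactly in pattern 0
    overlaps = []
    for s in range(steps + 1):
        # Overlap of the current state with every stored pattern.
        m_all = [sum(x * y for x, y in zip(pattern, sigma)) / n_neurons
                 for pattern in xi]
        overlaps.append(m_all[s % n_patterns])  # condensed overlap m(s)
        new_sigma = []
        for i in range(n_neurons):
            # Local field h_i = sum_j J_ij sigma_j with J_ij from eq. (2),
            # rewritten as sum_mu xi_i^(mu+1) m_mu to avoid an N x N matrix.
            h = sum(xi[(mu + 1) % n_patterns][i] * m_all[mu]
                    for mu in range(n_patterns))
            # Parallel stochastic update: P(sigma_i = +1) = (1 + tanh(beta h)) / 2.
            p_up = 0.5 * (1.0 + math.tanh(beta * h))
            new_sigma.append(1 if rng.random() < p_up else -1)
        sigma = new_sigma
    return overlaps


overlaps = recall_overlaps()
print([round(m, 3) for m in overlaps])
```

With these illustrative values the load is alpha = 0.05 and T = 0.1, so the overlap should stay close to one at every step while the state moves through the sequence.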
\n\nBecause macroscopic observables will in general not depend on the microscopic realisation of the patterns, the values of these observables do not change if we average $Z[\\psi]$ over the realisations of the patterns. Performing this average is complicated by the occurrence of some patterns in both the condensed and the noncondensed overlaps, depending on the current time index, which is an effect not occurring in the standard Hopfield model. Using some simple scaling arguments, this difficulty can be removed and we can perform the average over the noncondensed patterns. The disorder averaged $Z[\\psi]$ acquires the form\n\nwhere we have introduced the new observables $q(s, s') = \\frac{1}{N} \\sum_i \\sigma_i(s) \\sigma_i(s')$, $Q(s, s') = \\frac{1}{N} \\sum_i \\hat{h}_i(s) \\hat{h}_i(s')$, and $K(s, s') = \\frac{1}{N} \\sum_i \\sigma_i(s) \\hat{h}_i(s')$, and their corresponding conjugate variables. The functions in the exponent turn out to be\n\n$$\\Psi[m, \\hat{m}, k, \\hat{k}, q, \\hat{q}, Q, \\hat{Q}, K, \\hat{K}] = i \\sum_{s<t} [\\hat{m}(s) m(s) + \\hat{k}(s) k(s) - m(s) k(s)] + i \\sum_{s,s'<t} [\\hat{q}(s,s') q(s,s') + \\hat{Q}(s,s') Q(s,s') + \\hat{K}(s,s') K(s,s')], \\qquad (6)$$\n\n$$\\Phi[\\hat{m}, \\hat{k}, \\hat{q}, \\hat{Q}, \\hat{K}] = \\frac{1}{N} \\sum_i \\ln \\Big[ \\sum_{\\sigma(0) \\ldots \\sigma(t)} p_i(\\sigma(0)) \\int \\prod_{s<t} \\Big[\\frac{dh(s) \\, d\\hat{h}(s)}{2\\pi}\\Big] e^{\\sum_{s<t} [\\beta \\sigma(s+1) h(s) - \\ln 2\\cosh(\\beta h(s))]} \\times e^{-i \\sum_{s,s'<t} [\\hat{q}(s,s') \\sigma(s) \\sigma(s') + \\hat{Q}(s,s') \\hat{h}(s) \\hat{h}(s') + \\hat{K}(s,s') \\sigma(s) \\hat{h}(s')]} \\times e^{i \\sum_{s<t} \\hat{h}(s) [h(s) - \\theta_i(s) - \\hat{k}(s) \\xi_i^{s+1}] - i \\sum_{s<t} \\sigma(s) [\\hat{m}(s) \\xi_i^s + \\psi_i(s)]} \\Big], \\qquad (7)$$\n\nand\n\n$$\\Omega[q, Q, K] = \\frac{1}{N} \\ln \\int \\prod_{s<t} \\Big[\\frac{du(s) \\, dv(s)}{(2\\pi)^{p-t}}\\Big] e^{i \\sum_{\\mu>t} \\sum_{s<t} u_{\\mu+1}(s) v_\\mu(s)} \\times e^{-\\frac{1}{2} \\sum_{\\mu>t} \\sum_{s,s'<t} [u_\\mu(s) Q(s,s') u_\\mu(s') + u_\\mu(s) K(s',s) v_\\mu(s') + v_\\mu(s) K(s,s') u_\\mu(s') + v_\\mu(s) q(s,s') v_\\mu(s')]}. \\qquad (8)$$\n\nThe first of these expressions is just a result of the introduction of $\\delta$-functions, while the second will turn out to represent a probability measure given by the evolution of a single neuron under prescribed fields, and the third reflects the disorder contribution to the local fields in that single neuron measure$^2$. We have thus reduced the original problem involving $N$ neurons in a one-step Markov process to one involving just a single neuron, but at the cost of introducing two-time observables.\n\n$^2$ We have assumed $p(\\sigma(0)) = \\prod_i p_i(\\sigma_i(0))$.\n\n3 DERIVATION OF SADDLE POINT EQUATIONS\n\nThe integral in (5) will be dominated by saddle points, in our case by a unique saddle point when causality is taken into account. Extremising the exponent with respect to all occurring variables gives a number of equations, the most important of which give the physical meanings of three observables: $q(s, s') = C(s, s')$, $K(s, s') = iG(s, s')$,\n\n$$m(s) = \\lim_{N \\to \\infty} \\frac{1}{N} \\sum_i \\overline{\\langle \\sigma_i(s) \\rangle} \\, \\xi_i^s \\qquad (9)$$\n\nwith\n\n$$C(s, s') = \\lim_{N \\to \\infty} \\frac{1}{N} \\sum_i \\overline{\\langle \\sigma_i(s) \\sigma_i(s') \\rangle}, \\qquad G(s, s') = \\lim_{N \\to \\infty} \\frac{1}{N} \\sum_i \\frac{\\partial \\overline{\\langle \\sigma_i(s) \\rangle}}{\\partial \\theta_i(s')}, \\qquad (10)$$\n\nwhich are the single-site correlation and response functions, respectively. The overline $\\overline{\\cdots}$ is taken to represent disorder averaged values. Using also additional equations arising from the normalisation $Z[0] = 1$, we can rewrite the single neuron measure $\\Phi$ as\n\n$$\\langle f[\\{\\sigma\\}] \\rangle_* = \\sum_{\\sigma(0) \\ldots \\sigma(t)} \\int \\prod_{s<t} \\Big[\\frac{dh(s) \\, d\\hat{h}(s)}{2\\pi}\\Big] p(\\sigma(0)) \\, f[\\{\\sigma\\}] \\, e^{\\sum_{s<t} [\\beta \\sigma(s+1) h(s) - \\ln 2\\cosh(\\beta h(s))]} \\cdots \\qquad (11)$$\n\nwith the short-hand $R = \\sum_{l \\geq 0} (G^\\dagger)^l C G^l$. To simplify notation, we have here assumed that the initial probabilities $p_i(\\sigma_i(0))$ are uniform and that the external fields $\\theta_i(s)$ are so-called staggered ones, i.e. 
$\\theta_i(s) = \\theta \\xi_i^{s+1}$, which makes the single neuron measure site-independent. This single neuron measure (11) represents the essential result of our calculations and is already properly normalised (i.e. $\\langle 1 \\rangle_* = 1$).\n\nWhen one compares the present form of the single neuron measure with that obtained for the symmetric Hopfield network, one finds in the latter model an additional term which corresponds to a retarded self-interaction. The absence of such a term here suggests that the present model will have a higher storage capacity. It can be explained by the constant change of state of a large number of neurons as the network goes through the sequence, which prevents the build-up of microscopic memory of past activations.\n\nHowever, as is the case for the standard Hopfield model, the measure (11) is still too complicated to find explicit equations for the observables we are interested in. Although it is possible to evaluate the necessary integrals numerically, we instead concentrate on the interesting behaviour when transients have died out and time-translation invariance is present.\n\n4 STATIONARY STATE\n\nWe will now concentrate on the behaviour of the network at the stage when transients have subsided and the system is on a macroscopic limit cycle. Then the relations\n\n$$m(s) = m, \\qquad C(s, s') = C(s - s'), \\qquad G(s, s') = G(s - s') \\qquad (12)$$\n\nhold, and also $R(s, s') = R(s - s')$. We can then for simplicity shift the time origin to $t_0 = -\\infty$ and the upper temporal bound to $t = \\infty$. Note, however, that this state is not to be confused with microscopic equilibrium in the thermodynamic sense. The stationary versions of the measure (11) for the interesting observables are then given by the following expressions (note that $C(0) = 1$):\n\n$$m = \\int \\prod_s \\Big[\\frac{dv(s) \\, dw(s)}{2\\pi}\\Big] e^{i v \\cdot w - \\frac{1}{2} w \\cdot R w} \\tanh \\beta [m + \\theta + \\alpha^{1/2} v(0)]$$\n\n$$C(\\tau \\neq 0) = \\int \\prod_s \\Big[\\frac{dv(s) \\, dw(s)}{2\\pi}\\Big] e^{i v \\cdot w - \\frac{1}{2} w \\cdot R w} \\tanh \\beta [m + \\theta + \\alpha^{1/2} v(\\tau)] \\, \\tanh \\beta [m + \\theta + \\alpha^{1/2} v(0)]$$\n\n$$G(\\tau) = \\beta \\delta_{\\tau,1} \\Big[1 - \\int \\prod_s \\Big[\\frac{dv(s) \\, dw(s)}{2\\pi}\\Big] e^{i v \\cdot w - \\frac{1}{2} w \\cdot R w} \\tanh^2 \\beta [m + \\theta + \\alpha^{1/2} v(0)]\\Big] \\qquad (13)$$\n\nand we notice that the response function is now limited to a single time step, which again reflects the influence of the uncorrelated flips induced by the sequence recall. These equations can be solved by separating the persistent and fluctuating parts of $C(\\tau)$ and $R(\\tau)$,\n\n$$C(\\tau) = q + \\tilde{C}(\\tau), \\qquad R(\\tau) = r + \\tilde{R}(\\tau), \\qquad \\lim_{\\tau \\to \\pm\\infty} \\tilde{C}(\\tau) = \\lim_{\\tau \\to \\pm\\infty} \\tilde{R}(\\tau) = 0.$$\n\nDoing so eventually leads us to the coupled equations\n\n$$\\rho = [1 - \\beta^2 (1 - \\tilde{q})^2]^{-1} \\qquad (14)$$\n\n$$m = \\int Dz \\, \\tanh \\beta [m + \\theta + z \\sqrt{\\alpha \\rho}] \\qquad (15)$$\n\n$$\\tilde{q} = \\int Dz \\, \\tanh^2 \\beta [m + \\theta + z \\sqrt{\\alpha \\rho}] \\qquad (16)$$\n\n$$q = \\int Dz \\, \\Big[\\int Dx \\, \\tanh \\beta [m + \\theta + z \\sqrt{\\alpha q \\rho} + x \\sqrt{\\alpha (1 - q) \\rho}]\\Big]^2 \\qquad (17)$$\n\nNote that the three equations (14-16) form a closed set, from which the persistent correlation $q$ simply follows.\n\n5 PHASE DIAGRAM AND STORAGE CAPACITY\n\n[Figure 1: plot of temperature $T$ (vertical axis, 0.0 to 1.0) against $\\alpha$ (horizontal axis, 0.0 to 0.3), with the two regions labelled P and R.]\n\nFigure 1: Phase diagram of the sequence storage network, in which one finds two phases: a recall phase (R), characterized by $\\{m \\neq 0, q > 0, \\tilde{q} > 0\\}$, and a paramagnetic phase (P), characterized by $\\{m = 0, q = 0, \\tilde{q} > 0\\}$. The solid line separating the two phases is the theoretical prediction for the (discontinuous) phase transition. The markers represent simulation results, for systems of $N = 10,000$ neurons measured after 2,500 iteration steps, and obtained by bisection in $\\alpha$. The precision in terms of $\\alpha$ is at least $\\Delta\\alpha = 0.005$ (indicated by error bars); the values for $T$ are exact. 
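The closed set (14)-(16) lends itself to a simple numerical treatment. The sketch below is one possible approach and is not taken from the paper: it assumes zero external field, approximates the Gaussian averages by a trapezoidal rule on a finite grid, and uses a damped fixed-point iteration started from the perfect-recall guess m = 1, rho = 1. The names gauss_average and solve_stationary, the damping factor and the grid size are all hypothetical choices.

```python
import math


def gauss_average(f, z_max=8.0, n=801):
    # Trapezoidal estimate of the Gaussian average of f(z) over the measure Dz.
    dz = 2.0 * z_max / (n - 1)
    total = 0.0
    for k in range(n):
        z = -z_max + k * dz
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * math.exp(-0.5 * z * z) * f(z)
    return total * dz / math.sqrt(2.0 * math.pi)


def solve_stationary(alpha, temperature, damping=0.5, tol=1e-10, max_iter=2000):
    # Damped fixed-point iteration of eqs. (14)-(16) at zero external field.
    beta = 1.0 / temperature
    m, rho = 1.0, 1.0  # assumption: start from perfect recall
    q_tilde = 1.0
    for _ in range(max_iter):
        width = math.sqrt(alpha * rho)
        m_new = gauss_average(lambda z: math.tanh(beta * (m + width * z)))
        q_tilde = gauss_average(lambda z: math.tanh(beta * (m + width * z)) ** 2)
        gain = (beta * (1.0 - q_tilde)) ** 2
        if gain >= 1.0:
            # Eq. (14) is only defined for gain < 1; give up rather than guess.
            raise ValueError('iteration left the domain of eq. (14)')
        rho_new = 1.0 / (1.0 - gain)
        step = abs(m_new - m) + abs(rho_new - rho)
        m += damping * (m_new - m)
        rho += damping * (rho_new - rho)
        if step < tol:
            break
    return m, q_tilde, rho


m_recall, q_recall, rho_recall = solve_stationary(0.1, 0.2)
m_para, q_para, rho_para = solve_stationary(0.1, 1.5)
print(m_recall, m_para)
```

Scanning alpha at fixed T for the largest value at which the iteration still returns a nonzero m would give a rough estimate of the phase boundary. The damping is there because the undamped update of rho can oscillate when m is close to zero.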
\n\nThe coupled equations (14-17) can be solved numerically for $\\theta = 0$ to find the area in the $\\alpha$-$T$ plane where solutions $m \\neq 0$, corresponding to sequence recall, exist. The boundary of this area describes the storage capacity of the system. This theoretical curve can then be compared with computer simulations directly performing the neural dynamics given by (1) and (2). We show the result of doing both in the accompanying phase diagram (Figure 1). We find that there are only two types of solutions, namely a recall phase R where $m \\neq 0$ and $q \\neq 0$, and a paramagnetic phase where $m = q = 0$. Unlike the standard Hopfield model, the present model does not have a spin glass phase with $m = 0$ and $q \\neq 0$. The agreement between simulations (done here for $N = 10,000$ neurons) and theoretical results is excellent, and separate simulations of systems with up to $N = 50,000$ neurons to assess finite size effects confirm that the numerical data are reliable.\n\n6 DISCUSSION\n\nIn this paper, we have used path integral methods to solve in the infinite system size limit the dynamics of a non-symmetric neural network model, designed to store and recall a sequence of patterns, close to saturation. This model has been known for over a decade from numerical simulations to possess a storage capacity roughly twice that of the symmetric Hopfield model, but no rigorous analytic results were available. We find here that in contrast to equilibrium statistical mechanical methods, which do not apply due to the absence of detailed balance, the powerful path integral formalism provides us with a solution and a transparent explanation of the increased storage capacity. 
It turns out that this higher capacity is due to the absence of a retarded self-interaction, viz. the absence of microscopic memory of activations.\n\nThe theoretically obtained phase diagram can be compared to the results of numerical simulations and we find excellent agreement. Our confidence in this agreement is supported by additional simulations to study the effect of finite size scaling. Full details of the calculations will be presented elsewhere [7].\n\nReferences\n\n[1] Sherrington D and Kirkpatrick S 1975 Phys. Rev. Lett. 35 1972\n[2] Amit D J, Gutfreund H, and Sompolinsky H 1985 Phys. Rev. Lett. 55 1530\n[3] Amari S and Maginu K 1988 Neural Networks 1 63\n[4] de Dominicis C 1978 Phys. Rev. B 18 4913\n[5] Rieger H, Schreckenberg M, and Zittartz J 1988 J. Phys. A: Math. Gen. 21 L263\n[6] Kuhn R and van Hemmen J L 1991 Temporal Association ed E Domany, J L van Hemmen, and K Schulten (Berlin, Heidelberg: Springer) p 213\n[7] D\u00fcring A, Coolen A C C, and Sherrington D 1998 J. Phys. A: Math. Gen. 31 8607\n", "award": [], "sourceid": 1587, "authors": [{"given_name": "A.", "family_name": "D\u00fcring", "institution": null}, {"given_name": "Anthony", "family_name": "Coolen", "institution": null}, {"given_name": "D.", "family_name": "Sherrington", "institution": null}]}