{"title": "Some results on convergent unlearning algorithm", "book": "Advances in Neural Information Processing Systems", "page_first": 358, "page_last": 364, "abstract": null, "full_text": "Some results on convergent unlearning algorithm\n\nSerguei A. Semenov & Irina B. Shuvalova\n\nInstitute of Physics and Technology\n\nPrechistenka St. 13/7\nMoscow 119034, Russia\n\nAbstract\n\nIn this paper we consider the probabilities of different asymptotics of the convergent unlearning algorithm for the Hopfield-type neural network (Plakhov & Semenov, 1994), treating the case of unbiased random patterns. We also show that failed unlearning results in total memory breakdown.\n\n1 INTRODUCTION\n\nIn the past years unsupervised learning schemes have aroused strong interest among researchers, but for the time being little is known about the underlying learning mechanisms, and still fewer rigorous results, such as convergence theorems, have been obtained in this field. One of the promising concepts along this line is so-called \"unlearning\" for Hopfield-type neural networks (Hopfield et al., 1983; van Hemmen & Klemmer, 1992; Wimbauer et al., 1994). Elaborating these elegant ideas, the convergent unlearning algorithm has recently been proposed (Plakhov & Semenov, 1994); it operates without presentation of the patterns. It aims to correct the initial Hebbian connectivity in order to provide extensive storage of arbitrarily correlated data.\n\nThe algorithm is stated as follows. At iteration step $m$, $m = 0, 1, 2, \\dots$, pick a random network state $s^{(m)} = (s_1^{(m)}, \\dots, s_N^{(m)})$, with the values $s_i^{(m)} = \\pm 1$ having equal probability $1/2$, and calculate the local fields generated by $s^{(m)}$,\n\n$$h_i^{(m)} = \\sum_{j=1}^{N} J_{ij}^{(m)} s_j^{(m)}, \\qquad i = 1, \\dots, N,$$\n\nand then update the synaptic weights by\n\n$$J_{ij}^{(m+1)} = J_{ij}^{(m)} - \\varepsilon N^{-1} h_i^{(m)} h_j^{(m)}, \\qquad i, j = 1, \\dots, N. \\qquad (1)$$\n\nHere $\\varepsilon > 0$ stands for the unlearning strength parameter. We stress that the self-interactions $J_{ii}$ are necessarily involved in the iteration process. The initial condition for (1) is given by the Hebb matrix, $J^{(0)} = J^H$,\n\n$$J_{ij}^{H} = N^{-1} \\sum_{\\mu=1}^{p} \\xi_i^{\\mu} \\xi_j^{\\mu}, \\qquad (2)$$\n\nwith arbitrary $(\\pm 1)$-patterns $\\xi^{\\mu}$, $\\mu = 1, \\dots, p$.\n\nFor $\\varepsilon < \\varepsilon_c$, the (rescaled) synaptic matrix has been proven to converge with probability one to the projector onto the linear subspace spanned by a maximal subset of linearly independent patterns (Plakhov & Semenov, 1994). As the sufficient condition for this convergence to occur, the unlearning strength $\\varepsilon$ should be less than $\\varepsilon_c = \\lambda_{\\max}^{-1}$, where $\\lambda_{\\max}$ denotes the largest eigenvalue of the Hebb matrix. Very often in real-world situations there is no means to know $\\varepsilon_c$ in advance, and therefore it is of interest to explore the asymptotic behaviour of the iterated synaptic matrix for arbitrary values of $\\varepsilon$. As it turns out, there are only three possible limiting behaviours of the normalized synaptic matrix (Plakhov, 1995; Plakhov & Semenov, 1995). The corresponding convergence theorems relate the spectrum dynamics to the limiting behaviour of the normalized synaptic matrix $\\tilde{J} = J/\\|J\\|$, with $\\|J\\| = (\\sum_{i,j=1}^{N} J_{ij}^2)^{1/2}$, which can be described in terms of $\\lambda_{\\min}^{(m)}$, the smallest eigenvalue of $J^{(m)}$:\n\nI. If $\\lambda_{\\min}^{(m)} = 0$ for every $m = 0, 1, 2, \\dots$
, with the multiplicity of the zero eigenvalue being fixed, then\n\n$$\\text{(A)} \\qquad \\lim_{m \\to \\infty} \\tilde{J}_{ij}^{(m)} = s^{-1/2} P_{ij},$$\n\nwhere $P$ denotes the projection matrix onto the linear subspace $\\mathcal{C} \\subset \\mathbb{R}^N$ spanned by the nominated pattern set $\\xi^{\\mu}$, $\\mu = 1, \\dots, p$, and $s = \\dim \\mathcal{C} \\le p$;\n\nII. If $\\lambda_{\\min}^{(m)} = 0$, $m = 0, 1, 2, \\dots$, but at some (at least one) steps the multiplicity of the zero eigenvalue increases, then\n\n$$\\text{(B)} \\qquad \\lim_{m \\to \\infty} \\tilde{J}_{ij}^{(m)} = s'^{-1/2} P'_{ij},$$\n\nwhere $P'$ is the projector onto some subspace $\\mathcal{C}' \\subset \\mathcal{C}$, $s' = \\dim \\mathcal{C}' < s$;\n\nIII. If $\\lambda_{\\min}^{(m)} < 0$ starting from some value of $m$, then\n\n$$\\text{(C)} \\qquad \\lim_{m \\to \\infty} \\tilde{J}_{ij}^{(m)} = -\\xi_i \\xi_j, \\qquad (3)$$\n\nwith some random unit vector $\\xi = (\\xi_1, \\dots, \\xi_N)$ (not a $(\\pm 1)$-vector).\n\nThese three cases exhaust all possible asymptotic behaviours of $\\tilde{J}_{ij}^{(m)}$, that is, their total probability is unity: $P_A + P_B + P_C = 1$. The pattern set is assumed to be fixed.\n\nThe convergence theorems say nothing about the relative probabilities of the specific asymptotics depending on the model parameters. In this paper we present some general results elucidating this question and verify them by numerical simulation.\n\nWe show further that the limiting synaptic matrix for case (C), which is minus the projector onto the direction of $\\xi \\in \\mathcal{C}$, cannot maintain any associative memory. A brief discussion of the retrieval properties of the intermediate case (B) is also given.\n\n2 PROBABILITIES OF POSSIBLE LIMITING BEHAVIOURS OF $\\tilde{J}^{(m)}$\n\nThe unlearning procedure under consideration is stochastic in nature. Which result of the iteration process, (A), (B) or (C), is realized depends on the value of $\\varepsilon$, the size and statistical properties of the pattern set $\\{\\xi^{\\mu}, \\mu = 1, \\dots, p\\}$, and the realization of the unlearning sequence $\\{s^{(m)}, m = 0, 1, 2, \\dots\\}$. 
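A minimal pure-Python sketch of the iteration (1) started from the Hebb matrix (2) is given below for illustration. The helper names (hebb_matrix, unlearning_step, unlearn, normalized), the sizes N = 20 and p = 3, the 500 steps and the choice ε = 0.5/p are assumptions made only for this example and are not taken from the paper; the self-interactions J_ii are kept in the local fields, as the algorithm requires.

```python
import random


def hebb_matrix(patterns):
    # Eq. (2): J_ij = N^-1 * sum over mu of xi_i^mu * xi_j^mu (diagonal kept)
    n = len(patterns[0])
    return [[sum(xi[i] * xi[j] for xi in patterns) / n for j in range(n)]
            for i in range(n)]


def unlearning_step(J, s, eps):
    # Local fields h_i = sum_j J_ij s_j, self-interaction J_ii included
    n = len(s)
    h = [sum(J[i][j] * s[j] for j in range(n)) for i in range(n)]
    # Eq. (1): J_ij <- J_ij - eps * N^-1 * h_i * h_j
    return [[J[i][j] - eps * h[i] * h[j] / n for j in range(n)]
            for i in range(n)]


def unlearn(patterns, eps, steps, rng):
    # Start from the Hebb matrix and apply (1) with a fresh random state each step
    J = hebb_matrix(patterns)
    n = len(J)
    for _ in range(steps):
        # Each s_i is +1 or -1 with probability 1/2
        s = [rng.choice((-1, 1)) for _ in range(n)]
        J = unlearning_step(J, s, eps)
    return J


def normalized(J):
    # Rescaled matrix J / ||J|| with the Frobenius norm ||J||
    norm = sum(x * x for row in J for x in row) ** 0.5
    return [[x / norm for x in row] for row in J]


rng = random.Random(0)
N, p = 20, 3
patterns = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(p)]
J = unlearn(patterns, eps=0.5 / p, steps=500, rng=rng)
Jn = normalized(J)
```

Since ε < 1/p here, the Proposition places this run in case (A), so Jn is expected to approach P/√s as the number of steps grows; the sketch makes no claim about the rate of convergence.
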
\n\nFor a fixed pattern set, the probability of each limiting behaviour of the synaptic matrix is determined by the value of the unlearning strength $\\varepsilon$ only. In this section we consider these probabilities as functions of $\\varepsilon$.\n\nGenerally speaking, these probabilities depend strongly on the pattern set, which makes it impossible to calculate them explicitly. It is possible, however, to obtain some general knowledge about them, namely: $P_A(\\varepsilon) \\to 1$ as $\\varepsilon \\to 0^+$, and hence $P_{B,C}(\\varepsilon) \\to 0$ because $P_A + P_B + P_C = 1$; conversely, $P_C(\\varepsilon) \\to 1$ as $\\varepsilon \\to \\infty$, and $P_{A,B}(\\varepsilon) \\to 0$. This means that the risk of failed unlearning rises when $\\varepsilon$ increases. Specifically, we are able to prove the following.\n\nProposition. There exist positive $\\varepsilon_1$ and $\\varepsilon_2$ such that $P_A(\\varepsilon) = 1$ for $0 < \\varepsilon < \\varepsilon_1$, and $P_C(\\varepsilon) = 1$ for $\\varepsilon_2 < \\varepsilon$.\n\nBefore passing to the proof we bring forward an alternative formulation of the above classification. After multiplying both sides of (1) by $s_i^{(m)} s_j^{(m)}$, summing over all $i$ and $j$, and using $s^{(m)T} h^{(m)} = s^{(m)T} J^{(m)} s^{(m)}$, we obtain in matrix notation\n\n$$s^{(m)T} J^{(m+1)} s^{(m)} = \\Delta_m \\, s^{(m)T} J^{(m)} s^{(m)}, \\qquad (4)$$\n\nwhere the contraction factor $\\Delta_m = 1 - \\varepsilon N^{-1} s^{(m)T} J^{(m)} s^{(m)}$ controls the asymptotics of $\\tilde{J}^{(m)}$, as suggested by detailed analysis (Plakhov & Semenov, 1995). (Here and below the superscript $T$ denotes the transpose.) The hypotheses of the convergence theorems can thus be restated in terms of $\\Delta_m$ instead of $\\lambda_{\\min}^{(m)}$, respectively:\n\nI. $\\Delta_m > 0$ for all $m$;\n\nII. $\\Delta_m = 0$ for $l$ steps $m_1, \\dots, m_l$;\n\nIII. $\\Delta_m < 0$ at some step $m$.\n\nProof. It is obvious that $\\Delta_m \\ge 1 - \\varepsilon \\lambda_{\\max}^{(m)}$, where $\\lambda_{\\max}^{(m)}$ denotes the largest eigenvalue of $J^{(m)}$ (recall that $s^{(m)T} s^{(m)} = N$). By (1), $J^{(m+1)}$ is obtained from $J^{(m)}$ by subtracting the positive semidefinite rank-one matrix $\\varepsilon N^{-1} h^{(m)} h^{(m)T}$, so the sequence $\\{\\lambda_{\\max}^{(m)}, m = 0, 1, 2, \\dots\\}$ is nonincreasing, and consequently $\\Delta_m \\ge 1 - \\varepsilon \\lambda_{\\max}^{(0)}$ with\n\n$$\\lambda_{\\max}^{(0)} = \\sup_{|x|=1} x^T J^H x = \\sup_{|x|=1} N^{-1} \\sum_{\\mu=1}^{p} \\Big( \\sum_{i=1}^{N} \\xi_i^{\\mu} x_i \\Big)^2 \\le \\sup_{|x|=1} N^{-1} \\sum_{\\mu=1}^{p} \\sum_{i=1}^{N} (\\xi_i^{\\mu})^2 \\sum_{i=1}^{N} x_i^2 = p,$$\n\nwhere the Cauchy-Schwarz inequality has been used. From this it is straightforward to see that if $\\varepsilon < p^{-1}$, then $\\Delta_m > 0$ for any $m$. By the convergence theorem (Plakhov & Semenov, 1995), the iteration process (1) thus leads to the limiting relation (A).\n\nLet, by definition, $\\gamma = \\min_S N^{-1} S^T J^H S$, where the minimum is taken over those $(\\pm 1)$-vectors $S$ for which $J^H S \\neq 0$ ($\\gamma > 0$ in view of the positive semidefiniteness of $J^H$), and put $\\varepsilon > \\gamma^{-1}$. Let us further denote by $n$ the iteration step such that $J^H s^{(m)} = 0$, $m = 0, 1, \\dots, n-1$, and $J^H s^{(n)} \\neq 0$. Needless to say, this condition may be satisfied even at the initial step $n = 0$: $J^H s^{(0)} \\neq 0$. Since all the local fields vanish before step $n$, one has $J^{(n)} = J^H$, and at step $n$\n\n$$\\Delta_n = 1 - \\varepsilon N^{-1} s^{(n)T} J^H s^{(n)} \\le 1 - \\varepsilon \\gamma < 0.$$\n\nSince $s^{(n)T} J^{(n)} s^{(n)} > 0$, (4) then gives $s^{(n)T} J^{(n+1)} s^{(n)} < 0$, i.e. the loss of positive semidefiniteness of $J^{(m)}$, which results in the asymptotics (C) (Plakhov, 1995; Plakhov & Semenov, 1995). By choosing $\\varepsilon_1 = p^{-1}$ and $\\varepsilon_2 = \\gamma^{-1}$ we arrive at the statement of the Proposition.\n\nComparison of numerical estimates of these probabilities with analytical approximations can be done for simple pattern statistics. In what follows the patterns are assumed to be random and unbiased.\n\nThe dependence $P(\\varepsilon)$ has been found in computer simulation with unbiased random patterns. It is worth noting, in passing, that calculating $\\Delta_m$ from the current simulation data provides a good control of the unlearning process, owing to the alternative formulation of the convergence theorems.\n\n
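Monitoring Δ_m during a run, as suggested above, can be sketched as follows. This is an illustrative pure-Python example with hypothetical helper names; N = 10 is chosen so that γ can be found by brute force over all 2^N states. It checks identity (4) for one step and the two regimes of the Proposition: for ε = 0.9/p (below 1/p) no negative Δ_m should occur, while for ε = 1.01/γ (above 1/γ) the first step with J^H s ≠ 0 already yields Δ_n < 0. A finite horizon without negative Δ_m does not by itself establish case (A).

```python
import itertools
import random


def hebb_matrix(patterns):
    # Eq. (2), self-interactions included
    n = len(patterns[0])
    return [[sum(xi[i] * xi[j] for xi in patterns) / n for j in range(n)]
            for i in range(n)]


def quad(J, s):
    # Quadratic form s^T J s
    n = len(s)
    return sum(s[i] * J[i][j] * s[j] for i in range(n) for j in range(n))


def step_with_delta(J, s, eps):
    # One update of Eq. (1) together with the contraction factor of Eq. (4)
    n = len(s)
    h = [sum(J[i][j] * s[j] for j in range(n)) for i in range(n)]
    delta = 1.0 - eps * quad(J, s) / n
    J_new = [[J[i][j] - eps * h[i] * h[j] / n for j in range(n)]
             for i in range(n)]
    return J_new, delta


def first_negative_delta(patterns, eps, steps, rng):
    # Index of the first step with Delta_m < 0 (then case (C) follows), else None
    J = hebb_matrix(patterns)
    n = len(J)
    for m in range(steps):
        s = [rng.choice((-1, 1)) for _ in range(n)]
        J, delta = step_with_delta(J, s, eps)
        if delta < 0:
            return m
    return None


def gamma(patterns):
    # gamma = min of N^-1 S^T J^H S = N^-2 * sum_mu (xi^mu . S)^2 over
    # (+-1)-vectors S with J^H S != 0, i.e. with at least one nonzero overlap
    n = len(patterns[0])
    best = None
    for S in itertools.product((-1, 1), repeat=n):
        overlaps = [sum(x * y for x, y in zip(xi, S)) for xi in patterns]
        if any(overlaps):
            val = sum(o * o for o in overlaps) / n ** 2
            best = val if best is None else min(best, val)
    return best


rng = random.Random(1)
N, p = 10, 3
patterns = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(p)]
g = gamma(patterns)
safe = first_negative_delta(patterns, 0.9 / p, 300, rng)   # eps < 1/p
fail = first_negative_delta(patterns, 1.01 / g, 300, rng)  # eps > 1/gamma
print(g, safe, fail)
```

The exhaustive search in gamma costs 2^N evaluations, so it only serves to check the Proposition on toy sizes.
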
In simulation we calculate $P_A(\\varepsilon)$ averaged over the sets of unbiased random patterns, as well as over the realizations of the unlearning sequence. As $N$ increases with $\\alpha = p/N$ fixed, the curves slope steeply down, approaching the step function $P_A(\\varepsilon) = \\theta(\\alpha^{-1} - \\varepsilon)$, where $\\theta$ is the Heaviside step function (Plakhov & Semenov, 1995). Without presenting a derivation or proof, we advance the reasoning suggestive of it. First, it can be checked that $\\Delta_m$ is a self-averaging quantity with mean $1 - \\varepsilon N^{-1} \\operatorname{Tr} J^{(m)}$ and variance vanishing as $N$ goes to infinity. Initially one has $N^{-1} \\operatorname{Tr} J^H = \\alpha$, and by (1) the sequence $\\{\\operatorname{Tr} J^{(m)}, m = 0, 1, 2, \\dots\\}$ is nonincreasing. Therefore $\\Delta_0 \\simeq 1 - \\varepsilon \\alpha$, and all other $\\Delta_m$ are not less than $\\Delta_0$. If one chooses $\\varepsilon < \\alpha^{-1}$, then all $\\Delta_m$ will be positive, and case (A) will be realized. On the other hand, when $\\varepsilon > \\alpha^{-1}$, we have $\\Delta_0 < 0$, and case (C) will take place.\n\nWhat is the probability for asymptotics (B) to appear? We will adduce an argument (the detailed analysis (Plakhov & Semenov, 1995) is rather cumbersome and is omitted here) indicating that this probability is quite small. First note that for a given pattern set it is nonzero for isolated values of $\\varepsilon$ only. Under the assumption that the patterns are random and unbiased, we have calculated the probability of an $l$-fold appearance of $\\Delta_m = 0$, summed over those isolated values of $\\varepsilon$. Using the Gaussian approximation at large $N$, we have found that this probability scales with $N$ as $N^{l/2 + 2 - 2^{l+m+1}}$. The total probability can then be obtained by summing over the integer values $l$, $0 < l < s$, and all the iteration steps $m = 0, 1, 2, \\dots$. As a result, the main contribution to the total probability comes from the $l = 1$, $m = 0$ term, which is of order $N^{-3/2}$. 
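The self-averaging argument can be checked on a small example. Averaging s_i s_j over all 2^N states gives δ_ij, so the mean of N^-1 s^T J^H s is exactly N^-1 Tr J^H = p/N = α, and the mean of Δ_0 is 1 - εα. The pure-Python sketch below (hypothetical helper names) confirms this by exact enumeration with fractions, using the identity N^-1 s^T J^H s = N^-2 Σ_μ (ξ^μ · s)^2 that follows from (2), and samples the variance for N = 10 and N = 40 at fixed α = 0.3; the decrease of the sampled variance is a finite-size illustration only.

```python
import itertools
import random
from fractions import Fraction


def quad_over_n(patterns, s):
    # N^-1 s^T J^H s = N^-2 * sum_mu (xi^mu . s)^2, from Eq. (2); exact Fraction
    n = len(s)
    return Fraction(sum(sum(x * y for x, y in zip(xi, s)) ** 2
                        for xi in patterns), n * n)


def exact_mean_delta0(patterns, eps):
    # Mean of Delta_0 = 1 - eps * N^-1 s^T J^H s over all 2^N states (small N only)
    n = len(patterns[0])
    total = sum(quad_over_n(patterns, s)
                for s in itertools.product((-1, 1), repeat=n))
    return 1 - eps * total / 2 ** n


def sampled_variance(patterns, samples, rng):
    # Monte Carlo variance of N^-1 s^T J^H s over uniformly random states s
    n = len(patterns[0])
    vals = [float(quad_over_n(patterns, [rng.choice((-1, 1)) for _ in range(n)]))
            for _ in range(samples)]
    mean = sum(vals) / samples
    return sum((v - mean) ** 2 for v in vals) / (samples - 1)


def random_patterns(n, p, rng):
    # Unbiased random (+-1)-patterns
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]


rng = random.Random(2)
alpha = Fraction(3, 10)
N = 10
p = int(alpha * N)
patterns = random_patterns(N, p, rng)
print(exact_mean_delta0(patterns, Fraction(3)))  # 1 - 3 * 3/10 = 1/10
print(exact_mean_delta0(patterns, Fraction(4)))  # 1 - 4 * 3/10 = -1/5

variances = {}
for n_big in (10, 40):
    pats = random_patterns(n_big, int(alpha * n_big), rng)
    variances[n_big] = sampled_variance(pats, 2000, rng)
print(variances)
```

The exact mean vanishes at ε = 1/α = 10/3, the point where the limiting step function θ(α^-1 - ε) jumps.
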
\n\n3 LIMITING RETRIEVAL PROPERTIES\n\nHow does the reduction of the dimension of the \"memory space\" in case (B), $s \\to s' = s - l$, affect the retrieval properties of the system? They may vary considerably depending on $l$. In the most probable case, $l = 1$, one expects a slight decrease in storage capacity, while the size of the attraction basins changes negligibly. This is corroborated by calculating the stability parameter for each pattern $\\mu$,\n\n$$\\kappa_i^{\\mu} = \\xi_i^{\\mu} \\sum_{j \\neq i} P'_{ij} \\xi_j^{\\mu}. \\qquad (5)$$\n\nLet $s^{(m_1)}$ be the state vector at the step $m_1$ where $\\Delta_{m_1} = 0$, with the normalized projection onto $\\mathcal{C}$ given by $v = P s^{(m_1)}/|P s^{(m_1)}|$, such that\n\n$$|P s^{(m_1)}| \\simeq \\sqrt{\\alpha N}, \\qquad v_i \\sim N^{-1/2}, \\qquad \\sum_{i=1}^{N} \\xi_i^{\\mu} v_i \\sim 1.$$\n\nThen, with $P' = P - v v^T$, the stability parameter (5) is estimated by\n\n$$\\kappa_i^{\\mu} = \\xi_i^{\\mu} \\sum_{j \\neq i} (P_{ij} - v_i v_j) \\xi_j^{\\mu} = (1 - P_{ii}) - \\Big( \\xi_i^{\\mu} v_i \\sum_{j=1}^{N} v_j \\xi_j^{\\mu} - v_i^2 \\Big) \\simeq 1 - P_{ii} + O(N^{-1/2}).$$\n\nSince $P_{ii}$ has mean $\\alpha$ and variance vanishing as $N \\to \\infty$, we conclude that the stability parameter differs only slightly from that calculated for the projector rule ($s = s'$) (Kanter & Sompolinsky, 1987).\n\nOn the other hand, in the situation $0 < s'/s \\ll 1$ (the possible case $s' = 0$ is trivial), the system will be capable of retrieving only a few nominated patterns, which we cannot specify beforehand. As mentioned above, this case is realized with very small but finite probability.\n\nThe main effect of the self-interactions $J_{ii}$ lies in a substantial decrease in storage capacity (Kanter & Sompolinsky, 1987). This is relevant when considering the cases (A) and (B). In case (C) the system possesses an interesting dynamics exhibiting a permanent walk over the state space: there are no fixed points at all. To show this, we write down the fixed point condition for an arbitrary state $S$: $S_i \\sum_{j=1}^{N} J_{ij} S_j > 0$, $i = 1, \\dots, N$. 
By using the explicit expression (3) for the limiting matrix $J_{ij}$ and summing over $i$, we get as a result $(\\sum_j S_j \\xi_j)^2 < 0$, which is impossible.\n\nIf self-interactions are excluded from the local fields at the stage of network dynamics, the dynamics is then driven by the energy function $H = -(2N)^{-1} \\sum_{i \\neq j} J_{ij} S_i S_j$. (Zero-temperature sequential dynamics, with either random or regular update order, is assumed.) In the rest of this section we examine the dynamics of the network equipped with the limiting synaptic matrix (C) (3). We will show that in this limit the system lacks any associative memory. There is a single global maximum of $H$ (up to the global sign flip $S \\to -S$), given by $S_i = \\operatorname{sgn}(\\xi_i)$, and there are exponentially many shallow minima concentrated close to the hyperplane orthogonal to $\\xi$. Moreover, it turns out that all the metastable states are unstable against just a single spin flip, whatever the realization of the limiting vector $\\xi$. Therefore, after a spin flip the system can relax into a new nearby energy minimum. Through a sequence of steps, each consisting of a single spin flip followed by relaxation, one can in principle pass from one metastable state to another.\n\nWe will prove in what follows that from any given metastable state $S'$ one can pass to any other one $S$ through a sequence of steps, each consisting of a single spin flip and subsequent relaxation to some new metastable state. Note that this general statement gives no indication concerning the order of spin flips when moving along a particular trajectory in the state space.\n\nWe now turn to the proof. Let us enumerate the spins in increasing order of the absolute values of the vector components, $0 \\le |\\xi_1| \\le \\dots \\le |\\xi_N|$. The proof is carried out by induction on $j = 1, \\dots, N$, where $j$ is the maximal index for which $S'_j \\neq S_j$. 
\n\nFor $j = 1$ the statement is evident. Assuming that it holds for $1, \\dots, j-1$ ($2 \\le j \\le N$), let us prove it for $j$. One has $j = \\max\\{i : S'_i \\neq S_i\\}$. After flipping spin $j$ in the state $S'$, we allow relaxation by flipping spins $1, \\dots, j-1$ only. The system finally reaches the state $S^2$ (the superscript labels the state; it is not a power) realizing the conditional energy minimum under fixed $S_j, \\dots, S_N$.\n\nLet us show that $S^2$ is a true energy minimum. There are two possibilities:\n\n(i) For some $i$, $1 \\le i \\le j-1$, one has $\\operatorname{sgn}(\\xi_i S_i^2) = \\operatorname{sgn}(\\xi^T S^2)$. The fixed point condition for $S^2$ can then be written as\n\n$$|\\xi^T S^2| \\le \\min\\{|\\xi_i| : 1 \\le i \\le j-1, \\ \\operatorname{sgn}(\\xi_i S_i^2) = \\operatorname{sgn}(\\xi^T S^2)\\}.$$\n\nFrom this, in view of the increasing order of the $|\\xi_i|$'s, one immediately gets\n\n$$|\\xi^T S^2| \\le \\min\\{|\\xi_i| : 1 \\le i \\le N, \\ \\operatorname{sgn}(\\xi_i S_i^2) = \\operatorname{sgn}(\\xi^T S^2)\\},$$\n\nwhich implies that $S^2$ is a true energy minimum.\n\n(ii) For no $i$, $1 \\le i \\le j-1$, does $\\operatorname{sgn}(\\xi_i S_i^2) = \\operatorname{sgn}(\\xi^T S^2)$ hold. If $\\xi^T S^2 = 0$, the fixed point condition for $S^2$ is automatically satisfied. Otherwise, for $1 \\le i \\le j-1$ one has $\\xi_i S_i^2 = -\\operatorname{sgn}(\\xi^T S^2) |\\xi_i|$, and\n\n$$\\xi^T S^2 = -\\operatorname{sgn}(\\xi^T S^2) \\sum_{i=1}^{j-1} |\\xi_i| + \\sum_{i=j}^{N} \\xi_i S_i. \\qquad (6)$$\n\nFor definiteness, we set $\\xi^T S > 0$ (the opposite case is treated analogously). In this case $\\xi^T S^2 > 0$, since otherwise, according to (6), one would have\n\n$$0 \\ge \\xi^T S^2 = \\sum_{i=1}^{j-1} |\\xi_i| + \\sum_{i=j}^{N} \\xi_i S_i \\ge \\xi^T S,$$\n\nwhich contradicts our setting.\n\nOne thus obtains\n\n$$\\xi^T S^2 = -\\sum_{i=1}^{j-1} |\\xi_i| + \\sum_{i=j}^{N} \\xi_i S_i \\le \\xi^T S, \\qquad (7)$$\n\nand, using the fixed point condition for $S$, one gets\n\n$$\\xi^T S \\le \\min\\{|\\xi_i| : \\xi_i S_i > 0\\} \\le \\min\\{|\\xi_i| : j \\le i \\le N, \\ \\xi_i S_i > 0\\} = \\min\\{|\\xi_i| : \\xi_i S_i^2 > 0\\}. \\qquad (8)$$\n\nIn the last equality of (8) one uses that $\\xi_i S_i^2 < 0$ for $1 \\le i \\le j-1$ and $S_i^2 = S_i$ for $j \\le i \\le N$. 
\n\nTaking into account (7) and (8), we arrive at the condition for $S^2$ to be a true energy minimum,\n\n$$0 < \\xi^T S^2 \\le \\min\\{|\\xi_i| : \\xi_i S_i^2 > 0\\}.$$\n\nSince $S_i^2 = S_i$ for $j \\le i \\le N$, the states $S^2$ and $S$ differ only in spins with indices below $j$, so by the inductive hypothesis one can pass from $S^2$ to $S$, and therefore from $S'$ through $S^2$ to $S$. This proves the statement.\n\nIn general, metastable states may be grouped in clusters surrounded by high energy barriers. The meaning of the proven statement lies in excluding the possibility of even this type of memory. Conversely, by allowing a sequence of single spin flips (for instance, at finite temperatures) it is possible to walk through the whole set of metastable states.\n\n4 CONCLUSION\n\nIn this paper we have begun studying the probabilities of different asymptotics of the convergent unlearning algorithm, considering the case of unbiased random patterns. We have also shown that failed unlearning results in total memory breakdown.\n\nReferences\n\nHopfield, J.J., Feinstein, D.I. & Palmer, R.G. (1983) \"Unlearning\" has a stabilizing effect in collective memories. Nature 304:158-159.\nvan Hemmen, J.L. & Klemmer, N. (1992) Unlearning and its relevance to REM sleep: Decorrelating correlated data. In J.G. Taylor et al. (eds.), Neural Network Dynamics, pp. 30-43. London: Springer.\nWimbauer, U., Klemmer, N. & van Hemmen, J.L. (1994) Universality of unlearning. Neural Networks 7:261-270.\nPlakhov, A.Yu. & Semenov, S.A. (1994) Neural networks: iterative unlearning algorithm converging to the projector rule matrix. J. Phys. I France 4:253-260.\nPlakhov, A.Yu. (1995) Private communication.\nPlakhov, A.Yu. & Semenov, S.A. (1995) Preprint IPT.\nKanter, I. & Sompolinsky, H. (1987) Associative recall of memory without errors. Phys. Rev. A 35:380-392.
\n", "award": [], "sourceid": 1130, "authors": [{"given_name": "Serguei", "family_name": "Semenov", "institution": null}, {"given_name": "Irina", "family_name": "Shuvalova", "institution": null}]}