{"title": "Harmony Networks Do Not Work", "book": "Advances in Neural Information Processing Systems", "page_first": 31, "page_last": 37, "abstract": null, "full_text": "Harmony Networks Do Not Work\n\nRene Gourley\n\nSchool of Computing Science\n\nSimon Fraser University\n\nBurnaby, B.C., V5A 1S6, Canada\n\ngourley@mprgate.mpr.ca\n\nAbstract\n\nHarmony networks have been proposed as a means by which connectionist models can perform symbolic computation. Indeed, proponents claim that a harmony network can be built that constructs parse trees for strings in a context-free language. This paper shows that harmony networks do not work in the following sense: they construct many outputs that are not valid parse trees.\n\nIn order to show that the notion of systematicity is compatible with connectionism, Paul Smolensky, Geraldine Legendre and Yoshiro Miyata (Smolensky, Legendre, and Miyata 1992; Smolensky 1993; Smolensky, Legendre, and Miyata 1994) proposed a mechanism, \"Harmony Theory,\" by which connectionist models purportedly perform structure-sensitive operations without implementing classical algorithms. Harmony theory describes a \"harmony network\" which, in the course of reaching a stable equilibrium, apparently computes parse trees that are valid according to the rules of a particular context-free grammar.\n\nHarmony networks consist of four major components, which will be explained in detail in Section 1. The four components are:\n\nTensor Representation: A means to interpret the activation vector of a connectionist system as a parse tree for a string in a context-free language.\n\nHarmony: A function that maps all possible parse trees to the non-positive integers so that a parse tree is valid if and only if its harmony is zero. 
\n\nEnergy: A function that maps the set of activation vectors to the real numbers and which is minimized by certain connectionist networks (see footnote 1).\n\nRecursive Construction: A system for determining the weight matrix of a connectionist network so that if its activation vector is interpreted as a parse tree, then the network's energy is the negation of the harmony of that parse tree.\n\nFootnote 1: Smolensky, Legendre and Miyata use the term \"harmony\" to refer to both energy and harmony. To distinguish between them, we will use the term that is often used to describe the Lyapunov function of dynamic systems, \"energy\" (see for example Golden 1986).\n\nSmolensky et al. contend that, in the process of minimizing their energy values, harmony networks implicitly maximize the harmony of the parse tree represented by their activation vector. Thus, if the harmony network reaches a stable equilibrium where the energy is equal to zero, the parse tree that is represented by the activation vector must be a valid parse tree:\n\nWhen the lower-level description of the activation-spreading process satisfies certain mathematical properties, this process can be analyzed on a higher level as the construction of that structure including the given input structure which maximizes Harmony. (Smolensky 1993, p848, emphasis is original)\n\nUnfortunately, harmony networks do not work: they do not always construct maximum-harmony parse trees. The problem is that the energy function is defined on the values of the activation vector. By contrast, the harmony function is defined on possible parse trees. Section 2 of this paper shows that these two domains are not equal, that is, there are some activation vectors that do not represent any parse tree. 
\n\nThe recursive construction merely guarantees that the energy function passes through zero at the appropriate points; its minima are unrestricted. So, while it may be the case that the energy and harmony functions are negations of one another, it is not always the case that a local minimum of one is a local maximum of the other. More succinctly, the harmony network will find minima that are not even trees, let alone valid parse trees.\n\nThe reason why harmony networks do not work is straightforward. Section 3 shows that the weight matrix must have only negative eigenvalues, for otherwise the network constructs structures which are not valid trees. Section 4 shows that if the weight matrix has only negative eigenvalues, then the energy function admits only a single zero: the origin. Furthermore, we show that the origin cannot be interpreted as a valid parse tree. Thus, the stable points of a harmony network are not valid parse trees.\n\n1 HARMONY NETWORKS\n\n1.1 TENSOR REPRESENTATION\n\nHarmony theory makes use of tensor products (Smolensky 1990; Smolensky, Legendre, and Miyata 1992; Legendre, Miyata, and Smolensky 1991) to convolve symbols with their roles. The resulting products are then added to represent a labelled tree using the harmony network's activation vector. The particular tensor product used is very simple:\n\n$$(a_1, a_2, \\ldots, a_n) \\otimes (b_1, b_2, \\ldots, b_m) = (a_1 b_1, a_1 b_2, \\ldots, a_1 b_m, a_2 b_1, a_2 b_2, \\ldots, a_2 b_m, \\ldots, a_n b_m)$$\n\nIf two tensors of differing dimensions are to be added, then they are essentially concatenated.\n\nBinary trees are represented with this tensor product using the following recursive rules:\n\n1. The tensor representation of a tree containing no vertices is 0. 
\n\n2. If $A$ is the root of a tree, and $T_L$, $T_R$ are the tensor product representations of its left subtree and right subtree respectively, then $A + T_L \\otimes r_l + T_R \\otimes r_r$ is the tensor representation of the whole tree.\n\nThe vectors $r_l$ and $r_r$ are called \"role vectors\" and indicate the roles of left child and right child.\n\nTable 1: Rules for determining harmony and the weight matrix. Let $G = (V, \\Sigma, P, S)$ be a context-free grammar of the type suggested in Section 1.2. The rules for determining the harmony of a tree labelled with $V$ and $\\Sigma$ are shown in the second column. The rules for determining the system of equations for recursive construction are shown in the third column. (Smolensky, Legendre, and Miyata 1992; Smolensky 1993)\n\nGrammar Element | Harmony Rule | Energy Equation\n$S$ | For every node labelled $S$ add $-1$ to $H(T)$. | Include $(S + 0 \\otimes r_l) W_{root} (S + 0 \\otimes r_l) = 2$ in the system of equations.\n$x \\in \\Sigma$ | For every node labelled $x$ add $-1$ to $H(T)$. | Include $(x + 0 \\otimes r_l) W_{root} (x + 0 \\otimes r_l) = 2$ in the system of equations.\n$x \\in V \\setminus \\{S\\}$ | For every node labelled $x$ add $-2$ or $-3$ to $H(T)$, depending on whether or not $x$ appears on the left of a production with two symbols on the right. | Include $(x + 0 \\otimes r_l) W_{root} (x + 0 \\otimes r_l) = 4$ or $6$ in the system of equations, depending on whether or not $x$ appears on the left of a production with two symbols on the right.\n$x \\to yz$ or $x \\to y \\in P$ | For every edge where $x$ is the parent and $y$ is the left child add 2. Similarly, add 2 every time $z$ is the right child of $x$. | Include in the system of equations: $(x + 0 \\otimes r_l) W_{root} (0 + y \\otimes r_l) = -2$; $(0 + y \\otimes r_l) W_{root} (x + 0 \\otimes r_l) = -2$; $(x + 0 \\otimes r_l) W_{root} (0 + z \\otimes r_l) = -2$; $(0 + z \\otimes r_l) W_{root} (x + 0 \\otimes r_l) = -2$. 
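\n\nAs a worked illustration of the two recursive rules of Section 1.1, consider a tree whose root $A$ has a left child $B$ and a right child $C$, both leaves. The numeric vectors below are hypothetical choices made only for this sketch; they are not taken from Smolensky et al. Since empty subtrees contribute 0, the leaves are represented by $B$ and $C$ themselves, and terms of differing dimension are concatenated (marked here with a semicolon):\n\n$$A = (1, 0), \\quad B = (0, 1), \\quad C = (1, 1), \\quad r_l = (1, 0), \\quad r_r = (0, 1)$$\n$$B \\otimes r_l = (0, 0, 1, 0), \\qquad C \\otimes r_r = (0, 1, 0, 1)$$\n$$A + B \\otimes r_l + C \\otimes r_r = (1, 0; 0, 1, 1, 1)$$\n\nThe two depth-one terms have the same dimension and are added componentwise, while the root symbol, having a different dimension, occupies its own block of the activation vector. 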
\n\n1.2 HARMONY\n\nHarmony (Legendre, Miyata, and Smolensky 1990; Smolensky, Legendre, and Miyata 1992) describes a way to determine the well-formedness of a potential parse tree with respect to a particular context-free grammar. Without loss of generality, we can assume that the right-hand side of each production has at most two symbols, and if a production has two symbols on the right, then it is the only production for the variable on its left side. For a given binary tree, $T$, we compute the harmony of $T$, $H(T)$, by first adding the negative contributions of all the nodes according to their labels, and then adding the contributions of the edges (see the first two columns of Table 1).\n\n1.3 ENERGY\n\nUnder certain conditions, some connectionist models are known to admit the following energy or Lyapunov function (see Legendre, Miyata, and Smolensky 1991):\n\n$$E(a) = -\\frac{1}{2} a^t W a$$\n\nHere, $W$ is the weight matrix of the connectionist network, and $a$ is its activation vector. Every non-equilibrium change in the activation vector results in a strict decrease in the network's energy. In effect, the connectionist network serves to minimize its energy as it moves towards equilibrium.\n\n1.4 RECURSIVE CONSTRUCTION\n\nSmolensky, Legendre, and Miyata (1992) proposed that the recursive structure of their tensor representations together with the local nature of the harmony calculation could be used to construct the weight matrix for a network whose energy function is the negation of the harmony of the tree represented by the activation vector. First construct a matrix $W_{root}$ which satisfies a system of equations. 
The system of equations is found by including equations for every symbol and production in the grammar, as shown in column three of Table 1. Gourley (1995) shows that if $W$ is constructed from copies of $W_{root}$ according to a particular formula, and if $a_T$ is a tensor representation for a tree, $T$, then $E(a_T) = -H(T)$.\n\n2 SOME ACTIVATIONS ARE NOT TREES\n\nAs noted above, the reason why harmony networks do not work is that they seek minima in their state space which may not coincide with parse tree representations. One way to ameliorate this would be to make every possible activation vector represent some parse tree. If every activation vector represents some parse tree, then the rules that determine the weight matrix will ensure that the energy minima agree with the valid parse trees. Unfortunately, in that case, the system of equations used to determine $W_{root}$ has no solution.\n\nIf every activation vector is to represent some parse tree, and the symbols of the grammar are two dimensional, then there are symbols represented by each vector, $(x_1, x_1)$, $(x_1, x_2)$, $(x_2, x_1)$, and $(x_2, x_2)$, where $x_1 \\neq x_2$. These symbols must satisfy the equations given in Table 1, and so,\n\n$$x_1^2 (W_{root,11} + W_{root,12} + W_{root,21} + W_{root,22}) = h_1$$\n$$x_1^2 W_{root,11} + x_1 x_2 W_{root,12} + x_1 x_2 W_{root,21} + x_2^2 W_{root,22} = h_2$$\n$$x_2^2 W_{root,11} + x_1 x_2 W_{root,12} + x_1 x_2 W_{root,21} + x_1^2 W_{root,22} = h_3$$\n$$x_2^2 (W_{root,11} + W_{root,12} + W_{root,21} + W_{root,22}) = h_4$$\n\nBecause $h_i \\in \\{2, 4, 6\\}$, there must be a pair $h_i$, $h_j$ which are equal. In that case, it can be shown using Gaussian elimination that there is no solution for $W_{root,11}$, $W_{root,12}$, $W_{root,21}$, $W_{root,22}$. Similarly, if the symbols are represented by vectors of dimension three or greater, the same contradiction occurs. 
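\n\nOne case of this contradiction can be worked out directly. The following is only a sketch under stated assumptions, not the full elimination referred to in the text: write $s = W_{root,11} + W_{root,12} + W_{root,21} + W_{root,22}$, assume that the symbols $(x_1, x_1)$ and $(x_2, x_2)$ are the pair that receives equal right-hand sides, $h_1 = h_4 = h$, and assume further that $x_1^2 \\neq x_2^2$. Then:\n\n$$x_1^2 s = h, \\qquad x_2^2 s = h \\quad \\Rightarrow \\quad (x_1^2 - x_2^2) s = 0 \\quad \\Rightarrow \\quad s = 0 \\quad \\Rightarrow \\quad h = 0 \\notin \\{2, 4, 6\\}$$\n\nOther equal pairs are not covered by this short calculation; it only illustrates the kind of inconsistency involved. 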
\n\nThus there are some activation vectors that do not represent any tree, valid or invalid. The question now becomes one of determining whether all of the harmony network's stable equilibria are valid parse trees.\n\nFigure 1: Energy functions of two-dimensional harmony networks (panels a and b). In each case, the points i and f respectively represent an initial and a final state of the network. In a, one eigenvalue is positive and the other is negative; the hashed plane represents the plane $E = 0$, which intersects the energy function and the vertical axis at the origin. In b, one eigenvalue is negative while the other is zero; the heavy line represents the intersection of the surface with the plane $E = 0$, and it intersects the vertical axis at the origin.\n\n3 NON-NEGATIVE EIGENVECTORS YIELD NON-TREES\n\nIf any of the eigenvalues of the weight matrix, $W$, is positive, then it is easy to show that the harmony network will seek a stable equilibrium that does not represent a parse tree at all. Let $\\lambda > 0$ be a positive eigenvalue of $W$, and let $e$ be an eigenvector, corresponding to $\\lambda$, that falls within the state space. Then,\n\n$$E(e) = -\\frac{1}{2} e^t W e = -\\frac{1}{2} \\lambda e^t e < 0.$$\n\nBecause the energy drops below zero, the harmony network would have to undergo an energy increase in order to find a zero-energy stable equilibrium. This cannot happen, and so, the network reaches an equilibrium with energy strictly less than zero.\n\nFigure 1a illustrates the energy function of a harmony network where one eigenvalue is positive. Because harmony is the negation of energy, in this figure all the valid parse trees rest on the hashed plane, and all the invalid parse trees are above it. 
As we can see, the harmony network with positive eigenvalues will certainly find stable equilibria which are not valid parse tree representations.\n\nNow, suppose $W$, the weight matrix, has a zero eigenvalue. If $e$ is an eigenvector corresponding to that eigenvalue, then for every real $\\alpha$, $\\alpha W e = 0$. Consequently, one of the following must be true:\n\n1. $\\alpha e$ is not a stable equilibrium. In that case, the energy function must drop below zero, yielding a sub-zero stable equilibrium, that is, a stable equilibrium that does not represent any tree.\n\n2. $\\alpha e$ is a stable equilibrium. Then for every $\\alpha$, $\\alpha e$ must be a valid tree representation. Such a situation is represented in Figure 1b, where the set of all points $\\alpha e$ is represented by the heavy line. This implies that there is a symbol, $(a_1, a_2, \\ldots, a_n)$, such that $\\alpha_1 (a_1, a_2, \\ldots, a_n), \\alpha_2 (a_1, a_2, \\ldots, a_n), \\ldots, \\alpha_{n^2+1} (a_1, a_2, \\ldots, a_n)$ are also all symbols. As before, this implies that $W_{root}$ must satisfy the equation,\n\n$$((a_1, \\ldots, a_n) + 0 \\otimes r_l)^t W_{root} ((a_1, \\ldots, a_n) + 0 \\otimes r_l) = \\frac{h_i}{\\alpha_i^2}, \\quad h_i \\in \\{2, 4, 6\\}$$\n\nfor $i = 1 \\ldots n^2 + 1$. Again using Gaussian elimination, it can be shown that there is no solution to this system of equations.\n\nFigure 2: The energy function of a two-dimensional harmony network where both eigenvalues are negative. The vertical axis pierces the surface at the origin, and the points i and f respectively represent an initial and a final state of the network.\n\nIn either case, the harmony network admits stable equilibria that do not represent any tree. Thus, the eigenvalues must all be negative. 
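\n\nThe positive-eigenvalue argument at the start of this section can be checked on a small example. The diagonal matrix below is a hypothetical choice for illustration only, not a weight matrix produced by the recursive construction. With $E(a) = -\\frac{1}{2} a^t W a$:\n\n$$W = \\begin{pmatrix} 1 & 0 \\\\ 0 & -1 \\end{pmatrix}, \\qquad E(a_1, a_2) = -\\frac{1}{2} (a_1^2 - a_2^2)$$\n$$e = (1, 0), \\quad \\lambda = 1: \\qquad E(\\alpha e) = -\\frac{1}{2} \\alpha^2 < 0 \\quad \\text{for every } \\alpha \\neq 0$$\n\nEvery nonzero point on this eigenvector has energy below zero, so a trajectory that reaches such a point can no longer return to a zero-energy state, in line with the saddle shape described for Figure 1a. 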
\n\n4 NEGATIVE EIGENVECTORS YIELD NON-TREES\n\nIf all the eigenvalues of the weight matrix are negative, then the energy function has a very special shape: it is a paraboloid centered on the origin and concave in the direction of positive energy. This is easily seen by considering the first and second derivatives of $E$:\n\n$$\\frac{\\partial E(x)}{\\partial x_i} = -\\sum_j W_{i,j} x_j, \\qquad \\frac{\\partial^2 E(x)}{\\partial x_i \\partial x_j} = -W_{i,j}$$\n\nClearly, all the first derivatives are zero at the origin, and so, it is a critical point. Now the origin is a strict minimum if all the roots of the following well-known equation are positive:\n\n$$0 = \\det \\left| \\frac{\\partial^2 E(x)}{\\partial x_i \\partial x_j} - \\lambda I \\right| = \\det |-W - \\lambda I|$$\n\n$\\det |-W - \\lambda I|$ is the characteristic polynomial of $-W$. If $\\lambda$ is a root then it is an eigenvalue of $-W$, or equivalently, it is the negative of an eigenvalue of $W$. Because all of $W$'s eigenvalues are negative, the origin is a strict minimum, and indeed it is the only minimum. Such a harmony network is illustrated in Figure 2.\n\nThus the origin is the only stable point where the energy is zero, but it cannot represent a parse tree which is valid for the grammar. If it does, then\n\n$$S + T_L \\otimes r_l + T_R \\otimes r_r = (0, \\ldots, 0)$$\n\nwhere $T_L$, $T_R$ are appropriate left and right subtree representations, and $S$ is the start symbol of the grammar. Because each of the subtrees is multiplied by either $r_l$ or $r_r$, they are not the same dimension as $S$, and are consequently concatenated instead of added. Therefore $S = 0$. But then, $W_{root}$ must satisfy the equation\n\n$$(0 + 0 \\otimes r_l) W_{root} (0 + 0 \\otimes r_l) = -2$$\n\nThis is impossible, and so, the origin is not a valid tree representation.\n\n5 CONCLUSION\n\nThis paper has shown that in every case, a harmony network will reach stable equilibria that are not valid parse trees. 
This is not unexpected. Because the energy function is a very simple function, it would be more surprising if such a connectionist system could construct complicated structures such as parse trees for a context-free grammar.\n\nAcknowledgements\n\nThe author thanks Dr. Robert Hadley and Dr. Arvind Gupta, both of Simon Fraser University, for their invaluable comments on a draft of this paper.\n\nReferences\n\nGolden, R. (1986). The 'brain-state-in-a-box' neural model is a gradient descent algorithm. Journal of Mathematical Psychology 30, 73-80.\nGourley, R. (1995). Tensor representations and harmony theory: A critical analysis. Master's thesis, Simon Fraser University, Burnaby, Canada. In preparation.\nLegendre, G., Y. Miyata, and P. Smolensky (1990). Harmonic grammar - a formal multi-level connectionist theory of linguistic well-formedness: Theoretical foundations. In Proceedings of the Twelfth National Conference on Cognitive Science, Cambridge, MA, pp. 385-395. Lawrence Erlbaum.\nLegendre, G., Y. Miyata, and P. Smolensky (1991). Distributed recursive structure processing. In B. Mayoh (Ed.), Proceedings of the 1991 Scandinavian Conference on Artificial Intelligence, Amsterdam, pp. 47-53. IOS Press.\nSmolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence 46, 159-216.\nSmolensky, P. (1993). Harmonic grammars for formal languages. In S. Hanson, J. Cowan, and C. Giles (Eds.), Advances in Neural Information Processing Systems 5, pp. 847-854. San Mateo: Morgan Kaufmann.\nSmolensky, P., G. Legendre, and Y. Miyata (1992). Principles for an integrated connectionist/symbolic theory of higher cognition. Technical Report CU-CS-600-92, University of Colorado Computer Science Department. 
\nSmolensky, P., G. Legendre, and Y. Miyata (1994). Integrating connectionist and symbolic computation for the theory of language. In V. Honavar and L. Uhr (Eds.), Artificial Intelligence and Neural Networks: Steps Toward Principled Integration, pp. 509-530. Boston: Academic Press.", "award": [], "sourceid": 1160, "authors": [{"given_name": "Ren\u00e9", "family_name": "Gourley", "institution": null}]}