{"title": "Dynamically-Adaptive Winner-Take-All Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 341, "page_last": 348, "abstract": null, "full_text": "Dynamically-Adaptive Winner-Take-All Networks\n\nTrent E. Lange\n\nArtificial Intelligence Laboratory\nComputer Science Department\nUniversity of California, Los Angeles, CA 90024\n\nAbstract\n\nWinner-Take-All (WTA) networks, in which inhibitory interconnections are used to determine the most highly-activated of a pool of units, are an important part of many neural network models. Unfortunately, convergence of normal WTA networks is extremely sensitive to the magnitudes of their weights, which must be hand-tuned and which generally only provide the right amount of inhibition across a relatively small range of initial conditions. This paper presents Dynamically-Adaptive Winner-Take-All (DAWTA) networks, which use a regulatory unit to provide the competitive inhibition to the units in the network. The DAWTA regulatory unit dynamically adjusts its level of activation during competition to provide the right amount of inhibition to differentiate between competitors and drive a single winner. This dynamic adaptation allows DAWTA networks to perform the winner-take-all function for nearly any network size or initial condition, using O(N) connections. In addition, the DAWTA regulatory unit can be biased to find the level of inhibition necessary to settle upon the K most highly-activated units, and therefore serve as a K-Winners-Take-All network.\n\n1. INTRODUCTION\n\nWinner-Take-All networks are fixed groups of units which compete by mutual inhibition until the unit with the highest initial activation or input level suppresses the activation of all the others. 
Winner-take-all selection of the most highly-activated unit is an important part of many neural network models (e.g. McClelland and Rumelhart, 1981; Feldman and Ballard, 1982; Kohonen, 1984; Touretzky, 1989; Lange and Dyer, 1989a,b).\n\nUnfortunately, successful convergence in winner-take-all networks is extremely sensitive to the magnitudes of the inhibitory weights between units and other network parameters. For example, a weight value for the mutually-inhibitory connections allowing the most highly-activated unit to suppress the other units in one initial condition (e.g. Figure 1a) may not provide enough inhibition to select a single winner if the initial input activation levels are closer together and/or higher (e.g. Figure 1b). On the other hand, if the competition involves a larger number of active units, then the same inhibitory weights may provide too much inhibition and either suppress the activations of all units or lead to oscillations (e.g. Figure 1c).\n\nFigure 1. Several plots of activation versus time for different initial conditions in a winner-take-all network in which there is a bidirectional inhibitory connection of weight -0.2 between every pair of units. Unit activation function is that from the interactive activation model of McClelland and Rumelhart (1981). (a) Network in which five units are given an input self bias ranging from 0.10 to 0.14. (b) Network in which five units are given an input self bias ranging from 0.50 to 0.54. Note that the network ended up with three winners because the inhibitory connections of weight -0.2 did not provide enough inhibition to suppress the second and third most-active nodes. (c) Network in which 100 units are given an input self bias ranging from 0.01 to 0.14. The combined activation of all 100 nodes through the inhibitory weight of -0.2 provides far too much inhibition, causing the network to overreact and oscillate wildly.\n\nBecause of these problems, it is generally necessary to hand-tune network parameters to allow for successful winner-take-all performance in a given neural network architecture having certain expected levels of incoming activations. For complex networks, this can require a detailed mathematical analysis of the model (cf. Touretzky & Hinton, 1988) or a heuristic, computer-assisted trial-and-error search process (cf. Reggia, 1989) to find the values of inhibitory weights, unit thresholds, and other network parameters necessary for clear-cut winner-take-all performance in a given model's input space. In some cases, however, no set of constant network parameters can be found to handle the range of possible initial conditions a model may be faced with (Barnden, Kankanahalli, and Dharmavaratha, 1990), such as when the numbers of units actually competing in a given network may be two at one time and thousands at another (e.g. Barnden, 1990; Lange, in press).\n\nThis paper presents a new variant of winner-take-all networks, the Dynamically-Adaptive Winner-Take-All (DAWTA) network. DAWTA networks, using O(N) connections, 
are able to robustly act as winner-take-all networks for nearly any network initial condition without any hand-tuning of network parameters. In essence, the DAWTA network dynamically \"tunes\" itself by adjusting the level of inhibition sent to each unit in the network depending upon feedback from the current conditions of the competition. In addition, a biasing activation can be added to the network to allow it to act as a K-Winners-Take-All network (cf. Majani, Erlanson, and Abu-Mostafa, 1989), in which the K most highly-activated units end up active.\n\n2. DYNAMICALLY-ADAPTIVE WTA NETWORKS\n\nThe basic idea behind the Dynamically-Adaptive Winner-Take-All mechanism can be described by looking at a version of a winner-take-all network that is functionally equivalent to a normal winner-take-all network but which uses only O(N) connections. Several researchers have pointed out that the $(N^2 - N)/2$ bidirectional inhibitory connections (each of weight $-w_I$) normally needed in a winner-take-all network can be replaced by an excitatory self-connection of weight $w_I$ for each unit and a single regulatory unit that sums up the activations of all N units and inhibits each of them by $-w_I$ times that amount (Touretzky & Hinton, 1988; Majani et al., 1989) (see Figure 2).\n\nWhen viewed in this fashion, the mutually inhibitory connections of winner-take-all networks can be seen as a regulator (i.e. the regulatory unit) that is attempting to provide the right amount of inhibition to the network to allow the winner-to-be unit's activation to grow while suppressing the activations of all others. This is exactly what happens when $w_I$ has been chosen correctly for the activations of the network (as in Figure 1a). 
However, because the amount of this regulatory inhibition is fixed precisely by that inhibitory weight (i.e. always equal to that weight times the sum of the network activations), there is no way for it to increase when it is not enough (as in Figure 1b) or decrease when it is too much (as in Figure 1c).\n\n2.1. THE DAWTA REGULATORY UNIT\n\nFrom the point of view of the competing units' inputs, the Dynamically-Adaptive Winner-Take-All network is equivalent to the regulatory-unit simplification of a normal winner-take-all network. Each unit has an excitatory connection to itself and an inhibitory connection from a regulatory unit whose function is to suppress the activations of all but the winning unit (footnote 1). However, the regulatory unit itself, and how it calculates the inhibition it provides to the network, is different.\n\nFigure 2. Simplification of a standard WTA network using O(N) connections by introduction of a regulatory unit (top node) that sums up the activations of all network units. Each unit has an excitatory connection to itself and an inhibitory connection of weight $-w_I$ from the regulatory unit. Shading of units (darker = higher) represents their levels of activation at a hypothetical time in the middle of network cycling.\n\nWhereas the connections to the regulatory unit in a normal winner-take-all network cause it to produce an inhibitory activation (i.e. the sum of the units' activations) that happens to work if its inhibitory weights were set correctly, the structure of connections to the regulatory unit in a dynamically-adaptive winner-take-all network causes it to continually adjust its level of activation until the right amount of inhibition is found, regardless of the network's initial conditions. 
As the network cycles and the winner-take-all is being performed, the DAWTA regulatory unit's activation inhibits the network's units, which results in feedback to the regulatory unit that causes it to increase its activation if more inhibition is required to induce a single winner, or decrease its activation if less is required. Accordingly, the DAWTA regulatory unit's activation $a_R(t)$ now includes its previous activation, and is the following:\n\n$$a_R(t+1) = \\begin{cases} a_R(t) - \\delta & \\text{if } net_R(t+1) \\le -\\delta \\\\ a_R(t) + net_R(t+1) & \\text{if } -\\delta < net_R(t+1) < \\delta \\\\ a_R(t) + \\delta & \\text{if } net_R(t+1) \\ge \\delta \\end{cases}$$\n\nwhere $net_R(t+1)$ is the total net input to the regulator at time $t+1$, and $\\delta$ is a small constant (typically 0.05) whose purpose is to stop the regulatory unit's activation from rising or falling too rapidly on any given cycle. Figure 3 shows the actual Dynamically-Adaptive Winner-Take-All network. As in Figure 2, the regulatory unit is the unit at the top and the competing units are the circular units at the bottom that are inhibited by it and which have connections (of weight $w_s$) to themselves. However, there are now two\n\nFootnote 1: As in all winner-take-all networks, the competing units may also have inputs from outside the network that provide the initial activations driving the competition.\n\nFigure 3. Dynamically-Adaptive Winner-Take-All Network at a hypothetical time in the middle of network cycling. The topmost unit is the DAWTA regulatory unit, whose outgoing connections to all of the competing units at the bottom all have weight -1. The input $-k w_t$ is a constant self biasing activation to the regulatory unit whose value determines how many winners it will try to drive. The two middle units are simple linear summation units each having inputs of unit weight that calculate the total activation of the competing units at time $t$ and time $t-1$, respectively. 
\n\nintermediate units that calculate the net inputs that increase or decrease the regulatory unit's inhibitory activation depending on the state of the competition. These inputs cause the regulatory unit to receive a net input $net_R(t+1)$ which simplifies to:\n\n$$net_R(t+1) = w_t (o_t(t-1) - k) + w_d (o_t(t-1) - o_t(t-2))$$\n\nwhere $o_t(t)$ is the total summed output of all of the competing units (calculated by the intermediate units shown), $w_t$ and $w_d$ are constant weights, and $k$ is the number of winners the network is attempting to seek (1 to perform a normal winner-take-all).\n\nThe effect of the above activation function and the connections shown in Figure 3 is to apply two different activation pressures on the regulatory unit, which combine over time to drive the DAWTA regulatory unit's activation to find the right level of inhibition to suppress all but the winning unit. The most important pressure, and the key to the DAWTA regulatory unit's success, is that the regulatory unit's activation increases in proportion to $w_t$ if there is too much activation in the network, and decreases in the same proportion if there is not enough activation in the network. This is the result of the term $w_t (o_t(t-1) - k)$ in its net input function, which simplifies to $w_t (o_t(t-1) - 1)$ when $k$ equals 1. The \"right amount\" of total activation in the network is simply the total summed activation of the goal state, i.e. the winner-take-all network state in which there is one active unit (having activation 1) and in which all other competing units have been driven down to an activation of 0, leaving the total network activation $o_t(t)$ equal to 1. The term $w_t (o_t(t-1) - 1)$ of the regulatory unit's net input will therefore tend to increase the regulatory unit's activation if there are too many units active in the network (e.g. 
if there are three units with activity 0.7, 0.5, and 0.3, since the total output $o_t(t)$ will be 1.5), to decrease its activation if there is not enough total activation in the network (e.g. one unit with activation 0.2 and the rest with activation 0.0), and to leave its activation unchanged if the total activation is the same as the final goal activation. Note that any temporary coincidences in which the total network activation sums to 1 but which are not the final winner-take-all state (e.g. when one unit has activation 0.6 and another has activation 0.4) will be broken by the competing units themselves, since the winning unit's activation will always rise more quickly than the loser's just by its own activation function (e.g. that of McClelland and Rumelhart, 1981).\n\nThe other pressure on the DAWTA regulatory unit, from the $w_d (o_t(t-1) - o_t(t-2))$ term of $net_R(t+1)$, is to tend to decrease the regulator's activation if the overall network activation is falling too rapidly, or to increase it if the overall network activation is rising too rapidly. This is essentially a damping term to avoid oscillations in the network in the early stages of the winner-take-all, in which there may be many active units whose activations are falling rapidly (due to inhibition from the regulatory unit), but in which the total network activation is still above the final goal activation. As can be seen, this second term of the regulatory unit's net input will also sum to 0 and therefore leave the regulatory unit's activation unchanged when the goal state of the network has been reached, since the total activation of the network in the winner-take-all state will remain constant.\n\nAll of the weights and connections of the DAWTA network are constant parameters that are the same for any size network or set of initial network conditions. 
Typically we have used $w_t = 0.025$ and $w_d = 0.5$. The actual values are not critical, as long as $w_d \\gg w_t$, which assures that $w_d$ is high enough to dampen the rapid rise or fall in total network activation sometimes caused by the direct pressure of $w_t$. The value of the regulatory unit's self bias term $k w_t$, which sets the goal total network activation that the regulatory unit attempts to reach, is determined simply by $k$, the number of winners desired (1 for a normal winner-take-all network), and $w_t$.\n\n3. RESULTS\n\nDynamically-adaptive winner-take-all networks have been tested in the DESCARTES connectionist simulator (Lange, 1990) and used in our connectionist model of short-term sequential memory (Lange, in press). Figures 4a-c show the plots of activation versus time in networks given the same initial conditions as those of the normal winner-take-all network shown in Figures 1a-c. Note that in each case the regulatory unit's activation starts off at zero and increases until it reaches a level that provides sufficient inhibition to start driving the winner-take-all. So whereas the inhibitory weights of -0.2 that worked for inputs ranging from 0.10 to 0.14 in the winner-take-all network in Figure 1a could not provide enough inhibition to drive a single winner when the inputs were over 0.5 (Figure 1b), the DAWTA regulatory unit simply increases its activation level until the inhibition it provides is sufficient to start suppressing the eventual losers (Figures 4a and 4b). As can also be seen in the figures, the activation of the regulatory unit tends to vary over time with different feedback from the network in a process that maximizes differentiation between units while assuring that the group of remaining potential winners stays active and is not over-inhibited. 
\n\nFigure 4. Plots of activation versus time in a dynamically-adaptive winner-take-all network given the same activation functions and initial conditions as the winner-take-all plots in Figure 1. The grey background plot shows the activation level of the regulatory unit. (a) With five units activated with self-biases from 0.10 to 0.14. (b) With five units activated with self-biases from 0.50 to 0.54. (c) With 100 units activated with self-biases from 0.01 to 0.14.\n\nFinally, though there is not space to show the graphic results here, the same DAWTA networks have been simulated to drive a successful winner-take-all within 200 cycles on networks ranging in size from 2 to 10,000 units and on initial conditions ranging from a winning-unit input of 0.000001 to a winning-unit input of 0.999, without tuning the network in any way. The same networks have also been successfully simulated to act as K-winners-take-all networks (i.e. to select the K most active units) by simply setting the desired value for $k$ in the DAWTA's self bias term $k w_t$.\n\n4. CONCLUSIONS\n\nWe have presented Dynamically-Adaptive Winner-Take-All networks, which use O(N) connections to perform the winner-take-all function. 
Unlike normal winner-take-all networks, DAWTA networks are able to select the most highly-activated unit out of a group of units for nearly any network size and initial condition without tuning any network parameters. They are able to do so because the inhibition that drives the winner-take-all network is provided by a regulatory unit that is constantly getting feedback from the state of the network and dynamically adjusting its level to provide the right amount of inhibition to differentiate the winning unit from the losers. An important side-feature of this dynamically-adaptive inhibition approach is that it can be biased to select the K most highly-activated units, and therefore serve as a K-winners-take-all network.\n\nReferences\n\nBarnden, J. (1990). The power of some unusual connectionist data-structuring techniques. In J. A. Barnden and J. B. Pollack (Eds.), Advances in connectionist and neural computation theory, Norwood, NJ: Ablex.\n\nBarnden, J., Kankanahalli, S., & Dharmavaratha, D. (1990). Winner-take-all networks: Time-based versus activation-based mechanisms for various selection tasks. Proceedings of the IEEE International Symposium on Circuits and Systems, New Orleans, LA.\n\nFeldman, J. A. & Ballard, D. H. (1982). Connectionist models and their properties. Cognitive Science, 6, 205-254.\n\nKohonen, T. (1984). Self-organization and associative memory. New York: Springer-Verlag, Berlin.\n\nLange, T. (1990). Simulation of heterogeneous neural networks on serial and parallel machines. Parallel Computing, 14, 287-303.\n\nLange, T. (in press). Hybrid connectionist models: Temporary bridges over the gap between the symbolic and the subsymbolic. To appear in J. Dinsmore (ed.), Closing the Gap: Symbolic vs. Subsymbolic Processing. Hillsdale, NJ: Lawrence Erlbaum Associates. 
\n\nLange, T. & Dyer, M. G. (1989a). Dynamic, non-local role-bindings and inferencing in a localist network for natural language understanding. In David S. Touretzky, editor, Advances in Neural Information Processing Systems I, p. 545-552, Morgan Kaufmann, San Mateo, CA.\n\nLange, T. & Dyer, M. G. (1989b). High-level inferencing in a connectionist network. Connection Science, 1 (2), 181-217.\n\nMajani, E., Erlanson, R. & Abu-Mostafa, Y. (1989). On the k-winners-take-all network. In David S. Touretzky, editor, Advances in Neural Information Processing Systems I, p. 634-642, Morgan Kaufmann, San Mateo, CA.\n\nMcClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychological Review, 88, 375-407.\n\nReggia, J. A. (1989). Methods for deriving competitive activation mechanisms. Proceedings of the First Annual International Joint Conference on Neural Networks.\n\nTouretzky, D. (1989). Analyzing the energy landscapes of distributed winner-take-all networks. In David S. Touretzky, editor, Advances in Neural Information Processing Systems I, p. 626-633, Morgan Kaufmann, San Mateo, CA.\n\nTouretzky, D., & Hinton, G. (1988). A distributed connectionist production system. Cognitive Science, 12, 423-466.\n", "award": [], "sourceid": 507, "authors": [{"given_name": "Trent", "family_name": "Lange", "institution": null}]}