{"title": "Neural Computation with Winner-Take-All as the Only Nonlinear Operation", "book": "Advances in Neural Information Processing Systems", "page_first": 293, "page_last": 299, "abstract": null, "full_text": "Neural Computation with Winner-Take-All as the Only Nonlinear Operation\n\nWolfgang Maass\n\nInstitute for Theoretical Computer Science\n\nTechnische Universit\u00e4t Graz\n\nA-8010 Graz, Austria\n\nemail: maass@igi.tu-graz.ac.at\n\nhttp://www.cis.tu-graz.ac.at/igi/maass\n\nAbstract\n\nEverybody \"knows\" that neural networks need more than a single layer of nonlinear units to compute interesting functions. We show that this is false if one employs winner-take-all as the nonlinear unit:\n\n\u2022 Any boolean function can be computed by a single k-winner-take-all unit applied to weighted sums of the input variables.\n\n\u2022 Any continuous function can be approximated arbitrarily well by a single soft winner-take-all unit applied to weighted sums of the input variables.\n\n\u2022 Only positive weights are needed in these (linear) weighted sums. This may be of interest from the point of view of neurophysiology, since only 15% of the synapses in the cortex are inhibitory. In addition, it is widely believed that there are special microcircuits in the cortex that compute winner-take-all. 
\n\n\u2022 Our results support the view that winner-take-all is a very useful basic computational unit in Neural VLSI:\n\n\u25e6 it is well known that winner-take-all of n input variables can be computed very efficiently with 2n transistors (and a total wire length and area that is linear in n) in analog VLSI [Lazzaro et al., 1989]\n\n\u25e6 we show that winner-take-all is not just useful for special purpose computations, but may serve as the only nonlinear unit for neural circuits with universal computational power\n\n\u25e6 we show that any multi-layer perceptron needs quadratically in n many gates to compute winner-take-all for n input variables, hence winner-take-all provides a substantially more powerful computational unit than a perceptron (at about the same cost of implementation in analog VLSI).\n\nComplete proofs and further details for these results can be found in [Maass, 2000].\n\n1 Introduction\n\nComputational models that involve competitive stages have so far been neglected in computational complexity theory, although they are widely used in computational brain models, artificial neural networks, and analog VLSI. The circuit of [Lazzaro et al., 1989] computes an approximate version of winner-take-all on n inputs with just 2n transistors and wires of length O(n), with lateral inhibition implemented by adding currents on a single wire of length O(n). Numerous other efficient implementations of winner-take-all in analog VLSI have subsequently been produced. Among them are circuits based on silicon spiking neurons ([Meador and Hylander, 1994], [Indiveri, 1999]) and circuits that emulate attention in artificial sensory processing ([Horiuchi et al., 1997], [Indiveri, 1999]). Preceding analytical results on winner-take-all circuits can be found in [Grossberg, 1973] and [Brown, 1991]. 
\n\nWe will analyze in section 4 the computational power of the most basic competitive computational operation: winner-take-all (= 1-WTA$_n$). In section 2 we will discuss the somewhat more complex operation k-winner-take-all ($k$-WTA$_n$), which has also been implemented in analog VLSI [Urahama and Nagao, 1995]. Section 3 is devoted to soft winner-take-all, which has been implemented by [Indiveri, 1999] in analog VLSI via temporal coding of the output.\n\nOur results show that winner-take-all is a surprisingly powerful computational module in comparison with threshold gates (= McCulloch-Pitts neurons) and sigmoidal gates. Our theoretical analysis also provides answers to two basic questions that have been raised by neurophysiologists in view of the well-known asymmetry between excitatory and inhibitory connections in cortical circuits: how much computational power of neural networks is lost if only positive weights are employed in weighted linear sums, and how much learning capability is lost if only the positive weights are subject to plasticity.\n\n2 Restructuring Neural Circuits with Digital Output\n\nWe investigate in this section the computational power of a k-winner-take-all gate computing the function\n\n$k\\text{-WTA}_n : \\mathbb{R}^n \\to \\{0,1\\}^n$,\n\nwhich maps inputs $x_1, \\ldots, x_n \\in \\mathbb{R}$ to outputs $b_1, \\ldots, b_n \\in \\{0,1\\}$ with\n\n$b_i = 1 \\Leftrightarrow x_i$ is among the $k$ largest of the inputs $x_1, \\ldots, x_n$\n\n[precisely: $b_i = 1 \\Leftrightarrow x_j > x_i$ holds for at most $k - 1$ indices $j$].\n\nTheorem 1. Any two-layer feedforward circuit $C$ (with $m$ analog or binary input variables and one binary output variable) consisting of threshold gates (= perceptrons) can be simulated by a circuit $W$ consisting of a single k-winner-take-all gate $k$-WTA$_n$ (see footnote 1) applied to weighted sums of the input variables with positive weights. 
This holds for all digital inputs, and for analog inputs except for some set $S \\subseteq \\mathbb{R}^m$ of inputs that has measure 0.\n\nIn particular, any boolean function\n\n$f : \\{0,1\\}^m \\to \\{0,1\\}$\n\ncan be computed by a single k-winner-take-all gate applied to positive weighted sums of the input bits.\n\nRemarks\n\n1. If $C$ has polynomial size and integer weights, whose size is bounded by a polynomial in $m$, then the number of linear gates S in $W$ can be bounded by a polynomial in $m$, and all weights in the simulating circuit $W$ are natural numbers whose size is bounded by a polynomial in $m$.\n\n2. The exception set of measure 0 in this result is a union of finitely many hyperplanes in $\\mathbb{R}^m$. One can easily show that this exception set $S$ of measure 0 in Theorem 1 is necessary.\n\n3. Any circuit that has the structure of $W$ can be converted back into a 2-layer threshold circuit, with a number of gates that is quadratic in the number of weighted sums (= linear gates) in $W$. This relies on the construction in section 4.\n\nProof of Theorem 1: Since the outputs of the gates on the hidden layer of $C$ are from $\\{0,1\\}$, we can assume without loss of generality that the weights $a_1, \\ldots, a_n$ of the output gate $G$ of $C$ are from $\\{-1, 1\\}$ (see for example [Siu et al., 1995] for details; one first observes that it suffices to use integer weights for threshold gates with binary inputs, one can then normalize these weights to values in $\\{-1, 1\\}$ by duplicating gates on the hidden layer of $C$). Thus for any circuit input $\\underline{z} \\in \\mathbb{R}^m$ we have $C(\\underline{z}) = 1 \\Leftrightarrow \\sum_{j=1}^{n} a_j G_j(\\underline{z}) \\geq \\Theta$, where $G_1, \\ldots, G_n$ are the threshold gates on the hidden layer of $C$, $a_1, \\ldots, a_n$ are from $\\{-1, 1\\}$, and $\\Theta$ is the threshold of the output gate $G$. 
In order to eliminate the negative weights in $G$ we replace each gate $G_j$ for which $a_j = -1$ by another threshold gate $\\hat{G}_j$ so that $\\hat{G}_j(\\underline{z}) = 1 - G_j(\\underline{z})$ for all $\\underline{z} \\in \\mathbb{R}^m$ except on some hyperplane (see footnote 2). We set $\\hat{G}_j := G_j$ for all $j \\in \\{1, \\ldots, n\\}$ with $a_j = 1$. Then we have for all $\\underline{z} \\in \\mathbb{R}^m$, except for $\\underline{z}$ from some exception set $S$ consisting of up to $n$ hyperplanes,\n\n$\\sum_{j=1}^{n} a_j G_j(\\underline{z}) = \\sum_{j=1}^{n} \\hat{G}_j(\\underline{z}) - |\\{j \\in \\{1, \\ldots, n\\} : a_j = -1\\}|$.\n\nHence $C(\\underline{z}) = 1 \\Leftrightarrow \\sum_{j=1}^{n} \\hat{G}_j(\\underline{z}) \\geq k$ for all $\\underline{z} \\in \\mathbb{R}^m - S$, for some suitable $k \\in \\mathbb{N}$.\n\nLet $w_1^j, \\ldots, w_m^j \\in \\mathbb{R}$ be the weights and $\\Theta^j \\in \\mathbb{R}$ be the threshold of gate $\\hat{G}_j$, $j = 1, \\ldots, n$.\n\n[Footnote 1: Of which we only use its last output bit.]\n\n[Footnote 2: We exploit here that $\\neg \\sum_{i=1}^{m} w_i z_i \\geq \\Theta \\Leftrightarrow \\sum_{i=1}^{m} (-w_i) z_i > -\\Theta$ for arbitrary $w_i, z_i, \\Theta \\in \\mathbb{R}$.]\n\n[Figure: conversion of the circuit $C$ into the circuit $W$, and back. Both circuits have inputs $z_1, \\ldots, z_m$ and output $b$. In $C$, $G_1, \\ldots, G_n$ are arbitrary threshold gates and $G$ is a threshold gate with weights from $\\{-1, 1\\}$. In $W$, $S_1, \\ldots, S_{n+1}$ are linear gates (with positive weights only, which are sums of absolute values of weights from the gates $\\hat{G}_1, \\ldots, \\hat{G}_n$).]\n\nWith\n\n$S_j := \\sum_{i : w_i^j < 0} |w_i^j| z_i + \\Theta^j + \\sum_{l \\neq j} \\sum_{i : w_i^l > 0} |w_i^l| z_i$ for $j = 1, \\ldots, n$\n\nand\n\n$S_{n+1} := \\sum_{j=1}^{n} \\sum_{i : w_i^j > 0} |w_i^j| z_i$\n\nwe have for every $j \\in \\{1, \\ldots, n\\}$ and every $\\underline{z} \\in \\mathbb{R}^m$:\n\n$S_{n+1} \\geq S_j \\Leftrightarrow \\sum_{i : w_i^j > 0} |w_i^j| z_i - \\sum_{i : w_i^j < 0} |w_i^j| z_i \\geq \\Theta^j \\Leftrightarrow \\hat{G}_j(\\underline{z}) = 1$.\n\nThis implies that the $(n+1)$st output $b_{n+1}$ of the k-winner-take-all gate $\\hat{k}$-WTA$_{n+1}$ for $\\hat{k} := n - k + 1$ applied to $S_1, \\ldots, S_{n+1}$ satisfies\n\n$b_{n+1} = 1$\n\n$\\Leftrightarrow |\\{j \\in \\{1, \\ldots, n+1\\} : S_j > S_{n+1}\\}| \\leq n - k$\n\n$\\Leftrightarrow |\\{j \\in \\{1, \\ldots, n+1\\} : S_{n+1} \\geq S_j\\}| \\geq k + 1$\n\n$\\Leftrightarrow |\\{j \\in \\{1, \\ldots, n\\} : S_{n+1} \\geq S_j\\}| \\geq k$\n\n$\\Leftrightarrow \\sum_{j=1}^{n} \\hat{G}_j(\\underline{z}) \\geq k$\n\n$\\Leftrightarrow C(\\underline{z}) = 1$. 
\n\nNote that all the coefficients in the sums $S_1, \\ldots, S_{n+1}$ are positive. \u25a0\n\n3 Restructuring Neural Circuits with Analog Output\n\nIn order to approximate arbitrary continuous functions with values in [0, 1] by circuits that have a similar structure as those in the preceding section, we consider here a variation of a winner-take-all gate that outputs analog numbers between 0 and 1, whose values depend on the rank of the corresponding input in the linear order of all the n input numbers. One may argue that such a gate is no longer a \"winner-take-all\" gate, but in agreement with common terminology we refer to it as a soft winner-take-all gate. Such a gate computes a function from $\\mathbb{R}^n$ into $[0,1]^n$, mapping inputs $x_1, \\ldots, x_n \\in \\mathbb{R}$ to outputs $r_1, \\ldots, r_n \\in [0,1]$, whose $i$th output $r_i \\in [0,1]$ is roughly proportional to the rank of $x_i$ among the numbers $x_1, \\ldots, x_n$. More precisely: for some parameter $T \\in \\mathbb{N}$ we set\n\n$r_i = \\frac{|\\{j \\in \\{1, \\ldots, n\\} : x_i \\geq x_j\\}| - \\frac{n}{2}}{T}$,\n\nrounded to 0 or 1 if this value is outside [0, 1]. Hence this gate focuses on those inputs $x_i$ whose rank among the n input numbers $x_1, \\ldots, x_n$ belongs to the set $\\{\\frac{n}{2}, \\frac{n}{2} + 1, \\ldots, \\min\\{n, T + \\frac{n}{2}\\}\\}$. These ranks are linearly scaled into [0, 1] (see footnote 3).\n\nTheorem 2. Circuits consisting of a single soft winner-take-all gate (of which we only use its first output $r_1$) applied to positive weighted sums of the input variables are universal approximators for arbitrary continuous functions from $\\mathbb{R}^m$ into [0, 1]. \u25a0\n\n[Footnote 3: It is shown in [Maass, 2000] that actually any continuous monotone scaling into [0, 1] can be used instead.]\n\nA circuit of the type considered in Theorem 2 (with a soft winner-take-all gate applied to n positive weighted sums S_1, ... 
, S_n) has a very simple geometrical interpretation: Over each point $\\underline{z}$ of the input \"plane\" $\\mathbb{R}^m$ we consider the relative heights of the n hyperplanes $H_1, \\ldots, H_n$ defined by the n positive weighted sums $S_1, \\ldots, S_n$. The circuit output depends only on how many of the other hyperplanes $H_2, \\ldots, H_n$ are above $H_1$ at this point $\\underline{z}$.\n\n4 A Lower Bound Result for Winner-Take-All\n\nOne can easily see that any k-WTA gate with n inputs can be computed by a 2-layer threshold circuit consisting of $\\binom{n}{2} + n$ threshold gates:\n\n[Figure: 2-layer threshold circuit for k-WTA on inputs $x_1, \\ldots, x_n$. The first layer consists of $\\binom{n}{2}$ threshold gates, each comparing a pair of inputs $x_i$, $x_j$. The second layer consists of n threshold gates with outputs $b_1, \\ldots, b_n$, each testing whether a sum is $\\geq n - k$.]\n\nHence the following result provides an optimal lower bound.\n\nTheorem 3. Any feedforward threshold circuit (= multi-layer perceptron) that computes 1-WTA for n inputs needs to have at least $\\binom{n}{2} + n$ gates. \u25a0\n\n5 Conclusions\n\nThe lower bound result of Theorem 3 shows that the computational power of winner-take-all is quite large, even if compared with the arguably most powerful gate commonly studied in circuit complexity theory: the threshold gate (also referred to as a McCulloch-Pitts neuron or perceptron).\n\nIt is well known ([Minsky and Papert, 1969]) that a single threshold gate is not able to compute certain important functions, whereas circuits of moderate (i.e., polynomial) size consisting of two layers of threshold gates with polynomial size integer weights have remarkable computational power (see [Siu et al., 1995]). We have shown in Theorem 1 that any such 2-layer (i.e., 1 hidden layer) circuit can be simulated by a single k-winner-take-all gate, applied to polynomially many weighted sums with positive integer weights of polynomial size. 
\n\nWe have also analyzed the computational power of soft winner-take-all gates in the context of analog computation. It is shown in Theorem 2 that a single soft winner-take-all gate may serve as the only nonlinearity in a class of circuits that have universal computational power in the sense that they can approximate any continuous function.\n\nFurthermore, our novel universal approximators require only positive linear operations besides soft winner-take-all, thereby showing that in principle no computational power is lost if in a biological neural system inhibition is used exclusively for unspecific lateral inhibition, and no adaptive flexibility is lost if synaptic plasticity (i.e., \"learning\") is restricted to excitatory synapses.\n\nOur somewhat surprising results regarding the computational power and universality of winner-take-all point to further opportunities for low-power analog VLSI chips, since winner-take-all can be implemented very efficiently in this technology.\n\nReferences\n\n[Brown, 1991] Brown, T. X. (1991). Neural Network Design for Switching Network Control. Ph.D. thesis, Caltech.\n\n[Grossberg, 1973] Grossberg, S. (1973). Contour enhancement, short term memory, and constancies in reverberating neural networks. Studies in Applied Mathematics, vol. 52, 217-257.\n\n[Horiuchi et al., 1997] Horiuchi, T. K., Morris, T. G., Koch, C., DeWeerth, S. P. (1997). Analog VLSI circuits for attention-based visual tracking. Advances in Neural Information Processing Systems, vol. 9, 706-712.\n\n[Indiveri, 1999] Indiveri, G. (1999). Modeling selective attention using a neuromorphic analog VLSI device, submitted for publication.\n\n[Lazzaro et al., 1989] Lazzaro, J., Ryckebusch, S., Mahowald, M. A., Mead, C. A. (1989). Winner-take-all networks of O(n) complexity. 
Advances in Neural Information Processing Systems, vol. 1, Morgan Kaufmann (San Mateo), 703-711.\n\n[Maass, 2000] Maass, W. (2000). On the computational power of winner-take-all, Neural Computation, in press.\n\n[Meador and Hylander, 1994] Meador, J. L., and Hylander, P. D. (1994). Pulse coded winner-take-all networks. In: Silicon Implementation of Pulse Coded Neural Networks, Zaghloul, M. E., Meador, J., and Newcomb, R. W., eds., Kluwer Academic Publishers (Boston), 79-99.\n\n[Minsky and Papert, 1969] Minsky, M. C., Papert, S. A. (1969). Perceptrons, MIT Press (Cambridge).\n\n[Siu et al., 1995] Siu, K.-Y., Roychowdhury, V., Kailath, T. (1995). Discrete Neural Computation: A Theoretical Foundation. Prentice Hall (Englewood Cliffs, NJ, USA).\n\n[Urahama and Nagao, 1995] Urahama, K., and Nagao, T. (1995). k-winner-take-all circuit with O(N) complexity. IEEE Trans. on Neural Networks, vol. 6, 776-778.", "award": [], "sourceid": 1636, "authors": [{"given_name": "Wolfgang", "family_name": "Maass", "institution": null}]}