{"title": "Neural Computing with Small Weights", "book": "Advances in Neural Information Processing Systems", "page_first": 944, "page_last": 949, "abstract": null, "full_text": "Neural  Computing with Small Weights \n\nDept.  of Electrical &  Computer Engineering \n\nUniversity  of California, Irvine \n\nKai-Yeung Siu \n\nIrvine,  CA  92717 \n\nJ ehoshua Bruck \n\nIBM  Research  Division \n\nAlmaden Research  Center \nSan Jose,  CA 95120-6099 \n\nAbstract \n\nAn important issue in neural computation is the dynamic range of weights \nin  the  neural  networks.  Many  experimental  results  on  learning  indicate \nthat  the  weights  in  the  networks  can  grow  prohibitively  large  with  the \nsize  of the  inputs.  Here  we  address  this  issue  by  studying  the  tradeoffs \nbetween  the  depth  and  the  size  of weights  in  polynomial-size  networks \nof linear  threshold  elements  (LTEs).  We  show  that  there  is  an  efficient \nway  of simulating  a  network  of LTEs  with  large  weights  by  a  network \nof LTEs  with  small weights.  In  particular,  we  prove  that  every  depth-d, \npolynomial-size network  of LTEs  with  exponentially  large  integer  weights \ncan  be  simulated  by  a  depth-(2d + 1),  polynomial-size network  of LTEs \nwith  polynomially  bounded  integer  weights.  To  prove  these  results,  we \nuse  tools from  harmonic analysis of Boolean  functions.  Our  technique  is \nquite  general,  it  provides  insights  to some other  problems.  For  example, \nwe  are  able to improve the  best  known  results  on  the  depth  of a  network \nof linear  threshold  elements  that  computes the  COM PARI SO N,  SUM \nand  PRO DU CT of two  n-bits  numbers,  and  the  MAX 1M U M  and  the \nSORTING of n  n-bit numbers. \n\n1 \n\nIntroduction \n\nThe  motivation  for  this  work  comes  from  the  area  of neural  networks,  where  a \nlinear threshold element is the basic processing element.  Many experimental results \non  learning  have  indicated that the magnitudes of the  coefficients  in the  threshold \nelements grow  very fast  with the size of the inputs and therefore limit the practical \nuse  of the  network.  One  natural  question  to  ask  is  the  following:  How  limited \n944 \n\n\fNeural Computing with Small  Weights \n\n945 \n\nis  the  computational  power  of the  network  if we  restrict  ourselves  to  threshold \nelements  with  only  \"small\"  growth  in  the  coefficients?  We  answer  this  question  by \nshowing  that we  can  trade-off an exponential  growth  with  a  polynomial growth  in \nthe magnitudes of coefficients by  increasing the depth of the network  by  a factor of \nalmost two and  a  polynomial growth  in  the size. \n\nLinear  Threshold  Functions:  A  linear  threshold  function  f(X)  is  a  Boolean \nfunction  such  that \n\nwhere \n\nf(X) = sgn(F(X\u00bb  = {_II  if F(X) > 0 \nif F(X) < 0 \n\nn \n\nF(X) = 2:= Wi  .  Xi + Wo \n\ni=l \n\n---\n\n---\n\n-\n\nThroughout  this  paper,  a  Boolean  function  will  be  defined  as  f  :  {I, _I}n  --+ \n{I, -I}; namely,  0  and  1 are  represented  by  1 and  -1, respectively.  Without  loss \nof generality,  we  can  assume  F(X):/;  0 for  all  X  E {I,-I}n.  The  coefficients  Wi \nare  commonly referred  to  as  the  weights of the  threshold  function.  We  denote  the \nclass of all linear  threshold  functions  by  LT1 \u2022 \n\n---LT1  functions:  In  this  paper,  we  shall study  a  subclass  of LT1  which  we  denote \nby  IT1 .  Each  function  f(X)  =  sgn(L:~=l Wi' Xi + wo)  in  IT1  is  characterized  by \nthe  property  that  the weights  Wi  are  integers  and  bounded  by  a  polynomial in  n, \ni.e.  IWil  ~ n C  for  some  constant  c > O. \nThreshold  Circuits:  A  threshold  circuit  [5,  10]  is  a  Boolean  network  in  which \nevery  gate computes an LT1  function.  The size of a threshold  circuit is  the number \nof LT1  elements  in  the  circuit.  Let  LTk  denote  the  class  of threshold  circuits  of \ndepth  k  with  the  size  bounded  by  a  polynomial in  the number of inputs.  We  define \nLTk  similarly except  that we  allow each  gate in  LTk  to compute an  LTI  function. \nAlthough the  definition of (LTd linear threshold function  allows  the  weights  to be \nreal numbers, it is known [12] that we can replace each of the real weights by integers \nof O( n log n)  bits, where  n is the number of input Boolean  variables.  So  in  the rest \nof the paper, we shall assume without loss of generality that all weights are integers. \nHowever,  this still  allows  the  magnitudes of the  weights  to  increase  exponentially \nfast  with  the  size  of the  inputs.  It  is  natural  to  ask  if this  is  necessary.  In  other \nwords,  is  there  a  linear  threshold  function  that  must  require  exponentially  large \nweights?  Since there are 2n(n~) linear threshold functions in  n  variables [8,  14,  15], \nthere  exists  at  least  one  which  requires  O(n 2 )  bits to specify  the  weights.  By  the \npigeonhole  principle,  at  least  one  weight  of such  a  function  must  need  O(n)  bits, \nand thus is  exponentially large in  magnitude.  i.e. \n\n-LTI  ~ LT1 \n\nThe above result was proved in [9]  using a different method by explicitly constructing \nan  LT1  function  and  proving  that  it  is  not  in  LT1 .  In  the  following  section,  we \nshall  show  that  the  COMPARISON  function  (to  be  defined  later)  also  requires \nexponentially  large  weights.  We  will  refer  to  this function  later on  in  the  proof of \n\n-\n\n\f946 \n\nSiu and Bruck \n\nour main results.  Main Results:  The fact  that we  can simulate a linear threshold \nfunction with exponentially large weights in a 'constant' number oflayers of elements \nwith  'small' weights follows  from  the  results  in  [3]  and  [11].  Their  results  showed \nthat  the  sum  of n  n-bit  numbers  is  computable in  a  constant  number  of layers  of \n'counting' gates,  which  in  turn can  be simulated by  a  constant number of layers of \nthreshold  elements with  'small' weights.  However,  it was  not  explicitly stated how \nmany layers are needed  in  each step of their construction  and  direct  application of \ntheir  results  would  yield  a  constant  such  as  13.  In  this  paper,  we  shall  reduce  the \nconstant  to  3  by  giving  a  more  'depth'-efficient  algorithm and  by  using  harmonic \nanalysis of Boolean functions [1,2,6].  We then generalize this result to higher depth \ncircuits  and show  how  to simulate a  threshold  circuit of depth-d and exponentially \nlarge  weights  in  a  depth-(2d + 1)  threshold  circuit  of 'small' weights,  i.e.  LTd  ~ \nfr2d+l. \nAs  another  application  of  harmonic  analysis,  we  also  show \nthe \nCOM P ARISON and  ADDITION of two n-bit numbers is  computable with only \ntwo  layers  of elements  with  'small'  weights,  while  it  was  only  known  to  be  com(cid:173)\nputable in  3 layers  [5].  We  also  indicate how  our  'depth'-efficient  algorithm can  be \napplied  to  show  that the  product of two  n-bit numbers  can  be  computed  in  LT4 . \nIn  addition,  we  show  that  the  MAXIMUM  and  SORTING ofn  n-bit  numbers \ncan  be  computed  in  fr3 and  LT4 ,  respectively. \n\n--\n\nthat \n\n2  Main  Results \n\nDefinition:  Let  X  =  (Xl, ... , Xn),  Y  =  (YI, ... , Yn)  E  {I, _l}n.  We  consider  X \nand Y  as two n-bit numbers representing E?=l Xi\u00b7 2'  and E?=l Yi  . 2i ,  respectively. \nThe COMPARISON function  is  defined  as \n\nC(X, Y) = 1  iff  X  ~ Y \n\nIn  other words, \n\nLemma 1 \n\nC(X, Y) =  sgn{L:: 2i(Xi  - yd + I} \n\nn \n\ni=l \n\n-\nCOMPARISON  tt  LTI \n\nOn  the other hand, using  harmonic analysis  [2],  we  can  show  the following: \n\nLemma 2 \n\nCOMPARISON  E m \n\nSpectral  representation of Boolean  functions:  Recently,  harmonic  analysis \nhas  been  found  to  be  a  powerful  tool  in  studying  the  computational  complexity of \nBoolean functions  [1,  2,  7].  The idea is  that every  Boolean function f  : {I, _1}n  -+ \n{I, -I} can  be  represented  as  a  polynomial over  the  field  of rational  numbers  as \nfollows: \n\nf(X) =  L  aa xa \n\naE{O,l}n \n\n\fNeural Computing with Small Weights \n\n947 \n\nh \nwere \n\nX a \n\nal  al \n= x 1  x 2 \n\nan \n.\u2022. Xn \n\n\u2022 \n\nSuch  representation  is  unique  and  the  coefficients  of  the  polynomial,  {aal Q  E \n{a, l}n},  are  called  the  spectral coefficients of f. \nWe  shall define  the Ll  spectral norm of f  to be \n\nIIfll =  ~ laal\u00b7 \n\nae{O,I}n \n\nThe proof of Lemma 2 is  based  on  the spectral  techniques  developed  in  [2].  Using \nprobabilistic arguments, it was proved in  [2]  that if a  Boolean function  has Ll spec-\ntral norm which  is  polynomially bounded,  then  the function  is  computable in  LT2 \u2022 \nWe observe  (together with Noga Alon)  that the techniques in [2]  can  be generalized \nto show  that  any  Boolean function  with  polynomially  bounded  Ll  spectral  norm \ncan even  be closely  approximated by  a sparse  polynomial.  This observation is crucial \nwhen  we  extend our result from a  single element to networks of elements with large \nweights. \n\n. -\n\nLemma 3  Let  f(X)  : {I, _l}n  --+  {I, -I} such  that  IIfll  ~ n C  for  some  c.  Then \nfor  any  k  > 0,  there  exists  a sparse  polynomial \n\nF(X) = N  2:'.::  wa Xa  such  that \n\n1 \n\naes \n\nwhere Wa  and  N  are  integers,  S  c {O,  l}n,  the  size  of S,  Wa  and  N  are  all bounded \nby  a polynomial in  n.  Hence,  f(X)  E LT2 \u2022 \n\nIF(X) -\n\nf(X)1  ~ n- k , \n\nAs  a  consequence  of  this  result,  Lemma  2  follows  since  it  can  be  shown  that \nCOM PARISON has  a  polynQmially bounded  Ll spectral  norm. \n\nNow  we  are  ready  to state our main results.  Although  most linear  threshold func(cid:173)\ntions require exponentially large weights,  we  can  always simulate them  by  3 layers \nof in elements. \nTheorem 1 \n\n-\n\nLTI  ~ LT3 \n\nThe  result  stated  in  Theorem  1 implies  that  a  depth-d  threshold  circuit  with  ex(cid:173)\nponentially  large  weights  can  be  simulated  by  a  depth-3d  threshold  circuit  with \npolynomially large weights.  Using  the result of Lemma 3,  we  can  actually obtain a \nmore depth-efficient  simulation. \n\nTheorem 2 \n\nAs  another  consequence  of Lemma 3,  we  have the following  : \n\n\f948 \n\nSiu and Bruck \n\nCorollary 1  Let /1 (X), ... , fm(X)  be  functions  with polynomially  bounded Ll  spec-\ntral  norms,  and  g(/1 (X), ... , fm(X\u00bb  be  an  fi\\  function  with  fi(X) 's  as  inputs, \nI.e. \n\ng(/1(X), ... , fm(X\u00bb  = sgn(2: Wdi(X) + wo) \n\nm \n\nThen  9  can  be  expressed  as  a  sign  of a  sparse  polynomial  in  X  with  polynomially \nmany  number  of monomial  terms  xcr 's  and  polynomially  bounded  integer  coeffi-\ncients.  Hence  9  E LT2. \n\n---\n\ni=l \n\nIf all  LTI  functions  have  polynomially  bounded  Ll  spectral  norms,  then  it would \nfollow  that  LTI  C iT 2 \u2022  However,  even  the simple MAJORITY function  does  not \nhave  a  polynomially  bounded  Ll  spectral  norm.  We  shall  prove  this  fact  via the \nfollowing theorem.  (As in Lemma 3,  by  a sparse polynomial we  mean a  polynomial \nwith only polynomially many monomial terms xcr's). \nTheorem 3  The iTl  function  MAJORITY: \n\nn \n\nsgn(2: X i) \n\ni=l \n\ncannot  be  approximated  by  a  sparse  polynomial  with  an  error o( n -1). \n\nOther applications of the harmonic analysis techniques and the results of Lemma 3 \nyields  the following  theorems: \n\nTheorem 4 \n\nLet  x, y  be  two n-bit  numbers.  Then \n\nADDITION(x, y)  E m \n\nTheorem 5  The  product  of two n-bit  integers  can  be  computed  in  LT4 \u2022 \n\n---\nTheorem 6  The  MAX I MU M  of n  n-bit  numbers  can  be  computed  in  LT3. \nTheorem 7  The  SORTING  ofn n-bit  numbers  can  be  computed  in IT4 . \n\n---\n\n3  Concluding Remarks \n\nOur  main result  indicates  that  for  networks  of linear  threshold  elements,  we  can \ntrade-off arbitrary  real  weights  with polynomially bounded  integer  weights,  at the \nexpense of a polynomial increase in the size and a factor of almost two in the depth of \nthe network.  The proofs of the results in this paper can be found  in  [13].  We  would \nlike to mention that our results have recently  been  improved by  Goldmann, Hastad \nand Razborov  [4].  They showed  that any polynomial-size depth-d network oflinear \nthreshold  elements  with  arbitrary  weights  can  be  simulated  by  a  polynomial-size \ndepth-( d + 1) network  with  \"small\"  (polynomially bounded integer) weights.  While \nour construction can be  made explicit, only the existence of the simulation result is \nproved  in  [4];  it is  left  as  an open  problem in  [4]  if there  is  an explicit  construction \nof their  results. \n\n\fNeural Computing with Small  Weights \n\n949 \n\nAcknowledgements \n\nThis work  was done  while  Kai-Yeung Siu  was a  research  student  associate  at IBM \nAlmaden Research Center and was supported in part by  the Joint Services Program \nat Stanford University (US Army, US  Navy, US Air Force) under Contract DAAL03-\n88-C-0011,  and  the  Department  of the  Navy  (NAVELEX),  NASA  Headquarters, \nCenter  for  Aeronautics  and  Space  Information Sciences  under  Grant  NAGW-419-\nS6. \n\nReferences \n\n[1]  J.  Bruck.  Harmonic Analysis of Polynomial Threshold  Functions.  SIAM Jour(cid:173)\n\nnal on  Discrete  Mathematics,  May  1990. \n\n[2]  J.  Bruck  and  R.  Smolensky.  Polynomial Threshold  Functions,  ACo  Functions \nand  Spectral  Norms.  Technical  Report  RJ  7140,  IBM  Research,  November \n1989.  Appeared  in  IEEE Symp. on  Found. of Compo  Sci.  October,  1990. \n\n[3]  A.  K.  Chandra,  L.  Stockmeyer,  and  U.  Vishkin.  Constant depth  reducibility. \n\nSiam  J.  Comput., 13:423-439, 1984. \n\n[4]  M.  Goldmann,  J.  Hastad,  and  A.  Razborov  Majority  Gates  vS.  General \n\nWeighted  Threshold  Gates.  Unpublished  Manuscript. \n\n[5]  A.  HajnaI, W . Maass,  P.  PudIak, M.  Szegedy,  and G. Turan. Threshold circuits \n\nof bounded  depth .  IEEE Symp.  Found.  Compo  Sci.,  28:99-110, 1987. \n\n[6]  R.  J. Lechner.  Harmonic analysis of switching functions.  In A.  Mukhopadhyay, \n\neditor,  Recent  Development  in  Switching  Theory.  Academic Press,  1971. \n\n[7]  N.  LiniaI, Y.  Mansour, and  N.  Nisan.  Constant Depth Circuits,  Fourier Trans(cid:173)\n\nforms,  and  Learnability.  Proc.  30th  IEEE Symp.  Found.  Compo  Sci.,  1989. \n\n[8]  S.  Muroga and 1.  Toda.  Lower  Bound of the  Number of Threshold  Functions. \n\nIEEE  Trans.  on  Electronic  Computers,  EC  15,  1966. \n\n[9]  J.  Myhill and  W.  H.  Kautz.  On the Size of Weights Required for  Linear-Input \n\nSwitching  Functions.  IRE  Trans.  on  Electronic  Computers,  EC  10,  1961. \n\n[10]  I. Parberry and G.  Schnitger.  Parallel Computation with Threshold Functions \n\n.  Journal  of Computer and  System  Sciences,  36(3):278-302, 1988. \n\n[11]  N.  Pippenger.  The  complexity  of computations  by  networks.  IBM  J.  Res. \n\nDevelop. , 31(2),  March  1987. \n\n[12]  P.  Raghavan.  Learning  in  Threshold  Networks:  A  Computation  Model  and \n\nApplications.  Technical  Report  RC  13859, IBM  Research,  July 1988. \n\n[13]  K.-Y.  Siu  and  J.  Bruck.  On  the  Power  of Threshold  Circuits  with  Small \n\nWeights.  SIAM J.  Discrete  Math.,  4(3):423-435, August  1991. \n\n[14]  D.  R.  Smith. Bounds on  the Number of Threshold  Functions.  IEEE  Trans.  on \n\nElectronic  Computers,  EC  15,  1966. \n\n[15]  S.  Yajima and T. Ibaraki.  A  Lower  Bound on the  Number of Threshold  Func(cid:173)\n\ntions.  IEEE  Trans.  on  Electronic  Computers,  EC  14,  1965. \n\n\f", "award": [], "sourceid": 529, "authors": [{"given_name": "Kai-Yeung", "family_name": "Siu", "institution": null}, {"given_name": "Jehoshua", "family_name": "Bruck", "institution": null}]}