{"title": "Models of Ocular Dominance Column Formation: Analytical and Computational Results", "book": "Advances in Neural Information Processing Systems", "page_first": 375, "page_last": 383, "abstract": null, "full_text": "LINEAR LEARNING: LANDSCAPES  AND  ALGORITHMS \n\n65 \n\nPierre Baldi \n\nJet  Propulsion Laboratory \n\nCalifornia Institute of Technology \n\nPasadena, CA  91109 \n\nWhat follows  extends some of our results of [1]  on learning from  ex(cid:173)\namples  in  layered feed-forward  networks  of linear units.  In  particu(cid:173)\nlar we  examine what  happens when  the ntunber of layers is  large or \nwhen  the  connectivity  between  layers  is  local  and  investigate  some \nof  the  properties  of an  autoassociative  algorithm.  Notation  will  be \nas  in  [1]  where  additional  motivations  and  references  can  be  found. \nIt  is  usual  to  criticize  linear  networks  because  \"linear  functions  do \nnot  compute\"  and  because  several  layers  can  always  be  reduced  to \none by the proper multiplication of matrices.  However this is  not the \npoint of view  adopted here.  It is assumed that the architecture of the \nnetwork  is  given  (and could perhaps depend on external constraints) \nand the  purpose is  to  understand  what  happens  during  the  learning \nphase,  what  strategies  are  adopted by a  synaptic weights  modifying \nalgorithm, ... [see  also  Cottrell et  al.  (1988)  for  an  example of an  ap(cid:173)\nplication and the work of Linsker  (1988)  on the emergence of feature \ndetecting units in linear networks}. \n\nConsider first  a  two layer network with n  input units, n  output units \nand  p  hidden  units  (p  <  n).  Let  (Xl, YI), ... , (XT, YT)  be  the  set  of \ncentered input-output training patterns.  The problem is  then to find \ntwo matrices of weights  A  and B  minimizing the error function  E: \n\nE(A, B) =  L  IIYt - ABXtIl2. \n\nl<t<T \n\n(1) \n\n\f66 \n\nBaldi \n\nLet ~x x, ~XY, ~yy, ~y x  denote the usual covariance matrices.  The \nmain result  of [1]  is  a  description of the landscape of E, characerised \nby  a  multiplicity  of saddle  points  and  an  absence  of  local  minima. \nMore precisely,  the following four facts  are true. \n\nFact  1:  For any fixed  n  x p  matrix A  the function E(A, B) is  convex \nin the coefficients  of B  and attains its minimum for  any  B  satisfying \nthe equation \n\nA'AB~xx =  A/~yX. \n\n(2) \n\nIf in  addition  ~x X  is  invertible  and  A  is  of full  rank  p,  then  E  is \nstrictly convex and has a  unique minimum reached when \n\n(3) \n\nFact  2:  For any fixed p x n matrix B  the function E(A, B) is  convex \nin the coefficients  of A  and attains its minimum for  any  A  satisfying \nthe equation \n\nAB~xxB' =  ~YXB'. \n\n(4) \n\nIf in  addition  ~xx is  invertible  and  B  is  of full  rank  p,  then  E  is \nstrictly convex and has a  unique minimum reached when \n\n(5) \n\nFact  3:  Assume  that  ~x X  is  invertible.  If two  matrices  A  and  B \ndefine a  critical point of E  (i.e.  a point where 8E / 8aij =  8E /8bij  = \n0)  then the global map W  =  AB is  of the form \n\nwhere  P A  denotes  the  matrix of the  orthogonal  projection  onto  the \nsubspace spanned by the columns of A  and A  satisfies \n\n(6) \n\n(7) \n\n\fLinear Learning: Landscapes and Algorithms \n\n67 \n\nwith  ~ =  ~y X ~x~~XY. 
Fact 3: Assume that \Sigma_{XX} is invertible. If two matrices A and B define a critical point of E (i.e. a point where \partial E / \partial a_{ij} = \partial E / \partial b_{ij} = 0), then the global map W = AB is of the form

W = P_A \Sigma_{YX} \Sigma_{XX}^{-1},   (6)

where P_A denotes the matrix of the orthogonal projection onto the subspace spanned by the columns of A, and A satisfies

P_A \Sigma = \Sigma P_A = P_A \Sigma P_A,   (7)

with \Sigma = \Sigma_{YX} \Sigma_{XX}^{-1} \Sigma_{XY}. If A is of full rank p, then A and B define a critical point of E if and only if A satisfies (7) and B is given by (3), or equivalently if and only if A and W satisfy (6) and (7). Notice that in (6), the matrix \Sigma_{YX} \Sigma_{XX}^{-1} is the slope matrix for the ordinary least squares regression of Y on X.

Fact 4: Assume that \Sigma is full rank with n distinct eigenvalues \lambda_1 > ... > \lambda_n. If I = {i_1, ..., i_p} (1 \le i_1 < ... < i_p \le n) is any ordered p-index set, let U_I = [u_{i_1}, ..., u_{i_p}] denote the matrix formed by the orthonormal eigenvectors of \Sigma associated with the eigenvalues \lambda_{i_1}, ..., \lambda_{i_p}. Then two full rank matrices A and B define a critical point of E if and only if there exist an ordered p-index set I and an invertible p x p matrix C such that

A = U_I C,   (8)

B = C^{-1} U_I' \Sigma_{YX} \Sigma_{XX}^{-1}.   (9)

For such a critical point we have

W = P_{U_I} \Sigma_{YX} \Sigma_{XX}^{-1},   (10)

E(A, B) = tr(\Sigma_{YY}) - \sum_{i \in I} \lambda_i.   (11)

Therefore a critical point W of rank p is always the product of the ordinary least squares regression matrix followed by an orthogonal projection onto the subspace spanned by p eigenvectors of \Sigma. The map W associated with the index set {1, 2, ..., p} is the unique local and global minimum of E. The remaining \binom{n}{p} - 1 p-index sets correspond to saddle points. All additional critical points defined by matrices A and B which are not of full rank are also saddle points, and can be characterized in terms of orthogonal projections onto subspaces spanned by q eigenvectors with q < p.

Deep Networks

Consider now the case of a deep network with a first layer of n input units, an (m+1)-th layer of n output units and m - 1 hidden layers, with an error function given by

E(A_1, ..., A_m) = \sum_{1 \le t \le T} \| y_t - A_1 A_2 ... A_m x_t \|^2.   (12)

It is worth noticing that, as in Facts 1 and 2 above, if we fix any m - 1 of the m matrices A_1, ..., A_m, then E is convex in the remaining matrix of connection weights. Let p (p < n) denote the number of units in the smallest layer of the network (several hidden layers may have p units). In other words, the network has a bottleneck of size p. Let i be the index of the corresponding layer and set

A = A_1 A_2 ... A_{m-i+1},   B = A_{m-i+2} ... A_m.   (13)

When we let A_1, ..., A_m vary, the only restriction they impose on A and B is that they be of rank at most p. Conversely, any two matrices A and B of rank at most p can always be decomposed (and in many ways) into products of the form of (13). It follows that any local minimum of the error function of the deep network yields a local minimum of the corresponding "collapsed" three-layer network induced by (13), and vice versa. Therefore E(A_1, ..., A_m) does not have any local minima, and the global minimal map W* = A_1 A_2 ... A_m is unique and given by (10) with index set {1, 2, ..., p}. Notice that of course there is a large number of ways of decomposing W* into a product of the form A_1 A_2 ... A_m. Also, a saddle point of the error function E(A, B) does not necessarily generate a saddle point of the corresponding E(A_1, ..., A_m), for the expressions corresponding to the two gradients are very different.
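Fact 4 can likewise be verified on a small example. The sketch below (illustrative only; all sizes are arbitrary) enumerates every p-index set, builds the critical map W of (10) for each, and confirms both that the set of the top eigenvectors gives the smallest error and that the critical value matches (11):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, p, T = 5, 2, 500
X = rng.standard_normal((n, T)); X -= X.mean(axis=1, keepdims=True)
Y = rng.standard_normal((n, n)) @ X; Y -= Y.mean(axis=1, keepdims=True)

Sxx, Syx, Syy = X @ X.T / T, Y @ X.T / T, Y @ Y.T / T
Sigma = Syx @ np.linalg.inv(Sxx) @ Syx.T       # Sigma = Syx Sxx^{-1} Sxy

lam, U = np.linalg.eigh(Sigma)                 # eigenpairs, ascending order
lam, U = lam[::-1], U[:, ::-1]                 # sort descending

def E_crit(I):
    """Error at the rank-p critical point with index set I, eqs. (10)-(11)."""
    P = U[:, I] @ U[:, I].T                    # projector onto span(U_I)
    W = P @ Syx @ np.linalg.inv(Sxx)           # eq. (10)
    return np.mean(np.sum((Y - W @ X) ** 2, axis=0))

errs = {I: E_crit(list(I)) for I in combinations(range(n), p)}
best = min(errs, key=errs.get)
print(best)                                            # (0, 1): the top eigenvectors
print(np.isclose(errs[best], np.trace(Syy) - lam[:p].sum()))  # eq. (11), True
```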
Forced Connections. Local Connectivity

Assume now an error function of the form

E(A) = \sum_{1 \le t \le T} \| y_t - A x_t \|^2   (14)

for a two-layer network, but where the values of some of the entries of A may be externally prescribed. In particular this includes the case of local connectivity, described by relations of the form a_{ij} = 0 for any output unit i and any input unit j which are not connected. Clearly the error function E(A) is convex in A. Every constraint of the form a_{ij} = cst defines a hyperplane in the space of all possible A. The intersection of all these constraints is therefore a convex set. Thus minimizing E under the given constraints is still a convex optimization problem, and so there are no local minima. It should be noticed that, in the case of a network with even only three constrained layers, with two matrices A and B and a set of constraints of the form a_{ij} = cst on A and b_{kl} = cst on B, the set of admissible matrices of the form W = AB is, in general, not convex anymore. It is not unreasonable to conjecture that local minima may then arise, though this question needs to be investigated in greater detail.
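Since each constraint a_{ij} = cst is affine, projected gradient descent, which simply never updates the clamped entries, solves the constrained problem. A minimal sketch (the mask and data here are hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 4, 300
X = rng.standard_normal((n, T)); X -= X.mean(axis=1, keepdims=True)
Y = rng.standard_normal((n, n)) @ X; Y -= Y.mean(axis=1, keepdims=True)

# free[i, j] == False clamps a_ij at its prescribed value (here 0,
# i.e. output unit i and input unit j are not connected).
free = rng.random((n, n)) > 0.3
A = np.where(free, rng.standard_normal((n, n)), 0.0)

# A step size below 1/lambda_max(X X') keeps plain gradient descent stable.
eta = 0.5 / np.linalg.eigvalsh(X @ X.T).max()
for _ in range(2000):
    grad = -2.0 * (Y - A @ X) @ X.T       # gradient of E(A) = sum_t ||y_t - A x_t||^2
    A -= eta * np.where(free, grad, 0.0)  # clamped entries never move

print(np.sum((Y - A @ X) ** 2))           # settles at the constrained optimum
```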
Algorithmic Aspects

One of the nice features of the error landscapes described so far is the absence of local minima and the existence, up to equivalence, of a unique global minimum which can be understood in terms of principal component analysis and least squares regression. However the landscapes are also characterized by a large number of saddle points, which could constitute a problem for a simple gradient descent algorithm during the learning phase. The proof in [1] shows that the lower the E value corresponding to a saddle point, the more difficult it is to escape from it, because of a reduction in the possible number of directions of escape (see also [Chauvin, 1989] in a context of Hebbian learning). To assess how relevant these issues are for practical implementations requires further simulation experiments. On a more speculative side, it also remains to be seen whether, in a problem of large size, the number and spacing of saddle points encountered during the first stages of a descent process could not be used to "get a feeling" for the type of terrain being descended and, as a result, to adjust the pace (i.e. the learning rate).

We now turn to a simple algorithm for the auto-associative case in a three-layer network, i.e. the case where the presence of a teacher can be avoided by setting y_t = x_t and thereby trying to achieve a compression of the input data in the hidden layer. This technique is related to principal component analysis, as described in [1]. If y_t = x_t, it is easy to see from equations (8) and (9) that, if we take the matrix C to be the identity, then at the optimum the matrices A and B are transposes of each other. This heuristically suggests a possible fast algorithm for auto-association, where at each iteration a gradient descent step is applied only to one of the connection matrices while the other is updated in a symmetric fashion using transposition, avoiding back-propagation of the error through one of the layers (see [Williams, 1985] for a similar idea). More formally, the algorithm can be concisely described by

A(0) = random,   B(0) = A'(0),
A(k+1) = A(k) - \eta \, \partial E / \partial A,
B(k+1) = A'(k+1).   (15)

Obviously a similar algorithm can be obtained by setting B(k+1) = B(k) - \eta \, \partial E / \partial B and A(k+1) = B'(k+1). It may actually even be better to alternate the gradient steps, one iteration with respect to A and one iteration with respect to B.

A simple calculation shows that (15) can be rewritten as

A(k+1) = A(k) + \eta (I - W(k)) \Sigma_{XX} A(k),
B(k+1) = B(k) + \eta B(k) \Sigma_{XX} (I - W(k)),   (16)

where W(k) = A(k) B(k). It is natural, from what we have already seen, to examine the behavior of this algorithm on the eigenvectors of \Sigma_{XX}. Assume that u is an eigenvector of both \Sigma_{XX} and W(k), with eigenvalues \lambda and \mu(k). Then it is easy to see that u is an eigenvector of W(k+1) with eigenvalue

\mu(k+1) = \mu(k) [1 + \eta\lambda (1 - \mu(k))]^2.   (17)

For the algorithm to converge to the optimal W, \mu(k) must converge to 0 or 1. Thus one has to look at the iterates of the function f(x) = x [1 + \eta\lambda (1 - x)]^2. This can be done in detail and we shall only describe the main points. First of all, f'(x) = 0 iff x = x_a = 1 + 1/(\eta\lambda) or x = x_b = 1/3 + 1/(3\eta\lambda), and f''(x) = 0 iff x = x_c = 2/3 + 2/(3\eta\lambda) = 2x_b. For the fixed points, f(x) = x iff x = 0, x = 1 or x = x_d = 1 + 2/(\eta\lambda). Notice also that f(x_a) = (1 + 1/(\eta\lambda))(1 - 1)^2 = 0. Points corresponding to the values 0, 1, x_a, x_d of the x variable can readily be positioned on the curve of f, but the relative position of x_b (and x_c) depends on the value assumed by \eta\lambda with respect to 1/2. Obviously if \mu(0) = 0, 1 or x_d, then \mu(k) = 0, 1 or x_d; if \mu(0) < 0, then \mu(k) \to -\infty; and if \mu(0) > x_d, then \mu(k) \to +\infty. Therefore the algorithm can converge only for 0 \le \mu(0) \le x_d. When the learning rate is too large, i.e. when \eta\lambda > 1/2, then even if \mu(0) is in the interval (0, x_d) the algorithm need not converge, and it may even exhibit complex oscillatory behavior. However when \eta\lambda < 1/2, if 0 < \mu(0) < x_a then \mu(k) \to 1; if \mu(0) = x_a then \mu(k) = 0; and if x_a < \mu(0) < x_d then \mu(k) \to 1.

In conclusion, we see that if the algorithm is to be tested, the learning rate should be chosen so that it does not exceed 1/(2\lambda), where \lambda is the largest eigenvalue of \Sigma_{XX}. Even more so than back-propagation, the algorithm can encounter problems in the proximity of saddle points: once a non-principal eigenvector of \Sigma_{XX} is learnt, the algorithm rapidly incorporates a projection along that direction which cannot be escaped at later stages. Simulations are required to examine the effects of "noisy gradients" (computed after the presentation of only a few training examples), multiple starting points, variable learning rates, momentum terms, and so forth.
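The scalar dynamics (17) are simple to reproduce. The sketch below iterates f for a value of \eta\lambda below the 1/2 bound and for a much larger one (both values are arbitrary choices for illustration); the first trajectory converges to \mu = 1, while the second is eventually mapped past x_d and diverges:

```python
def f(x, eta_lam):
    """One step of (17) for an eigenvalue mu of W(k); eta_lam = eta * lambda."""
    return x * (1.0 + eta_lam * (1.0 - x)) ** 2

for eta_lam in (0.3, 2.5):          # safely below, and far above, the bound
    mu = 0.05                       # small initial overlap, 0 < mu(0) < x_a
    for _ in range(60):
        mu = f(mu, eta_lam)
        if mu > 1e6:                # clearly past x_d and diverging
            break
    print(eta_lam, mu)              # 0.3: mu -> 1;  2.5: mu blows up
```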
Acknowledgement

Work supported by NSF grant DMS-8800323 and in part by ONR contract 411P006-01.

References

(1) Baldi, P. and Hornik, K. (1988) Neural Networks and Principal Component Analysis: Learning from Examples without Local Minima. Neural Networks, Vol. 2, No. 1.
(2) Chauvin, Y. (1989) Another Neural Model as a Principal Component Analyzer. Submitted for publication.
(3) Cottrell, G. W., Munro, P. W. and Zipser, D. (1988) Image Compression by Back Propagation: A Demonstration of Extensional Programming. In: Advances in Cognitive Science, Vol. 2, Sharkey, N. E., ed., Norwood, NJ: Ablex.
(4) Linsker, R. (1988) Self-Organization in a Perceptual Network. Computer 21 (3), 105-117.
(5) Williams, R. J. (1985) Feature Discovery Through Error-Correction Learning. ICS Report 8501, University of California, San Diego.

MODELS OF OCULAR DOMINANCE COLUMN FORMATION: ANALYTICAL AND COMPUTATIONAL RESULTS

Kenneth D. Miller, UCSF Dept. of Physiology, SF, CA 94143-0444, ken@phyb.ucsf.edu
Joseph B. Keller, Mathematics Dept., Stanford
Michael P. Stryker, Physiology Dept., UCSF

ABSTRACT

We have previously developed a simple mathematical model for formation of ocular dominance columns in mammalian visual cortex. The model provides a common framework in which a variety of activity-dependent biological mechanisms can be studied. Analytic and computational results together now reveal the following: if inputs specific to each eye are locally correlated in their firing, and are not anticorrelated within an arbor radius, monocular cells will robustly form and be organized by intra-cortical interactions into columns. Broader correlations within each eye, or anti-correlations between the eyes, create a more purely monocular cortex; positive correlation over an arbor radius yields an almost perfectly monocular cortex. Most features of the model can be understood analytically through decomposition into eigenfunctions and linear stability analysis. This allows prediction of the widths of the columns and other features from measurable biological parameters.

INTRODUCTION

In the developing visual system in many mammalian species, there is initially a uniform, overlapping innervation of layer 4 of the visual cortex by inputs representing the two eyes. Subsequently, these inputs segregate into patches or stripes that are largely or exclusively innervated by inputs serving a single eye, known as ocular dominance patches. The ocular dominance patches are on a small scale compared to the map of the visual world, so that the initially continuous map becomes two interdigitated maps, one representing each eye. These patches, together with the layers of cortex above and below layer 4, whose responses are dominated by the eye innervating the corresponding layer 4 patch, are known as ocular dominance columns.
The discoveries of this system of ocular dominance and many of the basic features of its development were made by Hubel and Wiesel in a series of pioneering studies in the 1960s and 1970s (e.g. Wiesel and Hubel (1965), Hubel, Wiesel and LeVay (1977)). A recent brief review is in Miller and Stryker (1989).

The segregation of patches depends on local correlations of neural activity that are very much greater within neighboring cells in each eye than between the two eyes. Forcing the eyes to fire synchronously prevents segregation, while forcing them to fire more asynchronously than normally causes a more complete segregation than normal. The segregation also depends on the activity of cortical cells. Normally, if one eye is closed in a young kitten during a critical period for developmental plasticity ("monocular deprivation"), the more active, open eye comes to dominantly or exclusively drive most cortical cells, and the inputs and influence of the closed eye become largely confined to small islands of cortex. However, when cortical cells are inhibited from firing, the opposite is the case: the less active eye becomes dominant, suggesting that it is the correlation between pre- and post-synaptic activation that is critical to synaptic strengthening.

We have developed and analyzed a simple mathematical model for formation of ocular dominance patches in mammalian visual cortex, which we briefly review here (Miller, Keller, and Stryker, 1986). The model provides a common framework in which a variety of activity-dependent biological models, including Hebb synapses and activity-dependent release and uptake of trophic factors, can be studied. The equations are similar to those developed by Linsker (1986) to study the development of orientation selectivity in visual cortex. We have now extended our analysis and also undertaken extensive simulations to achieve a more complete understanding of the model. Many results have appeared, or will appear, in more detail elsewhere (Miller, Keller and Stryker, 1989; Miller and Stryker, 1989; Miller, 1989).

EQUATIONS

Consider inputs carrying information from two eyes and co-innervating a single cortical sheet. Let S^L(x, \alpha, t) and S^R(x, \alpha, t), respectively, be the left-eye and right-eye synaptic weight from eye coordinate \alpha to cortical coordinate x at time t. Consideration of simple activity-dependent models of synaptic plasticity, such as Hebb synapses or activity-dependent release and uptake of trophic or modification factors, leads to equations for the time development of S^L and S^R:

\partial_t S^J(x, \alpha, t) = \lambda A(x - \alpha) \sum_{y, \beta, K} I(x - y) C^{JK}(\alpha - \beta) S^K(y, \beta, t) - \gamma S^J(x, \alpha, t) - \epsilon.   (1)

J, K are variables which each may take on the values L, R. A(x - \alpha) is a connectivity function, giving the number of synapses from \alpha to x (we assume an identity mapping from eye coordinates to cortical coordinates). C^{JK}(\alpha - \beta) measures the correlation in firing of inputs from eyes J and K when the inputs are separated by the distance \alpha - \beta. I(x - y) is a model-dependent spread of influence across cortex, telling how two synapses which fire synchronously, separated by the distance x - y, will influence one another's growth. This influence incorporates lateral synaptic interconnections in the case of Hebb synapses (for linear activation, I = (1 - B)^{-1}, where 1 is the identity matrix and B is the matrix of cortico-cortical synaptic weights), and incorporates the effects of diffusion of trophic or modification factors in models involving such factors. \lambda, \gamma and \epsilon are constants. Constraints to conserve or limit the total synaptic strength supported by a single cell, and nonlinearities to keep left- and right-eye synaptic weights positive and less than some maximum, are added. Subtracting the equation for S^R from that for S^L yields a model equation for the difference, S^D(x, \alpha, t) = S^L(x, \alpha, t) - S^R(x, \alpha, t):

\partial_t S^D(x, \alpha, t) = \lambda A(x - \alpha) \sum_{y, \beta} I(x - y) C^D(\alpha - \beta) S^D(y, \beta, t) - \gamma S^D(x, \alpha, t).   (2)

Here C^D = C^{SameEye} - C^{OppEye}, where C^{SameEye} = C^{LL} = C^{RR} and C^{OppEye} = C^{LR} = C^{RL}, and we have assumed statistical equality of the two eyes.
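To make the dynamics concrete, here is a minimal one-dimensional discretization of equation (2) with periodic boundaries. This is a sketch written for this text, not the authors' code: the grid size, rates, and the widths and amplitude of the interaction function are invented, and the hard clipping merely stands in for the weight limits mentioned above.

```python
import numpy as np

N = 64
off = np.arange(N)
d = np.minimum(off, N - off).astype(float)        # periodic offset distance

A_k = (d <= 3).astype(float)                      # arbor A, radius 3 (7 points)
C_k = np.exp(-(d / 2.8) ** 2)                     # same-eye correlations C^D
I_k = np.exp(-(d / 2.0) ** 2) - 0.5 * np.exp(-(d / 6.0) ** 2)  # invented Mexican hat

Arb = A_k[(off[:, None] - off[None, :]) % N]      # A(x - alpha) as an N x N matrix
K_hat = np.outer(np.fft.fft(I_k), np.fft.fft(C_k))  # separable kernel I(x) C^D(alpha)

rng = np.random.default_rng(0)
S = 0.01 * rng.standard_normal((N, N)) * Arb      # small random S^D within arbors

lam, gamma, dt = 0.05, 0.01, 1.0
for _ in range(200):
    conv = np.real(np.fft.ifft2(np.fft.fft2(S) * K_hat))   # sum_{y,b} I C^D S^D
    S += dt * (lam * Arb * conv - gamma * S)
    S = np.clip(S, -8.0, 8.0)          # |S^D| <= 8 stands in for the weight limits

dominance = S.sum(axis=1)              # per-cell ocular dominance, as in figure 1
print(np.sign(dominance).astype(int))  # alternating stripes of L/R domination
```

Even with these crude parameters, the sign of the summed difference \sum_\alpha S^D(x, \alpha) settles into alternating stripes, the 1-D analogue of the columns described below.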
SIMULATIONS

The development of equation (1) was studied in simulations using three 25 x 25 grids: two input layers, one representing each eye, and a single cortical layer. Each input cell connects to a 7 x 7 square arbor of cortical cells centered on the corresponding grid point (A(x) = 1 on the square of \pm 3 grid points, 0 otherwise). Initial synaptic weights are randomly assigned from a uniform distribution between 0.8 and 1.2. Synapses are allowed to decrease to 0 or increase to a weight of 8. Synaptic strength over each cortical cell is conserved by subtracting, after each iteration, from each active synapse the average change in synaptic strength on that cortical cell. Periodic boundary conditions on the three grids are used.
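The conservation rule can be written as a single projection step. The helper below is a hypothetical rendering of the normalization just described (in particular, reading "active" as "not saturated at 0 or 8" is our assumption):

```python
import numpy as np

def constrained_update(S, dS):
    """Apply change dS to weights S[x, a] (one eye's weights to cortical
    cell x), subtracting from each active synapse the average change on
    its cortical cell, then enforcing 0 <= S <= 8."""
    active = (S > 0.0) & (S < 8.0)                   # synapses free to change
    n_active = np.maximum(active.sum(axis=1, keepdims=True), 1)
    mean_change = (dS * active).sum(axis=1, keepdims=True) / n_active
    S = S + np.where(active, dS - mean_change, 0.0)  # net change ~0 per cell
    return np.clip(S, 0.0, 8.0)
```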
A typical time development of the cortical pattern of ocular dominance is shown in figure 1. For this simulation, correlations within each eye decrease with distance to zero over 4-5 grid points (circularly symmetric gaussian with parameter 2.8 grid points). There are no opposite-eye correlations. The cortical interaction function is a "Mexican hat" difference of gaussians, excitatory between nearest neighbors and weakly inhibitory more distantly, with excitatory width parameter \lambda_1 = 0.93 grid points. Individual cortical cell receptive fields refine in size and become monocular (innervated exclusively by a single eye), while individual input arbors refine in size and become confined to alternating cortical stripes (not shown).

Dependence of these results on the correlation function is shown in figure 2. Wider-ranging correlations within each eye, or addition of opposite-eye anticorrelations, increase the monocularity of cortex. Same-eye anticorrelations decrease monocularity, and if significant within an arbor radius (i.e. within \pm 3 grid points for the 7 x 7 square arbors) tend to destroy the monocular organization, as seen at the lower right. Other simulations (not shown) indicate that same-eye correlation over nearest neighbors is sufficient to give the periodic organization of ocular dominance, while correlation over an arbor radius gives an essentially fully monocular cortex.

[Figure 1: six panels, T = 0 through T = 80; grey scale runs from right-eye (R) to left-eye (L) domination.]
Figure 1. Time development of cortical ocular dominance. Results shown after 0, 10, 20, 30, 40, 80 iterations. Each pixel represents the ocular dominance (\sum_\alpha S^D(x, \alpha)) of a single cortical cell. 40 x 40 pixels are shown, repeating 15 columns and rows of the cortical grid, to reveal the pattern across the periodic boundary conditions.

[Figure 2: six panels of cortical ocular dominance, for same-eye correlations alone, with opposite-eye anticorrelations, and with same-eye anticorrelations.]
Figure 2. Cortical ocular dominance after 200 iterations for 6 choices of correlation functions. Top left is the simulation of figure 1. Top and bottom rows correspond to gaussian same-eye correlations with parameter 2.8 and 1.4 grid points, respectively. Middle column shows the effect of adding weak, broadly ranging anticorrelations between the two eyes (a gaussian with parameter 9 times larger than, and amplitude a small negative fraction of, the same-eye correlations). Right column shows the effect of instead adding the anticorrelation to the same-eye correlation function.

Simulation of time development with varying cortical interaction and arbor functions shows complete agreement with the analytical results presented below. The model also reproduces the experimental effects of monocular deprivation, including the presence of a critical developmental period for this effect.

EIGENFUNCTION ANALYSIS

Consider an initial condition for which S^D \approx 0, and near which equation (2) linearizes some more complex, nonlinear biological reality. S^D = 0 is a time-independent solution of equation (2). Is this solution stable to small perturbations, so that equality of the two eyes will be restored after perturbation, or is it unstable, so that a pattern of ocular dominance will grow? If it is unstable, which pattern will initially grow the fastest? These are inherently linear questions: they depend only on the behavior of the equations when S^D is small, so that nonlinear terms are negligible.

To solve this problem, we find the characteristic, independently growing modes of equation (2). These are the eigenfunctions of the operator on the right side of that equation. Each mode grows exponentially with growth rate given by its eigenvalue. If any eigenvalue is positive (they are real), the corresponding mode will grow; then the S^D = 0 solution is unstable to perturbation.
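Numerically, the fastest-growing mode can be found without any algebra by repeatedly applying the linearized operator and renormalizing (power iteration). A 1-D sketch, reusing the same invented functions as the earlier snippet:

```python
import numpy as np

N = 64
off = np.arange(N)
d = np.minimum(off, N - off).astype(float)
A_k = (d <= 3).astype(float)
C_k = np.exp(-(d / 2.8) ** 2)
I_k = np.exp(-(d / 2.0) ** 2) - 0.5 * np.exp(-(d / 6.0) ** 2)
Arb = A_k[(off[:, None] - off[None, :]) % N]
K_hat = np.outer(np.fft.fft(I_k), np.fft.fft(C_k))

def L(S):
    """Linear operator on the right side of equation (2); the -gamma S term
    is dropped since it only shifts every eigenvalue by the same amount."""
    return Arb * np.real(np.fft.ifft2(np.fft.fft2(S) * K_hat))

rng = np.random.default_rng(1)
S = rng.standard_normal((N, N)) * Arb
for _ in range(500):
    S = L(S)
    S /= np.linalg.norm(S)        # power iteration toward the dominant mode

rate = np.sum(S * L(S))           # Rayleigh estimate of the leading eigenvalue
dom = S.sum(axis=1)               # ocular dominance profile of the mode
m = 1 + np.argmax(np.abs(np.fft.fft(dom))[1:N // 2])
print(rate, m)                    # m approximates the wavenumber of the columns
```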
ANALYTICAL CHARACTERIZATION OF EIGENFUNCTIONS

Change variables in equation (2) from cortex and inputs, (x, \alpha), to cortex and receptive field, (x, r) with r = x - \alpha. Then equation (2) becomes a convolution in the cortical variable. The result (assume a continuum; results on a grid are similar) is that the eigenfunctions are of the form S^D_{m,e}(x, \alpha, t) = e^{i m \cdot x} RF_{m,e}(r). RF_{m,e} is a characteristic receptive field, representing the variation of the eigenfunction as r varies while cortical location x is fixed. m is a pair of real numbers specifying a two-dimensional wavenumber of cortical oscillation, and e is an additional index enumerating RFs for a given m. The eigenfunctions can also be written e^{i m \cdot \alpha} ARB_{m,e}(r), where ARB_{m,e}(r) = e^{i m \cdot r} RF_{m,e}(r). ARB is a characteristic arbor, representing the variation of the eigenfunction as r varies while input location \alpha is fixed. While these functions are complex, one can construct real eigenfunctions from them with similar properties (Miller and Stryker, 1989). A monocular (real) eigenfunction is illustrated in figure 3.

[Figure 3: two panels, "Characteristic receptive field" and "Characteristic arbor".]
Figure 3. One of the set (identical but for rotations and reflections) of fastest-growing eigenfunctions for the functions used in figure 1. The monocular receptive fields of synaptic differences S^D at different cortical locations, the oscillation across cortex, and the corresponding arbors are illustrated.

Modes with RFs dominated by one eye (\sum_r RF_{m,e}(r) \neq 0) will oscillate in dominance with wavelength 2\pi/|m| across cortex. A monocular mode is one for which RF does not change sign. The oscillation of monocular fields, between domination by one eye and domination by the other, yields ocular dominance columns. The fastest-growing mode in the linear regime will dominate the final pattern: if its receptive field is monocular, its wavelength will determine the width of the final columns.

One can characterize the eigenfunctions analytically in various limiting cases. The general conclusion is as follows. The fastest-growing mode's receptive field RF is largely determined by the correlation function C^D. If the peak of the fourier transform of C^D corresponds to a wavelength much larger than an arbor diameter, the mode will be monocular; if it corresponds to a wavelength smaller than an arbor diameter, the mode will be binocular. If C^D selects a monocular mode, a broader C^D (more sharply peaked fourier spectrum about wavenumber 0) will increase the dominance in growth rate of the monocular mode over other modes; in the limit in which C^D is constant with distance, only the monocular modes grow and all other modes decay. If the mode is monocular, the peak of the fourier transform of the cortical interaction function selects the wavelength of the cortical oscillation, and thus selects the wavelength of ocular dominance organization. In the limit in which correlations are broad with respect to an arbor, one can calculate that the growth rate of monocular modes as a function of the wavenumber of oscillation m is proportional to \sum_l \tilde{I}(m - l) \tilde{C}^D(l) \tilde{A}^2(l) (where \tilde{X} is the fourier transform of X). In this limit, only l's which are close to 0 can contribute to the sum, so the peak will occur at or near the m which maximizes \tilde{I}(m).
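The growth-rate formula is cheap to evaluate on a grid. The following sketch (1-D, with the same invented functions as above) tabulates \sum_l \tilde{I}(m - l) \tilde{C}^D(l) \tilde{A}^2(l) and compares its peak with the peak of \tilde{I}:

```python
import numpy as np

N = 64
off = np.arange(N)
d = np.minimum(off, N - off).astype(float)
A_hat = np.real(np.fft.fft((d <= 3).astype(float)))             # arbor transform
C_hat = np.real(np.fft.fft(np.exp(-(d / 2.8) ** 2)))            # C^D transform
I_hat = np.real(np.fft.fft(np.exp(-(d / 2.0) ** 2)
                           - 0.5 * np.exp(-(d / 6.0) ** 2)))    # interaction transform

# All three kernels are even, so their transforms are real.
l = np.arange(N)
rate = np.array([np.sum(I_hat[(m - l) % N] * C_hat * A_hat ** 2)
                 for m in range(N)])

print(np.argmax(rate[:N // 2]), np.argmax(I_hat[:N // 2]))  # peaks nearly coincide
```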
There is an exception to the above results if constraints conserve, or limit the change in, the total synaptic strength over the arbor of an input cell. Then monocular modes with wavelength longer than an arbor diameter are suppressed in growth rate, since individual inputs would have to gain or lose strength throughout their arborization. Given a correlation function that leads to monocular cells, a purely excitatory cortical interaction function would lead a single eye to take over all of cortex; however, if constraints conserve synaptic strength over an input arbor, the wavelength will instead be about an arbor diameter, the largest wavelength whose growth rate is not suppressed. Thus, ocular dominance segregation can occur with a purely excitatory cortical interaction function, though this is a less robust phenomenon. Analytically, a constraint conserving strength over afferent arbors, implemented by subtracting the average change in strength over an arbor at each iteration from each synapse in the arbor, transforms the previous expression for the growth rates to \sum_l \tilde{I}(m - l) \tilde{C}^D(l) \tilde{A}^2(l) (1 - \tilde{A}^2(l)/\tilde{A}^2(0)).

COMPUTATION OF EIGENFUNCTIONS

Eigenfunctions are computed on a grid, and the resulting growth rates as a function of wavelength are compared to the analytical expression above, in the absence of constraints on afferents. The results, for the parameters used in figure 2, are shown in figure 4. The grey level indicates the monocularity of the modes, defined as \sum_r RF(r) normalized on a scale between 0 and 1 (described in Miller and Stryker (1989)). The analytical expression for the growth rate, whose peak coincides in every case with the peak of \tilde{I}(m), accurately predicts the growth rate of monocular modes, even far from the limiting case in which the expression was derived. Broader correlations or opposite-eye anticorrelations enhance the monocularity of modes and the growth rate of monocular modes, while same-eye anticorrelations have the opposite effects. When same-eye anticorrelations are short-range compared to an arbor radius, the fastest-growing modes are binocular.

Results obtained for calculations in the presence of constraints on afferents are also as predicted. With an excitatory cortical interaction function, the spectrum is radically changed by constraints, selecting a mode with a wavelength equal to an arbor diameter rather than one with a wavelength as wide as cortex. With the Mexican hat cortical interaction function used in the simulations, the constraints suppress the growth of long-wavelength monocular modes but do not alter the basic structure or peak of the spectrum.

[Figure 4: six growth-rate spectra, for same-eye correlations (parameter 2.8 or 1.4 grid points) alone, with opposite-eye anticorrelations, and with same-eye anticorrelations.]
Figure 4. Growth rate (vertical axis) as a function of inverse wavelength (horizontal axis) for the six sets of functions used in figure 2, computed on the same grids. Grey level codes the maximum monocularity of modes with the given wavelength and growth rate, from fully monocular (white) to fully binocular (black). The black curve indicates the prediction for relative growth rates of monocular modes given in the limit of broad correlations, as described in the text.
CONNECTIONS TO OTHER MODELS

The model of Swindale (1980) for ocular dominance segregation emerges as a limiting case of this model when correlations are constant over a bit more than an arbor diameter. Swindale's model assumed an effective interaction between synapses depending only on eye of origin and distance across cortex. Our model gives a biological underpinning to this effective interaction in the limiting case, allows consideration of more general correlation functions, and allows examination of the development of individual arbors and receptive fields and their relationships, as well as of overall ocular dominance.

Equation (2) is very similar to equations studied by others (Linsker, 1986, 1988; Sanger, this volume). There are several important differences in our results. First, in this model synapses are constrained to remain positive. Biological synapses are either exclusively positive or exclusively negative, and in particular the projection of visual input to visual cortex is purely excitatory. Even if one is modelling a system in which there are both excitatory and inhibitory inputs, these two populations will almost certainly be statistically distinct in their activities and hence not treatable as a single population whose strengths may be either positive or negative. S^D, on the other hand, is a biological variable which starts near 0 and may be either positive or negative. This allows for a linear analysis whose results will remain accurate in the presence of nonlinearities, which is crucial for biology.

Second, we analyze the effect of intracortical synaptic interactions. These have two impacts on the modes: first, they introduce a phase variation or oscillation across cortex. Second, they typically enhance the growth rate of monocular modes relative to modes whose sign varies across the receptive field.

Acknowledgements

Supported by an NSF predoctoral fellowship and by grants from the McKnight Foundation and the System Development Foundation. Simulations were performed at the San Diego Supercomputer Center.

References

Hubel, D.H., T.N. Wiesel and S. LeVay, 1977. Plasticity of ocular dominance columns in monkey striate cortex, Phil. Trans. R. Soc. Lond. B 278:377-409.
Linsker, R., 1986. From basic network principles to neural architecture, Proc. Natl. Acad. Sci. USA 83:7508-7512, 8390-8394, 8779-8783.
Linsker, R., 1988. Self-organization in a perceptual network. IEEE Computer 21:105-117.
Miller, K.D., 1989. Correlation-based models of neural development, to appear in Neuroscience and Connectionist Theory (M.A. Gluck & D.E. Rumelhart, Eds.), Hillsdale, NJ: Lawrence Erlbaum Associates.
Miller, K.D., J.B. Keller & M.P. Stryker, 1986. Models for the formation of ocular dominance columns solved by linear stability analysis, Soc. Neurosci. Abstr. 12:1373.
Miller, K.D., J.B. Keller & M.P. Stryker, 1989. Ocular dominance column development: analysis and simulation. Submitted for publication.
Miller, K.D. & M.P. Stryker, 1989. The development of ocular dominance columns: mechanisms and models, to appear in Connectionist Modeling and Brain Function: The Developing Interface (S. J. Hanson & C. R. Olson, Eds.), MIT Press/Bradford.
Sanger, T.D., 1989. An optimality principle for unsupervised learning, this volume.
Swindale, N.V., 1980. A model for the formation of ocular dominance stripes, Proc. R. Soc. Lond. B 208:265-307.
Wiesel, T.N. & D.H. Hubel, 1965. Comparison of the effects of unilateral and bilateral eye closure on cortical unit responses in kittens, J. Neurophysiol. 28:1029-1040.