{"title": "Hierarchical Transformation of Space in the Visual System", "book": "Advances in Neural Information Processing Systems", "page_first": 412, "page_last": 419, "abstract": null, "full_text": "Hierarchical Transformation of Space in the Visual System\n\nAlexandre Pouget\n\nStephen A. Fisher\n\nTerrence J. Sejnowski\n\nComputational Neurobiology Laboratory\n\nThe Salk Institute\nLa Jolla, CA 92037\n\nAbstract\n\nNeurons encoding simple visual features in area V1, such as orientation, direction of motion and color, are organized in retinotopic maps. However, recent physiological experiments have shown that the responses of many neurons in V1 and other cortical areas are modulated by the direction of gaze. We have developed a neural network model of the visual cortex to explore the hypothesis that visual features are encoded in head-centered coordinates at early stages of visual processing. New experiments are suggested for testing this hypothesis using electrical stimulation and psychophysical observations.\n\n1 Introduction\n\nEarly visual processing in cortical areas V1, V2 and MT appears to encode visual features in eye-centered coordinates. This conclusion is based primarily on anatomical data and on recordings from neurons in these areas, which are arranged in retinotopic maps. In addition, when neurons in the visual cortex are electrically stimulated [9], the direction of the evoked eye movement depends only on the retinotopic position of the stimulation site, as shown in Figure 1. Thus, when a position corresponding to the left part of the visual field is stimulated, the eyes move toward the left (left panel), and eye movements in the opposite direction are induced if neurons on the right side are stimulated (right panel).
\n\nFigure 1: Eye Movements Evoked by Electrical Stimulations in V1 (panels plot the end point of the evoked saccade; axes labeled U, D, L, R)\n\nA variety of psychophysical experiments provide further evidence that simple visual features are organized according to retinal coordinates rather than spatiotopic coordinates [10, 5].\n\nAt later stages of visual processing the receptive fields of neurons become very large, and in the posterior parietal cortex, which contains areas believed to be important for sensory-motor coordination (LIP, VIP and 7a), the visual responses of neurons are modulated by both eye and head position [1, 2]. A previous model of the parietal cortex showed that the gain fields of the neurons observed there are consistent with a distributed spatial transformation from retinal to head-centered coordinates [14].\n\nRecently, several investigators have found that static eye position also modulates the visual response of many neurons at early stages of visual processing, including the LGN, V1 and V3a [3, 6, 13, 12]. Furthermore, the modulation appears to be qualitatively similar to that previously reported in the parietal cortex and could contribute to those responses. These new findings suggest that coordinate transformations from retinal to spatial representations could be initiated much earlier than previously thought.\n\nWe have used network optimization techniques to study the spatial transformations in a feedforward hierarchy of cortical maps.
The goals of the model were 1) to determine whether the modulation of neural responses with eye position, as observed in V1 or V3a, is sufficient to provide a head-centered coordinate frame, 2) to help interpret data based on the electrical stimulation of early visual areas, and 3) to provide a framework for designing experiments and testing predictions.\n\n2 Methods\n\n2.1 Network Task\n\nThe task of the network was to compute the head-centered coordinates of objects. If E is the eye position vector and R is the vector for the retinal position of the object, then the head-centered position P is given by:\n\n$$P = R + E \\tag{1}$$\n\nA two-layer network with linear units can solve this problem. However, the goal of our study was not to find the optimal architecture for this task, but to explore the types of intermediate representation developed in a multilayer network of nonlinear units and to compare these results with physiological recordings.\n\nFigure 2: Network Architecture (bottom to top: Retina and Eye Position (H, V) inputs; hidden layers; Output: Head-Centered Position (H, V))\n\n2.2 Network Architecture\n\nWe trained a partially connected multilayer network to compute the head-centered position of objects from the retinal and eye position signals available at the input layer. Weights were shared within each hidden layer [7] and adjusted with the backpropagation algorithm [11]. All simulations were performed with the SN2 simulator developed by Bottou and LeCun.\n\nIn the hierarchical architecture illustrated in Figure 2, the sizes of the receptive fields were restricted in each layer and several hidden units were dedicated to each location, typically 3 to 5 units, depending on the layer.
Although weights were shared between locations within a layer, each type of hidden unit was allowed to develop its own receptive field properties. This architecture preserves two essential aspects of the visual cortex: 1) restricted receptive fields organized in retinotopic maps and 2) receptive field sizes that increase with distance from the retina.\n\nTraining examples consisted of an eye position vector and a Gaussian pattern of activity placed at a particular location on the input layer, and these were systematically varied throughout training. For some trials there were no visual inputs, and the output layer was trained to reproduce the eye position.\n\nFigure 3: Spatial Gain Fields: Comparison Between Hidden Units and Cortical Neurons (background activity not shown for V3a neurons; panels: Visual Cortex: Area V3a, Hidden Layer 2, Visual Cortex: Area 7a, Hidden Layer 3)\n\n2.3 Electrical Stimulation Experiments\n\nDetermining the head-centered position of an object is equivalent to computing the position of the eye required to foveate the object (i.e., for a foveated object R = 0, which, according to equation 1, implies that P = E). Thus, the output of our network can be interpreted as the eye position for an intended saccadic eye movement to acquire the object.
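To make the task concrete, the sketch below builds training examples of the kind described in Sections 2.1 to 2.3: a Gaussian blob of activity on a retinal grid, an eye position vector, and the head-centered target P = R + E of equation 1. The grid size, blob width, coarse position grid and the coding of eye position as two raw numbers are illustrative assumptions rather than the paper's actual input coding, and the helper names (gaussian_retina, make_example, make_blank_example) are hypothetical.

```python
import numpy as np


def gaussian_retina(r, size=15, sigma=1.5):
    # Hypothetical retina: a size x size grid whose unit coordinates run from
    # -(size // 2) to +(size // 2). r = (h, v) is the stimulus position;
    # rows index the vertical coordinate, columns the horizontal one.
    coords = np.arange(size) - size // 2
    h, v = np.meshgrid(coords, coords, indexing='xy')
    return np.exp(-((h - r[0]) ** 2 + (v - r[1]) ** 2) / (2 * sigma ** 2))


def make_example(r, e, size=15):
    # One training example: retinal pattern and eye position as inputs,
    # head-centered position P = R + E (equation 1) as the target.
    retina = gaussian_retina(r, size=size)
    e = np.asarray(e, dtype=float)
    target = np.asarray(r, dtype=float) + e
    return retina, e, target


def make_blank_example(e, size=15):
    # Trial without visual input: the target is the eye position itself.
    e = np.asarray(e, dtype=float)
    return np.zeros((size, size)), e, e.copy()


# Systematically varied training set over an assumed coarse grid of R and E.
positions = range(-4, 5, 2)
training_set = [make_example((rh, rv), (eh, ev))
                for rh in positions for rv in positions
                for eh in positions for ev in positions]
```

For a foveated object, make_example((0, 0), (3, -2)) returns the target (3, -2), which is just E; this is the sense in which the network output can be read as the eye position for an intended saccade.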
\n\nFor the electrical stimulation experiments we followed the protocol suggested by Goodman and Andersen [4] in an earlier study of the Zipser-Andersen model of parietal cortex [14]. The cortical model was stimulated by clamping the activity of a set of hidden units at one location in one of the layers to 1, their maximum value, and setting all visual inputs to 0. The changes in the activity of the output units were computed and interpreted as an intended saccade.\n\n3 Results\n\nWe trained several networks with various numbers of hidden units per layer and found that they all converged to a nearly perfect solution in a few thousand sweeps through the training set.\n\n3.1 Comparison Between Hidden Units and Cortical Neurons\n\nThe influence of eye position on the visual response of a cortical neuron is typically assessed by finding the visual stimulus that elicits the best response and measuring the gain of this response at nine different eye fixations [1]. The responses are plotted as circles with diameters proportional to activity, and the set of nine circles is called the spatial gain field of a unit. We adopted the same procedure for studying the hidden units in the model.\n\nFigure 4: Eye Movements Evoked by Stimulating the Retina\n\nThe units in a fully developed network have properties that are similar to those observed in cortical neurons (Figure 3). Despite having restricted receptive fields, the overall activity of most units increased monotonically in one direction of eye position, each unit having a different preferred direction in head-centered space.
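The gain-field procedure of Section 3.1 can be mimicked on a single model unit. The sketch below assumes a hypothetical unit whose input is a Gaussian retinal receptive field plus a linear eye-position term, passed through a sigmoid; all weights and the nine fixation positions are made-up values, not fitted network parameters. It presents the best stimulus at each of the 3 x 3 fixations and returns both the overall activity and the visual component (overall minus background).

```python
import numpy as np


def sigmoid(x):
    # Logistic squashing function used by the hypothetical unit.
    return 1.0 / (1.0 + np.exp(-x))


def unit_response(r, e, r_pref=(0.0, 0.0), sigma=2.0, w_vis=4.0,
                  w_eye=(0.5, 0.2), bias=-2.0, visual=True):
    # Hypothetical gain-modulated unit: a Gaussian retinal receptive field
    # centred on r_pref plus a linear eye-position term, then a sigmoid.
    # Setting visual=False gives the background (eye-position-only) activity.
    drive = bias + np.dot(w_eye, e)
    if visual:
        d2 = np.sum((np.asarray(r, dtype=float) - np.asarray(r_pref)) ** 2)
        drive += w_vis * np.exp(-d2 / (2 * sigma ** 2))
    return sigmoid(drive)


def spatial_gain_field(fixations=(-10.0, 0.0, 10.0), **params):
    # Present the best stimulus (at the receptive-field centre) at each
    # fixation on the grid and record overall and visual activity.
    r_best = params.get('r_pref', (0.0, 0.0))
    n = len(fixations)
    overall = np.zeros((n, n))
    visual = np.zeros((n, n))
    for i, ev in enumerate(fixations):      # rows: vertical eye position
        for j, eh in enumerate(fixations):  # columns: horizontal eye position
            total = unit_response(r_best, (eh, ev), **params)
            background = unit_response(r_best, (eh, ev), visual=False, **params)
            overall[i, j] = total
            visual[i, j] = total - background
    return overall, visual


overall, visual = spatial_gain_field()
```

With these assumed parameters the overall activity rises monotonically toward the fixation with the largest horizontal and vertical eye positions, while the visual component is largest at the central fixation: the sigmoid compresses the visual drive once the eye-position input pushes the unit toward saturation or toward zero, so the two measures need not grow along the same direction.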
\n\nAlso, the inner and outer circles, corresponding to the visual activity and the overall activity (visual plus background), did not always increase along the same direction, owing to the nonlinear sigmoid squashing function of the unit.\n\n3.2 Significance of the Spatial Gain Fields\n\nEach hidden layer of the network has a retinotopic map but also contains spatiotopic (i.e., head-centered) information through the spatial gain fields. We call these retinospatiotopic maps.\n\nAt each location on a map, R is implicitly encoded by the position of a unit on the map, and E is provided by the inputs from the eye position units. Thus, each location contains all the information needed to recover P, the head-centered coordinate. Therefore, all of the visual features in the map, such as orientation or color, are encoded in head-centered coordinates. This suggests that some visual representations in V1 and V3a may be retinospatiotopic.\n\n3.3 Electrical Stimulation Experiments\n\nCan electrical stimulation experiments distinguish between a purely retinotopic map, like the retina, and retinospatiotopic maps, like each of the hidden layers?\n\nWhen input units in the retina are stimulated, the direction of the evoked movement is determined by the location of the stimulation site on the map (Figure 4), as expected from a purely retinotopic map. For example, stimulating units in the upper
left corner of the map produces an output in the upper left direction, regardless of initial eye position.\n\nFigure 5: Eye Movements Evoked by Stimulating One Hidden Unit Type (panels: Hidden layer 2, Hidden layer 3)\n\nThere were several types of hidden units at each spatial position of a hidden layer. When a single type of hidden unit was stimulated on its own, the pattern of induced eye movements was no longer a function solely of the location of the stimulation (Figure 5). Other factors, such as the preferred head-centered direction of the stimulated cell, were also important. Hence, the intermediate maps were not purely retinotopic.\n\nIf all the hidden units present at one location in a hidden layer were activated together, the pattern of outputs resembled the one obtained by stimulating the input layer (Figure 6). Even though each hidden unit has a different preferred head-centered direction, when the units were activated simultaneously these directions balanced out and the dominant factor became the location of the stimulation.\n\nStrong electrical stimulation in area V1 of the visual cortex is likely to recruit many neurons whose receptive fields share the same retinal location. This might explain why McIlwain [9] observed eye movements in directions that depended only on the position of the stimulation site. In higher visual areas with weaker retinotopy, it might be possible to obtain patterns closer to those produced by stimulating only one type of hidden unit. Such patterns of eye movements have already been observed in parietal area LIP [4].
\n\nFigure 6: Eye Movements Evoked by Stimulating All Hidden Unit Types (panels: Hidden layer 2, Hidden layer 3)\n\n4 Discussion and Predictions\n\nThe analysis of our hierarchical model shows that the gain modulation of visual responses observed at early stages of visual processing is consistent with the hypothesis that low-level visual features are encoded in head-centered coordinates. What experiments could confirm this hypothesis?\n\nElectrical stimulation cannot distinguish between a retinotopic and a retinospatiotopic representation unless the number of neurons stimulated is small or restricted to those with similar gain fields. This might be possible at an intermediate level of processing, such as area V3a.\n\nMost psychophysical experiments have been designed to test for purely head-centered maps [10, 5] and not for retinotopic maps receiving a static eye position signal. New experiments are needed that look for interactions between eye position and visual features. For example, it should be possible to obtain motion aftereffects that depend on eye position, that is, aftereffects in which the direction of motion depends on the gaze direction. John Mayhew [8] has already reported this type of gaze-dependent aftereffect for rotation, which is probably represented at later stages of visual processing. Similar experiments with translational motion could probe earlier levels of visual processing.
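The stimulation results of Section 3.3, and the condition stated in this section for telling a retinotopic map from a retinospatiotopic one, can be illustrated with a deliberately simple caricature. The sketch below is not the trained network: it assumes a hypothetical linear readout in which each hidden unit type at a stimulated site votes for the retinal position of that site plus a shift along its own preferred head-centered direction, and the evoked saccade is the average vote. The number of types, the gain and the evenly spaced preferred directions are all assumptions.

```python
import numpy as np


def preferred_directions(n_types):
    # Hypothetical: n_types hidden unit types at one map location, with
    # preferred head-centered directions evenly spaced around the circle.
    angles = 2 * np.pi * np.arange(n_types) / n_types
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)


def evoked_saccade(r_site, stimulated, n_types=4, gain=3.0):
    # Toy linear readout (an assumption, not the trained network): each
    # clamped unit type votes for the retinal position of the site plus a
    # shift of size gain along its preferred direction; the evoked saccade
    # is the average vote over the stimulated types.
    d = preferred_directions(n_types)
    votes = np.asarray(r_site, dtype=float) + gain * d[list(stimulated)]
    return votes.mean(axis=0)


site = (2.0, 1.0)
one_type = evoked_saccade(site, [0])
all_types = evoked_saccade(site, range(4))
```

Stimulating one type at site (2, 1) pulls the saccade along that type's preferred direction, to (5, 1) for type 0, whereas stimulating all four types gives (2, 1) up to rounding: evenly spaced preferred directions sum to zero, so only the retinal position of the site survives, as in a purely retinotopic map.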
\n\nIf information on spatial location is already present in area V1, the primary visual area that projects to other areas of the visual cortex in primates, then we need to re-evaluate the representation of objects in visual cortex. In the model presented here, the spatial location of an object was encoded along with its other features in a distributed fashion; hence spatial location should be considered on an equal footing with the other features of an object. Such early spatial transformations would affect other aspects of visual processing, such as visual attention and object recognition, and may also be important for nonspatial tasks, such as shape constancy (John Mayhew, personal communication).\n\nReferences\n\n[1] R.A. Andersen, G.K. Essick, and R.M. Siegel. Encoding of spatial location by posterior parietal neurons. Science, 230:456-458, 1985.\n\n[2] P.R. Brotchie and R.A. Andersen. A body-centered coordinate system in posterior parietal cortex. In Neurosc. Abs., page 1281, New Orleans, 1991.\n\n[3] C. Galletti and P.P. Battaglini. Gaze-dependent visual neurons in area V3a of monkey prestriate cortex. J. Neurosc., 9:1112-1125, 1989.\n\n[4] S.J. Goodman and R.A. Andersen. Microstimulations of a neural network model for visually guided saccades. J. Cog. Neurosc., 1:317-326, 1989.\n\n[5] D.E. Irwin, J.L. Zacks, and J.S. Brown. Visual memory and the perception of a stable visual environment. Perc. Psychophy., 47:35-46, 1990.\n\n[6] R. Lal and M.J. Friedlander. Gating of retinal transmission by afferent eye position and movement signals. Science, 243:93-96, 1989.\n\n[7] Y. LeCun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, and L.D. Jackel.
Backpropagation applied to handwritten zip code recognition. Neural Computation, 1:540-566, 1990.\n\n[8] J.E.W. Mayhew. After-effects of movement contingent on direction of gaze. Vision Res., 13:877-880, 1973.\n\n[9] J.T. McIlwain. Saccadic eye movements evoked by electrical stimulation of the cat visual cortex. Visual Neurosc., 1:135-143, 1988.\n\n[10] J.K. O'Regan and A. Levy-Schoen. Integrating visual information from successive fixations: does trans-saccadic fusion exist? Vision Res., 23:765-768, 1983.\n\n[11] D.E. Rumelhart, G.E. Hinton, and R.J. Williams. Learning internal representations by error propagation. In D.E. Rumelhart, J.L. McClelland, and the PDP Research Group, editors, Parallel Distributed Processing, volume 1, chapter 8, pages 318-362. MIT Press, Cambridge, MA, 1986.\n\n[12] Y. Trotter, S. Celebrini, S.J. Thorpe, and M. Imbert. Modulation of stereoscopic processing in primate visual cortex V1 by the distance of fixation. In Neurosc. Abs., New Orleans, 1991.\n\n[13] T.G. Weyand and J.G. Malpeli. Responses of neurons in primary visual cortex are influenced by eye position. In Neurosc. Abs., page 419.7, St Louis, 1990.\n\n[14] D. Zipser and R.A. Andersen. A back-propagation programmed network that simulates response properties of a subset of posterior parietal neurons. Nature, 331:679-684, 1988.\n", "award": [], "sourceid": 513, "authors": [{"given_name": "Alexandre", "family_name": "Pouget", "institution": null}, {"given_name": "Stephen", "family_name": "Fisher", "institution": null}, {"given_name": "Terrence", "family_name": "Sejnowski", "institution": null}]}