{"title": "Decoding of Neuronal Signals in Visual Pattern Recognition", "book": "Advances in Neural Information Processing Systems", "page_first": 356, "page_last": 363, "abstract": null, "full_text": "Decoding of Neuronal Signals in Visual Pattern Recognition\n\nEmad N Eskandar\nLaboratory of Neuropsychology\nNational Institute of Mental Health\nBethesda MD 20892 USA\n\nBarry J Richmond\nLaboratory of Neuropsychology\nNational Institute of Mental Health\nBethesda MD 20892 USA\n\nJohn A Hertz\nNORDITA\nBlegdamsvej 17\nDK-2100 Copenhagen \u00d8, Denmark\n\nLance M Optican\nLaboratory of Sensorimotor Research\nNational Eye Institute\nBethesda MD 20892 USA\n\nTroels Kj\u00e6r\nNORDITA\nBlegdamsvej 17\nDK-2100 Copenhagen \u00d8, Denmark\n\nAbstract\n\nWe have investigated the properties of neurons in inferior temporal (IT) cortex in monkeys performing a pattern matching task. Simple backpropagation networks were trained to discriminate the various stimulus conditions on the basis of the measured neuronal signal. We also trained networks to predict the neuronal response waveforms from the spatial patterns of the stimuli. The results indicate that IT neurons convey temporally encoded information about both current and remembered patterns, as well as about their behavioral context.\n\n1 INTRODUCTION\n\nAnatomical and neurophysiological studies suggest that there is a cortical pathway specialized for visual object recognition, beginning in the primary visual cortex and ending in the inferior temporal (IT) cortex (Ungerleider and Mishkin, 1982). 
Studies of IT neurons in awake behaving monkeys have found that visually elicited responses depend on the pattern of the stimulus and on the behavioral context of the stimulus presentation (Richmond and Sato, 1987; Miller et al, 1991). Until now, however, no attempt had been made to quantify the temporal pattern of firing in the context of a behaviorally complex task such as pattern recognition.\n\nOur goal was to examine the information present in IT neurons about visual stimuli and their behavioral context. We explicitly allowed for the possibility that this information was encoded in the temporal pattern of the response. To decode the responses, we used simple feed-forward networks trained by backpropagation.\n\nIn work reported elsewhere (Eskandar et al, 1991) this information is calculated in another way, with similar results.\n\n2 THE EXPERIMENT\n\nTwo monkeys were trained to perform a sequential nonmatch-to-sample task using a complete set of 32 black-and-white patterns based on 2-D Walsh functions. While the monkey fixated and grasped a bar, a sample pattern appeared for 352 msec; after a pause of 500 msec a test stimulus appeared for 352 msec. The monkey indicated whether the test stimulus failed to match the sample stimulus by releasing the bar. (If the test stimulus matched the sample, the monkey waited for a third stimulus, different from the sample, before releasing the bar; see Fig. 1.)\n\n[Figure 1 timeline. Event labels: SAMPLE, MATCH, NONMATCH, REWARD, INTER-STIMULUS, INTER-TRIAL. Durations shown: 352 ms, 550 ms.]\n\nFigure 1: The nonmatch-to-sample task. 
\n\nThe type of trial (match or nonmatch) and the pairings of sample stimuli with nonmatch stimuli were selected randomly. A single experiment usually contained several thousand trials; thus each of the 32 patterns appeared repeatedly under the three conditions (sample, match, and nonmatch). Single-neuron recordings from IT cortex were carried out while the monkeys were performing the task.\n\n[Figure 2 panels: stimuli A and B; columns SAMPLE, MATCH, NONMATCH.]\n\nFigure 2: Responses produced by 2 stimuli under 3 behavioural conditions.\n\nFig. 2 shows the neuronal signals produced by two different stimulus patterns in the three behavioural conditions: sample, match and nonmatch. The lower parts of the figure show single-trial spike trains, while the upper parts show the effective time-dependent firing probabilities, inferred from the spike trains by convolving each spike with a Gaussian kernel, adding these up for each trial and averaging the resulting continuous signals over trials. It is evident that for a given stimulus pattern the average signals produced in different behavioural conditions are different. In what follows, we proceed further to show that there is information about behavioural condition in the signal produced in a single trial. We will compute its average value explicitly.\n\n3 DECODING NETWORKS\n\nTo compute this information we trained networks to decode the measured signal. The form of the network is shown in Fig. 3. 
\n\n[Figure 3 layers: spike trains, principal components, hidden units, output.]\n\nFigure 3: Network to decode neuronal signals for information about behavioural condition.\n\nThe first two layers of the network shown preprocess the spike trains as follows. We begin with the spikes measured in an interval starting 90 msec after the stimulus onset and lasting 255 msec. First, each spike is convolved with a Gaussian kernel to produce a continuous signal. This signal is sampled at 4-msec intervals, giving a 54-dimensional input vector. In the second step this input vector is compressed by throwing out all but a small number of its principal components (PC's). The PC basis was obtained by diagonalizing the 54 x 54 covariance matrix of the inputs computed over all trials in the experiment. The remaining PC's are then the input to the rest of the network, which is a standard one with one further hidden layer. Earlier work showed that the first five PC's transmit most of the pattern information in a neuronal response (Richmond et al, 1987). Furthermore, the first PC is highly correlated with the spike count. Thus, our subsequent analysis was either on the first PC alone, as a measure of spike count, or on the first five PC's, as a measure that incorporates temporal modulation.\n\nWe trained the networks to make pairwise discriminations between responses measured under different conditions (sample-match, sample-nonmatch, or match-nonmatch). Thus there is a single output unit, and the target is a 1 or 0 according to the behavioural condition under which that spike train was measured. 
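The preprocessing steps described above (Gaussian smoothing of the spike train, sampling every 4 msec, projection onto the leading principal components) can be sketched as follows. This is a minimal illustration assuming NumPy, not the authors' code: the kernel width `sigma_ms`, the function names and the synthetic spike times are hypothetical. On this particular grid a 255-msec window sampled every 4 msec gives 64 points, so the sampling grid behind the 54-dimensional vector reported in the text is not reproduced here.

```python
import numpy as np

def smooth_and_sample(spike_times_ms, start_ms=90.0, duration_ms=255.0,
                      step_ms=4.0, sigma_ms=5.0):
    # Regular sampling grid inside the analysis window (sigma_ms is an assumed value).
    t = start_ms + np.arange(0.0, duration_ms, step_ms)
    spikes = np.asarray(spike_times_ms, dtype=float)
    # Keep only the spikes that fall inside the window.
    in_window = (spikes >= start_ms) & (spikes < start_ms + duration_ms)
    spikes = spikes[in_window]
    # Sum of unit-area Gaussians centred on each spike, evaluated on the grid.
    diffs = t[:, None] - spikes[None, :]
    kernel = np.exp(-0.5 * (diffs / sigma_ms) ** 2) / (sigma_ms * np.sqrt(2.0 * np.pi))
    return kernel.sum(axis=1)

def principal_components(signals, n_keep=5):
    # signals: (n_trials, n_samples) array of smoothed single-trial responses.
    centred = signals - signals.mean(axis=0)
    cov = np.cov(centred, rowvar=False)        # n_samples x n_samples covariance
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_keep]
    basis = eigvecs[:, order]                  # leading PCs as columns
    return centred @ basis, basis

# Synthetic spike times (purely illustrative, not recorded data).
rng = np.random.default_rng(0)
trials = [np.sort(rng.uniform(0.0, 400.0, size=rng.integers(5, 30)))
          for _ in range(200)]
signals = np.stack([smooth_and_sample(tr) for tr in trials])
coeffs, basis = principal_components(signals, n_keep=5)
```

Here `coeffs[:, :1]` would play the role of the first-PC input and `coeffs` that of the five-PC input to the trainable layers.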
\n\nThe final two layers of the network were trained by standard backpropagation of errors for the cross-entropy cost function\n\n$$E = -\\sum_{\\mu} \\left[ T^{\\mu} \\log O^{\\mu} + (1 - T^{\\mu}) \\log(1 - O^{\\mu}) \\right], \\qquad (1)$$\n\nwhere $T^{\\mu}$ is the target and $O^{\\mu}$ the network output produced by the input vector $x^{\\mu}$ for training example $\\mu$. The output of the network with the weights that result from this training is then the optimal estimate (given the chosen architecture) of the probability of a behavioural condition, given the measured neuronal signal used as input. The number of hidden units was adjusted to minimize the generalization error, which was computed on one quarter of the data that was reserved for this purpose.\n\nWe then calculated the mean equivocation,\n\n$$\\epsilon = -\\langle O(x) \\log O(x) + [1 - O(x)] \\log[1 - O(x)] \\rangle_x, \\qquad (2)$$\n\nwhere $O(x)$ is the value of the output unit for input $x$ and the average is over all inputs. (We calculated this by averaging over the test or training sets; the results were not sensitive to which one we chose.) The equivocation is a measure of the neuron's uncertainty with respect to a given discrimination. From it we can compute the transmitted information\n\n$$I = I_{\\text{a priori}} - \\epsilon = 1 - \\epsilon. \\qquad (3)$$\n\nThe last equality follows because in our data sets the two conditions always occur equally often, so that the a priori information is 1 bit (with logarithms taken to base 2).\n\nIt is evident from Fig. 2 that if we already know that our signal is produced by a particular stimulus pattern, the discrimination of the behavioural condition will be easier than if we do not possess this a priori knowledge. This is because the signal varies with stimulus as well as behavioural condition (more strongly, in fact), and the dependence on the latter has to be sorted out from that on the former.  
To get an idea of the effect of this \"distraction\", we performed 4 separate calculations for each of the 3 behavioural-condition discriminations, using 1, 4, 8, and all 32 stimulus patterns, respectively.\n\nThe results are summarized in Fig. 4, which shows the transmitted information about the 3 different behavioural-condition discriminations at the various levels of distraction, averaged over 5 cells. It also indicates how much of the transmitted information in each case is contained in the spike count alone (i.e. the first PC of the signal).\n\n[Figure 4 bar chart. x-axis: number of patterns (1, 4, 8, 32) for each discrimination (sample-nonmatch, sample-match, match-nonmatch); y-axis: transmitted information, ticks 0.1 to 0.5.]\n\nFigure 4: Transmitted information for the three behavioural discriminations with different numbers of patterns. The lower white region on each bar shows the information transmitted in the first PC alone.\n\nIt is apparent that measurable information about behavioural condition is present in a single neuronal response, even in the total absence of a priori information about the stimulus pattern. It is also evident that most of this information is contained in the time-dependence of the firing: the information contained in the first PC of the signal is significantly less (paired t-test p < 0.001) and was barely out of the noise.\n\nA finite data set can lead to a biased estimate of the transmitted information (Optican et al, 1991). In order to control for this we made a preliminary study of the dependence of the calculated equivocation on training set size.  
We varied the number of trials available to the network in a range (64 - 1024) for one of the pairwise discriminations (sample vs. nonmatch). The calculated apparent equivocation increased with the sample size N, indicating a small-sample bias. The best correlation (Pearson r = -0.86) was obtained with a fit of the form\n\n$$\\epsilon(N) = \\epsilon_{\\infty} - c N^{-1/2} \\quad (c > 0). \\qquad (4)$$\n\nThis gives us a systematic way to estimate the small-sample bias and thus provide an improved estimate $\\epsilon_{\\infty}$ of the true equivocation. Details will be reported elsewhere.\n\n4 PREDICTING NEURONAL RESPONSES\n\nIn a second set of analyses, we examined the neuronal encoding of both current and recalled patterns. The networks were trained to predict the neuronal response (as represented by its first 5 PC's) from the spatial pattern of the current nonmatch stimulus, that of the immediately preceding sample stimulus, or both. The inputs were the pixel values of the patterns.\n\nThe network is shown in Fig. 5. In order to avoid having different architectures for predictions from one and two input patterns, we always used a number of input units equal to twice the number of pixels in a single pattern. In the case where the prediction was to be made on the basis of both previous and current patterns, each pattern was fed into half the input units. For prediction from just one pattern (either the current or previous one), the single input pixel array was loaded separately onto both halves of the input array. As in the previous analyses, the number of hidden units was fixed by testing on a quarter of the data held out of the training set for this purpose.
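The fixed-size input scheme described for the prediction network can be illustrated with a short sketch, assuming NumPy. The function name `build_input`, the ordering of the two halves (previous pattern first) and the +1/-1 pixel coding in the example are assumptions made for illustration; the text does not specify them.

```python
import numpy as np

def build_input(n_pixels, current=None, previous=None):
    # Returns a vector of length 2 * n_pixels, whichever patterns are supplied.
    if current is None and previous is None:
        raise ValueError('at least one pattern is required')
    if current is not None and previous is not None:
        # Two patterns: one per half (the ordering is an assumption).
        first, second = previous, current
    else:
        # One pattern: the same pixel array is loaded onto both halves.
        single = current if current is not None else previous
        first, second = single, single
    first = np.asarray(first, dtype=float).ravel()
    second = np.asarray(second, dtype=float).ravel()
    if first.size != n_pixels or second.size != n_pixels:
        raise ValueError('pattern size does not match n_pixels')
    return np.concatenate([first, second])

# Toy 4-pixel patterns with an assumed +1/-1 coding of white and black.
a = [1, -1, 1, -1]
b = [1, 1, -1, -1]
both = build_input(4, current=a, previous=b)
only_current = build_input(4, current=a)
```

With this layout the same network architecture can be trained on the current pattern, the previous pattern, or both, so differences in prediction error are not confounded with differences in the number of input units.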
\n\nFigure 5: Network for predicting neuronal responses from the stimulus. The inputs are pixel values of the stimuli (see text), and the targets are the first 5 PC's of the measured response.\n\nWe performed this analysis on data from 6 neurons. Not surprisingly, the predicted waveforms were better when the input was the current pattern (normalized mean square error (mse) = 0.482) than when it was the previous pattern (mse = 0.589). However, the best prediction was obtained when the input reflected both the current and previous patterns (mse = 0.422). Thus the neurons we analyzed conveyed information about both remembered and current stimuli.\n\n5 CONCLUSION\n\nThe results presented here demonstrate the utility of connectionist networks in analyzing neuronal information processing. We have shown that temporally modulated responses in IT cortical neurons convey information about both spatial patterns and behavioral context. The responses also convey information about the patterns of remembered stimuli. Based on these results, we hypothesize that inferior temporal neurons play a role in comparing visual patterns with those presented at an earlier time.\n\nAcknowledgements\n\nThis work was supported by NATO through Collaborative Research Grant CRG 900189. EE received support from the Howard Hughes Medical Institute as an NIH Research Scholar.\n\nReferences\n\nE N Eskandar et al (1991): Inferior temporal neurons convey information about stimulus patterns and their behavioral relevance, Soc Neurosci Abstr 17 443; Role of inferior temporal neurons in visual memory, submitted to J Neurophysiol. 
\n\nE K Miller et al (1991): A neural mechanism for working and recognition memory in inferior temporal cortex, Science 253.\n\nL M Optican et al (1991): Unbiased measures of transmitted information and channel capacity from multivariate neuronal data, Biol Cybernetics 65 305-310.\n\nB J Richmond and T Sato (1987): Enhancement of inferior temporal neurons during visual discrimination, J Neurophysiol 56 1292-1306.\n\nB J Richmond et al (1987): Temporal encoding of two-dimensional patterns by single units in primate inferior temporal cortex, J Neurophysiol 57 132-178.\n\nL G Ungerleider and M Mishkin (1982): Two cortical visual systems, in Analysis of Visual Behavior, ed. D J Ingle, M A Goodale and R J W Mansfield, pp 549-586. Cambridge: MIT Press.", "award": [], "sourceid": 446, "authors": [{"given_name": "Emad", "family_name": "Eskandar", "institution": null}, {"given_name": "Barry", "family_name": "Richmond", "institution": null}, {"given_name": "John", "family_name": "Hertz", "institution": null}, {"given_name": "Lance", "family_name": "Optican", "institution": null}, {"given_name": "Troels", "family_name": "Kj\u00e6r", "institution": null}]}