{"title": "Neural Representation of Multi-Dimensional Stimuli", "book": "Advances in Neural Information Processing Systems", "page_first": 115, "page_last": 121, "abstract": null, "full_text": "Effects of Spatial and Temporal Contiguity on the Acquisition of Spatial Information\n\nThea B. Ghiselli-Crippa and Paul W. Munro\n\nDepartment of Information Science and Telecommunications\nUniversity of Pittsburgh\nPittsburgh, PA 15260\n\ntbgst@sis.pitt.edu, munro@sis.pitt.edu\n\nAbstract\n\nSpatial information comes in two forms: direct spatial information (for example, retinal position) and indirect temporal contiguity information, since objects encountered sequentially are in general spatially close. The acquisition of spatial information by a neural network is investigated here. Given a spatial layout of several objects, networks are trained on a prediction task. Networks using temporal sequences with no direct spatial information are found to develop internal representations that show distances correlated with distances in the external layout. The influence of spatial information is analyzed by providing direct spatial information to the system during training that is either consistent with the layout or inconsistent with it. This approach allows examination of the relative contributions of spatial and temporal contiguity.\n\n1 Introduction\n\nSpatial information is acquired by a process of exploration that is fundamentally temporal, whether it be on a small scale, such as scanning a picture, or on a larger one, such as physically navigating through a building, a neighborhood, or a city. Continuous scanning of an environment causes locations that are spatially close to have a tendency to occur in temporal proximity to one another. 
Thus, a temporal associative mechanism (such as a Hebb rule) can be used in conjunction with continuous exploration to capture the spatial structure of the environment [1]. However, the actual process of building a cognitive map need not rely solely on temporal associations, since some spatial information is encoded in the sensory array (position on the retina and proprioceptive feedback). Laboratory studies show different types of interaction between the relative contributions of temporal and spatial contiguities to the formation of an internal representation of space. While Clayton and Habibi's [2] series of recognition priming experiments indicates that priming is controlled only by temporal associations, in the work of McNamara et al. [3] priming in recognition is observed only when space and time are both contiguous. In addition, Curiel and Radvansky's [4] work shows that the effects of spatial and temporal contiguity depend on whether location or identity information is emphasized during learning. Moreover, other experiments [3] also show how the effects clearly depend on the task and can be quite different if an explicitly spatial task is used (e.g., additive effects in location judgments).\n\n[Figure 1 diagram: unit groups are marked as labels and coordinates; in the right panel the output labels carry the A coeff. and the output coordinates the B coeff.]\n\nFigure 1: Network architectures: temporal-only network (left); spatio-temporal network with spatial units part of the input representation (center); spatio-temporal network with spatial units part of the output representation (right).\n\n2 Network architectures\n\nThe goal of the work presented in this paper is to study the structure of the internal representations that emerge from the integration of temporal and spatial associations. 
An encoder-like network architecture is used (see Figure 1), with a set of N input units and a set of N output units representing N nodes on a 2-dimensional graph. A set of H units is used for the hidden layer. To include space in the learning process, additional spatial units are included in the network architecture. These units provide a representation of the spatial information directly available during the learning/scanning process. In the simulations described in this paper, two units are used and are chosen to represent the (x, y) coordinates of the nodes in the graph. The spatial units can be included as part of the input representation or as part of the output representation (see Figure 1, center and right panels): both choices are used in the experiments, to investigate whether the spatial information could better benefit training as an input or as an output [5]. In the second case, the relative contribution of the spatial information can be directly manipulated by introducing weighting factors in the cost function being minimized. A two-term cost function is used, with a cross-entropy term for the N label units and a squared error term for the 2 coordinate units; here $r_i$ indicates the actual output of unit $i$ and $t_i$ its desired output. The relative influence of the spatial information is controlled by the coefficients A and B.\n\n3 Learning tasks\n\nThe left panel of Figure 2 shows an example of the type of layout used; the actual layout used in the study consists of N = 28 nodes. For each node, a set of neighboring nodes is defined, chosen on the basis of how an observer might scan the layout to learn the node labels and their (spatial) relationships; in Figure 2, the neighborhood relationships are represented by lines connecting neighboring nodes. 
From any node in the layout, the only allowed transitions are those to a neighbor, thus defining the set of node pairs used to train the network (66 pairs out of C(28, 2) = 378 possible pairs). In addition, the probability of occurrence of a particular transition is computed as a function of the distance to the corresponding neighbor. It is then possible to generate a sequence of visits to the network nodes, aimed at replicating the scanning process of a human observer studying the layout.\n\n[Figure 2 diagram: nodes carry object labels such as knife, coin, cup, eraser, and button.]\n\nFigure 2: Example of a layout (left) and its permuted version (right). Links represent allowed transitions. A larger layout of 28 units was used in the simulations.\n\nThe basic learning task is similar to the grammar learning task of Servan-Schreiber et al. [6] and to the neighborhood mapping task described in [1] and is used to associate each of the N nodes on the graph and its (x, y) coordinates with the probability distribution of the transitions to its neighboring nodes. The mapping can be learned directly, by associating each node with the probability distribution of the transitions to all its neighbors: in this case, batch learning is used as the method of choice for learning the mapping. On the other hand, the mapping can be learned indirectly, by associating each node with itself and one of its neighbors, with online learning being the method of choice in this case; the neighbor chosen at each iteration is defined by the sequence of visits generated on the basis of the transition probabilities. Batch learning was chosen because it generally converges more smoothly and more quickly than online learning and gives qualitatively similar results. 
While the task and network architecture described in [1] allowed only for temporal association learning, in this study both temporal and spatial associations are learned simultaneously, thanks to the presence of the spatial units. However, the temporal-only (T-only) case, which has no spatial units, is included in the simulations performed for this study, to provide a benchmark for the evaluation of the results obtained with the spatio-temporal (S-T) networks.\n\nThe task described above allows the network to learn neighborhood relationships for which spatial and temporal associations provide consistent information, that is, nodes experienced contiguously in time (as defined by the sequence) are also contiguous in space (being spatial neighbors). To tease apart the relative contributions of space and time, the task is kept the same, but the data employed for training the network is modified: the same layout is used to generate the temporal sequence, but the (x, y) coordinates of the nodes are randomly permuted (see right panel of Figure 2). If the permuted layout is then scanned following the same sequence of node visits used in the original version, the net effect is that the temporal associations remain the same, but the spatial associations change so that temporally neighboring nodes can now be spatially close or distant: the spatial associations are no longer consistent with the temporal associations. As Figure 4 illustrates, the training pairs (filled circles) all correspond to short distances in the original layout, but can have a distance anywhere in the allowable range in the permuted layout. Since the temporal and spatial distances were consistent in the original layout, the original spatial distance can be used as an indicator of temporal distance and Figure 4 can be interpreted as a plot of temporal distance vs. 
spatial distance for the permuted layout.\n\nThe simulations described in the following include three experimental conditions: temporal only (no direct spatial information available); space and time consistent (the spatial coordinates and the temporal sequence are from the same layout); space and time inconsistent (the spatial coordinates and the temporal sequence are from different layouts).\n\nHidden unit representations are compared using Euclidean distance (cosine and inner product measures give consistent results); the internal representation distances are also used to compute their correlation with Euclidean distances between nodes in the layout (original and permuted). The correlations increase with the number of hidden units for values of H between 5 and 10 and then gradually taper off for values greater than 10. The results presented in the remainder of the paper all pertain to networks trained with H = 20 and with hidden units using a tanh transfer function; all the results pertaining to S-T networks refer to networks with 2 spatial output units and cost function coefficients A = 0.625 and B = 6.25.\n\n4 Results\n\nFigure 3 provides a combined view of the results from all three experiments. The left panel illustrates the evolution of the correlation between internal representation distances and layout (original and permuted) distances. The right panel shows the distributions of the correlations at the end of training (1000 epochs). The first general result is that, when spatial information is available and consistent with the temporal information (original layout), the correlation between hidden unit distances and layout distances is consistently better than the correlation obtained in the case of temporal associations alone. 
The second general result is that, when spatial information is available but not consistent with the temporal information (permuted layout), the correlation between hidden unit distances and original layout distances (which represent temporal distances) is similar to that obtained in the case of temporal associations alone, except for the initial transient. When the correlation is computed with respect to the permuted layout distances, its value peaks early during training and then decreases rapidly, to reach an asymptotic value well below the other three cases. This behavior is illustrated in the box plots in the right panel of Figure 3, which report the distribution of correlation values at the end of training.\n\n[Figure 3 plots: correlation vs. number of epochs (left) and box plots (right) for four cases: S and T consistent; T-only; S and T inconsistent (corr. with T distance); S and T inconsistent (corr. with S distance).]\n\nFigure 3: Evolution of correlation during training (0 to 1000 epochs) (left). Distributions of correlations at the end of training (1000 epochs) (right).\n\n4.1 Temporal-only vs. spatio-temporal\n\nAs a first step in this study, the effects of adding spatial information to the basic temporal associations used to train the network can be examined. Since the learning task is the same for both the T-only and the S-T networks except for the absence or presence of spatial information during training, the differences observed can be attributed to the additional spatial information available to the S-T networks. The higher correlation between internal representation distances and original layout distances obtained when spatial information is available (see Figure 3) is apparent also when the evolution of the internal representations is examined. As Figure 6 illustrates, the presence of spatial information results in better generalization for the pattern pairs outside the training set. While the distances between training pairs are mapped to similar distances in hidden unit space for both the T-only and the S-T networks, the T-only network tends to cluster the non-training pairs into a narrow band of distances in hidden unit space. In the case of the S-T network instead, the hidden unit distances between non-training pairs are spread out over a wider range and tend to reflect the original layout distances.\n\nFigure 4: Distances in the original layout (x) vs. distances in the permuted layout (y). The 66 training pairs are identified by filled circles.\n\nFigure 5: Similarities (Euclidean distances) between internal representations developed by a S-T network (after 300 epochs). Figure 4 projects the data points onto the x, y plane.\n\n4.2 Permuted layout\n\nAs described above, with the permuted layout it is possible to decouple the spatial and temporal contributions and therefore study the effects of each. A comprehensive view of the results at a particular point during training (300 epochs) is presented in Figure 5, where the x, y plane represents temporal distance vs. spatial distance (see also Figure 4) and the z axis represents the similarity between hidden unit representations. The figure also includes a quadratic regression surface fitted to the data points. The coefficients in the equation of the surface provide a quantitative measure of the relative contributions of spatial ($d_S$) and temporal ($d_T$) distances to the similarity between hidden unit representations ($d_{HU}$):\n\n$$d_{HU} = 0.6 + 3.4\\,d_T + 0.3\\,d_S - 2.1\\,(d_T)^2 + 0.4\\,(d_S)^2 - 0.4\\,d_T d_S \\tag{2}$$\n\nIn general, after the transient observed in early training (see Figure 3), the largest and most significant coefficients are found for $d_T$ and $(d_T)^2$, indicating a stronger dependence of $d_{HU}$ on temporal distance than on spatial distance.\n\nThe results illustrated in Figure 5 represent the situation at a particular point during training (300 epochs). Similar plots can be generated for different points during training, to study the evolution of the internal representations. A different view of the evolution process is provided by Figure 7, in which the data points are projected onto the x, z plane (top panel) and the y, z plane (bottom panel) at four different times during training. In the top panel, the internal representation distances are plotted as a function of temporal distance (i.e., the spatial distance from the original layout), while in the bottom panel they are plotted as a function of spatial distance (from the permuted layout). The higher asymptotic correlation between internal representation distances and temporal distances, as opposed to spatial distances (see Figure 3), is apparent also from the examination of the evolutionary plots, which show an asymptotic behavior with respect to temporal distances (see Figure 7, top panel) very similar to the T-only case (see Figure 6, bottom panel).\n\nFigure 6: Internal representation distances vs. original layout distances: S-T network (top) vs. T-only network (bottom). The training pairs are identified by filled circles. The presence of spatial information results in better generalization for the pairs outside the training set.\n\n5 Discussion\n\nThe first general conclusion that can be drawn from the examination of the results described in the previous section is that, when the spatial information is available and consistent with the temporal information (original layout), the similarity structure of the hidden unit representations is closer to the structure of the original layout than that obtained by using temporal associations alone. 
The second general conclusion is that, when the spatial information is available but not consistent with the temporal information (permuted layout), the similarity structure of the hidden unit representations seems to correspond to temporal more than spatial proximity. Figures 5 and 7 both indicate that temporal associations take precedence over spatial associations. This result is in agreement with the results described in [1], showing how temporal associations (plus some high-level constraints) significantly contribute to the internal representation of global spatial information. However, spatial information certainly is very beneficial to the (temporal) acquisition of a layout, as proven by the results obtained with the S-T network vs. the T-only network.\n\nIn terms of the model presented in this paper, the results illustrated in Figures 5 and 7 can be compared with the experimental data reported for recognition priming ([2], [3], [4]), with distance between internal representations corresponding to reaction time. The results of our model indicate that distances in both the spatially far and spatially close condition appear to be consistently shorter for the training pairs (temporally close) than for the non-training pairs (temporally distant), highlighting a strong temporal effect consistent with the data reported in [2] and [4] (for spatially far pairs) and in [3] (only for the spatially close case). For the training pairs (temporally close), slightly shorter distances are obtained for spatially close pairs vs. spatially far pairs; this result does not provide support for the experimental data reported in either [3] (strong spatial effect) or [2] (no spatial effect). For the non-training pairs (temporally distant), long distances are found throughout, with no strong dependence on spatial distance; this effect is consistent with all the reported experimental data. Further simulations and statistical analyses are necessary for a more conclusive comparison with the experimental data.\n\nFigure 7: Internal representation distances vs. temporal distances (top) and vs. spatial distances (bottom) for a S-T network (permuted layout). The training pairs are identified by filled circles. The asymptotic behavior with respect to temporal distances (top panel) is similar to the T-only condition. The bottom panel indicates a weak dependence on spatial distances.\n\nReferences\n\n[1] Ghiselli-Crippa, T.B. & Munro, P.W. (1994). Emergence of global structure from local associations. In J.D. Cowan, G. Tesauro, & J. Alspector (Eds.), Advances in Neural Information Processing Systems 6, pp. 1101-1108. San Francisco, CA: Morgan Kaufmann.\n\n[2] Clayton, K.N. & Habibi, A. (1991). The contribution of temporal contiguity to the spatial priming effect. Journal of Experimental Psychology: Learning, Memory, and Cognition 17:263-271.\n\n[3] McNamara, T.P., Halpin, J.A. & Hardy, J.K. (1992). 
Spatial and temporal contributions to the structure of spatial memory. Journal of Experimental Psychology: Learning, Memory, and Cognition 18:555-564.\n\n[4] Curiel, J.M. & Radvansky, G.A. (1998). Mental organization of maps. Journal of Experimental Psychology: Learning, Memory, and Cognition 24:202-214.\n\n[5] Caruana, R. & de Sa, V.R. (1997). Promoting poor features to supervisors: Some inputs work better as outputs. In M.C. Mozer, M.I. Jordan, & T. Petsche (Eds.), Advances in Neural Information Processing Systems 9, pp. 389-395. Cambridge, MA: MIT Press.\n\n[6] Servan-Schreiber, D., Cleeremans, A. & McClelland, J.L. (1989). Learning sequential structure in simple recurrent networks. In D.S. Touretzky (Ed.), Advances in Neural Information Processing Systems 1, pp. 643-652. San Mateo, CA: Morgan Kaufmann.\n\n\fNeural Representation of Multi-Dimensional Stimuli\n\nChristian W. Eurich, Stefan D. Wilke and Helmut Schwegler\n\nInstitut für Theoretische Physik\nUniversität Bremen, Germany\n\n(eurich,swilke,schwegler)@physik.uni-bremen.de\n\nAbstract\n\nThe encoding accuracy of a population of stochastically spiking neurons is studied for different distributions of their tuning widths. The situation of identical radially symmetric receptive fields for all neurons, which is usually considered in the literature, turns out to be disadvantageous from an information-theoretic point of view. Both a variability of tuning widths and a fragmentation of the neural population into specialized subpopulations improve the encoding accuracy.\n\n1 Introduction\n\nThe topic of neuronal tuning properties and their functional significance has attracted much attention in recent decades. However, neither empirical findings nor theoretical considerations have yielded a unified picture of optimal neural encoding strategies given a sensory or motor task. 
More specifically, the question as to whether narrow tuning or broad tuning is advantageous for the representation of a set of stimulus features is still being discussed. Empirically, both situations are encountered: small receptive fields whose diameter is less than one degree can, for example, be found in the human retina [7], and large receptive fields up to 180° in diameter occur in the visual system of tongue-projecting salamanders [10]. On the theoretical side, arguments have been put forward for small [8] as well as for large [5, 1, 9, 3, 13] receptive fields.\n\nIn recent years, several attempts have been made to calculate the encoding accuracy of a neural population as a function of receptive field size [5, 1, 9, 3, 13]. It has turned out that for a firing rate coding, large receptive fields are advantageous provided that $D \\ge 3$ stimulus features are encoded [9, 13]. For binary neurons, large receptive fields are advantageous also for $D = 2$ [5, 3].\n\nHowever, so far only radially symmetric tuning curves have been considered. For neural populations which lack this symmetry, the situation may be very different. Here we study the encoding accuracy of a population of stochastically spiking neurons. A Fisher information analysis performed on different distributions of tuning widths will indeed reveal a much more detailed picture of neural encoding strategies.\n\n2 Model\n\nConsider a $D$-dimensional stimulus space, $X$. A stimulus is characterized by a position $x = (x_1, \\ldots, x_D) \\in X$, where the value of feature $i$, $x_i$ ($i = 1, \\ldots, D$), is measured relative to the total range of values in the $i$-th dimension such that it is dimensionless. Information about the stimulus is encoded by a population of $N$ stochastically spiking neurons. 
They are assumed to have independent spike generation mechanisms such that the joint probability distribution for observing $n = (n^{(1)}, \\ldots, n^{(k)}, \\ldots, n^{(N)})$ spikes within a time interval $\\tau$, $P_s(n; x)$, can be written in the form\n\n$$P_s(n; x) = \\prod_{k=1}^{N} P_s^{(k)}(n^{(k)}; x), \\tag{1}$$\n\nwhere $P_s^{(k)}(n^{(k)}; x)$ is the single-neuron probability distribution of the number of observed spikes given the stimulus at position $x$. Note that (1) does not exclude a correlation of the neural firing rates, i.e., the neurons may have common input or even share the same tuning function.\n\nThe firing rates depend on the stimulus via the local values of the tuning functions, such that $P_s^{(k)}(n^{(k)}; x)$ can be written in the form $P_s^{(k)}(n^{(k)}; x) = S(n^{(k)}, f^{(k)}(x), \\tau)$, where the tuning function of neuron $k$, $f^{(k)}(x)$, gives its mean firing rate in response to the stimulus at position $x$. We assume here a form of the tuning function that is not necessarily radially symmetric,\n\n$$f^{(k)}(x) = F \\phi\\left( \\sum_{i=1}^{D} \\frac{(x_i - c_i^{(k)})^2}{\\sigma_i^{(k)2}} \\right) =: F \\phi\\left( \\xi^{(k)2} \\right), \\tag{2}$$\n\nwhere $c^{(k)} = (c_1^{(k)}, \\ldots, c_D^{(k)})$ is the center of the tuning curve of neuron $k$, $\\sigma_i^{(k)}$ is its tuning width in the $i$-th dimension, $\\xi_i^{(k)2} := (x_i - c_i^{(k)})^2 / \\sigma_i^{(k)2}$ for $i = 1, \\ldots, D$, and $\\xi^{(k)2} := \\xi_1^{(k)2} + \\ldots + \\xi_D^{(k)2}$. $F > 0$ denotes the maximal firing rate of the neurons, which requires that $\\max_{z \\ge 0} \\phi(z) = 1$.\n\nWe assume that the tuning widths $\\sigma_1^{(k)}, \\ldots, \\sigma_D^{(k)}$ of each neuron $k$ are drawn from a distribution $P_\\sigma(\\sigma_1, \\ldots, \\sigma_D)$. For a population of tuning functions with centers $c^{(1)}, \\ldots, c^{(N)}$, a density $\\eta(x)$ is introduced according to $\\eta(x) := \\sum_{k=1}^{N} \\delta(x - c^{(k)})$.\n\nThe encoding accuracy can be quantified by the Fisher information matrix, $J$, which is defined as\n\n$$J_{ij}(x) := E\\left[ \\left( \\frac{\\partial}{\\partial x_i} \\ln P_s(n; x) \\right) \\left( \\frac{\\partial}{\\partial x_j} \\ln P_s(n; x) \\right) \\right], \\tag{3}$$\n\nwhere $E[\\ldots]$ denotes the expectation value over the probability distribution $P_s(n; x)$ [2]. 
The Fisher information yields a lower bound on the expected error of an unbiased estimator that retrieves the stimulus $x$ from the noisy neural activity (Cramér-Rao inequality) [2]. The minimal estimation error for the $i$-th feature $x_i$, $\\epsilon_{i,\\min}$, is given by $\\epsilon_{i,\\min}^2 = (J^{-1})_{ii}$, which reduces to $\\epsilon_{i,\\min}^2 = 1/J_{ii}(x)$ if $J$ is diagonal.\n\nWe shall now derive a general expression for the population Fisher information. In the next section, several cases and their consequences for neural encoding strategies will be discussed.\n\nFor model neuron $k$, the Fisher information (3) reduces to\n\n$$J_{ij}^{(k)}(x; \\sigma_1^{(k)}, \\ldots, \\sigma_D^{(k)}) = \\frac{1}{\\sigma_i^{(k)} \\sigma_j^{(k)}} A_\\phi\\left( \\xi^{(k)2}, F, \\tau \\right) \\xi_i^{(k)} \\xi_j^{(k)}, \\tag{4}$$\n\nwhere the dependence on the tuning widths is indicated by the list of arguments. The function $A_\\phi$ depends on the shape of the tuning function and is given in [13]. The independence assumption (1) implies that the population Fisher information is the sum of the contributions of the individual neurons, $\\sum_{k=1}^{N} J_{ij}^{(k)}(x; \\sigma_1^{(k)}, \\ldots, \\sigma_D^{(k)})$. We now define a population Fisher information which is averaged over the distribution of tuning widths $P_\\sigma(\\sigma_1, \\ldots, \\sigma_D)$:\n\n$$\\langle J_{ij}(x) \\rangle_\\sigma = \\sum_{k=1}^{N} \\int d\\sigma_1 \\ldots d\\sigma_D \\, P_\\sigma(\\sigma_1, \\ldots, \\sigma_D) \\, J_{ij}^{(k)}(x; \\sigma_1, \\ldots, \\sigma_D). \\tag{5}$$\n\nIntroducing the density of tuning curves, $\\eta(x)$, into (5) and assuming a constant distribution, $\\eta(x) \\equiv \\eta \\equiv \\mathrm{const.}$, one obtains the result that the population Fisher information becomes independent of $x$ and that the off-diagonal elements of $J$ vanish [13]. 
The average population Fisher information then becomes\n\n$$\\langle J_{ij} \\rangle_\\sigma = \\eta D K_\\phi(F, \\tau, D) \\left\\langle \\frac{\\prod_{l=1}^{D} \\sigma_l}{\\sigma_i^2} \\right\\rangle_\\sigma \\delta_{ij}, \\tag{6}$$\n\nwhere $K_\\phi$ depends on the geometry of the tuning curves and is defined in [13].\n\n3 Results\n\nIn this section, we consider different distributions of tuning widths in (6) and discuss advantageous and disadvantageous strategies for obtaining a high representational accuracy in the neural population.\n\nRadially symmetric tuning curves. For radially symmetric tuning curves of width $\\bar\\sigma$, the tuning-width distribution reads\n\n$$P_\\sigma(\\sigma_1, \\ldots, \\sigma_D) = \\prod_{i=1}^{D} \\delta(\\sigma_i - \\bar\\sigma);$$\n\nsee Fig. 1a for a schematic visualization of the arrangement of the tuning widths for the case $D = 2$. The average population Fisher information (6) for $i = j$ becomes\n\n$$\\langle J_{ii} \\rangle_\\sigma = \\eta D K_\\phi(F, \\tau, D) \\, \\bar\\sigma^{D-2}, \\tag{7}$$\n\na result already obtained by Zhang and Sejnowski [13]. Equation (7) basically shows that the minimal estimation error increases with $\\bar\\sigma$ for $D = 1$, that it does not depend on $\\bar\\sigma$ for $D = 2$, and that it decreases as $\\bar\\sigma$ increases for $D \\ge 3$. We shall discuss the relevance of this case below.\n\nIdentical tuning curves without radial symmetry. Next we discuss tuning curves which are identical but not radially symmetric; the tuning-width distribution for this case is\n\n$$P_\\sigma(\\sigma_1, \\ldots, \\sigma_D) = \\prod_{i=1}^{D} \\delta(\\sigma_i - \\bar\\sigma_i),$$\n\nwhere $\\bar\\sigma_i$ denotes the fixed width in dimension $i$. For $i = j$, the average population Fisher information (6) reduces to [11, 4]\n\n$$\\langle J_{ii} \\rangle_\\sigma = \\eta D K_\\phi(F, \\tau, D) \\frac{\\prod_{l=1}^{D} \\bar\\sigma_l}{\\bar\\sigma_i^2}. \\tag{8}$$\n\nFigure 1: Visualization of different distributions of tuning widths for $D = 2$. (a) Radially symmetric tuning curves. 
The dot indicates a fixed $\\bar{\\sigma}$, while the diagonal line symbolizes a variation in $\\bar{\\sigma}$ discussed in [13]. (b) Identical tuning curves which are not radially symmetric. (c) Tuning widths uniformly distributed within a small rectangle. (d) Two subpopulations each of which is narrowly tuned in one dimension and broadly tuned in the other direction.\n\nEquation (8) contains (7) as a special case. From (8) it becomes immediately clear that the expected minimal square encoding error for the $i$-th stimulus feature, $\\epsilon_{i,\\min}^2 = 1/\\langle J_{ii}(x) \\rangle_\\sigma$, depends on $i$, i.e., the population specializes in certain features. The error obtained in dimension $i$ thereby depends on the tuning widths in all dimensions.\n\nWhich encoding strategy is optimal for a population whose task it is to encode a single feature, say feature $i$, with high accuracy while not caring about the other dimensions? In order to answer this question, we re-write (8) in terms of receptive field overlap.\n\nFor the tuning functions $f^{(k)}(x)$ encountered empirically, large values of the single-neuron Fisher information (4) are typically restricted to a region around the center of the tuning function, $c^{(k)}$. The fraction $p(\\beta)$ of the Fisher information that falls into the region $\\sqrt{\\xi^{(k)2}} \\leq \\beta$ around $c^{(k)}$ is given by\n\n$$p(\\beta) := \\frac{\\int_{\\sqrt{\\xi^{(k)2}} \\leq \\beta} d^D x \\, \\sum_{i=1}^{D} J_{ii}^{(k)}(x)}{\\int d^D x \\, \\sum_{i=1}^{D} J_{ii}^{(k)}(x)} = \\frac{\\int_0^{\\beta} d\\xi \\, \\xi^{D+1} A_\\phi(\\xi^2, F, \\tau)}{\\int_0^{\\infty} d\\xi \\, \\xi^{D+1} A_\\phi(\\xi^2, F, \\tau)}, \\qquad (9)$$\n\nwhere the index $(k)$ was dropped because the tuning curves are assumed to have identical shapes. Equation (9) allows the definition of an effective receptive field, $RF_{\\mathrm{eff}}^{(k)}$, inside of which neuron $k$ conveys a major fraction $p_0$ of Fisher information, $RF_{\\mathrm{eff}}^{(k)} := \\{x \\,|\\, \\sqrt{\\xi^{(k)2}} \\leq \\beta_0\\}$, where $\\beta_0$ is chosen such that $p(\\beta_0) = p_0$. 
The Fisher information a neuron $k$ carries is small unless $x \\in RF_{\\mathrm{eff}}^{(k)}$. This has the consequence that a fixed stimulus $x$ is actually encoded only by a subpopulation of neurons. The point $x$ in stimulus space is covered by\n\n$$N_{\\mathrm{code}} := \\eta \\, \\frac{2 \\pi^{D/2} (\\beta_0)^D}{D \\, \\Gamma(D/2)} \\prod_{j=1}^{D} \\bar{\\sigma}_j \\qquad (10)$$\n\nreceptive fields. With the help of (10), the average population Fisher information (8) can be re-written as\n\n$$\\langle J_{ii} \\rangle_\\sigma = \\frac{D^2 \\, \\Gamma(D/2) \\, K_\\phi(F, \\tau, D)}{2 \\pi^{D/2} (\\beta_0)^D} \\, \\frac{N_{\\mathrm{code}}}{\\bar{\\sigma}_i^2}. \\qquad (11)$$\n\nEquation (11) can be interpreted as follows: We assume that the population of neurons encodes stimulus dimension $i$ accurately, while all other dimensions are of secondary importance. The average population Fisher information for dimension $i$, $\\langle J_{ii} \\rangle_\\sigma$, is determined by the tuning width in dimension $i$, $\\bar{\\sigma}_i$, and by the size of the active subpopulation, $N_{\\mathrm{code}}$. There is a tradeoff between these quantities. On the one hand, the encoding error can be decreased by decreasing $\\bar{\\sigma}_i$, which enhances the Fisher information carried by each single neuron. Decreasing $\\bar{\\sigma}_i$, on the other hand, will also shrink the active subpopulation via (10). This impairs the encoding accuracy, because the stimulus position is evaluated from the activity of fewer neurons. If (11) is valid due to a sufficient receptive field overlap, $N_{\\mathrm{code}}$ can be increased by increasing the tuning widths, $\\bar{\\sigma}_j$, in all other dimensions $j \\neq i$. This effect is illustrated in Fig. 2 for $D = 2$.\n\nFigure 2: Encoding strategy for a stimulus characterized by parameters $x_{1,s}$ and $x_{2,s}$. Feature $x_1$ is to be encoded accurately. Effective receptive field shapes are indicated for both populations. If neurons are narrowly tuned in $x_2$ (left), the active population (solid) is small (here: $N_{\\mathrm{code}} = 3$). 
Broadly tuned receptive fields for $x_2$ (right) yield a much larger population (here: $N_{\\mathrm{code}} = 27$), thus increasing the encoding accuracy.\n\nIt shall be noted that although a narrow tuning width $\\bar{\\sigma}_i$ is advantageous, the limit $\\bar{\\sigma}_i \\to 0$ yields a bad representation. For narrowly tuned cells, gaps appear between the receptive fields: The condition $\\eta(x) \\equiv \\mathrm{const.}$ breaks down, and (6) is no longer valid. A more detailed calculation shows that the encoding error diverges as $\\bar{\\sigma}_i \\to 0$ [4]. The fact that the encoding error increases both for narrow tuning and, due to (11), for broad tuning proves the existence of an optimal tuning width. An example is given in Fig. 3a.\n\nFigure 3: (a) Example for the encoding behavior with narrow tuning curves arranged on a regular lattice of dimension $D = 1$ (grid spacing $\\Delta$). Tuning curves are Gaussian, and neural firing is modeled as a Poisson process. Dots indicate the minimal square encoding error averaged over a uniform distribution of stimuli, $\\langle \\epsilon_{\\min}^2 \\rangle$, as a function of $\\bar{\\sigma}$. The minimum is clearly visible. The dotted line shows the corresponding approximation according to (8). The inset shows Gaussian tuning curves of optimal width, $\\bar{\\sigma}_{\\mathrm{opt}} \\approx 0.4 \\Delta$. (b) $g_D(\\lambda)$ as a function of $\\lambda$ for different values of $D$.\n\nNarrow distribution of tuning curves. 
In order to study the effects of encoding the stimulus with distributed tuning widths instead of identical tuning widths as in the previous cases, we now consider the distribution\n\n$$P_\\sigma(\\sigma_1, \\ldots, \\sigma_D) = \\prod_{i=1}^{D} \\frac{1}{b_i} \\, \\Theta\\left[\\sigma_i - \\left(\\bar{\\sigma}_i - \\frac{b_i}{2}\\right)\\right] \\Theta\\left[\\left(\\bar{\\sigma}_i + \\frac{b_i}{2}\\right) - \\sigma_i\\right], \\qquad (12)$$\n\nwhere $\\Theta$ denotes the Heaviside step function. Equation (12) describes a uniform distribution in a $D$-dimensional cuboid of size $b_1, \\ldots, b_D$ around $(\\bar{\\sigma}_1, \\ldots, \\bar{\\sigma}_D)$; cf. Fig. 1c. A straightforward calculation shows that in this case, the average population Fisher information (6) for $i = j$ becomes\n\n$$\\langle J_{ii} \\rangle_\\sigma = \\eta D K_\\phi(F, \\tau, D) \\, \\frac{\\prod_{l=1}^{D} \\bar{\\sigma}_l}{\\bar{\\sigma}_i^2} \\left\\{ 1 + \\frac{1}{12} \\left(\\frac{b_i}{\\bar{\\sigma}_i}\\right)^2 + O\\left[\\left(\\frac{b_i}{\\bar{\\sigma}_i}\\right)^4\\right] \\right\\}. \\qquad (13)$$\n\nA comparison with (8) yields the astonishing result that an increase in $b_i$ results in an increase in the $i$-th diagonal element of the average population Fisher information matrix and thus in an improvement in the encoding of the $i$-th stimulus feature, while the encoding in dimensions $j \\neq i$ is not affected. Correspondingly, the total encoding error can be decreased by increasing an arbitrary number of edge lengths of the cuboid. The encoding by a population with a variability in the tuning curve geometries as described is more precise than that by a uniform population. This is true for arbitrary $D$. Zhang and Sejnowski [13] consider the more artificial situation of a correlated variability of the tuning widths: tuning curves are always assumed to be radially symmetric. This is indicated by the diagonal line in Fig. 1a. A distribution of tuning widths restricted to this subset yields an average population Fisher information $\\propto \\langle \\bar{\\sigma}^{D-2} \\rangle$ and does not improve the encoding for $D = 2$ or $D = 3$.\n\nFragmentation into $D$ subpopulations. 
Finally, we study a family of distributions of tuning widths which also yields a lower minimal encoding error than the uniform population. Let the distribution of tuning widths be given by\n\n$$P_\\sigma(\\sigma_1, \\ldots, \\sigma_D) = \\frac{1}{D} \\sum_{i=1}^{D} \\delta(\\sigma_i - \\lambda \\bar{\\sigma}) \\prod_{j \\neq i} \\delta(\\sigma_j - \\bar{\\sigma}), \\qquad (14)$$\n\nwhere $\\lambda > 0$. For $\\lambda = 1$, the population is uniform as in (7). For $\\lambda \\neq 1$, the population is split up into $D$ subpopulations; in subpopulation $i$, $\\sigma_i$ is modified while $\\sigma_j \\equiv \\bar{\\sigma}$ for $j \\neq i$. See Fig. 1d for an example. The diagonal elements of the average population Fisher information are\n\n$$\\langle J_{ii} \\rangle_\\sigma = \\eta D K_\\phi(F, \\tau, D) \\, \\bar{\\sigma}^{D-2} \\left\\{ \\frac{1 + (D-1) \\lambda^2}{D \\lambda} \\right\\}, \\qquad (15)$$\n\nwhere the term in brackets will be abbreviated as $g_D(\\lambda)$. $\\langle J_{ii} \\rangle_\\sigma$ does not depend on $i$ in this case because of the symmetry in the subpopulations. Equation (15) and the uniform case (7) differ by $g_D(\\lambda)$, which will now be discussed. Figure 3b shows $g_D(\\lambda)$ for different values of $D$. For $\\lambda = 1$, $g_D(\\lambda) = 1$ and (7) is recovered as expected. $g_D(\\lambda) = 1$ also holds for $\\lambda = 1/(D-1) < 1$: narrowing one tuning width in each subpopulation will at first decrease the resolution provided $D \\geq 3$; this is due to the fact that $N_{\\mathrm{code}}$ is decreased. For $\\lambda < 1/(D-1)$, however, $g_D(\\lambda) > 1$, and the resolution exceeds $\\langle J_{ii} \\rangle_\\sigma$ in (7) because each neuron in the $i$-th subpopulation carries a high Fisher information in the $i$-th dimension. $D = 2$ is a special case where no impairment of encoding occurs because the effect of a decrease of $N_{\\mathrm{code}}$ is less pronounced. Interestingly, an increase in $\\lambda$ also yields an improvement in the encoding accuracy. This is a combined effect resulting from an increase in $N_{\\mathrm{code}}$ on the one hand and the existence of $D$ subpopulations, $D - 1$ of which maintain their tuning widths in each dimension on the other hand. 
The discussion of $g_D(\\lambda)$ leads to the following encoding strategy. For small $\\lambda$, $\\langle J_{ii} \\rangle_\\sigma$ increases rapidly, which suggests a fragmentation of the population into $D$ subpopulations each of which encodes one feature with high accuracy, i.e., one tuning width in each subpopulation is small whereas the remaining tuning widths are broad. As in the case discussed above, the theoretical limit of this method is a breakdown of the approximation $\\eta \\equiv \\mathrm{const.}$ and of the validity of (6) due to insufficient receptive field overlap.\n\n4  Discussion and Outlook\n\nWe have discussed the effects of a variation of the tuning widths on the encoding accuracy obtained by a population of stochastically spiking neurons. The question of an optimal tuning strategy has turned out to be more complicated than previously assumed. More specifically, the case which has received most attention in the literature - radially symmetric receptive fields [5, 1, 9, 3, 13] - yields a worse encoding accuracy than most other cases we have studied: uniform populations with tuning curves which are not radially symmetric; distributions of tuning curves around some symmetric or non-symmetric tuning curve; and the fragmentation of the population into $D$ subpopulations each of which is specialized in one stimulus feature.\n\nIn a next step, the theoretical results will be compared to empirical data on encoding properties of neural populations. One aspect is the existence of sensory maps which consist of neural subpopulations with characteristic tuning properties for the features which are represented. For example, receptive fields of auditory neurons in the midbrain of the barn owl have elongated shapes [6]. A second aspect concerns the short-term dynamics of receptive fields. Using single-unit recordings in anaesthetized cats, W\u00f6rg\u00f6tter et al. 
[12] observed changes in receptive field size taking place in 50-100 ms. Our findings suggest that these dynamics alter the resolution obtained for the corresponding stimulus features. The observed effect may therefore realize a mechanism for adaptable selective signal processing.\n\nReferences\n[1] Baldi, P. & Heiligenberg, W. (1988) Biol. Cybern. 59:313-318.\n[2] Deco, G. & Obradovic, D. (1997) An Information-Theoretic Approach to Neural Computing. New York: Springer.\n[3] Eurich, C. W. & Schwegler, H. (1997) Biol. Cybern. 76:357-363.\n[4] Eurich, C. W. & Wilke, S. D. (2000) Neural Comp. (in press).\n[5] Hinton, G. E., McClelland, J. L. & Rumelhart, D. E. (1986) In Rumelhart, D. E. & McClelland, J. L. (eds.), Parallel Distributed Processing, Vol. 1, pp. 77-109. Cambridge MA: MIT Press.\n[6] Knudsen, E. I. & Konishi, M. (1978) Science 200:795-797.\n[7] Kuffler, S. W. (1953) J. Neurophysiol. 16:37-68.\n[8] Lettvin, J. Y., Maturana, H. R., McCulloch, W. S. & Pitts, W. H. (1959) Proc. Inst. Radio Eng. NY 47:1940-1951.\n[9] Snippe, H. P. & Koenderink, J. J. (1992) Biol. Cybern. 66:543-551.\n[10] Wiggers, W., Roth, G., Eurich, C. W. & Straub, A. (1995) J. Comp. Physiol. A 176:365-377.\n[11] Wilke, S. D. & Eurich, C. W. (1999) In Verleysen, M. (ed.), ESANN 99, European Symposium on Artificial Neural Networks, pp. 435-440. Brussels: D-Facto.\n[12] W\u00f6rg\u00f6tter, F., Suder, K., Zhao, Y., Kerscher, N., Eysel, U. T. & Funke, K. (1998) Nature 396:165-168.\n[13] Zhang, K. & Sejnowski, T. J. (1999) Neural Comp. 11:75-84.\n\n\f", "award": [], "sourceid": 1760, "authors": [{"given_name": "Christian", "family_name": "Eurich", "institution": null}, {"given_name": "Stefan", "family_name": "Wilke", "institution": null}, {"given_name": "Helmut", "family_name": "Schwegler", "institution": null}]}