{"title": "Effects of Spatial and Temporal Contiguity on the Acquisition of Spatial Information", "book": "Advances in Neural Information Processing Systems", "page_first": 17, "page_last": 23, "abstract": null, "full_text": "Effects of Spatial and Temporal Contiguity on the Acquisition of Spatial Information \n\nThea B. Ghiselli-Crippa and Paul W. Munro \n\nDepartment of Information Science and Telecommunications \n\nUniversity of Pittsburgh \nPittsburgh, PA 15260 \n\ntbgst@sis.pitt.edu, munro@sis.pitt.edu \n\nAbstract \n\nSpatial information comes in two forms: direct spatial information (for \nexample, retinal position) and indirect temporal contiguity information, \nsince objects encountered sequentially are in general spatially close. The \nacquisition of spatial information by a neural network is investigated \nhere. Given a spatial layout of several objects, networks are trained on a \nprediction task. Networks using temporal sequences with no direct spatial \ninformation are found to develop internal representations that show \ndistances correlated with distances in the external layout. The influence \nof spatial information is analyzed by providing direct spatial information \nto the system during training that is either consistent with the layout or \ninconsistent with it. This approach allows examination of the relative \ncontributions of spatial and temporal contiguity. \n\n1  Introduction \n\nSpatial information is acquired by a process of exploration that is fundamentally temporal, \nwhether it be on a small scale, such as scanning a picture, or on a larger one, such as \nphysically navigating through a building, a neighborhood, or a city. Continuous scanning \nof an environment causes locations that are spatially close to have a tendency to occur in \ntemporal proximity to one another. 
Thus, a temporal associative mechanism (such as a \nHebb rule) can be used in conjunction with continuous exploration to capture the spatial \nstructure of the environment [1]. However, the actual process of building a cognitive map \nneed not rely solely on temporal associations, since some spatial information is encoded in \nthe sensory array (position on the retina and proprioceptive feedback). Laboratory studies \nshow different types of interaction between the relative contributions of temporal and spatial \ncontiguities to the formation of an internal representation of space. While Clayton and \nHabibi's [2] series of recognition priming experiments indicates that priming is controlled \nonly by temporal associations, in the work of McNamara et al. [3] priming in recognition \nis observed only when space and time are both contiguous. In addition, Curiel and \nRadvansky's [4] work shows that the effects of spatial and temporal contiguity depend on \nwhether location or identity information is emphasized during learning. Moreover, other \nexperiments [3] also show how the effects clearly depend on the task and can be quite \ndifferent if an explicitly spatial task is used (e.g., additive effects in location judgments). \n\n\f18 \n\nT. B. Ghiselli-Crippa and P. W. Munro \n\n[Figure 1 diagram. Unit groups: labels, coordinates. Output annotations: (A coeff.), (B coeff.)] \n\nFigure 1: Network architectures: temporal-only network (left); spatio-temporal network \nwith spatial units part of the input representation (center); spatio-temporal network with \nspatial units part of the output representation (right). \n\n2  Network architectures \n\nThe goal of the work presented in this paper is to study the structure of the internal representations \nthat emerge from the integration of temporal and spatial associations. 
An \nencoder-like network architecture is used (see Figure 1), with a set of N input units and a \nset of N output units representing N nodes on a 2-dimensional graph. A set of H units is \nused for the hidden layer. To include space in the learning process, additional spatial units \nare included in the network architecture. These units provide a representation of the spatial \ninformation directly available during the learning/scanning process. In the simulations described \nin this paper, two units are used and are chosen to represent the (x, y) coordinates of \nthe nodes in the graph. The spatial units can be included as part of the input representation \nor as part of the output representation (see Figure 1, center and right panels): both choices \nare used in the experiments, to investigate whether the spatial information could better benefit \ntraining as an input or as an output [5]. In the second case, the relative contribution of \nthe spatial information can be directly manipulated by introducing weighting factors in the \ncost function being minimized. A two-term cost function is used, with a cross-entropy term \nfor the N label units and a squared error term for the 2 coordinate units, \nwhere r_i indicates the actual output of unit i and t_i its desired output. The relative influence of \nthe spatial information is controlled by the coefficients A and B. \n\n3  Learning tasks \n\nThe left panel of Figure 2 shows an example of the type of layout used; the actual \nlayout used in the study consists of N = 28 nodes. For each node, a set of neighboring \nnodes is defined, chosen on the basis of how an observer might scan the layout to learn the \nnode labels and their (spatial) relationships; in Figure 2, the neighborhood relationships are \nrepresented by lines connecting neighboring nodes. 
From any node in the layout, the only \nallowed transitions are those to a neighbor, thus defining the set of node pairs used to train \nthe network (66 pairs out of C(28, 2) = 378 possible pairs). In addition, the probability \nof occurrence of a particular transition is computed as a function of the distance to the \ncorresponding neighbor. It is then possible to generate a sequence of visits to the network \nnodes, aimed at replicating the scanning process of a human observer studying the layout. \n\n\fSpatiotemporal Contiguity Effects on Spatial Information Acquisition \n\n[Figure 2 diagram. Node labels include: knife, coin, cup, eraser, button] \n\nFigure 2: Example of a layout (left) and its permuted version (right). Links represent \nallowed transitions. A larger layout of 28 units was used in the simulations. \n\nThe basic learning task is similar to the grammar learning task of Servan-Schreiber et al. \n[6] and to the neighborhood mapping task described in [1] and is used to associate each of \nthe N nodes on the graph and its (x, y) coordinates with the probability distribution of the \ntransitions to its neighboring nodes. The mapping can be learned directly, by associating \neach node with the probability distribution of the transitions to all its neighbors: in this \ncase, batch learning is used as the method of choice for learning the mapping. On the \nother hand, the mapping can be learned indirectly, by associating each node with itself \nand one of its neighbors, with online learning being the method of choice in this case; \nthe neighbor chosen at each iteration is defined by the sequence of visits generated on \nthe basis of the transition probabilities. Batch learning was chosen because it generally \nconverges more smoothly and more quickly than online learning and gives qualitatively \nsimilar results. 
While the task and network architecture described in [1] allowed only \nfor temporal association learning, in this study both temporal and spatial associations are \nlearned simultaneously, thanks to the presence of the spatial units. However, the temporal-only \n(T-only) case, which has no spatial units, is included in the simulations performed \nfor this study, to provide a benchmark for the evaluation of the results obtained with the \nspatio-temporal (S-T) networks. \n\nThe task described above allows the network to learn neighborhood relationships for which \nspatial and temporal associations provide consistent information, that is, nodes experienced \ncontiguously in time (as defined by the sequence) are also contiguous in space (being spatial \nneighbors). To tease apart the relative contributions of space and time, the task is kept \nthe same, but the data employed for training the network is modified: the same layout is \nused to generate the temporal sequence, but the (x, y) coordinates of the nodes are randomly \npermuted (see right panel of Figure 2). If the permuted layout is then scanned following the \nsame sequence of node visits used in the original version, the net effect is that the temporal \nassociations remain the same, but the spatial associations change so that temporally neighboring \nnodes can now be spatially close or distant: the spatial associations are no longer \nconsistent with the temporal associations. As Figure 4 illustrates, the training pairs (filled \ncircles) all correspond to short distances in the original layout, but can have a distance \nanywhere in the allowable range in the permuted layout. Since the temporal and spatial \ndistances were consistent in the original layout, the original spatial distance can be used \nas an indicator of temporal distance and Figure 4 can be interpreted as a plot of temporal \ndistance vs. 
spatial distance for the permuted layout. \n\nThe simulations described in the following include three experimental conditions: temporal \nonly (no direct spatial information available); space and time consistent (the spatial coordinates \nand the temporal sequence are from the same layout); space and time inconsistent \n(the spatial coordinates and the temporal sequence are from different layouts). \n\nHidden unit representations are compared using Euclidean distance (cosine and inner product \nmeasures give consistent results); the internal representation distances are also used to \ncompute their correlation with Euclidean distances between nodes in the layout (original \nand permuted). The correlations increase with the number of hidden units for values of \nH between 5 and 10 and then gradually taper off for values greater than 10. The results \npresented in the remainder of the paper all pertain to networks trained with H = 20 and \nwith hidden units using a tanh transfer function; all the results pertaining to S-T networks \nrefer to networks with 2 spatial output units and cost function coefficients A = 0.625 and \nB = 6.25. \n\n4  Results \n\nFigure 3 provides a combined view of the results from all three experiments. The left panel \nillustrates the evolution of the correlation between internal representation distances and \nlayout (original and permuted) distances. The right panel shows the distributions of the \ncorrelations at the end of training (1000 epochs). The first general result is that, when spatial \ninformation is available and consistent with the temporal information (original layout), \nthe correlation between hidden unit distances and layout distances is consistently better \nthan the correlation obtained in the case of temporal associations alone. 
The second general \nresult is that, when spatial information is available but not consistent with the temporal \ninformation (permuted layout), the correlation between hidden unit distances and original \nlayout distances (which represent temporal distances) is similar to that obtained in the case \nof temporal associations alone, except for the initial transient. When the correlation is computed \nwith respect to the permuted layout distances, its value peaks early during training \nand then decreases rapidly, to reach an asymptotic value well below the other three cases. \nThis behavior is illustrated in the box plots in the right panel of Figure 3, which report the \ndistribution of correlation values at the end of training. \n\n[Figure 3 plots. Horizontal axis (left panel): number of epochs. Conditions: S and T consistent; T-only; S and T inconsistent (corr. with T distance); S and T inconsistent (corr. with S distance)] \n\nFigure 3: Evolution of correlation during training (0-1000 epochs) (left). Distributions of \ncorrelations at the end of training (1000 epochs) (right). \n\n4.1  Temporal-only vs. spatio-temporal \n\nAs a first step in this study, the effects of adding spatial information to the basic temporal \nassociations used to train the network can be examined. Since the learning task is the same \nfor both the T-only and the S-T networks except for the absence or presence of spatial \ninformation during training, the differences observed can be attributed to the additional \nspatial information available to the S-T networks. The higher correlation between internal \nrepresentation distances and original layout distances obtained when spatial information is \navailable (see Figure 3) is apparent also when the evolution of the internal representations \nis examined. As Figure 6 illustrates, the presence of spatial information results in better \ngeneralization for the pattern pairs outside the training set. While the distances between \ntraining pairs are mapped to similar distances in hidden unit space for both the T-only and \nthe S-T networks, the T-only network tends to cluster the non-training pairs into a narrow \nband of distances in hidden unit space. In the case of the S-T network instead, the hidden \nunit distances between non-training pairs are spread out over a wider range and tend to \nreflect the original layout distances. \n\nFigure 4: Distances in the original layout \n(x) vs. distances in the permuted layout \n(y). The 66 training pairs are identified by \nfilled circles. \n\nFigure 5: Similarities (Euclidean distances) \nbetween internal representations developed \nby an S-T network (after 300 epochs). Figure \n4 projects the data points onto the x, y plane. \n\n4.2  Permuted layout \n\nAs described above, with the permuted layout it is possible to decouple the spatial and \ntemporal contributions and therefore study the effects of each. 
A comprehensive view of \nthe results at a particular point during training (300 epochs) is presented in Figure 5, where \nthe x, y plane represents temporal distance vs. spatial distance (see also Figure 4) and the z \naxis represents the similarity between hidden unit representations. The figure also includes \na quadratic regression surface fitted to the data points. The coefficients in the equation of \nthe surface provide a quantitative measure of the relative contributions of spatial (d_S) and \ntemporal (d_T) distances to the similarity between hidden unit representations (d_HU): \n\nd_HU = 0.6 + 3.4 d_T + 0.3 d_S - 2.1 (d_T)^2 + 0.4 (d_S)^2 - 0.4 d_T d_S    (2) \n\nIn general, after the transient observed in early training (see Figure 3), the largest and most \nsignificant coefficients are found for d_T and (d_T)^2, indicating a stronger dependence of \nd_HU on temporal distance than on spatial distance. \n\nThe results illustrated in Figure 5 represent the situation at a particular point during training \n(300 epochs). Similar plots can be generated for different points during training, to study \nthe evolution of the internal representations. A different view of the evolution process is \nprovided by Figure 7, in which the data points are projected onto the x, z plane (top panel) \nand the y, z plane (bottom panel) at four different times during training. In the top panel, \nthe internal representation distances are plotted as a function of temporal distance (i.e., the \nspatial distance from the original layout), while in the bottom panel they are plotted as a \nfunction of spatial distance (from the permuted layout). The higher asymptotic correlation \nbetween internal representation distances and temporal distances, as opposed to spatial \ndistances (see Figure 3), is apparent also from the examination of the evolutionary plots, \nwhich show an asymptotic behavior with respect to temporal distances (see Figure 7, top \npanel) very similar to the T-only case (see Figure 6, bottom panel). \n\nFigure 6: Internal representation distances vs. original layout distances: S-T network (top) \nvs. T-only network (bottom). The training pairs are identified by filled circles. The presence \nof spatial information results in better generalization for the pairs outside the training set. \n\n5  Discussion \n\nThe first general conclusion that can be drawn from the examination of the results described \nin the previous section is that, when the spatial information is available and consistent with \nthe temporal information (original layout), the similarity structure of the hidden unit representations \nis closer to the structure of the original layout than that obtained by using \ntemporal associations alone. 
The second general conclusion is that, when the spatial information \nis available but not consistent with the temporal information (permuted layout), \nthe similarity structure of the hidden unit representations seems to correspond to temporal \nmore than spatial proximity. Figures 5 and 7 both indicate that temporal associations take \nprecedence over spatial associations. This result is in agreement with the results described \nin [1], showing how temporal associations (plus some high-level constraints) significantly \ncontribute to the internal representation of global spatial information. However, spatial information \ncertainly is very beneficial to the (temporal) acquisition of a layout, as shown by \nthe results obtained with the S-T network vs. the T-only network. \n\nIn terms of the model presented in this paper, the results illustrated in Figures 5 and 7 can \nbe compared with the experimental data reported for recognition priming ([2], [3], [4]), \nwith distance between internal representations corresponding to reaction time. The results \nof our model indicate that distances in both the spatially far and spatially close conditions \nappear to be consistently shorter for the training pairs (temporally close) than for the non-training \npairs (temporally distant), highlighting a strong temporal effect consistent with the \ndata reported in [2] and [4] (for spatially far pairs) and in [3] (only for the spatially close \ncase). For the training pairs (temporally close), slightly shorter distances are obtained for \nspatially close pairs vs. spatially far pairs; this result does not provide support for the \nexperimental data reported in either [3] (strong spatial effect) or [2] (no spatial effect). \nFor the non-training pairs (temporally distant), long distances are found throughout, with \nno strong dependence on spatial distance; this effect is consistent with all the reported \nexperimental data. Further simulations and statistical analyses are necessary for a more \nconclusive comparison with the experimental data. \n\nFigure 7: Internal representation distances vs. temporal distances (top) and vs. spatial \ndistances (bottom) for an S-T network (permuted layout). The training pairs are identified \nby filled circles. The asymptotic behavior with respect to temporal distances (top panel) is \nsimilar to the T-only condition. The bottom panel indicates a weak dependence on spatial \ndistances. \n\nReferences \n\n[1] Ghiselli-Crippa, T.B. & Munro, P.W. (1994). Emergence of global structure from local associations. \nIn J.D. Cowan, G. Tesauro, & J. Alspector (Eds.), Advances in Neural Information Processing \nSystems 6, pp. 1101-1108. San Francisco, CA: Morgan Kaufmann. \n\n[2] Clayton, K.N. & Habibi, A. (1991). The contribution of temporal contiguity to the spatial priming \neffect. Journal of Experimental Psychology: Learning, Memory, and Cognition 17:263-271. \n\n[3] McNamara, T.P., Halpin, J.A. & Hardy, J.K. (1992). 
Spatial and temporal contributions to the \nstructure of spatial memory. Journal of Experimental Psychology: Learning, Memory, and Cognition \n18:555-564. \n\n[4] Curiel, J.M. & Radvansky, G.A. (1998). Mental organization of maps. Journal of Experimental \nPsychology: Learning, Memory, and Cognition 24:202-214. \n\n[5] Caruana, R. & de Sa, V.R. (1997). Promoting poor features to supervisors: Some inputs work \nbetter as outputs. In M.C. Mozer, M.I. Jordan, & T. Petsche (Eds.), Advances in Neural Information \nProcessing Systems 9, pp. 389-395. Cambridge, MA: MIT Press. \n\n[6] Servan-Schreiber, D., Cleeremans, A. & McClelland, J.L. (1989). Learning sequential structure \nin simple recurrent networks. In D.S. Touretzky (Ed.), Advances in Neural Information Processing \nSystems 1, pp. 643-652. San Mateo, CA: Morgan Kaufmann. \n\n\f", "award": [], "sourceid": 1759, "authors": [{"given_name": "Thea", "family_name": "Ghiselli-Crippa", "institution": null}, {"given_name": "Paul", "family_name": "Munro", "institution": null}]}