{"title": "On Parallel versus Serial Processing: A Computational Study of Visual Search", "book": "Advances in Neural Information Processing Systems", "page_first": 10, "page_last": 16, "abstract": null, "full_text": "On Parallel Versus Serial Processing:\n\nA Computational Study of Visual Search\n\nEyal Cohen\n\nDepartment of Psychology\n\nTel-Aviv University, Tel Aviv 69978, Israel\n\neyalc@devil.tau.ac.il\n\nEytan Ruppin\n\nDepartments of Computer Science & Physiology\n\nTel-Aviv University, Tel Aviv 69978, Israel\n\nruppin@math.tau.ac.il\n\nAbstract\n\nA novel neural network model of pre-attentive processing in visual-search tasks is presented. Using displays of line orientations taken from Wolfe's experiments [1992], we study the hypothesis that the distinction between parallel and serial processes arises from the availability of global information in the internal representations of the visual scene. The model operates in two phases. First, the visual displays are compressed via principal-component-analysis. Second, the compressed data is processed by a target detector module in order to identify the existence of a target in the display. Our main finding is that targets in displays which were found experimentally to be processed in parallel can be detected by the system, while targets in experimentally-serial displays cannot. This fundamental difference is explained via variance analysis of the compressed representations, providing a numerical criterion distinguishing parallel from serial displays. Our model yields a mapping of response-time slopes that is similar to Duncan and Humphreys's \"search surface\" [1989], providing an explicit formulation of their intuitive notion of feature similarity. 
It presents a neural realization of the processing that may underlie the classical metaphorical explanations of visual search.\n\n1  Introduction\n\nThis paper presents a neural model of pre-attentive visual processing. The model explains why certain displays can be processed very fast, \"in parallel\", while others require slower, \"serial\" processing in subsequent attentional systems. Our approach stems from the observation that the visual environment is overflowing with diverse information, but the biological information-processing systems analyzing it have a limited capacity [1]. This apparent mismatch suggests that data compression should be performed at an early stage of perception, and that via an accompanying process of dimension reduction, only a few essential features of the visual display should be retained. We propose that only parallel displays incorporate global features that enable fast target detection, and hence they can be processed pre-attentively, with all items (target and distractors) examined at once. On the other hand, in the representations of serial displays, global information is obscured and target detection requires a serial, attentional scan of local features across the display. Using principal-component-analysis (PCA), our main goal is to demonstrate that neural systems employing compressed, dimensionally reduced representations of the visual information can successfully process only parallel displays and not serial ones. The source of this difference will be explained via variance analysis of the displays' projections on the principal axes. 
\n\nThe modeling of visual attention in cognitive psychology involves the use of metaphors, e.g., Posner's beam of attention [2]. A visual attention system of a surviving organism must supply fast answers to burning issues such as detecting a target in the visual field and characterizing its primary features. An attentional system employing a constant-speed beam of attention [3] probably cannot perform such tasks fast enough, and a pre-attentive system is required. Treisman's feature integration theory (FIT) describes such a system [4]. According to FIT, features of separate dimensions (shape, color, orientation) are first coded pre-attentively in a locations map and in separate feature maps, each map representing the values of a particular dimension. Then, in the second stage, attention \"glues\" the features together, conjoining them into objects at their specified locations. This hypothesis was supported using the visual-search paradigm [4], in which subjects are asked to detect a target within an array of distractors, which differ on given physical dimensions such as color, shape or orientation. As long as the target is significantly different from the distractors in one dimension, the reaction time (RT) is short and shows almost no dependence on the number of distractors (low RT slope). This result suggests that in this case the target is detected pre-attentively, in parallel. However, if the target and distractors are similar, or the target specifications are more complex, reaction time grows considerably as a function of the number of distractors [5, 6], suggesting that the displays' items are scanned serially using an attentional process. 
\n\nFIT and other related cognitive models of visual search are formulated on the conceptual level and do not offer a detailed description of the processes involved in transforming the visual scene from an ordered set of data points into given values in specified feature maps. This paper presents a novel computational explanation of the source of the distinction between parallel and serial processing, progressing from general metaphorical terms to a neural network realization. Interestingly, we also arrive at a computational interpretation of some of these metaphorical terms, such as feature similarity.\n\n2  The Model\n\nWe focus our study on visual-search experiments of line orientations performed by Wolfe et al. [7], using three set-sizes composed of 4, 8 and 12 items. The number of items equals the number of distractors plus the target in target displays; in non-target displays the target was replaced by another distractor, keeping a constant set-size. Five experimental conditions were simulated: (A) a 20-degree tilted target among vertical distractors (homogeneous background); (B) a vertical target among 20-degree tilted distractors (homogeneous background); (C) a vertical target among a heterogeneous background (a mixture of lines with \u00b120, \u00b140, \u00b160, \u00b180 degree orientations); (E) a vertical target among two flanking distractor orientations (at \u00b120 degrees); and (G) a vertical target among two flanking distractor orientations (at \u00b140 degrees). The response times (RT) as a function of the set-size measured by Wolfe et al. [7] show that type A, B and G displays are scanned in a parallel manner (RT slopes of 1.2, 1.8 and 4.8 msec/item), while type C and E displays are scanned serially (19.7 and 17.5 msec/item). 
The input displays of our system were prepared following Wolfe's prescription: nine images of the basic line orientations were produced as nine matrices of gray-level values. Displays for the various conditions of Wolfe's experiments were produced by randomly assigning these matrices into a 4x4 array, yielding 128x100 display-matrices that were transformed into display-vectors of 12800 components. A total of 2400 displays were produced in 30 groups (80 displays in each group): 5 conditions (A, B, C, E, G) x target/non-target x 3 set-sizes (4, 8, 12).\n\nOur model is composed of two neural network modules connected in sequence, as illustrated in Figure 1: a PCA module, which compresses the visual data into a set of principal axes, and a Target Detector (TD) module. The latter module uses the compressed data obtained by the former module to detect a target within an array of distractors. The system is presented with line-orientation displays as described above.\n\n[Figure 1 diagram, bottom to top: DISPLAY; INPUT LAYER (12800 UNITS); PCA OUTPUT LAYER, forming the DATA COMPRESSION MODULE (PCA); TD INTERMEDIATE LAYER (12 UNITS); TD OUTPUT LAYER (1 UNIT), forming the TARGET DETECTOR MODULE (TD); output NO-TARGET = -1, TARGET = 1.]\n\nFigure 1: General architecture of the model\n\nFor the PCA module we use the neural network proposed by Sanger, with the connections' values updated in accordance with his Generalized Hebbian Algorithm (GHA) [8]. The outputs of the trained system are the projections of the display-vectors along the first few principal axes, ordered with respect to their eigenvalue magnitudes. 
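
A minimal sketch of the GHA weight update may help make the PCA module concrete. It assumes zero-mean input vectors and uses plain Python lists in place of the 12800-component display-vectors. The function names (gha_step, train_gha, make_toy_data, project), the learning rate and the two-dimensional toy data are illustrative assumptions and are not taken from the simulations reported in this paper.

```python
import random


def gha_step(weights, x, lr):
    # One update of Sanger's rule for a single zero-mean input vector x.
    # weights[i] is the weight row of output neuron i.
    outputs = [sum(w * xj for w, xj in zip(row, x)) for row in weights]
    recon = [0.0] * len(x)  # reconstruction of x from outputs 0..i
    updated = []
    for y, row in zip(outputs, weights):
        recon = [r + y * w for r, w in zip(recon, row)]
        # Hebbian term y * x minus the part already explained by neurons 0..i.
        updated.append([w + lr * y * (xj - r) for w, xj, r in zip(row, x, recon)])
    return updated


def make_toy_data(n_samples, seed=0):
    # Hypothetical 2-D data: large variance along (1, 1), small along (1, -1).
    rng = random.Random(seed)
    data = []
    for _ in range(n_samples):
        a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 0.3)
        data.append([a + b, a - b])
    return data


def train_gha(data, n_outputs, lr=0.01, seed=1):
    # Start from small random weights and present each sample once.
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.5, 0.5) for _ in data[0]] for _ in range(n_outputs)]
    for x in data:
        weights = gha_step(weights, x, lr)
    return weights


def project(weights, x):
    # Compressed representation: projections of x on the learned axes.
    return [sum(w * xj for w, xj in zip(row, x)) for row in weights]
```

In this sketch the first weight row should settle near the direction of largest variance of the toy data, and project then returns the outputs that a downstream detector would receive.
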
Compressing the data is achieved by choosing outputs from the first few neurons (maximal variance and minimal information loss). Target detection in our system is performed by a feed-forward (FF) 3-layered network, trained via a standard back-propagation algorithm in a supervised-learning manner. The input layer of the FF network is composed of the first eight output neurons of the PCA module. The transfer function used in the intermediate and output layers is the hyperbolic tangent function.\n\n3  Results\n\n3.1  Target Detection\n\nThe performance of the system was examined in two simulation experiments. In the first, the PCA module was trained only with \"parallel\" task displays, and in the second, only with \"serial\" task displays. There is an inherent difference in the ability of the model to detect targets in parallel versus serial displays. In parallel task conditions (A, B, G) the target detector module learns the task after a comparatively small number (800 to 2000) of epochs, reaching a performance level of almost 100%. However, the target detector module is not capable of learning to detect a target in serial displays (C, E conditions). Interestingly, these results hold (1) whether the preceding PCA module was trained to perform data compression using parallel task displays or serial ones, (2) whether the target detector was a linear simple perceptron or the more powerful, non-linear network depicted in Figure 1, and (3) whether or not the full set of 144 principal axes (with non-zero eigenvalues) was used.\n\n3.2  Information Span\n\nTo analyze the differences between parallel and serial tasks we examined the eigenvalues obtained from the PCA of the training-set displays. 
The eigenvalues of condition B (parallel) displays in 4 and 12 set-sizes and of condition C (serial-task) displays are presented in Figure 2. Each training set contains a mixture of target and non-target displays.\n\n[Figure 2 plots: panel (a) PARALLEL and panel (b) SERIAL; horizontal axis: No. of PRINCIPAL AXIS; legend: + 4 ITEMS, o 12 ITEMS.]\n\nFigure 2: Eigenvalue spectrum of displays with different set-sizes, for parallel and serial tasks. Due to the sparseness of the displays (a few black lines on a white background), it takes only 31 principal axes to describe the parallel training-set in full (see Fig. 2a; the remaining axes have zero eigenvalues, indicating that they contain no additional information), and 144 axes for the serial set (only the first 50 axes are shown in Fig. 2b).\n\nAs evident, the eigenvalue distributions of the two display types are fundamentally different: in the parallel task, most of the eigenvalue \"mass\" is concentrated in the first few (15) principal axes, testifying that indeed, the dimension of the parallel displays space is quite confined. But for the serial task, the eigenvalues are distributed almost uniformly over 144 axes. This inherent difference is independent of set-size: 4 and 12-item displays have practically the same eigenvalue spectra.\n\n3.3  Variance Analysis\n\nThe target detector inputs are the projections of the display-vectors along the first few principal axes. 
Thus, some insight into the source of the difference between parallel and serial tasks can be gained by performing a variance analysis on these projections. The five different task conditions were analyzed separately, taking a group of 85 target displays and a group of 85 non-target displays for each set-size. Two types of variances were calculated for the projections on the 5th principal axis: the \"within groups\" variance, which is a measure of the statistical noise within each group of 85 displays, and the \"between groups\" variance, which measures the separation between target and non-target groups of displays for each set-size. These variances were averaged for each task (condition), over all set-sizes. The resulting ratios Q of within-groups to between-groups standard deviations are: QA = 0.0259, QB = 0.0587 and QG = 0.0114 for parallel displays (A, B, G), and QE = 0.2125 and QC = 0.771 for serial ones (E, C).\n\nAs evident, for parallel task displays the Q values are smaller by roughly an order of magnitude compared with the serial displays, indicating a better separation between target and non-target displays in parallel tasks. Moreover, using Q as a criterion for the parallel/serial distinction, one can predict that displays with Q << 1 will be processed in parallel, and serially otherwise, in accordance with the experimental response time (RT) slopes measured by Wolfe et al. [7]. These differences are further demonstrated in Figure 3, depicting projections of display-vectors on the sub-space spanned by the 5th, 6th and 7th principal axes. 
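
The ratio Q defined above can be sketched in a few lines. The exact estimator is not spelled out in the text, so this sketch assumes a one-way, ANOVA-style decomposition with two equally sized groups: the within-groups variance is the mean of the two group variances, and the between-groups variance is the variance of the two group means around their grand mean. The function name q_ratio and the sample numbers are hypothetical.

```python
from statistics import mean, pvariance


def q_ratio(target_proj, nontarget_proj):
    # Projections of target and non-target displays on one principal axis.
    groups = [target_proj, nontarget_proj]
    # Within-groups variance: noise around each group's own mean (assumed form).
    within_var = mean(pvariance(g) for g in groups)
    # Between-groups variance: spread of the group means (assumed form).
    grand_mean = mean(mean(g) for g in groups)
    between_var = mean((mean(g) - grand_mean) ** 2 for g in groups)
    if between_var == 0.0:
        return float('inf')  # identical group means: no separation at all
    # Q is the ratio of the two standard deviations.
    return (within_var / between_var) ** 0.5


# Well-separated groups give a small Q, overlapping groups a large one.
separated = q_ratio([1.0, 1.2, 0.8], [-1.0, -0.8, -1.2])
overlapping = q_ratio([0.1, -0.2, 0.3], [0.0, 0.2, -0.1])
```

Under these assumptions a small value of separated and a large value of overlapping mirror the parallel and serial cases, respectively.
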
Clearly, for the parallel task (condition B), the PCA representations of the target-displays (plus signs) are separated from non-target representations (circles), while for serial displays (condition C) there is no such separation. It should be emphasized that there is no other principal axis along which such a separation is manifested for serial displays.\n\n[Figure 3 plots: scatter panels with axes labeled 5th AXIS, 6th AXIS and 7th AXIS.]\n\nFigure 3: Projections of display-vectors on the sub-space spanned by the 5th, 6th and 7th principal axes. Plus signs and circles denote target and non-target display-vectors respectively, (a) for a parallel task (condition B), and (b) for a serial task (condition C). Set-size is 8 items.\n\nWhile Treisman and her co-workers view the distinction between parallel and serial tasks as a fundamental one, Duncan and Humphreys [5] claim that there is no sharp distinction between them, and that search efficiency varies continuously across tasks and conditions. The determining factors according to Duncan and Humphreys are the similarities between the target and the non-targets (T-N similarity) and the similarities between the non-targets themselves (N-N similarity). 
Displays with a homogeneous background (high N-N similarity) and a target which is significantly different from the distractors (low T-N similarity) will exhibit parallel, low RT slopes, and vice versa. This claim was illustrated by them using a qualitative \"search surface\" description, as shown in Figure 4a. Based on results from our variance analysis, we can now examine this claim quantitatively: we have constructed a \"search surface\" using actual numerical data of RT slopes from Wolfe's experiments, replacing the N-N similarity axis by its mathematical manifestation, the within-groups standard deviation, and T-N similarity by the between-groups standard deviation [footnote 1]. The resulting surface (Figure 4b) is qualitatively similar to Duncan and Humphreys's. This interesting result testifies that the PCA representation succeeds in producing a viable realization of such intuitive terms as input similarity, and is compatible with the way we perceive the world in visual search tasks.\n\n[Figure 4 plots: panel (a), the search surface of Duncan and Humphreys; panel (b), SEARCH SURFACE.]\n\nFigure 4: RT rates versus: (a) Input similarities (the search surface, reprinted from Duncan and Humphreys, 1989). (b) Standard deviations (within and between) of the PCA variance analysis. The asterisks denote Wolfe's experimental data.\n\n4  Summary\n\nIn this work we present a two-component neural network model of pre-attentive visual processing. The model has been applied to the visual search paradigm performed by Wolfe et al. 
Our main finding is that when global-feature compression is applied to visual displays, there is an inherent difference between the representations of serial and parallel-task displays: the neural network studied in this paper has succeeded in detecting a target among distractors only for displays that were experimentally found to be processed in parallel. Based on the outcome of the variance analysis performed on the PCA representations of the visual displays, we present a quantitative criterion enabling one to distinguish between serial and parallel displays. Furthermore, the resulting 'search-surface' generated by the PCA components is in close correspondence with the metaphorical description of Duncan and Humphreys.\n\n[Footnote 1] In general, each principal axis contains information from different features, which may mask the information concerning the existence of a target. Hence, the first principal axis may not be the best choice for a discrimination task. In our simulations, the 5th axis, for example, was primarily dedicated to target information, and was hence used for the variance analysis (obviously, the neural network uses information from all of the first eight principal axes).\n\nThe network demonstrates an interesting generalization ability: naturally, it can learn to detect a target in parallel displays from examples of such displays. However, it can also learn to perform this task from examples of serial displays only! On the other hand, we find that it is impossible to learn serial tasks, irrespective of the combination of parallel and serial displays that are presented to the network during the training phase. 
This generalization ability is manifested not only during the learning phase, but also during the performance phase; displays belonging to the same task have a similar eigenvalue spectrum, irrespective of the actual set-size of the displays, and this result holds true for parallel as well as for serial displays.\n\nThe role of PCA in perception was previously investigated by Cottrell [9], who designed a neural network which performed tasks such as face identification and gender discrimination. One might argue that PCA, being a global component analysis, is not compatible with the existence of local feature detectors (e.g. orientation detectors) in the cortex. Our work is in line with recent proposals [10] that there exist two pathways for sensory input processing: a fast sub-cortical pathway that contains limited information, and a slow cortical pathway which is capable of providing richer representations of the stimuli. Given this assumption, this paper has presented the first neural realization of the processing that may underlie the classical metaphorical explanations involved in visual search.\n\nReferences\n\n[1] J. K. Tsotsos. Analyzing vision at the complexity level. Behavioral and Brain Sciences, 13:423-469, 1990.\n\n[2] M. I. Posner, C. R. Snyder, and B. J. Davidson. Attention and the detection of signals. Journal of Experimental Psychology: General, 109:160-174, 1980.\n\n[3] Y. Tsal. Movement of attention across the visual field. Journal of Experimental Psychology: Human Perception and Performance, 9:523-530, 1983.\n\n[4] A. Treisman and G. Gelade. A feature integration theory of attention. Cognitive Psychology, 12:97-136, 1980.\n\n[5] J. Duncan and G. Humphreys. Visual search and stimulus similarity. Psychological Review, 96:433-458, 1989.\n\n[6] A. 
Treisman and S. Gormican. Feature analysis in early vision: Evidence from search asymmetries. Psychological Review, 95:15-48, 1988.\n\n[7] J. M. Wolfe, S. R. Friedman-Hill, M. I. Stewart, and K. M. O'Connell. The role of categorization in visual search for orientation. Journal of Experimental Psychology: Human Perception and Performance, 18:34-49, 1992.\n\n[8] T. D. Sanger. Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Networks, 2:459-473, 1989.\n\n[9] G. W. Cottrell. Extracting features from faces using compression networks: Face, identity, emotion and gender recognition using holons. Proceedings of the 1990 Connectionist Models Summer School, pages 328-337, 1990.\n\n[10] J. L. Armony, D. Servan-Schreiber, J. D. Cohen, and J. E. LeDoux. Computational modeling of emotion: exploration through the anatomy and physiology of fear conditioning. Trends in Cognitive Sciences, 1(1):28-34, 1997.\n", "award": [], "sourceid": 1356, "authors": [{"given_name": "Eyal", "family_name": "Cohen", "institution": null}, {"given_name": "Eytan", "family_name": "Ruppin", "institution": null}]}