{"title": "Perceptual Organization Based on Temporal Dynamics", "book": "Advances in Neural Information Processing Systems", "page_first": 38, "page_last": 44, "abstract": null, "full_text": "Perceptual Organization Based on Temporal Dynamics\n\nXiuwen Liu and DeLiang L. Wang\n\nDepartment of Computer and Information Science\nCenter for Cognitive Science\nThe Ohio State University, Columbus, OH 43210-1277\nEmail: {liux, dwang}@cis.ohio-state.edu\n\nAbstract\n\nA figure-ground segregation network is proposed based on a novel boundary pair representation. Nodes in the network are boundary segments obtained through local grouping. Each node is excitatorily coupled with the neighboring nodes that belong to the same region, and inhibitorily coupled with the corresponding paired node. Gestalt grouping rules are incorporated by modulating connections. The status of a node represents its probability of being figural and is updated according to a differential equation. The system solves the figure-ground segregation problem through temporal evolution. Different perceptual phenomena, such as modal and amodal completion, virtual contours, grouping and shape decomposition are then explained through local diffusion. The system eliminates combinatorial optimization and accounts for many psychophysical results with a fixed set of parameters.\n\n1 Introduction\n\nPerceptual organization refers to the ability to group similar features in sensory data. This, at a minimum, includes the operations of grouping and figure-ground segregation, which refers to the process of determining relative depths of adjacent regions in input data and thus the proper occlusion hierarchy. 
Perceptual organization has been studied extensively, and many of the existing approaches [5] [4] [8] [10] [3] start from detecting discontinuities, i.e., edges in the input; one or several configurations are then selected according to certain criteria, for example, non-accidentalness [5]. Those approaches have several disadvantages for perceptual organization. Edges should be localized between regions and an additional ambiguity, the ownership of a boundary segment, is introduced, which is equivalent to figure-ground segregation [7]. Due to that, regional attributes cannot be associated with boundary segments. Furthermore, because each boundary segment can belong to different regions, the potential search space is combinatorial.\n\nFigure 1: On- and off-center cell responses. (a) On- and off-center cells. (b) Input image. (c) On-center cell responses. (d) Off-center cell responses. (e) Binarized on- and off-center cell responses, where white regions represent on-center response regions and black regions represent off-center response regions.\n\nTo overcome some of these problems, we propose a laterally-coupled network based on a boundary-pair representation to resolve figure-ground segregation. An occluding boundary is represented by a pair of boundaries of the two associated regions, and initiates a competition between the regions. Each node in the network represents a boundary segment. Regions compete to be figural through boundary-pair competition and figure-ground segregation is resolved through temporal evolution. Gestalt grouping rules are incorporated by modulating coupling strengths between different nodes within a region, which influences the temporal dynamics and determines the percept of the system. 
Shape decomposition and grouping are then implemented through local diffusion using the results from figure-ground segregation.\n\n2 Figure-Ground Segregation Network\n\nThe central problem in perceptual organization is to determine relative depths among regions. As figure reversal occurs in certain circumstances, figure-ground segregation cannot be resolved based only on local attributes.\n\n2.1 The Network Architecture\n\nThe boundary-pair representation is motivated by on- and off-center cells, shown in Fig. 1(a). Fig. 1(b) shows an input image and Fig. 1(c) and (d) show the on- and off-center responses. Without zero-crossing, we naturally obtain double responses for each occluding boundary, as shown in Fig. 1(e). In our boundary-pair representation, each boundary is uniquely associated with a region.\n\nIn this paper, we obtain closed region boundaries from segmentation and form boundary segments using corners and junctions, which are detected through local corner and junction detectors. A node i in the figure-ground segregation network represents a boundary segment, and $P_i$ represents its probability of being figural, which is set to 0.5 initially. Each node is laterally coupled with neighboring nodes on the closed boundary. The connection weight from node i to j, $W_{ij}$, is 1 and can be modified by T-junctions and local shape information. Each occluding boundary is represented by a pair of boundary segments of the involved regions. For example, in Fig. 2(a), nodes 1 and 5 form a boundary pair, where node 1 belongs to the white region and node 5 belongs to the black region. 
Node i updates its status by:\n\n$$\\tau \\frac{dP_i}{dt} = \\mu_L \\sum_{k \\in N(i)} W_{ki} (P_k - P_i) + \\mu_J (1 - P_i) \\sum_{l \\in J(i)} H(Q_{li}) + \\mu_B (1 - P_i) \\exp(-B_i / K_B) \\qquad (1)$$\n\nHere $N(i)$ is the set of neighboring nodes of i, and $\\mu_L$, $\\mu_J$, and $\\mu_B$ are parameters to determine the influences from lateral connections, junctions, and bias. $J(i)$ is the set of junctions that are associated with i and $Q_{li}$ is the junction strength of node i of junction l. $H(x)$ is given by $H(x) = \\tanh(\\beta (x - \\theta_J))$, where $\\beta$ controls the steepness and $\\theta_J$ is a threshold.\n\nFigure 2: (a) The figure-ground segregation network for Fig. 1(b). Nodes 1, 2, 3, and 4 belong to the white region; nodes 5, 6, 7, and 8 belong to the black region; and nodes 9 and 10, and nodes 11 and 12 belong to the left and right gray regions respectively. Solid lines represent excitatory coupling while dashed lines represent inhibitory connections. (b) Result after surface completion. Left and right gray regions are grouped together.\n\nIn (1), the first term on the right reflects the lateral influences. When nodes are strongly coupled, they are more likely to be in the same status, either figure or background. The second term incorporates junction information. In other words, at a T-junction, segments that vary more smoothly are more likely to be figural. The third term is a bias, where $B_i$ is the bias introduced to simulate human perception. 
The competition between paired nodes i and j is through normalization based on the assumption that only one of the paired nodes should be figural at a given time:\n\n$$P_i^{t+1} = \\frac{P_i^t}{P_i^t + P_j^t} \\quad \\text{and} \\quad P_j^{t+1} = \\frac{P_j^t}{P_i^t + P_j^t}$$\n\n2.2 Incorporation of Gestalt Rules\n\nTo generate behavior that is consistent with human perception, we incorporate grouping cues and some Gestalt grouping principles. As the network provides a generic model, additional grouping rules can also be incorporated.\n\nT-junctions. T-junctions provide important cues for determining relative depths [7] [10]. In Williams and Hanson's model [10], T-junctions are imposed as topological constraints. Given a T-junction l, the initial strength for node i that is associated with l is:\n\n$$Q_{li} = \\frac{\\exp(-\\alpha_{(i, c(i))} / K_T)}{\\frac{1}{2} \\sum_{k \\in N_J(l)} \\exp(-\\alpha_{(k, c(k))} / K_T)},$$\n\nwhere $K_T$ is a parameter, $N_J(l)$ is a set of all the nodes associated with junction l, $c(i)$ is the other node in $N_J(l)$ that belongs to the same region as node i, and $\\alpha_{(ij)}$ is the angle between segments i and j.\n\nNon-accidentalness. Non-accidentalness tries to capture the intrinsic relationships among segments [5]. In our system, an additional connection is introduced to node i if it is aligned well with a node j from the same region and $j \\notin N(i)$ initially. The connection weight $W_{ij}$ is a function of distance and angle between the involved ending points. This can be viewed as virtual junctions, resulting in virtual contours and conversion of a corner into a T-junction if involved nodes become figural. This corresponds to an organization criterion proposed by Geiger et al. [3].\n\nFigure 3: Temporal behavior of each node in the network shown in Fig. 2(a). 
Each plot shows the status of the corresponding node with respect to time. The dashed line is 0.5.\n\nShape information. Shape information plays a central role in Gestalt principles and is incorporated through enhancing lateral connections. In this paper, we consider local symmetry. Let j and k be two neighboring nodes of i:\n\n$$W_{ij} = 1 + C \\exp(-|\\alpha_{ij} - \\alpha_{ki}| / K_\\alpha) \\cdot \\exp(-(L_j / L_k + L_k / L_j - 2) / K_L),$$\n\nwhere $C$, $K_\\alpha$, and $K_L$ are parameters and $L_j$ is the length of segment j. Essentially the lateral connections are strengthened when two neighboring segments of i are symmetric.\n\nPreferences. Human perceptual systems often prefer some organizations over others. Here we incorporate a well-known figure-ground segregation principle, called closeness. In other words, the system prefers filled regions over holes. In the current implementation, we set $B_i = 1.0$ if node i is part of a hole and $B_i = 0$ otherwise.\n\n2.3 Temporal Properties of the Network\n\nAfter we construct the figure-ground segregation network, each node is updated according to (1). Fig. 3 shows the temporal behavior of the network shown in Fig. 2(a). The system approaches a stable solution. For figure-ground segregation, we can binarize the status of each node using threshold 0.5. Thus the system generates the desired percept in a few iterations. The black region occludes other regions while gray regions occlude the white region. For example, $P_5$ is close to 1 and thus segment 5 is figural, and $P_1$ is close to 0 and thus segment 1 is in the background.\n\n2.4 Surface Completion\n\nAfter figure-ground segregation is resolved, surface completion and shape decomposition are implemented through diffusion [3]. Each boundary segment is associated with regional attributes such as the average intensity value because its ownership is known. 
Boundary segments are then grouped into diffusion groups based on similarities of their regional attributes and on whether they are occluded by common regions. In Fig. 1(b), three diffusion groups are formed, namely, the black region, the two gray regions, and the white region. Segments in one diffusion group are diffused simultaneously. For a figural segment, a buffer with a given radius is generated. Within the buffer, the values are fixed to 1 for pixels belonging to the region and 0 otherwise. Now the problem becomes a well-defined mathematical problem. We need to solve the heat equation with given boundary conditions. Currently, the heat equation is solved through local diffusion. The results from diffusion are then binarized using threshold 0.5. Fig. 2(b) shows the results for Fig. 1(b) after surface completion. Here the two gray regions are grouped together through surface completion because occluded boundaries allow diffusion. The white region becomes the background, which is the entire image.\n\nFigure 4: Images with virtual contours. In each column, the top shows the input image and the bottom the surface completion result, where completed surfaces are shown according to their relative depths and the bottom one is the projection of all the completed surfaces. (a) Alternate pacman. (b) Reverse-contrast pacman. (c) Kanizsa triangle. (d) Woven square. (e) Double pacman.\n\n3 Experimental Results\n\nGiven an image, the system automatically constructs the network and establishes the connections based on the rules discussed in Section 2.2. For all the experiments shown here, a fixed set of parameters is used.\n\n3.1 Modal and Amodal Completion\n\nWe first demonstrate that the system can simulate virtual contours and modal completion. Fig. 
4 shows the input images and surface completion results. The system correctly solves the figure-ground segregation problem and generates the most probable percept. Fig. 4(a) and (b) show two variations of pacman images [9] [4]. Even though the edges have opposite contrast, the virtual rectangle is vivid. Through the boundary-pair representation, our system can handle both cases using the same network. Fig. 4(c) shows a typical virtual image [6] and the system correctly simulates the percept. In Fig. 4(d) [6], the rectangular-like frame is tilted, making the order between the frame and virtual square not well-defined. Our system handles that in the temporal domain. At any given time, the system outputs one of the completed surfaces. Due to this, the system can also handle the case in Fig. 4(e) [2], where the percept is bistable, as the order between the two virtual squares is not well defined.\n\nFigure 5: Surface completion results. (a) and (b) Bregman figures [1]. (c) and (d) Surface completion results for (a) and (b). (e) and (f) An image of some groceries and surface completion result.\n\nFig. 5(a) and (b) show the well-known Bregman figures [1]. In Fig. 5(a), there is no perceptual grouping and parts of B's remain fragmented. However, when occlusion is introduced as in Fig. 5(b), perceptual grouping is evident and fragments of B's are grouped together. Our results, shown in Fig. 5(c) and (d), are consistent with the percepts. Fig. 5(e) shows an image of groceries, which is used extensively in [8]. Even though the T-junction at the bottom is locally confusing, our system gives the most plausible result through lateral influences of the other two strong T-junctions. 
Without search and parameter tuning, our system gives the optimal solution shown in Fig. 5(f).\n\n3.2 Comparison with Existing Approaches\n\nAs mentioned earlier, at a minimum, figure-ground segregation and grouping need to be addressed for perceptual organization. Edge-based approaches [4] [10] attempt to solve both problems simultaneously by preferring some configurations over combinatorially many others according to certain criteria. There are several difficulties common to those approaches. First, they cannot account for different human percepts of cases where edge elements are similar. Fig. 5(a) and (b) are well-known examples in this regard. Another example is that the edge-only version of Fig. 4(c) does not give rise to a vivid virtual contour as in Fig. 4(c) [6]. To reduce the potential search space, contrast signs of edges are often used as additional constraints [10]. However, both Fig. 4(a) and (b) give rise to virtual contours despite the opposite edge contrast signs. Essentially based on Fig. 4(b), Grossberg and Mingolla [4] claimed that illusory contours can join edges with different directions of contrast, which does not hold in general. As demonstrated through experiments, our approach does offer a common principle underlying these examples.\n\nOur approach shares some similarities with the one by Geiger et al. [3]. In both approaches, perceptual organization is solved in two steps. In [3], figure-ground segregation is encoded implicitly in hypotheses which are defined at junction points. Because potential hypotheses are combinatorial, only a few manually chosen ones are tested in their experiments, which is not sufficient for a general computational model. In our approach, by resolving figure-ground segregation, there is no need to define hypotheses explicitly. In both methods, grouping is implemented through diffusion. In [3], \"heat\" sources for diffusion are given manually for each hypothesis whereas our approach generates \"heat\" sources automatically using the figure-ground segregation results. Finally, in our approach, local ambiguities can be resolved through lateral connections using temporal dynamics, resulting in robust behavior. To obtain good results for Fig. 5(e), Nitzberg et al. [8] need to tune parameters and increase their search space substantially due to the misleading T-junction at the bottom of Fig. 5(e).\n\n4 Conclusion\n\nIn this paper we have proposed a network for perceptual organization using temporal dynamics. The pair-wise boundary representation resolves the ownership ambiguity inherent in an edge-based representation and is equivalent to a surface representation through diffusion, providing a unified edge- and surface-based representation. Through temporal dynamics, our model allows for interactions among different modules and top-down influences can be incorporated.\n\nAcknowledgments\n\nThe authors would like to thank S. C. Zhu and M. Wu for their valuable discussions. This research is partially supported by an NSF grant (IRI-9423312) and an ONR Young Investigator Award (N00014-96-1-0676) to DLW.\n\nReferences\n\n[1] A. S. Bregman, \"Asking the 'What for' question in auditory perception,\" In Perceptual Organization, M. Kubovy and J. R. Pomerantz, eds., Lawrence Erlbaum Associates, Publishers, Hillsdale, New Jersey, pp. 99-118, 1981.\n\n[2] M. Fahle and G. 
Palm, \"Perceptual rivalry between illusory and real contours,\" Biological Cybernetics, vol. 66, pp. 1-8, 1991.\n\n[3] D. Geiger, H. Pao, and N. Rubin, \"Salient and multiple illusory surfaces,\" In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 118-124, 1998.\n\n[4] S. Grossberg and E. Mingolla, \"Neural dynamics of perceptual grouping: textures, boundaries, and emergent segmentations,\" Perception & Psychophysics, vol. 38, pp. 141-170, 1985.\n\n[5] D. G. Lowe, Perceptual Organization and Visual Recognition, Kluwer Academic Publishers, Boston, 1985.\n\n[6] G. Kanizsa, Organization in Vision, Praeger, New York, 1979.\n\n[7] K. Nakayama, Z. J. He, and S. Shimojo, \"Visual surface representation: a critical link between lower-level and higher-level vision,\" In Visual Cognition, S. M. Kosslyn and D. N. Osherson, eds., The MIT Press, Cambridge, Massachusetts, vol. 2, pp. 1-70, 1995.\n\n[8] M. Nitzberg, D. Mumford, and T. Shiota, Filtering, Segmentation and Depth, Springer-Verlag, New York, 1993.\n\n[9] R. Shapley and J. Gordon, \"The existence of interpolated illusory contours depends on contrast and spatial separation,\" In The Perception of Illusory Contours, S. Petry and G. E. Meyer, eds., Springer-Verlag, New York, pp. 109-115, 1987.\n\n[10] L. R. Williams and A. R. Hanson, \"Perceptual Completion of Occluded Surfaces,\" Computer Vision and Image Understanding, vol. 64, pp. 1-20, 1996.\n", "award": [], "sourceid": 1730, "authors": [{"given_name": "Xiuwen", "family_name": "Liu", "institution": null}, {"given_name": "DeLiang", "family_name": "Wang", "institution": null}]}