{"title": "Contour-Map Encoding of Shape for Early Vision", "book": "Advances in Neural Information Processing Systems", "page_first": 282, "page_last": 289, "abstract": null, "full_text": "282 \n\nKanerva \n\nContour-Map Encoding of Shape for Early Vision\n\nPentti Kanerva\nResearch Institute for Advanced Computer Science\nMail Stop 230-5, NASA Ames Research Center\nMoffett Field, California 94035\n\nABSTRACT\n\nContour maps provide a general method for recognizing two-dimensional shapes. All but blank images give rise to such maps, and people are good at recognizing objects and shapes from them. The maps are encoded easily in long feature vectors that are suitable for recognition by an associative memory. These properties of contour maps suggest a role for them in early visual perception. The prevalence of direction-sensitive neurons in the visual cortex of mammals supports this view.\n\nINTRODUCTION\n\nEarly vision refers here to the first stages of visual perception of an experienced (adult human) observer. Overall, visual perception results in the identification of what is being viewed: we recognize an image as the letter A because it looks to us like other As we have seen. Early vision is the beginning of this process of identification: the making of the first guess.\n\nEarly vision cannot be based on special or salient features. For example, we normally think of the letter A as being composed of two slanted strokes, / and \\, meeting at the top and connected in the middle by a horizontal stroke, -. The strokes and their coincidences define all the features of A. However, we recognize the As in Figure 1 even though the strokes and the features, if present at all, do not stand out in the images.\n\nMost telling about human vision is that we can recognize such As after having seen only more or less normal As. The challenge of early vision, then, is twofold: to find general encoding mechanisms that turn these quite dissimilar images of the same object into similar internal representations while leaving the representations of different objects dissimilar, and to find basic pattern-recognition mechanisms that work with these representations. Since our main work is on associative memories, we have been interested in ways to encode images into long feature vectors suitable for such memories. The contour-map method of this paper encodes a variety of images into vectors for associative memories.\n\nREPRESENTING AN IMAGE AS A CONTOUR MAP\n\nImages take many forms: line drawings, silhouettes, outlines, dot-matrix pictures, gray-scale pictures, color pictures, and the like, as well as pictures that combine all these elements. Common to all is that they occupy a region of (two-dimensional) space. An early representation of an image should therefore be concerned with how the image controls its space or, in technical terms, how it might be represented as a field.\n\nLet us first consider a gray-scale image. It defines a field by how dark it is in different places (image intensity, a scalar field; the image itself is the field). A related field is given by how the darkness changes from place to place (the gradient of intensity, a vector field). 
Neither one is quite right for recognizing As, because reversing the field (turning dark to light and light to dark) leaves us with the \"same\" A. However, the dark-and-light reversal leaves the contour lines of the image (lines of uniform intensity; technically, a tangent field perpendicular to the gradient field) unchanged. My proposal is to base initial recognition on the contour lines.\n\nIn line drawings and black-and-white images, which have only two darkness levels or \"colors\", the contour lines are not well defined. This is overcome by propagating the lines and the edges of the image outward and inward over areas of uniform image intensity, in the manner of contour lines, roughly parallel to the lines and the edges. Figure 2 shows only a few such lines, but, in fact, the image is covered with them, running roughly parallel to each other. As a rule, exactly one contour line runs through any given point. Computing its direction is discussed near the end of the paper.\n\nFIGURE 1. Various kinds of As.\n\nENCODING THE CONTOUR MAP\n\nTable 1 shows how the direction of the contour at a point can be encoded in three trits (ternary variables taking the values -1, 0, and 1). 
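A minimal Python sketch of this code follows, under stated assumptions: the six codewords are read off Table 1 (they agree with the codewords printed in Figure 3), directions are in degrees measured counterclockwise from horizontal, and the names CODE, encode_direction, and distance are illustrative rather than taken from the paper. Directions 180 degrees apart share a codeword, and the L1 distance between codewords grows by 2 units per 30 degrees of difference.

```python
# Hypothetical sketch of the coarse circular code (codewords read off Table 1).
# Angles are in degrees, assumed measured counterclockwise from horizontal.
CODE = [
    (1, -1, 1),    # 0 +/- 15 degrees: horizontal
    (1, -1, -1),   # 30 +/- 15
    (1, 1, -1),    # 60 +/- 15
    (-1, 1, -1),   # 90 +/- 15: vertical
    (-1, 1, 1),    # 120 +/- 15
    (-1, -1, 1),   # 150 +/- 15
]
UNDEFINED = (0, 0, 0)  # zero-word for ill-defined directions


def encode_direction(degrees):
    '''Return the three-trit codeword for a contour direction (None if undefined).'''
    if degrees is None:
        return UNDEFINED
    # Tangents wrap around at 180 degrees; shift by 15 so each sector is centered.
    # The final % 6 guards against float rounding exactly at the wraparound.
    sector = int(((degrees + 15.0) % 180.0) // 30.0) % 6
    return CODE[sector]


def distance(a, b):
    '''L1 distance: each trit that flips between -1 and +1 adds 2 units.'''
    return sum(abs(x - y) for x, y in zip(a, b))


if __name__ == '__main__':
    for delta in (30, 60, 90):
        print(delta, distance(encode_direction(0), encode_direction(delta)))
    print(encode_direction(180) == encode_direction(0))
    print(distance(UNDEFINED, encode_direction(45)))
```

Run as a script, it prints distances 2, 4, and 6 for directions 30, 60, and 90 degrees apart, True for the 180-degree wraparound, and 3 for the distance from the zero-word 000 to a defined codeword. The middle trit of each codeword is +1 exactly for the 60-, 90-, and 120-degree sectors.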
The code divides 180 degrees into six equal sectors and assigns a codeword to each sector. The distance between two codewords is the number of (Hamming) units by which the words differ (L1 distance). The code is circular, and the distance between codewords is related directly to the difference in direction: directions 30, 60, and 90 degrees apart are encoded with words that are 2, 4, and 6 units apart, respectively. The code wraps around, as do tangents, so that directions 180 degrees apart are encoded the same. For finer discrimination we would use a finer circular code. The zero-word 000, which is equally far (3 units) from all other words in the code, is used for points at which the direction of the contour is ill-defined, such as the very centers of circles.\n\nThis encoding makes the direction of the contour at any point on a map into a three-component vector. To encode the entire map, the vector field is sampled at a fixed, finite set of points, and the encodings of the sample points are concatenated in fixed order into a long vector. In preliminary studies we have used small sample sizes: 7 x 5 (= 35) sample points, each encoded into three trits, for a total vector of (3 x 35 =) 105 trits, and 8 x 8 sample points by three trits for a total vector of 192 trits.\n\nFIGURE 2. Propagating the contour.\n\nFor an example, Figure 3 shows the digit 4 drawn on a 21-by-15-pixel grid. It also shows a 7 x 5 sampling grid laid over the image and the direction of the contour at the sample points (shown by short line segments). 
Below the image are the three-trit encodings of the sample points, starting at the upper left corner and progressing by rows, concatenated into a 105-trit encoding of the entire image. In this encoding, + means +1 and - means -1.\n\nTABLE 1\nCoarse Circular Code for Direction of Contour\n\nDirection, degrees   Codeword\n----------------------------\n  0 \u00b1 15           1 -1  1\n 30 \u00b1 15           1 -1 -1\n 60 \u00b1 15           1  1 -1\n 90 \u00b1 15          -1  1 -1\n120 \u00b1 15          -1  1  1\n150 \u00b1 15          -1 -1  1\n180 \u00b1 15           1 -1  1\n  . . .              . . .\nUndefined           0  0  0\n\n-++ -++ -++ --+ ++-\n-++ -++ -++ -+- -+-\n--+ --+ --+ -+- -+-\n--+ -++ 000 -+- -+-\n000 +-+ +-+ +-+ --+\n+-- +-+ +-+ -+- -+-\n+-- +-- ++- ++- -++\n\nFIGURE 3. Encoding an image.\n\nFrom Positions of the Code to Directional Sensors\n\nEach position of the three-trit code can be thought of as a directional sensor. For example, the center position senses contours at 90 degrees, plus or minus 45 degrees: it is 1 when the direction of the contour is closer to vertical than to horizontal (see Table 1). Similarly, each position of the long (105-trit) code for the entire map can be thought of as a sensor for a specific direction (plus or minus) at a specific location on the map.\n\nAn array of sensors will thus encode an image. The sensors are like the direction-sensitive cells of the visual cortex. Such cells, of course, are not laid down with perfect regularity over the cortex, but that does not mean that they could not perform as encoders. Accordingly, a direction-sensitive cell can be thought of as a feature detector that encodes for a certain direction at a certain location in the visual or attentional field. An irregular array of randomly oriented sensors laid over images would produce perfectly good encodings of their contour maps.\n\nCOMPARING TWO CONTOUR MAPS\n\nHow closely do two contour maps resemble each other? For simplicity, we will compare maps of equal size (and shape) only. The maps are compared point to point. The difference at a point is the difference in the direction of the contour at that point on the two maps, that is, the magnitude of the lesser of the two angles made by the two contour lines that run through corresponding points on the two maps. The maximum difference at a point is therefore 90 degrees. The entire maps are then compared by adding the pointwise differences over all the points (by integrating over the area of the map).\n\nThe purpose of the encoding is to make the comparing of maps simple. The code is so constructed that the difference of two maps at a point is roughly proportional to the distance between the two (3-trit) codewords, one from each map, for that point. We need not even concern ourselves with finding the lesser of the two angles made by the crossing of the two contours; the distance between codewords accounts for that automatically.\n\nEntire maps are then compared by adding together the distances at the (35) sample points. This is equivalent to computing the distance between the (105-trit) codewords for the two maps. 
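The equivalence can be checked on the encoding of Figure 3. The sketch below is hypothetical. It transcribes the 35 codewords of Figure 3 and concatenates them in row order into a 105-trit vector. It then compares that vector with two synthetic maps invented for illustration, not taken from the paper: one with every defined direction turned by 90 degrees (each trit negated), and one with only the upper-left point turned by 30 degrees. The names encode_map, pointwise_distances, and map_distance are illustrative.

```python
# Hypothetical sketch: the 7 x 5 grid of codewords transcribed from Figure 3
# (+ means +1, - means -1), read by rows from the upper left corner.
FIGURE_3 = [
    '-++ -++ -++ --+ ++-',
    '-++ -++ -++ -+- -+-',
    '--+ --+ --+ -+- -+-',
    '--+ -++ 000 -+- -+-',
    '000 +-+ +-+ +-+ --+',
    '+-- +-+ +-+ -+- -+-',
    '+-- +-- ++- ++- -++',
]
TRIT = {'+': 1, '-': -1, '0': 0}


def encode_map(rows):
    '''Concatenate the three-trit codewords of all sample points in fixed order.'''
    return [TRIT[ch] for row in rows for word in row.split() for ch in word]


def pointwise_distances(a, b):
    '''Distance between corresponding three-trit codewords, one value per sample point.'''
    return [sum(abs(x - y) for x, y in zip(a[k:k + 3], b[k:k + 3]))
            for k in range(0, len(a), 3)]


def map_distance(a, b):
    '''L1 distance between two long (here 105-trit) map vectors.'''
    return sum(abs(x - y) for x, y in zip(a, b))


if __name__ == '__main__':
    four = encode_map(FIGURE_3)
    print(len(four))  # 35 sample points x 3 trits
    # Synthetic map: every defined direction turned by 90 degrees.
    turned = [-t for t in four]
    print(map_distance(four, turned), sum(pointwise_distances(four, turned)))
    # Synthetic map: only the upper-left point turned by 30 degrees (-++ to --+).
    nudged = encode_map(['--+' + FIGURE_3[0][3:]] + FIGURE_3[1:])
    print(map_distance(four, nudged))
```

Figure 3 has 33 defined sample points and two 000 points. Turning every direction by 90 degrees therefore costs the maximum 6 units at each defined point, 198 in total. Summing the pointwise distances gives the same number as the distance between the long vectors, as the text states. The single 30-degree change costs 2 units.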
This distance is approximately proportional to the difference between the maps; it is only approximately so because the maps are sampled at a small number of points and because the direction at each point is coded coarsely.\n\nCOMPUTING THE DIRECTION OF THE CONTOUR\n\nWe have not explored widely how to compute contours from images and merely outline here one method, not exactly biological, that works for line drawings and two-tone images and that can be generalized to gray-scale images and even to many multicolor images. We have also experimented with the oriented, difference-of-Gaussian filters of Parent and Zucker (1985) and with the cortex transforms of Watson (1987). The contours are based on a simple model of attraction, akin to gravity: the lines and the edges of the image are assumed to attract a point according to their distance from it. The net attraction at any point on the image defines a gradient field, and the contours are perpendicular to it.\n\nIn practice we work with pixels. For the sake of the gravity model, we assume that pixels of the same color as the sample point P for which we are computing the direction have mass zero and those of the opposite color have mass one. For the direction to be independent of scale, the attractive force must be inversely proportional to some power of the distance. Powers greater than 2 make the computation local. For example, with power 7, a pixel twice as far away as another contributes only 1/128 as much to the net force. To make the attraction somewhat insensitive to noise, a small constant, 3, is added to the distance. 
(The values 7 and 3 were chosen after a small amount of experimentation.) Hence, pixel X (of mass 1) attracts P with a force of magnitude\n\n$$[d(P,X) + 3]^{-7}$$\n\nin the direction of X, where d(P,X) is the (Euclidean) distance between P and X. The vector sum of the forces over all pixels X of mass 1 is then the attractive force at point P, and the direction of the contour at P is perpendicular to it. The magnitude of the vector sum is scaled by dividing it by the sum of the magnitudes of its components. This scaled magnitude indicates how well the direction is defined in the image.\n\nWhen this computation is made at a point on a (one-pixel-wide) line, the result is a zero vector (the gradient at the top of a ridge is zero). However, we want to use the direction of the line itself as the direction of the contour. To this end, we compute at each sample point P another vector that detects linear features, such as lines. This computation is based on the above attraction model, modified as follows: pixels of the same color as P now have mass one and those of the opposite color have mass zero (the pixel at P always being regarded as having mass zero); and the direction of the force, instead of being the angle from P to X, is twice that angle. The doubling of the angle makes attractive forces in opposite directions (along a line) reinforce each other and those in perpendicular directions cancel each other out. The angle of the net force is then halved, and the magnitude of the force is scaled as above.\n\nThe two computations yield two vectors, both representing the direction of the contour at a point. 
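Both computations can be sketched on a two-tone image given as a list of rows of 0s and 1s. This is a minimal sketch under stated assumptions. x grows to the right and y upward. The sum runs over every pixel of the image. Each vector is returned as a (scaled magnitude, contour angle in degrees) pair. The function names are hypothetical. The combine function applies the rule stated in the paragraph that follows (double the angles, add, halve). Its threshold of 0.1 is an arbitrary placeholder, since the text says only some threshold.

```python
import math

POWER = 7      # exponent of the inverse-power attraction (value from the text)
OFFSET = 3.0   # constant added to the distance to reduce noise sensitivity


def attraction(d):
    '''Magnitude of the pull of one pixel of mass 1 at distance d: [d + 3] ** -7.'''
    return (d + OFFSET) ** -POWER


def edge_vector(image, r, c):
    '''Opposite-color pixels have mass 1. Returns (scaled magnitude, contour angle).'''
    fx = fy = total = 0.0
    for i, row in enumerate(image):
        for j, pixel in enumerate(row):
            if pixel == image[r][c]:
                continue  # same color as P: mass 0
            dx, dy = j - c, r - i  # x to the right, y upward
            d = math.hypot(dx, dy)
            w = attraction(d)
            fx += w * dx / d  # force points from P toward X
            fy += w * dy / d
            total += w
    if total == 0.0:
        return 0.0, 0.0
    # The contour runs perpendicular to the net attractive force.
    return math.hypot(fx, fy) / total, math.degrees(math.atan2(fy, fx)) + 90.0


def line_vector(image, r, c):
    '''Same-color pixels other than P have mass 1, and force angles are doubled.'''
    fx = fy = total = 0.0
    for i, row in enumerate(image):
        for j, pixel in enumerate(row):
            if pixel != image[r][c] or (i, j) == (r, c):
                continue
            dx, dy = j - c, r - i
            w = attraction(math.hypot(dx, dy))
            doubled = 2.0 * math.atan2(dy, dx)
            fx += w * math.cos(doubled)
            fy += w * math.sin(doubled)
            total += w
    if total == 0.0:
        return 0.0, 0.0
    # Halving the angle of the net force gives the direction of the line.
    return math.hypot(fx, fy) / total, math.degrees(math.atan2(fy, fx)) / 2.0


def combine(vectors, threshold=0.1):
    '''Double the angles, add, halve; None (code 000) below the threshold.'''
    sx = sum(m * math.cos(math.radians(2.0 * a)) for m, a in vectors)
    sy = sum(m * math.sin(math.radians(2.0 * a)) for m, a in vectors)
    if math.hypot(sx, sy) < threshold:  # threshold value is an assumption
        return None
    # Direction in degrees, folded into the range 0 to 180.
    return (math.degrees(math.atan2(sy, sx)) / 2.0) % 180.0


def contour_direction(image, r, c, threshold=0.1):
    return combine([edge_vector(image, r, c), line_vector(image, r, c)], threshold)
```

Take a point on a one-pixel-wide horizontal line. There the edge vector is essentially zero, as the text predicts, and the line detector supplies the horizontal direction. A point two pixels above the line gets its horizontal contour mainly from the edge vector. The full-image sum is quadratic in image size. Because the attraction falls off with the seventh power, a practical version could restrict the sum to a window around P.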
The two vectors can be combined into a single vector by doubling their angles (to eliminate 180-degree ambiguities), adding the resulting vectors together, and halving the angle of the sum. The direction of the result gives the direction of the contour, and the magnitude of the result indicates how well this direction is defined. If the magnitude is below some threshold, the direction is taken to be undefined and is encoded with 000.\n\nSOME COMPARISONS\n\nThe method is very general, which is at once its virtue and its limitation. The virtue is that it works where more specific methods fail; the limitation is that specific methods are needed for specific problems.\n\nIn our preliminary experiments with handwritten Zip-code digits, encoding an image by low-pass filtering (blurring) it and encoding it as a contour map resulted in similar rates of recognition by a sparse distributed memory. Higher rates on this same task were obtained by Denker et al. (1989) by encoding the image in terms of features specific to handwriting.\n\nTo give an idea of the generality of contour maps, Figure 4 shows encoded maps of ten normal digits like that in Figure 3, and of three unusual digits barely recognizable by humans. The labels for the unusual ones and for their maps, 8a, 8b, and 9a, tell what digits they were intended to be. Table 2 of distances between the encoded maps shows that 8 gives only the second-best match to 8a and 8b, whereas the digit closest to 9a indeed is 9. This suggests that a system trained on normal letters and digits would do a fair job at recognizing the 'NIPS 1989' at the bottom of Figure 4. Systems that encode characters as bit maps, or that take them as composed of strokes, likewise trained, would not do nearly as well. Going back to the As of Figure 1, all but one of them can be recognized based on the map of a normal A. Logograms are a rich source of images of this kind. They are excellent for testing a vision system for generality. Finally, other oriented fields, not just contour maps, can be encoded with methods similar to this one for recognition by an associative memory.\n\nFIGURE 4. Contour maps of digits. Unusual text.\n\nTABLE 2\nDistances Between Normal and Unusual Digits of Figure 4\n\n0 \n\n1 \n\n2 \n\n3 \n\n4 \n\n5 \n\n6 \n\n7 \n\n8 \n\n9 \n\n8a \n8b \n9a \n\n62 \n38 \n70 \n\n95 \n71 \n89 \n\n80 \n88 \n66 \n\n91 \n74 \n64 \n77 \n90 109 \n\n83 \n65 \n\n86 \n87 \n73 \n88 \n99 103 62 \n\n79 \n67 \n51 \n73 \n83 59 \n\nAcknowledgements\n\nThis research was supported by the National Aeronautics and Space Administration (NASA) through cooperative agreement No. NCC2-387 with the Universities Space Research Association. The idea of contour maps was inspired by the gridfonts of Douglas Hofstadter (1985). The first experiments with the contour-map method were done by Bruno Olshausen. The gravity model arose from discussions with Lauri Kanerva. David Rogers made the computer-drawn illustrations.\n\nReferences\n\nDenker, J.S., Gardner, W.R., Graf, H.P., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D., Baird, H.S., and Guyon, I. (1989) Neural Network Recognizer for Hand-Written Zip Code Digits. In D.S. Touretzky (ed.), Advances in Neural Information Processing Systems, Volume I. San Mateo, California: Morgan Kaufmann. 323-331.\n\nHofstadter, D.R. (1985) Metamagical Themas. New York: Basic Books.\n\nParent, P., and Zucker, S.W. (1985) Trace Inference, Curvature Consistency, and Curve Detection. Report CIM-86-3, McGill Research Center for Intelligent Machines, Montreal, Canada.\n\nWatson, A.B. (1987) The Cortex Transform: Rapid Computation of Simulated Neural Images. Computer Vision, Graphics, and Image Processing 39(3):311-327.", "award": [], "sourceid": 190, "authors": [{"given_name": "Pentti", "family_name": "Kanerva", "institution": null}]}