{"title": "A SNoW-Based Face Detector", "book": "Advances in Neural Information Processing Systems", "page_first": 862, "page_last": 868, "abstract": null, "full_text": "A SNoW-Based Face Detector \n\nMing-Hsuan Yang \nNarendra Ahuja \nDepartment of Computer Science and the Beckman Institute \n\nDan Roth \n\nUniversity of Illinois at Urbana-Champaign \n\nUrbana, IL 61801 \n\nmhyang@vision.ai.uiuc.edu danr@cs.uiuc.edu ahuja@vision.ai.uiuc.edu \n\nAbstract \n\nA novel learning approach for human face detection using a network of linear units is presented. The SNoW learning architecture is a sparse network of linear functions over a predefined or incrementally learned feature space and is specifically tailored for learning in the presence of a very large number of features. A wide range of face images in different poses, with different expressions and under different lighting conditions, is used as a training set to capture the variations of human faces. Experimental results on commonly used benchmark data sets of a wide range of face images show that the SNoW-based approach outperforms methods that use neural networks, Bayesian methods, support vector machines and others. Furthermore, learning and evaluation using the SNoW-based method are significantly more efficient than with other methods. \n\n1 Introduction \nGrowing interest in intelligent human-computer interaction has motivated a recent surge in research on problems such as face tracking, pose estimation, facial expression recognition and gesture recognition. Most methods, however, assume that the human faces in their input images have already been detected and localized. \n\nGiven a single image or a sequence of images, the goal of face detection is to identify and locate human faces regardless of their positions, scales, orientations, poses and illumination. 
To support automated solutions for the above applications, this has to be done efficiently and robustly. The challenge in building an efficient and robust system for this problem stems from the fact that human faces are highly non-rigid objects with a high degree of variability in size, shape, color and texture. \n\nNumerous intensity-based methods have been proposed recently to detect human faces in a single image or a sequence of images. Sung and Poggio [24] report an example-based learning approach for locating vertical frontal views of human faces. They use a number of Gaussian clusters to model the distributions of face and non-face patterns. A small window is moved over an image to determine whether a face exists, using the estimated distributions. In [16], a detection algorithm is proposed that combines template matching and a feature-based detection method using hierarchical Markov random fields (MRF) and maximum a posteriori probability (MAP) estimation. Colmenarez and Huang [4] apply Kullback relative information for maximal discrimination between positive and negative examples of faces. They use a family of discrete Markov processes to model faces and background patterns and estimate the density functions. Detection of a face is based on the likelihood ratio computed during training. Moghaddam and Pentland [12] propose a probabilistic method that is based on density estimation in a high-dimensional space using an eigenspace decomposition. In [20], Rowley et al. use an ensemble of neural networks to learn face and non-face patterns for face detection. Schneiderman and Kanade describe a probabilistic method based on local appearance and principal component analysis [23]. Their method gives some preliminary results on profile face detection. 
\nFinally, hidden Markov models [17], higher-order statistics [17], and support vector machines (SVM) [14] have also been applied to face detection and have demonstrated some success in detecting upright frontal faces under certain lighting conditions. \n\nIn this paper, we present a face detection method that uses the SNoW learning architecture [18, 3] to detect faces with different features and expressions, in different poses, and under different lighting conditions. SNoW (Sparse Network of Winnows) is a sparse network of linear functions that utilizes the Winnow update rule [10]. SNoW is specifically tailored for learning in domains in which the potential number of features taking part in decisions is very large but may be unknown a priori. Some of the characteristics of this learning architecture are its sparsely connected units, the allocation of features and links in a data-driven way, the decision mechanism and the utilization of an efficient update rule. SNoW has been used successfully on a variety of large-scale learning tasks in the natural language domain [18, 13, 5, 19], and this is its first use in the visual processing domain. \n\nIn training the SNoW-based face detector, we use a set of 1,681 face images from the Olivetti [22], UMIST [6], Harvard [7], Yale [1] and FERET [15] databases to capture the variations in face patterns. In order to compare our approach with other methods, our experiments involve two benchmark data sets [20, 24] that have been used in other works on face detection. 
The experimental results on these benchmark data sets (which consist of 225 images with 619 faces) show that our method outperforms all other methods evaluated on this problem, including those using neural networks [20], Kullback relative information [4], naive Bayes [23] and support vector machines [14], while being significantly more efficient computationally. Along with these experimental results, we describe further experiments that provide insight into some of the theoretical and practical considerations of SNoW-based learning systems. In particular, we study the effect of learning with primitive as well as with multi-scale features, and discuss some of the sources of the success of the approach. \n2 The SNoW System \nThe SNoW (Sparse Network of Winnows) learning architecture is a sparse network of linear units over a common predefined or incrementally learned feature space. Nodes in the input layer of the network represent simple relations over the input and are used as the input features. Each linear unit is called a target node and represents relations that are of interest over the input examples; in the current application, only two target nodes are used, one representing a face pattern and the other a non-face pattern. Given a set of relations (i.e., types of features) that may be of interest in the input image, each input image is mapped into a set of features that are active (present) in it; this representation is presented to the input layer of SNoW and propagates to the target nodes. (Features may take either binary values, indicating only that the feature is active (present), or real values, reflecting its strength; in the current application, all features are binary. See Sec. 3.1.) 
Target nodes are linked via weighted edges to (some of the) input features. Let $\\mathcal{A}_t = \\{i_1, \\ldots, i_m\\}$ be the set of features that are active in an example and are linked to the target node $t$. Then the linear unit is active if and only if $\\sum_{i \\in \\mathcal{A}_t} w_i^t > \\theta_t$, where $w_i^t$ is the weight on the edge connecting the $i$th feature to the target node $t$, and $\\theta_t$ is its threshold. \nIn the current application a single SNoW unit, which includes two subnetworks, one for each of the targets, is used. A given example is treated autonomously by each target subnetwork; that is, an image labeled as a face is used as a positive example for the face target and as a negative example for the non-face target, and vice versa. \n\nThe learning policy is on-line and mistake-driven; several update rules can be used within SNoW. The most successful update rule, and the only one used in this work, is a variant of Littlestone's Winnow update rule, a multiplicative update rule tailored to the situation in which the set of input features is not known a priori, as in the infinite attribute model [2]. This mechanism is implemented via the sparse architecture of SNoW. That is, (1) input features are allocated in a data-driven way: an input node for the feature $i$ is allocated only if the feature $i$ is active in the input image, and (2) a link (i.e., a non-zero weight) exists between a target node $t$ and a feature $i$ if and only if $i$ has been active in an image labeled $t$. Thus, the architecture also supports augmenting the feature types at later stages or from external sources in a flexible way, an option we do not use in the current work. 
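As a minimal sketch of the activation rule and the data-driven link allocation just described, the Python class below keeps one target node's links in a dictionary, so a weight exists only for features that have been active in an example labeled with that target. The class name, the initial link weight of 1.0 and the toy feature indices are illustrative assumptions, not details of the SNoW implementation [3].

```python
class TargetNode:
    # Hypothetical sketch of one SNoW target node t: a linear threshold unit
    # whose links are allocated lazily (sparse, data-driven architecture).
    def __init__(self, threshold, initial_weight=1.0):
        self.threshold = threshold            # theta_t
        self.initial_weight = initial_weight  # assumed value for a new link
        self.weights = {}                     # feature index i -> w_i^t

    def allocate(self, active_features):
        # Link every active feature of an example labeled t that is not yet linked.
        for i in active_features:
            self.weights.setdefault(i, self.initial_weight)

    def score(self, active_features):
        # Sum over A_t: features that are active in the example AND linked to t.
        return sum(self.weights[i] for i in active_features if i in self.weights)

    def is_active(self, active_features):
        # The unit fires iff the weighted sum exceeds its threshold theta_t.
        return self.score(active_features) > self.threshold


# Toy usage with made-up feature indices.
face = TargetNode(threshold=2.0)
face.allocate({3, 7, 9})            # an example labeled 'face'
print(face.score({3, 7, 42}))       # prints 2.0: feature 42 was never linked
print(face.is_active({3, 7, 9}))    # prints True: 3.0 > 2.0
```

Storing only the links that were actually created keeps the cost of evaluating an example proportional to its number of active features, which is the efficiency argument made for the sparse architecture.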
\nThe Winnow update rule has, in addition to the threshold $\\theta_t$ at the target $t$, two update parameters: a promotion parameter $\\alpha > 1$ and a demotion parameter $0 < \\beta < 1$. These are used to update the current representation of the target $t$ (the set of weights $w_i^t$) only when a mistake in prediction is made. Let $\\mathcal{A}_t = \\{i_1, \\ldots, i_m\\}$ be the set of active features that are linked to the target node $t$. If the algorithm predicts 0 (that is, $\\sum_{i \\in \\mathcal{A}_t} w_i^t \\le \\theta_t$) and the received label is 1, the active weights in the current example are promoted in a multiplicative fashion: $\\forall i \\in \\mathcal{A}_t,\\ w_i^t \\leftarrow \\alpha \\cdot w_i^t$. If the algorithm predicts 1 ($\\sum_{i \\in \\mathcal{A}_t} w_i^t > \\theta_t$) and the received label is 0, the active weights in the current example are demoted: $\\forall i \\in \\mathcal{A}_t,\\ w_i^t \\leftarrow \\beta \\cdot w_i^t$. All other weights are unchanged. The key property of the Winnow update rule is that the number of examples¹ it requires to learn a linear function grows linearly with the number of relevant features and only logarithmically with the total number of features. This property seems crucial in domains in which the number of potential features is vast but a relatively small number of them is relevant (this does not mean that only a small number of them will be active, or have non-zero weights). Winnow is known to learn any linear threshold function efficiently and to be robust in the presence of various kinds of noise and in cases where no linear threshold function can make a perfect classification, while still maintaining its above-mentioned dependence on the number of total and relevant attributes [11, 9]. Once the target subnetworks have been learned and the network is being evaluated, a winner-take-all mechanism selects the dominant active target node in the SNoW unit to produce a final prediction. 
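The promotion/demotion rule and the winner-take-all decision can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the SNoW code: the parameter values (alpha = 1.5, beta = 0.5, threshold 4.0), the initial link weight of 1.0, creating links before predicting, and the toy feature sets are all hypothetical choices.

```python
class WinnowTarget:
    # Hypothetical sketch of one target subnetwork trained with Winnow.
    def __init__(self, threshold, alpha=1.5, beta=0.5, initial_weight=1.0):
        if not (alpha > 1 and 0 < beta < 1):
            raise ValueError('need alpha > 1 and 0 < beta < 1')
        self.threshold = threshold
        self.alpha = alpha
        self.beta = beta
        self.initial_weight = initial_weight
        self.weights = {}  # sparse: feature index -> weight

    def score(self, active):
        return sum(self.weights[i] for i in active if i in self.weights)

    def predict(self, active):
        return 1 if self.score(active) > self.threshold else 0

    def update(self, active, label):
        # Assumption: links are created (before predicting) only for features
        # active in an example whose label is this target.
        if label == 1:
            for i in active:
                self.weights.setdefault(i, self.initial_weight)
        linked = [i for i in active if i in self.weights]  # the set A_t
        prediction = self.predict(active)
        if prediction == 0 and label == 1:      # mistake: promote
            for i in linked:
                self.weights[i] *= self.alpha
        elif prediction == 1 and label == 0:    # mistake: demote
            for i in linked:
                self.weights[i] *= self.beta
        # Correct predictions leave every weight unchanged (mistake-driven).


class FaceNonFaceSNoW:
    # Two target subnetworks; each example is positive for one, negative for the other.
    def __init__(self, threshold):
        self.targets = {'face': WinnowTarget(threshold),
                        'non-face': WinnowTarget(threshold)}

    def train(self, active, is_face):
        self.targets['face'].update(active, 1 if is_face else 0)
        self.targets['non-face'].update(active, 0 if is_face else 1)

    def classify(self, active):
        # Winner-take-all: the target node with the largest activation wins.
        return max(self.targets, key=lambda name: self.targets[name].score(active))


# Toy run with made-up sparse feature sets.
faces = [{0, 1, 2}, {0, 1, 3}, {1, 2, 3}]
non_faces = [{10, 11, 12}, {10, 11, 13}, {11, 12, 13}]
net = FaceNonFaceSNoW(threshold=4.0)
for epoch in range(5):
    for f, n in zip(faces, non_faces):
        net.train(f, True)
        net.train(n, False)
print(net.classify({0, 1, 2}), net.classify({10, 12, 13}))  # prints: face non-face
```

Because a face example is a negative example for the non-face target (and vice versa), each subnetwork is trained independently on the same stream, and only the final decision couples them.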
\nIn general, but not in this work, a unit's output may be cached and processed along with the output of other SNoW units to produce a coherent output. \n3 Learning to detect faces \nFor training, we use a set of 1,681 face images (collected from the Olivetti [22], UMIST [6], Harvard [7], Yale [1] and FERET [15] databases) that have wide variations in pose, facial expression and lighting conditions. For negative examples we start with 8,422 non-face examples from 400 images of landscapes, trees, buildings, etc. Because it is extremely difficult to collect a representative set of non-face examples, the bootstrap method [24] is used to include more non-face examples during training. For positive examples, each face sample is manually cropped and normalized such that it is aligned vertically and its size is 20 × 20 pixels. To make the detection method less sensitive to scale and rotation variation, 10 face examples are generated from each original sample by randomly rotating it by up to 15 degrees and scaling it by between 80% and 120%. This produces 16,810 face samples. Then, histogram equalization is performed, which maps the intensity values so as to expand the range of intensities. The same procedure is applied to input images in the detection phase. \n\n¹ In the on-line setting [10] this is usually phrased in terms of a mistake bound, but it is known to imply convergence in the PAC sense [25, 8]. \n\n3.1 Primitive Features \nThe SNoW-based face detector makes use of Boolean features that encode the positions and intensity values of pixels. Let the pixel at $(x, y)$ of an image with width $w$ and height $h$ have intensity value $I(x, y)$ ($0 \\le I(x, y) \\le 255$). This information is encoded as a feature whose index is $256(y \\cdot w + x) + I(x, y)$. 
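To make the preprocessing and this index formula concrete, the sketch below equalizes a 20 × 20 sample and lists its active primitive features in pure Python. The equalization shown is the textbook cumulative-histogram mapping, which is an assumption, since the exact variant is not specified here; the function names and the synthetic sample are hypothetical.

```python
def equalize_histogram(img):
    # img: list of rows of integer intensities in [0, 255].
    # Textbook CDF-based histogram equalization (assumed variant).
    pixels = [v for row in img for v in row]
    n = len(pixels)
    hist = [0] * 256
    for v in pixels:
        hist[v] += 1
    cdf, running = [0] * 256, 0
    for v in range(256):
        running += hist[v]
        cdf[v] = running
    cdf_min = next(c for c in cdf if c > 0)
    if cdf_min == n:  # constant image: no range to expand
        return [row[:] for row in img]
    lut = [max(0, round((cdf[v] - cdf_min) * 255 / (n - cdf_min))) for v in range(256)]
    return [[lut[v] for v in row] for row in img]


def primitive_features(img):
    # Indices of the active Boolean features: 256 * (y * w + x) + I(x, y).
    h, w = len(img), len(img[0])
    return {256 * (y * w + x) + img[y][x] for y in range(h) for x in range(w)}


# A synthetic 20 x 20 sample (made-up intensities).
sample = [[(7 * x + 13 * y) % 256 for x in range(20)] for y in range(20)]
features = primitive_features(equalize_histogram(sample))
print(len(features), max(features) < 400 * 256)  # prints: 400 True
```

Each pixel position owns its own block of 256 consecutive indices, so exactly one feature per position is active and no two positions can collide.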
This representation ensures that different points in the (position × intensity) space are mapped to different features. (That is, the feature indexed $256(y \\cdot w + x) + I(x, y)$ is active if and only if the intensity at position $(x, y)$ is $I(x, y)$.) In our experiments, the values of $w$ and $h$ are 20, since each face sample has been normalized to an image of 20 × 20 pixels. Note that although the number of potential features in our representation is 102,400 (400 × 256), only 400 of those are active (present) in each example, and it is plausible that many features will never be active. Since the algorithm's complexity depends on the number of active features in an example rather than on the total number of features, the sparseness also ensures efficiency. \n\n3.2 Multi-scale Features \nMany vision problems have utilized multi-scale features to capture the structures of an object. However, extracting detailed multi-scale features using edge or region information from segmentation is a computationally expensive task. Here we use the SNoW paradigm to extract Boolean features that represent multi-scale information. This is done in a similar way to the (position × intensity) encoding used in Sec. 3.1, except that in this case we encode, in addition to position, the mean and variance of a multi-scale pixel. The hope is that the multi-scale features will capture information that would otherwise require many pixel-based features to represent, and thus simplify the learning problem. Uninformative multi-scale features will be quickly assigned low weights by the learning algorithm and will not degrade performance. 
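The sub-image tiling behind these multi-scale features, whose scales, mean/variance encoding and discretization are spelled out in the rest of this section, can be sketched as follows. Several details are assumptions made only for illustration: non-overlapping square tiles, equal-width bins (whereas the discretization described here is finer near the mean), 10 bins each for mean and variance, and the particular index layout. The sketch only aims to reproduce the 400 + 100 + 25 + 4 active-feature count for a 20 × 20 sample.

```python
SCALES = (1, 2, 4, 10)  # sub-image sizes in pixels for a 20 x 20 sample


def block_stats(img, y0, x0, s):
    # Mean and (population) variance of the s x s sub-image at (x0, y0).
    vals = [img[y][x] for y in range(y0, y0 + s) for x in range(x0, x0 + s)]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var


def uniform_bin(value, lo, hi, n_bins):
    # Simplification: equal-width bins (the text uses finer bins near the mean).
    if value <= lo:
        return 0
    if value >= hi:
        return n_bins - 1
    return int((value - lo) / (hi - lo) * n_bins)


def multiscale_features(img, n_mean=10, n_var=10):
    # Hypothetical index layout: each scale gets its own block of indices,
    # and within it one feature per (position, mean bin, variance bin).
    size = len(img)  # assumes a square sample whose side is divisible by each scale
    active, offset = set(), 0
    for s in SCALES:
        per_side = size // s
        for by in range(per_side):
            for bx in range(per_side):
                mean, var = block_stats(img, by * s, bx * s, s)
                m = uniform_bin(mean, 0.0, 256.0, n_mean)
                v = uniform_bin(var, 0.0, 128.0 ** 2, n_var)
                pos = by * per_side + bx
                active.add(offset + (pos * n_mean + m) * n_var + v)
        offset += per_side * per_side * n_mean * n_var
    return active


sample = [[(7 * x + 13 * y) % 256 for x in range(20)] for y in range(20)]
print(len(multiscale_features(sample)))  # prints 529 (= 400 + 100 + 25 + 4)
```

Giving each scale a disjoint index range mirrors the position-keyed layout of Sec. 3.1, so exactly one feature per sub-image is active regardless of the binning chosen.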
Since each face sample is normalized to be a rectangular image of the same size, it suffices to consider rectangular sub-images of varying size within the face samples, and for each to generate features in terms of the mean and variance of its intensity values. Empirical results show that faces can be described effectively this way. \nInstead of using the absolute values of the mean and variance when encoding the features, we discretize these values into a predefined number of classes. Since the mean values and the variance values are each normally distributed, the discretization is finer near the means of these distributions. The total number of classes was determined empirically to be 100, of which 80 lie near the mean. Given that, we use the same scheme as in Sec. 3.1 to map the (position × intensity mean × intensity variance) space into the Boolean feature space. This is done separately for four different sub-image scales: 1 × 1, 2 × 2, 4 × 4 and 10 × 10 pixels. The multi-scale feature vector consists of the active features corresponding to all these scales. The number of active features in each example is therefore 400 + 100 + 25 + 4 = 529, although the total number of features is much larger. \nIn recent work we have used more sophisticated conjunctive features for this purpose, yielding even better results. However, the emphasis here is that with the SNoW approach, even very simplistic features support excellent performance. \n4 Empirical Results \nWe tested the SNoW-based approach with both sets of features on the two sets of images collected by Rowley [20] and Sung [24]. Each image is scanned with a rectangular window to determine whether a face exists in the window or not. 
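A minimal sketch of this window scan, combined with the image pyramid described in the next sentence (subsampling by a factor of 1.2, 10 scans), is shown below. The nearest-neighbour resampling, the scan step, the reading of 10 iterations as 10 pyramid levels, and the classify callback (standing in for the trained detector) are all assumptions for illustration.

```python
def downsample(img, factor):
    # Nearest-neighbour subsampling by 1/factor (assumed interpolation scheme).
    h, w = len(img), len(img[0])
    nh, nw = int(h / factor), int(w / factor)
    return [[img[min(h - 1, int(y * factor))][min(w - 1, int(x * factor))]
             for x in range(nw)] for y in range(nh)]


def scan_pyramid(img, classify, win=20, factor=1.2, levels=10, step=1):
    # Slide a win x win window over each pyramid level; report detections as
    # (x, y, size) in the coordinates of the original image.
    detections = []
    scale = 1.0
    level_img = img
    for _ in range(levels):
        h = len(level_img)
        w = len(level_img[0]) if h else 0
        if h < win or w < win:
            break  # the image is now smaller than the detection window
        for y in range(0, h - win + 1, step):
            for x in range(0, w - win + 1, step):
                window = [row[x:x + win] for row in level_img[y:y + win]]
                if classify(window):
                    detections.append((int(x * scale), int(y * scale), int(win * scale)))
        level_img = downsample(level_img, factor)
        scale *= factor
    return detections


# Toy check with a detector that accepts every window of a blank 24 x 24 image.
blank = [[0] * 24 for _ in range(24)]
hits = scan_pyramid(blank, classify=lambda window: True, step=4)
print(len(hits))  # 4 windows at full size + 1 after subsampling to 20 x 20
```

Keeping the window fixed at the 20 × 20 training size and shrinking the image instead lets one detector find faces larger than 20 pixels; the scale factor maps each hit back to the original image.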
To detect faces of different scales, each input image is repeatedly subsampled by a factor of 1.2 and scanned through for 10 iterations. Table 1 shows the reported experimental results of the SNoW-based face detectors and several other face detection systems on the two benchmark data sets (available at http://www.cs.cmu.edu/~har/faces.html). The first data set consists of 130 images with 507 frontal faces, and the second consists of 23 images with 155 frontal faces. There are a few hand-drawn faces and cartoon faces in both sets. Since some methods use intensity values as their features, systems 1-4 and 7 discard these hand-drawn and cartoon faces. Therefore, there are 125 images with 483 faces in test set 1 and 20 images with 136 faces in test set 2, respectively. The reported detection rate is computed as the ratio between the number of faces detected in the images by the system and the number of faces identified there by humans. The number of false detections is the number of non-faces detected as faces. \nIt is difficult to evaluate the performance of different methods even though they use the same benchmark data sets, because different criteria (e.g., training time, number of training examples involved, execution time, number of scanned windows in detection) can be applied to favor one over another. Also, one can tune the parameters of one's method to increase the detection rates at the cost of more false detections. The neural network [20], distribution-based [24], Kullback relative information [4] and naive Bayes [23] methods report several experimental results based on different sets of parameters. Table 1 summarizes the best detection rates and the corresponding false detections of these methods. 
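As a quick illustration of the detection-rate definition above, the snippet below converts the SNoW rates on test set 1 (483 human-labeled faces) back into approximate counts of detected faces and recomputes the rates. The counts are obtained only by rounding the reported percentages; they are not figures reported by the experiments.

```python
def detection_rate(detected_faces, labeled_faces):
    # Faces found by the detector divided by faces identified by humans.
    return detected_faces / labeled_faces


TEST_SET_1_FACES = 483  # after discarding hand-drawn and cartoon faces

for name, reported in [('SNoW, primitive', 0.942), ('SNoW, multi-scale', 0.948)]:
    implied = round(reported * TEST_SET_1_FACES)  # approximate count, not reported data
    rate = detection_rate(implied, TEST_SET_1_FACES)
    print(f'{name}: about {implied} of {TEST_SET_1_FACES} faces -> {rate:.1%}')
```

False detections, by contrast, are reported as raw counts rather than as a rate, so they are not normalized by the number of scanned windows.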
Although the method in [4] has the highest detection rate on one benchmark test, this was achieved by significantly increasing the number of false detections. Other than that, it is evident that the SNoW-based face detectors outperform the others in terms of overall performance. These results show the credibility of SNoW for these tasks and exhibit the improvement achieved by increasing the expressiveness of the features. This may indicate that further elaboration of the features, which can be done in a very general and flexible way within SNoW, would yield further improvements. \n\nTable 1: Experimental results on images from test set 1 (125 images with 483 faces) in [20] and test set 2 (20 images with 136 faces) in [24] (see text for details) \n\nMethod | Test Set 1 Detect Rate | Test Set 1 False Detects | Test Set 2 Detect Rate | Test Set 2 False Detects \nSNoW w/ primitive features | 94.2% | 84 | 93.6% | 3 \nSNoW w/ multi-scale features | 94.8% | 78 | 94.1% | 3 \nMixture of factor analyzers [26] | 92.3% | 82 | 89.4% | 3 \nFisher linear discriminant [27] | 93.6% | 74 | 91.5% | 1 \nDistribution-based [24] | N/A | N/A | 81.9% | 13 \nNeural network [20] | 92.5% | 862 | 90.3% | 42 \nNaive Bayes [23] | 93.0% | 88 | 91.2% | 12 \nKullback relative information [4] | 98.0% | 12,758 | N/A | N/A \nSupport vector machine [14] | N/A | N/A | 74.2% | 20 \n\nIn addition to comparing feature sets, we started to investigate some of the reasons for the success of SNoW in this domain, which we discuss briefly below. Two potential contributions are the Winnow update rule and the architecture. First, we studied the update rule in isolation, independent of the SNoW architecture. 
The results we obtained when using Winnow simply as a discriminator were fairly poor (63.9% and 65.3% on Test Set 1 with primitive and multi-scale features, respectively, and similar results on Test Set 2). These results are not surprising, given that Winnow is used here only as a discriminator and uses only positive weights. Investigating the architecture in isolation reveals that weighting or discarding features based on their contribution to mistakes during training, as is done within SNoW, is crucial. Considering the active features uniformly (separately for faces and non-faces) yields poor results. Specifically, studying the resulting SNoW network shows that the total number of features that were active with non-faces is 102,208, out of 102,400 possible (primitive) features. The total number of active features in faces was only 82,608, most of which are active only a few times. In retrospect, this is clear given the diverse set of images used as negative examples, relative to the somewhat restricted (by nature) set of images that constitute faces. (A similar phenomenon occurs with the multi-scale features, where the numbers are 121,572 and 90,528, respectively, out of 135,424.) Overall, this indicates that the architecture, the learning regime and the update rule all contribute significantly to the success of the approach. \n\nFigure 1 shows some faces detected in our experiments. Note that profile faces and faces under heavy illumination are detected; experimental results show that our method detects profile faces and faces under different illumination very well. Although several detections may exist around each face, only one window is drawn to enclose each detected face, for clarity of presentation. \n\n
Figure 1: Sample experimental results using our method on images from two benchmark data sets. Every detected face is shown with an enclosing window. \n\n5 Discussion and Conclusion \nMany theoretical and experimental issues remain to be addressed before a learning system of this sort can be used to detect faces efficiently and robustly under general conditions. In terms of the face detection problem, the presented method is still not able to detect rotated faces. A recent method [21] addresses this problem by building upon an upright face detector [20] and rotating each test sample to an upright position. However, it suffers from degraded detection rates and more false detections. Given our results, we believe that the SNoW approach, if adapted in similar ways, would generalize very well to detect faces under more general conditions. \n\nIn terms of the SNoW architecture, although its main ingredients are understood theoretically, more work is required to better understand its strengths. This is increasingly interesting given that the architecture has been found to perform very well in large-scale problems in the natural language domain as well. \n\nThe contributions of this paper can be summarized as follows. We have introduced the SNoW learning architecture to the domain of visual processing and described an approach that detects faces regardless of their poses, facial features and illumination conditions. Experimental results show that this method outperforms other methods in terms of detection rates and false detections, while being more efficient both in learning and evaluation. \nReferences \n[1] P. Belhumeur, J. Hespanha, and D. Kriegman. Eigenfaces vs. fisherfaces: Recognition using class specific linear projection. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):711-720, 1997. \n\n[2] A. Blum. Learning Boolean functions in an infinite attribute space. Machine Learning, 9(4):373-386, 1992. \n\n[3] A. Carlson, C. Cumby, J. Rosen, and D. Roth. The SNoW learning architecture. Technical Report UIUCDCS-R-99-2101, UIUC Computer Science Department, May 1999. \n\n[4] A. J. Colmenarez and T. S. Huang. Face detection with information-based maximum discrimination. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 782-787, 1997. \n\n[5] A. R. Golding and D. Roth. A Winnow-based approach to context-sensitive spelling correction. Machine Learning, 34:107-130, 1999. Special Issue on Machine Learning and Natural Language. \n\n[6] D. B. Graham and N. M. Allinson. Characterizing virtual eigensignatures for general purpose face recognition. In H. Wechsler, P. J. Phillips, V. Bruce, F. Fogelman-Soulie, and T. S. Huang, editors, Face Recognition: From Theory to Applications, volume 163 of NATO ASI Series F, Computer and Systems Sciences, pages 446-456. Springer, 1998. \n\n[7] P. Hallinan. A Deformable Model for Face Recognition Under Arbitrary Lighting Conditions. PhD thesis, Harvard University, 1995. \n\n[8] D. Helmbold and M. K. Warmuth. On weak learning. Journal of Computer and System Sciences, 50(3):551-573, June 1995. \n\n[9] J. Kivinen and M. K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. In Proceedings of the Annual ACM Symposium on the Theory of Computing, 1995. \n\n[10] N. Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2:285-318, 1988. \n\n[11] N. Littlestone. Redundant noisy attributes, attribute errors, and linear threshold learning using Winnow. In Proceedings of the Fourth Annual Workshop on Computational Learning Theory, pages 147-156, 1991. \n\n[12] B. Moghaddam and A. Pentland. Probabilistic visual learning for object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):696-710, 1997. \n\n[13] M. Munoz, V. Punyakanok, D. Roth, and D. Zimak. A learning approach to shallow parsing. In EMNLP-VLC'99, the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, June 1999. \n\n[14] E. Osuna, R. Freund, and F. Girosi. Training support vector machines: an application to face detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 130-136, 1997. \n\n[15] P. J. Phillips, H. Moon, S. Rizvi, and P. Rauss. The FERET evaluation. In H. Wechsler, P. J. Phillips, V. Bruce, F. Fogelman-Soulie, and T. S. Huang, editors, Face Recognition: From Theory to Applications, volume 163 of NATO ASI Series F, Computer and Systems Sciences, pages 244-261. Springer, 1998. \n\n[16] R. J. Qian and T. S. Huang. Object detection using hierarchical MRF and MAP estimation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 186-192, 1997. \n\n[17] A. N. Rajagopalan, K. S. Kumar, J. Karlekar, R. Manivasakan, and M. M. Patil. Finding faces in photographs. In Proceedings of the Sixth International Conference on Computer Vision, pages 640-645, 1998. \n\n[18] D. Roth. Learning to resolve natural language ambiguities: A unified approach. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, pages 806-813, 1998. \n\n[19] D. Roth and D. Zelenko. Part of speech tagging using a network of linear separators. In COLING-ACL 98, The 17th Int. Conference on Computational Linguistics, pages 1136-1142, 1998. \n\n[20] H. Rowley, S. Baluja, and T. Kanade. Neural network-based face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1):23-38, 1998. \n\n[21] H. Rowley, S. Baluja, and T. Kanade. Rotation invariant neural network-based face detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 38-44, 1998. \n\n[22] F. S. Samaria. Face Recognition Using Hidden Markov Models. PhD thesis, University of Cambridge, 1994. \n\n[23] H. Schneiderman and T. Kanade. Probabilistic modeling of local appearance and spatial relationships for object recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 45-51, 1998. \n\n[24] K.-K. Sung and T. Poggio. Example-based learning for view-based human face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1):39-51, 1998. \n\n[25] L. G. Valiant. A theory of the learnable. Commun. ACM, 27(11):1134-1142, Nov. 1984. \n\n[26] M.-H. Yang, N. Ahuja, and D. Kriegman. Face detection using a mixture of factor analyzers. In Proceedings of the IEEE International Conference on Image Processing, 1999. \n\n[27] M.-H. Yang, N. Ahuja, and D. Kriegman. Mixtures of linear subspaces for face detection. In Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, 2000.
\n\n\f", "award": [], "sourceid": 1747, "authors": [{"given_name": "Ming-Hsuan", "family_name": "Yang", "institution": null}, {"given_name": "Dan", "family_name": "Roth", "institution": null}, {"given_name": "Narendra", "family_name": "Ahuja", "institution": null}]}