{"title": "Performance Comparisons Between Backpropagation Networks and Classification Trees on Three Real-World Applications", "book": "Advances in Neural Information Processing Systems", "page_first": 622, "page_last": 629, "abstract": null, "full_text": "622 \n\nAtlas, Cole, Connor, El-Sharkawi, Marks, Muthusamy and Barnard \n\nPerformance Comparisons Between Backpropagation Networks and Classification Trees on Three Real-World Applications \n\nLes Atlas \nDept. of EE, FT-10 \nUniversity of Washington \nSeattle, Washington 98195 \n\nRonald Cole \nDept. of CS&E \nOregon Graduate Institute \nBeaverton, Oregon 97006 \n\nJerome Connor, Mohamed El-Sharkawi, and Robert J. Marks II \nUniversity of Washington \n\nYeshwant Muthusamy \nOregon Graduate Institute \n\nEtienne Barnard \nCarnegie-Mellon University \n\nABSTRACT \n\nMulti-layer perceptrons and trained classification trees are two very different techniques which have recently become popular. Given enough data and time, both methods are capable of performing arbitrary non-linear classification. We first consider the important differences between multi-layer perceptrons and classification trees and conclude that there is not enough theoretical basis for the clear-cut superiority of one technique over the other. For this reason, we performed a number of empirical tests on three real-world problems in power system load forecasting, power system security prediction, and speaker-independent vowel identification. In all cases, even for piecewise-linear trees, the multi-layer perceptron performed as well as or better than the trained classification trees. \n\n\fPerformance Comparisons \n\n623 \n\n1 INTRODUCTION \nIn this paper we compare regression and classification systems. 
A regression system can generate an output f for an input X, where both X and f are continuous and, perhaps, multi-dimensional. A classification system can generate an output class, C, for an input X, where X is continuous and multi-dimensional and C is a member of a finite alphabet. \nThe statistical technique of Classification And Regression Trees (CART) was developed during the years 1973 (Meisel and Michalopoulos) through 1984 (Breiman et al.). As we show in the next section, CART, like the multi-layer perceptron (MLP), can be trained to solve the exclusive-OR problem. Furthermore, the solution it provides is extremely easy to interpret. Moreover, both CART and MLPs are able to provide arbitrary piecewise linear decision boundaries. Although there have been no links made between CART and biological neural networks, the possible applications and paradigms used for MLP and CART are very similar. \nThe authors of this paper represent diverse interests in problems which have the commonality of being both important and potentially well-suited for trainable classifiers. The load forecasting problem, which is partially a regression problem, uses past load trends to predict the critical needs of future power generation. The power security problem uses the classifier as an interpolator of previously known states of the system. The vowel recognition problem is representative of the difficulties in automatic speech recognition due to variability across speakers and phonetic context. \nIn each problem area, large amounts of real data were used for training and disjoint data sets were used for testing. 
We were careful to ensure that the experimental conditions were identical for the MLP and CART. We concentrated only on performance as measured in error on the test set and did not do any formal studies of training or testing time. (CART was, in general, quite a bit faster.) \nIn all cases, even with various sizes of training sets, the multi-layer perceptron performed as well as or better than the trained classification trees. We also believe that integration of many of CART's well-designed attributes into MLP architectures could only improve the already promising performance of MLPs. \n\n2 BACKGROUND \n\n2.1 Multi-Layer Perceptrons \nThe name \"artificial neural networks\" has in some communities become almost synonymous with MLPs trained by back-propagation. Our power studies made use of this standard algorithm (Rumelhart et al., 1986) and our vowel studies made use of a conjugate gradient version (Barnard and Casasent, 1989) of back-propagation. In all cases the training data consisted of ordered pairs {(X, f)} for regression, or {(X, C)} for classification. The input to the network is X and the output is, after training, hopefully very close to f or C. \nWhen MLPs are used for regression, the output, f, can take on real values between 0 and 1. This normalized scale was used as the prediction value in the power forecasting problem. For MLP classifiers the output is formed by taking the (0,1) range of the output neurons and either thresholding or finding a peak. For example, in the vowel study we chose the maximum of the 12 output neurons to indicate the vowel class. 
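The two ways of turning (0,1) network outputs into a class decision, thresholding and peak-picking, can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the 0.5 threshold, and the activation values are all assumptions made for the example.

```python
def threshold_decision(output, threshold=0.5):
    '''Binary decision from a single output neuron in the (0, 1) range.

    The 0.5 threshold is an assumed value, not one stated in the paper.
    '''
    return 1 if output >= threshold else 0


def peak_pick(outputs):
    '''Index of the largest output neuron (ties go to the lowest index).'''
    return max(range(len(outputs)), key=lambda i: outputs[i])


# Hypothetical activations of 12 output neurons for one vowel sample.
activations = [0.05, 0.10, 0.02, 0.71, 0.08, 0.01,
               0.33, 0.04, 0.12, 0.09, 0.03, 0.06]

print(peak_pick(activations))     # index of the winning vowel class
print(threshold_decision(0.83))   # single-output case, e.g. secure vs. insecure
```

Peak-picking suits the 12-class vowel task, where exactly one class must be chosen, while thresholding suits a single-output, two-class problem.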
\n\n2.2 Classification and Regression Trees (CART) \nCART has already proven to be useful in diverse applications such as radar signal classification, medical diagnosis, and mass spectra classification (Breiman et al., 1984). \nGiven a set of training examples {(X, C)}, a binary tree is constructed by sequentially partitioning the p-dimensional input space, which may consist of quantitative and/or qualitative data, into p-dimensional polygons. The trained classification tree divides the domain of the data into non-overlapping regions, each of which is assigned a class label C. For regression, the estimated function is piecewise constant over these regions. \nThe first split of the data space is made to obtain the best global separation of the classes. The next step in CART is to consider the partitioned training examples as two completely unrelated sets: those examples on the left of the selected hyper-plane, and those on the right. CART then proceeds as in the first step, treating each subset of the training examples independently. A question which had long plagued the use of such sequential schemes was: when should the splitting stop? CART implements a novel and very clever approach: splits continue until every training example is separated from every other, then a pruning criterion is used to sequentially remove less important splits. \n\n2.3 Relative Expectations of MLP and CART \nThe non-linearly separable exclusive-OR problem is an example of a problem which both MLP and CART can solve with zero error. The left side of Figure 1 shows a trained MLP solution to this problem and the right side shows the very simple trained CART solution. 
For the MLP the values along the arrows represent trained multiplicative weights and the values in the circles represent trained scalar offset values. For the CART figure, y and n represent yes or no answers to the trained thresholds and the values in the circles represent the output Y. It is interesting that CART did not train correctly for equal numbers of the four different input cases and that one extra example of one of the input cases was sufficient to break the symmetry and allow CART to train correctly. (Note the similarity to the well-known requirement of random and different initial weights for training the MLP.) \n\nFigure 1: The MLP and CART solutions to the exclusive-OR problem. \n\nCART trains on the exclusive-OR very easily since a piecewise-linear partition in the input space is a perfect solution. In general, the MLP will construct classification regions with smooth boundaries, whereas CART will construct regions with \"sharp\" corners (each region being, as described previously, an intersection of half planes). We would thus expect MLP to have an advantage when classification boundaries tend to be smooth and CART to have an advantage when they are sharper. \nOther important differences between MLP and CART include: \nFor an MLP the number of hidden units can be selected to avoid overfitting or underfitting the data. CART fits the complexity by using an automatic pruning technique to adjust the size of the tree. The selection of the number of hidden units or the tree size was implemented in our experiments by using data from a second training set (independent of the first). 
\nAn MLP becomes a classifier through an ad hoc application of thresholds or peak-picking to the output value(s). Great care has gone into the CART splitting rules while the usual MLP approach is rather arbitrary. \nA trained MLP represents an approximate solution to an optimization problem. The solution may depend on the initial choice of weights and on the optimization technique used. For complex MLPs many of the units are independently and simultaneously adjusting their weights to best minimize output error. \nMLP is a distributed topology where a single point in the input space can have an effect across all units or, analogously, one weight, acting alone, will have minimal effect on the outputs. CART is very different in that each split value can be mapped onto one segment in the input space. The behavior of CART makes it much more useful for data interpretation. A trained tree may be useful for understanding the structure of the data. The usefulness of MLPs for data interpretation is much less clear. \nThe above points, when taken in combination, do not make a clear case for either MLP or CART to be superior for the best performance as a trained classifier. We thus believe that the empirical studies of the next sections, with their consistent performance trends, will indicate which of the comparative aspects are the most significant. \n\n3 LOAD FORECASTING \n\n3.1 The Problem \nThe ability to predict electric power system loads from an hour to several days in the future can help a utility operator to efficiently schedule and utilize power generation. This ability to forecast loads can also provide information which can be used to strategically trade energy with other generating systems. 
In order for these forecasts to be useful to an operator, they must be accurate and computationally efficient. \n\n3.2 Methods \nHourly temperature and load data for the Seattle/Tacoma area were provided for us by the Puget Sound Power and Light Company. Since weekday forecasting is a more critical problem for the power industry than weekend forecasting, we selected the hourly data for all Tuesdays through Fridays in the interval of November 1, 1988 through January 31, 1989. These data consisted of 1368 hourly measurements covering the 57 days of data collected. \nThese data were presented to both the MLP and the CART classifier as a 6-dimensional input with a single, real-valued output. The MLP required that all values be normalized to the range (0,1). These same normalized values were used with the CART technique. Our training and testing process consisted of training the classifiers on 53 days of the data and testing on the 4 days left over at the end of January 1989. Our training set consisted of 1272 hourly measurements and our test set contained 96 hourly readings. \nThe MLP we used in these experiments had 6 inputs (plus the trained constant bias term), 10 units in one hidden layer, and one output. This topology was chosen by making use of data outside the training and test sets. \n\n3.3 Results \n\nWe used an l1 norm for the calculation of error rates and found that both techniques worked quite well. The average error rate for the MLP was 1.39% and CART gave 2.86% error. While this difference (given the number of testing points) is not statistically significant, 
it is worth noting that the trained MLP offers performance which is at least as good as the current techniques used by the Puget Sound Power and Light Company and is currently being verified for application to future load prediction. \n\n4 POWER SYSTEM SECURITY \nThe assessment of security in a power system is an ongoing problem for the efficient and reliable generation of electric power. Static security addresses whether, after a disturbance, such as a line break or other rapid load change, the system will reach a steady state operating condition that does not violate any operating constraint and cause a \"brown-out\" or \"black-out.\" \nThe most efficient generation of power is achieved when the power system is operating near its insecurity boundary. In fact, the ideal case for efficiency would be full knowledge of the absolute boundaries of the secure regions. Due to the complexity of the power systems, this full knowledge is impossible. Load flow algorithms, which are based on iterative solutions of nonlinearly constrained equations, are conventionally used to slowly and accurately determine points of security or insecurity. In real systems the trajectories through the regions are not predictable in fine detail. Also these changes can happen too fast to compute new results from the accurate load flow equations. \nWe thus propose to use the sparsely known solutions of the load flow equations as a training set. The test set consists of points of unknown security. The error of the test set can then be computed by comparing the result of the trained classifier to load flow equation solutions. 
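The test-set error described above is simply the fraction of test points on which the trained classifier disagrees with the load flow solution. A minimal sketch follows. The function name and the label lists are hypothetical, with 1 standing for secure and 0 for insecure.

```python
def error_rate(predicted, reference):
    '''Fraction of test points where the classifier disagrees with the reference.'''
    if len(predicted) != len(reference) or not reference:
        raise ValueError('need two non-empty label lists of equal length')
    wrong = sum(1 for p, r in zip(predicted, reference) if p != r)
    return wrong / len(reference)


# Hypothetical labels: 1 = secure, 0 = insecure.
load_flow_labels = [1, 1, 0, 1, 0, 0, 1, 1]    # reference from load flow solutions
classifier_labels = [1, 1, 0, 0, 0, 0, 1, 1]   # output of a trained classifier

print(error_rate(classifier_labels, load_flow_labels))
```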
\nOur technique for converting this problem to a problem for a trainable classifier involves defining a training set {(X, C)} where X is composed of real power, reactive power, and apparent power at another bus. This 3-dimensional input vector is paired with the corresponding security status (C=1 for secure and C=0 for insecure). Since the system was small, we were able to generate a large number of data points for training and testing. In fact, well over 20,000 total data points were available for the (disjoint) training and test sets. \n\n4.1 Results \nWe observed that for any choice of training data set size, the error rate for the MLP was always lower than the rate for the CART classifier. At 10,000 points of training data, the MLP had an error rate of 0.78% and CART had an error rate of 1.46%. While both of these results are impressive, the difference was statistically significant (p>.99). \nIn order to gain insight into the reasons for differences in performance, we looked at classifier decisions for 2-dimensional slices of the input space. While the CART boundary sometimes was a better match, certain pathological difficulties made CART more error-prone than the MLP. Our other studies also showed that there were worse interpolation characteristics for CART, especially for sparse data. Apparently, starting with nonlinear combinations of inputs, which is what the MLP does, is better for the accurate fit than the stair-steps of CART. 
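One common way to check whether two error rates differ significantly is a pooled two-proportion z-test, sketched below. The paper does not say which test was used, and it does not give the exact test-set size, so the value of N_TEST is purely an assumption for illustration. The sketch also treats the two classifiers as if they were evaluated on independent samples, which a paired test on a shared test set would not require.

```python
import math


def two_proportion_z(err_a, err_b, n_a, n_b):
    '''Pooled two-proportion z statistic for error rates err_a and err_b.'''
    pooled = (err_a * n_a + err_b * n_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    return (err_b - err_a) / std_err


def one_sided_confidence(z):
    '''Standard normal cumulative probability at z.'''
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# ASSUMPTION: the test-set size is not stated in the text; 10,000 is illustrative.
N_TEST = 10000
z = two_proportion_z(0.0078, 0.0146, N_TEST, N_TEST)  # MLP vs. CART error rates

print(round(z, 2), one_sided_confidence(z) > 0.99)
```

Under this assumed test-set size the confidence comfortably exceeds the .99 level quoted in the text. A much smaller test set would weaken that conclusion.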
\n\n5 SPEAKER-INDEPENDENT VOWEL CLASSIFICATION \nSpeaker-independent classification of vowels excised from continuous speech is a most difficult task because of the many sources of variability that influence the physical realization of a given vowel. These sources of variability include the length of the speaker's vocal tract, phonetic context in which the vowel occurs, speech rate and syllable stress. \nTo make the task even more difficult the classifiers were presented only with information from a single spectral slice. The spectral slice, represented by 64 DFT coefficients (0-4 kHz), was taken from the center of the vowel, where the effects of coarticulation with surrounding phonemes are least apparent. \nThe training and test sets for the experiments consisted of featural descriptions, X, paired with an associated class, C, for each vowel sample. The 12 monophthongal vowels of English were used for the classes, as heard in the following words: beat, bit, bet, bat, roses, the, but, boot, book, bought, cot, bird. The vowels were excised from the wide variety of phonetic contexts in utterances of the TIMIT database, a standard acoustic phonetic corpus of continuous speech, displaying a wide range of American dialectal variation (Fisher et al., 1986) (Lamel et al., 1986). The training set consisted of 4104 vowels from 320 speakers. The test set consisted of 1644 vowels (137 occurrences of each vowel) from a different set of 100 speakers. \nThe MLP consisted of 64 inputs (the DFT coefficients, each normalized between zero and one), a single hidden layer of 40 units, and 12 output units, one for each vowel category. 
The networks were trained using backpropagation with conjugate gradient optimization (Barnard and Casasent, 1989). The procedure for training and testing a network proceeded as follows: The network was trained on 100 iterations through the 4104 training vectors. The trained network was then evaluated on the training set and a different set of 1644 test vectors (the test set). The network was then trained for an additional 100 iterations and again evaluated on the training and test sets. This process was continued until the network had converged; convergence was observed as a consistent decrease or leveling off of the classification percentage on the test data over successive sets of 100 iterations. \nThe CART system was trained using two separate computer routines. One was the CART program from California Statistical Software; the other was a routine we designed ourselves. We produced our own routine to ensure a careful and independent test of the CART concepts described in (Breiman et al., 1984). \n\n5.1 Results \nIn order to better understand the results, we performed listening experiments on a subset of the vowels used in these experiments. The vowels were excised from their sentence context and presented in isolation. Five listeners first received training in the task by classifying 900 vowel tokens and receiving feedback about the correct answer on each trial. During testing, each listener classified 600 vowels from the test set (50 from each category) without feedback. The average classification performance on the test set was 51%, compared to chance performance of 8.3%. 
Details of this experiment are presented in (Muthusamy et al., 1990). When using the scaled spectral coefficients to train both techniques, the MLP correctly classified 47.4% of the test set while CART employing univariate splits performed at only 38.2%. \nOne reason for the poor performance of CART with univariate splits may be that each coefficient (corresponding to energy in a narrow frequency band) contains little information when considered independently of the other coefficients. For example, reduced energy in the 1 kHz band may be difficult to detect if the energy in the 1.06 kHz band was increased by an appropriate amount. The CART classifier described above operates by making a series of inquiries about one frequency band at a time, an intuitively inappropriate approach. \nWe achieved our best CART results, 46.4%, on the test set by making use of arbitrary hyper-planes (linear combinations) instead of univariate splits. This search-based approach gave results which were within 1% of the MLP results. \n\n6 CONCLUSIONS \nIn all cases the performance of the MLP was, in terms of percent error, better than that of CART. However, the difference in performance between the two classifiers was only significant (at the p > .99 level) for the power security problem. \nThere are several possible reasons for the sometimes superior performance of the MLP technique, all of which we are currently investigating. One advantage may stem from the ability of MLP to easily find correlations between large numbers of variables. Although it is possible for CART to form arbitrary nonlinear decision boundaries, the efficiency of the recursive splitting process may be inferior to MLP's nonlinear fit. 
\nAnother relative disadvantage of CART may be due to the successive nature of node growth. For example, if the first split that is made for a problem turns out, given the successive splits, to be suboptimal, it becomes very inefficient to change the first split to be more suitable. \nWe feel that the careful statistics used in CART could also be advantageously applied to MLP. The superior performance of MLP is not yet indicative of best performance and it may turn out that careful application of statistics may allow further advancements in the MLP technique. It also may be possible that there would be input representations that would cause better performance for CART than for MLP. \nThere have been new developments in trained statistical classifiers since the development of CART. More recent techniques, such as projection pursuit (Friedman and Stuetzle, 1984), may prove as good as or superior to MLP. This continued interplay between MLP techniques and advanced statistics is a key part of our ongoing research. \n\nAcknowledgements \n\nThe authors wish to thank Professor R.D. Martin and Dr. Alan Lippman of the University of Washington Department of Statistics and Professors Aggoune, Damborg, and Hwang of the University of Washington Department of Electrical Engineering for their helpful discussions. David Cohn and Carlos Rivera assisted with many of the experiments. \nWe also would like to thank Milan Casey Brace of Puget Power and Light for providing the load forecasting data. \nThis work was supported by a National Science Foundation Presidential Young Investigator Award for L. 
Atlas and also by separate grants from the National Science Foundation and Washington Technology Center. \n\nReferences \n\nP. E. Barnard and D. Casasent, \"Image Processing for Image Understanding with Neural Nets,\" Proc. Int. Joint Conf. on Neural Nets, Washington, DC, June 18-22, 1989. \nL. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone, Classification and Regression Trees, Wadsworth International, Belmont, CA, 1984. \nW. Fisher, G. Doddington, and K. Goudie-Marshall, \"The DARPA Speech Recognition Research Database: Specification and Status,\" Proc. of the DARPA Speech Recognition Workshop, pp. 93-100, February 1986. \nJ.H. Friedman and W. Stuetzle, \"Projection Pursuit Regression,\" J. Amer. Stat. Assoc. 79, pp. 599-608, 1984. \nL. Lamel, R. Kassel, and S. Seneff, \"Speech Database Development: Design and Analysis of the Acoustic-Phonetic Corpus,\" Proc. of the DARPA Speech Recognition Workshop, pp. 100-110, February 1986. \nW.S. Meisel and D.A. Michalopoulos, \"A Partitioning Algorithm with Application in Pattern Classification and the Optimization of Decision Trees,\" IEEE Trans. Computers C-22, pp. 93-103, 1973. \nY. Muthusamy, R. Cole, and M. Slaney, \"Vowel Information in a Single Spectral Slice: Cochleagrams Versus Spectrograms,\" Proc. ICASSP '90, April 3-6, 1990. (to appear) \nD.E. Rumelhart, G.E. Hinton, and R.J. Williams, \"Learning Internal Representations by Error Propagation,\" Ch. 2 in Parallel Distributed Processing, D.E. Rumelhart, J.L. McClelland, and the PDP Research Group, MIT Press, Cambridge, MA, 1986. 
\n\n\f", "award": [], "sourceid": 203, "authors": [{"given_name": "Les", "family_name": "Atlas", "institution": null}, {"given_name": "Ronald", "family_name": "Cole", "institution": null}, {"given_name": "Jerome", "family_name": "Connor", "institution": null}, {"given_name": "Mohamed", "family_name": "El-Sharkawi", "institution": null}, {"given_name": "Robert", "family_name": "Marks", "institution": null}, {"given_name": "Yeshwant", "family_name": "Muthusamy", "institution": null}, {"given_name": "Etienne", "family_name": "Barnard", "institution": null}]}