{"title": "Practical Characteristics of Neural Network and Conventional Pattern Classifiers on Artificial and Speech Problems", "book": "Advances in Neural Information Processing Systems", "page_first": 168, "page_last": 177, "abstract": null, "full_text": "168\n\nLee and Lippmann\n\nPractical Characteristics of Neural Network and Conventional Pattern Classifiers on Artificial and Speech Problems*\n\nYuchun Lee\nDigital Equipment Corp.\n40 Old Bolton Road, OGO1-2U11\nStow, MA 01775-1215\n\nRichard P. Lippmann\nLincoln Laboratory, MIT\nRoom B-349\nLexington, MA 02173-9108\n\nABSTRACT\n\nEight neural net and conventional pattern classifiers (Bayesian-unimodal Gaussian, k-nearest neighbor, standard back-propagation, adaptive-stepsize back-propagation, hypersphere, feature-map, learning vector quantizer, and binary decision tree) were implemented on a serial computer and compared using two speech recognition and two artificial tasks. Error rates were statistically equivalent on almost all tasks, but classifiers differed by orders of magnitude in memory requirements, training time, classification time, and ease of adaptivity. Nearest-neighbor classifiers trained rapidly but required the most memory. Tree classifiers provided rapid classification but were complex to adapt. Back-propagation classifiers typically required long training times and had intermediate memory requirements. These results suggest that classifier selection should often depend more heavily on practical considerations concerning memory and computation resources, and restrictions on training and classification times than on error rate.\n\n*This work was sponsored by the Department of the Air Force and the Air Force Office of Scientific Research. 
\n\n1 Introduction\n\nA shortcoming of much recent neural network pattern classification research has been an overemphasis on back-propagation classifiers and a focus on classification error rate as the main measure of performance. This research often ignores the many alternative classifiers that have been developed (see, e.g., [10]) and the practical tradeoffs these classifiers provide in training time, memory requirements, classification time, complexity, and adaptivity. The purpose of this research was to explore these tradeoffs and gain experience with many different classifiers. Eight neural net and conventional pattern classifiers were used. These included Bayesian-unimodal Gaussian, k-nearest neighbor (kNN), standard back-propagation, adaptive-stepsize back-propagation, hypersphere, feature-map (FM), learning vector quantizer (LVQ), and binary decision tree classifiers.\n\nBULLSEYE: Dimensionality: 2; Training Set Size: 500; Testing Set Size: 500; Classes: 2\nDISJOINT: Dimensionality: 2; Training Set Size: 500; Testing Set Size: 500; Classes: 2\nDIGIT: Dimensionality: 22 Cepstra; Training Set Size: 70; Testing Set Size: 112; 16 Training Sets; 16 Testing Sets; Classes: 7 Digits; Talker Dependent\nVOWEL: Dimensionality: 2 Formants; Training Set Size: 338; Testing Set Size: 330; Classes: 10 Vowels; Talker Independent\n\nFigure 1: Four problems used to test classifiers.\n\nClassifiers were implemented on a serial computer and tested using the four problems shown in Fig. 1. The upper two artificial problems (Bullseye and Disjoint) require simple two-dimensional convex or disjoint decision regions for minimum error classification. 
The lower digit recognition task (7 digits, 22 cepstral parameters, 16 talkers, 70 training and 112 testing patterns per talker) and vowel recognition task (10 vowels, 2 formant parameters, 67 talkers, 338 training and 330 testing patterns) use real speech data and require more complex decision regions. These tasks are described in [6, 11] and details of experiments are available in [9].\n\n2 Training and Classification Parameter Selection\n\nInitial experiments were performed to select sizes of classifiers that provided good performance with limited training data and also to select high-performing versions of each type of classifier. Experiments determined the number of nodes and hidden layers in back-propagation classifiers, pruning techniques to use with tree and hypersphere classifiers, and numbers of exemplars or kernel nodes to use with feature-map and LVQ classifiers.\n\n2.1 Back-Propagation Classifiers\n\nIn standard back-propagation, weights typically are updated only after each trial or cycle. A trial is defined as a single training pattern presentation and a cycle is defined as a sequence of trials which sample all patterns in the training set. In group updating, weights are updated every T trials while in trial-by-trial training, weights are updated every trial. Furthermore, in trial-by-trial updating, training patterns can be presented sequentially where a pattern is guaranteed to be presented every T trials, or they can be presented randomly where patterns are randomly selected from the training set. Initial experiments demonstrated that random trial-by-trial training provided the best convergence rate and error reduction during training. It was thus used whenever possible with all back-propagation classifiers. 
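The presentation and update schedules described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names presentation_order and update_trials, the seed argument, and the use of pattern indices in place of real training patterns are assumptions made only for illustration.

```python
import random


def presentation_order(num_patterns, num_trials, mode, seed=0):
    # Return the index of the training pattern presented at each trial.
    # 'sequential' cycles through the training set in a fixed order, so
    # every pattern is revisited once per cycle; 'random' draws a pattern
    # uniformly at random from the training set on every trial.
    rng = random.Random(seed)
    if mode == 'sequential':
        return [t % num_patterns for t in range(num_trials)]
    if mode == 'random':
        return [rng.randrange(num_patterns) for _ in range(num_trials)]
    raise ValueError('unknown mode: ' + mode)


def update_trials(num_trials, group_size):
    # Return the 1-based trial numbers after which weights are updated.
    # group_size = 1 corresponds to trial-by-trial updating; a larger
    # group_size corresponds to group updating every group_size trials.
    return [t for t in range(1, num_trials + 1) if t % group_size == 0]


# Six trials over a three-pattern training set (two cycles).
print(presentation_order(3, 6, 'sequential'))
print(presentation_order(3, 6, 'random'))
print(update_trials(6, 1))
print(update_trials(6, 3))
```

Setting group_size equal to the number of training patterns gives one weight update per cycle, which is the other extreme of the schedules mentioned in the text.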
\n\nAll back-propagation classifiers used a single hidden layer and an output layer with as many nodes as classes. The classification decision corresponded to the class of the node in the output layer with the highest output value. During training, the desired output pattern, D, was a vector with all elements set to 0 except for the element corresponding to the correct class of the input pattern. This element of D was set to 1. The mean-square difference between the actual output and this desired output is minimized when the output of each node is exactly the Bayes a posteriori probability for the corresponding class [1, 10]. Back-propagation with this \"1 of m\" desired output is thus well justified theoretically because it attempts to estimate minimum-error Bayes probability functions. The number of hidden nodes used in each back-propagation classifier was determined experimentally as described in [6, 7, 9, 11].\n\nThree \"improved\" back-propagation classifiers with the potential of reduced training times were studied. The first, the adaptive-stepsize classifier, has a global stepsize that is adjusted after every training cycle as described in [4]. The second, the multiple-adaptive-stepsize classifier, has multiple stepsizes (one for each weight) which are adjusted after every training cycle as described in [8]. The third classifier uses the conjugate gradient method [9, 12] to minimize the output mean-square error.\n\nThe goal of the three \"improved\" versions of back-propagation was to shorten the often lengthy training time observed with standard back-propagation. These improvements relied on fundamental assumptions about the error surfaces. 
However, only the multiple-adaptive-stepsize algorithm was used for the final classifier comparison due to the poor performance of the other two algorithms. The adaptive-stepsize classifier often could not achieve adequately low error rates because the global stepsize (\u03b7) frequently converged too quickly to zero during training. The multiple-adaptive-stepsize classifier did not train faster than a standard back-propagation classifier with a carefully selected stepsize value. Nevertheless, it eliminated the need for pre-selecting the stepsize parameter. The conjugate gradient classifier worked well on simple problems but almost always rapidly converged to a local minimum which provided high error rates on the more complex speech problems.\n\n[Figure 2 plot: panels (A) Hypersphere and (B) Binary Decision Tree; axes F1 (Hz) and F2 (Hz).]\n\nFigure 2: Decision regions formed by the hypersphere classifier (A) and by the binary decision tree classifier (B) on the test set for the vowel problem. Inputs consist of the first two formants for ten vowels in the words who'd, hawed, hod, hud, had, heed, hid, head, heard, and hood as described in [6, 9].\n\n2.2 Hypersphere Classifier\n\nHypersphere classifiers build decision regions from nodes that form separate hypersphere decision regions. Many different types of hypersphere classifiers have been developed [2, 13]. Experiments discussed in [9] led to the selection of a specific version of hypersphere classifier with \"pruning\". 
Each hypersphere can only shrink in size, centers are not repositioned, an ambiguous response (positive outputs from hyperspheres corresponding to different classes) is mediated using a nearest-neighbor rule, and hyperspheres that do not contribute to the classification performance are pruned from the classifier for proper \"fitting\" of the data and to reduce memory usage. Decision regions formed by a hypersphere classifier for the vowel classification problem are shown in the left side of Fig. 2. Separate regions in this figure correspond to different vowels. Decision region boundaries contain arcs which are segments of hyperspheres (circles in two dimensions) and linear segments caused by the application of the nearest neighbor rule for ambiguous responses.\n\n2.3 Binary Decision Tree Classifier\n\nBinary decision tree classifiers from [3] were used in all experiments. Each node in a tree has only two immediate offspring and the splitting decision is based on only one of the input dimensions. Decision boundaries are thus overlapping hyper-rectangles with sides parallel to the axes of the input space and decision regions become more complex as more nodes are added to the tree. Decision trees for each problem were grown until they classified all the training data exactly and then pruned back using the test data to determine when to stop pruning. A complete description of the decision tree classifier used is provided in [9] and decision regions formed by this classifier for the vowel problem are shown in the right side of Fig. 2.\n\n2.4 Other Classifiers\n\nThe remaining four classifiers were tuned by selecting coarse sizing parameters to \"fit\" the problem imposed. 
Some of these parameters include the number of exemplars in the LVQ and feature map classifiers and k in the k-nearest neighbor classifier. Different types of covariance matrices (full, diagonal, and various types of grand averaging) were also tried for the Bayesian-unimodal Gaussian classifier. Best sizing parameter values for classifiers were almost always not those that best classified the training set. For the purpose of this study, training data was used to determine internal parameters or weights in classifiers. The size of a classifier and coarse sizing parameters were selected using the test data. In real applications when a test set is not available, alternative methods, such as cross validation [3, 14], would be used.\n\n3 Classifier Comparison\n\nAll eight classifiers were evaluated on the four problems using simulations programmed in C on a Sun 3/110 workstation with a floating point accelerator. Classifiers were trained until their training error rate converged.\n\n3.1 Error Rates\n\nError rates for all classifiers on all problems are shown in Fig. 3. The middle solid lines in this figure correspond to the average error rate over all classifiers for each problem. The shaded area is one binomial standard deviation above and below this average. As can be seen, there are only three cases where the error rate of any one classifier is substantially different from the average error. These exceptions are the Bayesian-unimodal Gaussian classifier on the disjoint problem 
and the decision tree classifier on the digit and the disjoint problem. The Bayesian-unimodal Gaussian classifier performed poorly on the disjoint problem because it was unable to form the required bimodal disjoint decision regions. The decision tree classifier performed poorly on the digit problem because the small amount of training data (10 patterns per class) was adequately classified by a minimal 13-node tree which didn't generalize well and didn't even use all 22 input dimensions. The decision tree classifier worked well for the disjoint problem because it forms decision regions parallel to both input axes as required for this problem.\n\n[Figure 3 plot: one panel each for the Bullseye, Disjoint, Digit, and Vowel problems.]\n\nFigure 3: Error rates for all classifiers on all four problems. The middle solid lines correspond to the average error rate over all classifiers for each problem. The shaded area is one binomial standard deviation above and below the average error rate.\n\n3.2 Practical Characteristics\n\n[Figure 4 plot: training time on a log scale for each classifier (Bayesian, back-prop, multi-stepsize, kNN, hypersphere, feature map, LVQ, tree) on the Bullseye, Vowel, Disjoint, and Digit problems.]\n\nFigure 4: Training time of all classifiers on all four problems.\n\nIn contrast to the small differences in error rate, differences between classifiers on practical performance issues such as training and classification time, and memory usage were large. Figure 4 shows that the classifiers differed by orders of magnitude in training time. Shown in log-scale, the k-nearest neighbor stands out distinctively as the fastest trained classifier by many orders of magnitude. 
Depending on the problem, Bayesian-unimodal Gaussian, hypersphere, decision tree, and feature map classifiers also have reasonably short training times. LVQ and back-propagation classifiers often required the longest training time. It should be noted that alternative implementations, for example using parallel computers, would lead to different results.\n\nAdaptivity or the ability to adapt using new patterns after complete training also differed across classifiers. The k-nearest neighbor and hypersphere classifiers are able to incorporate new information most readily. Others such as back-propagation and LVQ classifiers are more difficult to adapt and some, such as decision tree classifiers, are not designed to handle further adaptation after training is complete.\n\n[Figure 5 plot: classification memory usage (vertical axis) versus training program complexity in lines of code (horizontal axis) on the Bullseye, Vowel, Disjoint, and Digit problems.]\n\nFigure 5: Classification memory usage versus training program complexity for all classifiers on all four problems.\n\nThe binary decision tree can classify patterns much faster than others. Unlike most classifiers that depend on \"distance\" calculations between the input pattern and all stored exemplars, the decision tree classifier requires only a few numerical comparisons. Therefore, the decision tree classifier was many orders of magnitude faster in classification than other classifiers. However, decision tree classifiers require the most complex training algorithm. 
As a rough measurement of the ease of implementation, subjectively measured by the number of lines in the training program, the decision tree training program is many times more complex than the simplest training program, that of the k-nearest neighbor classifier. However, the k-nearest neighbor classifier is one of the slowest in classification when implemented serially without complex search techniques such as k-d trees [5]. These techniques greatly reduce classification time but make adaptation to new training data more difficult and increase complexity.\n\n4 Trade-Offs Between Performance Criteria\n\nNo one classifier out-performed the rest on all performance criteria. The selection of a \"best\" classifier depends on practical problem constraints which differ across problems. Without knowing these constraints or associating explicit costs with various performance criteria, a classifier that is \"best\" cannot be meaningfully determined. Instead, there are numerous trade-off relationships between various criteria.\n\nOne trade-off shown in Fig. 5 is classification memory usage versus the complexity of the training algorithm. The far upper left corner, where training is very simple and memory is not efficiently utilized, contains the k-nearest neighbor classifier. In contrast, the binary decision tree classifier is in the lower right corner, where the overall memory usage is minimized and the training process is very complex. Other classifiers are intermediate.
\n\n[Figure 6 plot: training time (vertical axis) versus classification memory usage in bytes (horizontal axis) for each classifier.]\n\nFigure 6: Training time versus classification memory usage of all classifiers on the vowel problem.\n\nFigure 6 shows the relationship between training time and classification memory usage for the vowel problem. The k-nearest neighbor classifier consistently provides the shortest training time but requires the most memory. The hypersphere classifier optimizes these two criteria well across all four problems. Back-propagation classifiers frequently require long training times and require intermediate amounts of memory.\n\n5 Summary\n\nThis study explored practical characteristics of neural net and conventional pattern classifiers. Results demonstrate that classification error rates can be equivalent across classifiers when classifiers are powerful enough to form minimum error decision regions, when they are rigorously tuned, and when sufficient training data is provided. Practical characteristics such as training time, memory requirements, and classification time, however, differed by orders of magnitude. In practice, these factors are more likely to affect classifier selection. Selection will often be driven by practical considerations concerning memory and computation resources, restrictions on training, test, and adaptation times, and ease of use and implementation. The many existing neural net and conventional classifiers allow system designers to trade these characteristics off. Tradeoffs will vary with implementation hardware (e.g. 
serial versus parallel, analog versus digital) and details of the problem (e.g. dimension of the input vector, complexity of decision regions). Our current research efforts are exploring these tradeoffs on more difficult problems and studying additional classifiers including radial-basis-function classifiers, high-order networks, and Gaussian mixture classifiers.\n\nReferences\n\n[1] A. R. Barron and R. L. Barron. Statistical learning networks: A unifying view. In 1988 Symposium on the Interface: Statistics and Computing Science, Reston, Virginia, April 21-23 1988.\n\n[2] B. G. Batchelor. Classification and data analysis in vector space. In B. G. Batchelor, editor, Pattern Recognition, chapter 4, pages 67-116. Plenum Press, London, 1978.\n\n[3] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth International Group, Belmont, CA, 1984.\n\n[4] L. W. Chan and F. Fallside. An adaptive training algorithm for back propagation networks. Computer Speech and Language, 2:205-218, 1987.\n\n[5] J. H. Friedman, J. L. Bentley, and R. A. Finkel. An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software, 3(3):209-226, September 1977.\n\n[6] W. M. Huang and R. P. Lippmann. Neural net and traditional classifiers. In D. Anderson, editor, Neural Information Processing Systems, pages 387-396, New York, 1988. American Institute of Physics.\n\n[7] William Y. Huang and Richard P. Lippmann. Comparisons between conventional and neural net classifiers. In 1st International Conference on Neural Networks, pages IV-485. IEEE, June 1987.\n\n[8] R. A. Jacobs. Increased rates of convergence through learning rate adaptation. Neural Networks, 1:295-307, 1988. 
\n\n[9] Yuchun Lee. Classifiers: Adaptive modules in pattern recognition systems. Master's thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, Cambridge, MA, May 1989.\n\n[10] R. P. Lippmann. Pattern classification using neural networks. IEEE Communications Magazine, 27(11):47-54, November 1989.\n\n[11] Richard P. Lippmann and Ben Gold. Neural classifiers useful for speech recognition. In 1st International Conference on Neural Networks, pages IV-417. IEEE, June 1987.\n\n[12] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, editors. Numerical Recipes. Cambridge University Press, New York, 1986.\n\n[13] D. L. Reilly, L. N. Cooper, and C. Elbaum. A neural model for category learning. Biological Cybernetics, 45:35-41, 1982.\n\n[14] M. Stone. Cross-validation choice and assessment of statistical predictions. Journal of the Royal Statistical Society, B-36:111-147, 1974.", "award": [], "sourceid": 259, "authors": [{"given_name": "Yuchun", "family_name": "Lee", "institution": null}, {"given_name": "Richard", "family_name": "Lippmann", "institution": null}]}