{"title": "Potential Boosters?", "book": "Advances in Neural Information Processing Systems", "page_first": 258, "page_last": 264, "abstract": null, "full_text": "Potential Boosters ? \n\nNigel Duffy \n\nDavid Helmbold \n\nDepartment of Computer Science \n\nDepartment of Computer Science \n\nUniversity of California \nSanta Cruz, CA  95064 \nnigedufJ@cse. ucsc. edu \n\nUniversity of California \nSanta Cruz,  CA 95064 \n\ndph@~se. ucsc. edu \n\nAbstract \n\nRecent  interpretations  of the  Adaboost  algorithm  view  it  as  per(cid:173)\nforming a gradient descent on a  potential function.  Simply chang(cid:173)\ning the potential function  allows  one to create new  algorithms re(cid:173)\nlated  to  AdaBoost.  However,  these  new  algorithms are generally \nnot  known  to have  the formal  boosting property.  This  paper ex(cid:173)\namines  the question  of which  potential  functions  lead  to  new  al(cid:173)\ngorithms that are boosters.  The two main results are general sets \nof  conditions  on  the  potential;  one  set  implies  that  the  resulting \nalgorithm is  a  booster,  while the other implies  that the algorithm \nis  not.  These conditions are applied to previously studied potential \nfunctions , such as those used by LogitBoost and  Doom II. \n\n1 \n\nIntroduction \n\nThe first  boosting algorithm appeared in Rob Schapire's thesis  [1].  This algorithm \nwas  able to boost the performance of a  weak  PAC  learner  [2]  so that the resulting \nalgorithm  satisfies  the  strong  PAC  learning  [3]  criteria.  We  will  call  any  method \nthat builds  a  strong PAC  learning algorithm from  a  weak PAC  learning algorithm \na  PAC  boosting  algorithm.  Freund  and  Schapire  later  found  an  improved  PAC \nboosting algorithm called AdaBoost [4],  which also tends to improve the hypotheses \ngenerated by  practical learning algorithms  [5]. \n\nThe  AdaBoost  algorithm  takes  a  labeled  training set  and  produces  a  master  hy(cid:173)\npothesis by  repeatedly calling a given learning method.  The given learning method \nis  used  with  different  distributions  on  the  training  set  to  produce  different  base \nhypotheses.  The  master  hypothesis  returned  by  AdaBoost  is  a  weighted  vote  of \nthese  base  hypotheses.  AdaBoost  works  iteratively,  determining  which  examples \nare  poorly  classified  by  the current  weighted  vote  and selecting  a  distribution  on \nthe training set to emphasize those examples. \nRecently, several researchers [6,  7, 8, 9, 10] have noticed that Adaboost is performing \na  constrained gradient descent on  an exponential potential function of the margins \nof the examples.  The margin of an example is  yF(x) where y  is the \u00b11 valued label \nof the example  x  and  F(x)  E  lR  is  the  net  weighted  vote  of master  hypothesis  F. \nOnce Adaboost is  seen  this  way  it is  clear that further  algorithms  may  be derived \nby  changing the potential function  [6,  7,  9,  10]. \n\n\fPotential Boosters? \n\n259 \n\nThe  exponential  potential  used  by  AdaBoost  has  the  property  that  the  influence \nof a  data point  increases  exponentially if it  is  repeatedly  misclassified  by  the base \nhypotheses.  This concentration on the \"hard\" examples allows AdaBoost to rapidly \nobtain  a  consistent  hypothesis  (assuming  that  the  base  hypotheses  have  certain \nproperties).  However,  it  also  means  that  an incorrectly  labeled  or  noisy  example \ncan  quickly  attract  much  of the  distribution.  It appears  that  this  lack  of noise(cid:173)\ntolerance is  one  of AdaBoost's few  drawbacks [11].  Several researchers  [7,  8,  9,  10] \nhave proposed potential functions which do not concentrate as much on these \"hard\" \nexamples.  However,  they generally  do  not  show  that the derived  algorithms  have \nthe PAC  boosting property. \nIn  this  paper we  return to  the original motivation behind boosting algorithms and \nask:  \"for  which  potential  functions  does  gradient  descent  lead  to  PAC  boosting \nalgorithms\"  (i.e.  boosters that create strong PAC learning algorithms from arbitrary \nweak  PAC  learners).  We  give  necessary  conditions  that  are  met  by  some  of the \nproposed potential functions  (most notably the LogitBoost potential introduced by \nFriedman  et  al.  [7]).  Furthermore,  we  show  that simple  gradient  descent  on  other \nproposed  potential  functions  (such  as  the  sigmoidal  potential  used  by  Mason  et \nal.  [10])  cannot  convert  arbitrary weak  PAC  learning  algorithms  into  strong PAC \nlearners.  The aim of this work is to identify properties of potential functions required \nfor  PAC  boosting, in order to guide the search for  more effective  potentials. \n\nSome potential functions  have an additional tunable parameter [10]  or change over \ntime  [12].  Our results do not yet apply to such dynamic potentials. \n\n2  PAC Boosting \n\nHere we  define the notions of PAC  learningl  and boosting, and define  the notation \nused throughout the paper. \nA concept C is  a subset of the learning domain X.  A random example of C is a pair \n(x  E  X,y  E {-1, +1})  where  x is  drawn from  some distribution on  X  and y =  1 if \nx  E C  and  -1 otherwise.  A  concept  class is  a set of concepts. \n\nDefinition 1  A  (strong)  PAC  learner  for  concept class C has  the property  that for \nevery  distribution D  on X,  all concepts C  E C,  and all 0 < E,O  < 1/2: with probabil(cid:173)\nity  at  least 1 - 0  the  algorithm  outputs  a hypothesis  h  where  P D [h( x)  :j:.  C (x)]  ~ E. \nThe  learning  algorithm  is  given C,  E,  0,  and the  ability  to  draw  random examples  of \nC  (w.r.t.  distribution  D),  and  must run in  time  bounded  by  poly(l/E,l/o). \n\nDefinition 2  A  weak PAC  learner  is  similar to  a  strong PA C learner,  except that \nit need only satisfy the  conditions for  a particular 0 < EO, 00  < 1/2 pair,  rather than \nfor  all  E,O  pairs. \n\nDefinition 3  A  PAC boosting algorithm is  a generic  algorithm  which  can  leverage \nany  weak  PAC learner to  meet the  strong PAC learning  criteria. \n\nIn  the remainder of the paper we  emphasize boosting the accuracy  E  as  it is  much \neasier to boost the confidence 0,  see Haussler  et al.  [13]  and Freund [14]  for  details. \nFurthermore, we emphasize boosting by re-sampling, where the strong PAC learner \ndraws a  large sample, and each iteration the weak learning algorithm is called with \nsome distribution over this sample. \n\nlTo simplify  the  presentation we  omit the instance space  dimension  and target  repre(cid:173)\n\nsentation length parameters. \n\n\f260 \n\nN.  Duffy and D.  Helmbold \n\nThroughout the paper we  use the following  notation. \n\nnot be normalized so that 2::, =1  at'  =  l. \n\n\u2022  m  is  the cardinality of the fixed sample {(Xl, Y1), ... , (Xm, Ym) }. \n\u2022  ht(x) is the \u00b11 valued weak hypothesis created at iteration t. \n\u2022  at is the weight or vote of ht in the master hypothesis, the a's mayor may \n\u2022  Ft (x)  =  2::'=1 (at' ht, (x) /2:;=1 aT)  E !R, is  the master hypothesis2 at iter(cid:173)\n\u2022  Ui,t  =  Yi 2::'=1 at,ht, (X)  is  the  margin  of  Xi  after  iteration  t;  the  t  sub(cid:173)\nscript is  often omitted.  Note that the margin is  positive  when the master \nhypothesis is  correct, and the normalized margin is Ui,t/ 2::'=1 at'\u00b7 \n\nation t. \n\n\u2022  p(u)  is  the potential of an instance with  margin u, and the total potential \n\nis  2:~1 p(Ui). \n\n\u2022  P v[ ],P s[  ],  and  Es[  ] are  the  probability  with  respect  to  the  unknown \ndistribution  over  the  domain,  and  the  probability  and  expectations  with \nrespect to the uniform distribution over the sample, respectively. \n\nOur  results  apply  to  total  potential  functions  of the  form  2:~1 p(Ui)  where  p  is \npositive and strictly decreasing. \n\n3  Leveraging Learners  by Gradient Descent \n\nAdaBoost  [4]  has  recently  been  interpreted as  gradient  descent  independently  by \nseveral groups [6, 7, 8, 9, 1m.  Under this interpretation AdaBoost is seen as minimiz(cid:173)\ning the total potential 2:i=l P(Ui)  =  2:~1 exp( -Ui)  via feasible  direction gradient \ndescent.  On each iteration t + 1, AdaBoost chooses the direction of steepest descent \nas the distribution on the sample, and  calls the weak learner to obtain a  new  base \nhypothesis hHl .  The weight at+!  of this new weak hypothesis is calculated to min(cid:173)\nimize3 the resulting potential 2:~1 p(Ui,H1)  =  2:~1 exp( -(Ui,t + aHIYi ht+! (Xi))). \nThis gradient descent  idea  has been generalized to  other potential functions  [6,  7, \n10].  Duffy  et  al.  [9]  prove bounds for  a  similar gradient descent  technique using a \nnon-componentwise, non-monotonic potential function. \n\nNote that if the weak learner returns a  good hypothesis  ht  (with  training error at \nmost  \u20ac  <  1/2), then  2:~1 Dt(Xi)Yiht(Xi)  > 1 - 2\u20ac  > O.  We  set  T  =  1 - 2\u20ac,  and \nassume that each base hypothesis produced satisfies  2:~1 Dt(Xi)Yiht(Xi)  ~ T. \nIn this paper we  consider this general gradient descent approach applied to various \npotentials  2:~1 P(Ui).  Note that each potential function P has two  corresponding \ngradient descent algorithms (see [6]).  The un-normalized algorithms (like AdaBoost) \ncontinually add in new weak hypotheses while preserving the old a's.  The normal(cid:173)\nized algorithms  re-scale  the a's so  that they  always  sum  to  1.  In  general,  we  call \nsuch  algorithms  \"leveraging algorithms\",  reserving the  term  \"boosting\"  for  those \nthat actually have the PAC  boosting property. \n\n4  Potentials that Don't Boost \n\nIn this  section we  describe sufficient  conditions  on  potential functions  so that the \ncorresponding leveraging algorithm does  not have the PAC  boosting property.  We \n\n2The prediction of the master hypothesis on instance  x  is  the sign of Ft(x). \n30ur current proofs require that the actual at's be no greater than a constant  (say  1). \n\nTherefore,  this minimizing a  may need to be reduced. \n\n\fPotential Boosters? \n\n261 \n\napply these conditions  to show that two  potentials from  the literature do  not lead \nto boosting algorithms. \n\nTheorem 1  Let p( u)  be  a potential function  for  which: \n\n1}  the  derivative,  p' (u),  is  increasing  (_p' (u)  decreasing}  in ?R+,  and \n\n2}  3{3  > \u00b0 such  that for  all u > 0,  -{3p'(u)  ~ -p'(-2u). \n\nThen  neither  the  normalized  nor  the  un-normalized  leveraging  algorithms  corre(cid:173)\nsponding  to  potential p  have  the  PAC boosting property. \n\nThis  theorem  is  proven by  an adversary argument.  Whenever the concept  class  is \nsufficiently  rich4 ,  the  adversary  can  keep  a  constant  fraction  of the  sample  from \nbeing correctly labeled by the master hypothesis.  Thus as the error tolerance \u20ac  goes \nto zero,  the master hypotheses will  not be sufficiently  accurate. \n\nWe  now  apply this  theorem to two  potential functions  from  the literature. \nFriedman  et  al.  [7J  describe  a  potential  they  call  \"Squared  Error(p)\"  where  the \n\n.  This  potential  can  be  re-written \n\npotential at  Xi  is  T  - eF(Xi) + e-F(Xi) \n\neF(Xi)) 2 \n\nyo  + 1 \n\n(\n\nas PSE(Ui)  =  4\"  1 + 2 eUi  + e-Ui  +  eUi  + e- Ui \n\n1  ( \n\ne- Ui  _  eUi \n\n(e-Ui  _  eUi )2) \n\n\u2022 \n\nCorollary  1  Potential  \"Squared  Error{p} \"  does  not lead  to  a  boosting  algorithm. \n\nProof:  This potential satisfies the conditions of Theorem 1.  It is  strictly decreas(cid:173)\ning,  and the second condition holds for  {3  =  2. \nMason  et  al.  [lOJ  examine  a  normalized  algorithm  using  the  potential  PD(U)  = \n1- tanh (AU).  Their algorithm optimizes over choices of A via cross-validation, and \nuses  weak  learners  with  slightly  different  properties.  However,  we  can  plug  this \npotential  directly  into  the  gradient  descent  framework  and  examine  the  resulting \nalgorithms. \n\nCorollary 2  The  DOOMII potential PD  does  not lead  to  a  boosting  algorithm  for \nany fixed  A. \n\nProof:  The potential is strictly decreasing, and the second condition of Theorem 1 \nholds for  {3  =  1. \n\nOur techniques  show  that  potentials  that  are  sigmoidal  in  nature do  not  lead  to \nalgorithms  with  the  PAC  boosting  property.  Since  sigmoidal  potentials  are  gen(cid:173)\nerally  better over-estimates  of the 0, 1  loss  than  the  potential  used  by  AdaBoost, \nour results  imply  that boosting algorithms  must  use  a  potential  with  more subtle \nproperties than simply upper bounding the 0, 1 loss. \n\n5  Potential Functions That Boost \n\nIn  this  section  we  give  sufficient  conditions  on  a  potential  function  for  it's  corre(cid:173)\nsponding un-normalized algorithm to have the PAC  boosting property.  This result \nimplies that AdaBoost [4J  and LogitBoost  [7J  have the PAC  boosting property (Al(cid:173)\nthough this  was  previously known for  AdaBoost  [4J,  we  believe this is  a  new result \nfor  LogitBoost). \n\n4The  VC-dimension  4  concept  class  consisting  of pairs of intervals  on  the  real  line  is \n\nsufficient  for  our  adversary. \n\n\f262 \n\nN.  Duffy and D.  Helmbold \n\nOne  set  of  conditions  on  the  potential  imply  that  it  decreases  roughly  exponen(cid:173)\ntially  when  the  (un-normalized)  margins  are  large.  Once  the  margins  are  in this \nexponential region, ideas similar to those used in AdaBoost's analysis show that the \nminimum normalized margin quickly becomes bounded away from zero.  This allows \nus  to bound the generalization error using a theorem from  Bartlett  et  al.  [15]. \nA  second  set  of conditions  governs  the  behavior  of the  potential  function  before \nthe un-normalized margins are large enough.  These conditions imply that the total \npotential decreases by a  constant  factor  each  iteration.  Therefore,  too much time \nwill  not be spent before all the margins enter the exponential region. \nThe margin  value  bounding  the  exponential  region  is  U,  and  once  2::=1 p(Ui)  ~ \np(U), all margins p(Ui) will remain in the exponential region.  The following theorem \ngives  conditions on p ensuring that 2::=1 P(Ui)  quickly becomes less than p(U). \n\nTheorem 2  If the  following  conditions  hold for p( u)  and U: \n\n1.  -p'(u)  is  strictly  decreasing -and 0 < pll(U)  ~ B,  and \n2.  3q> 0  such  that p(u)  ~ -qp'(u) Vu > U, \n\nm \n\nthen 2:i=l P(Ui)  ~ p(U)  after Tl  ~ \n\n4Bq2m 2p(0) In ( ~rb))) \n\np(U)2r2 \n\niterations. \n\nThe proof of this theorem approximates the new total potential by the old potential \nminus  a  times a  linear term, plus an error.  By bounding the error as a  function of \na  and minimizing  we  demonstrate that some values  of a  give a  sufficient  decrease \nin the total potential. \n\nTheorem 3  If the following  conditions  hold for p( u),  U,  q,  and iteration Tl: \n\n1.  3(3  ~ .J3  such  that  -p'(u + v)  ~ p(u + v)  ~ -p'(u)(3-Vq  whenever  -1 ~ \n\nv  ~ 1 and u > U, \n\n2.  2::1P(Ui,Tl) ~ p(U), \n3.  -p' (u)  is  strictly  decreasing,  and \n4.  3C > 0,  'Y>  1  such  that Cp(u)  ~ 'Y- u  Vu > U \n\nwhich  decreases  expo-\n\nThe proof of this theorem is  a  generalization of the AdaBoost proof. \n\nCombining  these  two  theorems,  and the generalization  bound  from  Theorem  2  of \nBartlett  et  al.  [15]  gives  the following  result,  where  d is  the  VC  dimension  of the \nweak hypothesis class. \n\nTheorem 4  If for  all  edges  0  <  r  <  1/2  there  exists  T1,r  ~ poly(m,l/r),  Ur , \nand  qr  satisfying  the  conditions  of  Theorem  3  such  that  p(Ur )  ~ poly(r)  and \nqrv'f'=T2 =  l(r)  < 1 - poly(r),  then  in  time  poly(m, 1/r)  all  examples  have  nor-\n\n\fPotential Boosters? \n\n263 \n\nmalized margin  at least ()  =  In  ((l~~~l)) / In(r)  and \n\nP D[yFT(X)  ~ 0]  E 0  Vm \n\n( \n\n)~) \n\n1  ( l n2(r)dlog2 (m/d) \n\n(In (l(r) +  1)  -In (21(r)))2  + 10g(1/8) \n\nChoosing  m  appropriately  makes  the  error  rate  sufficiently  small  so  that  the  algo(cid:173)\nrithm  corresponding  to  p  has  the  PAC boosting property. \n\nWe  now  apply  Theorem  4  to  show  that the  AdaBoost  and  LogitBoost  potentials \nlead to boosting algorithms. \n\n6  Some Boosting Potentials \n\nIn this section we show as a direct consequence of our Theorem 4 that the potential \nfunctions  for  AdaBoost  and  LogitBoost  lead  to  boosting  algorithms.  Note  that \nthe LogitBoost algorithm  we  analyze is  not exactly  the same as  that described  by \nFriedman  et  al.  [7],  their  \"weak learner\"  optimizes a  square loss  which  appears to \nbetter fit  the potential.  First we  re-derive the boosting property for  AdaBoost. \n\nCorollary 3  AdaBoost's [16] potential  boosts. \n\nProof:  To  prove this  we  simply  need  to show  that the potential p(u)  =  exp( -u) \nsatisfies the conditions of Theorem 4.  This is  done by setting Ur = -In(m), qr  = 1, \n'Y  = f3  = e,  C = 1,  and Tl  = o. \nCorollary 4  The  log-likelihood  potential  (as  used in LogitBoost [7])  boosts. \n\nProof:  In this  case p(u)  = In (1  +  e- U )  and  -p'(u) = l!~:u.  We  set  'Y  =  f3  = e, \nC =  2,  Ur = -In  ~ -1  and qr  = 1 +exp(-Ur )  =  ~. Now  The-\norem  2  shows  that after Tl  ~ poly(m, l/r) iterations the conditions of Theorem 4 \nare satisfied. \n\n( Jl- f2/2 \n\n{l?1Z \n\n) \n\n7  Conclusions \n\nIn this  paper we  have examined leveraging weak learners using a  gradient descent \napproach [9] .  This approach is  a  direct generalization of the Adaboost  [4,  16]  algo(cid:173)\nrithm,  where  Adaboost's  exponential  potential function  is  replaced  by  alternative \npotentials.  We  demonstrated  properties  of potentials  that  are  sufficient  to  show \nthat  the  resulting  algorithms  are  PAC  boosters,  and  other  properties  that  imply \nthat  the  resulting  algorithms  are  not  PAC  boosters.  We  applied  these  results  to \nseveral potential functions  from  the literature [7,  10,  16]. \n\nNew  insight  can  be  gained  from  examining  our  criteria  carefully.  The  conditions \nthat  show  boosting  leave  tremendous  freedom  in  the  choice  of potential  function \nfor  values less  than some  U,  perhaps this freedom  can be used to choose  potential \nfunctions  which  do  not overly concentrate on noisy examples. \n\nThere is still a significant gap between these two sets of properties, we are still a long \nway  from  classifying arbitrary potential functions  as to their boosting properties. \n\nThere  are  other  classes  of leveraging algorithms.  One class  looks  at  the  distances \nbetween  successive  distributions  [17,  18].  Another  class  changes  their  potential \n\n\f264 \n\nN.  Duffy and D.  Helmbold \n\nover  time  [6,  8,  12,  14].  The  criteria  for  boosting  may  change  significantly  with \nthese  different  approaches.  For  example,  Freund  recently  presented  a  boosting \nalgorithm [12]  that uses a time-varying sigmoidal potential.  It would be interesting \nto adapt our techniques to such  dynamic potentials. \n\nReferences \n\n[1]  Robert E.  Schapire.  The  Design  and Analysis  of Efficient Learning  Algorithms.  MIT \n\nPress,  1992. \n\n[2]  Michael  Kearns  and  Leslie  Valiant.  Cryptographic  limitations  on  learning  Boolean \n\nformulae  and finite  automata.  Journal  of the  ACM, 41(1):67-95,  January 1994. \n\n[3]  L.  G.  Valiant.  A theory of the learnable.  Commun.  ACM, 27(11):1134-1142,  Novem(cid:173)\n\nber 1984. \n\n[4]  Yoav  Freund and Robert  E.  Schapire.  A  decision-theoretic  generalization  of on-line \nlearning and an application to boosting.  Journal  of Computer  and  System  Sciences, \n55(1):119-139,  August 1997. \n\n[5]  Eric  Bauer  and Ron Kohavi.  An empirical  comparison of voting  classification  algo(cid:173)\n\nrithms:  Bagging,  boosting and variants.  Machine  Learning,  36(1-2):105-39,  1999. \n\n[6]  Leo  Breiman.  Arcing  the  edge.  Technical  Report  486,  Department  of  Statistics, \n\nUniversity of California,  Berkeley., 1997.  available  at www.stat.berkeley.edu. \n\n[7]  Jerome Friedman, Trevor Hastie, and Robert Tibshirani.  Additive logistic regression: \n\na  statistical view  of boosting.  Technical report, Stanford University,  1998. \n\n[8]  G.  lliitsch,  T.  Onoda, and K-R. Muller.  Soft margins for  AdaBoost.  Machine  Learn(cid:173)\n\ning,  2000.  To  appear. \n\n[9]  Nigel  Duffy and David P.  Helmbold.  A geometric approach to leveraging weak learn(cid:173)\n\ners.  In Paul Fischer and Hans Ulrich Simon, editors,  Computational Learning  Theory: \n4th European  Conference  (EuroCOLT '99), pages 18-33. Springer-Verlag, March 1999. \n[10]  Llew Mason, Jonathan Baxter, Peter Bartlett, and Marcus Frean. Boosting algorithms \n\nas gradient descent.  To  appear in NIPS 2000. \n\n[11]  Thomas G.  Dietterich.  An experimental comparison of three methods for  construct(cid:173)\n\ning  ensembles  of  decision  trees:  Bagging,  Boosting,  and  Randomization.  Machine \nLearning.  To  appear. \n\n[12]  Yoav  Freund.  An adaptive version of the boost-by-majority algorithm.  In Proc.  l\u00a3th \n\nAnnu.  Conf.  on  Comput.  Learning  Theory,  pages 102-113.  ACM,  1999. \n\n[13]  David Haussler, Michael Kearns, Nick Littlestone, and Manfred K  Warmuth. Equiva(cid:173)\nlence of models for polynomiallearnability.  Information and Computation,  95(2):129-\n161,  December 1991. \n\n[14]  Y.  Freund.  Boosting a  weak  learning  algorithm  by  majority.  Information  and  Com(cid:173)\n\nputation,  121(2):256-285,  September 1995. \n\n[15]  Robert  E.  Schapire,  Yoav  Freund,  Peter  Bartlett,  and Wee  Sun  Lee.  Boosting  the \nmargin:  A  new  explanation for  the effectiveness  of voting  methods.  The  Annals  of \nStatistics,  26(5):1651-1686,  1998. \n\n[16]  Robert  E.  Schapire  and  Yoram  Singer. \n\nImproved  boosting  algorithms  using \n\nconfidence-rated predictions.  Machine  Learning,  37(3):297-336,  December 1999. \n\n[17]  Jyrki  Kivinen  and  Manfred  K  Warmuth.  Boosting  as  entropy  projection.  In  Proc. \n\nl\u00a3th Annu.  Conf.  on  Comput.  Learning  Theory,  pages 134-144.  ACM,  1999. \n\n[18]  John Lafferty.  Additive models,  boosting,  and inference for  generalized divergences. \n\nIn  Proc.  l\u00a3th  Annu.  Conf.  on  Comput.  Learning  Theory,  pages  125-133.  ACM. \n\n\f", "award": [], "sourceid": 1737, "authors": [{"given_name": "Nigel", "family_name": "Duffy", "institution": null}, {"given_name": "David", "family_name": "Helmbold", "institution": null}]}