{"title": "Examples of learning curves from a modified VC-formalism", "book": "Advances in Neural Information Processing Systems", "page_first": 344, "page_last": 350, "abstract": null, "full_text": "Examples of learning curves from a modified VC-formalism\n\nA. Kowalczyk & J. Szymanski\nTelstra Research Laboratories\n770 Blackburn Road,\nClayton, Vic. 3168, Australia\n{akowalczyk,j.szymanski}@trl.oz.au\n\nP.L. Bartlett & R.C. Williamson\nDepartment of Systems Engineering\nAustralian National University\nCanberra, ACT 0200, Australia\n{bartlett,williams}@syseng.anu.edu.au\n\nAbstract\n\nWe examine the issue of evaluation of model specific parameters in a modified VC-formalism. Two examples are analyzed: the 2-dimensional homogeneous perceptron and the 1-dimensional higher order neuron. Both models are solved theoretically, and their learning curves are compared against true learning curves. It is shown that the formalism has the potential to generate a variety of learning curves, including ones displaying \"phase transitions.\"\n\n1 Introduction\n\nOne of the main criticisms of the Vapnik-Chervonenkis theory of learning [15] is that the results of the theory appear very loose when compared with empirical data. In contrast, theory based on statistical physics ideas [1] provides tighter numerical results as well as qualitatively distinct predictions (such as \"phase transitions\" to perfect generalization). (See [5, 14] for a fuller discussion.) A question arises as to whether the VC-theory can be modified to give these improvements. The general direction of such a modification is obvious: one needs to sacrifice the universality of the VC-bounds and introduce model (e.g. distribution) dependent parameters. This obviously can be done in a variety of ways. 
Some specific examples are VC-entropy [15], empirical VC-dimensions [16], efficient complexity [17] or $(\\mu, C)$-uniformity [8, 9] in a VC-formalism with error shells. An extension of the last formalism is of central interest to this paper. It is based on a refinement of the \"fundamental theorem of computational learning\" [2] and its main innovation is to split the set of partitions of a training sample into separate \"error shells\", each composed of error vectors corresponding to the different error values.\n\nSuch a split introduces a whole range of new parameters (the average number of patterns in each of a series of error shells) in addition to the VC dimension. The difficulty of determining these parameters then arises. There are some crude, \"obvious\" upper bounds on them which lead to both the VC-based estimates [2, 3, 15] and the statistical physics based formalism (with phase transitions) [5] as specific cases of this novel theory. Thus there is an obvious potential for improvement of the theory with tighter bounds. In particular we find that the introduction of a single parameter (order of uniformity), which in a sense determines shifts in relative sizes of error shells, leads to a full family of shapes of learning curves continuously ranging in behavior from decay proportional to the inverse of the training sample size to \"phase transitions\" (sudden drops) to perfect generalization in small training sample sizes. We present an initial comparison of the learning curves from this new formalism with \"true\" learning curves for two simple neural networks.\n\n2 Overview of the formalism\n\nThe presentation is set in the typical PAC-style; the notation follows [2]. 
We consider a space $X$ of samples with a probability measure $\\mu$, a subspace $H$ of binary functions $X \\to \\{0, 1\\}$ (dichotomies), called the hypothesis space, and a target hypothesis $t \\in H$. For each $h \\in H$ and each $m$-sample $\\vec{x} = (x_1, \\ldots, x_m) \\in X^m$ ($m \\in \\{1, 2, \\ldots\\}$), we denote by $\\epsilon_{h,\\vec{x}} \\stackrel{def}{=} \\frac{1}{m} \\sum_{i=1}^{m} |t - h|(x_i)$ the empirical error of $h$ on $\\vec{x}$, and by $\\epsilon_h \\stackrel{def}{=} \\int_X |t - h|(x)\\,\\mu(dx)$ the expected error of $h \\in H$.\n\nFor each $m \\in \\{1, 2, \\ldots\\}$ let us consider the random variable\n\n$$\\epsilon_H^{max}(\\vec{x}) \\stackrel{def}{=} \\max_{h \\in H} \\{\\epsilon_h \\,;\\, \\epsilon_{h,\\vec{x}} = 0\\} \\qquad (\\vec{x} \\in X^m) \\qquad (1)$$\n\ndefined as the maximal expected error of a hypothesis $h \\in H$ consistent with $t$ on $\\vec{x}$. The learning curve of $H$, defined as the expected value of $\\epsilon_H^{max}$,\n\n$$\\epsilon_H(m) \\stackrel{def}{=} E_{X^m}[\\epsilon_H^{max}] = \\int_{X^m} \\epsilon_H^{max}(\\vec{x})\\,\\mu^m(d\\vec{x}) \\qquad (2)$$\n\nis of central interest to us. Upper bounds on it can be derived from basic PAC-estimates as follows. For $\\epsilon \\ge 0$ we denote by $H_\\epsilon \\stackrel{def}{=} \\{h \\in H \\,;\\, \\epsilon_h \\ge \\epsilon\\}$ the subset of $\\epsilon$-bad hypotheses and by\n\n$$Q_\\epsilon^m \\stackrel{def}{=} \\{\\vec{x} \\in X^m \\,;\\, \\exists_{h \\in H_\\epsilon}\\ \\epsilon_{h,\\vec{x}} = 0\\} = \\{\\vec{x} \\in X^m \\,;\\, \\exists_{h \\in H}\\ \\epsilon_{h,\\vec{x}} = 0 \\ \\&\\ \\epsilon_h \\ge \\epsilon\\} \\qquad (3)$$\n\nthe subset of $m$-samples for which there exists an $\\epsilon$-bad hypothesis consistent with the target $t$.\n\nLemma 1 If $\\mu^m(Q_\\epsilon^m) \\le \\psi(\\epsilon, m)$, then $\\epsilon_H(m) \\le \\int_0^1 \\min(1, \\psi(\\epsilon, m))\\,d\\epsilon$, and equality in the assumption implies equality in the conclusion.\n\nProof outline. If the assumption holds, then $\\Psi(\\epsilon, m) \\stackrel{def}{=} 1 - \\min(1, \\psi(\\epsilon, m))$ is a lower bound on the cumulative distribution of the random variable (1). Thus $E_{X^m}[\\epsilon_H^{max}] \\le \\int_0^1 \\epsilon \\frac{d}{d\\epsilon} \\Psi(\\epsilon, m)\\,d\\epsilon$ and integration by parts yields the conclusion.\n\nGiven $\\vec{x} = (x_1, \\ldots, x_m) \\in X^m$, let us introduce the transformation (projection) $\\pi_{t,\\vec{x}} : H \\to \\{0, 1\\}^m$ allocating to each $h \\in H$ the vector\n\n$$\\pi_{t,\\vec{x}}(h) \\stackrel{def}{=} (|h(x_1) - t(x_1)|, \\ldots, |h(x_m) - t(x_m)|)$$\n\ncalled the error pattern of $h$ on $\\vec{x}$. For a subset $G \\subset H$, let $\\pi_{t,\\vec{x}}(G) = \\{\\pi_{t,\\vec{x}}(h) : h \\in G\\}$. The space $\\{0, 1\\}^m$ is the disjoint union of error shells $\\mathcal{E}_i^m \\stackrel{def}{=} \\{(e_1, \\ldots, e_m) \\in \\{0, 1\\}^m \\,;\\, e_1 + \\cdots + e_m = i\\}$ for $i = 0, 1, \\ldots, m$, and $|\\pi_{t,\\vec{x}}(H_\\epsilon) \\cap \\mathcal{E}_i^m|$ is the number of different error patterns with $i$ errors which can be obtained for $h \\in H_\\epsilon$. We shall employ the following notation for its average:\n\n$$|H_\\epsilon|_i^m \\stackrel{def}{=} E_{X^m}[|\\pi_{t,\\vec{x}}(H_\\epsilon) \\cap \\mathcal{E}_i^m|] = \\int_{X^m} |\\pi_{t,\\vec{x}}(H_\\epsilon) \\cap \\mathcal{E}_i^m|\\,\\mu^m(d\\vec{x}). \\qquad (4)$$\n\nThe central result of this paper, which gives a bound on the probability of the set $Q_\\epsilon^m$ as in Lemma 1 in terms of $|H_\\epsilon|_i^m$, will be given now. It is obtained by modification of the proof of [8, Theorem 1] which is a refinement of the proof of the \"fundamental theorem of computational learning\" in [2]. It is a simplified version (to the consistent learning case) of the basic estimate discussed in [9, 7].\n\nTheorem 2 For any integer $k \\ge 0$ and $0 \\le \\epsilon, \\gamma \\le 1$\n\n$$\\mu^m(Q_\\epsilon^m) \\le A_{\\epsilon,k,\\gamma} \\sum_{j \\ge \\gamma k} \\binom{k}{j} \\binom{m+k}{j}^{-1} |H_\\epsilon|_j^{m+k}, \\qquad (5)$$\n\nwhere $A_{\\epsilon,k,\\gamma} \\stackrel{def}{=} \\left(1 - \\sum_{j < \\gamma k} \\binom{k}{j} \\epsilon^j (1 - \\epsilon)^{k-j}\\right)^{-1}$ for $k > 0$ and $A_{\\epsilon,0,\\gamma} \\stackrel{def}{=} 1$.\n\nSince error shells are disjoint we have the following relation:\n\n$$P_H(m) \\stackrel{def}{=} 2^{-m} \\int_{X^m} |\\pi_{\\vec{x}}(H)|\\,\\mu^m(d\\vec{x}) = 2^{-m} \\sum_{i=0}^{m} |H|_i^m \\le \\Pi_H(m)/2^m \\qquad (6)$$\n\nwhere $\\pi_{\\vec{x}}(h) \\stackrel{def}{=} \\pi_{0,\\vec{x}}(h)$, $|H|_i^m \\stackrel{def}{=} |H_0|_i^m$ and $\\Pi_H(m) \\stackrel{def}{=} \\max_{\\vec{x} \\in X^m} |\\pi_{\\vec{x}}(H)|$ is the growth function [2] of $H$. (Note that assuming that the target $t \\equiv 0$ does not affect the cardinality of $\\pi_{t,\\vec{x}}(H)$.) If the VC-dimension of $H$, $d = d_{VC}(H)$, is finite, we have the well-known estimate [2]\n\n$$\\Pi_H(m) \\le \\Phi(d, m) \\stackrel{def}{=} \\sum_{j=0}^{d} \\binom{m}{j} \\le (em/d)^d. \\qquad (7)$$\n\nCorollary 3 (i) If the VC-dimension $d$ of $H$ is finite and $m > 8/\\epsilon$, then $\\mu^m(Q_\\epsilon^m) \\le 2^{2 - m\\epsilon/2} (2em/d)^d$.\n(ii) If $H$ has finite cardinality, then $\\mu^m(Q_\\epsilon^m) \\le \\sum_{h \\in H_\\epsilon} (1 - \\epsilon_h)^m$.\n\nProof. (i) Use the estimate $A_{\\epsilon,k,\\epsilon/2} \\le 2$ for $k \\ge 8/\\epsilon$ resulting from the Chernoff bound, set $\\gamma = \\epsilon/2$ and $k = m$ in (5), and substitute the following crude estimate into the resulting bound:\n\n$$|H_\\epsilon|_j^m \\le \\sum_{i=0}^{m} |H_\\epsilon|_i^m \\le \\sum_{i=0}^{m} |H|_i^m \\le \\Pi_H(m) \\le (em/d)^d.$$\n\n(ii) Set $k = 0$ in (5) and use the estimate\n\n$$|H_\\epsilon|_i^m \\le \\sum_{h \\in H_\\epsilon} Pr_{X^m}(\\epsilon_{h,\\vec{x}} = i/m) = \\sum_{h \\in H_\\epsilon} \\binom{m}{i} (1 - \\epsilon_h)^{m-i} \\epsilon_h^i.$$\n\nThe inequality in Corollary 3.i (ignoring the factor of 2) is the basic estimate of the VC-formalism (c.f. [2]); the inequality in Corollary 3.ii is the union bound which is the starting point for the statistical physics based formalism developed in [5]. In this sense both of these theories are unified in estimate (5) and all their conclusions (including the prediction\n\n[Figure 1: two panels, (a) and (b), each with horizontal axis $m/d$. Legend in panel (b): $\\omega = 3$: chain line; $\\omega = 3$ and $C_{\\omega,m} = 1$: broken line.]\n\nFigure 1: (a) Examples of upper bounds on the learning curves for the case of finite VC-dimension $d = d_{VC}(H)$ implied by Corollary 4.ii for $C_{\\omega,m} \\equiv const$. They split into five distinct \"bands\" of four curves each, according to the values of the order of uniformity $\\omega = 2, 3, 4, 5, 10$ (in the top-down order). 
Each band contains a solid line ($C_{\\omega,m} \\equiv 1$, $d = 100$), a dotted line ($C_{\\omega,m} \\equiv 100$, $d = 100$), a chain line ($C_{\\omega,m} \\equiv 1$, $d = 1000$) and a broken line ($C_{\\omega,m} \\equiv 100$, $d = 1000$).\n(b) Various learning curves for the 2-dimensional homogeneous perceptron. Solid lines (top to bottom): (i) for the VC-theory bound (Corollary 3.i) with VC-dimension $d = 2$; (ii) for the bound (from Eqn. 5 and Lemma 1) with $\\gamma = \\epsilon$, $k = m$ and the upper bounds $|H_\\epsilon|_i^m \\le |H|_i^m = 2$ for $i = 1, \\ldots, m-1$ and $|H_\\epsilon|_i^m \\le |H|_i^m = 1$ for $i = 0, m$; (iii) as in (ii) but with the exact values for $|H_\\epsilon|_i^m$ as in (11); (iv) true learning curve (Eqn. 13). The $\\omega$-uniformity bound for $\\omega = 2$ (with the minimal $C_{\\omega,m}$ satisfying (9), which turns out to be $= const = 1$) is shown by the dotted line; for $\\omega = 3$ the chain line gives the result for minimal $C_{\\omega,m}$ and the broken line for $C_{\\omega,m}$ set to 1.\n\nof phase transitions to perfect generalization for the Ising perceptron for $\\alpha = m/d < 1.448$ in the thermodynamic limit [5]) can be derived from this estimate, and possibly improved with the use of tighter estimates on $|H_\\epsilon|_i^m$.\n\nWe now formally introduce a family of estimates on $|H_\\epsilon|_i^m$ in order to discuss the potential of our formalism. For any $m$, $\\epsilon$ and $\\omega \\ge 1.0$ there exists $C_{\\omega,m} > 0$ such that\n\n$$|H_\\epsilon|_i^m \\le |H|_i^m \\le C_{\\omega,m} \\binom{m}{i} P_H(m)^{1 - |1 - 2i/m|^\\omega} \\qquad (\\text{for } 0 \\le i \\le m). \\qquad (8)$$\n\nWe shall call such an estimate an $\\omega$-uniformity bound.\n\nCorollary 4 (i) If an $\\omega$-uniformity bound (8) holds, then\n\n$$\\mu^m(Q_\\epsilon^m) \\le A_{\\epsilon,m,\\gamma} C_{\\omega,m} \\sum_{j \\ge \\gamma m} \\binom{m}{j} P_H(2m)^{1 - |1 - j/m|^\\omega}, \\qquad (9)$$\n\n(ii) if additionally $d = d_{VC}(H) < \\infty$, then\n\n$$\\mu^m(Q_\\epsilon^m) \\le A_{\\epsilon,m,\\gamma} C_{\\omega,m} \\sum_{j \\ge \\gamma m} \\binom{m}{j} \\left(2^{-2m} (2em/d)^d\\right)^{1 - |1 - j/m|^\\omega}. \\qquad (10)$$\n\n3 Examples of learning curves\n\nIn this section we evaluate the above formalism on two examples of simple neural networks.\n\n[Figure 2: two panels. Panel (a): horizontal axis $m/(d+1)$; legend: $\\omega = 2$: dotted line; $\\omega = 3$: chain line. Panel (b): horizontal axis $m$; curves labelled $\\omega = 2$ and $\\omega = 3$.]\n\nFigure 2: (a) Different learning curves for the higher order neuron (analogous to Fig. 1.b). Solid lines (top to bottom): (i) for the VC-theory bound (Corollary 3.i) with VC-dimension $d + 1 = 21$; (ii) for the bound (5) with $\\gamma = \\epsilon$ and the upper bounds $|H_\\epsilon|_i^m \\le |H|_i^m$ with $|H|_i^m$ given by (15); (iii) true learning curve (the upper bound given by (18)). The $\\omega$-uniformity bounds/approximations are plotted as chain and dotted lines for the minimal $C_{\\omega,m}$ satisfying (8), and as broken (long broken) lines for $C_{\\omega,m} = const = 1$ with $\\omega = 2$ ($\\omega = 3$).\n(b) Plots of the minimal value of $C_{\\omega,m}$ satisfying the condition of the $\\omega$-uniformity bound (8) for the higher order neuron and selected values of $\\omega$.\n\n3.1 2-dimensional homogeneous perceptron\n\nWe consider $X \\stackrel{def}{=} R^2$ and $H$ defined as the family of all functions $(\\xi_1, \\xi_2) \\mapsto \\theta(\\xi_1 w_1 + \\xi_2 w_2)$, where $(w_1, w_2) \\in R^2$ and $\\theta(r)$ is defined as 1 if $r \\ge 0$ and 0 otherwise, and the probability measure $\\mu$ on $R^2$ has rotational symmetry with respect to the origin. Fix an arbitrary target $t \\in H$. In such a case\n\n$$|H_\\epsilon|_i^m = \\begin{cases} 2(1 - \\epsilon)^m - (1 - 2\\epsilon)^m & (\\text{for } i = 0 \\text{ and } 0 \\le \\epsilon \\le 1/2), \\\\ 1 & (\\text{for } i = m), \\\\ 2 \\sum_{j=0}^{i} \\binom{m}{j} \\epsilon^j (1 - \\epsilon)^{m-j} & (\\text{otherwise}). \\end{cases} \\qquad (11)$$\n\nIn particular we find that $|H|_i^m = 1$ for $i = 0, m$ and $|H|_i^m = 2$ otherwise, and\n\n$$P_H(m) = \\sum_{i=0}^{m} |H|_i^m / 2^m = (1 + 2 + \\cdots + 2 + 1)/2^m = m/2^{m-1}, \\qquad (12)$$\n\nand the true learning curve is\n\n$$\\epsilon_H(m) = 1.5 (m + 1)^{-1}. \\qquad (13)$$\n\nThe latter expression results from Lemma 1 and the equality\n\n$$\\mu^m(Q_\\epsilon^m) = \\begin{cases} 2(1 - \\epsilon)^m - (1 - 2\\epsilon)^m & (\\text{for } 0 \\le \\epsilon \\le 1/2), \\\\ 2(1 - \\epsilon)^m & (\\text{for } 1/2 < \\epsilon \\le 1). \\end{cases} \\qquad (14)$$\n\nDifferent learning curves (bounds and approximations) for the homogeneous perceptron are plotted in Figure 1.b.\n\n3.2 1-dimensional higher order neuron\n\nWe consider $X \\stackrel{def}{=} [0, 1] \\subset R$ with a continuous probability distribution $\\mu$. Define the hypothesis space $H \\subset \\{0, 1\\}^X$ as the set of all functions of the form $\\theta \\circ p(x)$ where $p$ is a polynomial of degree $\\le d$ on $R$. Let the target be constant, $t \\equiv 1$. It is easy to see that $H$ restricted to a finite subset of $[0, 1]$ is exactly the restriction of the family of all functions $\\tilde{H} \\subset \\{0, 1\\}^{[0,1]}$ with up to $d$ \"jumps\" from 0 to 1 or 1 to 0 and thus $d_{VC}(H) = d + 1$. With probability 1 an $m$-sample $\\vec{x} = (x_1, \\ldots, x_m)$ from $X^m$ is such that $x_i \\ne x_j$ for $i \\ne j$. For such a generic $\\vec{x}$, $|\\pi_{t,\\vec{x}}(H) \\cap \\mathcal{E}_i^m| = const = |H|_i^m$. This observation was used to derive the following relations for the computation of $|H|_i^m$:\n\n$$|H|_i^m = \\sum_{\\delta=0}^{\\min(d, m-1)} \\left( |\\tilde{H}^{(\\delta)}|_i^m + |\\tilde{H}^{(\\delta)}|_{m-i}^m \\right), \\qquad (15)$$\n\nfor $0 \\le i \\le m$, where $|\\tilde{H}^{(\\delta)}|_i^m$, for $\\delta = 0, 1, \\ldots, d$, is defined as follows. We initialize $|\\tilde{H}^{(0)}|_0^m = |\\tilde{H}^{(1)}|_i^m \\stackrel{def}{=} 1$ for $i = 1, \\ldots, m-1$, $|\\tilde{H}^{(1)}|_0^m = |\\tilde{H}^{(1)}|_m^m \\stackrel{def}{=} 0$ and $|\\tilde{H}^{(\\delta)}|_i^m \\stackrel{def}{=} 0$ for $i = 0, 1, \\ldots, m$, $\\delta = 2, 3, \\ldots, d$, and then, recurrently, for $\\delta \\ge 2$ we set $|\\tilde{H}^{(\\delta)}|_i^m \\stackrel{def}{=} \\sum_{k=\\max(\\delta, m-i)}^{m-1} |\\tilde{H}^{(\\delta-1)}|_{i-m+k}^{k}$ if $\\delta$ is odd and $|\\tilde{H}^{(\\delta)}|_i^m \\stackrel{def}{=} \\sum_{k=\\delta}^{m-1} |\\tilde{H}^{(\\delta-1)}|_i^k$ if $\\delta$ is even. (Here $|\\tilde{H}^{(\\delta)}|_i^m$ is defined by the relation (4) with the target $t \\equiv 1$ for the hypothesis space $H^{(\\delta)} \\subset \\tilde{H}$ composed of functions having the value 1 near 0 and exactly $\\delta$ jumps in $(0, 1)$, exactly at entries of $\\vec{x}$; similarly as for $H$, $|H^{(\\delta)}|_i^m = |\\pi_{1,\\vec{x}} H^{(\\delta)} \\cap \\mathcal{E}_i^m|$ for a generic $m$-sample $\\vec{x} \\in (0, 1)^m$.)\n\nAnalyzing an embedding of $R$ into $R^d$, and using an argument based on the Vandermonde determinant as in [6, 13], it can be proved that the partition function $\\Pi_H$ is given by Cover's counting function [4], and that\n\n[equation (16) is not recoverable from the extracted text]\n\nFor the uniform distribution on $[0, 1]$ and a generic $\\vec{x} \\in [0, 1]^m$ let $\\lambda_k(\\vec{x})$ denote the sum of the $k$ largest segments of the partition of $[0, 1]$ into $m + 1$ segments by the entries of $\\vec{x}$. Then\n\n$$\\lambda_{\\lfloor d/2 \\rfloor}(\\vec{x}) \\le \\epsilon_H^{max}(\\vec{x}) \\le \\lambda_{\\lfloor d/2 \\rfloor + 1}(\\vec{x}). \\qquad (17)$$\n\nAn explicit expression for the expected value of $\\lambda_k$ is known [11], thus a very tight bound on the true learning curve $\\epsilon_H(m)$ defined by (2) can be obtained:\n\n$$\\frac{\\lfloor d/2 \\rfloor}{m+1} \\left(1 + \\sum_{i=\\lfloor d/2 \\rfloor + 1}^{m+1} \\frac{1}{i}\\right) \\le \\epsilon_H(m) \\le \\frac{\\lfloor d/2 \\rfloor + 1}{m+1} \\left(1 + \\sum_{i=\\lfloor d/2 \\rfloor + 2}^{m+1} \\frac{1}{i}\\right). \\qquad (18)$$\n\nNumerical results are shown in Figure 2.\n\n4 Discussion and conclusions\n\nThe basic estimate (5) of Theorem 2 has been used to produce upper bounds on the learning curve (via Lemma 1) in three different ways: (i) using the exact values of the coefficients $|H_\\epsilon|_i^m$ (Fig. 1.b), (ii) using the estimate $|H_\\epsilon|_i^m \\le |H|_i^m$ and the values of $|H|_i^m$ and (iii) using the $\\omega$-uniformity bound (8) with the minimal value of $C_{\\omega,m}$ and as an \"approximation\" with $C_{\\omega,m} = const = 1$. Both examples of simple learning tasks considered in the paper allowed us to compare these results with the true learning curves (or their tight bounds) which can serve as benchmarks. 
\n\nFigure 1.a implies that the value of the parameter $\\omega$ in the $\\omega$-uniformity bound (approximation), which governs the distribution of error patterns between different error shells (c.f. [10]), has a significant impact on learning curve shapes, changing from slow decrease to rapid jumps (\"phase transitions\") in generalization.\n\nFigure 1.b shows that one loses tightness of the bound by using $|H|_i^m$ rather than $|H_\\epsilon|_i^m$, and even more is lost if $\\omega$-uniformity bounds (with variable $C_{\\omega,m}$) are employed. Inspecting Figures 1.b and 2.a we find that approximate approaches consisting of replacing $|H_\\epsilon|_i^m$ by a simple estimate ($\\omega$-uniformity) can produce learning curves very close to $|H|_i^m$-learning curves, suggesting that an application of this formalism to learning systems where neither $|H_\\epsilon|_i^m$ nor $|H|_i^m$ can be calculated might be possible. This could lead to a sensible approximate theory capturing at least certain qualitative properties of learning curves for more complex learning tasks.\n\nGenerally, the results of this paper show that by incorporating the limited knowledge of the statistical distribution of error patterns in the sample space one can dramatically improve bounds on the learning curve with respect to the classical universal estimates of the VC-theory. This is particularly important for \"practical\" training sample sizes ($m \\le 12 \\times$ VC-dimension) where the VC-bounds are void.\n\nAcknowledgement. The permission of Director, Telstra Research Laboratories, to publish this paper is gratefully acknowledged. A.K. acknowledges the support of the Australian Research Council.\n\nReferences\n\n(1) S. Amari, N. Fujita, and S. Shinomoto. Four types of learning curves. Neural Computation, 4(4):605-618, 1992.\n(2) M. Anthony and N. Biggs. Computational Learning Theory. 
Cambridge University Press, 1992.\n(3) A. Blumer, A. Ehrenfeucht, D. Haussler, and M.K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36:929-965, Oct. 1989.\n(4) T.M. Cover. Geometrical and statistical properties of linear inequalities with applications to pattern recognition. IEEE Trans. Elec. Comp., EC-14:326-334, 1965.\n(5) D. Haussler, M. Kearns, H.S. Seung, and N. Tishby. Rigorous learning curve bounds from statistical mechanics. In Proc. 7th Ann. ACM Conf. on Comp. Learn. Theory, pages 76-87, 1994.\n(6) A. Kowalczyk. Estimates of storage capacity of multi-layer perceptron with threshold logic hidden units. Neural Networks, to appear.\n(7) A. Kowalczyk. VC-formalism with explicit bounds on error shells size distribution. A manuscript, 1994.\n(8) A. Kowalczyk and H. Ferra. Generalisation in feedforward networks. Adv. in NIPS 7, The MIT Press, Cambridge, 1995.\n(9) A. Kowalczyk, J. Szymanski, and H. Ferra. Combining statistical physics with VC-bounds on generalisation in learning systems. In Proc. ACNN'95, Sydney, 1995. University of Sydney.\n(10) A. Kowalczyk, J. Szymanski, and R.C. Williamson. Learning curves from a modified VC-formalism: a case study. In Proceedings of ICNN'95, Perth (CD-ROM), volume VI, pages 2939-2943, Rundle Mall, South Australia, 1995. IEEE/Causal Productions.\n(11) J.G. Mauldon. Random division of an interval. Proc. Cambridge Phil. Soc., 47:331-336, 1951.\n(12) K.R. Muller, M. Finke, N. Murata, and S. Amari. On large scale simulations for learning curves. In Proc. ACNN'95, pages 45-48, Sydney, 1995. University of Sydney.\n(13) A. Sakurai. n-h-1 networks store no less n*h+1 examples but sometimes no more. In Proceedings of the 1992 International Conference on Neural Networks, pages III-936-III-941. IEEE, June 1992.
\n(14) H. Sompolinsky, H.S. Seung, and N. Tishby. Statistical mechanics of learning curves. Physical Review A, 45:6056-6091, 1992.\n(15) V. Vapnik. Estimation of Dependences Based on Empirical Data. Springer-Verlag, 1982.\n(16) V. Vapnik, E. Levin, and Y. Le Cun. Measuring the VC-dimension of a learning machine. Neural Computation, 6(5):851-876, 1994.\n(17) C. Wang and S.S. Venkatesh. Temporal dynamics of generalisation in neural networks. Adv. in NIPS 7, The MIT Press, Cambridge, 1995.", "award": [], "sourceid": 1086, "authors": [{"given_name": "Adam", "family_name": "Kowalczyk", "institution": null}, {"given_name": "Jacek", "family_name": "Szymanski", "institution": null}, {"given_name": "Peter", "family_name": "Bartlett", "institution": null}, {"given_name": "Robert", "family_name": "Williamson", "institution": null}]}