{"title": "Learning Statistically Neutral Tasks without Expert Guidance", "book": "Advances in Neural Information Processing Systems", "page_first": 73, "page_last": 79, "abstract": null, "full_text": "Learning Statistically Neutral Tasks\nwithout Expert Guidance\n\nTon Weijters\nInformation Technology,\nEindhoven University,\nThe Netherlands\n\nAntal van den Bosch\nILK,\nTilburg University,\nThe Netherlands\n\nEric Postma\nComputer Science,\nUniversiteit Maastricht,\nThe Netherlands\n\nAbstract\n\nIn this paper, we question the necessity of levels of expert-guided\nabstraction in learning hard, statistically neutral classification\ntasks. We focus on two tasks, date calculation and parity-12, that\nare claimed to require intermediate levels of abstraction that must\nbe defined by a human expert. We challenge this claim by demonstrating\nempirically that a single hidden-layer BP-SOM network can\nlearn both tasks without guidance. Moreover, we analyze the network's\nsolution for the parity-12 task and show that its solution\nmakes use of an elegant intermediary checksum computation.\n\n1 Introduction\n\nBreaking up a complex task into many smaller and simpler subtasks facilitates\nits solution. Such task decomposition has proved to be a successful technique in\ndeveloping algorithms and in building theories of cognition. In their study and\nmodeling of the human problem-solving process, Newell and Simon [1] employed\nprotocol analysis to determine the subtasks human subjects employ in solving a\ncomplex task. Even nowadays, many cognitive scientists take task decomposition,\ni.e., the necessity of explicit levels of abstraction, as a fundamental property of\nhuman problem solving. Dennis Norris' [2] modeling study on the problem-solving\ncapacity of autistic savants is a case in point. 
In the study, Norris focuses on the\ndate-calculation task (i.e., to calculate the day of the week a given date fell on),\nwhich some autistic savants have been reported to perform flawlessly [3]. In an\nattempt to train a multi-layer neural network on the task, Norris failed to get a\nsatisfactory level of generalization performance. Only by decomposing the task into\nthree subtasks, and training separate networks on each of the subtasks, could the\ndate-calculation task be learned. Norris concluded that the date-calculation\ntask is solvable (learnable) only when it is decomposed into intermediary steps using\nhuman assistance [2].\n\nFigure 1: An example BP-SOM network (panel labels: MFN, SOM; legend: class A elements, class B elements, unlabelled element).\n\nThe date-calculation task is a very hard task for inductive learning algorithms,\nbecause it is a statistically neutral task: all conditional output probabilities on\nany input feature have chance values. Solving the task implies decomposing it,\nif possible, into subtasks that are not statistically neutral. The only suggested\ndecomposition of the date-calculation task known to date involves explicit assistance\nfrom a human supervisor [2]. This paper challenges the decomposition assumption\nby showing that the date-calculation task can be learned in a single step with an\nappropriately constrained single hidden-layer neural network. In addition, another\nstatistically neutral task, called the parity-n task (given an n-length bit string of\n1's and 0's, calculate whether the number of 1's is even or odd) is investigated. 
\nIn an experimental study by Dehaene, Bossini, and Giraux [4], it is claimed that\nhumans decompose the parity-n task by first counting over the input string, and\nthen performing the even/odd decision. In our study, parity-12 is shown to be learnable\nby a network with a single hidden layer.\n\n2 BP-SOM\n\nBelow we give a brief characterization of the functioning of BP-SOM. For details we\nrefer to [5]. The aim of the BP-SOM learning algorithm is to establish a cooperation\nbetween back-propagation (BP) learning and self-organizing map (SOM) learning in order to find adequately constrained\nhidden-layer representations for learning classification tasks. To achieve this aim,\nthe traditional multi-layer feedforward network (MFN) architecture [6] is combined with SOMs [7]: each hidden layer of\nthe MFN is associated with one SOM (see Figure 1). During training of the weights in\nthe MFN, the corresponding SOM is trained on the hidden-unit activation patterns.\n\nAfter a number of training cycles of BP-SOM learning, each SOM develops a two-dimensional\nrepresentation that is translated into classification information, i.e.,\neach SOM element is provided with a class label (one of the output classes of the\ntask). For example, let the BP-SOM network displayed in Figure 1 be trained on\na classification task which maps instances to either output class A or B. Three\ntypes of elements can be distinguished in the SOM: elements labelled with class A,\nelements labelled with class B, and unlabelled elements (no winning class could be\nfound). The two-dimensional representation of the SOM is used as an addition to\nthe standard BP learning rule [6]. Classification and reliability information from the\nSOMs is included when updating the connection weights of the MFN. 
The error of\na hidden-layer vector is an accumulation of the error computed by the BP learning\nrule, and a SOM-error. The SOM-error is the difference between the hidden-unit\nactivation vector and the vector of its best-matching element associated with the\nsame class on the SOM.\n\nAn important effect of including SOM information in the error signals is that clusters\nof hidden-unit activation vectors of instances associated with the same class tend\nto become increasingly similar to each other. On top of this effect, individual\nhidden-unit activations tend to become more streamlined, and often end up having\nactivations near one of a limited number of discrete values.\n\n3 The date-calculation task\n\nThe first statistically neutral calculation task we consider is the date-calculation\ntask: determining the day of the week on which a given date fell. (For instance,\nOctober 24, 1997 fell on a Friday.) Solving the task requires an algorithmic approach\nthat is typically hard for human calculators and requires one or more intermediate\nsteps. It is generally assumed that the identity of these intermediate steps follows\nfrom the algorithmic solution, although variations exist in the steps as reportedly\nused by human experts [2]. We will show that such explicit abstraction is not\nneeded, after reviewing the case for the necessity of \"human assistance\" in learning\nthe task.\n\n3.1 Date calculation with expert-based abstraction\n\nNorris [2] attempted to model autistic savant date calculators using a multi-layer\nfeedforward network (MFN) and the back-propagation learning rule [6]. 
He intended\nto build a model mimicking the behavior of the autistic savant without the need\neither to develop arithmetical skills or to encode explicit knowledge about regularities\nin the structure of dates. A standard multilayer network trained with\nbackpropagation [6] was not able to solve the date-calculation task. Although the\nnetwork was able to learn the examples used for training, it did not manage to\ngeneralize to novel date-day combinations. In a second attempt Norris split up the\ndate-calculation task into three simpler subtasks, each learned by a separate network.\n\nUsing the three-stage learning strategy Norris obtained a nearly perfect performance\non the training material and a performance of over 90% on the test material (errors\nare almost exclusively made on dates falling in January or February in leap years).\nHe concludes with the observation that \"The only reason that the network was able\nto learn so well was because it had some human assistance.\" [2, p. 285]. In addition,\nNorris claims that \"even if the [backpropagation] net did have the right number of\nlayers there would be no way for the net to distribute its learning throughout the\nnet such that each layer learned the appropriate step in computation.\" [2, p. 290].\n\n3.2 Date calculation without expert-based abstraction\n\nWe demonstrate that with the BP-SOM learning rule, a single hidden-layer feedforward\nnetwork can become a successful date calculator. Our experiment compares\nthree types of learning: standard backpropagation learning (BP, [6]), backpropagation\nlearning with weight decay (BPWD, [8]), and BP-SOM learning. 
Norris used\nBP learning in his experiment, which leads to overfitting [2] (a considerably lower\ngeneralization accuracy on new material as compared to reproduction accuracy on\ntraining material); BPWD learning was included to avoid overfitting.\nThe parameter values for BP (including the number of hidden units for each task)\nwere optimized by performing pilot experiments with BP. The optimal learning-rate\nand momentum values were 0.15 and 0.4, respectively. BP, BPWD, and BP-SOM were\ntrained for a fixed number of cycles m = 2000. Early stopping, a common method\nto prevent overfitting, was used in all experiments with BP, BPWD, and BP-SOM [9].\n\nTable 1: Average generalization performances (plus standard deviation, after '\u00b1';\naveraged over ten experiments) in terms of incorrectly-processed training and test\ninstances, of BP, BPWD, and BP-SOM, trained on the date-calculation task and the\nparity-12 task.\n\nIn our experiments with BP-SOM, we used the same interval of dates as used by\nNorris, i.e., training and test dates ranged from January 1, 1950 to December 31,\n1999. We generated two training sets, each consisting of 3,653 randomly selected\ninstances, i.e., one-fifth of all dates. We also generated two corresponding test sets\nand two validation sets (with 1,000 instances each) of new dates within the same\n50-year period. In all our experiments, the training set, test set, and validation set\nhad empty intersections. We partitioned the input into three fields, representing\nthe day of the month (31 units), the month (12 units) and the year (50 units). The\noutput is represented by 7 units, one for each day of the week. The MFN contained\none hidden layer with 12 hidden units for BP, and 25 hidden units for BPWD and\nBP-SOM. 
The SOM of the BP-SOM network contained 12 x 12 elements. Each of the\nthree learning types was tested on two different data sets. Five runs with different\nrandom weight initializations were performed on each set, yielding ten runs per\nlearning type. The averaged classification errors on the test material are reported\nin Table 1.\n\nFrom Table 1 it follows that the average classification error of BP is high: on test\ninstances BP yields a classification error of 28.8%, while the classification error of\nBP on training instances is 20.8%. Compared to the classification error of BP, the\nclassification errors on both training and test material of BPWD and BP-SOM are\nmuch lower. However, BPWD's generalization performance on the test material is\nconsiderably worse than its performance on the training material: a clear indication\nof overfitting. We note in passing that the results of BPWD contrast with Norris'\n[2] claim that BP is unable to learn the date-calculation task when it is not decomposed\ninto subtasks. The inclusion of weight decay in BP is sufficient for a good\napproximation of the performance results of Norris' decomposed network.\n\nThe results in Table 1 also show that the performance of BP-SOM on test material\nis significantly better than that of BPWD (t(19)=7.39, p<0.001); BP-SOM has\nlearned the date-calculation task at a level well beyond the average of human date\ncalculators as reported by Norris [2]. In contrast with Norris' pre-structured network,\nBP-SOM does not rely on expert-based levels of abstraction for learning the\ndate-calculation task. 
\n\n4 The parity-12 task\n\nThe parity-n problem, starting from the XOR problem (parity-2), continues to\nbe a relevant topic on the agenda of many neural network and machine learning\nresearchers. Its definition is simple (determine whether there is an odd or even\nnumber of 1's in an n-length bit string of 1's and 0's), but established state-of-the-art\nalgorithms such as C4.5 [10] and backpropagation [6] cannot learn it even with small\nn, i.e., backpropagation fails with n \u2265 4 [11]. That is, these algorithms are unable\nto generalize from learning instances of a parity-n task to unseen new instances of\nthe same task. As with date calculation, this is due to the statistical neutrality\nof the task. The solution of the problem must lie in having some comprehensive\noverview over all input values at an intermediary step before the odd/even decision\nis made. Indeed, humans appear to follow this strategy [4].\n\nFigure 2: Graphic representation of a 7 x 7 SOM associated with a BP-trained MFN\n(left) and a BPWD-trained MFN (middle), and a 7 x 7 SOM associated with a BP-SOM\nnetwork (right), all trained on the parity-12 task.\n\nAnalogous to our study of the date-calculation task presented in Section 3, we apply\nBP, BPWD, and BP-SOM to the parity-n task. We have selected n to be 12. The\ntraining set contained 1,000 different instances selected at random out of the set of\n4,096 possible bit strings. The test set and the validation set contained 100 new\ninstances each. The hidden layer of the MFN in all three algorithms contained 20\nhidden units, and the SOM in BP-SOM contained 7 x 7 elements. The algorithms\nwere run with 10 different random weight initializations. 
Table 1 displays the classification\nerrors on training instances and test instances.\n\nAnalysis of the results shows that BP-SOM performs significantly better than BP and\nBPWD on test material (t(19)=3.42, p<0.01 and t(19)=2.42, p<0.05, respectively).\n(The average error of 6.2% made by BP-SOM stems from a single experiment out\nof the ten performing at chance level, and the remaining nine yielding about 1%\nerror.) BP-SOM is able to learn the parity-12 task quite accurately; BP and BPWD\nfail by comparison, which is consistent with other findings [11].\n\nAs an additional analysis, we have investigated the differences in hidden-unit activations\nafter training with the three learning algorithms. To visualize the differences\nbetween the representations developed at the hidden layers of the MFNs trained with\nBP, BPWD, and BP-SOM, we also trained SOMs with the hidden-layer activities of\nthe trained BP and BPWD networks. The left part of Figure 2 visualizes the class\nlabelling of the SOM attached to the BP-trained MFN after training; the middle part\nvisualizes the SOM of the BPWD-trained MFN, and the right part displays the SOM of\nthe BP-SOM network after training on the same material. The SOM of the BP-SOM\nnetwork is much more organized and clustered than the SOMs corresponding\nto the BP-trained and BPWD-trained MFNs. The reliability values of the elements\nof all three SOMs are represented by the width of the black and white squares. It\ncan be seen that the overall reliability and the degree of clusteredness of the SOM of\nthe BP-SOM network are considerably higher than those of the SOMs of the BP-trained\nand BPWD-trained MFNs. 
\n\n5 How parity-12 is learned\n\nGiven the hardness of the task and the supposed necessity of expert guidance, and\ngiven BP-SOM's success in learning parity-12 in contrast, it is relevant to analyze\nwhat solution was found in the BP-SOM learning process. In this section we\nprovide such an analysis, and show that the trained network performs an elegant\nchecksum calculation at the hidden layer as the intermediary step.\n\nTable 2: List of some training instances of the parity-12 task associated with SOM\nelements (1,1), (2,4), and (3,3) of a trained BP-SOM network.\n\nColumns: in1 in2 in3 in4 in5 in6 in7 in8 in9 in10 in11 in12 checksum\nSOM (1,1), class-even, reliability 1.0: three instances, each with checksum -2\nSOM (2,4), class-odd, reliability 1.0: three instances, each with checksum -1\nSOM (3,3), class-even, reliability 1.0: three instances, each with checksum 0\nSign markers: in1 to in3 negative (-), in4 to in6 positive (+), in7 to in9 negative (-), in10 to in12 positive (+)\n\nAll elements of SOMs of BP-SOM networks trained on the parity-12 task are either\nthe prototype for training instances that are all labeled with the same class, or\nthe prototype of no instances at all. 
Non-empty elements (the black and white squares\nin the right part of Figure 2) can thus be seen as containers of homogeneously-labeled\nsubsets of the training set (i.e., fully reliable elements). The first step of our\nanalysis consists of collecting, after training, for each non-empty SOM element all\ntraining instances clustered at that SOM element. As an illustration, Table 2 lists\nsome training instances clustered at the SOM elements at coordinates (1,1), (2,4),\nand (3,3). At first sight the only common property of instances associated with\nthe same SOM element is the class to which they belong; e.g., all instances of SOM\nelement (1,1) are even, all instances of SOM element (2,4) are odd, and all instances\nof SOM element (3,3) are again even.\n\nThe second step of our analysis focuses on the sign of the weights of the connections\nbetween input and hidden units. Surprisingly, we find that the connections of each\nindividual input unit to all hidden units have the same sign; each input unit can\ntherefore be labeled with a sign marker (as displayed at the bottom of Table 2).\nThis allows the clustering on the SOM to become interpretable. All weights from\ninput units 1, 2, 3, 7, 8, and 9 to all units of the hidden layer are negative; all weights\nfrom input units 4, 5, 6, 10, 11, and 12 to all units of the hidden layer are positive. At\nthe hidden layer, this information is gathered as if a checksum is computed; each\nSOM element contains instances that add up to an identical checksum. This can\nalready be seen using only the sign information rather than the specific weights.\nFor instance, all instances clustered at SOM element (1,1) lead to a checksum of\n-2 when each input value is multiplied by the sign of its weights and the products are summed. 
\nAnalogously, all instances of cluster (2,4) add up to -1 and the instances of cluster\n(3,3) to zero. The same regularity is present in the instances of the other SOM\nelements.\n\nIn sum, the BP-SOM solution to the parity-12 task can be interpreted as transforming\nit at the hidden layer into a mapping of different, approximately discrete,\nchecksums to either class 'even' or 'odd'.\n\n6 Conclusions\n\nWe have performed two learning experiments in which the BP-SOM learning algorithm\nwas trained on the date-calculation task and on the parity-12 task. Both\ntasks are hard to learn because they are statistically neutral, but can be learned\nadequately and without expert guidance by the BP-SOM learning algorithm. The\neffect of the SOM part in BP-SOM (adequately constrained hidden-layer vectors, reliable\nclustering of vectors on the SOM, and streamlined hidden-unit activations)\nclearly contributes to this success.\n\nFrom the results of the experiments on the date-calculation task, we conclude that\nNorris' claim that, without human assistance, a backpropagation net would never\nlearn the date-calculation task is inaccurate. While BP with weight decay performs\nat Norris' target level of accuracy, BP-SOM performs even better. Apparently BP-SOM\nis able to distribute its learning throughout the net such that the two parts\nof the network (from input layer to hidden layer, and from hidden layer to output\nlayer) perform the mapping with an appropriate intermediary step. 
\n\nThe parity-12 experiment exemplified that such a discovered intermediary step can\nbe quite elegant; it consists of the computation of a checksum via the connection\nweights between the input and hidden layers. Unfortunately, a similar elegant\nsimplicity was not found in the connection weights and SOM clustering of the date-calculation\ntask; future research will be aimed at developing more generic analyses\nfor trained BP-SOM networks, so that automatically-discovered intermediary steps\nmay be made understandably explicit.\n\nReferences\n\n[1] Newell, A. and Simon, H. A. (1972). Human problem solving. Englewood Cliffs, NJ:\nPrentice-Hall.\n\n[2] Norris, D. (1989). How to build a connectionist idiot (savant). Cognition, 35, 277-291.\n\n[3] Hill, A. L. (1975). An investigation of calendar calculating by an idiot savant. American\nJournal of Psychiatry, 132, 557-560.\n\n[4] Dehaene, S., Bossini, S., and Giraux, P. (1993). The mental representation of parity\nand numerical magnitude. Journal of Experimental Psychology: General, 122, 371-396.\n\n[5] Weijters, A., Van den Bosch, A., and Van den Herik, H. J. (1997). Behavioural Aspects of\nCombining Backpropagation Learning and Self-organizing Maps. Connection Science,\n9, 235-252.\n\n[6] Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning internal representations\nby error propagation. In D. E. Rumelhart and J. L. McClelland (Eds.),\nParallel Distributed Processing: Explorations in the Microstructure of Cognition, volume\n1: Foundations (pp. 318-362). Cambridge, MA: The MIT Press.\n\n[7] Kohonen, T. (1989). Self-organisation and Associative Memory. Berlin: Springer\nVerlag.\n\n[8] Hinton, G. E. (1986). Learning distributed representations of concepts. 
In Proceedings\nof the Eighth Annual Conference of the Cognitive Science Society, 1-12. Hillsdale, NJ:\nErlbaum.\n\n[9] Prechelt, L. (1994). Proben1: A set of neural network benchmark problems and benchmarking\nrules. Technical Report 24/94, Fakult\u00e4t f\u00fcr Informatik, Universit\u00e4t Karlsruhe,\nGermany.\n\n[10] Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. San Mateo, CA: Morgan\nKaufmann.\n\n[11] Thornton, C. (1996). Parity: the problem that won't go away. In G. McCalla (Ed.),\nProceedings of AI-96, Toronto, Canada (pp. 362-374). Berlin: Springer Verlag.\n", "award": [], "sourceid": 1780, "authors": [{"given_name": "Ton", "family_name": "Weijters", "institution": null}, {"given_name": "Antal", "family_name": "van den Bosch", "institution": null}, {"given_name": "Eric", "family_name": "Postma", "institution": null}]}