{"title": "Designing Application-Specific Neural Networks Using the Genetic Algorithm", "book": "Advances in Neural Information Processing Systems", "page_first": 447, "page_last": 454, "abstract": null, "full_text": "Designing Application-Specific Neural Networks \n\n447 \n\nDesigning Application-Specific Neural Networks Using the Genetic Algorithm \n\nSteven A. Harp, Tariq Samad, Aloke Guha \n\nHoneywell SSDC \n\n1000 Boone Avenue North \nGolden Valley, MN 55427 \n\nABSTRACT \n\nWe present a general and systematic method for neural network design based on the genetic algorithm. The technique works in conjunction with network learning rules, addressing aspects of the network's gross architecture, connectivity, and learning rule parameters. Networks can be optimized for various application-specific criteria, such as learning speed, generalization, robustness and connectivity. The approach is model-independent. We describe a prototype system, NeuroGENESYS, that employs the backpropagation learning rule. Experiments on several small problems have been conducted. In each case, NeuroGENESYS has produced networks that perform significantly better than the randomly generated networks of its initial population. The computational feasibility of our approach is discussed. \n\n1 INTRODUCTION \n\nWith the growing interest in the practical use of neural networks, addressing the problem of customizing networks for specific applications is becoming increasingly critical. It has repeatedly been observed that different network structures and learning parameters can substantially affect performance. 
Such important aspects of neural network applications as generalization, learning speed, connectivity and tolerance to network damage are strongly related to the choice of network architecture. Yet there are few analytic results, and few heuristics, that can help the application developer design an appropriate network. \n\nWe have been investigating the use of the genetic algorithm (Goldberg, 1989; Holland, 1975) for designing application-specific neural networks (Harp, Samad and Guha, 1989ab). In our approach, the genetic algorithm is used to evolve appropriate network structures and values of learning parameters. In contrast, other recent applications of the genetic algorithm to neural networks (e.g., Davis, 1988; Whitley, 1988) have largely restricted its role to updating the weights of a predetermined network structure, which is another logical approach. \n\nSeveral first-generation neural network application development tools already exist. However, they are only partly effective: the complexity of the problem, our limited understanding of the interdependencies between various network design choices, and the extensive human effort involved permit only limited exploration of the design space. An objective of our research is the development of a next-generation neural network application development tool that can synthesize optimized custom networks. The genetic algorithm is noted for its relative immunity to high dimensionality, local minima and noise, and it is therefore a logical candidate for solving the network optimization problem. \n\n2 GENETIC SYNTHESIS OF NEURAL NETWORKS \n\nFig. 1 outlines our approach. 
A network is represented by a blueprint, a bitstring that encodes a number of characteristics of the network, including structural properties and learning parameter values. Each blueprint directs the creation of an actual network with random initial weights. An instantiated network is trained using a predetermined training algorithm and training data, and the trained network can then be tested in various ways, e.g., on non-training inputs, after disabling some units, and after perturbing learned weight values. After testing, a network is evaluated: a fitness estimate is computed for it based on appropriate criteria. This process of instantiation, training, testing and evaluation is performed for each of a population of blueprints. \n\nAfter the entire population is evaluated, the next generation of blueprints is produced. Several genetic operators are employed, the most prominent being crossover, in which two parent blueprints are spliced together to produce a child blueprint (Goldberg, 1989). The higher the fitness of a blueprint, the greater its probability of being selected as a parent for the subsequent generation. Characteristics that are found useful will thereby tend to be emphasized in the next generation, whereas harmful ones will tend to be suppressed. \n\nThe definition of network performance depends on the application. If the application requires good generalization, the results of testing on (appropriately chosen) non-training data are important. If a network capable of real-time learning is required, the learning rate must be optimized. For fast response, the size of the network must be minimized. 
If hardware (especially VLSI) implementation is a consideration, low connectivity is essential. In most applications several such criteria must be considered. This important aspect of application-specific network design is covered by the fitness function. In our approach, the fitness of a network can be an arbitrary function of several distinct performance and cost criteria, some or all of which can thereby be simultaneously optimized. \n\n[Figure 1: A population of network \"blueprints\" is cyclically updated by the genetic algorithm based on their fitness. Diagram labels: Genetic Algorithm; Sampling & Synthesis of Network Blueprints; Network Performance Evaluation; Test Stimuli; blueprint; fitness estimates; testing.] \n\n3 NEUROGENESYS \n\nOur approach is model-independent: it can be applied to any existing or future neural network model (including models without a training component). As a first prototype implementation we have developed a working system called NeuroGENESYS. The current implementation uses a variant (Samad, 1988) of the backpropagation learning algorithm (Werbos, 1974; Rumelhart, Hinton, and Williams, 1985) as the training component and is restricted to feedforward networks. \n\nWithin these constraints, NeuroGENESYS is a reasonably general system. Networks can have arbitrary directed acyclic graph structures, where each vertex of the graph corresponds to an area, or layer, of units and each edge to a projection from one area to another. Units in an area have a spatial organization; the current system arranges units in two dimensions. 
Each projection specifies independent radii of connectivity, one for each dimension. The radii of connectivity allow localized receptive field structures, and within the receptive fields connection densities can be specified. Two learning parameters are associated with both projections and areas. Each projection has a learning rate parameter (\u03b7 in backpropagation) and a decay rate for \u03b7. Each area has \u03b7 and \u03b7-decay parameters for threshold weights. \n\nThese network characteristics are encoded in the genetic blueprint. This bitstring is composed of several segments, one for each area. An area segment consists of an area parameter specification (APS) and a variable number of projection specification fields (PSFs), each of which describes a projection from the area to some other area. The APS and the PSFs contain values for several parameters of areas and projections, respectively. Fig. 2 shows a simple area segment. Note that the target of a projection can be specified through either Absolute or Relative addressing. More than one projection is possible between two given areas; this allows the generation of receptive field structures at different scales and with different connection densities, and it also allows the system to model the effect of larger initial weights. In our current implementation, all initial weights are small random values drawn from a fixed uniform distribution. In the near future, we intend to incorporate some aspects of this distribution in the genetic blueprint. \n\n[Figure 2: Network blueprint representation. An area segment begins with the area parameters (X-Share, Y-Share, Initial Threshold Eta, Threshold Eta Decay), followed, after a start-of-projection marker, by the projection parameters (Connection Density, Initial Eta, Eta Decay, X-Radius, Y-Radius, Target Address, Address Mode).] \n\nIn NeuroGENESYS, the score of a blueprint is computed as a linear weighted sum of several performance and cost criteria. These include learning speed, the results of testing on a \"test set\", the numbers of units and weights in the network, the results of testing (on the training set) after disabling some of the units, the results of testing (on the training set) after perturbing the learned weight values, the average fanout of the network, and the maximum fanout for any unit in the network. Other criteria can be incorporated as needed. The user of NeuroGENESYS supplies the weighting factors at the start of the experiment, thereby controlling which aspects of the network are to be optimized. \n\n4 EXPERIMENTS \n\nNeuroGENESYS can be used for both classification and function approximation problems. We have conducted experiments on three classification problems (digit recognition from 4x8 pixel images, exclusive-OR (XOR), and simple convexity detection) and on one function approximation problem (modeling one cycle of a sine function). Various combinations of the above criteria have been used. In most experiments NeuroGENESYS has produced appropriate network designs in a relatively small number of generations (fewer than 50). \n\nOur first experiment was with digit recognition, and NeuroGENESYS produced a solution that surprised us: the optimized networks had no hidden layers yet learned perfectly. It had not been obvious to us that this digit recognition problem is linearly separable. 
Even in the simple case of no-hidden-layer networks, our earlier remarks on application-specific design can be appreciated. When NeuroGENESYS was asked to optimize for average fanout as well as for perfect learning on the digit recognition task, the best network produced learned perfectly (although comparatively slowly) and had an average fanout of three connections per unit. With learning speed as the sole optimization criterion, the best network produced learned substantially faster (48 iterations), but its average fanout was almost an order of magnitude higher. \n\nThe XOR problem, of course, is the prototypical problem that is not linearly separable. In this case, NeuroGENESYS produced many fast-learning networks that had a \"bypass\" connection from the input layer directly to the output layer (in addition to connections to and from hidden layers); we hypothesize, but have not yet verified, that these bypass connections accelerate learning. \n\nIn one of our experiments on the sine function problem, NeuroGENESYS was asked to design networks for moderate accuracy: the error cutoff during training was relatively high. The networks produced typically had one hidden layer of two units, which is the minimum possible configuration for a sufficiently crude approximation. When the experiment was repeated with a low error cutoff, intricate multilayer structures were produced that were capable of modeling the training data very accurately (Fig. 3). Fig. 4 shows the learning curve for one sine function experiment. The \"Average\" and \"Best\" curves give the average and best scores over all individuals in each generation, while \"Online\" and \"Offline\" are running averages of Average and Best, respectively. Performance on this problem is quite sensitive to initial weight values, hence the non-monotonicity of the Best curve. Steady overall progress was still being observed when the experiment was terminated. \n\n[Figure 3: The NeuroGENESYS interface, showing a network structure optimized for the sine function problem.] \n\nWe have conducted control studies using random search (with best retention) instead of the genetic algorithm. The genetic algorithm has consistently proved superior. Random search is the weakest possible optimization procedure, but on the other hand there are few sophisticated alternatives for this problem: the search space is discontinuous, largely unknown, and highly nonlinear. \n\n5 COMPUTATIONAL EFFICIENCY \n\nOur approach requires the evaluation of a large number of networks. Even on some of our small-scale problems, experiments have taken a week or longer, the bottleneck being the neural network training algorithm. While computational feasibility is a real concern, for several reasons we are optimistic that this approach will be practical for realistic applications: \n\n\u2022 The hardware platform for our experiments to date has been a Symbolics computer without any floating-point support. This choice has been ideal for program development, and NeuroGENESYS' user interface features would not have been possible without it, but the performance penalty has been severe (relative to machines with floating-point hardware). \n\n\u2022 The genetic algorithm is an inherently parallel optimization procedure, a feature we hope to take advantage of soon. 
We have recently implemented a networked version of NeuroGENESYS that will allow us to retain the desirable aspects of the Symbolics version and yet achieve a substantial speedup in execution (we expect two to three orders of magnitude): up to 30 Apollo workstations, a VAX, and 10 Symbolics computers can now evaluate different networks in parallel (Harp, Samad and Guha, 1990). \n\n\u2022 The current version of NeuroGENESYS employs the backpropagation learning rule, which is notoriously slow for many applications. However, faster-learning extensions of backpropagation are continually being developed. We have incorporated one recent extension (Samad, 1988), but others could also be considered, especially common ones such as including a \"momentum\" term in the weight update rule (Rumelhart, Hinton and Williams, 1985). More generally, learning in neural networks is a topic of intensive research, and it is likely that more efficient learning algorithms will become popular in the near future. \n\n[Figure 4: A learning curve for the sine function problem. Panel title: Accuracy on the SINE Function; x-axis: Generation; curves: best, average, offline, online.] \n\n\u2022 The genetic algorithm is an active field of research itself. Improvements, many of which are concerned with convergence properties, are frequently being reported and could significantly reduce the computational requirements for its application. \n\n\u2022 The genetic algorithm is an iterative optimization procedure that, on average, produces better solutions with each passing generation. Unlike some other optimization techniques, it yields useful results during a run. The genetic algorithm can thus take advantage of whatever time and computational resources are available for an application. \n\n\u2022 Just as there is no strict termination requirement for the genetic algorithm, there is no constraint on its initialization. In our experiments, the zeroth generation consisted of randomly generated networks. Not surprisingly, almost all of these are poor performers. However, better ways of selecting the initial population are possible. In particular, the initial population can consist of manually optimized networks. Manual optimization of neural networks is currently the norm, but it leaves much of the design space unexplored. Our approach would allow a human application developer to design one or more networks that could be the starting point for further, more systematic optimization by the genetic algorithm. Other initialization approaches are also possible, such as using optimized networks from similar applications, or using heuristic guidelines to generate networks. 
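The last two points (useful results at any time during a run, and freedom in choosing the zeroth generation) can be illustrated with a small generational genetic algorithm. The Python sketch below is not NeuroGENESYS. Its assumptions: the blueprint is a bare bitstring, and toy_fitness is a hypothetical stand-in for instantiating, training and testing a network, shaped as a weighted sum of one made-up benefit term and one made-up cost term. The operators are also assumptions: fitness-proportionate selection, one-point crossover and bit-flip mutation, with illustrative rates. The hypothetical evolve generator yields the best blueprint found so far after every generation, and its seeds argument lets hand-designed blueprints join random ones in the initial population.

```python
import random
from typing import Callable, Iterator, List, Optional, Tuple

# A blueprint is modeled as a plain list of bits; the real NeuroGENESYS
# segment layout (APS, PSFs, markers) is deliberately not reproduced here.
Blueprint = List[int]


def toy_fitness(bp: Blueprint) -> float:
    # Hypothetical stand-in for: instantiate, train, test, then score.
    # Shaped as a weighted sum of one made-up benefit term and one made-up
    # cost term; the result is always >= 1 so it can drive
    # fitness-proportionate selection.
    half = len(bp) // 2
    benefit = sum(bp[:half])   # pretend these bits help performance
    cost = sum(bp[half:])      # pretend these bits add connections
    return 1.0 + 2.0 * benefit + 1.0 * (len(bp) - half - cost)


def evolve(
    fitness: Callable[[Blueprint], float],
    length: int,
    pop_size: int = 20,
    seeds: Optional[List[Blueprint]] = None,
    crossover_rate: float = 0.7,
    mutation_rate: float = 0.01,
    rng: Optional[random.Random] = None,
) -> Iterator[Tuple[int, Blueprint, float]]:
    # Generational GA; length must be at least 2 for one-point crossover.
    rng = rng or random.Random(0)  # fixed default seed for reproducibility
    # Zeroth generation: hand-designed seed blueprints first, random ones after.
    population = [list(s) for s in (seeds or [])][:pop_size]
    while len(population) < pop_size:
        population.append([rng.randint(0, 1) for _ in range(length)])

    best: Blueprint = []
    best_fit = float('-inf')
    generation = 0
    while True:
        scores = [fitness(bp) for bp in population]
        for bp, score in zip(population, scores):
            if score > best_fit:
                best, best_fit = list(bp), score
        # Anytime property: the best-so-far blueprint is available after
        # every generation, so the caller may stop at any point.
        yield generation, list(best), best_fit

        children: List[Blueprint] = []
        while len(children) < pop_size:
            # Fitness-proportionate (roulette-wheel) choice of two parents.
            mom, dad = rng.choices(population, weights=scores, k=2)
            if rng.random() < crossover_rate:
                cut = rng.randint(1, length - 1)  # one-point crossover
                child = mom[:cut] + dad[cut:]
            else:
                child = list(mom)
            # Bit-flip mutation, applied independently to each bit.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = children
        generation += 1


if __name__ == '__main__':
    hand_designed = [[1] * 8 + [0] * 8]  # hypothetical manually designed blueprint
    for gen, best, score in evolve(toy_fitness, length=16, seeds=hand_designed):
        print(gen, score)
        if gen == 10:  # stop whenever the time budget runs out
            break
```

Because evolve is a generator, stopping early costs nothing beyond the generations already evaluated. In a real system nearly all of the time would be spent inside the fitness call (network training), which is also the natural unit to distribute across machines.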
\n\nIt should be emphasized that computational efficiency is not the only factor that must be considered in evaluating this (or any) approach. Other factors are also critically important: the potential for improved performance of neural network applications, and the costs and benefits associated with alternative approaches for designing network applications. \n\n6 FUTURE RESEARCH \n\nIn addition to running further experiments, we hope in the future to develop versions of NeuroGENESYS for other network models, including hybrid models that incorporate supervised and unsupervised learning components. \n\nSpace restrictions have precluded a detailed description of NeuroGENESYS and our experiments. The interested reader is referred to Harp, Samad, and Guha (1989ab, 1990). \n\nReferences \n\nDavis, L. (1988). Properties of a hybrid neural network-classifier system. In Advances in Neural Information Processing Systems 1, D.S. Touretzky (Ed.). San Mateo: Morgan Kaufmann. \nGoldberg, D.E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley. \nHarp, S.A., T. Samad, and A. Guha (1989a). Towards the genetic synthesis of neural networks. Proceedings of the Third International Conference on Genetic Algorithms, J.D. Schaffer (Ed.). San Mateo: Morgan Kaufmann. \nHarp, S.A., T. Samad, and A. Guha (1989b). Genetic Synthesis of Neural Networks. Technical Report 14852-CC-1989-2. Honeywell SSDC, 1000 Boone Avenue North, Golden Valley, MN 55427. \nHarp, S.A., T. Samad, and A. Guha (1990). Genetic synthesis of neural network architecture. In The Genetic Algorithms Handbook, L.D. Davis (Ed.). New York: Van Nostrand Reinhold. (To appear.) \nHolland, J. (1975). 
Adaptation in Natural and Artificial Systems. Ann Arbor: University of Michigan Press. \nRumelhart, D.E., G.E. Hinton, and R.J. Williams (1985). Learning Internal Representations by Error Propagation. ICS Report 8506, Institute for Cognitive Science, UCSD, La Jolla, CA. \nSamad, T. (1988). Back-propagation is significantly faster if the expected value of the source unit is used for update. Neural Networks, 1, Sup. 1. \nWerbos, P. (1974). Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Ph.D. Thesis, Harvard University Committee on Applied Mathematics, Cambridge, MA. \nWhitley, D. (1988). Applying Genetic Algorithms to Neural Net Learning. Technical Report CS-88-128, Department of Computer Science, Colorado State University. \n\n\f", "award": [], "sourceid": 263, "authors": [{"given_name": "Steven", "family_name": "Harp", "institution": null}, {"given_name": "Tariq", "family_name": "Samad", "institution": null}, {"given_name": "Aloke", "family_name": "Guha", "institution": null}]}