{"title": "Evolving Learnable Languages", "book": "Advances in Neural Information Processing Systems", "page_first": 66, "page_last": 72, "abstract": "", "full_text": "Evolving Learnable Languages\n\nBradley Tonkes\nDept of Comp. Sci. and Elec. Engineering\nUniversity of Queensland\nQueensland, 4072\nAustralia\nbtonkes@csee.uq.edu.au\n\nAlan Blair\nDepartment of Computer Science\nUniversity of Melbourne\nParkville, Victoria, 3052\nAustralia\nblair@cs.mu.oz.au\n\nJanet Wiles\nDept of Comp. Sci. and Elec. Engineering, School of Psychology\nUniversity of Queensland\nQueensland, 4072\nAustralia\njanetw@csee.uq.edu.au\n\nAbstract\n\nRecent theories suggest that language acquisition is assisted by the evolution of languages towards forms that are easily learnable. In this paper, we evolve combinatorial languages which can be learned by a recurrent neural network quickly and from relatively few examples. Additionally, we evolve languages for generalization in different \"worlds\", and for generalization from specific examples. We find that languages can be evolved to facilitate different forms of impressive generalization for a minimally biased, general purpose learner. The results provide empirical support for the theory that the language itself, as well as the language environment of a learner, plays a substantial role in learning: that there is far more to language acquisition than the language acquisition device.\n\n1 Introduction: Factors in language learnability\n\nIn exploring issues of language learnability, the special abilities of humans to learn complex languages have been much emphasized, with one dominant theory based on innate, domain-specific learning mechanisms specifically tuned to learning human languages. 
It has been argued that without strong constraints on the learning mechanism, the complex syntax of language could not be learned from the sparse data that a child observes [1]. More recent theories challenge this claim and emphasize the interaction between learner and environment [2]. In addition to these two theories is the proposal that rather than \"language-savvy infants\", languages themselves adapt to human learners, and the ones that survive are \"infant-friendly languages\" [3-5]. To date, relatively few empirical studies have explored how such adaptation of language facilitates learning. Hare and Elman [6] demonstrated that classes of past tense forms could evolve over simulated generations in response to changes in the frequency of verbs, using neural networks. Kirby [7] showed, using a symbolic system, how compositional languages are more likely to emerge when learning is constrained to a limited set of examples. Batali [8] has evolved recurrent networks that communicate simple structured concepts.\n\nOur argument is not that humans are general purpose learners. Rather, current research questions require exploring the nature and extent of biases that learners bring to language learning, and the ways in which languages exploit those biases [2]. Previous theories suggesting that many aspects of language were unlearnable without strong biases are gradually breaking down as new aspects of language are shown to be learnable with much weaker biases. Studies include the investigation of how languages may exploit biases as subtle as attention and memory limitations in children [9]. 
A complementary study has shown that general purpose learners can evolve biases in the form of initial starting weights that facilitate the learning of a family of recursive languages [10].\n\nIn this paper we present an empirical paradigm for continuing the exploration of factors that contribute to language learnability. The paradigm we propose necessitates the evolution of languages comprising recursive sentences over symbolic strings: languages whose sentences cannot be conveyed without combinatorial composition of symbols drawn from a finite alphabet. The paradigm is not based on any specific natural language, but rather, it is the simplest task we could find to illustrate the point that languages with compositional structure can be evolved to be learnable from few sentences. The simplicity of the communication task allows us to analyze the language and its generalizability, and highlight the nature of the generalization properties.\n\nWe start with the evolution of a recursive language that can be learned easily from five sentences by a minimally biased learner. We then address issues of robust learning of evolved languages, showing that different languages support generalization in different ways. We also address a factor to which scant regard has been paid, namely that languages may evolve not just to their learners, but also to be easily generalizable from a specific set of concepts. It seems almost axiomatic that learning paradigms should sample randomly from the training domain. 
It may be that human languages are not learnable from random sentences, but are easily generalizable from just those examples that a child is likely to be exposed to in its environment. In the third series of simulations, we test whether a language can adapt to be learnable from a core set of concepts.\n\n2 A paradigm for exploring language learnability\n\nWe consider a simple language task in which two recurrent neural networks try to communicate a \"concept\" represented by a point in the unit interval, [0, 1], over a symbolic channel. An encoder network sends a sequence of symbols (thresholded outputs) for each concept, which a decoder network receives and processes back into a concept (the framework is described in greater detail in [11]). For communication to be successful, the decoder's output should approximate the encoder's input for all concepts.\n\nThe architecture for the encoder is a recurrent network with one input unit and five output units, and with recurrent connections from both the output and hidden units back to the hidden units. The encoder produces a sequence of up to five symbols (states of the output units) taken from \u03a3 = {A, ..., J}, followed by the $ symbol, for each concept taken from [0, 1]. To encode a value x \u2208 [0, 1], the network is presented with a sequence of inputs (x, 0, 0, ...). At each step, the output units of the network assume one of eleven states: all zero if no output is greater than 0.5 (denoted by $); or the saturation of the two highest activations at 1.0 and the remainder at 0.0 (denoted by A = [1, 1, 0, 0, 0] through J = [0, 0, 0, 1, 1]). If the zero output is produced, propagation is halted. Otherwise propagation continues for up to five steps, after which the output units assume the zero ($) state.\n\nFigure 1: Hierarchical decomposition of the language produced by an encoder, with the first symbols produced appearing near the root of the tree. The ordering of leaves in the tree represents the input space, smaller inputs being encoded by those sentences on the left. The examples used to train the best decoder found during evolution are highlighted. The decoder must generalize to all other branches. In order to learn the task, the decoder must generalize systematically to novel states in the tree, including generalizing to symbols in different positions in the sequence. (Figure 2 shows the sequence of states of a successful decoder.)\n\nThe decoder is a recurrent network with 5 input units and a single output, and a recurrent hidden layer. Previous work [11] has shown that due to conflicting constraints of the encoder and decoder, it is easier for the decoder to process strings which are in the reverse order to those produced by the encoder. Consequently, the input to the decoder is taken to be the reverse of the output from the encoder, except for $, which remains the last symbol. (For clarity, strings are written in the order produced by the encoder.) Each input pattern presented to the decoder matches the output of the encoder: either two units are active, or none are. The network is trained with backpropagation through time to produce the desired value, x, on presentation of the final symbol in the sequence ($).\n\nA simple hill-climbing evolutionary strategy with a two-stage evaluation function is used to evolve an initially random encoder into one which produces a language which a random decoder can learn easily from few examples. The evaluation of an encoder, mutated from the current \"champion\" by the addition of Gaussian noise to the weights, is performed against two criteria: (1) the mutated network must produce a greater variety of sequences over the range of inputs; and (2) a decoder with initially small random weights, trained on the mutated encoder's output, must yield lower sum-squared error across the entire range of inputs than the champion.\n\nEach mutant encoder is paired with a single decoder with initially random weights. If the mutant encoder-decoder pair is more successful than the champion, the mutant becomes champion and the process is repeated. Since the encoder's input space is continuous and impossible to examine in its entirety, the input range is approximated with 100 uniformly distributed examples from 0.00 to 0.99. The final output from the hill-climber is the language generated by the best encoder found.\n\n2.1 Evolving an easily learnable language\n\nHumans learn from sparse data. In the first series of simulations we test whether a compositional language can be evolved that learners can reliably and effectively learn from only five examples. From just five training examples, it seems unreasonable to expect that any decoder would learn the task. The task is intentionally hard in that a language is restricted to sequences of discrete symbols with which it must describe a continuous space. Note that simple linear interpolation is not possible due to the symbolic alphabet of the languages. Recursive solutions are possible but are unable to be learned by an unbiased learner. 
The decoder is a minimally-biased learner and, as the simulations showed, performed much better than arguments based on learnability theory would predict.\n\nTen languages were evolved with the hill-climbing algorithm (outlined above) for 10000 generations (footnote 1). For each language, 100 new random decoders were trained under the same conditions as during evolution (five examples, 400 epochs). All ten runs used encoders and decoders with five hidden units.\n\nAll of the evolved languages were learnable by some decoders (minimum 20, maximum 72, mean 48). A learner is said to have effectively learned the language if its sum-squared-error across the 100 points in the space is less than 1.0 (footnote 2). Encoders employed on average 36 sentences (minimum 21, maximum 60) to communicate the 100 points. The 5 training examples for each decoder were sampled randomly from [0, 1] and hence some decoders faced very difficult generalization tasks. The difficulty of the task is demonstrated by the language analyzed in Figures 1 and 2. The evolved languages all contained similar compositional structure to that of the language described in Figures 1 and 2. The inherent biases of the decoder, although minimal, are clearly sufficient for learning the compositional structure.\n\n3 Evolving languages for particular generalization\n\nThe first series of simulations demonstrates that we can find languages for which a minimally biased learner can generalize from few examples. In the next simulations we consider whether languages can be evolved to facilitate specific forms of generalization in their users. Section 2.1 considered the case where the decoder's required output was the same as the encoder's input. This setup yields the approximation to the line y = x in Figure 2. 
The compositional structure of the evolved languages allows the decoder to generalize to unseen regions of the space. In the following series of simulations we consider the relationship between the structure of a language and the way in which the decoder is required to generalize. This association is studied by altering the desired relationship between the encoder's input (x) and the decoder's output (y).\n\nTwo sets of ten languages were evolved, one set requiring y = x (identity, as in section 2.1), the other using a function resembling a series of five steps at random heights: y = r(\u230a5x\u230b); r = (0.3746, 0.5753, 0.8102, 0.7272, 0.4527) (random step; footnote 3). All conditions were as for section 2, with the exception that 10 training examples were used and the hill-climber ran for 1000 generations. On completion of evolution, 100 decoders were trained on the 20 final languages under both conditions above as well as two others, a sine function and a cubic function.\n\nFootnote 1: One generation represents the creation of a more variable, mutated encoder and the subsequent training of a decoder.\n\nFootnote 2: A language is said to be reliably learnable when at least 50% of random decoders are able to effectively learn it within 400 epochs.\n\nFootnote 3: \u230a5x\u230b provides an index into the array r, based on the magnitude of x.\n\nFigure 2: Decoder output after seeing the first n symbols in the message, for n = 1 (a) to n = 6 (f) (from the language in Figure 1). The X-axis is the encoder's input, the Y-axis is the decoder's output at that point in the sequence. The five points that the decoder was trained on are shown as crosses in each graph. After the first symbol (A, B, C, E or $), the decoder outputs one of five values (a); after the second symbol, more outputs are possible (b). Subsequent symbols in each string specify finer gradations in the output. Note that the output is not constructed monotonically, with each symbol providing a closer approximation to the target function, but rather recursively, only approximating the linear target at the final position in each sequence. Structure inherent in the sequences allows the system to generalize to parts of the space it has never seen. Note that the generalization is not based on interpolation between symbol values, but rather on their compositional structure.\n\nThe results show that languages can be evolved to enhance generalization preferentially for one \"world\" over another. On average, the languages performed far better when tested in the world in which they were evolved than in other worlds. Languages evolved for the identity mapping were on average learned by 64% of decoders trained on the identity task compared with just 5% in the random step case. Languages evolved for the random step task were learned by 60% of decoders trained on the random step task but only 24% when trained on the identity task. Decoders generally performed poorly on the cubic function, and no decoder learned the sine task from either set of evolved languages. The second series of simulations shows that the manner in which the decoder generalizes is not restricted to the task of section 2.1. 
Rather, the languages evolve to facilitate generalization by the decoder in different ways, aided by its minimal biases.\n\n4 ... core concepts\n\nIn the previous simulations, randomly selected concepts were used to train decoders. In some cases a pathological distribution of points made learning extremely difficult. In contrast, it seems likely that human children learn language based on a common set of semantically-constrained core concepts (\"Mom\", \"I want milk\", \"no\", etc). For the third series of simulations, we tested whether selecting a fortuitous set of training concepts could have a positive effect on the success of an evolved language.\n\nThe simulations with alternative generalization functions (section 3) indicated that decoders had difficulty generalizing to the sine function. Even when encoders were evolved specifically on the sine task, in the best of 10 systems only 13 of 100 random decoders successfully learned.\n\nWe evolved a new language on a specifically chosen set of 10 points for generalization to the sine function. One hundred decoders were then trained on the resulting language using either the same set of 10 points, or a random set. Of the networks trained on the fixed set, 92 learned the task, compared with 5 networks trained on the random sets. That a language evolves to communicate a restricted set of concepts is not particularly unusual. But what this simulation shows is the more surprising result that a language can evolve to generalize from specific core concepts to a whole recursive language in a particular way (in this case, a sine function).\n\n5 Discussion\n\nThe first series of simulations shows that a compositional language can be learned from five strings by a recurrent network. Generalization performance included correct decoding of novel branches and symbols in novel positions (Figure 1). 
The second series of simulations highlights how a language can be evolved to facilitate different forms of generalization in the decoder. The final simulation demonstrates that languages can also be tailored to generalize from a specific set of examples.\n\nThe three series of simulations modify the language environment of the decoder in three different ways: (1) the relationship between utterances and meaning; (2) the type of generalization required from the decoder; and (3) the particular utterances and meanings to which a learner is exposed. In each case, the language environment of the learner was sculpted to exploit the minimal biases present in the learner. While taking an approach similar to [10] of giving the learner an additional bias in the form of initial weights would also likely have been effective, the purpose of the simulations was to investigate how strongly external factors could assist in simplifying learning.\n\n6 Conclusions\n\n\"The key to understanding language learnability does not lie in the richly social context of language training, nor in the incredibly prescient guesses of young language learners; rather, it lies in a process that seems otherwise far remote from the microcosm of toddlers and caretakers - language change. Although the rate of social evolutionary change in learning structure appears unchanging compared to the time it takes a child to develop language abilities, this process is crucial to understanding how the child can learn a language that on the surface appears impossibly complex and poorly taught.\" [3, p115].\n\nIn this paper we studied ways in which languages can adapt to their learners. By running simulations of a language evolution process, we contribute additional components to the list of aspects of language that can be learned by minimally-biased, general-purpose learners, namely that recursive structure can be learned from few examples, that languages can evolve to facilitate generalization in a particular way, and that they can evolve to be easily learnable from common sentences. In all the simulations in this paper, enhancement of language learnability is achieved through changes to the learner's environment without resorting to adding biases in the language acquisition device.\n\nThis work was supported by an APA to Bradley Tonkes, a UQ Postdoctoral Fellowship to Alan Blair and an ARC grant to Janet Wiles.\n\nReferences\n\n[1] N. Chomsky. Language and Mind. Harcourt, Brace, New York, 1968.\n[2] J. L. Elman, E. A. Bates, M. H. Johnson, A. Karmiloff-Smith, D. Parisi, and K. Plunkett. Rethinking Innateness: A Connectionist Perspective on Development. MIT Press, Boston, 1996.\n[3] T. W. Deacon. The Symbolic Species: The Co-Evolution of Language and the Brain. W. W. Norton and Company, New York, 1997.\n[4] S. Kirby. Fitness and the selective adaptation of language. In J. Hurford, C. Knight, and M. Studdert-Kennedy, editors, Approaches to the Evolution of Language. Cambridge University Press, Cambridge, 1998.\n[5] M. H. Christiansen. Language as an organism - implications for the evolution and acquisition of language. Unpublished manuscript, February 1995.\n[6] M. Hare and J. L. Elman. Learning and morphological change. Cognition, 56:61-98, 1995.\n[7] S. Kirby. Syntax without natural selection: How compositionality emerges from vocabulary in a population of learners. In C. Knight, J. Hurford, and M. Studdert-Kennedy, editors, The Evolutionary Emergence of Language: Social function and the origins of linguistic form. Cambridge University Press, Cambridge, 1999.\n[8] J. Batali. Computational simulations of the emergence of grammar. In J. Hurford, C. Knight, and M. Studdert-Kennedy, editors, Approaches to the Evolution of Language, pages 405-426. Cambridge University Press, Cambridge, 1998.\n[9] J. L. Elman. Learning and development in neural networks: The importance of starting small. Cognition, 48:71-99, 1993.\n[10] J. Batali. Innate biases and critical periods: Combining evolution and learning in the acquisition of syntax. In R. Brooks and P. Maes, editors, Proceedings of the Fourth Artificial Life Workshop, pages 160-171. MIT Press, 1994.\n[11] B. Tonkes, A. Blair, and J. Wiles. A paradox of neural encoders and decoders, or, why don't we talk backwards? In B. McKay, X. Yao, C. S. Newton, J.-H. Kim, and T. Furuhashi, editors, Simulated Evolution and Learning, volume 1585 of Lecture Notes in Artificial Intelligence. Springer, 1999.", "award": [], "sourceid": 1782, "authors": [{"given_name": "Bradley", "family_name": "Tonkes", "institution": null}, {"given_name": "Alan", "family_name": "Blair", "institution": null}, {"given_name": "Janet", "family_name": "Wiles", "institution": null}]}