{"title": "Bayesian model learning in human visual perception", "book": "Advances in Neural Information Processing Systems", "page_first": 1043, "page_last": 1050, "abstract": "", "full_text": "Bayesian model learning in\nhuman visual perception\n\nGergő Orbán\nCollegium Budapest\nInstitute for Advanced Study\n2 Szentháromság utca, Budapest, 1014 Hungary\nogergo@colbud.hu\n\nJózsef Fiser\nDepartment of Psychology and\nVolen Center for Complex Systems\nBrandeis University\nWaltham, Massachusetts 02454, USA\nfiser@brandeis.edu\n\nRichard N. Aslin\nDepartment of Brain and Cognitive\nSciences, Center for Visual Science\nUniversity of Rochester\nRochester, New York 14627, USA\naslin@cvs.rochester.edu\n\nMáté Lengyel\nGatsby Computational Neuroscience Unit\nUniversity College London\n17 Queen Square, London WC1N 3AR\nUnited Kingdom\nlmate@gatsby.ucl.ac.uk\n\nAbstract\n\nHumans make optimal perceptual decisions in noisy and ambiguous\nconditions. Computations underlying such optimal behavior have been\nshown to rely on probabilistic inference according to generative models\nwhose structure is usually taken to be known a priori. We argue that\nBayesian model selection is ideal for inferring similar and even more\ncomplex model structures from experience. We find in experiments that\nhumans learn subtle statistical properties of visual scenes in a completely\nunsupervised manner. We show that these findings are well captured by\nBayesian model learning within a class of models that seek to explain\nobserved variables by independent hidden causes.\n\n1 Introduction\n\nThere is a growing number of studies supporting the classical view of perception as prob-\nabilistic inference [1, 2]. 
These studies demonstrated that human observers parse sensory\nscenes by performing optimal estimation of the parameters of the objects involved [3, 4, 5].\nEven single neurons in primary sensory cortices have receptive field properties that seem to\nsupport such a computation [6]. A core element of this Bayesian probabilistic framework is\nan internal model of the world, the generative model, that serves as a basis for inference. In\nprinciple, inference can be performed on several levels: the generative model can be used\nfor inferring the values of hidden variables from observed information, but also the model\nitself may be inferred from previous experience [7].\n\nMost previous studies testing the Bayesian framework in human psychophysical experi-\nments used highly restricted generative models of perception, usually consisting of a few\nobserved and latent variables, of which only a limited number of parameters needed to\nbe adjusted by experience. More importantly, the generative models considered in these\nstudies were tailor-made to the specific psychophysical task presented in the experiment.\nThus, it remains to be shown whether more flexible, ‘open-ended’ generative models are\nused and learned by humans during perception.\n\nHere, we use an unsupervised visual learning task to show that a general class of genera-\ntive models, sigmoid belief networks (SBNs), performs similarly to humans (also repro-\nducing paradoxical aspects of human behavior), when not only the parameters of these\nmodels but also their structure is subject to learning. Crucially, the applied Bayesian model\nlearning embodies the Automatic Occam’s Razor (AOR) effect that selects the models that\nare ‘as simple as possible, but no simpler’. 
This process leads to the extraction of inde-\npendent causes that efficiently and sufficiently account for sensory experience, without a\npre-specification of the number or complexity of potential causes.\n\nIn Section 2, we describe in detail the experimental protocol we used. Next, the mathemat-\nical framework used to study model learning in SBNs is presented (Section 3). In\nSection 4, experimental results on human performance are compared to the predictions of\nour Bayes-optimal model learning in the SBN framework. All the presented human experi-\nmental results were reproduced in our simulations, and all had the same root: the modal model\ndeveloped latent variables corresponding to the unknown underlying causes that generated\nthe training scenes.\n\nIn Section 5, we discuss the implications of our findings. Although structure and parame-\nter learning are not fundamentally different computations in Bayesian inference, we argue\nthat the natural integration of these two kinds of learning leads to behavior that accounts\nfor human data that cannot be reproduced in some simpler alternative learning models\nwith parameter but without structure learning. Given the recent surge of biologically plau-\nsible neural network models performing inference in belief networks, we also point out\nchallenges that our findings present for future models of probabilistic neural computations.\n\n2 Experimental paradigm\n\nHuman adult subjects were trained and then tested in an unsupervised learning paradigm\nwith a set of complex visual scenes consisting of 6 of 12 abstract unfamiliar black shapes\narranged on a 3x3 (Exp 1) or 5x5 (Exps 2-4) white grid (Fig. 1, left panel). Unbeknownst to\nsubjects, various subsets of the shapes were arranged into fixed spatial combinations (com-\nbos) (doublets, triplets, quadruplets, depending on the experiment). 
Whenever a combo\nappeared on a training scene, its constituent shapes were presented in an invariant spatial\narrangement, and in no scene could elements of a combo appear without all the other ele-\nments of the same combo also appearing. Subjects were presented with 100–200 training\nscenes; each scene was presented for 2 seconds with a 1-second pause between scenes. No\nspecific instructions were given to subjects prior to training; they were only asked to pay\nattention to the continuous sequence of scenes.\n\nThe test phase consisted of two-alternative forced-choice (2AFC) trials, in which two\narrangements of shapes were shown sequentially in the same grid that was used in the\ntraining, and subjects were asked which of the two scenes was more familiar based on the\ntraining. One of the presented scenes was either a combo that was actually used for\nconstructing the training set (true combo) or a part of it (embedded combo) (e.g., a pair of\nadjacent shapes from a triplet or quadruplet combo). The other scene consisted of the same\nnumber of shapes as the first scene, in an arrangement that might or might not have\noccurred during training but was in fact a mixture of shapes from different true combos\n(mixture combo).\n\nHere four experiments are considered that assess various aspects of human observational\nlearning; the full set of experiments is presented elsewhere [8, 9]. Each experiment was\nrun with 20 naïve subjects.\n\nFigure 1: Experimental design (left panel) and explanation of graphical model parameters\n(right panel).\n\n1. Our first goal was to establish that humans are sensitive to the statistical struc-\nture of visual experience, and use this experience for judging familiarity. In the\nbaseline experiment 6 doublet combos were defined, three of which were pre-\nsented simultaneously in any given training scene, allowing 144 possible scenes\n[8]. 
Because the doublets were not marked in any way, subjects saw only a group\nof random shapes arranged on a grid. The occurrence frequency of doublets and\nindividual elements was equal across the set of scenes, allowing no obvious bias\nto remember any element more than others. In the test phase a true and a mixture\ndoublet were presented sequentially in each 2AFC trial. The mixture combo was\npresented in a spatial position that had never appeared before.\n\n2. In the previous experiment the elements of mixture doublets occurred together\nfewer times than elements of real doublets; thus a simple strategy based on track-\ning co-occurrence frequencies of shape-pairs would be sufficient to distinguish\nbetween them. The second, frequency-balanced experiment tested whether hu-\nmans are sensitive to higher-order statistics (at least cross-correlations, which are\nco-occurrence frequencies normalized by the respective individual occurrence\nfrequencies).\nThe structure of Experiment 1 was changed so that while the 6 doublet combo ar-\nchitecture remained, their appearance frequency became non-uniform, introducing\nfrequent and rare combos. Frequent doublets were presented twice as often as rare\nones, so that certain mixture doublets consisting of shapes from frequent doublets\nappeared just as often as rare doublets. Note that the frequency of the constituent\nshapes of these mixture doublets was higher than that of rare doublets. The train-\ning session consisted of 212 scenes, each scene being presented twice. In the test\nphase, the familiarity of both single shapes and doublet combos was tested. In the\ndoublet trials, rare combos with low appearance frequency but high correlations\nbetween elements were compared to mixture combos with higher element and equal\npair appearance frequency, but lower correlations between elements.\n\n3. The third experiment tested whether human performance in this paradigm can\nbe fully accounted for by learning cross-correlations. 
Here, four triplet combos\nwere formed and presented with equal occurrence frequencies. 112 scenes were\npresented twice to subjects. In the test phase two types of tests were performed. In\nthe first type, the familiarity of a true triplet and a mixture triplet was compared,\nwhile in the second type doublets consisting of adjacent shapes embedded in a\ntriplet combo (embedded doublets) were tested against mixture doublets.\n\n4. The fourth experiment compared directly how humans treat embedded and in-\ndependent (non-embedded) combos of the same spatial dimensions. Here two\nquadruplet combos and two doublet combos were defined and presented with\nequal frequency. Each training scene consisted of six shapes, one quadruplet and\none doublet. 120 such scenes were constructed. In the test phase three types of\ntests were performed. First, true quadruplets were compared to mixture quadru-\nplets; next, embedded doublets were compared to mixture doublets; finally, true\ndoublets were compared to mixture doublets.\n\n3 Modeling framework\n\nThe goal of Bayesian learning is to ‘reverse-engineer’ the generative model that could have\ngenerated the training data. Because of the inherent ambiguity and stochasticity assumed by the\ngenerative model itself, the objective is to establish a probability distribution over possible\nmodels. Importantly, because models with parameter spaces of different dimensionality are\ncompared, the likelihood term (Eq. 3) will prefer the simplest model (in our case, the one\nwith the fewest parameters) that can effectively account for (generate) the training data, due to\nthe AOR effect in Bayesian model comparison [7].\n\nSigmoid belief networks The class of generative models we consider is that of two-layer\nsigmoid belief networks (SBNs, Fig. 1). The same modelling framework has been success-\nfully applied to animal learning in classical conditioning [10, 11]. 
The SBN architecture\nassumes that the state of observed binary variables (y_j, in our case: shapes being present\nor absent in a training scene) depends through a sigmoidal activation function on the state\nof a set of hidden binary variables (x), which are not directly observable:\n\nP(y_j = 1 | x, w_m, m) = (1 + exp(−∑_i w_ij x_i − w_yj))^−1    (1)\n\nwhere w_ij describes the (real-valued) influence of hidden variable x_i on observed vari-\nable y_j, w_yj determines the spontaneous activation bias of y_j, and m indicates the model\nstructure, including the number of latent variables and the identity of the observeds they can\ninfluence (the w_ij weights that are allowed to have non-zero value).\nObserved variables are independent conditioned on the latents (i.e. any correlation between\nthem is assumed to be due to shared causes), and latent variables are marginally indepen-\ndent and have Bernoulli distributions parametrised by w_x:\n\nP(y | x, w_m, m) = ∏_j P(y_j | x, w_m, m),    P(x | w_m, m) = ∏_i (1 + exp((−1)^(x_i) w_xi))^−1    (2)\n\nFinally, scenes (y^(t)) are assumed to be iid samples from the same generative distribution,\nand so the probability of the training data (D) given a specific model is:\n\nP(D | w_m, m) = ∏_t P(y^(t) | w_m, m) = ∏_t ∑_x [ ∏_j P(y_j^(t) | x, w_m, m) ] P(x | w_m, m)    (3)\n\nThe ‘true’ generative model that was actually used for generating training data in the ex-\nperiments (Section 2) is closely related to this model, with the combos corresponding to\nlatent variables. The main difference is that here we ignore the spatial aspects of the task,\ni.e. only the occurrence of a shape matters but not where it appears on the grid. 
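As a concrete illustration of Eqs. 1-3, the likelihood of a single scene under a fixed SBN can be computed by brute-force enumeration of the latent configurations. The following is a minimal Python sketch of the generative equations as reconstructed above, not the code used for the simulations reported here; all function and variable names are our own:

```python
import itertools
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def p_y_given_x(y, x, W, w_y):
    # Eq. 1: each observed is on with probability sigmoid(sum_i W[i, j] x_i + w_y[j]);
    # observeds are conditionally independent given the latents.
    p_on = sigmoid(x @ W + w_y)
    return float(np.prod(np.where(y == 1, p_on, 1.0 - p_on)))

def p_x(x, w_x):
    # Eq. 2: latents are independent Bernoullis with P(x_i = 1) = sigmoid(w_x[i]).
    p_on = sigmoid(w_x)
    return float(np.prod(np.where(x == 1, p_on, 1.0 - p_on)))

def scene_likelihood(y, W, w_y, w_x):
    # One factor of Eq. 3: marginalize the latents by explicit enumeration
    # (feasible only for small models, as considered here).
    n_latents = len(w_x)
    return sum(p_y_given_x(y, np.array(x), W, w_y) * p_x(np.array(x), w_x)
               for x in itertools.product([0, 1], repeat=n_latents))

def log_likelihood(scenes, W, w_y, w_x):
    # Eq. 3: scenes are i.i.d., so the data log likelihood is a sum over scenes.
    return sum(np.log(scene_likelihood(y, W, w_y, w_x)) for y in scenes)
```

For the small models used in these experiments, the inner sum over x in Eq. 3 is exactly this enumeration; sampling machinery is only needed for the outer integration over parameters and structures.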
Although in\ngeneral, space is certainly not a negligible factor in vision, human behavior in the present\nexperiments depended sufficiently strongly on which shapes appeared that this\nsimplification did not cause major confounds in our results.\n\nA second difference between the model and the human experiments was that in the exper-\niments, combos were not presented completely randomly, because the number of combos\nper scene was fixed (and not binomially distributed as implied by the model, Eq. 2). Nev-\nertheless, our goal was to demonstrate the use of a general-purpose class of generative\nmodels, and although truly independent causes are rare in natural circumstances, a\nfixed number of them always being present is even rarer. Clearly, humans are able to capture\ndependencies between latent variables, and these should be modeled as well [12]. Simi-\nlarly, for simplicity we also ignored that subsequent scenes are rarely independent (Eq. 3)\nin natural vision.\n\nTraining Establishing the posterior probability of any given model is straightforward\nusing Bayes’ rule:\n\nP(w_m, m | D) ∝ P(D | w_m, m) P(w_m, m)    (4)\n\nwhere the first term is the likelihood of the model (Eq. 3), and the second term is the prior\ndistribution of models. Prior distributions for the weights were: P(w_ij) = Laplace(12, 2),\nP(w_xi) = Laplace(0, 2), and P(w_yj) = δ(−6). The prior over model structure preferred\nsimple models and was such that the distributions of the number of latents and of the\nnumber of links conditioned on the number of latents were both Geometric(0.1). The\neffect of this preference is ‘washed out’ with increasing training length as the likelihood\nterm (Eq. 
3) sharpens.\n\nTesting When asked to compare the familiarity of two scenes (y^A and y^B) in the testing\nphase, the optimal strategy for subjects would be to compute the posterior probability of\nboth scenes based on the training data\n\nP(y^Z | D) = ∑_m ∫ dw_m ∑_x P(y^Z, x | w_m, m) P(w_m, m | D),    Z ∈ {A, B}    (5)\n\nand always (i.e., with probability one) choose the one with the higher probability. However,\nas a phenomenological model of all kinds of possible sources of noise (sensory noise,\nmodel noise, etc.) we chose a soft threshold function for computing choice probability:\n\nP(choose A) = (1 + exp(−β log [ P(y^A | D) / P(y^B | D) ]))^−1    (6)\n\nand used β = 1 (β = ∞ corresponds to the optimal strategy).\nNote that when computing the probability of a test scene, we seek the probability that\nexactly the given scene was generated by the learned model. This means that we require\nnot only that all the shapes that are present in the test scene are present in the generated\ndata, but also that all the shapes that are absent from the test scene are absent from the\ngenerated data. A different scheme, in which only the presence but not the absence of the\nshapes needs to be matched (i.e. absent observeds are marginalized out just as latents are in\nEq. 5), could also be pursued, but the results of the embedding experiments (Exps. 3 and 4,\nsee below) discourage it.\n\nThe model posterior in Eq. 4 is analytically intractable; therefore, an exchange reversible-\njump Markov chain Monte Carlo sampling method [10, 13, 14] was applied that ensured\nfair sampling from a model space containing subspaces of differing dimensionality, and\nthe integration over this posterior in Eq. 5 was approximated by a sum over samples.\n\n4 Results\n\nPilot studies were performed with reduced training datasets in order to test the performance\nof the model learning framework. 
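As an aside, the soft decision rule of Eq. 6 used throughout the simulations is simply a logistic function of the log ratio of the two scenes' posterior probabilities. A minimal sketch (the function name is ours):

```python
import math

def p_choose_A(log_p_A, log_p_B, beta=1.0):
    """Eq. 6: probability of choosing scene A over scene B in a 2AFC trial.

    log_p_A and log_p_B are the log posterior probabilities of the two
    scenes (Eq. 5). beta = 1 was used in the simulations; beta -> infinity
    recovers the deterministic optimal strategy of always picking the
    more probable scene.
    """
    return 1.0 / (1.0 + math.exp(-beta * (log_p_A - log_p_B)))
```

With β = 1, a scene that is e² times more probable than the alternative is chosen roughly 88% of the time, so the rule softens, but does not abolish, the optimal preference.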
First, we trained the model on data consisting of 8 ob-\nserved variables (\u2018shapes\u2019). The 8 \u2018shapes\u2019 were partitioned into three \u2018combos\u2019 of different\n\n\fFigure 2: Bayesian learning in sigmoid belief networks. Left panel: MAP model of a 30-\ntrial-long training with 8 observed variables and 3 combos. Latent variables of the MAP\nmodel re\ufb02ect the relationships de\ufb01ned by the combos. Right panel: Increasing model\ncomplexity with increasing training experience. Average number of latent variables (\u00b1SD)\nin the model posterior distribution as a function of the length of training data was obtained\nby marginalizing Eq. 4 over weights w.\n\nsizes (5, 2, 1), two of which were presented simultaneously in each training trial. The AOR\neffect in Bayesian model learning should select the model structure that is of just the right\ncomplexity for describing the data. Accordingly, after 30 trials, the maximum a posteri-\nori (MAP) model had three latents corresponding to the underlying \u2018combos\u2019 (Fig. 2, left\npanel). Early on in training simpler model structures dominated because of the prior pref-\nerence for low latent and link numbers, but due to the simple structure of the training data\nthe likelihood term won over in as few as 10 trials, and the model posterior converged to\nthe true generative model (Fig. 2, right panel, gray line). Importantly, presenting more data\nwith the same statistics did not encourage the \ufb01tting of over-complicated model structures.\nOn the other hand, if data was generated by using more \u2018combos\u2019 (4 \u2018doublets\u2019), model\nlearning converged to a model with a correspondingly higher number of latents (Fig. 2,\nright panel, black line).\n\nIn the baseline experiment (Experiment 1) human subjects were trained with six equal-\nsized doublet combos and were shown to recognize true doublets over mixture doublets\n(Fig. 3, \ufb01rst column). 
When the same training data was used to compute the choice proba-\nbility in 2AFC tests with model learning, true doublets were reliably preferred over mixture\ndoublets. Also, the MAP model showed that the discovered latent variables corresponded\nto the combos generating the training data (data not shown).\n\nIn Experiment 2, we sought to determine whether the statistical learning demon-\nstrated in Experiment 1 relied solely on co-occurrence frequencies, or used\nsomething more sophisticated, such as at least cross-correlations between shapes.\nBayesian model learning, as well as humans, could distinguish between rare doublet com-\nbos and mixtures from frequent doublets (Fig. 3, second column) despite their balanced\nco-occurrence frequencies. Furthermore, although in this comparison rare doublet combos\nwere preferred, both humans and the model learned about the frequencies of their con-\nstituent shapes and preferred constituent single shapes of frequent doublets over those of\nrare doublets. Nevertheless, it should be noted that while humans showed greater pref-\nerence for frequent singlets than for rare doublets, our simulations predicted the opposite\ntrend1.\nWe were interested in whether the performance of humans could be fully accounted for by\nthe learning of cross-correlations, or whether they demonstrated more sophisticated computations.\n\n1This discrepancy between theory and experiments may be explained by Gestalt effects in human\nvision that would strongly prefer the independent processing of constituent shapes due to their clear\nspatial separation in the training scenes. 
The reconciliation of such Gestalt effects with pure statistical\nlearning is the target of further investigations.\n\nFigure 3: Comparison of human and model performance in four experiments (columns:\nExperiments 1–4; top row: human experiments, bottom row: model simulations). Bars show\npercent ‘correct’ values (choosing a true or embedded combo over a mixture combo, or a\nfrequent singlet over a rare singlet) for human experiments (average over subjects ±SEM),\nand ‘correct’ choice probabilities (Eq. 6) for computer simulations. Sngls: single shapes;\ndbls: doublet combos; trpls: triplet combos; e’d dbls: embedded doublet combos; qpls:\nquadruplet combos; idbls: independent doublet combos.\n\nIn Experiment 3, training data was composed of triplet combos, and besides testing true\ntriplets against mixture triplets, we also tested embedded doublets (pairs of shapes from\nthe same triplet) against mixture doublets (pairs of shapes from different triplets). If learn-\ning only depends on cross-correlations, we expect to see similar performance on these\ntwo types of tests. In contrast, human performance was significantly different for triplets\n(true triplets were preferred) and doublets (embedded and mixture doublets were not dis-\ntinguished) (Fig. 3, third column). This may be seen as Gestalt effects being at work: once\nthe ‘whole’ triplet is learned, its constituent parts (the embedded doublets) lose their sig-\nnificance. 
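As a side note, the distinction probed in Experiments 2 and 3, between raw co-occurrence frequency and cross-correlation (co-occurrence normalized by the individual occurrence frequencies), can be made concrete with a short sketch. This is purely illustrative and not part of the original analysis; all names are ours:

```python
import numpy as np

def pair_statistics(scenes, i, j):
    """Empirical pair statistics over a set of scenes.

    Returns the co-occurrence frequency of shapes i and j, and their
    cross-correlation: the co-occurrence frequency normalized by the two
    individual occurrence frequencies (values above 1 mean the shapes
    appear together more often than expected from their base rates).
    """
    S = np.asarray(scenes, dtype=float)  # rows: scenes; columns: shapes (1 = present)
    p_i, p_j = S[:, i].mean(), S[:, j].mean()
    p_ij = (S[:, i] * S[:, j]).mean()
    return p_ij, p_ij / (p_i * p_j)
```

A learner tracking only the first returned quantity cannot solve the frequency-balanced design of Experiment 2, whereas one tracking the second can; Experiment 3 shows that humans go beyond even the second.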
Our model reproduced this behavior and provided a straightforward explanation:\nlatent-to-observed weights (w_ij) in the MAP model were so strong that whenever a latent\nwas switched on it could almost only produce triplets; therefore, doublets were created by\nspontaneous independent activation of observeds, which thus produced embedded and mix-\nture doublets with equal chance. In other words, doublets were seen as mere noise under\nthe MAP model.\n\nThe fourth experiment tested explicitly whether embedded combos and equal-sized inde-\npendent real combos are distinguished, and that it was not merely size effects that prevented\nthe recognition of embedded small structures in the previous experiment. Both human experiments and\nBayesian model selection demonstrated that quadruplet combos as well as stand-alone dou-\nblets were reliably recognized (Fig. 3, fourth column), while embedded doublets were not.\n\n5 Discussion\n\nWe demonstrated that humans flexibly yet automatically learn complex generative models\nin visual perception. Bayesian model learning has been implicated in several domains of\nhigh-level human cognition, from causal reasoning [15] to concept learning [16]. Here we\nshowed it at work already at a pre-verbal stage.\n\nWe emphasized the importance of learning the structure of the generative model, not only\nits parameters, even though it is quite clear that the two cannot be formally distinguished.\nNevertheless, we have two good reasons to believe that structure learning is indeed important\nin our case. (1) Sigmoid belief networks identical to ours but without structure learning\nhave been shown to perform poorly on a task closely related to ours [17], Földiák’s bar test\n[18]. 
More complicated models will of course be able to produce identical results, but we\nthink our model framework has the advantage of being intuitively simple: it seeks to find\nthe simplest possible explanation for the data, assuming that it was generated by indepen-\ndent causes. (2) Structure learning allows the Automatic Occam’s Razor to come into play. This is\ncomputationally expensive, but together with the generative model class we use it provides a\nneat and highly efficient way to discover ‘independent components’ in the data. We expe-\nrienced difficulties with other models [17] developed for similar purposes when trying to\nreproduce our experimental findings.\n\nOur approach is very much in the tradition that sees the finding of independent causes be-\nhind sensory data as one of the major goals of perception [2]. Although neural network\nmodels that can produce such computations exist [6, 19], none of these does model selec-\ntion. Very recently, several models have been proposed for doing inference in belief net-\nworks [20, 21], but parameter learning, let alone structure learning, proved to be non-trivial\nin them. Our results highlight the importance of considering model structure learning in\nneural models of Bayesian inference.\n\nAcknowledgements\n\nWe were greatly motivated by the earlier work of Aaron Courville and Nathaniel Daw\n[10, 11], and hugely benefited from several useful discussions with them. We would also\nlike to thank Peter Dayan, Maneesh Sahani, Sam Roweis, and Zoltán Szatmáry for their\ninsightful comments on an earlier version of this work. This work was supported by the\nIST-FET-1940 program (GO), NIH research grant HD-37082 (RNA, JF), and the Gatsby\nCharitable Foundation (ML).\n\nReferences\n\n[1] Helmholtz HLF. Treatise on Physiological Optics. New York: Dover, 1962.\n[2] Barlow HB. Vision Res 30:1561, 1990.\n[3] Ernst MO, Banks MS. Nature 415:429, 2002.\n[4] Körding KP, Wolpert DM. 
Nature 427:244, 2004.\n[5] Kersten D, et al. Annu Rev Psychol 55, 2004.\n[6] Olshausen BA, Field DJ. Nature 381:607, 1996.\n[7] MacKay DJC. Network: Comput Neural Syst 6:469, 1995.\n[8] Fiser J, Aslin RN. Psychol Sci 12:499, 2001.\n[9] Fiser J, Aslin RN. J Exp Psychol Gen, in press.\n[10] Courville AC, et al. In NIPS 16, Cambridge, MA, 2004. MIT Press.\n[11] Courville AC, et al. In NIPS 17, Cambridge, MA, 2005. MIT Press.\n[12] Hinton GE, et al. In Artificial Intelligence and Statistics, Barbados, 2005.\n[13] Green PJ. Biometrika 82:711, 1995.\n[14] Iba Y. Int J Mod Phys C 12:623, 2001.\n[15] Tenenbaum JB, Griffiths TL. In NIPS 15, 35, Cambridge, MA, 2003. MIT Press.\n[16] Tenenbaum JB. In NIPS 11, 59, Cambridge, MA, 1999. MIT Press.\n[17] Dayan P, Zemel R. Neural Comput 7:565, 1995.\n[18] Földiák P. Biol Cybern 64:165, 1990.\n[19] Dayan P, et al. Neural Comput 7:1022, 1995.\n[20] Rao RP. Neural Comput 16:1, 2004.\n[21] Deneve S. In NIPS 17, Cambridge, MA, 2005. MIT Press.\n", "award": [], "sourceid": 2868, "authors": [{"given_name": "Gerg\u0151", "family_name": "Orb\u00e1n", "institution": null}, {"given_name": "Jozsef", "family_name": "Fiser", "institution": null}, {"given_name": "Richard", "family_name": "Aslin", "institution": null}, {"given_name": "M\u00e1t\u00e9", "family_name": "Lengyel", "institution": null}]}