{"title": "Mapping paradigm ontologies to and from the brain", "book": "Advances in Neural Information Processing Systems", "page_first": 1673, "page_last": 1681, "abstract": "Imaging neuroscience links brain activation maps to behavior and cognition via correlational studies. Due to the nature of the individual experiments, based on eliciting neural response from a small number of stimuli, this link is incomplete, and unidirectional from the causal point of view. To come to conclusions on the function implied  by the activation of brain regions, it is necessary to combine a wide exploration of the various brain functions and some inversion of the statistical inference. Here we introduce a methodology for accumulating knowledge towards a bidirectional link between observed brain activity and the corresponding function. We rely on a large corpus of imaging studies and a predictive engine. Technically, the challenges are to find commonality between the studies without denaturing the richness of the corpus. The key elements that we contribute are labeling the tasks performed with a cognitive ontology, and modeling the long tail of rare paradigms in the corpus. To our knowledge, our approach is the first demonstration of predicting the cognitive content of completely new brain images. To that end, we propose a method that predicts the experimental paradigms across different studies.", "full_text": "Mapping cognitive ontologies to and from the brain\n\nYannick Schwartz, Bertrand Thirion, and Gael Varoquaux\n\nParietal Team, Inria Saclay Ile-de-France\n\nSaclay, France\n\nfirstname.lastname@inria.fr\n\nAbstract\n\nImaging neuroscience links brain activation maps to behavior and cognition via\ncorrelational studies. Due to the nature of the individual experiments, based on\neliciting neural response from a small number of stimuli, this link is incomplete,\nand unidirectional from the causal point of view. 
To come to conclusions on the function implied by the activation of brain regions, it is necessary to combine a wide exploration of the various brain functions and some inversion of the statistical inference. Here we introduce a methodology for accumulating knowledge towards a bidirectional link between observed brain activity and the corresponding function. We rely on a large corpus of imaging studies and a predictive engine. Technically, the challenges are to find commonality between the studies without denaturing the richness of the corpus. The key elements that we contribute are labeling the tasks performed with a cognitive ontology, and modeling the long tail of rare paradigms in the corpus. To our knowledge, our approach is the first demonstration of predicting the cognitive content of completely new brain images. To that end, we propose a method that predicts the experimental paradigms across different studies.\n\n1 Introduction\n\nFunctional brain imaging, in particular fMRI, is the workhorse of brain mapping, the systematic study of which areas of the brain are recruited during various experiments. To date, 33K papers on PubMed mention \u201cfMRI\u201d, revealing an accumulation of activation maps related to specific tasks or cognitive concepts. From this literature has emerged the notion of brain modules specialized to a task, such as the celebrated fusiform face area (FFA) dedicated to face recognition [1]. However, the link between the brain images and high-level notions from psychology is mostly made manually, due to the lack of a co-analysis framework. 
The challenges in quantifying observations across experiments, let alone at the level of the literature, lead to incomplete pictures and well-known fallacies. For instance, a common trap is that of reverse inference [2]: attributing a cognitive process to a brain region, while the individual experiments can only come to the conclusion that it is recruited by the process under study, and not that the observed activation of the region demonstrates the engagement of the cognitive process. Functional specificity can indeed only be measured by probing a large variety of functions, which exceeds the scale of a single study. Beyond this lack of specificity, individual studies are seldom comprehensive, in the sense that they do not recruit every brain region.\nPrior work on such large-scale cognitive mapping of the brain has mostly relied on coordinate-based meta-analysis, which forgoes activation maps and pools results across publications via the reported Talairach coordinates of activation foci [3, 4]. While the underlying thresholding of statistical maps and extraction of local maxima leads to a substantial loss of information, the value of this approach lies in the large number of studies covered: Brainmap [3], which relies on manual analysis of the literature, comprises 2 298 papers, while Neurosynth [4], which uses text mining, comprises 4 393 papers. Such large corpora can be used to evaluate the occurrence of the cognitive and behavioral terms associated with activations and formulate reverse inference as a Bayesian inversion on standard (forward) fMRI inference [2, 4]. On the opposite end of the spectrum, [5] shows that using a machine-learning approach on studies with different cognitive content can predict this content from the images, thus demonstrating principled reverse inference across studies. 
Similarly, [6] have used image-based classification to challenge the view that the FFA is by itself specific to faces. Two trends thus appear in the quest for explicit correspondences between brain regions and cognitive concepts. One is grounded in counting term frequency on a large corpus of studies described by coordinates. The other uses predictive models on images. The first approach can better define functional specificity by avoiding the sampling bias inherent to small groups of studies; however each study in a coordinate-based meta-analysis brings only very limited spatial information [7].\nOur purpose here is to outline a strategy to accumulate knowledge from a brain functional image database in order to provide grounds for principled bidirectional reasoning from brain activation to behavior and cognition. To increase the breadth in co-analysis and scale up from [5], which used only 8 studies with 22 different cognitive concepts, we have to tackle several challenges. A first challenge is to find commonalities across studies, without which we face the risk of learning idiosyncrasies of the protocols. For this very reason we choose to describe studies with terms that come from a cognitive paradigm ontology instead of a high-level cognitive process one. This setting not only lets the terms span all the studies, but also makes it possible to use atypical studies that do not clearly share cognitive processes. A second challenge is that of diminishing statistical power with an increasing number of cognitive terms under study. Finally, a central goal is to ensure some sort of functional specificity, which is hindered by the data scarcity and ensuing biases in an image database.\nIn this paper, we gather 19 studies, comprising 131 different conditions, which we labeled with 19 different terms describing experimental paradigms. 
We perform a brain mapping experiment across these studies, in which we consider both forward and reverse inference. Our contributions are two-fold: on the one hand we show empirical results that outline specific difficulties of such co-analysis, on the other hand we introduce a methodology using image-based classification and a cognitive-paradigm ontology that can scale to large sets of studies. The paper is organized as follows. In section 2, we introduce our methodology for establishing correspondence between studies and performing forward and reverse inference across them. In section 3, we present our data, a corpus of studies and the corresponding paradigm descriptions. In section 4 we show empirically that our approach can predict these descriptions in unseen studies, and that it gives promising maps for brain mapping. Finally, in section 5, we discuss the empirical findings in the wider context of meta-analyses.\n\n2 Methodology: annotations, statistics and learning\n\n2.1 Labeling activation maps with common terms across studies\n\nA standard task-based fMRI study results in activation maps per subject that capture the brain response to each experimental condition. They are combined to single out responses to high-level cognitive functions in so-called contrast maps, for which the inference is most often performed at the group level, across subjects. These contrasts can oppose different experimental conditions, some to capture the effect of interest while others serve to cancel out non-specific effects. For example, to highlight computation processes, one might contrast visual calculation with visual sentences, to suppress the effect of the stimulus modality (visual instructions), and the explicit stimulus (reading the numbers).\nWhen considering a corpus of different studies, finding correspondences between the effects highlighted by the contrasts can be challenging. 
Indeed, beyond classical localizers, capturing only very wide cognitive domains, each study tends to investigate fairly unique questions, such as syntactic structure in language rather than language in general [8]. Combining the studies requires engineering meta-contrasts across studies. For this purpose, we choose to assign to each condition a set of terms describing its content. Indeed, there are important ongoing efforts in cognitive science and neuroscience to organize the scientific concepts into formal ontologies [9]. Taking the ground-level objects of these gives a suitable family of terms, a taxonomy to describe the experiments.\n\n2.2 Forward inference: which regions are recruited by tasks containing a given term?\n\nArmed with the term labels, we can use the standard fMRI analysis framework and ask, using a General Linear Model (GLM) across studies, for each voxel of the subject-level activation images, whether it is significantly related to a term in the corpus of images. If x \u2208 R^p is the observed activation map with p voxels, the GLM tests P(x_i \u2260 0 | T) for each voxel i and term T. This test relies on a linear model that assumes that the response in each voxel is a combination of the different factors and on classical statistics:\n\nx = Y \u03b2 + \u03b5,\n\nwhere Y is the design matrix yielding the occurrence of terms and \u03b2 the term effects. Here, we assemble term-versus-rest contrasts, that test for the specific effect of the term. The benefit of the GLM formulation is that it estimates the effect of each term partialing out the effects of the other terms, and thus imposes some form of functional specificity in the results. 
Term co-occurrence in the corpus can however lead to collinearity of the regressors.\n\n2.3 Reverse inference: which regions are predictive of tasks containing a given term?\n\nPoldrack 2006 [2] formulates reverse inferences as reasoning on P(T | x), the probability of a term T being involved in the experiment given the activation map x. For coordinate-based meta-analysis, as all that is available is the presence or the absence of significant activations at a given position, the information on x boils down to {i, x_i \u2260 0}. Approaches to build a reverse inference framework upon this description have relied on Bayesian inversion to go from P(x_i \u2260 0 | T), as output by the GLM, to P(T | x_i \u2260 0) [2, 4]. In terms of predictive models on images, this approach can be understood as a naive Bayes predictor: the distributions of the different voxels are learned independently conditional on each term, and Bayes\u2019 rule is used for prediction. Learning voxel-level parameters independently is a limitation as it makes it harder to capture distributed effects, such as large-scale functional networks, that can be better predictors of stimulus class than localized regions [6]. However, learning the full distribution of x is ill-posed, as x is high-dimensional. For this reason, we must resort to statistical learning tools.\nWe choose to use an \u21132-regularized logistic regression to directly estimate the conditional probability P(T | x) under a linear model. The choice of linear models is crucial to our brain-mapping goals, as their decision frontier is fully represented by a brain map\u00b9 \u03b2 \u2208 R^p. However, as the images are spatially smooth, neighboring voxels carry similar information, and we use feature clustering with spatially-constrained Ward clustering [10] to reduce the dimensionality of the problem. We further reduce the dimensionality by selecting the most significant features with a one-way ANOVA. 
We observe that the classification performance is not hindered if we reduce the data from 48K voxels to 15K parcels\u00b2 and then select the 30% most significant features. The classifier is quite robust to these parameters, and our choice is motivated by computational concerns. We indeed use a leave-one-study-out cross-validation scheme, nested with a 10-fold stratified shuffle split to set the \u21132 regularization parameter. As a result, we need to estimate 1200 models per term label, which amounts to over 20K in total. The dimension reduction helps make the approach computationally tractable.\nThe learning task is rendered difficult by the fact that it is highly multi-class, with a small number of samples in some classes. To divide the problem into simpler learning tasks, we use the fact that our terms are derived from an ontology, and thus can be grouped by parent category. In each category, we apply a strategy similar to one-versus-all: we train a classifier to predict the presence of each term, opposed to the others. The benefits of this approach are i) that it is suited to the presence of multiple terms for a map, and ii) that the features it highlights are indeed selective for the associated term only.\nFinally, an additional challenge faced by the predictive learning task is that of strongly imbalanced classes: some terms are very frequent, while others are hardly present. In such a situation, an empirical risk minimizer will mostly model the majority class. 
Thus we add sample weights inverse of the population imbalance in the training set. This strategy is commonly used to compensate for covariate shift [11]. However, as our test set is drawn from the same corpus, and thus shows the same imbalance, we apply an inverse bias in the decision rule of the classifier by shifting the probability output by the logistic model: if P is the probability of the term presence predicted by the logistic, we use: P_biased = \u03c1_term P, where \u03c1_term is the fraction of train samples containing the term.\n\n\u00b9 In this regard, the Naive Bayes prediction strategy does yield clear cut maps, as its decision boundary is a conic section.\n\n\u00b2 Reducing even further down to 2K parcels does not impact the classification performance, however the brain maps \u03b2 are then less spatially resolved.\n\nCATEGORY | TERMS\nStimulus modality | visual, auditory\nExplicit stimulus | words, shapes, digits, abstract patterns, non-vocal sounds, scramble, face\nInstructions | attend, read, move, track, count, discriminate, inhibit\nOvert response | saccades, none, button press\n\nTable 1: Subset of CogPO terms and categories that are present in our corpus\n\n3 An image database\n\n3.1 Studies\n\nWe need a large collection of task fMRI datasets to cover the cognitive space. We also want to avoid particular biases regarding imaging methods or scanners, and therefore prefer images from different teams. We use 19 studies, mainly drawn from the OpenfMRI project [12], which, despite remaining small in comparison to coordinate databases, is as of now the largest open database for task fMRI. The datasets include risk-taking tasks [13, 14], classification tasks [15, 16, 17], language tasks [18, 8, 19], stop-signal tasks [20], cueing tasks [21], object recognition tasks [22, 23], functional localizer tasks [24, 25], and finally a saccades & arithmetic task [26]. 
The database accounts for 486 subjects, 131 activation map types, and 3 826 individual maps, the number of subjects and map types varying across the studies. To avoid biases due to heterogeneous data analysis procedures, we re-process from scratch all the studies with the SPM (Statistical Parametric Mapping) software.\n\n3.2 Annotating\n\nTo tackle highly-multiclass problems, computer vision greatly benefits from the WordNet ontology [27] to standardize annotation of pictures, but also to impose structure on the classes. The neuroscience community recognizes the value of such vocabularies and develops ontologies to cover the different aspects of the field such as protocols, paradigms, brain regions and cognitive processes. Among the many initiatives, CogPO (The Cognitive Paradigm Ontology) [9] aims to represent the cognitive paradigms used in fMRI studies. CogPO focuses on the description of the characteristics of experimental conditions, namely the explicit stimuli and their modality, the instructions, and the explicit responses and their modality. Each of those categories uses standard terms to specify the experimental condition. As an example, a stimulus modality may be auditory or visual, the explicit stimulus a non-vocal sound or a shape. We use this ontology to label with the appropriate terms all the experimental conditions from the database. The categories and terms that we use are listed in Table 1.\n\n4 Experimental results\n\n4.1 Forward inference\n\nIn our corpus, the occurrence of some terms is too correlated and gives rise to co-linear regressors. For instance, we only have visual or auditory stimulus modalities. While a handful of contrasts display both stimulus modalities, the fact that a stimulus is not auditory mostly amounts to it being visual. For this reason, we exclude from our forward inference visual, which will be captured by negative effects on auditory, and digits, which amounts mainly to the instruction being count. 
We fit the GLM using a design matrix comprising all the remaining terms, and consider results with p-values corrected for multiple comparisons at a 5% family-wise error rate (FWER). To evaluate the spatial layout of the different CogPO categories, we report the different term effects as outlines in the brain, and show the 5% top values for each term to avoid clutter in Figure 3. Forward inference outlines many regions relevant to the terms, such as the primary visual and auditory systems on the stimulus modality maps, or pattern and object-recognition areas in the ventral stream, on the explicit stimulus maps.\nIt can be difficult to impose a functional specificity in forward inference because of several phenomena: i) the correlation present in the design matrix makes it hard to separate highly associated (often anti-correlated) factors, as can be seen in Fig. 1, right. ii) the assumption inherent to this model that a certain factor is expressed identically across all experiments where it is present. This assumption ignores modulation and interaction effects that are very likely to occur; however their joint occurrence is related to the protocol, making it impossible to disentangle these factors with the database used here. iii) important confounding effects are not modeled, such as the effect of attention. Indeed the count map captures networks related to visuo-spatial orientation and attention: a dorsal attentional network, and a salience network (insulo-cingulate network [28]) in Figure 3.\n\n4.2 Reverse inference\n\nThe promise of predictive modeling on a large statistical map database is to provide principled reverse inference, going from observations of neural activity to well-defined cognitive processes. The classification model however requires a careful setting to be specific to the intended effect. 
Figure 1 highlights some confounding effects that can be captured by a predictive model: two statistical maps originating from the same study are closer than two maps labeled as sharing a same experimental condition in the sense of a Euclidean distance. We mitigate the capture of undesired effects with different strategies. First we use term labels that span across studies, and refrain from using those that were not present in at least two. We ensure this way that no term is attached to a specific study. Second, we only test the classifiers on previously unseen studies and if possible subjects, using for example a leave-one-study-out cross-validation scheme. A careless classification setting can very easily lead to training a study detector.\nFigure 2 summarizes the highly multi-class and imbalanced problem that we face: the distribution of the number of samples per class displays a long tail. To find non-trivial effects we need to be able to detect the under-represented terms as well as possible. As a reference method, we use a K-NN, as it is in general a fairly good approach for highly multi-class problems. Its training is independent of the term label structure and predicts the map labels instead. It subsequently assigns to a new map terms that are present in more than half of its nearest neighbors from the training set\u00b3. We compare this approach to training independent predictive models for each term and use three types of classifiers: a naive Bayes, a logistic regression, and a weighted logistic regression. Figure 2 shows the results for each method in terms of precision and recall, standard information-retrieval metrics. Note that the performance scores mainly follow the class representation, i.e. the number of samples per class in the train set. 
Considering that rare occurrences are also those that are most likely to provide new insight, we want a model that promotes recall over precision in the tail of the term frequency distribution. On the other hand, well-represented classes are easier to detect and correspond to massive, well-known mental processes. For these, we want to favor precision, i.e. not assigning the corresponding term to other processes, as these terms are fairly general and non-descriptive.\nOverall the K-NN has the worst performance, both in precision and recall. It confirms the idea outlined in Figure 1, that a Euclidean distance alone is not appropriate to discriminate underlying brain functions because of overwhelming confounding effects\u2074. Similarly, the naive Bayes performs poorly, with very high recall and low precision scores, which lead to a lack of functional specificity. On the contrary, the methods using a logistic regression show better results, and yield performance scores above the chance levels, which are represented by the red horizontal bars for the leave-one-study-out cross-validation scheme in Figure 2. Interestingly, switching the cross-validation scheme to a leave-one-laboratory-out does not change the performance significantly. This result is important, as it confirms that the classifiers do not rely on specificities from the stimuli presentation in a research group to perform the prediction. We mainly use data drawn from 2 different groups in this work, and use those data in turn to train and test a logistic regression model. The prediction scores for the terms present in both groups are displayed in Figure 2, with the chance levels represented by the green horizontal bars for this cross-validation scheme.\nWe evaluate the spatial layout of maps representing CogPO categories for reverse inference as well, and report boundaries of the 5% top values from the weighted logistic coefficients. Figure 3 reports the outlined regions that include motor cortex activations in the instructions category, and activations in the auditory cortex and FFA respectively for the words and faces terms in the explicit stimulus category. Despite being very noisy, those regions report findings consistent with the literature and complementary to the forward inference maps. For instance, the move instruction map comprises the motor cortex, unlike for forward inference. Similarly, the saccades overt response map segments the intra-parietal sulci and the frontal eye fields, which corresponds to the well-known signature of saccades, unlike the corresponding forward inference map, which is very non-specific to saccades\u2075.\n\n\u00b3 K was chosen in a cross-validation loop, varying between 5 and 20. Such small numbers for K are useful to avoid penalizing under-represented terms of rare classes in the vote of the KNN. For this reason we do not explore above K=20, given the small number of occurrences of the faces term.\n\n\u2074 Note that the picture does not change when \u21131 distances are used instead of \u21132 distances.\n\nFigure 1: (Left) Histogram of the distance between maps owing to their commonalities: study of origin, functional labels, functional contrast. (Right) Correlation of the design matrix.\n\n5 Discussion and conclusion\n\nLinking cognitive concepts to brain maps can give solid grounds to the diffuse knowledge derived in imaging neuroscience. Common studies provide evidence on which brain regions are recruited in given tasks. However coming to conclusions on the tasks in which regions are specialized requires data accumulation across studies to overcome the small coverage in cognitive domain of the tasks assessed in a single study. 
In practice, such a program faces a variety of roadblocks. Some are technical challenges, such as building a statistical predictive engine that can overcome the curse of dimensionality, while others are core to meta-analysis. Indeed, finding correspondence between studies is a key step to going beyond idiosyncrasies of the experimental designs. Yet the framework should not discard rare but repeatable features of the experiments as these provide richness to the description of brain function.\nWe rely on ontologies to solve the correspondence problem. It is an imperfect solution, as the labeling is bound to be inexact, but it brings the benefit of several layers of descriptions and thus enables us to split the multi-class learning task into simpler tasks. A similar strategy based on WordNet was essential to progress in object recognition in the field of computer vision [27]. Previous work [5] showed high classification scores for several mental states across multiple studies, using cross-validation with a leave-one-subject-out strategy. However, as this work did not model common factors across studies, the mental state was confounded by the study. In every study, a subject was represented by a single statistical map, and there is therefore no way to validate whether the study or the mental state was actually predicted. As Figure 1 shows, predicting studies is much easier albeit of little neuroscientific interest. Interestingly, [5] also explores the ability of a model to be predictive on two different studies sharing the same cognitive task, and a few subjects. 
When using the common subjects, their model performs worse than without these subjects, as it partially mistakes cognitive tasks for subjects. This performance drop illustrates that a classifier is not necessarily specific to the desired effect, and in this case detects subjects in place of tasks to a certain degree. To avoid this loophole, we included in our corpus only studies that had terms in common with at least one other study and performed cross-validation by leaving a study out, and thus predicting from completely new activation maps. The drawback is that it directly limits the number of terms that we can attempt to predict given a database, and explains why we have fewer terms than [5] although we have more than twice as many studies. Indeed, in [5], the terms cannot be disambiguated from the studies.\nOur labeled corpus is riddled with very infrequent terms giving rise to class imbalance problems in which the rare occurrences are the most difficult to model. Interestingly, though coordinate databases such as Neurosynth [4] cover a larger set of studies and a broader range of cognitive processes, they suffer from a similar imbalance bias, which is given by the state of the literature. Indeed, by looking at the terms in Neurosynth that are the closest to the ones we use in this work, we find that motor is cited in 1090 papers, auditory 558, word 660, and the number goes as low as 55 and 31 for saccade and calculation respectively. Consequently, these databases may also yield inconsistent results. For instance, the reverse inference map corresponding to the term digits is empty, whereas the forward inference map is well defined\u2076. Neurosynth draws from almost 5K studies while our work is based on 19 studies; however, unlike Neurosynth, we are able to benefit from the different contrasts and subjects in our studies, which provides us with 3 826 training samples. In this regard, our approach is particularly interesting and can hope to achieve competitive results with far fewer studies.\nThis paper shows the first demonstration of zero-shot learning for prediction of tasks from brain activity: paradigm description is given for images from unseen studies, acquired on different scanners, in different institutions, on different cognitive domains. More importantly than the prediction per se, we pose the foundation of a framework to integrate and co-analyze many studies. This data accumulation, combined with the predictive model, can provide good proxies of reverse inference maps, giving regions whose activation supports certain cognitive functions. These maps should, in principle, be better suited for causal interpretation than maps estimated from standard brain mapping correlational analysis. In future work, we plan to control the significance of the reverse inference maps, which show promising results but would probably benefit from thresholding out non-significant regions. In addition, we hope that further progress, in terms of spatial and cognitive resolution in mapping the brain to cognitive ontologies, will come from enriching the database with new studies, which will bring more images, and new low- and high-level concepts.\n\n\u2075 This failure of forward inference is probably due to the small sample size of saccades.\n\n\u2076 http://neurosynth.org/terms/digits\n\nFigure 2: Precision and recall for all terms per classification method, and term representation in the database. The * denotes a leave-one-laboratory-out cross-validation scheme, associated with the green bars representing the chance levels. The other methods use a leave-one-study-out cross-validation, whose chance levels are represented by the red horizontal bars.\n\nFigure 3: Maps for the forward inference (left) and the reverse inference (right) for each term category. To minimize clutter, we set the outline so as to encompass 5% of the voxels in the brain on each figure, thus highlighting only the salient features of the maps. In reverse inference, to reduce the visual effect of the parcellation, maps were smoothed using a \u03c3 of 2 voxels.\n\nAcknowledgments\n\nThis work was supported by the ANR grants BrainPedia ANR-10-JCJC 1408-01 and IRMGroup ANR-10-BLAN-0126-02, as well as the NSF grant NSF OCI-1131441 for the OpenfMRI project.\n\nReferences\n[1] N. Kanwisher, J. McDermott, and M. M. Chun, \u201cThe fusiform face area: a module in human extrastriate cortex specialized for face perception.,\u201d J Neurosci, vol. 17, p. 4302, 1997.\n[2] R. Poldrack, \u201cCan cognitive processes be inferred from neuroimaging data?,\u201d Trends in cognitive sciences, vol. 10, p. 59, 2006.\n[3] A. Laird, J. Lancaster, and P. Fox, \u201cBrainmap,\u201d Neuroinformatics, vol. 3, p. 65, 2005.\n[4] T. Yarkoni, R. Poldrack, T. Nichols, D. V. Essen, and T. Wager, \u201cLarge-scale automated synthesis of human functional neuroimaging data,\u201d Nature Methods, vol. 8, p. 665, 2011.\n[5] R. Poldrack, Y. Halchenko, and S. 
Hanson, \u201cDecoding the large-scale structure of brain function by classifying mental states across individuals,\u201d Psychological Science, vol. 20, p. 1364, 2009.\n\nFigure 3 legend: slices at y=-60, x=-46, z=49; Instructions; Terms: count, inhibit, discriminate, read, move, track, attend.\n\n[6] S. Hanson and Y. Halchenko, \u201cBrain reading using full brain support vector machines for object recognition: there is no face identification area,\u201d Neural Computation, vol. 20, p. 486, 2008.\n\n[7] G. Salimi-Khorshidi, S. M. Smith, J. R. Keltner, T. D. Wager, et al., \u201cMeta-analysis of neuroimaging data: a comparison of image-based and coordinate-based pooling of studies,\u201d Neuroimage, vol. 45, p. 810, 2009.\n\n[8] C. Pallier, A. Devauchelle, and S. Dehaene, \u201cCortical representation of the constituent structure of sentences,\u201d Proc Natl Acad Sci, vol. 108, p. 2522, 2011.\n\n[9] J. Turner and A. Laird, \u201cThe cognitive paradigm ontology: design and application,\u201d Neuroinformatics, vol. 10, p. 57, 2012.\n\n[10] V. Michel, A. Gramfort, G. Varoquaux, E. Eger, C. Keribin, and B. Thirion, \u201cA supervised clustering approach for fMRI-based inference of brain states,\u201d Pattern Recognition, vol. 45, p. 2041, 2012.\n\n[11] H. Shimodaira, \u201cImproving predictive inference under covariate shift by weighting the log-likelihood function,\u201d Journal of Statistical Planning and Inference, vol. 90, p. 227, 2000.\n\n[12] R. Poldrack, D. Barch, J. Mitchell, T. Wager, A. Wagner, J. Devlin, C. Cumba, and M. Milham, \u201cTowards open sharing of task-based fMRI data: The openfMRI project (in press),\u201d Frontiers in Neuroinformatics.\n\n[13] T. Schonberg, C. Fox, J. Mumford, C. Congdon, C. Trepel, and R. Poldrack, \u201cDecreasing ventromedial prefrontal cortex activity during sequential risk-taking: an fMRI investigation of the balloon analog risk task,\u201d Frontiers in Neuroscience, vol. 6, 2012.\n\n[14] S. Tom, C. Fox, C. 
Trepel, and R. Poldrack, \u201cThe neural basis of loss aversion in decision-making under risk,\u201d Science, vol. 315, p. 515, 2007.\n\n[15] A. Aron, M. Gluck, and R. Poldrack, \u201cLong-term test\u2013retest reliability of functional MRI in a classification learning task,\u201d Neuroimage, vol. 29, p. 1000, 2006.\n\n[16] K. Foerde, B. Knowlton, and R. Poldrack, \u201cModulation of competing memory systems by distraction,\u201d Proc Natl Acad Sci, vol. 103, p. 11778, 2006.\n\n[17] R. Poldrack, J. Clark, E. Pare-Blagoev, D. Shohamy, J. Creso Moyano, C. Myers, and M. Gluck, \u201cInteractive memory systems in the human brain,\u201d Nature, vol. 414, p. 546, 2001.\n\n[18] G. Xue and R. Poldrack, \u201cThe neural substrates of visual perceptual learning of words: implications for the visual word form area hypothesis,\u201d J Cognitive Neurosci, vol. 19, p. 1643, 2007.\n\n[19] L. Vagharchakian, G. Dehaene-Lambertz, C. Pallier, and S. Dehaene, \u201cA temporal bottleneck in the language comprehension network,\u201d J Neurosci, vol. 32, p. 9089, 2012.\n\n[20] G. Xue, A. Aron, and R. Poldrack, \u201cCommon neural substrates for inhibition of spoken and manual responses,\u201d Cerebral Cortex, vol. 18, p. 1923, 2008.\n\n[21] A. Kelly, L. Q. Uddin, B. B. Biswal, F. Castellanos, and M. Milham, \u201cCompetition between functional brain networks mediates behavioral variability,\u201d Neuroimage, vol. 39, p. 527, 2008.\n\n[22] J. Haxby, I. Gobbini, M. Furey, A. Ishai, J. Schouten, and P. Pietrini, \u201cDistributed and overlapping representations of faces and objects in ventral temporal cortex,\u201d Science, vol. 293, p. 2425, 2001.\n\n[23] K. Duncan, C. Pattamadilok, I. Knierim, and J. Devlin, \u201cConsistency and variability in functional localisers,\u201d Neuroimage, vol. 46, p. 1018, 2009.\n\n[24] P. Pinel, B. Thirion, S. Meriaux, A. Jobert, J. Serres, D. L. Bihan, J. B. Poline, and S. 
Dehaene, \u201cFast reproducible identification and large-scale databasing of individual functional cognitive networks,\u201d BMC Neuroscience, vol. 8, p. 91, 2007.\n\n[25] P. Pinel and S. Dehaene, \u201cGenetic and environmental contributions to brain activation during calculation,\u201d Neuroimage, in press, 2013.\n\n[26] A. Knops, B. Thirion, E. M. Hubbard, V. Michel, and S. Dehaene, \u201cRecruitment of an area involved in eye movements during mental arithmetic,\u201d Science, vol. 324, p. 1583, 2009.\n\n[27] J. Deng, A. Berg, K. Li, and L. Fei-Fei, \u201cWhat does classifying more than 10,000 image categories tell us?,\u201d in Computer Vision\u2013ECCV 2010, p. 71, 2010.\n\n[28] W. W. Seeley, V. Menon, A. F. Schatzberg, J. Keller, G. H. Glover, H. Kenna, A. L. Reiss, and M. D. Greicius, \u201cDissociable intrinsic connectivity networks for salience processing and executive control,\u201d J Neurosci, vol. 27, p. 2349, 2007.\n", "award": [], "sourceid": 844, "authors": [{"given_name": "Yannick", "family_name": "Schwartz", "institution": "INRIA"}, {"given_name": "Bertrand", "family_name": "Thirion", "institution": "INRIA"}, {"given_name": "Gael", "family_name": "Varoquaux", "institution": "INRIA"}]}