{"title": "Learning Structural Equation Models for fMRI", "book": "Advances in Neural Information Processing Systems", "page_first": 1329, "page_last": 1336, "abstract": null, "full_text": "Learning Structural Equation Models for fMRI\n\nAmos J. Storkey\n\nSchool of Informatics\nUniversity of Edinburgh\n\nEnrico Simonotto\n\nDivision of Psychiatry\nUniversity of Edinburgh\n\nHeather Whalley\n\nDivision of Psychiatry\nUniversity of Edinburgh\n\nStephen Lawrie\n\nDivision of Psychiatry\nUniversity of Edinburgh\n\nLawrence Murray\nSchool of Informatics\nUniversity of Edinburgh\n\nAbstract\n\nDavid McGonigle\n\nCentre for Functional Imaging Studies\n\nUniversity of Edinburgh\n\nStructural equation models can be seen as an extension of Gaussian belief net-\nworks to cyclic graphs, and we show they can be understood generatively as the\nmodel for the joint distribution of long term average equilibrium activity of Gaus-\nsian dynamic belief networks. Most use of structural equation models in fMRI\ninvolves postulating a particular structure and comparing learnt parameters across\ndifferent groups. In this paper it is argued that there are situations where priors\nabout structure are not \ufb01rm or exhaustive, and given suf\ufb01cient data, it is worth\ninvestigating learning network structure as part of the approach to connectivity\nanalysis. First we demonstrate structure learning on a toy problem. 
We then show that for particular fMRI data the simple models usually assumed are not supported. We show that it is possible to learn sensible structural equation models that can provide modelling benefits, but that are not necessarily going to be the same as a true causal model, and suggest that the combination of prior models and learning, or the use of temporal information from dynamic models, may provide more benefits than learning structural equations alone.\n\n1 Introduction\nStructural equation modelling (SEM) is a technique widely used in the behavioural sciences. It has also appeared as a standard approach for the analysis of what has become known as effective connectivity in the functional magnetic resonance imaging (fMRI) literature, and is still in common use despite the increasing interest in dynamical methods such as dynamic causal models [6]. Simply put, effective connectivity analysis involves looking at the possible causal influences between brain regions given measurements of the activity of those regions. Structural equation models are a Gaussian modelling tool, and are similar to Gaussian belief networks. In fact Gaussian belief networks can be seen as a subset of valid structural equation models. However, structural equation models do not have the same acyclicity constraints as belief networks. It should be noted that the graphical form used in this paper is at odds with traditional SEM representations, and consistent with that used for belief networks, as the latter will be more familiar to the expected audience.\nWithin the fMRI context, the use of structural equation models generally takes the following form. First, certain regions of interest (commonly called seeds) are chosen according to some understanding of which brain regions might be of interest or of importance. Then neurobiological knowledge is used to propose a connectivity model. 
This connectivity model states which regions are connected to which other regions, and the direction of the connectivity. This connectivity model is used to define a structural equation model. The parameters of this model are then typically estimated using maximum likelihood methods, and comparison of connection parameters is made across subject classes.\n\nIn this paper we consider what can be done when it is hard to specify connectivity a priori, and ask how much we can achieve by learning network structures from the fMRI data itself. The novel developments of this paper include the examination of various generative representations for structural equation models which allow straightforward comparisons with belief networks and other models such as dynamic causal models. We implement Bayesian Information Criterion approximations to the evidence and use these in a Metropolis-Hastings sampling scheme for learning structural equation models. These models are then applied to toy data, and to fMRI data, which allows the examination of the types of assumptions typically made.\n1.1 Related Work: Structural Equation Models\nStructural equation models and path analysis have a long history. The methods were introduced in the context of genetics in [20], and in econometrics in [7]. They have been used extensively in the social sciences in a variety of ways. Linear Gaussian structural equation models can be split into path analysis [20], where all the variables are directly measurable, and structural equation models with latent variables [1], where latent variable models are allowed. Factor analysis is another special case of the latter. Furthermore, structural equation models can also be characterised by the inclusion of exogenous influences.\nStructural equation models have been analysed and understood in Bayesian terms before. 
They form a part of the causal modelling framework of Pearl [11], and have been discussed within that context, as well as a number of others [11, 4, 13, 10]. Approaches to learning structural equation models have not played a significant part in fMRI methods. One approach is described in [3], where a genetic algorithm is used for the search. In [21], the authors look at learning Bayesian networks but do not consider cyclic networks. For dynamic causal models (rather than structural equation models) the issue of model comparison was dealt with in [12], but large scale structure learning was not considered.\nIn the fMRI literature, SEMs have generally been used to model 'effective connectivity', that is, to model the causal relationships between different brain regions. They were first applied to imaging data by [9], and there have been many further applications [2, 5, 14]. The first analysis on data from schizophrenia studies was detailed in [15]. In fact it seems SEMs have been the most widely used model for connectivity analyses in neuroimaging. In all of the studies cited above the underlying structure was presumed known or presumed to be one of a small number of possibilities. There has been some discussion of how best to obtain reasonable structures from neuro-anatomical data, but this approach is currently used only very rarely.\n2 Why Learn SEMs?\nThe presumption in much fMRI connectivity analysis is that we can obtain models for activity dependence from neuro-anatomical sources. The problem with this is that it fails to account for the fact that connectivity analysis is usually done with a limited number of regions. It is highly possible that a connection from one region to another is mediated via a third region, which is not included in the SEM model. 
The strength of that mediation is unknown from neuro-anatomical data and is generally ignored: most connectivity models focus only on direct anatomical connections, with the accompanying implicit assumption that there are no other regions involved in the network under study, or that these regions would contribute only minimally to the model. Furthermore, just because regions are physically connected does not mean there is any actual functional influence in a particular context. Hence it has to be accepted that neuro-anatomically derived connectivity is a first guess at best.\nIt is not the purpose of this paper to propose that anatomical connectivity be ignored; instead it asks what happens if we go to the other extreme: can we say something about connectivity from the data? In reality anatomical connectivity models are needed, and can be used to provide good priors for the connections and even for the relative connection strengths. Statistically, many structural equation models are equivalent, and such equivalences cannot be resolved by the data alone.\n3 Understanding Structural Equation Models\nIn this section two generative views of structural equation modelling are presented. The idea behind structural equation modelling is that it represents causal dependence between different variables. The fact that cyclic structures are allowed in structural equation models could be seen as an implicit assumption of some underlying dynamic of which the structural equation model is an equilibrium representation. Indeed that is commonly how effective connectivity models are interpreted in an fMRI context. Two linear models, both of which produce a structural equation model prior, are presented here. 
Though these models have the same statistical properties, they have different generative motivations and different non-linear extensions, so they are both potentially instructive.\n3.1 The Traditional Model\nThe standard SEM view is that the core SEM structure is a covariance produced by the solution to a set of linear equations x = Ax + ω with Gaussian term ω. This does not have any direct generative elucidation, but can instead be thought of as relating to a deterministic dynamical system subject to uncertain fixed input. Suppose we have a dynamical system x_{t+1} = Ax_t + ω, subject to some input ω, where we presume the system input is unknown and Gaussian distributed. To generate from the model, we sample ω, run the dynamical system to its fixed point, and use that fixed point as a sample of x. This fixed point is given by x = (I − A)^{-1}ω, which produces the standard SEM covariance structure for x. This requires A to be a contraction map to obtain stable fixed points. All the other aspects of the general form of SEM are either inputs to or measurements from this system.\n3.2 Average Activity Of A Gaussian Dynamic Bayesian Network\nAn alternative and potentially appealing view is that the SEM represents the distribution of the long term activity of the nodes in a Gaussian dynamic Bayesian network (Kalman filter). Suppose we have x_t = Ax_{t−1} + ω_t, where the ω_t are IID Gaussian variables, and x_0, x_1, . . . is a series of real variables. This defines a Markov chain, and is the evolution equation of a Gaussian dynamic Bayesian network. Suppose we are at the equilibrium distribution of this Markov chain. Then setting x̃ = (1/√N) Σ_{t=1}^N x_t for large N, we can use the Kalman filter to see that\n\nx̃ = (1/√N) Σ_{t=1}^N x_t = (1/√N)[A(x_0 − x_N)] + A (1/√N) Σ_{t=1}^N x_t + (1/√N) Σ_{t=1}^N ω_t.\n\nPresuming A is a contraction map, (1/√N)[A(x_0 − x_N)] becomes negligibly small and so x̃ ≈ Ax̃ + ω, where ω = (1/√N) Σ_{t=1}^N ω_t is distributed identically to ω_t due to the fact that the variance of a sum of Gaussians is the sum of the variances. The approximation becomes an equality in the large N limit. Again this is the required form for obtaining the covariance of the SEM.\nThis interpretation says that if we have some latent system running as a Gaussian dynamic Bayesian network, but our measuring equipment is only capable of capturing longer term averages of the network activity, then our measurements are distributed according to an SEM. This generative interpretation is appealing in the context of fMRI acquisition. Note in both of these interpretations that it is important that A is a contraction. By formulating the generative framework we see it is important to restrict the form of connectivity model in this way.\n4 Model Structure\nThe standard formalism for Structural Equation Models is now outlined. A structural equation model for observational variables y, latent variables x, and sometimes latent input variables φ and observations of the input variables z, is given by the following equations\n\nx = (I − A)^{-1}(Rφ + ω), y = Bx + σ and z = Cφ + δ (1)\n\nwhere σ, ω, φ and δ are Gaussian, and A is presumed to be zero diagonal.\nFor S = I − A, the resulting covariance for the observed variables (y, z) is given by\n\n( BS^{-1}(RKφR^T + Kω)[S^{-1}]^T + Kσ , BS^{-1}RKφC^T ; CKφR^T[S^{-1}]^T B^T , CKφC^T + Kδ ) (2)\n\nwhere Kω is the covariance of ω, Kσ the covariance of σ, etc. There are a number of common simplifications to this framework. 
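Before turning to those simplifications, the traditional fixed-point view can be checked numerically. The sketch below is illustrative only: the 3-node contraction matrix A, the sample count and the tolerance are arbitrary assumptions, not values from the paper. It samples the input ω, takes the fixed point x = (I − A)^{-1}ω, and compares the empirical covariance against (I − A)^{-1}Kω[(I − A)^{-1}]^T.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
# An arbitrary zero-diagonal contraction map (spectral radius ~0.29)
A = np.array([[0.0, 0.4, 0.0],
              [0.0, 0.0, 0.3],
              [0.2, 0.0, 0.0]])
Kw = np.eye(d)                       # covariance of the input omega
S_inv = np.linalg.inv(np.eye(d) - A)
K_sem = S_inv @ Kw @ S_inv.T         # SEM covariance (I-A)^-1 Kw (I-A)^-T

# Generate: sample omega, take the fixed point x = (I - A)^-1 omega
omega = rng.multivariate_normal(np.zeros(d), Kw, size=100000)
x = omega @ S_inv.T
emp_cov = np.cov(x, rowvar=False)
assert np.allclose(emp_cov, K_sem, atol=0.05)
```

The same covariance is approached by time-averaging a long run of the dynamic Bayesian network of Section 3.2, which is what makes the two views interchangeable for the static analysis used here.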
The first case involves presuming no inputs and a fully visible system. Hence we marginalise the observations of the input variables z, and set Kδ = ∞, C = 0, R = 0, B = I, σ = 0. Then the covariance K1 of y is K1 = (I − A)^{-1}Kω[(I − A)^{-1}]^T.\nThe next simplest case would involve presuming once again that there are no inputs, but that the observations are stochastic functions of the latent variables. This involves setting Kδ = ∞, C = 0, R = 0, B = I. We then have K2 = (I − A)^{-1}Kω[(I − A)^{-1}]^T + Kσ. If we view the observations as noisy versions of the latent variables then Kσ is diagonal. This will be the most general case considered in this paper. Adding any of the remaining components is not particularly demanding, as it simply uses a conditional rather than unconditional model.\nSuppose we denote by K the covariance corresponding to the required model. For most of this paper we presume K = K2. We then have the following probability for the whole data Y = {y_1, y_2, . . . , y_N}:\n\nP(Y|K, ȳ) = Π_j (2π)^{-m/2} |K|^{-1/2} exp( −(1/2)(y_j − ȳ)^T K^{-1}(y_j − ȳ) ) (3)\n\nwhere the observable model mean is ȳ = x̄ + σ̄ and the latent mean is x̄ = (I − A)^{-1}ω̄, and where the means σ̄ and ω̄, along with the elements of the matrix A and the covariances Kω and Kσ, are parameters.\n5 Priors, Maximum Posterior and Bayesian Information Criterion\nThe previous section outlines the basic model of the data given the various parameters. In this section we provide prior distributions for the parameters of the structural equation model. Independent Gaussian priors are put on the parameters:\n\nP(A_ij|T) = T^{1/2} (2π)^{-1/2} exp( −(1/2) T (A_ij − Ā_ij)^2 ) (4)\n\nwith regularisation parameter T. 
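The likelihood (3) and the prior (4) combine into the log posterior that is later maximised. The following is a minimal sketch under stated assumptions: it uses the K2 covariance, replaces the model mean ȳ by the sample mean, applies the prior to every entry of A (the zero diagonal then only contributes a constant), and the function names are our own, not from the paper.

```python
import numpy as np

def sem_covariance(A, Kw, Ksigma):
    # K2 = (I - A)^{-1} Kw [(I - A)^{-1}]^T + Ksigma
    S_inv = np.linalg.inv(np.eye(A.shape[0]) - A)
    return S_inv @ Kw @ S_inv.T + Ksigma

def log_posterior(A, Kw, Ksigma, Y, T):
    """Log of eq. (3) plus the log prior of eq. (4) with Abar = 0.
    Y is an (n_points, n_regions) data array."""
    n, m = Y.shape
    K = sem_covariance(A, Kw, Ksigma)
    R = Y - Y.mean(axis=0)          # sample mean stands in for the model mean
    _, logdet = np.linalg.slogdet(K)
    ll = -0.5 * n * (m * np.log(2 * np.pi) + logdet)
    ll -= 0.5 * np.sum(R @ np.linalg.inv(K) * R)
    prior = 0.5 * A.size * np.log(T / (2 * np.pi)) - 0.5 * T * np.sum(A ** 2)
    return ll + prior
```

In the full model, this quantity would be maximised over A, Kω and Kσ by conjugate gradients, as described next.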
For the purposes of this paper we take Ā_ij = 0, presuming no particular a priori bias towards positive or negative connections, and use a uniform prior over structures. An independent prior over connections seems reasonable, as two separate connections between different brain regions would have no a priori reason to be related. Any relationship is due to functional purpose and is therefore a posteriori. The use of a uniform prior over all structures is an extreme position, which we have taken in this paper to contrast with using only one structure. In reality we would want to use neurobiologically guided priors over structures.\nInverse gamma priors were also originally specified for Kω and Kσ, along with a prior for the mean ω̄. It was found that these typically had no effect on the experiments, and they were dropped for simplicity. Hence Kω and Kσ will be optimised without regularisation, and ω̄ is set to zero. T is chosen by 10-fold cross-validation from a set of 10 possible values.\nWe can calculate all the relevant derivatives for the SEM straightforwardly, and adapt the parameters to maximise the posterior of the structural equation model. In this paper we use a conjugate gradient approach. By adding a Bayesian Information Criterion term [16], (−0.5 m log N) for m parameters and N data points, to the log posterior at the maximum posterior solution, we can obtain an approximation of the evidence P(Y|M), where M encodes the structural information we are interested in and consists of indicator variables M_ij indicating a connection from node j to node i. This will enable us to sample from an approximate posterior distribution of structures to find a sample which best represents the data.\n6 Sampling From SEMs\nIn order to represent the posterior distribution over network structures, we resort to a sampling approach. 
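The BIC-penalised score that this sampler targets can be sketched as follows; the function name and the exact parameter bookkeeping (what is counted beyond the edge weights) are assumptions of this illustration.

```python
import numpy as np

def bic_log_evidence(max_log_posterior, M, n_data, n_extra_params=0):
    """BIC approximation to log P(Y|M): the maximised log posterior minus
    0.5 * m * log(N), where m counts one weight per edge in the indicator
    matrix M plus any additional parameters (noise covariances, means)."""
    m = int(M.sum()) + n_extra_params
    return max_log_posterior - 0.5 * m * np.log(n_data)

# A denser structure must beat a sparser one by 0.5*log(N) per extra edge
M_sparse = np.array([[0, 1], [0, 0]])
M_dense = np.array([[0, 1], [1, 0]])
gap = bic_log_evidence(0.0, M_sparse, 800) - bic_log_evidence(0.0, M_dense, 800)
assert np.isclose(gap, 0.5 * np.log(800))
```

The score thus trades fit against the number of connections, which is what lets the sampler prefer sparse structures when the data permit.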
Because there are no acyclicity constraints, MCMC proposals are simpler than in the comparable situation for belief networks, in that no acyclicity checking needs to be done for the proposals. A simple proposal scheme is to randomly generate a swap matrix M_S which is XORed with M. We choose highly sparse swap matrices, but to reduce the possibility of transitioning randomly about the larger graphs without ever considering smaller networks, we introduce a bias towards removing connections rather than adding connections in generating the swap matrix. This means the proposal is no longer symmetric, and so a corresponding Hastings factor needs to be included in the acceptance probability, so that the result is still a sample from the original posterior.\n7 Tests On A Toy Problem\nWe tested the approach on a toy problem with 8 variables. We sampled 800 data points from y = (I − A)^{-1}ε + ρ, for ε Gaussian with unit diagonal covariance, ρ Gaussian with 0.2 diagonal covariance, and with A a fixed sparse 8×8 zero-diagonal connection matrix with nonzero entries 0.47, −0.36, 0.27, 0.49, 0.42, −0.13, −0.26, −0.36, 0.34, −0.18, 0.16, −0.03, 0.55, −0.03, −0.08, −0.25, 0.5, 0.25, −0.22, −0.17, 0.31 and 0.385. This connectivity matrix is represented graphically in Figure 1(a). In modelling this we used T = 10. This prior ensures that any A that is not of very low prior probability is a contraction. Also contraction constraints were added to the optimisation. Priors on other parameters were set to be broad. An annealed sampling procedure was used for the first 4000 samples from the Metropolis-Hastings Monte-Carlo procedure. After that a further 4000 samples of burn-in were used. 
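The biased swap proposal and its Hastings correction can be sketched as follows. This is a simplified single-edge version, not the exact implementation used in the experiments: the removal bias b, the score function and the handling of forced boundary moves are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def propose(M, b=0.7):
    """Biased single-edge swap: with probability b remove a random existing
    edge, otherwise add a random absent one. Returns the proposed structure
    and the Hastings ratio q(M|M')/q(M'|M). (Boundary cases where a move is
    forced are not given their exact proposal probabilities here.)"""
    d = M.shape[0]
    off = ~np.eye(d, dtype=bool)                 # self-connections excluded
    present = np.argwhere((M == 1) & off)
    absent = np.argwhere((M == 0) & off)
    E, F = len(present), len(absent)
    M2 = M.copy()
    if (rng.random() < b and E > 0) or F == 0:   # removal move
        i, j = present[rng.integers(E)]
        M2[i, j] = 0
        return M2, ((1 - b) / (F + 1)) / (b / E)
    i, j = absent[rng.integers(F)]               # addition move
    M2[i, j] = 1
    return M2, (b / (E + 1)) / ((1 - b) / F)

def mh_step(M, log_score, b=0.7):
    """One Metropolis-Hastings step on structures; log_score(M) stands in
    for the BIC-approximated log evidence."""
    M2, hastings = propose(M, b)
    if np.log(rng.random()) < log_score(M2) - log_score(M) + np.log(hastings):
        return M2
    return M

# Toy run: a sparsity-favouring score keeps the chain on small graphs
M = np.zeros((5, 5), dtype=int)
M[0, 1] = M[2, 3] = 1
for _ in range(300):
    M = mh_step(M, lambda S: -float(S.sum()))
assert M.diagonal().sum() == 0 and set(np.unique(M)) <= {0, 1}
```

The Hastings ratio compensates for the removal bias: a removal chosen with probability b/E is reversed by an addition chosen with probability (1−b)/(F+1), and including their ratio in the acceptance test keeps the chain targeting the original posterior.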
The next 4000 samples were used for analysis.\nWe assess which edges are common in the samples, and what the highest posterior sampled graphs are. Figure 1(b) illustrates the edges which are present in more than 0.15 of the samples. It can be seen that many of the critical edges are there in most samples (indeed some are always there). Those which are missing in both cases tend to be either low magnitude connections or are due to directional confusion.\n\nFigure 1: (a) Graphical structure of the ground truth model for the toy data, (b) edges present in more than 0.15 of the samples, (c) the highest posterior structure from the sample, and (d) a random sample.\n\nThe graphs for the maximum posterior sample and a random sample are shown in Figure 1. We can see that in the maximum posterior sample there is a misplaced edge (the edge from 5 to 6 is replaced by one from 5 to 1), and a number are missing or have exchanged direction. The samples generally have likelihoods which are very similar to the likelihood for the true model.\nWe can conclude from this that we can gain some information from learning SEM structures, but as with learning any graphical models there are many symmetries and equivalences, so it is vital not to infer too much from the learnt structures.\n8 Tests On fMRI Data\nThe approach of this paper was tested on two different fMRI datasets. The first dataset (D1) was taken from a dataset that had previously been used to examine inter-session variance in a single subject [8, 17]. We used the auditory paced finger-tapping task; briefly, a single subject tapped his right index finger, paced by an auditory tone (1.5Hz). Each activation epoch was alternated with a rest epoch, in which the pacing tone was delivered to control for auditory activation. Thirteen blocks were collected per session (seven rest and six active). 
Each block was 24s/6 scans long, making 78 scans in total for each of 33 sessions. The subject maintained fixation on a cross that was backprojected onto a transparent screen by an LCD video projector, as in previous experiments.\nThe subject was a healthy 23 year old right-handed male. The data were acquired on a Siemens MAGNETOM Vision (Siemens, Erlangen, Germany) at 2T. Each BOLD-EPI volume scan consisted of 48 transverse slices (inplane matrix 64x64; voxel size 3x3x3mm; TE=40ms; TR=4.1s). A T1-weighted high-resolution MRI of the subject (1 x 1 x 1.5mm resolution) was acquired to facilitate anatomical localisation of the functional data.\nThe data were processed with statistical parametric mapping (SPM) software SPM5 (Wellcome Department of Cognitive Neurology; www.fil.ion.ucl.ac.uk/spm). After removal of the first two volumes to account for T1 saturation effects, cerebral volumes were realigned to correct for both within- and between-session subject motion. The data were filtered with a 128s high-pass filter, and an AR(1) model was used to account for serial correlation in the data. Experimental effects were estimated using session design matrices modelling the hemodynamically convolved time-course of the active movement condition, and 6 subject movement parameters. Note that no spatial smoothing was applied to this dataset, in an attempt to preserve single-voxel timeseries.\nSeeds were selected from significantly active voxels identified using a random effects analysis in SPM5 (one-sample t-test across 33 sessions; p < 0.05 FWE corrected for multiple comparisons).\nFor comparison with extant work, the most significant voxel in each cluster was chosen as a seed, giving 13 seeds representing 13 separate anatomical regions. When it was obvious that a given cluster encompassed more than one distinct anatomical region, seeds were also selected for the other regions covered by the cluster. 
2000 data points were used for training, and the remaining 574 were reserved as a test set.\nThe second dataset (D2) was from a long term study of subjects who are at genetically enhanced risk of schizophrenia. Imaging was carried out on 90 subjects at the Brain Imaging Research Centre for Scotland (Edinburgh, Scotland, UK) on a GE 1.5 T Signa scanner. A high resolution structural scan was acquired using a 3D T1-weighted sequence (TI = 600 ms). Functional data was acquired using an EPI sequence. A total of 204 volumes were acquired. The first four volumes of each acquisition were discarded. Preliminary analysis was carried out using SPM99. Data were first realigned to correct for head movement, normalized to the standard EPI template and smoothed.\nThe resulting data consists of an image-volume series of 200 time points for each of the remaining 90 patients. The voxel time courses were temporally filtered. In order to reduce task related effects, we modelled the task conditions with standard block effects (Hayling), all convolved with canonical hemodynamic response functions, and fitted a general linear model (which also included regressors for the estimated head movement) to the time-filtered data; the residuals of this procedure were used as the data for all the work described in this paper. The full data set was split into two halves, a training and a test set. Data from 45 of the subjects were used for training and 45 for testing.\nFor an effective connectivity analysis, a number of brain regions (seeds) were chosen on the basis of the results of a functional connectivity study [19], taking account of areas which may be of particular clinical interest. In total 14 regions were chosen, along with their 14 cross-hemisphere counterparts. 
Hence we are interested in learning a 28 by 28 connectivity matrix.\n8.1 Learning SEM Structure\nFor both datasets a procedure similar to the toy example was followed for learning structure from the fMRI data. The stability of the log posterior, along with estimations of cross-correlation against lag, were used as heuristics to determine convergence prior to obtaining 10000 sample points.\nAssuming a fully visible path analysis model (covariance K1), where no measurement noise is included, is typical in fMRI analysis (e.g. [15] for a schizophrenia study). We found that samples from the posterior of this model were in fact so highly connected that displaying them would be infeasible. For D2 a connectivity of 350 of the 752 total possible connections was typical. However, note that only 376 connections are needed to fully specify a general covariance. Hence we can assume that in this situation the data is not suggesting any particular structure which is reasonably amenable to path analysis.\nWe can generalise the path analysis model by making the region activities latent variables, and allowing the measurement variables to be noisy versions of those regions. In SEM terms this is equivalent to assuming the covariance structure given by K2. A repeat of the whole procedure with this covariance results in much smaller structures. We focus on this approach.\nFor dataset D1, we sample posterior structures given the training data with T = 100. There is notable variation in these structures, although some key links (e.g. Left Motor Cortex (L M1) to Left Posterior Parietal Cortex (L PPC)) are included in most samples. In addition, an a priori connectivity structure is proposed for the regions in the study, taking into account the task involved. This was obtained by using knowledge of neuroanatomical connectivity drawn from studies using tract-tracing in non-human primates. It was produced independently of the connectivity analysis and without knowledge of its results, but taking into account the seed locations and their corresponding activities. Note that this is a simple finger tapping motor task with seeds corresponding to the associated regions. Though not trivial, we would expect the specification to be easier and more accurate here than for more complicated cognitive tasks, due to the high number of papers using this task in functional neuroimaging. Task D1 is also of note due to its focus on repeated scanning of a single individual, thus negating any problems in seed selection that may arise from inter-subject spatial variance.\nThese two cases described above are specified as different hypothesised models. We denote the hand-specified structure MH, and we select the maximum a posteriori sample ML (for \"Learnt Model\") as a potential alternative. The two structures are illustrated in Figure 2.\n\nFigure 2: Structure for (a) the hand specified model and (b) the highest posterior sample.\n\nFigure 3: Graphical structure for (a) the highest posterior structure from the sample, (b) a random sample, and (c) a sample from the two tier model. The regions are Inferior Frontal Gyrus, Medial Frontal Gyrus, Ant. Cingulate, Frontal Operculum, Superior Frontal Gyrus, Middle Frontal Gyrus, Superior Temporal Gyrus, Middle Temporal Gyrus, Insula, Thalamus, Amygdala Hippocampal Region, Cuneus/Precuneus, Inferior Parietal Lobule and Posterior Cerebellum.\n\nThe maximum a posteriori parameters are then estimated for the two models using the same conjugate gradient procedure on the same dataset. These two models are then used predictively on the remaining unseen test data. We compute the predictive log-likelihoods for each model. We find that the best predictive log-likelihoods are the same (to 3 significant figures) for both models. 
They are also the same as the predictive likelihood using the full sample covariance, which, given the large data sizes used, is well specified. Both these models perform better than other random models with equivalent numbers of connections. In reality learnt models are going to be used in new situations and in situations with less data. One test of the appropriateness of a model is to assess its predictive capability when trained on little data. By estimating the model parameters on 100 data points, instead of 2000, we find that the learnt model performs very slightly better than the hand specified model (log odds ratio of 63 on a 574 point test set), and both perform better than the full covariance (log odds of 292). This indicates that both MH and ML are providing salient reduced representations which capture useful characteristics of the data.\nWe also ran tests on D2. Maximum posterior samples and a random sample are illustrated in Figure 3. Note that although these samples appear to still be quite highly connected, they in fact have about 130 connections. Even so, this is significantly greater than the idealised connectivity structures typically used in most studies.\nOne further approach is to assume a fully connected structure, but where the connectivity is in two categories. We put priors on connectivity with the same values of T_ij as before for the strong connections and much larger values for the weaker connections. When this is added to the form of the model (where we make the incorrect but practical assumption that the BIC assumption still holds for the stronger connections) we obtain even simpler structures. Following this procedure we find that models of the form of Figure 3(c) are typical samples from the posterior, where only the larger 
Again connections such as those between the Cuneus/Precuneus and the\nSuperior Frontal Gyrus, the Thalamic connections, and some of the cross-hemispheric connections\nare amongst those that would be expected. This approach is related to recent work on the use of\nsparse priors for effective connectivity [18].\n9 Future Directions\nThis work demonstrates that if we learn structural equation models from data, we \ufb01nd there is little\nevidence for the simple forms of path analysis model which is in common use in the fMRI literature.\nWe suggest that learning connectivity can be a reasonable complement to current procedures where\nprior speci\ufb01cation is hard. Learning on its own does discover useful parameterised representations,\nbut these parameterisations are not the same as reasonable prior speci\ufb01cations. This is unsurprising\ndue to the statistical equivalence of many SEM structures. It should be expected that combining\nlearnt structures with prior anatomical models will help in the speci\ufb01cation of more accurate con-\nnectivity assumptions, as it will reduce the number of equivalence and focus on more reasonable\nstructural forms. Furthermore future comparisons can be made using a sample of reasonable models\ninstead of a single a priori chosen model. We would also expect that the major gains in learning\nmodels with come from the focus on dynamical networks which do not suffer from speci\ufb01city prob-\nlems. Even if the level of temporal information is small, any temporal information provides handles\nfor inferring causality that are unavailable with static equilibrium models.\nReferences\n[1] K. A. Bollen. Structural Equations with Latent Variables. John Wiley and Sons, 1989.\n[2] C. Buchel, J.T. Coull, and K.J. Friston. The preedictive value of changes in effective connectivity for\n\nhuman learning. Science, 283:1528\u20131541, 1999.\n\n[3] E. Bullmore, B. Howitz, G. Honey, M. Brammer, S. Williams, and T. Sharma. 
How good is good enough in path analysis of fMRI data? Neuroimage, 11:289–301, 2000.\n[4] D. Dash. Restructuring dynamic causal systems in equilibrium. In Proc. Uncertainty in AI 2005, 2005.\n[5] K.J. Friston and C. Buchel. Attentional modulation of effective connectivity from V2 to V5/MT in humans. Proceedings of the National Academy of Sciences, 97:7591–7596, 2000.\n[6] K.J. Friston, L. Harrison, and W.D. Penny. Dynamic causal modelling. NeuroImage, 19:1273–1302, 2003.\n[7] T. Haavelmo. The statistical implications of a system of simultaneous equations. Econometrica, 11:1–12, 1943.\n[8] D. McGonigle, A. Howseman, B. Athwal, K.J. Friston, R. Frackowiak, and A. Holmes. Variability in fMRI: An examination of intersession differences. Neuroimage, 11:708–734, 2000.\n[9] A. R. McIntosh and F. Gonzalez-Lima. Structural equation modelling and its application to network analysis in functional brain imaging. Human Brain Mapping, 2:2–22, 1994.\n[10] P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction and Search. MIT Press, 2nd edition, 2001.\n[11] J. Pearl. Causality. Cambridge University Press, 2000.\n[12] W.D. Penny, K.E. Stephan, A. Mechelli, and K.J. Friston. Comparing dynamic causal models. Neuroimage, 22:1157–1172, 2004.\n[13] T. Richardson. A discovery algorithm for directed cyclic graphs. In Proceedings of the 12th Conference on Uncertainty in Artificial Intelligence, 1996.\n[14] J. Rowe, K.E. Stephan, K. Friston, R. Frackowiak, A. Lees, and R. Passingham. Attention to action in Parkinson's disease. Brain, 125:276–289, 2002.\n[15] R. Schlosser, T. Gesierich, B. Kauffman, G. Vucurevic, S. Hunsche, J. Gawehn, and P. Stoeter. Altered effective connectivity during working memory performance in schizophrenia: a study with fMRI and structural equation modeling. Neuroimage, 19:751–763, 2003.\n[16] G. Schwarz. Estimating the dimension of a model. 
Annals of Statistics, 6:461–464, 1978.\n[17] S.M. Smith, C.F. Beckmann, N. Ramnani, M.W. Woolrich, P.R. Bannister, M. Jenkinson, P.M. Matthews, and D. McGonigle. Variability in fMRI: A re-examination of intersession differences. Human Brain Mapping, 24:248–257, 2005.\n[18] P.A. Valdes Sosa, J.M. Sanchez-Bornot, A. Lage-Castellanos, M. Vega-Hernandez, J. Bosch Bayard, L. Melie-Garcia, and E. Canales-Rodriguez. Estimating brain functional connectivity with sparse multivariate autoregression. Philosophical Transactions of the Royal Society of London B Biological Sciences, 360:969–981, 2005.\n[19] H.C. Whalley, E. Simonotto, I. Marshall, D.G.C. Owens, N.H. Goddard, E.C. Johnstone, and S.M. Lawrie. Functional disconnectivity in subjects at high genetic risk of schizophrenia. Brain, 128:2097–2108, 2005.\n[20] S. Wright. Correlation and causation. Journal of Agricultural Research, 20:557–585, 1921.\n[21] X. Zheng and J. C. Rajapakse. Learning functional structure from fMR images. Neuroimage, 31:1601–1613, 2006.\n", "award": [], "sourceid": 3022, "authors": [{"given_name": "Enrico", "family_name": "Simonotto", "institution": null}, {"given_name": "Heather", "family_name": "Whalley", "institution": null}, {"given_name": "Stephen", "family_name": "Lawrie", "institution": null}, {"given_name": "Lawrence", "family_name": "Murray", "institution": null}, {"given_name": "David", "family_name": "Mcgonigle", "institution": null}, {"given_name": "Amos", "family_name": "Storkey", "institution": null}]}