{"title": "Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits", "book": "Advances in Neural Information Processing Systems", "page_first": 1304, "page_last": 1312, "abstract": "We develop an inference and optimal design procedure for recovering synaptic weights in neural microcircuits. We base our procedure on data from an experiment in which populations of putative presynaptic neurons can be stimulated while a subthreshold recording is made from a single postsynaptic neuron. We present a realistic statistical model which accounts for the main sources of variability in this experiment and allows for large amounts of information about the biological system to be incorporated if available. We then present a simpler model to facilitate online experimental design which entails the use of efficient Bayesian inference. The optimized approach results in equal quality posterior estimates of the synaptic weights in roughly half the number of experimental trials under experimentally realistic conditions, tested on synthetic data generated from the full model.", "full_text": "Bayesian Inference and Online Experimental Design\n\nfor Mapping Neural Microcircuits\n\nBen Shababo \u2217\n\nDepartment of Biological Sciences\n\nColumbia University, New York, NY 10027\n\nbms2156@columbia.edu\n\nAri Pakman\n\nDepartment of Statistics,\n\nCenter for Theoretical Neuroscience,\n\n& Grossman Center for the Statistics of Mind\nColumbia University, New York, NY 10027\n\nap3053@columbia.edu\n\nBrooks Paige \u2217\n\nDepartment of Engineering Science\n\nUniversity of Oxford, Oxford OX1 3PJ, UK\n\nbrooks@robots.ox.ac.uk\n\nLiam Paninski\n\nDepartment of Statistics,\n\nCenter for Theoretical Neuroscience,\n\n& Grossman Center for the Statistics of Mind\nColumbia University, New York, NY 10027\n\nliam@stat.columbia.edu\n\nAbstract\n\nWith the advent of modern stimulation techniques in neuroscience, the oppor-\ntunity arises to map neuron to neuron 
connectivity. In this work, we develop a method for efficiently inferring posterior distributions over synaptic strengths in neural microcircuits. The input to our algorithm is data from experiments in which action potentials from putative presynaptic neurons can be evoked while a subthreshold recording is made from a single postsynaptic neuron. We present a realistic statistical model which accounts for the main sources of variability in this experiment and allows for significant prior information about the connectivity and neuronal cell types to be incorporated if available. Because of the technical challenges and sparsity of these systems, it is important to focus experimental time on stimulating the neurons whose synaptic strength is most ambiguous; we therefore also develop an online optimal design algorithm for choosing which neurons to stimulate at each trial.

1 Introduction

A major goal of neuroscience is the mapping of neural microcircuits at the scale of hundreds to thousands of neurons [1]. By mapping, we specifically mean determining which neurons synapse onto each other and with what weight. One approach to achieving this goal involves the simultaneous stimulation and observation of populations of neurons. In this paper, we specifically address the mapping experiment in which a set of putative presynaptic neurons are optically stimulated while an electrophysiological trace is recorded from a designated postsynaptic neuron. It should be noted that the methods we present are general enough that most stimulation and subthreshold monitoring technologies would be well fit by our model with only minor changes. These types of experiments have been implemented with some success [2, 3, 6], yet several issues prevent efficient, large-scale mapping of neural microcircuitry.
For example, while it has been shown that multiple neurons can be stimulated simultaneously [4, 5], successful mapping experiments have thus far only stimulated a single neuron per trial, which increases experimental time [2, 3, 6]. Stimulating multiple neurons simultaneously and with high accuracy requires well-tuned hardware, and even then some level of stimulus uncertainty may remain. In addition, a large portion of connection weights are small, which means that determining these weights is difficult and that many trials must be performed. Because neural connectivity is sparse, when the stimuli are chosen non-adaptively, potentially useful trials are spent on unconnected pairs instead of refining weight estimates for connected pairs. In this paper, we address these issues by developing a procedure for sparse Bayesian inference and information-based experimental design which can reconstruct neural microcircuits accurately and quickly despite the issues listed above.

*These authors contributed equally to this work.

2 A realistic model of neural microcircuits

In this section we propose a novel and thorough statistical model which is specific enough to capture most of the relevant variability in these types of experiments while being flexible enough to be used with many different hardware setups and biological preparations.

2.1 Stimulation

In our experimental setup, at each trial, n = 1, . . . , N, the experimenter stimulates R of K possible presynaptic neurons. We represent the chosen set of neurons for each trial with the binary vector z_n ∈ {0, 1}^K, which has a one in each of the R entries corresponding to the stimulated neurons on that trial. One of the difficulties of optical stimulation lies in the experimenter's inability to stimulate a specific neuron without possibly failing to stimulate the target neuron or engaging other nearby neurons.
In general, this is a result of the fact that optical excitation does not stimulate a single point in space but rather has a point spread function that depends on the hardware and the biological tissue. To complicate matters further, each neuron has a different rheobase (a measure of how much current is needed to generate an action potential) and expression level of the optogenetic protein. While some work has shown that it may be possible to stimulate exact sets of neurons, this setup requires very specific hardware and fine tuning [4, 5]. In addition, even if a neuron fires, there is some probability that synaptic transmission will not occur. Because these events are difficult or impossible to observe, we model this uncertainty by introducing a second binary vector x_n ∈ {0, 1}^K denoting the neurons that actually release neurotransmitter in trial n. The conditional distribution of x_n given z_n can be chosen by the experimenter to match their hardware settings and understanding of synaptic transmission rates in their preparation.

2.2 Sparse connectivity

Numerous studies have collected data to estimate both connection probabilities and synaptic weight distributions as a function of distance and cell identity [2, 3, 6, 7, 8, 9, 10, 11, 12]. Generally, the data show that connectivity is sparse and that most synaptic weights are small with a heavy tail of strong connections. To capture the sparsity of neural connectivity, we place a "spike-and-slab" prior on the synaptic weights w_k [13, 14, 15], for each presynaptic neuron k = 1, . . . , K; these priors are designed to place non-zero probability on the event that a given weight w_k is exactly zero.
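To make the prior concrete, a draw from a spike-and-slab prior of this form can be sketched as follows; this is a minimal illustration, and the connection probability and exponential slab scale below are toy values, not the settings used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 100           # number of putative presynaptic neurons
a = 0.1           # prior connection probability a_k (toy value, shared across k)
slab_scale = 5.0  # scale of an exponential slab (toy value)

# gamma_k ~ Bernoulli(a_k): does neuron k connect to the postsynaptic cell?
gamma = rng.random(K) < a

# w_k is exactly zero when gamma_k = 0; otherwise it is drawn from the slab.
# An exponential slab gives many small weights with a heavy tail of strong ones.
w = np.where(gamma, rng.exponential(slab_scale, size=K), 0.0)
```

Most entries of w come out exactly zero, matching the sparse connectivity described above; swapping the exponential for a log-normal or truncated Gaussian slab only changes the final rng call.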
Note that we do not need to restrict the "slab" distributions (the conditional distributions of w_k given that w_k is nonzero) to the traditional Gaussian choice, and in fact each weight can have its own parameters. For example, log-normal [12] or exponential [8, 10] distributions may be used in conjunction with information about cell type and location to assign highly informative priors.¹

2.3 Postsynaptic response

In our model a subthreshold response is measured from a designated postsynaptic neuron. Here we assume the measurement is a one-dimensional trace y_n ∈ R^T, where T is the number of samples in the trace. The postsynaptic response for each synaptic event in a given trial can be modeled using an appropriate template function f_k(·) for each presynaptic neuron k. For this paper we use an alpha function to model the shape of each neuron's contribution to the postsynaptic current, parameterized by time constants τ_k which define the rise and decay time. As with the synaptic weight priors, the template functions could be designed based on the cells' identities. The onset of each postsynaptic response may be jittered such that each event starts at some time d_nk after t = 0, where the delays could be conditionally distributed on the parameters of the stimulation and cells. Finally, at each time step the signal is corrupted by zero-mean Gaussian noise with variance ν². This noise distribution is chosen for simplicity; however, the model could easily handle time-correlated noise.

¹A cell's identity can be general, such as excitatory or inhibitory, or more specific, such as VIP- or PV-interneurons. These identities can be identified by driving the optogenetic channel with a particular promoter unique to that cell type or by coexpressing markers for various cell types along with the optogenetic channel.

Figure 1: A schematic of the model experiment. The left figure shows the relative location of 100 presynaptic neurons; inhibitory neurons are shown in yellow, and excitatory neurons in purple. Neurons marked with a black outline have a nonzero connectivity to the postsynaptic neuron (shown as a blue star, in the center). The blue circles show the diffusion of the stimulus through the tissue. The true connectivity weights are shown on the upper right, with blue vertical lines marking the five neurons which were actually fired as a result of this stimulus. The resulting time series postsynaptic current trace is shown in the bottom right. The connected neurons which fired are circled in red, the triangle and star marking their weights and corresponding postsynaptic events in the plots at right.

2.4 Full definition of model

The full model can be summarized by the likelihood

p(Y | w, X, D) = ∏_{n=1}^{N} ∏_{t=1}^{T} N( y_nt | Σ_k w_k x_nk f_k(t − d_nk, τ_k), ν² )    (1)

with the general spike-and-slab prior

p(γ_k) = Bernoulli(a_k),    p(w_k | γ_k) = γ_k p(w_k | γ_k = 1) + (1 − γ_k) δ_0(w_k)    (2a, 2b)

where Y ∈ R^{N×T}, X ∈ {0, 1}^{N×K}, and D ∈ R^{N×K} are composed of the responses, latent neural activity, and delays, respectively; γ_k is a binary variable indicating whether or not neuron k is connected.

We restate that the key to this model is that it captures the main sources of uncertainty in the experiment while providing room for particulars regarding hardware and the anatomy and physiology of the system to be incorporated. To infer the marginal distribution of the synaptic weights, one can use standard Bayesian methods such as Gibbs sampling or variational inference, both of which are discussed below.
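As a sanity check on the likelihood (1), a single trial can be simulated directly from the generative model. All numerical values in the sketch below are toy assumptions, not the settings of Appendix A:

```python
import numpy as np

rng = np.random.default_rng(2)
K, T, nu = 20, 200, 1.0        # neurons, samples per trace, noise s.d. (toy values)
t = np.arange(T)

tau = rng.uniform(3.0, 8.0, K)                                   # template time constants tau_k
w = np.where(rng.random(K) < 0.2, rng.exponential(3.0, K), 0.0)  # spike-and-slab weights
x = (rng.random(K) < 0.25).astype(float)                         # neurons that actually fired
d = rng.integers(5, 20, K)                                       # onset delays d_nk in samples

# Eq. (1): the trace is a weighted sum of delayed templates plus Gaussian noise.
y = nu * rng.standard_normal(T)
for k in range(K):
    f_k = np.zeros(T)
    tk = t[: T - d[k]].astype(float)
    f_k[d[k]:] = (tk / tau[k]) * np.exp(1.0 - tk / tau[k])       # alpha function f_k(t - d_nk)
    y += w[k] * x[k] * f_k
```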
An example set of neurons and connectivity weights, along with the set of stimuli and postsynaptic current trace for a single trial, is shown in Figure 1.

3 Inference

Throughout the remainder of the paper, all simulated data is generated from the model presented above. As mentioned, any free hyperparameters or distribution choices can be chosen intelligently from empirical evidence. Biological parameters may be specific and chosen on a cell-by-cell basis or left general for the whole system. We show in our results that inference and optimal design still perform well when general priors are used. Details regarding data simulation as well as specific choices we make in our experiments are presented in Appendix A.

3.1 Charge as synaptic strength

To reduce the space over which we perform inference, we collapse the variables w_k and τ_k into a single variable c_k = Σ_t w_k f_k(t − d_nk, τ_k), which quantifies the charge transfer during the synaptic event and can be used to define the strength of a connection. Integrating over time also eliminates any dependence on the delays d_nk. In this context, we reparameterize the likelihood as a function of y_n = Σ_{t=0}^{T} y_nt and σ = ν T^{1/2}, and the resulting likelihood is

p(y | X, c) = ∏_n N( y_n | x_n^⊤ c, σ² ).    (3)

We found that naïve MCMC sampling over the posterior of w, τ, γ, X, and D insufficiently explored the support and inference was unsuccessful. In this effort to make the inference procedure computationally tractable, we discard potentially useful temporal information in the responses.
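The collapse from traces to charges can be illustrated as follows; the alpha-function template and all numerical values are assumptions for the sketch. In particular, the summed template, and hence the charge, is essentially unchanged when the onset delay shifts:

```python
import numpy as np

T, tau = 200, 5.0        # samples per trace and template time constant (toy values)
t = np.arange(T)

def template(delay):
    # Alpha-function template, shifted to start at `delay` samples.
    f = np.zeros(T)
    tk = t[: T - delay].astype(float)
    f[delay:] = (tk / tau) * np.exp(1.0 - tk / tau)
    return f

w = 3.0                                  # synaptic weight (toy value)
c_early = w * template(10).sum()         # charge with an early onset
c_late = w * template(40).sum()          # charge with a much later onset

# Integrating over time removes the delay dependence (up to negligible
# truncation of the template's tail at the end of the trace).
print(np.isclose(c_early, c_late, rtol=1e-6))
```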
An important direction for future work is to experiment with samplers that can more efficiently explore the full posterior (e.g., using Wang-Landau or simulated tempering methods).

3.2 Gibbs sampling

The reparameterized posterior p(c, γ, X | Z, y) can be inferred using a simple Gibbs sampler. We approximate the prior over c as a spike-and-slab with Gaussian slabs, where the slabs could be truncated if the cells' excitatory or inhibitory identity is known. Each x_nk can be sampled by computing the odds ratio, and following [15] we draw each c_k, γ_k from the joint distribution p(c_k, γ_k | Z, y, X, {c_j, γ_j | j ≠ k}) by sampling first γ_k from p(γ_k | Z, y, X, {c_j | j ≠ k}), then c_k from p(c_k | Z, y, X, {c_j | j ≠ k}, γ_k).

3.3 Variational Bayes

As stated earlier, we do not only want to recover the parameters of the system, but also to perform optimal experimental design, which is a closed-loop process. One essential aspect of the design procedure is that decisions must be returned to the experimenter quickly, on the order of a few seconds. This means that we must be able to perform inference of the posterior as well as choose the next stimulus extremely quickly. For realistically sized systems with hundreds to thousands of neurons, Gibbs sampling will be too slow, and we have to explore other options for speeding up inference.

To achieve this decrease in runtime, we approximate the posterior distribution of c and γ using a variational approach [16]. The use of variational inference for spike-and-slab regression models has been explored in [17, 18], and we follow their methods with some minor changes.
If we, for now, assume that X is known and let the spike-and-slab prior on c have untruncated Gaussian slabs, then this variational approach finds the best fully-factorized approximation to the true posterior

p(c, γ | x_{1:n}, y_{1:n}) ≈ ∏_k q(c_k, γ_k)    (4)

where the functional form of q(c_k, γ_k) is itself restricted to a spike-and-slab distribution

q(c_k, γ_k) = { α_k N(c_k | µ_k, s_k²)  if γ_k = 1;  (1 − α_k) δ_0(c_k)  otherwise.    (5)

The variational parameters α_k, µ_k, s_k for k = 1, . . . , K are found by minimizing the KL-divergence KL(q||p) between the left and right hand sides of Eq. 4 with respect to these values. As is the case with fully-factorized variational distributions, updating the posterior involves an iterative algorithm which cycles through the parameters for each factor.

The factorized variational approximation is reasonable when the number of simultaneous stimuli, R, is small. Note that if we examine the posterior distribution of the weights

p(c | y, X) ∝ ∏_n N( y_n | x_n^⊤ c, σ² ) ∏_k [ a_k N(c_k | η_k, σ_k²) + (1 − a_k) δ_0(c_k) ]    (6)

we see that if each x_n contains only one nonzero value, then each factor in the likelihood depends on only one of the K weights and can be multiplied into the corresponding kth spike-and-slab. Therefore, since the product of a spike-and-slab and a Gaussian is still a spike-and-slab, if we stimulate only one neuron at each trial then this posterior is also spike-and-slab, and the variational approximation becomes exact in this limit.

Since we do not directly observe X, we must take the expectation of the variational parameters α_k, µ_k, s_k with respect to the distribution p(X | Z, y).
We Monte Carlo approximate this integral in a manner similar to the approach used for integrating over the hyperparameters in [17]; however, here we further approximate by sampling over potential stimuli x_nk from p(x_nk = 1 | z_n). In practice we will see this approximation suffices for experimental design, with the overall variational approach performing nearly as well for posterior weight reconstruction as Gibbs sampling from the true posterior.

4 Optimal experimental design

The preparations needed to perform these types of experiments tend to be short-lived, and indeed, the very act of collecting data -- that is, stimulating and probing cells -- can compromise the health of the preparation further. Also, one may want to use the connectivity information to perform additional experiments. Therefore it becomes critical to complete the mapping phase of the experiment as quickly as possible. We are thus strongly motivated to optimize the experimental design: to choose the optimal subset of neurons z_n to stimulate at each trial to minimize N, the overall number of trials required for good inference.

The Bayesian approach to the optimization of experimental design has been explored in [19, 20, 21]. In this paper, we maximize the mutual information I(θ; D) between the model parameters θ and the data D; however, other objective functions could be explored. Mutual information can be decomposed into a difference of entropies, one of which does not depend on the data. Therefore the optimization reduces to the intuitive objective of minimizing the posterior entropy with respect to the data. Because the previous data D_{n−1} = {(z_1, y_1), . . . , (z_{n−1}, y_{n−1})} are fixed and y_n is dependent on the stimulus z_n, our problem is reduced to choosing the optimal next stimulus, denoted z*_n, in expectation over y_n,

z*_n = argmax_{z_n} E_{y_n|z_n}[ I(θ; D) ] = argmin_{z_n} E_{y_n|z_n}[ H(θ | D) ].    (7)

5 Experimental design procedure

The optimization described in Section 4 entails performing a combinatorial optimization over z_n, where for each z_n we consider an expectation over all possible y_n. In order to be useful to experimenters in an online setting, we must be able to choose the next stimulus in only one or two seconds. For any realistically sized system, an exact optimization is computationally infeasible; therefore in the following section we derive a fast method for approximating the objective function.

5.1 Computing the objective function

The variational posterior distribution of c_k, γ_k can be used to characterize our general objective function described in Section 4. We define the cost function J to be the right-hand side of Equation 7,

J ≡ E_{y_n|z_n}[ H(c, γ | D) ]    (8)

such that the optimal next stimulus z*_n can be found by minimizing J. We benefit immediately from the factorized approximation of the variational posterior, since we can rewrite the joint entropy as

H[c, γ | D] ≈ Σ_k H[c_k, γ_k | D]    (9)

allowing us to optimize over the sum of the marginal entropies instead of having to compute the (intractable) entropy over the full posterior.
Using the conditional entropy identity H[c_k, γ_k | D] = H[c_k | γ_k, D] + H[γ_k | D], we see that the entropy of each spike-and-slab is the sum of a weighted Gaussian entropy and a Bernoulli entropy, and we can write out the approximate objective function as

J ≈ Σ_k E_{y_n|z_n}[ (α_{k,n}/2)(1 + log(2π s_{k,n}²)) − α_{k,n} log α_{k,n} − (1 − α_{k,n}) log(1 − α_{k,n}) ].    (10)

Here, we have introduced additional notation, using α_{k,n}, µ_{k,n}, and s_{k,n} to refer to the parameters of the variational posterior distribution given the data through trial n. Intuitively, we see that Equation 10 represents a balance between minimizing the sparsity pattern entropy H[γ_k] of each neuron and minimizing the weight entropy H[c_k | γ_k = 1] in proportion to the probability α_k that the presynaptic neuron is connected. As p(γ_k = 1) → 1, the entropy of the Gaussian slab distribution grows to dominate. In the algorithm's behavior, we see that when the probability that a neuron is connected increases, we spend time stimulating it to reduce the uncertainty in the corresponding nonzero slab distribution.

To perform this optimization we must compute the expected joint entropy with respect to p(y_n | z_n). For any particular candidate z_n, this can be Monte Carlo approximated by first sampling y_n from the posterior distribution p(y_n | z_n, c, D_{n−1}), where c is drawn from the variational posterior inferred at trial n − 1. Each sampled y_n may be used to estimate the variational parameters α_{k,n} and s_{k,n} with which we evaluate H[c_k, γ_k]; we average over these evaluations of the entropy from each sample to compute an estimate of J in Eq. 10.

Once we have chosen z*_n, we execute the actual trial and run the variational inference procedure on the full data to obtain the updated variational posterior parameters α_{k,n}, µ_{k,n}, and s_{k,n} which are needed for optimization. Once the experiment has concluded, Gibbs sampling can be run, though we found only a limited gain when comparing Gibbs sampling to variational inference.

5.2 Fast optimization

The major cost to the algorithm is in the stimulus selection phase. It is not feasible to evaluate the right-hand side of Equation 10 for every z_n because as K grows there is a combinatorial explosion of possible stimuli. To avoid an exhaustive search over possible z_n, we adopt a greedy approach for choosing which R of the K locations to stimulate. First we rank the K neurons based on an approximation of the objective function. To do this, we propose K hypothetical stimuli, z̃_n^k, each all zeros except the kth entry equal to 1 -- that is, we examine only the K stimuli which represent stimulating a single location. We then set z*_nk = 1 for the R neurons corresponding to the z̃_n^k which give the smallest values of the objective function, and set all other entries of z*_n to zero. We found that the neurons selected by a brute force approach are most likely to be the neurons that the greedy selection process chooses (see Figure 1 in the Appendix).

For large systems of neurons, even the above is too slow to perform in an online setting. For each of the K proposed stimuli z̃_n^k, to approximate the expected entropy we must compute the variational posterior for M samples of [X_{1:n−1}^⊤ x̃_n^⊤]^⊤ and L samples of y_n (where x̃_n is the random variable corresponding to p(x̃_n | z̃_n)). Therefore we run the variational inference procedure on the full data on the order of O(MKL) times at each trial.
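The greedy ranking step can be sketched as follows; here `expected_entropy` is a stand-in for the Monte Carlo estimate of the objective under each hypothetical single-site stimulus, which is the expensive quantity being approximated:

```python
import numpy as np

def greedy_stimulus(expected_entropy, R):
    """Rank the K one-hot candidate stimuli by their (approximate) expected
    posterior entropy and stimulate the best R sites together."""
    z_star = np.zeros(len(expected_entropy), dtype=int)
    best = np.argsort(expected_entropy)[:R]   # smallest objective = most informative
    z_star[best] = 1
    return z_star

# Toy scores for K = 5 neurons: sites 2 and 0 minimize the objective.
scores = np.array([0.2, 0.9, 0.1, 0.7, 0.5])
z_star = greedy_stimulus(scores, R=2)
print(z_star)   # -> [1 0 1 0 0]
```

The selection itself is linear in K once the K scores are in hand; the cost lives entirely in estimating those scores.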
As the system size grows, running the variational inference procedure this many times becomes intractable because the number of iterations needed to converge the coordinate ascent algorithm depends on the correlations between the rows of X. This is implicitly dependent on both N, the number of trials, and R, the number of stimulus locations (see Figure 2 in the Appendix). Note that the stronger dependence here is on R; when R = 1 the variational parameter updates become exact and independent across the neurons, and therefore no coordinate ascent is necessary and the runtime becomes linear in K.

We therefore take one last measure to speed up the optimization process by implementing an online Bayesian approach to updating the variational posterior (in the stimulus selection phase only). Since the variational posterior of c_k and γ_k takes the same form as the prior distribution, we can use the posterior from trial n − 1 as the prior at trial n, allowing us to effectively summarize the previous data. In this online setting, when we stimulate only one neuron, only the parameters of that specific neuron change. If during optimization we temporarily assume that x̃_n^k = z̃_n^k, this results in explicit updates for each variational parameter, with no coordinate ascent iterations required.

In total, the resulting optimization algorithm has a runtime of O(KL), with no coordinate ascent algorithms needed. The combined accelerations described in this section result in a speedup of several orders of magnitude which allows the full inference and optimization procedure to be run in real time, at approximately one second per trial in our computing environment for K = 500, R = 8. It is worth mentioning here that there are several points at which parallelization could be implemented in the full algorithm. We chose to parallelize over M, which distributes the sampling of X and the running of variational inference for each sample.
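When a single neuron is stimulated and assumed to fire, the online update reduces to standard conjugate algebra on that neuron's spike-and-slab factor. The sketch below uses generic Gaussian/Bernoulli conjugate updates in our own notation, as an illustration rather than the authors' exact formulae:

```python
import numpy as np

def online_update(alpha, mu, s2, y, sigma2):
    """Update one neuron's spike-and-slab posterior (spike prob alpha, slab
    N(mu, s2)) after observing the collapsed response y ~ N(c_k, sigma2)
    from stimulating that neuron alone."""
    # Slab: product of the Gaussian prior and the Gaussian likelihood.
    s2_new = 1.0 / (1.0 / s2 + 1.0 / sigma2)
    mu_new = s2_new * (mu / s2 + y / sigma2)

    def gauss(x, m, v):
        return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2.0 * np.pi * v)

    # Spike probability: posterior odds = prior odds times the marginal
    # likelihood ratio of "connected" (c ~ slab) vs. "not connected" (c = 0).
    odds = (alpha / (1.0 - alpha)) * gauss(y, mu, s2 + sigma2) / gauss(y, 0.0, sigma2)
    alpha_new = odds / (1.0 + odds)
    return alpha_new, mu_new, s2_new

# A large response makes the connection more probable and sharpens the slab.
alpha_new, mu_new, s2_new = online_update(alpha=0.1, mu=0.0, s2=25.0, y=8.0, sigma2=4.0)
```

Because the update returns a distribution of the same spike-and-slab form, it can serve directly as the prior for the next hypothetical trial, which is what makes the stimulus-selection loop cheap.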
(Formulae and step-by-step implementation details are found in Appendix B.)

Figure 2: A comparison of normalized reconstruction error (NRE) over 800 trials in a system with 500 neurons, between random stimulus selection (red, magenta) and our optimal experimental design approach (blue, cyan). The heavy red and blue lines indicate the results when running the Gibbs sampler at that point in the experiment, and the thinner magenta and cyan lines indicate the results from variational inference. Results are shown over three noise levels ν = 1, 2.5, 5, and for multiple numbers of stimulus locations per trial, R = 2, 4, 8, 16. Each plot shows the median and quartiles over 50 experiments. The error decreases much faster in the optimal design case, over a wide parameter range.

6 Experiments and results

We ran our inference and optimal experimental design algorithm on data sets generated from the model described in Section 2. We benchmarked our optimal design algorithm against a sequence of randomly chosen stimuli, measuring performance by normalized reconstruction error, defined as ‖E[c] − c‖₂ / ‖c‖₂; we report the variation in our experiments by plotting the median and quartiles. Baseline results are shown in Figure 2, over a range of values for stimulations per trial R and baseline postsynaptic noise levels ν. The results here use an informative prior, where we assume the excitatory or inhibitory identity is known, and we set individual prior connectivity probabilities for each neuron based on that neuron's identity and distance from the postsynaptic cell. We choose to let X be unobserved and let the stimuli Z produce Gaussian ellipsoids which excite neurons that are located nearby. All model parameters are given in Appendix A.

We see that inference in general performs well.
The optimal procedure was able to achieve equivalent reconstruction quality to a random stimulation paradigm in significantly fewer trials when the number of stimuli per trial and response noise were in an experimentally realistic range (R = 4 and ν = 2.5 being reasonable values). Interestingly, the approximate variational inference methods performed about as well as the full Gibbs sampler here (at much less computational cost), although Gibbs sampling seems to break down when R grows too large and the noise level is small, which may be a consequence of strong, local peaks in the posterior.

Figure 3: The results of inference and optimal design (A) with a single spike-and-slab prior for all connections (prior connection probability of .1, and each slab Gaussian with mean 0 and standard deviation 31.4); and (B) with X observed. Both experiments show the median and quartile range with R = 4 and ν = 2.5.

As the number of stimuli per trial R increases, we start to see improved weight estimates and faster convergence but a decrease in the relative benefit of optimal design; the random approach "catches up" to the optimal approach as R becomes large. This is consistent with the results of [22], who argue that optimal design can provide only modest gains in performing sparse reconstructions if the design vectors x are unconstrained.
(Note that these results do not apply directly in our setting if R is small, since in this case x is constrained to be highly sparse -- and this is exactly where we see major gains from optimal online designs.)

Finally, we see that we are still able to recover the synaptic strengths when we use a more general prior, as in Figure 3A, where we placed a single spike-and-slab prior across all the connections. Since we assumed the cells' identities were unknown, we used a zero-centered Gaussian for the slab and a prior connection probability of .1. While we allow for stimulus uncertainty, it will likely soon be possible to stimulate multiple neurons with high accuracy. In Figure 3B we see that -- as expected -- performance improves.

It is helpful to place this observation in the context of [23], which proposed a compressed-sensing algorithm to infer microcircuitry in experiments like those modeled here. The algorithms proposed by [23] are based on computing a maximum a posteriori (MAP) estimate of the weights w; note that to pursue the optimal Bayesian experimental design methods proposed here, it is necessary to compute (or approximate) the full posterior distribution, not just the MAP estimate. (See, e.g., [24] for a related discussion.) In the simulated experiments of [23], stimulating roughly 30 of 500 neurons per trial is found to be optimal; extrapolating from Fig. 2, we would expect a limited difference between optimal and random designs in this range of R.
That said, large values of R lead to some experimental difficulties: first, stimulating large populations of neurons with high spatial resolution requires very finely tuned hardware (note that the approach of [23] has not yet been applied to experimental data, to our knowledge); second, if R is sufficiently large then the postsynaptic neuron can be easily driven out of a physiologically realistic regime, which in turn means that the basic linear-Gaussian modeling assumptions used here and in [23] would need to be modified. We plan to address these issues in more depth in our future work.

7 Future Work

There are several improvements we would like to explore in developing this model and algorithm further. First, an inference algorithm which performs well on the full model, recovering the synaptic weights, the time constants, and the delays, would allow us to avoid compressing the responses to scalar values and to recover more information about the system. Also, it may be necessary to improve the noise model, as we currently assume that there are no spontaneous synaptic events, which would confound the determination of each connection's strength. Finally, in a recent paper [25], a simple adaptive compressive sensing algorithm was presented which challenges the results of [22]. It would be worth exploring whether their algorithm is applicable to our problem.

Acknowledgements

This material is based upon work supported by, or in part by, the U.S. Army Research Laboratory and the U.S. Army Research Office under contract number W911NF-12-1-0594 and an NSF CAREER grant. We would also like to thank Rafael Yuste and Jan Hirtz for helpful discussions, and our anonymous reviewers.

References

[1] R.
Reid, "From Functional Architecture to Functional Connectomics," Neuron, vol. 75, pp. 209-217, July 2012.
[2] M. Ashby and J. Isaac, "Maturation of a recurrent excitatory neocortical circuit by experience-dependent unsilencing of newly formed dendritic spines," Neuron, vol. 70, no. 3, pp. 510-521, 2011.
[3] E. Fino and R. Yuste, "Dense Inhibitory Connectivity in Neocortex," Neuron, vol. 69, pp. 1188-1203, Mar. 2011.
[4] V. Nikolenko, K. E. Poskanzer, and R. Yuste, "Two-photon photostimulation and imaging of neural circuits," Nat Meth, vol. 4, pp. 943-950, Nov. 2007.
[5] A. M. Packer, D. S. Peterka, J. J. Hirtz, R. Prakash, K. Deisseroth, and R. Yuste, "Two-photon optogenetics of dendritic spines and neural circuits," Nat Meth, vol. 9, pp. 1202-1205, Dec. 2012.
[6] A. M. Packer and R. Yuste, "Dense, unspecific connectivity of neocortical parvalbumin-positive interneurons: A canonical microcircuit for inhibition?," The Journal of Neuroscience, vol. 31, no. 37, pp. 13260-13271, 2011.
[7] B. Barbour, N. Brunel, V. Hakim, and J.-P. Nadal, "What can we learn from synaptic weight distributions?," Trends in Neurosciences, vol. 30, pp. 622-629, Dec. 2007.
[8] C. Holmgren, T. Harkany, B. Svennenfors, and Y. Zilberter, "Pyramidal cell communication within local networks in layer 2/3 of rat neocortex," The Journal of Physiology, vol. 551, no. 1, pp. 139-153, 2003.
[9] J. Kozloski, F. Hamzei-Sichani, and R. Yuste, "Stereotyped position of local synaptic targets in neocortex," Science, vol. 293, no. 5531, pp. 868-872, 2001.
[10] R. B. Levy and A. D. Reyes, "Spatial profile of excitatory and inhibitory synaptic connectivity in mouse primary auditory cortex," The Journal of Neuroscience, vol. 32, no. 16, pp. 5609-5619, 2012.
[11] R. Perin, T. K.
Berger, and H. Markram, "A synaptic organizing principle for cortical neuronal groups," Proceedings of the National Academy of Sciences, vol. 108, no. 13, pp. 5419-5424, 2011.
[12] S. Song, P. J. Sjöström, M. Reigl, S. Nelson, and D. B. Chklovskii, "Highly nonrandom features of synaptic connectivity in local cortical circuits," PLoS Biology, vol. 3, p. e68, Mar. 2005.
[13] E. I. George and R. E. McCulloch, "Variable selection via Gibbs sampling," Journal of the American Statistical Association, vol. 88, no. 423, pp. 881-889, 1993.
[14] T. J. Mitchell and J. J. Beauchamp, "Bayesian variable selection in linear regression," Journal of the American Statistical Association, vol. 83, no. 404, pp. 1023-1032, 1988.
[15] S. Mohamed, K. A. Heller, and Z. Ghahramani, "Bayesian and L1 approaches to sparse unsupervised learning," CoRR, vol. abs/1106.1157, 2011.
[16] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2007.
[17] P. Carbonetto and M. Stephens, "Scalable variational inference for Bayesian variable selection in regression, and its accuracy in genetic association studies," Bayesian Analysis, vol. 7, no. 1, pp. 73-108, 2012.
[18] M. Titsias and M. Lázaro-Gredilla, "Spike and Slab Variational Inference for Multi-Task and Multiple Kernel Learning," in Advances in Neural Information Processing Systems 24, pp. 2339-2347, 2011.
[19] Y. Dodge, V. Fedorov, and H. Wynn, eds., Optimal Design and Analysis of Experiments. North Holland, 1988.
[20] D. J. C. MacKay, "Information-based objective functions for active data selection," Neural Comput., vol. 4, pp. 590-604, July 1992.
[21] L. Paninski, "Asymptotic Theory of Information-Theoretic Experimental Design," Neural Comput., vol. 17, pp. 1480-1507, July 2005.
[22] E. Arias-Castro, E. J. Candès, and M. A.
Davenport, "On the fundamental limits of adaptive sensing," IEEE Transactions on Information Theory, vol. 59, no. 1, pp. 472-481, 2013.
[23] T. Hu, A. Leonardo, and D. Chklovskii, "Reconstruction of Sparse Circuits Using Multi-neuronal Excitation (RESCUME)," in Advances in Neural Information Processing Systems 22, pp. 790-798, 2009.
[24] S. Ji and L. Carin, "Bayesian compressive sensing and projection optimization," in Proceedings of the 24th International Conference on Machine Learning, ICML '07, pp. 377-384, ACM, 2007.
[25] M. Malloy and R. D. Nowak, "Near-optimal adaptive compressed sensing," CoRR, vol. abs/1306.6239, 2013.