{"title": "Spectral methods for neural characterization using generalized quadratic models", "book": "Advances in Neural Information Processing Systems", "page_first": 2454, "page_last": 2462, "abstract": "We describe a set of fast, tractable methods for characterizing neural responses to high-dimensional sensory stimuli using a model we refer to as the generalized quadratic model (GQM). The GQM consists of a low-rank quadratic form followed by a point nonlinearity and exponential-family noise. The quadratic form characterizes the neuron's stimulus selectivity in terms of a set linear receptive fields followed by a quadratic combination rule, and the invertible nonlinearity maps this output to the desired response range. Special cases of the GQM include the 2nd-order Volterra model (Marmarelis and Marmarelis 1978, Koh and Powers 1985) and the elliptical Linear-Nonlinear-Poisson model (Park and Pillow 2011). Here we show that for canonical form\" GQMs, spectral decomposition of the first two response-weighted moments yields approximate maximum-likelihood estimators via a quantity called the expected log-likelihood. The resulting theory generalizes moment-based estimators such as the spike-triggered covariance, and, in the Gaussian noise case, provides closed-form estimators under a large class of non-Gaussian stimulus distributions. We show that these estimators are fast and provide highly accurate estimates with far lower computational cost than full maximum likelihood. Moreover, the GQM provides a natural framework for combining multi-dimensional stimulus sensitivity and spike-history dependencies within a single model. We show applications to both analog and spiking data using intracellular recordings of V1 membrane potential and extracellular recordings of retinal spike trains.\"", "full_text": "Spectral methods for neural characterization using\n\ngeneralized quadratic models\n\nIl Memming Park\u2217123, Evan Archer\u221713, Nicholas Priebe14, & Jonathan W. 
Pillow123\n\n1. Center for Perceptual Systems, 2. Dept. of Psychology,\n\n3. Division of Statistics & Scienti\ufb01c Computation, 4. Section of Neurobiology,\n\n{memming@austin., earcher@, nicholas@, pillow@mail.} utexas.edu\n\nThe University of Texas at Austin\n\nAbstract\n\nWe describe a set of fast, tractable methods for characterizing neural responses\nto high-dimensional sensory stimuli using a model we refer to as the generalized\nquadratic model (GQM). The GQM consists of a low-rank quadratic function fol-\nlowed by a point nonlinearity and exponential-family noise. The quadratic func-\ntion characterizes the neuron\u2019s stimulus selectivity in terms of a set linear receptive\n\ufb01elds followed by a quadratic combination rule, and the invertible nonlinearity\nmaps this output to the desired response range. Special cases of the GQM include\nthe 2nd-order Volterra model [1, 2] and the elliptical Linear-Nonlinear-Poisson\nmodel [3]. Here we show that for \u201ccanonical form\u201d GQMs, spectral decomposi-\ntion of the \ufb01rst two response-weighted moments yields approximate maximum-\nlikelihood estimators via a quantity called the expected log-likelihood. The result-\ning theory generalizes moment-based estimators such as the spike-triggered co-\nvariance, and, in the Gaussian noise case, provides closed-form estimators under a\nlarge class of non-Gaussian stimulus distributions. We show that these estimators\nare fast and provide highly accurate estimates with far lower computational cost\nthan full maximum likelihood. Moreover, the GQM provides a natural framework\nfor combining multi-dimensional stimulus sensitivity and spike-history dependen-\ncies within a single model. 
We show applications to both analog and spiking data using intracellular recordings of V1 membrane potential and extracellular recordings of retinal spike trains.

1 Introduction

Although sensory stimuli are high-dimensional, sensory neurons are typically sensitive to only a small number of stimulus features. Linear dimensionality-reduction methods seek to identify these features in terms of a subspace spanned by a small number of spatiotemporal filters. These filters, which describe how the stimulus is integrated over space and time, can be considered the first stage in a "cascade" model of neural responses. In the well-known linear-nonlinear-Poisson (LNP) cascade model, filter outputs are combined via a nonlinear function to produce an instantaneous spike rate, which generates spikes via an inhomogeneous Poisson process [4, 5].

The most popular methods for dimensionality reduction with spike train data involve the first two moments of the spike-triggered stimulus distribution: (1) the spike-triggered average (STA) [7-9]; and (2) the major and minor eigenvectors of the spike-triggered covariance (STC) matrix [10, 11]¹. STC analysis can be described as a spectral method because the estimate is obtained by eigenvector decomposition of an appropriately defined matrix. 

Figure 1: Schematic of generalized quadratic model (GQM) for analog or spike train data.

∗ These authors contributed equally.
¹ Related moment-based estimators have also appeared in the statistics literature under the names "inverse regression" and "sufficient dimensionality reduction", although the connection to STA and STC analysis does not appear to have been noted previously [12, 13].
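As a minimal illustration of this spectral approach, the sketch below computes the STA and the eigenvectors of the STC matrix from simulated spike data. The filter shape, dimensions, and nonlinearity here are hypothetical choices for illustration, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 12, 100_000

# Hypothetical ground-truth filter driving a symmetric (squaring) nonlinearity.
w = np.sin(2 * np.pi * np.arange(d) / d)
w /= np.linalg.norm(w)

X = rng.standard_normal((N, d))       # white Gaussian stimuli
rate = 0.1 + 0.5 * (X @ w) ** 2       # LNP-style rate; STA is ~0, STC carries the filter
y = rng.poisson(rate)                 # spike counts

# Spike-triggered average and covariance (spike-weighted stimulus moments).
sta = (y @ X) / y.sum()
stc = (X.T * y) @ X / y.sum() - np.outer(sta, sta)

# Spectral estimate: the eigenvector whose eigenvalue deviates most from 1
# (the raw-stimulus variance) spans the filter subspace.
evals, evecs = np.linalg.eigh(stc)
w_hat = evecs[:, np.argmax(np.abs(evals - 1))]

print(abs(w_hat @ w))  # alignment with the true filter, up to sign
```

This is the two-moment pipeline in its simplest form; the eigenvalue-vs-1 comparison is what makes it a "spectral" method.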
Compared to likelihood-based methods, spectral methods are generally computationally efficient and free of non-global local optima.

Recently, Park and Pillow [3] described a connection between STA/STC analysis and maximum likelihood estimators based on a quantity called the expected log-likelihood (EL). The EL results from replacing the nonlinear term in the log-likelihood with its expectation over the stimulus distribution. When the stimulus is Gaussian, the EL depends only on moments (mean spike rate, STA, STC, and stimulus mean and covariance) and leads to a closed-form spectral estimate for LNP filters, which has STC analysis as a special case. More recently, Ramirez and Paninski derived EL-based estimators for the linear Gaussian model and proposed fast EL-based inference methods for generalized linear models (GLMs) [14].

Here, we show that the EL framework can be extended to a more general class that we refer to as the generalized quadratic model (GQM). The GQM represents a straightforward extension of the generalized linear model (GLM) [15, 16] wherein the linear predictor is replaced by a quadratic function (Fig. 1). For Gaussian and Poisson GQMs, we derive computationally efficient EL-based estimators that apply to a variety of non-Gaussian stimulus distributions; this substantially extends previous work on the conditions of validity for moment-based estimators [7, 17-19]. In the Gaussian case, the EL-based estimator has a closed-form solution that relies only on the first two response-weighted moments and the first four stimulus moments. In the Poisson case, GQMs provide a natural synthesis of models that have multiple filters (i.e., where the response depends on multiple projections of the stimulus) and dependencies on spike history. 
We show that spectral estimates of a low-dimensional feature space are nearly as accurate as maximum likelihood estimates (for GQMs without spike history), and demonstrate the applicability of GQMs to both analog and spiking data.

2 Generalized Quadratic Models

We begin by briefly reviewing the class of models known as GLMs, which includes the single-filter LNP model and the Wiener model from the systems identification literature. A GLM has three basic components: a linear stimulus filter, an invertible nonlinearity (or "inverse link" function), and an exponential-family noise model. The GLM describes the conditional response y to a vector stimulus x as:

y|x ∼ P(f(w⊤x)),   (1)

where w is the filter, f is the nonlinearity, and P(λ) denotes a noise distribution with mean λ. From the standpoint of dimensionality reduction, the GLM makes the strong modeling assumption that the response y depends upon x only via its one-dimensional projection onto w.

At the other end of the modeling spectrum sits the very general "multiple filter" linear-nonlinear (LN) cascade model, which posits that the response depends on a p-dimensional projection of the stimulus, represented by a bank of filters {wᵢ}, i = 1, . . . , p, combined via some arbitrary multi-dimensional function f : ℝᵖ → ℝ:

y|x ∼ P(f(w₁⊤x, . . . , wₚ⊤x)).   (2)

Spike-triggered covariance analysis and related methods provide low-cost estimates of the filters {wᵢ} under Poisson or Bernoulli noise models, but only under restrictive conditions on the stimulus distribution (e.g., elliptical symmetry) and some weak conditions on f [17, 19]. 
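To make the multi-filter cascade of eq. (2) concrete, here is a small simulation of a two-filter LN model with an energy-style (complex-cell-like) nonlinearity; the quadrature-pair filter shapes and all constants are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
d, N = 16, 10_000

# A quadrature pair of hypothetical filters spanning a 2D feature subspace.
t = np.arange(d)
w1 = np.sin(2 * np.pi * t / d)
w2 = np.cos(2 * np.pi * t / d)
W = np.column_stack([w1 / np.linalg.norm(w1), w2 / np.linalg.norm(w2)])

X = rng.standard_normal((N, d))   # white Gaussian stimuli

# The response depends on x only through the p = 2 projections W^T x.
proj = X @ W
rate = np.exp(0.2 * (proj ** 2).sum(axis=1) - 1.0)   # f: R^2 -> R, energy-like
y = rng.poisson(rate)                                # Poisson noise model P
```

Because the nonlinearity is symmetric in each projection, the STA of such a model is near zero and only second-moment (STC-type) methods can recover the subspace, which is the situation the text describes.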
Semi-parametric estimators like "maximally informative dimensions" (MID) eliminate these restrictions [20], but do not practically scale beyond two or three filters without additional modeling assumptions [21].

The generalized quadratic model (GQM) provides a tractable middle ground between the GLM and general multi-filter LN models. The GQM allows for multi-dimensional stimulus dependence, yet restricts the nonlinearity to be a transformed quadratic function [22-25]. The GQM can be written:

y|x ∼ P(f(Q(x))),   (3)

where Q(x) = x⊤Cx + b⊤x + a denotes a quadratic function of x, governed by a (possibly low-rank) symmetric matrix C, a vector b, and a scalar a. Note that the GQM may be regarded as a GLM in the space of quadratically transformed stimuli [6], although this approach does not allow Q(x) to be parametrized directly in terms of a projection onto a small number of linear filters.

In the following, we show that the elliptical-LNP model [3] is a GQM with Poisson noise, and make a detailed study of canonical GQMs with Gaussian noise. We show that the maximum-EL estimates for C, b, and a have similar forms for both Gaussian and Poisson GQMs, and that the eigenspectrum of C provides accurate estimates of a neuron's low-dimensional feature space. Finally, we show that the GQM provides a natural framework for combining multi-dimensional stimulus sensitivity with dependencies on spike train history or other response covariates.

3 Estimation with expected log-likelihoods

The expected log-likelihood is a quantity that approximates the log-likelihood but can be computed very efficiently using moments. It exists for any GQM or GLM with "canonical" nonlinearity (or link function). 
The canonical nonlinearity for an exponential-family noise distribution has the special property that it allows the log-likelihood to be written as the sum of two terms: a term that depends linearly on the responses {yᵢ}, and a second (nonlinear) term that depends only on the stimuli {xᵢ} and parameters θ. The expected log-likelihood (EL) results from replacing the nonlinear term with its expectation over the stimulus distribution P(x), which in neurophysiology settings is often known a priori to the experimenter. Maximizing the EL yields maximum expected log-likelihood (MEL) estimators that have very low computational cost while achieving nearly the accuracy of full maximum likelihood (ML) estimators. Spectral decompositions derived from the EL provide estimators that generalize STA/STC analysis. In the following, we derive MEL estimators for three special cases: two for the Gaussian noise model, and one for the Poisson noise model.

3.1 Gaussian GQMs

Gaussian noise provides a natural model for analog neural response variables like membrane potential or fluorescence. The canonical nonlinearity for Gaussian noise is the identity function, f(x) = x. The canonical-form Gaussian GQM can therefore be written: y|x ∼ N(Q(x), σ²). 
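A minimal sketch of this model and its moment-based estimation (all ground-truth parameters below are arbitrary illustrative choices): simulate y = Q(x) + noise under white Gaussian stimuli, then recover (a, b, C) from the response-weighted moments using the closed-form Gaussian-stimulus MEL solution derived below (eqs. 8-10):

```python
import numpy as np

rng = np.random.default_rng(2)
d, N, sigma = 8, 200_000, 0.2

# Hypothetical low-rank ground truth: Q(x) = x^T C x + b^T x + a.
B = rng.standard_normal((d, 2)) / d
C_true = B @ B.T                      # symmetric, rank 2
b_true = rng.standard_normal(d) / d
a_true = 0.1

X = rng.standard_normal((N, d))       # x ~ N(0, I), so Sigma = I
Q = np.einsum('ni,ij,nj->n', X, C_true, X) + X @ b_true + a_true
y = Q + sigma * rng.standard_normal(N)  # identity nonlinearity + Gaussian noise

# Response-weighted moments (the RTA and RTC of eq. 5).
ybar = y.mean()
mu = (y @ X) / N                      # RTA
Lam = (X.T * y) @ X / N               # RTC

# Closed-form MEL estimates for whitened stimuli (Sigma = I in eqs. 8-10).
b_mel = mu
C_mel = 0.5 * (Lam - ybar * np.eye(d))
a_mel = ybar - np.trace(C_mel)

print(np.abs(C_mel - C_true).max(), np.abs(b_mel - b_true).max())
```

With enough samples the recovered (a, b, C) match the ground truth to within sampling error, using only one pass over the data to form the moments.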
Given a dataset {xᵢ, yᵢ}, i = 1, . . . , N, the log-likelihood per sample is:

L = −(1/(2σ²)) (1/N) Σᵢ (Q(xᵢ) − yᵢ)²
  = −(1/(2σ²)) [ (1/N) Σᵢ (−2 Q(xᵢ) yᵢ + Q(xᵢ)²) ] + const
  = −(1/(2σ²)) ( −2 (Tr(CΛ) + μ⊤b + a ȳ) + (1/N) Σᵢ Q(xᵢ)² ) + const,   (4)

where σ² is the noise variance, const is a parameter-independent constant, ȳ = (1/N) Σᵢ yᵢ is the mean response, and μ and Λ denote cross-correlation statistics that we will refer to (in a slight abuse of terminology) as the response-triggered average and response-triggered covariance:

μ = (1/N) Σᵢ yᵢ xᵢ   ("RTA"),    Λ = (1/N) Σᵢ yᵢ xᵢxᵢ⊤   ("RTC").²   (5)

² When the responses yᵢ are spike counts, these correspond to the STA and STC.

The expected log-likelihood results from replacing the troublesome nonlinear term (1/N) Σᵢ Q(xᵢ)² by its expectation over the stimulus distribution. This is justified by the law of large numbers, which asserts that (1/N) Σᵢ Q(xᵢ)² converges to E_{P(x)}[Q(x)²] asymptotically. 
Leaving off the const term, this leads to the per-sample expected log-likelihood [3, 14], which is defined:

L̃ = −(1/(2σ²)) ( −2 (Tr(CΛ) + μ⊤b + a ȳ) + E[Q(x)²] ).   (6)

Gaussian stimuli

If the stimuli are drawn from a Gaussian distribution, x ∼ N(0, Σ), then we have (from [26]):

E[Q(x)²] = 2 Tr{(CΣ)²} + b⊤Σb + (Tr(CΣ) + a)².   (7)

The EL is concave in the parameters a, b, C, so we can obtain the MEL estimates by finding the stationary point:

∂L̃/∂a = −(1/(2σ²)) (−2ȳ + 2 (Tr(CΣ) + a)) = 0  ⟹  a_mel = ȳ − Tr(C_mel Σ),   (8)
∂L̃/∂b = −(1/(2σ²)) (−2μ + 2Σb) = 0  ⟹  b_mel = Σ⁻¹μ,   (9)
∂L̃/∂C = −(1/(2σ²)) (−2Λ + 4ΣCΣ + 2ȳΣ) = 0  ⟹  C_mel = (1/2) (Σ⁻¹ΛΣ⁻¹ − ȳΣ⁻¹).   (10)

Note that this coincides with the moment-based estimate for the 2nd-order Volterra model [2].

Axis-symmetric stimuli

More generally, we can derive the MEL estimator for stimuli with arbitrary axis-symmetric distributions with finite 4th-order moments. Axis-symmetric distributions exhibit invariance under reflections about each axis; that is, P(x₁, . . . , x_d) = P(ρ₁x₁, . . . , ρ_d x_d) for any ρᵢ ∈ {−1, 1}. The class of axis-symmetric distributions subsumes both radially symmetric and independent product distributions. However, axis symmetry is a strictly weaker condition; significantly, the marginals need not be identically distributed.

To simplify derivation of the MEL estimator for axis-symmetric stimuli, we take the derivative of Q(x) with respect to (a, b, C) before taking the expectation. Derivatives with respect to model parameters are given by ∂E[Q(x)²]/∂θᵢ = E[2Q(x) ∂Q(x)/∂θᵢ]. For each θᵢ, we solve the equation

∂L̃/∂θᵢ = −2 ∂(Tr(CΛ) + μ⊤b + aȳ)/∂θᵢ + 2 E[Q(x) ∂Q(x)/∂θᵢ] = 0.

From derivatives w.r.t. a, b, and C, respectively, we obtain conditions for the MEL estimates:

ȳ = E[Q(x)] = a + b⊤E[x] + Tr(C E[xx⊤]),
μ = E[Q(x) x] = a E[x] + E[xx⊤] b + Σ_{i,j} C_{ij} E[xᵢxⱼ x],
Λ = E[Q(x) xx⊤] = a E[xx⊤] + Σᵢ bᵢ E[xᵢ xx⊤] + Σ_{i,j} C_{ij} E[xᵢxⱼ xx⊤],

where the subindices within the sums run over components. Due to axis symmetry, E[x], E[xᵢxⱼxₖ] and E[xᵢxⱼ³] are all zero for distinct indices, so the MEL estimates for a and b are identical to the Gaussian case given above. If we further assume that the stimulus is whitened so that E[xx⊤] = I, the sufficient stimulus statistics are the 4th-order even moments, which we represent with the matrix

M_{ij} = E[xᵢ² xⱼ²].

In general, when the marginals are not identical but the joint distribution is axis-symmetric,

Σ_{i,j} C_{ij} E[xᵢxⱼ xx⊤] = Σᵢ C_{ii} diag(M_{i1}, · · · , M_{id}) + Σ_{i≠j} C_{ij} M_{ij} eᵢeⱼ⊤
                          = diag(1⊤(I ◦ C)M) + C ◦ M ◦ (11⊤ − I),   (11)

where 1 is a vector of 1's, eᵢ is the i-th standard basis vector, and ◦ denotes the Hadamard product. 
We can solve these sets of linear equations separately for the off-diagonal and diagonal terms, obtaining

[C_mel]_{ij} = Λ_{ij} / (2 M_{ij})  for i ≠ j,    [C_mel]_{ii} = [Ω (M − 11⊤)⁻¹]ᵢᵢ,   (12)

where Ω = diag(1⊤(I ◦ Λ) − ȳ1⊤).

Figure 2: Maximum expected log-likelihood (MEL) estimators for a Gaussian GQM under different assumptions about the stimulus distribution. (left) Axis-symmetric stimulus distribution in 2D. The horizontal axis is a (symmetric) mixture of Gaussians, and the vertical axis is a uniform distribution. Red dots indicate samples from the distribution. (right) Response predictions based on various Ĉ estimated using eq. 10, eq. 14, and eq. 12. Performance is evaluated on a noiseless cross-validation test set for each C; an incorrect assumption about the stimulus distribution produces a large loss in performance.

For the special case when the marginal distributions are identical, we note that

E[x⊤Cx (xx⊤)] = μ₂₂ Tr(C) I + (μ₄ − μ₂₂) C ◦ I + 2μ₂₂ C ◦ (11⊤ − I),   (13)

where μ₂₂ = E[x₁²x₂²] = M₁₂ and μ₄ = E[x₁⁴] = M₁₁. This gives the simplified formula (also given in [27]):

[C_mel]_{ij} = Λ_{ij} / (2μ₂₂)  for i ≠ j,    [C_mel]_{ii} = (Λᵢᵢ − ȳ) / (μ₄ − μ₂₂).   (14)

When the stimulus is not Gaussian or the marginals are not identical, the estimates obtained from (eq. 10) and (eq. 14) are not consistent. In this case, the general axis-symmetric estimate (eq. 12) gives much better performance, as we illustrate with a simulated example in Fig. 
2.

3.2 Poisson GQM

Poisson noise provides a natural model for discrete events like spike counts, and extends easily to point-process models for spike trains. The canonical nonlinearity for Poisson noise is exponential, f(x) = exp(x), so the canonical-form Poisson GQM is: y|x ∼ Poiss(exp(Q(x))). Ignoring irrelevant constants, the log-likelihood per sample is

L = (1/N) Σᵢ yᵢ log(exp(Q(xᵢ))) − (1/N) Σᵢ exp(Q(xᵢ))
  = Tr(CΛ) + μ⊤b + aȳ − (1/N) Σᵢ exp(Q(xᵢ)),   (15)

where ȳ, μ and Λ denote the mean response, STA, and STC, as given above (eq. 5). We obtain the EL for a Poisson GQM by replacing the term (1/N) Σᵢ exp(Q(xᵢ)) by its expectation with respect to P(x). Under a zero-mean Gaussian stimulus distribution with covariance Σ, the closed-form MEL estimates are (from [3]):

b_mel = (Λ + (1/ȳ²) μμ⊤)⁻¹ μ,    C_mel = (1/2) ( Σ⁻¹ − ȳ (Λ + (1/ȳ²) μμ⊤)⁻¹ ),   (16)

where we assume that Λ + (1/ȳ²) μμ⊤ is invertible. Note that the MEL estimator combines information from μ and Λ, unlike standard STA- and STC-based estimates, which maximize the EL only when either b or C is zero (respectively). Park and Pillow 2011 used the Poisson EL in conjunction with a log-prior to obtain approximate Bayesian estimates, an approach referred to as Bayesian STC [3].

[Figure 2, right panel legend: true response vs. predictions; Gaussian r² = 0.424; i.i.d. axis-symmetric r² = 0.894; general axis-symmetric r² = 0.99.]

Figure 3: Rank-1 quadratic filter reconstruction performance. Both rank-1 models were optimized using conjugate gradient descent. (Left) l₁ distance from the ground-truth filter. 
(Right) Computation time for the optimization.

Mixture-of-Gaussians stimuli

Results for Gaussian stimuli extend naturally to mixtures of Gaussians, which can be used to approximate arbitrary stimulus distributions. The EL for mixture-of-Gaussians stimuli can be computed simply via the linearity of expectation. For stimuli drawn from a mixture Σⱼ αⱼ N(μⱼ, Σⱼ) with mixing weights Σⱼ αⱼ = 1, the EL is

L̃ = Tr(CΛ) + μ⊤b + aȳ − Σⱼ αⱼ E_{N(μⱼ,Σⱼ)}[e^{Q(x)}],   (17)

where the Gaussian expectation terms are given by

E_{N(μⱼ,Σⱼ)}[e^{Q(x)}] = |I − 2CΣⱼ|^{−1/2} exp( a + μⱼ⊤Cμⱼ + b⊤μⱼ + (1/2)(b + 2Cμⱼ)⊤ (Σⱼ⁻¹ − 2C)⁻¹ (b + 2Cμⱼ) ).   (18)

Although the MEL estimator does not have a closed analytic form in this case, the EL can be efficiently optimized numerically, as it still depends on the responses only via the spike-triggered moments ȳ, μ and Λ, and on the stimuli only via the mean, covariance, and mixing weight of each Gaussian.

4 Spectral estimation for low-dimensional models

4.1 Low-rank parameterization

We have so far focused upon MEL estimators for the parameters a, b, and C. These results have a natural mapping to dimensionality-reduction methods. Under the GQM, low-dimensional stimulus dependence is equivalent to having a low-rank C. If C = BB⊤ for some d × p matrix B, we have a p-filter model (or a (p + 1)-filter model if the linear term b is not spanned by the columns of B). We can obtain spectral estimates of a low-dimensional GQM by performing an eigenvector decomposition of C_mel and selecting the eigenvectors corresponding to the largest p eigenvalues. The eigenvectors of C_mel also make natural initializers for maximization of the full GQM likelihood. In Fig. 
3, we show the results of three different methods for recovering a simulated rank-1 GQM with Poisson noise: (1) the largest eigenvector of C_mel; (2) numerically maximizing the expected log-likelihood for a rank-1 GQM (i.e., with C parametrized as a rank-1 matrix); and (3) maximizing the (full) likelihood for a rank-1 GQM. Although the difference in performance between the expected and full GQM log-likelihoods is negligible, there is a drastic difference in optimization time between them. The expected log-likelihood requires only computation of the sufficient statistics, while the full ML estimate requires a full pass through the dataset for each evaluation of the log-likelihood. Thus, the expected log-likelihood offers a fast yet accurate estimate of C. In the following section we show that, asymptotically, the eigenvectors of C_mel span the "correct" (in an appropriate sense) low-dimensional subspace.

4.2 Consistency of subspace estimates

If the conditional distribution satisfies p(y|x) = p(y|β⊤x) for a matrix β, the neural feature space is spanned by the columns of β. As a generalization of STC, we introduce moment-based dimensionality reduction

Figure 4: GQM fit and prediction for an intracellular recording in cat V1 with a trinary noise stimulus. (A) Top: estimated linear (b) and quadratic (w₁ and w₂) filters for the GQM, lagged by 20 ms. Bottom: the empirical marginal nonlinearities along each dimension (black) and model prediction (red). (B) Cross-validated model prediction (red) and n = 94 recordings with repeats of an identical stimulus (light grey), along with their mean (black). 
Reported performance metric (r² = 0.55) is for prediction of the mean response.

techniques that recover (portions of) β, and show the relationship of these techniques to the MEL estimators of the GQM.

We propose to use Σ^{−1/2}μ and the eigenvectors of Σ^{−1/2}ΛΣ^{−1/2} (whose eigenvalues are significantly smaller or larger than 1) as the feature-space basis. When the response is binary, this coincides with the traditional STA/STC analysis, which is provably consistent only in the case of stimuli drawn from a spherically symmetric (for STA) or independent Gaussian distribution (for STC) [5]. Below, we argue that this procedure can identify the subspace when y has mean f(β⊤x) with finite variance, f is some function, and the stimulus distribution is zero-mean with white covariance, i.e., E[x] = 0 and E[xx⊤] = I.

First, note that by the law of large numbers, Λ → E[y xx⊤] = E[ y E[xx⊤|y] ]. Let Ψ = ββ⊤ be a projection operator onto the feature space, and Ψ⊥ = I − Ψ the projection onto the perpendicular space. We follow the discussion in [12, 13] regarding the related "sliced regression" literature. 
Recalling that E[x] = 0, we can exploit the independence of Ψ⊥x and y to find

E[xx⊤ | y = ξ] = E[(Ψ + Ψ⊥) xx⊤ (Ψ + Ψ⊥) | y = ξ]
              = Ψ E[xx⊤ | y = ξ] Ψ + Ψ⊥ E[xx⊤] Ψ⊥
              = Ψ E[xx⊤ | y = ξ] Ψ + Ψ⊥;

thus E[y xx⊤] = Ψ E[y xx⊤] Ψ + E[y] Ψ⊥, and therefore the eigenvectors of E[y xx⊤] whose eigenvalues significantly differ from E[y] span a subspace of the range of Ψ. Effective estimation of the subspace depends critically on both the stimulus distribution and the form of f. Under the GQM, the eigenvectors of E[y xx⊤] are closely related to the expected log-likelihood estimators we derived earlier. Indeed, those eigenvectors of eq. 10, eq. 12 and eq. 16 whose associated eigenvalues differ significantly from zero span precisely the same space.

5 Results

5.1 Intracellular membrane potential

We fit a Gaussian GQM to intracellular recordings of membrane potential from a neuron in cat V1, using a 2D spatiotemporal "flickering bars" stimulus aligned with the cell's preferred orientation (Fig. 4). The recorded time series is a continuous signal, so the Gaussian GQM provides an appropriate noise model. The recorded voltage was median-filtered (to remove spikes) and down-sampled to a 10 ms sample rate. We fit the GQM to a 21.6 minute recording of responses to a non-repeating trinary noise stimulus. We validated the model using responses to 94 repeats of a 1 second frozen noise stimulus. Panel (B) of Fig. 
4 illustrates the GQM prediction on cross-validation data. Although the cell was classified as "simple", meaning that its response is predominantly linear, the GQM fit reveals two quadratic filters that also influence the membrane potential response. The GQM captures a substantial percentage of the variance in the mean response, systematically outperforming the GLM in terms of r² (GQM: 55% vs. GLM: 50%).

[Figure 4 axes: time (10 ms/frame); space (0.70 deg/bar); ordinates in mV.]

Figure 5: (left) GLM and GQM filters fit to spike responses of a retinal ganglion cell stimulated with a 120 Hz binary full-field noise stimulus [28]. The GLM has only linear stimulus and spike-history filters (top left), while the GQM contains all four filters. Each plot shows the exponentiated filter, so the ordinate has units of gain, and filters interact multiplicatively. Quadratic filter outputs are squared and then subtracted from other inputs, giving them a suppressive effect on spiking (although quadratic excitation is also possible). (right) Cross-validated rate prediction averaged over 167 repeated trials.

5.2 Retinal ganglion spike train

The Poisson GLM provides a popular model for neural spike trains due to its ability to incorporate dependencies on spike history (e.g., refractoriness, bursting, and adaptation). These dependencies cannot be captured by models with inhomogeneous Poisson output like the multi-filter LNP model (which is also implicit in information-theoretic methods like MID [21]). The GLM achieves this by incorporating a one-dimensional linear projection of the spike history as an input to the model. 
In general, however, a spike train may exhibit dependencies on more than one linear projection of its spike history.

The GQM extends the GLM by allowing multiple stimulus filters and multiple spike-history filters. It can therefore capture multi-dimensional stimulus sensitivity (e.g., as found in complex cells) and produce dynamic spike patterns unachievable by GLMs. We fit a Poisson GQM with a quadratic history filter to data recorded from a retinal ganglion cell driven by a full-field white noise stimulus [28]. For ease of comparison, we first fit a Poisson GLM, then added quadratic stimulus and history filters, initialized using a spectral decomposition of the MEL estimate (eq. 16) and then optimized by numerical ascent of the full log-likelihood. Both quadratic filters (which enter with negative sign) have a suppressive effect on spiking (Fig. 5). The quadratic stimulus filter induces strong suppression at a delay of 5 frames, while the quadratic spike-history filter induces strong suppression during a 50 ms window after a spike.

6 Conclusion

The GQM provides a flexible class of probabilistic models that generalizes the GLM, the 2nd-order Volterra model, the Wiener model, and the elliptical-LNP model [3]. Unlike the GLM, the GQM allows multiple stimulus and history filters, yet remains tractable for likelihood-based inference. We have derived expected log-likelihood estimators in a general form that reveals a deep connection between likelihood-based and moment-based inference methods. We have shown that the GQM performs well on neural data, both for discrete (spiking) and analog (voltage) responses. Although we have discussed the GQM in the context of neural systems, we believe it (and EL-based inference methods) will find applications in other areas such as signal processing and psychophysics.

Acknowledgments

We thank L. Paninski and A. Ramirez for helpful discussions, and V. J. 
Uzzell and E. J. Chichilnisky for retinal data. This work was supported by a Sloan Research Fellowship (JP), a McKnight Scholar's Award (JP), NSF CAREER Award IIS-1150186 (JP), NIH EY019288 (NP), and the Pew Charitable Trust (NP).

References

[1] P. Z. Marmarelis and V. Marmarelis. Analysis of Physiological Systems: The White-Noise Approach. Plenum Press, New York, 1978.

[2] T. Koh and E. Powers. Second-order Volterra filtering and its application to nonlinear system identification. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(6):1445-1455, 1985.

[3] I. M. Park and J. W. Pillow. Bayesian spike-triggered covariance analysis. Advances in Neural Information Processing Systems 24, pp 1692-1700, 2011.

[4] E. P. Simoncelli, J. W. Pillow, L. Paninski, and O. Schwartz. Characterization of neural responses with stochastic stimuli. The Cognitive Neurosciences, III, chapter 23, pp 327-338. MIT Press, Cambridge, MA, October 2004.

[5] L. Paninski. Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems, 15:243-262, 2004.

[6] S. Gerwinn, J. H. Macke, M. Seeger, and M. Bethge. Bayesian inference for spiking neuron models with a sparsity prior. Advances in Neural Information Processing Systems, pp 529-536, 2008.

[7] J. Bussgang. Crosscorrelation functions of amplitude-distorted Gaussian signals. RLE Technical Reports, 216, 1952.

[8] E. de Boer and P. Kuyper. Triggered correlation. IEEE Transactions on Biomedical Engineering, 15:169-179, 1968.

[9] E. J. Chichilnisky. A simple white noise analysis of neuronal light responses. Network: Computation in Neural Systems, 12:199-213, 2001.

[10] R. R. de Ruyter van Steveninck and W. Bialek. 
Real-time performance of a movement-sensitive neuron in the blowfly visual system: coding and information transmission in short spike sequences. Proceedings of the Royal Society of London B, 234:379–414, 1988.

[11] O. Schwartz, J. W. Pillow, N. C. Rust, and E. P. Simoncelli. Spike-triggered neural characterization. Journal of Vision, 6(4):484–507, 2006.

[12] R. D. Cook and S. Weisberg. Comment on "Sliced inverse regression for dimension reduction" by K.-C. Li. Journal of the American Statistical Association, 86:328–332, 1991.

[13] K.-C. Li. Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86(414):316–327, 1991.

[14] A. D. Ramirez and L. Paninski. Fast inference in generalized linear models via expected log-likelihoods. Journal of Computational Neuroscience, pp. 1–20, 2013.

[15] W. Truccolo, U. T. Eden, M. R. Fellows, J. P. Donoghue, and E. N. Brown. A point process framework for relating neural spiking activity to spiking history, neural ensemble and extrinsic covariate effects. Journal of Neurophysiology, 93(2):1074–1089, 2005.

[16] J. W. Pillow, J. Shlens, L. Paninski, A. Sher, A. M. Litke, E. J. Chichilnisky, and E. P. Simoncelli. Spatio-temporal correlations and visual signaling in a complete neuronal population. Nature, 454:995–999, 2008.

[17] L. Paninski. Convergence properties of some spike-triggered analysis techniques. Network: Computation in Neural Systems, 14:437–464, 2003.

[18] J. W. Pillow and E. P. Simoncelli. Dimensionality reduction in neural models: An information-theoretic generalization of spike-triggered average and covariance analysis. Journal of Vision, 6(4):414–428, 2006.

[19] I. Samengo and T. Gollisch. Spike-triggered covariance: geometric proof, symmetry properties, and extension beyond Gaussian stimuli. Journal of Computational Neuroscience, 34(1):137–161, 2013.

[20] T. Sharpee, N. C.
Rust, and W. Bialek. Analyzing neural responses to natural signals: maximally informative dimensions. Neural Computation, 16(2):223–250, 2004.

[21] R. S. Williamson, M. Sahani, and J. W. Pillow. Equating information-theoretic and likelihood-based methods for neural dimensionality reduction. arXiv:1308.3542 [q-bio.NC], 2013.

[22] J. D. Fitzgerald, R. J. Rowekamp, L. C. Sincich, and T. O. Sharpee. Second order dimensionality reduction using minimum and maximum mutual information models. PLoS Computational Biology, 7(10):e1002249, 2011.

[23] K. Rajan and W. Bialek. Maximally informative "stimulus energies" in the analysis of neural responses to natural signals. arXiv:1201.0321v1 [q-bio.NC], 2012.

[24] J. M. McFarland, Y. Cui, and D. A. Butts. Inferring nonlinear neuronal computation based on physiologically plausible inputs. PLoS Computational Biology, 9(7):e1003143, 2013.

[25] L. Theis, A. M. Chagas, D. Arnstein, C. Schwarz, and M. Bethge. Beyond GLMs: A generative mixture modeling approach to neural system identification. PLoS Computational Biology, 2013. In press.

[26] A. M. Mathai and S. B. Provost. Quadratic forms in random variables: theory and applications. M. Dekker, 1992.

[27] Y. S. Cho and E. J. Powers. Estimation of quadratically nonlinear systems with an i.i.d. input. In Proceedings of ICASSP 1991, pp. 3117–3120. IEEE, 1991.

[28] V. J. Uzzell and E. J. Chichilnisky. Precision of spike trains in primate retinal ganglion cells.
Journal of Neurophysiology, 92:780–789, 2004.