{"title": "Locally Adaptive Bayesian Multivariate Time Series", "book": "Advances in Neural Information Processing Systems", "page_first": 1664, "page_last": 1672, "abstract": "In modeling multivariate time series, it is important to allow time-varying smoothness in the mean and covariance process. In particular, there may be certain time intervals exhibiting rapid changes and others in which changes are slow. If such locally adaptive smoothness is not accounted for, one can obtain misleading inferences and predictions, with over-smoothing across erratic time intervals and under-smoothing across times exhibiting slow variation. This can lead to miscalibration of predictive intervals, which can be substantially too narrow or wide depending on the time. We propose a continuous multivariate stochastic process for time series having locally varying smoothness in both the mean and covariance matrix. This process is constructed utilizing latent dictionary functions in time, which are given nested Gaussian process priors and linearly related to the observed data through a sparse mapping. Using a differential equation representation, we bypass usual computational bottlenecks in obtaining MCMC and online algorithms for approximate Bayesian inference. The performance is assessed in simulations and illustrated in a financial application.", "full_text": "Locally Adaptive Bayesian Multivariate Time Series\n\nDaniele Durante\n\nDepartment of Statistical Sciences\n\nUniversity of Padua\n\nBruno Scarpa\n\nDepartment of Statistical Sciences\n\nUniversity of Padua\n\nVia Cesare Battisti 241, 35121, Padua, Italy\n\nVia Cesare Battisti 241, 35121, Padua, Italy\n\ndurante@stat.unipd.it\n\nscarpa@stat.unipd.it\n\nDavid B. Dunson\n\nDepartment of Statistical Science\n\nDuke University\n\nDurham, NC 27708-0251, USA\n\ndunson@duke.edu\n\nAbstract\n\nIn modeling multivariate time series, it is important to allow time-varying smooth-\nness in the mean and covariance process. 
In particular, there may be certain time intervals exhibiting rapid changes and others in which changes are slow. If such locally adaptive smoothness is not accounted for, one can obtain misleading inferences and predictions, with over-smoothing across erratic time intervals and under-smoothing across times exhibiting slow variation. This can lead to miscalibration of predictive intervals, which can be substantially too narrow or wide depending on the time. We propose a continuous multivariate stochastic process for time series having locally varying smoothness in both the mean and covariance matrix. This process is constructed utilizing latent dictionary functions in time, which are given nested Gaussian process priors and linearly related to the observed data through a sparse mapping. Using a differential equation representation, we bypass usual computational bottlenecks in obtaining MCMC and online algorithms for approximate Bayesian inference. The performance is assessed in simulations and illustrated in a financial application.

1 Introduction

1.1 Motivation and background

In analyzing multivariate time series data, collected in financial applications, monitoring of influenza outbreaks and other fields, it is often of key importance to accurately characterize dynamic changes over time in not only the mean of the different elements (e.g., assets, influenza levels at different locations) but also the covariance. It is typical in many domains to cycle irregularly between periods of rapid and slow change; most statistical models are insufficiently flexible to capture such locally varying smoothness in assuming a single bandwidth parameter. Inappropriately restricting the smoothness to be constant can have a major impact on the quality of inferences and predictions, with over-smoothing occurring during times of rapid change.
This leads to an under-estimation of uncertainty during such volatile times and an inability to accurately predict the risk of extreme events. There is a rich literature on modeling a p × 1 time-varying mean vector µ_t, covering multivariate generalizations of autoregressive models (VAR, e.g. [1]), Kalman filtering [2], nonparametric mean regression via Gaussian processes (GP) [3], polynomial splines [4], smoothing splines [5] and kernel smoothing methods [6]. Such approaches perform well for slowly-changing trajectories, with constant bandwidth parameters regulating, implicitly or explicitly, global smoothness; however, our interest is in allowing smoothness to vary locally in continuous time. Possible extensions for local adaptivity include free knot splines (MARS) [7], which perform well in simulations, but the different strategies proposed to select the number and the locations of knots (stepwise knot selection [7], Bayesian knot selection [8] or via MCMC methods [9]) prove to be computationally intractable for moderately large p. Other flexible approaches include wavelet shrinkage [10], local polynomial fitting via variable bandwidth [11] and linear combinations of kernels with variable bandwidths [12]. Once µ_t has been estimated, the focus shifts to the p × p time-varying covariance matrix Σ_t. This is of particular interest in applications where volatilities and co-volatilities evolve along non-constant paths. Multivariate generalizations of GARCH models (DVEC [13], BEKK [14], DCC-GARCH [15]), exponential smoothing (EWMA, e.g. [1]) and approaches based on dimensionality reduction through a latent factor formulation (PC-GARCH [16] and O-GARCH [17]-[18]) represent common approaches in multivariate stochastic volatility modeling.
Although widely used in practice, such approaches suffer from tractability issues arising from richly parameterized formulations (DVEC and BEKK), and from a lack of flexibility resulting from the adoption of a single time-constant bandwidth parameter (EWMA), time-constant factor loadings and uncorrelated latent factors (PC-GARCH, O-GARCH), as well as the use of the same parameters to regulate the evolution of all time-varying conditional correlations (DCC-GARCH). Such models fall far short of our goal of allowing Σ_t to be fully flexible, with the dependence between Σ_t and Σ_{t+∆} varying with not just the time-lag ∆ but also with time. In addition, these models do not handle missing data easily and tend to require long series for accurate estimation [16]. Bayesian dynamic factor models for multivariate stochastic volatility [19] lead to apparently improved performance in portfolio allocation by allowing the dependence between the covariance matrices Σ_t and Σ_{t+∆} to vary as a function of both t and ∆. However, the result is an extremely richly parameterized and computationally challenging model, with selection of the number of factors via cross validation. Our aim is instead to develop continuous time stochastic processes for µ(t) and Σ(t) with locally-varying smoothness. Wilson and Ghahramani [20] join machine learning and econometrics efforts by proposing a model for both mean and covariance regression in multivariate time series, improving previous work of Bru [21] on Wishart processes in terms of computational tractability and scalability, and allowing more complex structures of dependence between Σ(t) and Σ(t + ∆). Specifically, they propose a continuous time Generalised Wishart Process (GWP), which defines a collection of positive semi-definite random matrices Σ(t) with Wishart marginals.
Nonparametric mean regression for µ(t) is also considered via GP priors; however, the trajectories of means and covariances inherit the smooth behavior of the underlying Gaussian processes, limiting the flexibility of the approach in times exhibiting sharp changes.

Fox and Dunson [22] propose an alternative Bayesian covariance regression (BCR) model, which defines the covariance matrix of a vector of p variables at time t_i as a regularized quadratic function of time-varying loadings in a latent factor model, characterizing the latter as a sparse combination of a collection of unknown Gaussian process dictionary functions. More specifically, given a set of p × 1 vectors of observations y_i ∼ N_p(µ(t_i), Σ(t_i)), where i = 1, ..., T indexes time, they define

cov(y_i | t_i = t) = Σ(t) = Θ ξ(t) ξ(t)^T Θ^T + Σ_0,   t ∈ T ⊂ ℝ+,   (1)

where Θ is a p × L matrix of coefficients, ξ(t) is a time-varying L × K matrix with unknown continuous dictionary function entries ξ_lk : T → ℝ, and finally Σ_0 is a positive definite diagonal matrix. Model (1) can be induced by marginalizing out the latent factors η_i in

y_i = Θ ξ(t_i) η_i + ε_i,   (2)

with η_i ∼ N_K(0, I_K) and ε_i ∼ N_p(0, Σ_0). A generalization includes a nonparametric mean regression by assuming η_i = ψ(t_i) + ν_i, where ν_i ∼ N_K(0, I_K) and ψ(t) is a K × 1 vector with unknown continuous entries ψ_k : T → ℝ that can be modeled in a manner related to the dictionary elements in ξ(t). The induced mean of y_i conditionally on t_i = t, marginalizing out ν_i, is then

E(y_i | t_i = t) = µ(t) = Θ ξ(t) ψ(t).   (3)

1.2 Our modeling contribution

We follow the lead of [22] in using a nonparametric latent factor model as in (2), but induce fundamentally different behavior by carefully modifying the priors Π_ξ and Π_ψ for the dictionary elements ξ_T = {ξ(t), t ∈ T} and ψ_T = {ψ(t), t ∈ T}, respectively. We additionally develop a different and much more computationally efficient approach to computation under this new model. Fox and Dunson [22] consider the dictionary functions ξ_lk and ψ_k, for each l = 1, ..., L and k = 1, ..., K, as independent Gaussian processes GP(0, c), with c the squared exponential correlation function having c(x, x′) = exp(−k ||x − x′||²₂). This approach provides a continuous time and flexible model that accommodates missing data and scales to moderately large p, but the proposed priors for the dictionary functions assume a stationary dependence structure and hence induce prior distributions Π_Σ and Π_µ on Σ_T and µ_T through (1) and (3) that tend to under-smooth during periods of stability and over-smooth during periods of sharp changes.
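As a concrete illustration of (1)-(3), the following sketch (ours, not the authors' code; the dictionary functions xi and psi below are arbitrary placeholders, since the model leaves them unknown) builds Σ(t) and µ(t) and checks that Σ(t) is a valid covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
p, L, K = 5, 2, 2                            # dimensions as in the first simulation study

Theta = rng.normal(size=(p, L))              # p x L coefficient matrix
Sigma0 = np.diag(rng.uniform(0.1, 0.5, p))   # positive definite diagonal matrix

def xi(t):
    # placeholder L x K dictionary functions
    return np.array([[np.sin(t), np.cos(t)],
                     [np.cos(2.0 * t), np.sin(3.0 * t)]])

def psi(t):
    # placeholder K x 1 mean dictionary functions
    return np.array([np.sin(t / 2.0), np.cos(t / 3.0)])

def cov_t(t):
    # equation (1): Sigma(t) = Theta xi(t) xi(t)^T Theta^T + Sigma0
    X = Theta @ xi(t)
    return X @ X.T + Sigma0

def mean_t(t):
    # equation (3): mu(t) = Theta xi(t) psi(t)
    return Theta @ xi(t) @ psi(t)

S, m = cov_t(1.0), mean_t(1.0)
y = rng.multivariate_normal(m, S)            # one draw from N_p(mu(t), Sigma(t))
```

Because Θ ξ(t) ξ(t)^T Θ^T is positive semi-definite by construction and Σ_0 is positive definite, Σ(t) is always a valid covariance matrix, whatever the dictionary functions are.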
Moreover, the well known computational problems with usual GP regression are inherited, leading to difficulties in scaling to long series and issues in mixing of MCMC algorithms for posterior computation. In our work, we address these problems to develop a novel mean-covariance stochastic process with locally-varying smoothness, by replacing the GP priors for ξ_T = {ξ(t), t ∈ T} and ψ_T = {ψ(t), t ∈ T} with nested Gaussian process (nGP) priors [23], with the goal of maintaining simple computation and allowing both covariances and means to vary flexibly over continuous time. The nGP provides a highly flexible prior on the dictionary functions: their smoothness, explicitly modeled by their derivatives via stochastic differential equations, is centered on a local instantaneous mean function, itself a higher-level Gaussian process, which induces adaptivity to locally-varying smoothing.

Restricting our attention to the elements of the prior Π_ξ (the same holds for Π_ψ), the Markovian property implied by the stochastic differential equations allows a simple state space formulation of the nGP, in which the prior for ξ_lk, along with its first order derivative ξ′_lk and the local instantaneous mean A_lk(t) = E[ξ′_lk(t) | A_lk(t)], follows the approximated state equation

[ ξ_lk(t_{i+1})  ]   [ 1  δ_i  0   ] [ ξ_lk(t_i)  ]   [ 0  0 ]
[ ξ′_lk(t_{i+1}) ] = [ 0  1    δ_i ] [ ξ′_lk(t_i) ] + [ 1  0 ] [ ω_{i,ξ_lk} ]   (4)
[ A_lk(t_{i+1})  ]   [ 0  0    1   ] [ A_lk(t_i)  ]   [ 0  1 ] [ ω_{i,A_lk} ]

where [ω_{i,ξ_lk}, ω_{i,A_lk}]^T ∼ N₂(0, V_{i,lk}), with V_{i,lk} = diag(σ²_{ξ_lk} δ_i, σ²_{A_lk} δ_i) and δ_i = t_{i+1} − t_i. This formulation allows continuous time and an irregular grid of observations over t by relating the latent states at i + 1 to those at i through the distance δ_i between t_{i+1} and t_i, with t_i ∈ T the time of the ith observation. Moreover, compared to [23], our approach extends the analysis to the multivariate case and accommodates locally adaptive smoothing not only in the mean but also in the time-varying variance and covariance functions. Finally, the state space formulation allows the implementation of an online updating algorithm and facilitates the definition of a simple Gibbs sampler which reduces the GP computational burden involving matrix inversions from O(T³) to O(T), with T denoting the length of the time series.

1.3 Bayesian inference and online learning

For fixed truncation levels L* and K*, the algorithm for posterior computation alternates between a simple and efficient simulation smoother step [24] to update the state space formulation of the nGP, and standard Gibbs sampling steps for updating the parametric components of the model. Specifically, considering the observations (y_i, t_i) for i = 1, ..., T:

A. Given Θ and {η_i}_{i=1}^T, a multivariate version of the MCMC algorithm proposed by Zhu and Dunson [23] draws posterior samples from each dictionary element's function {ξ_lk(t_i)}_{i=1}^T, its first order derivative {ξ′_lk(t_i)}_{i=1}^T, the corresponding instantaneous mean {A_lk(t_i)}_{i=1}^T, the variances σ²_{ξ_lk} and σ²_{A_lk} in the state equations (for which inverse Gamma priors are assumed) and the variances σ²_j, j = 1, ..., p, of the error terms in the observation equation.

B.
If the mean process need not be estimated, then recalling the prior η_i ∼ N_{K*}(0, I_{K*}) and model (2), the standard conjugate posterior distribution from which to sample the vector of latent factors for each i, given Θ, {σ⁻²_j}_{j=1}^p, {y_i}_{i=1}^T and {ξ(t_i)}_{i=1}^T, is Gaussian. Otherwise, if we want to incorporate the mean regression, we implement a block sampling of {ψ(t_i)}_{i=1}^T and {ν_i}_{i=1}^T, following an approach similar to that used for drawing samples from the dictionary elements process.

[Figure 1 panels, left to right: Σ_{2,2}(t_i), Σ_{1,3}(t_i), µ_5(t_i) (top); Σ_{9,9}(t_i), Σ_{10,3}(t_i), µ_5(t_i) (bottom).]

Figure 1: For the locally varying smoothness simulation (top) and the smooth simulation (bottom), plots of the truth (black) and the posterior mean of LBCR (solid red line) and BCR (solid green line), respectively, for selected components of the variance (left), covariance (middle) and mean (right). For both approaches the dotted lines represent the 95% highest posterior density intervals.

C. Finally, conditioned on {y_i}_{i=1}^T, {η_i}_{i=1}^T, {σ⁻²_j}_{j=1}^p and {ξ(t_i)}_{i=1}^T, and recalling the shrinkage prior for the elements of Θ defined in [22], we update Θ, each local shrinkage hyperparameter φ_jl and the global shrinkage hyperparameters τ_l via standard conjugate analysis.

The problem of online updating is a key issue for multivariate time series with high frequency data.
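To make the state space form concrete before turning to the online algorithm, here is a rough forward simulation of a single dictionary function under the approximated state equation (4). This is our illustrative sketch, not the authors' sampler; the noise variances and the grid of times are arbitrary:

```python
import numpy as np

def simulate_ngp_path(times, var_xi=1.0, var_A=1.0, seed=1):
    """Forward-simulate one dictionary function xi_lk, its derivative and the
    instantaneous mean A_lk under state equation (4), on a possibly irregular
    grid of times."""
    rng = np.random.default_rng(seed)
    state = np.zeros(3)                      # [xi, xi', A] initial state
    path = [state[0]]
    for i in range(len(times) - 1):
        d = times[i + 1] - times[i]          # delta_i = t_{i+1} - t_i
        F = np.array([[1.0, d, 0.0],         # transition matrix of (4)
                      [0.0, 1.0, d],
                      [0.0, 0.0, 1.0]])
        G = np.array([[0.0, 0.0],            # noise loading matrix of (4)
                      [1.0, 0.0],
                      [0.0, 1.0]])
        # [w_xi, w_A] ~ N2(0, diag(var_xi * d, var_A * d))
        w = rng.normal(0.0, np.sqrt(np.array([var_xi, var_A]) * d))
        state = F @ state + G @ w
        path.append(state[0])
    return np.array(path)

grid = np.sort(np.random.default_rng(2).uniform(0.0, 1.0, 100))
path = simulate_ngp_path(grid)
```

Note how an irregular grid is handled automatically: each step only depends on the local spacing δ_i, which is what makes the O(T) filtering and smoothing recursions possible.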
Referring to our formulation, we are interested in updating an approximated posterior distribution for Σ(t_{T+h}) and µ(t_{T+h}), h = 1, ..., H, once a new vector of observations {y_i}_{i=T+1}^{T+H} is available, instead of rerunning posterior computation for the whole time series. Since, as T increases, the posterior for the time-stationary parameters rapidly becomes concentrated, we fix these parameters at their estimates (Θ̂, Σ̂_0, σ̂²_{ξ_lk}, σ̂²_{A_lk}, σ̂²_{ψ_k}, σ̂²_{B_k}) and dynamically update the dictionary functions, alternating between steps A and B for the new set of observations. To initialize the algorithm at T + 1 we propose to run the online updating for {y_i}_{i=T−k}^{T+H}, with k small, choosing a diffuse but proper prior for the initial states at T − k. This approach is suggested to reduce the problem related to the larger conditional variances (see, e.g., [25]) of the latent states at the end of the sample (i.e., at T), which may affect the initial distributions at T + 1. The online algorithm also efficiently exploits the advantages of the state space formulation for the dictionary functions, requiring matrix inversion computations of order depending only on the length H of the additional sequence and on the number k of final observations used to initialize the algorithm.

2 Simulation studies

The aim of the following simulation studies is to compare the performance of our proposal (LBCR, locally adaptive Bayesian covariance regression) with BCR and with the models for multivariate stochastic volatility most widely used in practice, specifically: EWMA, PC-GARCH, GO-GARCH and DCC-GARCH.
In order to assess whether and to what extent LBCR can accommodate, in practice, even sharp changes in the time-varying covariances and means, and to evaluate the costs associated with our flexible approach in settings where the mean and covariance functions do not require locally adaptive estimation techniques, we focus on two different sets of simulated data. The first dataset consists of 5-dimensional observations y_i for each t_i ∈ T_o = {1, 2, ..., 100}, from the latent factor model in (2) with Σ(t) defined as in (1). To allow sharp changes of the covariances and means in the generating mechanism, we consider a 2 × 2 (i.e. L = K = 2) matrix {ξ(t_i)}_{i=1}^{100} of time-varying functions adapted from Donoho and Johnstone [26] with locally-varying smoothness (more specifically, we choose 'bumps' functions, also to mimic possible behavior in practical settings). The second set of simulated data is the same dataset of 10-dimensional observations y_i investigated in Fox and Dunson [22], with smooth GP dictionary functions for each element of the 5 × 4 (i.e. L = 5, K = 4) matrices {ξ(t_i)}_{i=1}^{100}.

Table 1: Summaries of the standardized squared errors.

                   |     Locally varying smoothness      |        Constant smoothness
                   |   mean    q0.9    q0.95      max    |   mean    q0.9    q0.95      max
covariance Σ(t_i)  |                                     |
  EWMA             |   1.37    2.28     5.49     85.86   |   0.030   0.081   0.133     1.119
  PC-GARCH         |   1.75    2.49     6.48    229.50   |   0.018   0.048   0.076     0.652
  GO-GARCH         |   2.40    3.66    10.32    173.41   |   0.043   0.104   0.202     1.192
  DCC-GARCH        |   1.75    2.21     6.95    226.47   |   0.022   0.057   0.110     0.466
  BCR              |   1.80    2.25     7.32    142.26   |   0.009   0.019   0.039     0.311
  LBCR             |   0.90    1.99     4.52     36.95   |   0.009   0.022   0.044     0.474
mean µ(t_i)        |                                     |
  SMOOTH SPLINE    |   0.064   0.128    0.186     2.595  |   0.007   0.019   0.027     0.077
  BCR              |   0.087   0.185    0.379     2.845  |   0.005   0.015   0.024     0.038
  LBCR             |   0.062   0.123    0.224     2.529  |   0.005   0.017   0.026     0.050

Posterior computation, both for LBCR and BCR, is performed by assuming diffuse but proper priors and by using truncation levels L* = K* = 2 for the first dataset and L* = 5, K* = 4 for the second (at higher settings we found that the shrinkage prior on Θ results in posterior samples of the elements in the additional columns being concentrated around 0). For the first dataset we run 50,000 Gibbs iterations with a burn-in of 20,000 and thinning every 5 samples, while for the second we follow Fox and Dunson [22] in considering 10,000 Gibbs iterations, which proved to be enough to reach convergence, and discard the first 5,000 as burn-in. In the first set of simulated data, given the substantial independence between samples after thinning the chain, we analyzed mixing by the Gelman-Rubin procedure [27], based on potential scale reduction factors computed for each chain by splitting the sampled quantities into 6 pieces of equal length. The analysis shows more problematic mixing for BCR than for LBCR. Specifically, in LBCR 95% of the chains have a potential scale reduction factor lower than 1.35, with a median equal to 1.11, while in BCR the 95th quantile is 1.44 and the median equals 1.18.
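The potential scale reduction diagnostic used above is the standard Gelman-Rubin construction; a minimal sketch on a synthetic chain split into 6 pieces of equal length (our illustration, with pseudo-draws in place of actual posterior samples):

```python
import numpy as np

def potential_scale_reduction(chains):
    """Gelman-Rubin potential scale reduction factor for one scalar quantity.
    `chains` has one row per (sub)chain and one column per draw."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)      # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()        # within-chain variance
    var_hat = (n - 1) / n * W + B / n            # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(2)
draws = rng.normal(size=6000)        # one well-mixed pseudo-chain
pieces = draws.reshape(6, -1)        # split into 6 pieces of equal length
r_hat = potential_scale_reduction(pieces)
```

For a well-mixed chain the factor is close to 1; values well above 1 (such as the BCR tail quantile of 1.44 reported above) indicate that the split pieces disagree, i.e. problematic mixing.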
Mixing is less problematic for the second set of simulated data, with potential scale reduction factors having a median equal to 1.05 for both approaches and 95th quantiles equal to 1.15 and 1.31 for LBCR and BCR, respectively. As regards the other approaches, EWMA has been implemented by choosing the smoothing parameter λ that minimizes the mean squared error (MSE) between the estimated covariances and the true values. The PC-GARCH algorithm follows the steps provided by Burns [16], with a GARCH(1,1) assumed for the conditional volatilities of each single time series and of the principal components. GO-GARCH and DCC-GARCH follow the formulations provided by van der Weide [18] and Engle [15] respectively, assuming a GARCH(1,1) for the conditional variances of the processes analyzed, which proves to be a correct choice in many financial applications and also in our setting. Differently from LBCR and BCR, the previous approaches do not explicitly model the mean process {µ(t_i)}_{i=1}^{100} but work directly on the innovations {y_i − µ̂(t_i)}_{i=1}^{100}. Therefore, in these cases we first model the conditional mean via a smoothing spline and in a second step we estimate the models on the innovations. The smoothing parameter for spline estimation has been set to 0.7, which was found to be appropriate to reproduce the true dynamics of {µ(t_i)}_{i=1}^{100}. Figure 1 compares, in both simulated samples, the truth and the posterior means of µ(t) and Σ(t) over the predictor space T_o, together with the point-wise 95% highest posterior density (hpd) intervals for LBCR and BCR. From the upper plots we can clearly note that our approach is able to capture conditional heteroscedasticity as well as mean patterns, also in correspondence of sharp changes in the time-varying true functions.
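The EWMA baseline described above is simple enough to sketch. This is our illustration, not the paper's tuned implementation: here λ = 0.94 is a conventional default, whereas the study selects λ by minimizing the MSE against the (there known) true covariances, and the innovations are synthetic:

```python
import numpy as np

def ewma_covariances(innovations, lam=0.94):
    """EWMA covariance recursion on de-meaned returns:
    Sigma_t = lam * Sigma_{t-1} + (1 - lam) * e_{t-1} e_{t-1}^T."""
    T, p = innovations.shape
    covs = np.empty((T, p, p))
    covs[0] = np.cov(innovations, rowvar=False)   # initialize at the sample covariance
    for t in range(1, T):
        e = innovations[t - 1][:, None]           # p x 1 innovation vector
        covs[t] = lam * covs[t - 1] + (1.0 - lam) * (e @ e.T)
    return covs

rng = np.random.default_rng(3)
e = rng.normal(size=(100, 5))                     # synthetic innovations y_i - mu_hat(t_i)
covs = ewma_covariances(e, lam=0.94)
```

The single global λ is exactly the "single time-constant bandwidth parameter" criticized in Section 1.1: every entry of Σ_t, at every time, is smoothed at the same rate.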
The major differences compared to the true values can be found at the beginning and at the end of the series, and are likely related to the structure of the simulation smoother, which causes a widening of the credibility bands at the very end of the series; for references see Durbin and Koopman [25]. However, even in the most problematic cases, the true values are within the bands of the 95% hpd intervals. Much more problematic is the behavior of the posterior distributions for BCR, which badly over-smooth both covariance and mean functions, leading also to many 95% hpd intervals not containing the true values.

[Figure 2 panels: USA NASDAQ, ITALY FTSE MIB.]

Figure 2: For 2 NSI, posterior mean (black) and 95% hpd intervals (dotted red) for the variances {Σ_jj(t_i)}_{i=1}^{415}.

Bottom plots in Figure 1 show that the performance of our approach is very close to that of BCR when data are simulated from a model where the covariances and means evolve smoothly across time and local adaptivity is not required. This happens even if the hyperparameters are set so as to maintain separation between the nGP and GP priors, suggesting large support for LBCR. The comparison of the summaries of the squared errors between the true values {µ(t_i)}_{i=1}^{100} and {Σ(t_i)}_{i=1}^{100} and the estimated quantities {µ̂(t_i)}_{i=1}^{100} and {Σ̂(t_i)}_{i=1}^{100}, standardized with the ranges of the true underlying processes r_µ = max_{i,j}{µ_j(t_i)} − min_{i,j}{µ_j(t_i)} and r_Σ = max_{i,j,k}{Σ_{j,k}(t_i)} − min_{i,j,k}{Σ_{j,k}(t_i)} respectively, once again confirms the overall better performance of our approach with respect to all the considered competitors. Table 1 shows that, when local adaptivity is required, LBCR provides superior performance, with standardized residuals lower than those of the other approaches.
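The standardization behind the Table 1 summaries can be sketched as follows. This is our reading of the construction (errors divided by the range of the true process before squaring); the exact convention is an assumption, and the arrays are placeholders:

```python
import numpy as np

def standardized_sq_errors(true_vals, est_vals):
    """Squared errors standardized by the range of the true process,
    summarized by the mean, upper quantiles and maximum as in Table 1."""
    true_vals = np.asarray(true_vals, dtype=float)
    est_vals = np.asarray(est_vals, dtype=float)
    r = true_vals.max() - true_vals.min()    # range, e.g. r_mu or r_Sigma
    err = ((true_vals - est_vals) / r) ** 2  # standardized squared errors
    return {"mean": err.mean(),
            "q0.9": np.quantile(err, 0.9),
            "q0.95": np.quantile(err, 0.95),
            "max": err.max()}

res = standardized_sq_errors([0.0, 2.0], [1.0, 2.0])
```

Standardizing by r_µ and r_Σ puts the mean and covariance errors, and the two simulation settings, on comparable scales.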
EWMA seems to provide quite accurate estimates; however, it is important to underline that we chose the optimal smoothing parameter λ so as to minimize the MSE between the estimated and the true parameters, the latter being clearly not known in practical applications. Different values of λ significantly reduce the performance of EWMA, which also shows a lack of robustness. The closeness of LBCR and BCR in the constant smoothness dataset confirms the flexibility of LBCR and highlights the better performance of the two approaches with respect to the other competitors also when smooth processes are investigated.

3 Application to National Stock Market Indices (NSI)

National Stock Indices are technical tools that allow, through the synthesis of numerous data on the evolution of the various stocks, the detection of underlying trends in the financial market, with reference to a specific basis of currency and time. In this application we focus our attention on the multivariate weekly time series of the main 33 (i.e. p = 33) National Stock Indices from 12/07/2004 to 25/06/2012, downloaded from http://finance.yahoo.com. We consider the heteroscedastic model for the log returns y_i ∼ N_33(µ(t_i), Σ(t_i)) for i = 1, ..., 415 and t_i in the discrete set T_o = {1, 2, ..., 415}, where µ(t_i) and Σ(t_i) are given in (3) and (1), respectively. Posterior computation is performed using the same settings as in the first simulation study and fixing K* = 4 and L* = 5 (which we found to be sufficiently large, since the posterior samples of the last few columns of Θ assumed values close to 0). Missing values in our dataset do not represent a limitation, since the Bayesian approach allows us to update the posterior considering solely the observed data.
We run 10,000 Gibbs iterations with a burn-in of 2,500. Examination of trace plots for {Σ(t_i)}_{i=1}^{415} and {µ(t_i)}_{i=1}^{415} showed no evidence against convergence. Posterior distributions for the variances in Figure 2 show that we are clearly able to capture the rapid changes in the dynamics of volatilities that occur during the world financial crisis of 2008, in early 2010 with the Greek debt crisis, and in the summer of 2011 with the financial speculation in government bonds of European countries, together with the rejection of the U.S. budget and the downgrading of the United States rating. Similar conclusions hold for the posterior distributions of the trajectories of the means, with rapid changes detected in correspondence of the world financial crisis in 2008.

[Figure 3 panels: LBCR, BCR.]

Figure 3: Black line: for USA NASDAQ, median of the correlations with the other 32 NSI, based on the posterior mean of {Σ(t_i)}_{i=1}^{415}. Red lines: 25%, 75% (dotted lines) and 50% (solid line) quantiles of the correlations between USA NASDAQ and European countries (without considering Greece and Russia). Green lines: 25%, 75% (dotted lines) and 50% (solid line) quantiles of the correlations between USA NASDAQ and the countries of Southeast Asia (Asian Tigers and India).

From the correlations between NASDAQ and the other National Stock Indices (based on the posterior mean {Σ̂(t_i)}_{i=1}^{415} of the covariance function) in Figure 3, we can immediately notice the presence of a clear geo-economic structure in world financial markets (more evident in LBCR than in BCR), where the dependence between the U.S.
and European countries is systematically higher than that of the South East Asian nations (Economic Tigers), which also show different reactions to crises. The flexibility of the proposed approach and the possibility of accommodating varying smoothness in the trajectories over time allow us to obtain a good characterization of the dynamic dependence structure, in agreement with the major theories on financial crises. The left plot in Figure 3 shows how the change of regime in correlations occurs exactly in correspondence with the burst of the U.S. housing bubble (A), in the second half of 2006. Moreover, we can immediately notice that the correlations among financial markets increase significantly during the crises, showing a clear international financial contagion effect in agreement with other theories on financial crises. As expected, the persistence of high levels of correlation is evident during the global financial crisis between late-2008 and end-2009 (C), at the beginning of which our approach also captures a dramatic change in the correlations between the U.S. and the Economic Tigers, which reach levels close to those of Europe. Further rapid changes are identified in correspondence with the Greek crisis (D), the worsening of the European sovereign-debt crisis together with the rejection of the U.S. budget (F), and the recent crisis of credit institutions in Spain together with the growing financial instability in the Eurozone (G). Finally, even in the period of the U.S. financial reform launched by Barack Obama and the EU efforts to save Greece (E), we can notice two peaks, representing the Irish debt crisis and the Portugal debt crisis, respectively.
BCR, as expected, tends to over-smooth the dynamic dependence structure during the financial crisis, proving unable to model the sharp change in the correlations between USA NASDAQ and the Economic Tigers during late-2008, and the two peaks in (E) at the beginning of 2011. The possibility of quickly updating estimates and predictions as soon as new data arrive is a crucial aspect in obtaining quantitative information about future scenarios of the crisis in financial markets. To this end, we apply the proposed online updating algorithm to the new set of weekly observations {y_i}_{i=416}^{422} from 02/07/2012 to 13/08/2012, conditioning on the posterior estimates of the Gibbs sampler based on the observations {y_i}_{i=1}^{415} available up to 25/06/2012. We initialized the simulation smoother algorithm with the last 8 observations of the previous sample. Plots at the top of Figure 4 show, for 3 selected National Stock Indices, the new observed log returns {y_ji}_{i=416}^{422} together with the mean and the 2.5% and 97.5% quantiles of their marginal and conditional distributions. We use standard formulas of the multivariate normal distribution based on the posterior means of the updated {Σ(t_i)}_{i=416}^{422} and {µ(t_i)}_{i=416}^{422} after 5,000 Gibbs iterations with a burn-in of 500. We can clearly notice the good performance of our proposed online updating algorithm in obtaining a characterization of the distribution of the new observations. Also note that the multivariate approach, together with a flexible model for the mean and covariance, allows for significant improvements when the conditional distribution of an index given the others is analyzed. To obtain further information about the predictive performance of LBCR, we can easily use our online updating algorithm to obtain h step-ahead predictions for Σ(t_{T+h|T}) and µ(t_{T+h|T}) with h = 1, ..., H.
[Figure 4: three panels of log returns, one each for USA NASDAQ, INDIA BSE30 and FRANCE CAC40.]
Figure 4: Top: for 3 selected NSI, plot of the observed log returns (black) together with the mean and the 2.5% and 97.5% quantiles of the marginal distribution (red) and of the conditional distribution given the other 32 NSI (green), y_{ji} | y_i^{-j} with y_i^{-j} = {y_{qi}, q ≠ j}, based on the posterior means of {Σ(t_i)}_{i=416}^{422} and {µ(t_i)}_{i=416}^{422} from the online updating procedure for the new observations from 02/07/2012 to 13/08/2012. Bottom: boxplots of the one-step-ahead prediction errors for the 33 NSI computed with 3 different methods.

In particular, referring to Durbin and Koopman [25], we can generate posterior samples from Σ(t_{T+h|T}) and µ(t_{T+h|T}) for h = 1, ..., H merely by treating {y_i}_{i=T+1}^{T+H} as missing values in the proposed online updating algorithm. Here, we consider the one-step-ahead prediction (i.e., H = 1) problem for the new observations. More specifically, for each i from 415 to 421, we update the mean and covariance functions conditioning on the information up to t_i through the online algorithm and then obtain the predicted posterior distribution of Σ(t_{i+1|i}) and µ(t_{i+1|i}) by appending to the sample considered for the online updating a last column y_{i+1} of missing values.
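Treating future observations as missing is a standard device in linear Gaussian state-space models: the filtering recursion simply skips the measurement update, so the predictive moments propagate forward. A generic sketch of one such recursion (the matrices A, C, Q, R and the local-level example are illustrative assumptions, not the paper's LBCR state space):

```python
import numpy as np

def kalman_step(m, P, y, A, C, Q, R):
    """One filtering recursion for x_{i+1} = A x_i + w, y_i = C x_i + v.
    A missing observation (any NaN in y) skips the update step, which is
    how h-step-ahead prediction via 'missing values' works."""
    m_pred = A @ m                        # state prediction
    P_pred = A @ P @ A.T + Q
    y_pred = C @ m_pred                   # one-step-ahead predictive mean
    S = C @ P_pred @ C.T + R              # one-step-ahead predictive covariance
    if np.isnan(y).any():                 # future value: carries no information
        return m_pred, P_pred, y_pred, S
    K = P_pred @ C.T @ np.linalg.inv(S)   # Kalman gain
    m_upd = m_pred + K @ (y - y_pred)
    P_upd = (np.eye(len(m)) - K @ C) @ P_pred
    return m_upd, P_upd, y_pred, S

# local-level toy model: observe y_1 = 1, then predict y_2 as missing
I = np.eye(1)
m, P, yp1, S1 = kalman_step(np.zeros(1), I, np.array([1.0]), I, I, I, I)
m, P, yp2, S2 = kalman_step(m, P, np.array([np.nan]), I, I, I, I)
```

The paper's online algorithm applies the same idea within a simulation smoother, drawing posterior samples rather than a single filtered trajectory.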
The plots at the bottom of Figure 4 show the boxplots of the one-step-ahead prediction errors for the 33 NSI, obtained as the difference between the predicted value ỹ_{j,i+1|i} and, once available, the observed log return y_{j,i+1}, with i + 1 = 416, ..., 422 corresponding to the weeks from 02/07/2012 to 13/08/2012. In (a) we forecast the future log returns with the unconditional mean, ỹ_{i+1} = 0 for i = 415, ..., 421, which is what is often done in practice under the general assumption of zero-mean, stationary log returns. In (b) we consider ỹ_{i+1|i} = µ̂(t_{i+1|i}), the posterior mean of the one-step-ahead predictive distribution of µ(t_{i+1|i}), obtained from the previously proposed approach after 5,000 Gibbs iterations with a burn-in of 500. Finally, in (c) we suppose that the log returns of all National Stock Indices except that of country j (i.e., y_{j,i+1}) become available at t_{i+1} and, considering y_{i+1|i} ~ N_p(µ̂(t_{i+1|i}), Σ̂(t_{i+1|i})), with µ̂(t_{i+1|i}) and Σ̂(t_{i+1|i}) the posterior means of the one-step-ahead predictive distributions of µ(t_{i+1|i}) and Σ(t_{i+1|i}) respectively, we forecast ỹ_{j,i+1} with the conditional mean of y_{j,i+1} given the other log returns at time t_{i+1}. Prediction with the unconditional mean (a) seems to lead to over-predicted values, while our approach (b) provides median-unbiased predictions. Moreover, combining our approach with the conditional distribution of one return given the others (c) further improves the forecasts, also reducing the variability of the predictive distribution. We additionally obtain well-calibrated predictive intervals, unlike competing methods.

4 Discussion

In this paper, we have presented a generalization of Bayesian nonparametric covariance regression to obtain a better characterization of mean and covariance temporal dynamics.
Maintaining simple conjugate posterior updates and tractable computations in moderately large p settings, our model increases the flexibility of previous approaches, as shown in the simulation studies. Besides these key advantages, the state space formulation enables the development of a fast online updating algorithm useful for high frequency data. The application to the problem of capturing temporal and geo-economic structure among financial markets shows the utility of our approach in the analysis of multivariate financial time series.

References

[1] Tsay, R.S. (2005). Analysis of Financial Time Series. Hoboken, New Jersey: Wiley.
[2] Kalman, R.E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering 82:35-45.
[3] Rasmussen, C.E. & Williams, C.K.I. (2006). Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press.
[4] Huang, J.Z., Wu, C.O. & Zhou, L. (2002). Varying-coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika 89:111-128.
[5] Hastie, T.J. & Tibshirani, R.J. (1990). Generalized Additive Models. London: Chapman and Hall.
[6] Wu, C.O., Chiang, C.T. & Hoover, D.R. (1998). Asymptotic confidence regions for kernel smoothing of a varying-coefficient model with longitudinal data. JASA 93:1388-1402.
[7] Friedman, J.H. (1991). Multivariate adaptive regression splines. Annals of Statistics 19:1-67.
[8] Smith, M. & Kohn, R. (1996). Nonparametric regression using Bayesian variable selection.
Journal of Econometrics 75:317-343.
[9] George, E.I. & McCulloch, R.E. (1993). Variable selection via Gibbs sampling. JASA 88:881-889.
[10] Donoho, D.L. & Johnstone, I.M. (1995). Adapting to unknown smoothness via wavelet shrinkage. JASA 90:1200-1224.
[11] Fan, J. & Gijbels, I. (1995). Data-driven bandwidth selection in local polynomial fitting: variable bandwidth and spatial adaptation. JRSS Series B 57:371-394.
[12] Wolpert, R.L., Clyde, M.A. & Tu, C. (2011). Stochastic expansions using continuous dictionaries: Lévy adaptive regression kernels. Annals of Statistics 39:1916-1962.
[13] Bollerslev, T., Engle, R.F. & Wooldridge, J.M. (1988). A capital-asset pricing model with time-varying covariances. Journal of Political Economy 96:116-131.
[14] Engle, R.F. & Kroner, K.F. (1995). Multivariate simultaneous generalized ARCH. Econometric Theory 11:122-150.
[15] Engle, R.F. (2002). Dynamic conditional correlation: a simple class of multivariate generalized autoregressive conditional heteroskedasticity models. Journal of Business & Economic Statistics 20:339-350.
[16] Burns, P. (2005). Multivariate GARCH with only univariate estimation. http://www.burns-stat.com.
[17] Alexander, C.O. (2001). Orthogonal GARCH. Mastering Risk 2:21-38.
[18] van der Weide, R. (2002). GO-GARCH: a multivariate generalized orthogonal GARCH model. Journal of Applied Econometrics 17:549-564.
[19] Nakajima, J. & West, M. (2012). Dynamic factor volatility modeling: a Bayesian latent threshold approach. Journal of Financial Econometrics, in press.
[20] Wilson, A.G. & Ghahramani, Z. (2010). Generalised Wishart processes. arXiv:1101.0240.
[21] Bru, M.-F. (1991). Wishart processes. Journal of Theoretical Probability 4:725-751.
[22] Fox, E. & Dunson, D.B. (2011). Bayesian nonparametric covariance regression. arXiv:1101.2017.
[23] Zhu, B. & Dunson, D.B. (2012). Locally adaptive Bayes nonparametric regression via nested Gaussian processes. arXiv:1201.4403.
[24] Durbin, J.
& Koopman, S. (2002). A simple and efficient simulation smoother for state space time series analysis. Biometrika 89:603-616.
[25] Durbin, J. & Koopman, S. (2001). Time Series Analysis by State Space Methods. New York: Oxford University Press.
[26] Donoho, D.L. & Johnstone, I.M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81:425-455.
[27] Gelman, A. & Rubin, D.B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science 7:457-511.