{"title": "Dynamic Mode Decomposition with Reproducing Kernels for Koopman Spectral Analysis", "book": "Advances in Neural Information Processing Systems", "page_first": 911, "page_last": 919, "abstract": "A spectral analysis of the Koopman operator, which is an infinite dimensional linear operator on an observable, gives a (modal) description of the global behavior of a nonlinear dynamical system without any explicit prior knowledge of its governing equations. In this paper, we consider a spectral analysis of the Koopman operator in a reproducing kernel Hilbert space (RKHS). We propose a modal decomposition algorithm to perform the analysis using finite-length data sequences generated from a nonlinear system. The algorithm is in essence reduced to the calculation of a set of orthogonal bases for the Krylov matrix in RKHS and the eigendecomposition of the projection of the Koopman operator onto the subspace spanned by the bases. The algorithm returns a decomposition of the dynamics into a finite number of modes, and thus it can be thought of as a feature extraction procedure for a nonlinear dynamical system. Therefore, we further consider applications in machine learning using extracted features with the presented analysis. We illustrate the method on the applications using synthetic and real-world data.", "full_text": "Dynamic Mode Decomposition with Reproducing Kernels for Koopman Spectral Analysis

Yoshinobu Kawahara (a, b)

(a) The Institute of Scientific and Industrial Research, Osaka University
(b) Center for Advanced Integrated Intelligence Research, RIKEN

ykawahara@sanken.osaka-u.ac.jp

Abstract

A spectral analysis of the Koopman operator, which is an infinite-dimensional linear operator acting on observables, gives a (modal) description of the global behavior of a nonlinear dynamical system without any explicit prior knowledge of its governing equations.
In this paper, we consider a spectral analysis of the Koopman operator in a reproducing kernel Hilbert space (RKHS). We propose a modal decomposition algorithm that performs the analysis using finite-length data sequences generated from a nonlinear system. The algorithm essentially reduces to computing a set of orthogonal bases for the Krylov matrix in the RKHS and the eigendecomposition of the projection of the Koopman operator onto the subspace spanned by these bases. The algorithm returns a decomposition of the dynamics into a finite number of modes, and thus it can be thought of as a feature extraction procedure for a nonlinear dynamical system. We therefore further consider machine learning applications that use the features extracted by the presented analysis, and we illustrate the method on these applications using synthetic and real-world data.

1 Introduction

Modeling nonlinear dynamical systems from data is fundamental in a variety of engineering and scientific fields. In machine learning, the problem of learning dynamical systems has been actively discussed, and several Bayesian approaches have been proposed [11, 34]. In physics, one popular approach for this purpose is the family of decomposition methods, which factorize the dynamics into modes based on some criterion from the data. For example, proper orthogonal decomposition (POD) (see, e.g., [12]), which generates orthogonal modes that optimally capture the vector energy of a given dataset, has been extensively applied to complex phenomena in physics [5, 22], even though this method is now known to have several drawbacks.
The so-called spectral methods for dynamical systems [15, 31, 17], which are often discussed in machine learning, are closely related to this type of technique, although there the aim is to estimate a prediction model rather than to understand the dynamics by examining the obtained modes.
Among the decomposition techniques, dynamic mode decomposition (DMD) [25, 26] has recently attracted attention in physics, such as fluid mechanics, and in engineering, and has been applied to data obtained from complex phenomena [2, 4, 6, 10, 21, 25, 27, 32]. DMD approximates the spectra of the Koopman operator [16], an infinite-dimensional linear operator that represents nonlinear, finite-dimensional dynamics without linearization. While POD only finds the principal directions in a dataset, DMD yields direct information about the dynamics, such as growth rates and frequencies.
In this paper, we consider a spectral analysis of the Koopman operator in reproducing kernel Hilbert spaces (RKHSs) for a nonlinear dynamical system

$$ x_{t+1} = f(x_t), \tag{1} $$

where $x \in \mathcal{M}$ is the state vector on a finite-dimensional manifold $\mathcal{M} \subseteq \mathbb{R}^d$, and $f$ is a (possibly nonlinear) state-transition function. We present a modal decomposition algorithm to perform this analysis, which in principle reduces to computing a set of orthogonal bases for the Krylov matrix in the RKHS and the eigendecomposition of the projection of the Koopman operator onto the subspace spanned by these bases.
Although existing DMD algorithms can conceptually be thought of as producing an approximation of the eigenfunctions of the Koopman operator using a set of linear monomials of observables (or predetermined functional maps of observables) as basis functions, which is analogous to a one-term Taylor expansion at each point, our algorithm gives an approximation with a set of nonlinear basis functions owing to the expressiveness of kernel functions. The proposed algorithm provides a modal decomposition of the dynamics into a finite number of modes, and thus it can be considered a feature extraction procedure for a nonlinear dynamical system. We therefore consider applications that use the features extracted by our analysis, such as state prediction, sequential change-point detection, and dynamics recognition, and illustrate our method on these applications using synthetic and real-world data.
The remainder of this paper is organized as follows. In Section 2, we briefly review the spectral analysis of nonlinear dynamical systems with the Koopman operator and DMD. In Section 3, we extend the analysis with reproducing kernels and provide a modal decomposition algorithm to perform it, based on the same principle as DMD. Although this method is mathematically correct, a practical implementation could yield an ill-conditioned algorithm. Therefore, in Section 4, we describe a way to robustify it by projecting the data onto the POD directions. In Section 5, we describe related work. In Section 6, we show some empirical examples of the proposed algorithm, and in Section 7, we describe several applications using the extracted features, with empirical results. Finally, we conclude the paper in Section 8.

2 The Koopman Operator and Dynamic Mode Decomposition

Consider a discrete-time nonlinear dynamical system (1). The Koopman operator [16], which we denote here by $\mathcal{K}$, is an infinite-dimensional linear operator that acts on a scalar function $g_i : \mathcal{M} \to \mathbb{C}$, mapping $g_i$ to a new function $\mathcal{K} g_i$ given as follows:

$$ (\mathcal{K} g_i)(x) = g_i \circ f(x), \tag{2} $$

where $\circ$ denotes the composition of $g_i$ with $f$. Thus $\mathcal{K}$ acts linearly on the function $g_i$, even though the dynamics defined by $f$ may be nonlinear. Since $\mathcal{K}$ is a linear operator, it has, in general, an eigendecomposition

$$ \mathcal{K} \varphi_j(x) = \lambda_j \varphi_j(x), \tag{3} $$

where $\lambda_j \in \mathbb{C}$ is the $j$-th eigenvalue (called the Koopman eigenvalue) and $\varphi_j$ is the corresponding eigenfunction (called the Koopman eigenfunction). We denote the concatenation of the $g_i$ by $g := [g_1, \ldots, g_p]^\top$. If each $g_i$ lies within the span of the eigenfunctions $\varphi_j$, we can expand the vector-valued $g$ in terms of these eigenfunctions as

$$ g(x) = \sum_{j=1}^{\infty} \varphi_j(x) u_j, \tag{4} $$

where the $u_j$ are vector coefficients called Koopman modes. Then, by iteratively applying Eqs. (2) and (3), we obtain

$$ g \circ f^l(x) = \sum_{j=1}^{\infty} \lambda_j^l \varphi_j(x) u_j, \tag{5} $$

where $f^l$ is the $l$-fold composition of $f$. Therefore, $\lambda_j$ characterizes the temporal behavior of the corresponding Koopman mode $u_j$: the phase of $\lambda_j$ determines its frequency, and its magnitude determines the growth rate. Note that, for a system evolving on an attractor, the Koopman eigenvalues always lie on the unit circle [20].
DMD [25, 26] (and its variants) is a popular approach for estimating approximations of $\lambda_j$ and $u_j$ from a finite-length data sequence $y_0, y_1, \ldots, y_\tau \ (\in \mathbb{R}^p)$, where we denote $y_t := g(x_t)$. DMD can fundamentally be considered a special use of the Arnoldi method [1]. That is, using the empirical Ritz values $\tilde{\lambda}_j$ and vectors $v_j$ obtained by the Arnoldi method when the subspace spanned by $y_0, \ldots, y_{\tau-1}$ is regarded as the Krylov subspace for $y_0$ (and, implicitly, for some matrix $A \in \mathbb{R}^{p \times p}$), it can be shown that the observables are expressed as

$$ y_t = \sum_{j=1}^{\tau} \tilde{\lambda}_j^t v_j \quad (t = 0, \ldots, \tau - 1), \tag{6a} $$
$$ y_\tau = \sum_{j=1}^{\tau} \tilde{\lambda}_j^\tau v_j + r, \quad r \perp \operatorname{span}\{y_0, \ldots, y_{\tau-1}\}. \tag{6b} $$

Comparing Eq. (6a) with Eq. (5) shows that the empirical Ritz values $\tilde{\lambda}_j$ and vectors $v_j$ behave in precisely the same manner as the Koopman eigenvalues $\lambda_j$ and modes $u_j$ (more precisely, $\varphi_j(x_0) u_j$), except that Eq. (6a) is a finite sum whereas Eq. (5) is an infinite sum. Note that if $r = 0$ in Eq. (6b) (which could happen when the data are sufficiently large), the approximate modes are indistinguishable from the true Koopman eigenvalues and modes (as far as the data points are concerned), and the expansion (5) comprises only a finite number of terms.

3 Dynamic Mode Decomposition with Reproducing Kernels

As described above, estimating the Koopman modes by DMD (and its variants) can capture nonlinear dynamics from finite-length data sequences generated by a dynamical system. Conceptually, DMD can be considered as producing an approximation of the Koopman eigenfunctions using a set of linear monomials of observables as basis functions, which is analogous to a one-term Taylor expansion at each point. In situations where the eigenfunctions can be accurately approximated using linear monomials (e.g., in a small neighborhood of a stable fixed point), DMD will produce an accurate local approximation of the Koopman eigenfunctions.
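For reference, the Arnoldi-type DMD reviewed in Section 2 can be sketched in a few lines of NumPy. This is a minimal sketch, not the implementation used in this paper: it assumes noise-free snapshots stored as the columns of a $p \times (\tau+1)$ array, and the name companion_dmd is hypothetical. The least-squares coefficients $c$ of $y_\tau$ on $y_0, \ldots, y_{\tau-1}$ define a companion matrix whose eigenvalues are the empirical Ritz values $\tilde{\lambda}_j$, and the Vandermonde matrix $T_{ij} = \tilde{\lambda}_i^{j-1}$ turns the Krylov matrix into the vectors $v_j$ of Eq. (6a).

```python
import numpy as np


def companion_dmd(Y):
    # Y: p-by-(tau + 1) array with columns y_0, ..., y_tau.
    K = Y[:, :-1]                          # Krylov matrix [y_0, ..., y_{tau-1}]
    y_last = Y[:, -1]                      # y_tau
    tau = K.shape[1]
    # y_tau = K c + r with r orthogonal to span{y_0, ..., y_{tau-1}} (Eq. (6b)).
    c, *_ = np.linalg.lstsq(K, y_last, rcond=None)
    r = y_last - K @ c
    # Companion matrix: ones on the subdiagonal, c in the last column.
    C = np.zeros((tau, tau))
    C[1:, :-1] = np.eye(tau - 1)
    C[:, -1] = c
    ritz_values = np.linalg.eigvals(C)
    # Vandermonde matrix T_ij = lambda_i^(j-1); then C = T^{-1} Lambda T and
    # the columns of V = K T^{-1} satisfy y_t = sum_j lambda_j^t v_j (Eq. (6a)).
    T = np.vander(ritz_values, tau, increasing=True)
    V = K @ np.linalg.inv(T)
    return ritz_values, V, r


# Example: a diagonalizable linear map with eigenvalues 0.9, 0.5, -0.3 and tau = p = 3.
S = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]])
A = S @ np.diag([0.9, 0.5, -0.3]) @ np.linalg.inv(S)
Y = np.empty((3, 4))
Y[:, 0] = [1.0, 2.0, 0.5]
for t in range(3):
    Y[:, t + 1] = A @ Y[:, t]
lam, V, r = companion_dmd(Y)
print(np.sort(lam.real))                   # close to -0.3, 0.5, 0.9
```

With $\tau = p$ and a generic $y_0$ generated by a diagonalizable linear map, the residual $r$ of Eq. (6b) vanishes and the Ritz values coincide with the eigenvalues of that map. For nonlinear dynamics observed through linear observables, they are only approximations, which is the limitation discussed next.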
However, such a local approximation is certainly not valid for all systems (in particular, beyond the region of validity of local linearization). Here, we extend the Koopman spectral analysis with reproducing kernels to approximate the Koopman eigenfunctions with richer basis functions. We provide a modal decomposition algorithm to perform this analysis based on the same principle as DMD.
Let $\mathcal{H}$ be the RKHS endowed with the inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}}$ (abbreviated as $\langle \cdot, \cdot \rangle$ for simplicity) and a positive definite kernel $k$. Additionally, let $\phi : \mathcal{M} \to \mathcal{H}$ be the corresponding feature map. Then, we define the Koopman operator on the feature map $\phi$ by

$$ (\mathcal{K}_{\mathcal{H}} \phi)(x) = \phi \circ f(x). \tag{7} $$

Thus, the Koopman operator $\mathcal{K}_{\mathcal{H}}$ is a linear operator in $\mathcal{H}$. Note that most of the theoretical claims in this and the next section do not necessarily require $\phi$ to be in an RKHS (it is sufficient that $\phi$ stays in a Hilbert space). However, this assumption is needed to perform the calculations in practice (as described in the last parts of this and the next section), so we proceed with it in what follows. We denote by $\varphi_j$ the $j$-th eigenfunction of $\mathcal{K}_{\mathcal{H}}$ with the corresponding eigenvalue $\lambda_j$. Also, we define $\Phi := \operatorname{span}\{\phi(x) : x \in \mathcal{M}\}$.
We first extend notions that appear in DMD, such as the Ritz values and vectors, to reproducing kernels. Suppose we have a sequence $x_0, x_1, \ldots, x_\tau$. The Krylov subspace for $\phi(x_0)$ is defined as the subspace spanned by $\phi(x_0), (\mathcal{K}_{\mathcal{H}} \phi)(x_0), \ldots, (\mathcal{K}_{\mathcal{H}}^{\tau-1} \phi)(x_0)$.
Note that this is identical to the subspace spanned by $\phi(x_0), \ldots, \phi(x_{\tau-1})$, whose corresponding Krylov matrix is given by

$$ M_\tau = [\phi(x_0) \cdots \phi(x_{\tau-1})]. \tag{8} $$

Therefore, if we denote a set of $\tau$ orthogonal bases of the Krylov subspace by $q_1, \ldots, q_\tau \ (\in \mathcal{H})$ (obtained from the Gram-Schmidt orthogonalization described below), then the orthogonal projection of $\mathcal{K}_{\mathcal{H}}$ onto the Krylov subspace (the range of $M_\tau$) is given by $P_\tau = Q_\tau^* \mathcal{K}_{\mathcal{H}} Q_\tau$, where $Q_\tau = [q_1 \cdots q_\tau]$ and $Q_\tau^*$ denotes the Hermitian transpose of $Q_\tau$. Consequently, the empirical Ritz values and vectors are defined as the eigenvalues and eigenvectors of $P_\tau$, respectively. We now have the following theorem:

Theorem 1. Consider a sequence $\phi(x_0), \phi(x_1), \ldots, \phi(x_\tau)$, and let $\tilde{\lambda}_j$ and $\tilde{\varphi}_j$ be the empirical Ritz values and vectors for this sequence. Assume that the $\tilde{\lambda}_j$ are distinct. Then, we have

$$ \phi(x_t) = \sum_{j=1}^{\tau} \tilde{\lambda}_j^t \tilde{\varphi}_j \quad (t = 0, \ldots, \tau - 1), \tag{9a} $$
$$ \phi(x_\tau) = \sum_{j=1}^{\tau} \tilde{\lambda}_j^\tau \tilde{\varphi}_j + r, \quad r \perp \operatorname{span}\{\phi(x_0), \ldots, \phi(x_{\tau-1})\}. \tag{9b} $$

Proof. Let $M_\tau = Q_\tau R$ ($R \in \mathbb{C}^{\tau \times \tau}$) be the Gram-Schmidt QR decomposition of $M_\tau$. Then, the companion matrix (rational canonical form) of $P_\tau$ is given as $F := R^{-1} P_\tau R$. Note that $P_\tau$ and $F$ have the same eigenvalues. Since $F$ is a companion matrix and the $\tilde{\lambda}_j$ are distinct, $F$ can be diagonalized in the form $F = T^{-1} \tilde{\Lambda} T$, where $\tilde{\Lambda}$ is a diagonal matrix with entries $\tilde{\lambda}_1, \ldots, \tilde{\lambda}_\tau$ and $T$ is the Vandermonde matrix defined by $T_{ij} = \tilde{\lambda}_i^{j-1}$. Therefore, the empirical Ritz vectors $\tilde{\varphi}_j$ are obtained as the columns of $V = M_\tau T^{-1}$. This proves Eq. (9a). Suppose a linear expansion of $\phi(x_\tau)$ is represented as

$$ \phi(x_\tau) = M_\tau c + r, \quad r \perp \operatorname{span}\{\phi(x_0), \ldots, \phi(x_{\tau-1})\}. \tag{10} $$

Since $F = R^{-1} P_\tau R = M_\tau^{-1} \mathcal{K}_{\mathcal{H}} M_\tau$ (therefore, $M_\tau F = \mathcal{K}_{\mathcal{H}} M_\tau$), the first term is given by the last column of $M_\tau F = M_\tau T^{-1} \tilde{\Lambda} T = V \tilde{\Lambda} T$. This proves Eq. (9b).

This theorem gives an extension of DMD via the Gram-Schmidt QR decomposition in the feature space. Although the Gram-Schmidt QR orthogonalization in Step (2) below is performed in the RKHS, this calculation can be reduced to operations on a Gram matrix owing to the reproducing property of kernel functions. The procedure is as follows:
(1) Define $M_\tau$ by Eq. (8) and $M_+ := [\phi(x_1), \ldots, \phi(x_\tau)]$.
(2) Calculate the Gram-Schmidt QR decomposition $M_\tau = Q_\tau R$ (e.g., refer to Section 5.2 of [29]).
(3) Calculate the eigendecomposition $R^{-1} Q_\tau^* M_+ \ (= F) = T^{-1} \tilde{\Lambda} T$, where each diagonal element of $\tilde{\Lambda}$ gives $\tilde{\lambda}_j$.
(4) Define $\tilde{\varphi}_j$ to be the columns of $M_\tau T^{-1}$.
The original DMD algorithm (and its variants) produces an approximation of the eigenfunctions of the Koopman operator in Eq. (2) using the set of linear monomials of observables as basis functions. In contrast, because the above algorithm works directly in the function space, the Koopman operator defined in Eq. (7) is identical to the transition operator on an observable. Therefore, the eigenfunctions of the Koopman operator are fully recovered if the Krylov subspace is sufficiently large, i.e., if $\phi(x_\tau)$ also lies in $\operatorname{span}\{\phi(x_0), \ldots, \phi(x_{\tau-1})\}$ (or $r = 0$).

4 Robustifying with POD Bases

Although the above decomposition based on the Gram-Schmidt orthogonalization is mathematically correct, a practical implementation could yield an ill-conditioned algorithm that is often incapable of extracting multiple modes. A similar issue is well known for DMD [26], where one needs to robustify DMD by projecting the data onto the (truncated) POD directions [8, 33]. Here, we discuss a similar modification of our principle with the POD basis.
First, consider kernel PCA [28] on $x_0, x_1, \ldots, x_{\tau-1}$: let $\bar{G} = B S B^*$ be the eigendecomposition of the centered Gram matrix $\bar{G} = HGH = G - 1_\tau G - G 1_\tau + 1_\tau G 1_\tau$, where $G = M_\tau^* M_\tau$ is the Gram matrix for the data, $H = I - 1_\tau$, and $1_\tau$ is a $\tau$-by-$\tau$ matrix each of whose elements equals $1/\tau$. Suppose the eigenvalues and eigenvectors can be truncated according to the magnitudes of the eigenvalues, which results in $\bar{G} \approx \bar{B} \bar{S} \bar{B}^*$, where $p \ (\le \tau)$ eigenvalues are retained. Denote the $j$-th column of $\bar{B}$ by $\beta_j$ and let $\bar{\phi}(x_i) = \phi(x_i) - \phi_c$, where $\phi_c = \frac{1}{\tau} \sum_{j=0}^{\tau-1} \phi(x_j)$. A principal orthogonal direction in the feature space is then given by $\nu_j = \sum_{i=0}^{\tau-1} \alpha_{j,i} \bar{\phi}(x_i) = M_\tau H \alpha_j \ (j = 1, \ldots, p)$, where $\alpha_j = \bar{S}_{jj}^{-1/2} \beta_j$.
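Both the Gram-Schmidt procedure of Section 3 and the POD-based variant developed in this section touch the data only through the Gram matrices $G = M_\tau^* M_\tau$ and $M_\tau^* M_+$. The following NumPy sketch assumes a kernel function that maps two arrays of states (one state per row) to their Gram matrix; the names gaussian_kernel, kdmd_krylov, and kdmd_pod are hypothetical, and the example settings (sample size, width, $p$) are arbitrary. In kdmd_krylov, the Gram-Schmidt factor $R$ is obtained as the Cholesky factor of $G = R^* R$, so that $F = R^{-1} Q_\tau^* M_+ = G^{-1} M_\tau^* M_+$; in kdmd_pod, the kernel principal directions are truncated to $p$ before forming $\hat{F}$ of Eq. (11).

```python
import numpy as np


def gaussian_kernel(X, Z, width=1.0):
    # Gram matrix k(x, z) = exp(-||x - z||^2 / (2 width^2)); one state per row.
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * width**2))


def kdmd_krylov(X, kernel):
    # Section 3: X holds x_0, ..., x_tau as rows.
    G = kernel(X[:-1], X[:-1])             # G = M_tau^* M_tau
    A = kernel(X[:-1], X[1:])              # (M_tau^* M_+)_{ij} = k(x_{i-1}, x_j)
    R = np.linalg.cholesky(G).conj().T     # M_tau = Q_tau R implies G = R^* R
    # F = R^{-1} Q_tau^* M_+ = R^{-1} R^{-*} M_tau^* M_+
    F = np.linalg.solve(R, np.linalg.solve(R.conj().T, A))
    ritz_values, T_inv = np.linalg.eig(F)  # columns: coefficients of the Ritz vectors
    return ritz_values, T_inv


def kdmd_pod(X0, X1, kernel, p):
    # Section 4: rows of X0 and X1 are pairs (x, f(x)); keep p kernel principal directions.
    tau = X0.shape[0]
    G = kernel(X0, X0)
    A = kernel(X0, X1)
    H = np.eye(tau) - np.full((tau, tau), 1.0 / tau)
    s, B = np.linalg.eigh(H @ G @ H)       # centered Gram matrix, ascending eigenvalues
    keep = np.argsort(s)[::-1][:p]
    s, B = s[keep], B[:, keep]
    P = (H @ B) / np.sqrt(s)               # U = M_tau P with P = H Bbar Sbar^{-1/2}
    F_hat = P.conj().T @ A @ P             # Eq. (11)
    eigvals, T_hat_inv = np.linalg.eig(F_hat)
    modes = P @ T_hat_inv                  # coefficients of bar(varphi)_j on phi(x_0), ..., phi(x_{tau-1})
    return eigvals, modes


# Example: pairs from the toy system of Section 6 with a Gaussian kernel (illustrative settings).
rng = np.random.default_rng(0)
X0 = rng.uniform(-1.0, 1.0, size=(200, 2))
X1 = np.column_stack([0.9 * X0[:, 0], 0.5 * X0[:, 1] + (0.81 - 0.5) * X0[:, 0] ** 2])
eigvals, modes = kdmd_pod(X0, X1, gaussian_kernel, p=6)
```

The returned coefficient vectors express $\tilde{\varphi}_j$ and $\bar{\varphi}_j$ as combinations of $\phi(x_0), \ldots, \phi(x_{\tau-1})$, up to the per-column scaling chosen by np.linalg.eig. With the linear kernel $k(x, z) = x^\top z$ and data from a linear map, both functions return the eigenvalues of that map. The Cholesky step in kdmd_krylov fails or becomes inaccurate when $G$ is (nearly) singular, which is the situation that the truncation in kdmd_pod is meant to handle.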
Let $U = [\nu_1, \ldots, \nu_p] \ (= M_\tau H \bar{B} \bar{S}^{-1/2})$. Since $M_+ = \mathcal{K}_{\mathcal{H}} M_\tau$, the projection of $\mathcal{K}_{\mathcal{H}}$ onto the space spanned by the $\nu_j$ is given as

$$ \hat{F} := U^* \mathcal{K}_{\mathcal{H}} U = \bar{S}^{-1/2} \bar{B}^* H (M_\tau^* M_+) H \bar{B} \bar{S}^{-1/2}. \tag{11} $$

Note that the $(i, j)$-th element of the matrix $M_\tau^* M_+$ is given by $k(x_{i-1}, x_j)$. Then, if we let $\hat{F} = \hat{T}^{-1} \hat{\Lambda} \hat{T}$ be the eigendecomposition of $\hat{F}$, then

$$ \bar{\varphi}_j = U b_j = M_\tau H \bar{B} \bar{S}^{-1/2} b_j, $$

where $b_j$ is the $j$-th column of $\hat{T}^{-1}$, can be used as an alternative to the empirical Ritz vector $\tilde{\varphi}_j$. That is, we have the following theorem:

Theorem 2. Assume that $\varphi_j \in \Phi$, so that $\varphi_j(x) = \langle \phi(x), \kappa_j \rangle$ for some $\kappa_j \in \mathcal{H}$ and all $x \in \mathcal{M}$. If $\kappa_j$ lies in the subspace spanned by the columns of $U$, so that $\kappa_j = U a_j$ for some $a_j \in \mathbb{C}^p$, then $a_j$ is a left eigenvector of $\hat{F}$ with eigenvalue $\lambda_j$, and we also have

$$ \phi(x) = \sum_{j=1}^{p} \varphi_j(x) \bar{\varphi}_j. \tag{12} $$

Proof. Since $\mathcal{K}_{\mathcal{H}} \varphi_j = \lambda_j \varphi_j$, we have $\langle \phi(f(x)), \kappa_j \rangle = \lambda_j \langle \phi(x), \kappa_j \rangle$. Thus, from the assumption,

$$ \langle \phi(f(x)), U a_j \rangle = \lambda_j \langle \phi(x), U a_j \rangle. $$

By evaluating at $x_0, x_1, \ldots, x_{\tau-1}$ and then stacking into matrices, we have

$$ (U a_j)^* M_+ = \lambda_j (U a_j)^* M_\tau. $$

Multiplying by $H \bar{G}^{-1} H M_\tau^* U$ from the right-hand side gives

$$ a_j^* U^* M_+ H \bar{G}^{-1} H M_\tau^* U = \lambda_j a_j^* U^* M_\tau H \bar{G}^{-1} H M_\tau^* U = \lambda_j a_j^*. $$

Since $U^* M_+ H \bar{G}^{-1} H M_\tau^* U = U^* \mathcal{K}_{\mathcal{H}} U \ (= \hat{F})$, this means that $a_j$ is a left eigenvector of $\hat{F}$ with eigenvalue $\lambda_j$. Let $b_j$ be a (right) eigenvector of $\hat{F}$ with eigenvalue $\lambda_j$ and corresponding left eigenvector $a_j$. Assuming these have been normalized so that $a_i^* b_j = \delta_{ij}$, any vector $h \in \mathbb{C}^p$ can be written as $h = \sum_{j=1}^{p} (a_j^* h) b_j$. Applying this to $U^* \phi(x)$ gives

$$ U^* \phi(x) = \sum_{j=1}^{p} (a_j^* U^* \phi(x)) b_j = \sum_{j=1}^{p} \varphi_j(x) b_j. $$

Since $b_j = (U^* U) b_j = U^* \bar{\varphi}_j$, this proves Eq. (12).

This theorem establishes the connection between the eigenvalues and eigenvectors found by the above procedure and the Koopman eigenvalues and eigenfunctions. The assumptions of the theorem mean that the data are sufficiently rich, so that a set of kernel principal components gives a good approximation of the representation with the Koopman eigenfunctions. As in the case of Eq. (5), by iteratively applying Eq. (3), we obtain

$$ \phi(x_t) = \sum_{j=1}^{p} \lambda_j^t \varphi_j(x_0) \bar{\varphi}_j. \tag{13} $$

The procedure for the robustified variant of DMD is summarized as follows.¹
(1) Define $M_\tau$ and calculate the centered Gram matrix $\bar{G} = H M_\tau^* M_\tau H$.
(2) Calculate the eigendecomposition $\bar{G} \approx \bar{B} \bar{S} \bar{B}^*$, which gives the kernel principal directions $U$.
(3) Calculate $\hat{F}$ as in Eq. (11) and its eigendecomposition $\hat{F} = \hat{T}^{-1} \hat{\Lambda} \hat{T}$, where each diagonal element of $\hat{\Lambda}$ gives $\lambda_j$.
(4) Define $\bar{\varphi}_j$ to be the columns of $M_\tau H \bar{B} \bar{S}^{-1/2} \hat{T}^{-1}$.
Unlike the procedure described in Section 3, the above procedure can truncate the eigenvectors corresponding to small singular values. As with DMD, this step becomes beneficial in practice when the Gram matrix $G$ is rank-deficient or nearly so.
Remark: Although we assumed that the data form a consecutive sequence in order to demonstrate the correctness of the algorithm, as is evident from the above steps, the estimation procedure itself does not necessarily require a sequence but rather a collection of pairs of consecutive observables $\{(x_1^{(i)}, x_2^{(i)})\}_{i=1}^{\tau}$, where each pair is assumed to satisfy $x_2^{(i)} = f(x_1^{(i)})$, with the appropriate definitions of $M_\tau$ and $M_+$.

5 Related Works

Spectral analysis (also referred to as the decomposition technique) for dynamical systems is a popular approach aimed at extracting information concerning (low-dimensional) dynamics from data.
Common techniques include global eigenmodes for linearized dynamics (see, e.g., [3]), discrete Fourier transforms, POD for nonlinear dynamics [30, 12], and balancing modes for linear systems [24], as well as multiple variants of these techniques, such as those using shift modes [22] in conjunction with POD modes. In particular, POD, which is in principle equivalent to principal component analysis, has been extensively applied to the analysis of physical phenomena [5, 22], even though it suffers from numerous known issues, including the possibility that the principal directions in a set of data do not correspond to the dynamically important ones.
DMD has recently attracted considerable attention in physics, such as fluid mechanics [2, 10, 21, 25, 27], and in engineering fields [4, 6, 32]. Unlike POD (and its variants), DMD yields direct information about the dynamics, such as the growth rates and frequencies associated with each mode, which can be obtained from the magnitude and phase of the corresponding eigenvalue of the Koopman operator. However, the original DMD has several numerical disadvantages related to the accuracy of the approximate expressions of the Koopman eigenfunctions from data. Therefore, several variants of DMD have been proposed to rectify this, including exact DMD [33] and optimized DMD [8]. Jovanović et al. proposed sparsity-promoting DMD [13], which provides a framework for approximating the Koopman eigenfunctions with fewer bases. Williams et al.
proposed extended DMD [35], which works on predetermined basis functions instead of the monomials of observables. Although in extended DMD the Koopman mode is defined as the eigenvector of the corresponding operator on the coefficients of the basis functions, the resulting procedure is similar to the robust version of our algorithm.

¹ The Matlab code is available at http://en.44nobu.net/codes/kdmd.zip

Figure 1: Estimated eigenvalues with the data from the toy system (left) and the Hénon map (right).

Figure 2: Examples of the true versus (1-step) predicted values via the proposed method for the toy system (left) and the Hénon map (right).

In system control, subspace identification [23, 14], also called the eigensystem realization method, has been a popular approach to modeling dynamical systems. This method basically identifies low-dimensional (hidden) states as canonical vectors determined by canonical correlation analysis, and estimates the parameters of the governing system using the state estimates. This type of method is known as a spectral method for dynamical systems in the machine learning community and has recently been applied to several types of systems, such as variants of hidden Markov models [31, 19], nonlinear dynamical systems [15], and predictive state representations [17]. The relation between DMD and other methods, particularly the eigensystem realization method, is an interesting open problem.
This is briefly mentioned in [33], but it requires further investigation in future studies.

6 Empirical Example

To illustrate how our algorithm works, we consider two examples: a toy nonlinear system given by $x_{t+1} = 0.9 x_t$, $y_{t+1} = 0.5 y_t + (0.9^2 - 0.5) x_t^2$, and one of the well-known chaotic maps, the Hénon map ($x_{t+1} = 1 - a x_t^2 + y_t$, $y_{t+1} = b x_t$), which was originally presented by Hénon as a simplified model of the Poincaré section of the Lorenz attractor. For the toy system, the two eigenvalues are 0.9 and 0.5, with the corresponding eigenfunctions $\varphi_{0.9} = x_t$ and $\varphi_{0.5} = y_t - x_t^2$, respectively. For the Hénon map, we set the parameters to $a = 1.4$ and $b = 0.3$. It is known that this map has two equilibrium points, $(-1.13135, -0.339406)$ and $(0.631354, 0.189406)$, whose corresponding eigenvalues are 2.25982 and −1.09203, and −2.92374 and −0.844054, respectively.
We generated samples according to these systems with several initial conditions and then applied the presented procedure to estimate the Koopman modes. We used the polynomial kernel of degree three for the toy system and the Gaussian kernel with width 1 for the Hénon map. The graphs in Fig. 1 show the estimated eigenvalues for the two cases. As seen from the left graph, the eigenvalues for the toy system were precisely estimated. Meanwhile, the right graph shows that some of the eigenvalues associated with the equilibrium points seem to be approximately estimated by the algorithm.

7 Applications

The above algorithm provides a decomposition of the dynamics into a finite number of modes and, therefore, can be considered a feature extraction procedure for a nonlinear dynamical system. This would be useful for directly understanding the dominant characteristics of the dynamics, as done in scientific fields with DMD [2, 10, 21, 25, 27].
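Returning briefly to the examples of Section 6, the analytic facts quoted there can be checked numerically. The sketch below (function names hypothetical) verifies at random points that $x$ and $y - x^2$ are Koopman eigenfunctions of the toy system with eigenvalues 0.9 and 0.5, and recomputes the two Hénon equilibria together with the eigenvalues of the Jacobian $J$ of the map at each of them. Running it shows that the eigenvalues quoted in Section 6 coincide with those of $J - I$; the eigenvalues of $J$ itself are larger by one (about 3.25982 and −0.09203 at the first equilibrium, and 0.155946 and −1.92374 at the second).

```python
import numpy as np


def toy_map(x, y):
    # Toy system of Section 6.
    return 0.9 * x, 0.5 * y + (0.9**2 - 0.5) * x**2


def henon_equilibria(a=1.4, b=0.3):
    # Fixed points of (x, y) -> (1 - a x^2 + y, b x): a x^2 + (1 - b) x - 1 = 0, y = b x.
    xs = np.sort(np.roots([a, 1.0 - b, -1.0]).real)
    return [(x, b * x) for x in xs]


def henon_jacobian(x, a=1.4, b=0.3):
    # Jacobian of the Henon map; it does not depend on y.
    return np.array([[-2.0 * a * x, 1.0], [b, 0.0]])


# Koopman eigenfunction check: phi(f(x, y)) = lambda * phi(x, y) at random points.
rng = np.random.default_rng(0)
x, y = rng.normal(size=(2, 1000))
fx, fy = toy_map(x, y)
assert np.allclose(fx, 0.9 * x)                   # phi_0.9(x, y) = x
assert np.allclose(fy - fx**2, 0.5 * (y - x**2))  # phi_0.5(x, y) = y - x^2

for xe, ye in henon_equilibria():
    eig_J = np.sort(np.linalg.eigvals(henon_jacobian(xe)).real)
    print(f'({xe:.6f}, {ye:.6f})  eig(J) = {eig_J}  eig(J - I) = {eig_J - 1.0}')
```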
Beyond such direct inspection, we consider here several applications that use the features extracted by the proposed analysis, namely prediction, sequential change-point detection, and the recognition of dynamic patterns, together with some empirical examples.

[Figures 1 and 2 (plots): eigenvalue versus index, and real versus imaginary parts of the estimated eigenvalues (kernel DMD, DMD, true values, equilibria); true versus predicted x1 over time.]

Figure 3: MDS embedding with the distance matrix from kernel principal angles between subspaces of the estimated Koopman eigenfunctions for locomotion data. Each point is colored according to its assigned motion (jump, walk, run, and varied).

Figure 4: Sample sequence (top) and change scores by our method (green) and the kernel change detection method (blue).

Prediction via Preimage: As is known in physics (nonlinear science), long-term prediction of a nonlinear dynamical system is, in principle, impossible if at least one of its Lyapunov exponents is positive, which is typically the case of interest. This is true even if the dimension of the system is low, because the uncertainty involved in the evolution of the system increases exponentially over time. However, it may be possible to predict an observable in the near future (i.e., short-term prediction) if we can formulate a precise predictive model. Therefore, we consider here a prediction based on the estimated Koopman spectra as in Eq. (13). Since Eq. (13) is represented as a linear combination of $\phi(x_i)$ ($i = 0, \ldots, \tau - 1$), a prediction can be obtained by considering the pre-image of the predicted observables in the feature space. Although any method for finding a pre-image of a vector in the feature space can be used for this purpose, here we describe an approach based on an idea similar to multidimensional scaling (MDS), as described in [18], where a pre-image is recovered so as to preserve the distances between it and other data points in the input space as well as in the feature space. The basic steps are (i) find the $n$ nearest neighbors of a new point $\hat{\phi}(x_{\tau+l})$ in the feature space, (ii) calculate the corresponding distance between the pre-image $\hat{x}_{\tau+l}$ and each data point $x_t$ based on the relation between feature-space and input-space distances, and (iii) calculate the pre-image so as to preserve the input-space distances. For step (i), we need the distance between the estimated feature and each data point in the feature space, which is calculated as

$$ \|\hat{\phi}(x_{\tau+l}) - \phi(x_t)\|^2 = \|\hat{\phi}(x_{\tau+l})\|^2 + \|\phi(x_t)\|^2 - 2 \hat{\phi}(x_{\tau+l})^* \phi(x_t) = c^* (M_\tau^* M_\tau) c + k(x_t, x_t) - 2 c^* (M_\tau^* \phi(x_t)), $$

where $c$ is from Eq. (10). Note that the first and third terms in the above equation can be calculated using the values in the Gram matrix for the data. Once we obtain the $n$ nearest neighbors based on the feature-space distances, we can construct the corresponding local coordinates by calculating a set of orthogonal bases (for example, via the singular value decomposition of the data matrix of the neighbors) based on the distances in the input space, which are analytically obtained from the feature-space distances [18].
The graphs in Fig. 2 show empirical examples of the true versus predicted values, obtained as described above, for the toy nonlinear system and the Hénon map. The setups for the data generation, the kernels, etc.
are the same as in the previous section.
Embedding and Recognition of Dynamics: A direct but important application of the presented analysis is the embedding and recognition of dynamics with the extracted features. As with (kernel) PCA, a set of Koopman eigenfunctions estimated via the analysis can be used as the bases of a low-dimensional subspace that represents the dynamics. For example, the recognition of dynamics based on this representation can be performed as follows. Suppose we are given a collection of $m$ data sequences $\{x_t\}_{t=0}^{\tau_i}$ ($i = 1, \ldots, m$), each of which is generated from some known dynamics in $C$ (e.g., walking, running, jumping). Then, a set of estimated Koopman eigenfunctions for each known dynamics $c$, which we denote by $A_c = M_\tau w_c$ for the corresponding complex vector $w_c$, can be regarded as the bases of a low-dimensional embedding of the sequences. Hence, if we let $A$ be a set of the estimated Koopman eigenfunctions for a new sequence, its category of dynamics can be estimated as

$$ \hat{i} = \arg\min_{c \in C} \operatorname{dist}(A, A_c), $$

where $\operatorname{dist}(A, A_c)$ is a distance between the two subspaces spanned by $A$ and $A_c$. For example, such a distance can be given via the kernel principal angles between two subspaces in the feature space [36]. Fig. 3 shows an empirical example of this application using the locomotion data from the CMU Graphics Lab Motion Capture Database.² We used the RBF Gaussian kernel, with the kernel width set to the median of the distances in the data matrix. The figure shows an embedding of the sequences via MDS with the distance matrix, which was calculated with kernel principal angles [36] between subspaces spanned by the Koopman eigenfunctions.
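Both applications described above, the pre-image step of Prediction via Preimage and the subspace distance used for recognition, reduce to Gram-matrix arithmetic. The sketch below assumes a Gaussian kernel $k(x, z) = \exp(-\|x - z\|^2 / (2\sigma^2))$ and shows (i) the feature-space distances between $\hat{\phi} = M_\tau c$ and each $\phi(x_t)$, (ii) the closed-form conversion from feature-space to input-space distances that holds for this kernel, (iii) a linearized least-squares multilateration used here as a simple stand-in for the local-coordinate construction of [18], and (iv) a kernel principal-angle distance between two subspaces spanned by estimated eigenfunctions. All function names are hypothetical, and taking the sum of squared sines of the principal angles as $\operatorname{dist}(A, A_c)$ is an assumption; the text does not specify which function of the angles is used.

```python
import numpy as np


def gaussian_gram(X, Z, width=1.0):
    # Gram matrix of the Gaussian kernel; one state per row.
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * width**2))


def feature_sq_distances(c, G, k_diag):
    # ||M_tau c - phi(x_t)||^2 for every data point x_t in M_tau:
    # c^* G c + k(x_t, x_t) - 2 Re(c^* M_tau^* phi(x_t)); M_tau^* phi(x_t) is column t of G.
    quad = np.real(np.conj(c) @ G @ c)
    cross = np.real(np.conj(c) @ G)
    return quad + k_diag - 2.0 * cross


def input_distances_gaussian(d2_feat, width=1.0):
    # Gaussian kernel: ||phi(x) - phi(z)||^2 = 2 - 2 exp(-||x - z||^2 / (2 width^2)).
    ratio = np.clip(1.0 - d2_feat / 2.0, 1e-12, 1.0)
    return np.sqrt(-2.0 * width**2 * np.log(ratio))


def multilaterate(neighbors, dists):
    # Least-squares point whose distances to the neighbor rows match dists,
    # linearized by subtracting the mean of the equations ||x - x_i||^2 = d_i^2.
    sq_norms = np.sum(neighbors**2, axis=1)
    lhs = -2.0 * (neighbors - neighbors.mean(axis=0))
    rhs = (dists**2 - np.mean(dists**2)) - (sq_norms - sq_norms.mean())
    x, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return x


def kernel_subspace_distance(K_aa, K_ab, K_bb, W_a, W_b):
    # Subspaces span(M_a W_a) and span(M_b W_b) in feature space, given
    # K_aa = M_a^* M_a, K_ab = M_a^* M_b and K_bb = M_b^* M_b.
    def inv_sqrt(S):
        s, V = np.linalg.eigh(S)
        return (V / np.sqrt(s)) @ V.conj().T
    Ga = W_a.conj().T @ K_aa @ W_a
    Gb = W_b.conj().T @ K_bb @ W_b
    C = inv_sqrt(Ga) @ W_a.conj().T @ K_ab @ W_b @ inv_sqrt(Gb)
    cosines = np.clip(np.linalg.svd(C, compute_uv=False), 0.0, 1.0)
    return float(np.sum(1.0 - cosines**2))  # sum of squared sines of the principal angles
```

For recognition, the label is then the class $c$ that minimizes kernel_subspace_distance between the new sequence and the stored $A_c$; for prediction, the multilateration step would be applied to the $n$ nearest neighbors selected in step (i).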
Each point is colored according to its motion\n(jump, walk, run, and varied).\n\ndist(A;Ac);\n\n^i = argmin\n\nc2C\n\n2Available at http://mocap.cs.cmu.edu.\n\n7\n\n-1-0.500.511.5-0.400.40.81.2jumpwalkrunvariedwalk, then turnrun, then stoprun, then turnslow walk, then stop0500100015002000-20-1001020304050x1x2x310kDMD101-SVM\fSequential Change-Point Detection: Another possible application is the sequential detection of\nchange-points in a nonlinear dynamical system based on the prediction via the presented analy-\nsis. Here, we give a criterion for this problem based on the so-called cumulative-sum (CUSUM)\nof likelihood-ratios (see, for example, [9]). Let x0; x1; x2; : : : be a sequence of random vec-\ntors distributed according to some distribution ph (h = 0; 1). Then, change-point detection is\nde\ufb01ned as the sequential decision between hypotheses; H0 : p(xi) = p0(xi) for i = 1; : : : ; T ,\nand H1 : p(xi) = p0(xi) for i = 1; : : : ; (cid:28) and p(xi) = p1(xi) for i = (cid:28) + 1; : : : ; T , where\n1 (cid:20) (cid:28) (cid:20) T ((cid:20) 1). In CUSUM, the stopping rule is given as\n\n\u2211\n\n}\nt=(cid:28) +1 log (p1(xt)=p0(xt)) (cid:21) c\n\nT\n\nT : max1(cid:20)(cid:28) <T\n\n= inf\n\nT\n(cid:3) is the stopping time). Although the Koopman operator is, in general, de\ufb01ned for\nwhere c > 0 (T\na deterministic system, it is known to be extended to a stochastic system xt+1 = f (xt; vt), where\nvt is a stochastic disturbance [20]. In that case, the operator works on the expectation. Hence, let us\nde\ufb01ne the distribution of xt as a nonparametric exponential family [7], given by\n\np(xt) = exp (\u27e8(cid:18)((cid:1)); (xt)\u27e9H (cid:0) g((cid:18))) = exp (\u27e8\u03d5 \u25e6 f (xt(cid:0)1); \u03d5(xt)\u27e9H (cid:0) g(\u03d5 \u25e6 f (xt(cid:0)1))) ;\n\n;\n\n{\n\n(cid:3)\n\nwhere g is the log-partition function. 
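The CUSUM stopping rule above can be evaluated in a single pass over the data. Write l_t = log(p1(x_t)/p0(x_t)) and let W_T denote the inner maximum over 1 <= tau < T. Then W_2 = l_2 and W_T = max(W_{T-1}, 0) + l_T, so T* is the first T with W_T >= c. The Python code below is a minimal sketch. It assumes the per-sample log-likelihood ratios are already available; how p0 and p1 are estimated is not part of it. The function name cusum_stopping_time is hypothetical and not taken from the paper.

```python
def cusum_stopping_time(llr, threshold):
    # Hypothetical helper for illustration, not the paper's implementation.
    # llr[k] holds l_t = log(p1(x_t)/p0(x_t)) for t = k + 1 (samples x_1, x_2, ...).
    # Returns the first T (1-based) with
    #     max_{1 <= tau < T} sum_{t=tau+1}^{T} l_t >= threshold,
    # i.e. the stopping time T*, or None if the threshold is never reached.
    if threshold <= 0:
        raise ValueError('the threshold c must be positive')
    w = None  # W_T: the inner maximum for the current T
    for T in range(2, len(llr) + 1):
        l_T = llr[T - 1]
        # W_2 = l_2 (only tau = 1 is admissible); afterwards
        # W_T = max(W_{T-1}, 0) + l_T, where the 0 accounts for tau = T - 1.
        w = l_T if w is None else max(w, 0.0) + l_T
        if w >= threshold:
            return T
    return None


# Usage: Gaussian mean shift from N(0, 1) to N(1, 1) after the fifth sample,
# for which log(p1(x)/p0(x)) = x - 0.5.
xs = [0.0] * 5 + [1.0] * 10
llr = [x - 0.5 for x in xs]
print(cusum_stopping_time(llr, threshold=2.0))  # -> 9
```

The recursion avoids re-summing every candidate change time tau at each step, so each new sample costs constant time. In the usage example the change occurs after x_5, and with c = 2 the rule stops at T* = 9.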
Then, the log-likelihood ratio score is given as\n\n$$\\log \\Lambda_\\tau(x_{1:T}) := \\sum_{i=\\tau+1}^{T} \\log\\left(p_1(x_i)/p_0(x_i)\\right) \\propto -\\sum_{i=\\tau+1}^{T} \\left(\\sum_{j=1}^{\\tau} \\alpha_i^{(0)} k(x_j, x_i) - \\sum_{j=1}^{\\tau} \\alpha_i^{(1)} k(x_j, x_i)\\right),$$\n\nwhere $\\alpha_i^{(0)}$ and $\\alpha_i^{(1)}$ are the coefficients obtained by the proposed algorithm with the data for $i = 1, \\dots, \\tau$ and $i = \\tau + 1, \\dots, T$, respectively. Here, since the variation of the second term is much smaller than that of the first (cf. [7]), the decision rule $\\log \\Lambda^* \\ge c$ can be simplified by ignoring the second term. As a result, we obtain the following decision rule with some critical value $\\tilde{c} \\le 0$:\n\n$$-\\log \\Lambda_\\tau(x_{1:T}) \\approx \\sum_{i=\\tau+1}^{T} \\sum_{j=1}^{\\tau} \\alpha_i^{(0)} k(x_j, x_i) \\le \\tilde{c}.$$\n\nA change-point is detected if the above rule is satisfied. Otherwise, the coefficients are updated using new samples and the procedure is repeated until a change-point is detected. Fig. 4 shows an empirical example of the (normalized) change score calculated with the proposed algorithm, compared with that of the kernel change detection method (cf. [7]), for the data shown, which were generated from the Lorenz map. We used the RBF Gaussian kernel in the same way as above. 
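The simplified rule only requires kernel evaluations between the reference window x_1, ..., x_tau and the new window x_{tau+1}, ..., x_T. The Python code below is a hedged sketch. It evaluates sum_{i=tau+1}^{T} sum_{j=1}^{tau} alpha_i^(0) k(x_j, x_i) literally and compares the result with a critical value c_tilde <= 0. The sketch makes the following assumptions:

- The coefficients alpha_i^(0) are supplied from outside. The paper obtains them with its decomposition algorithm, which is not reproduced here.
- Following the index placement of the displayed rule, there is one coefficient per new sample i.
- The Gaussian kernel is parametrized as exp(-||x - y||^2 / (2 sigma^2)).
- The names gaussian_kernel, change_score, and is_change_point are hypothetical.

```python
import math


def gaussian_kernel(x, y, sigma):
    # Assumed RBF parametrization: k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def change_score(xs, tau, alpha0, kernel):
    # Literal evaluation of sum_{i=tau+1}^{T} sum_{j=1}^{tau} alpha_i^(0) k(x_j, x_i).
    # xs[t - 1] is the sample x_t (t = 1, ..., T), and alpha0[i - tau - 1] is the
    # coefficient alpha_i^(0) for i = tau + 1, ..., T (supplied by the caller).
    if not 1 <= tau < len(xs):
        raise ValueError('tau must satisfy 1 <= tau < T')
    if len(alpha0) != len(xs) - tau:
        raise ValueError('need one coefficient per new sample')
    reference = xs[:tau]  # x_1, ..., x_tau
    new = xs[tau:]        # x_{tau+1}, ..., x_T
    return sum(a * sum(kernel(x_j, x_i) for x_j in reference)
               for a, x_i in zip(alpha0, new))


def is_change_point(score, c_tilde):
    # Simplified decision rule: report a change when the score is <= c_tilde (<= 0).
    return score <= c_tilde


# Usage with a linear kernel, chosen only so the numbers can be checked by hand.
def linear(x, y):
    return sum(a * b for a, b in zip(x, y))


xs = [(1.0,), (2.0,), (3.0,), (-1.0,)]      # x_1, ..., x_4
score = change_score(xs, 2, [1.0, 0.5], linear)
print(score)                                 # -> 7.5
print(is_change_point(score, c_tilde=-1.0))  # -> False
```

In a sequential setting, T grows with each new sample and the coefficients are refreshed. The score is then recomputed until the rule fires.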
In the simulation, the parameter of the map changes at 800 and 1200, whereas the range of the data values changes dramatically in other regions (where the score of the comparative method changes correspondingly).\n\n8 Conclusions\n\nWe presented a spectral analysis method with the Koopman operator in RKHSs and developed algorithms to perform the analysis using a finite-length data sequence from a nonlinear dynamical system, which essentially reduce to the calculation of a set of orthogonal bases of the Krylov matrix in RKHSs and the eigendecomposition of the projection of the Koopman operator onto the subspace spanned by the bases. We further considered applications using the Koopman spectra estimated with the proposed analysis, which were empirically illustrated using synthetic and real-world data.\n\nAcknowledgments\n\nThis work was supported by JSPS KAKENHI Grant Number JP16H01548.\n\nReferences\n[1] W.E. Arnoldi. The principle of minimized iterations in the solution of the matrix eigenvalue problem. Quarterly of Applied Mathematics, 9:17\u201329, 1951.\n[2] S. Bagheri. Koopman-mode decomposition of the cylinder wake. Journal of Fluid Mechanics, 726:596\u2013623, 2013.\n[3] S. Bagheri, P. Schlatter, P.J. Schmid, and D.S. Henningson. Global stability of a jet in cross flow. Journal of Fluid Mechanics, 624:33\u201344, 2009.\n[4] E. Berger, M. Satsuma, D. Vogt, B. Jung, and H. Ben Amor. Dynamic mode decomposition for perturbation estimation in human robot interaction. In Proc. of the 23rd IEEE Int\u2019l Symp. on Robot and Human Interactive Communication, pages 593\u2013600, 2014.\n[5] J.-P. Bonnet, C.R. Cole, J. Delville, M.N. Glauser, and L.S. Ukeiley. Stochastic estimation and proper orthogonal decomposition: Complementary techniques for identifying structure. Experiments in Fluids, 17:307\u2013314, 1994.\n[6] B. Brunton, L. Johnson, J. Ojemann, and J. Nathan Kutz. 
Extracting spatial-temporal coherent patterns in large-scale neural recordings using dynamic mode decomposition. Journal of Neuroscience Methods, 258:1\u201315, 2016.\n[7] S. Canu and A. Smola. Kernel methods and the exponential family. Neurocomputing, 69:714\u2013720, 2006.\n[8] K.K. Chen, J.H. Tu, and C.W. Rowley. Variants of dynamic mode decomposition: Boundary condition, Koopman, and Fourier analyses. Journal of Nonlinear Science, 22(6):887\u2013915, 2012.\n[9] M. Cs\u00f6rg\u00f6 and L. Horv\u00e1th. Limit Theorems in Change-Point Analysis. Wiley, 1988.\n[10] D. Duke, D. Honnery, and J. Soria. Experimental investigation of nonlinear instabilities in annular liquid sheets. Journal of Fluid Mechanics, 691:594\u2013604, 2012.\n[11] Z. Ghahramani and S.T. Roweis. Learning nonlinear dynamical systems using an EM algorithm. In Proc. of the 1998 Conf. on Advances in Neural Information Processing Systems II, pages 431\u2013437.\n[12] P. Holmes, J.L. Lumley, and G. Berkooz. Turbulence, Coherent Structures, Dynamical Systems and Symmetry. Cambridge University Press, 1996.\n[13] M.R. Jovanovi\u0107, P.J. Schmid, and J.W. Nichols. Sparsity-promoting dynamic mode decomposition. Physics of Fluids, 26:024103, 2014.\n[14] T. Katayama. Subspace Methods for System Identification. Springer, 2005.\n[15] Y. Kawahara, T. Yairi, and K. Machida. A kernel subspace method by stochastic realization for learning nonlinear dynamical systems. In Advances in Neural Information Processing Systems 19, pages 665\u2013672, 2007.\n[16] B.O. Koopman. Hamiltonian systems and transformation in Hilbert space. Proc. of the National Academy of Sciences of the United States of America, 17(5):315\u2013318, 1931.\n[17] A. Kulesza, N. Jiang, and S. Singh. Spectral learning of predictive state representations with insufficient statistics. In Proc. of the 29th AAAI Conf. 
on Artificial Intelligence (AAAI\u201915), pages 2715\u20132721.\n[18] J.T. Kwok and I.W. Tsang. The pre-image problem in kernel methods. IEEE Trans. on Neural Networks, 15(6):1517\u20131525, 2004.\n[19] I. Melnyk and A. Banerjee. A spectral algorithm for inference in hidden semi-Markov models. In Proc. of the 18th Int\u2019l Conf. on Artificial Intelligence and Statistics (AISTATS\u201915), pages 690\u2013698, 2015.\n[20] I. Mezi\u0107. Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dynamics, 41:309\u2013325, 2005.\n[21] T.W. Muld, G. Efraimsson, and D.S. Henningson. Flow structures around a high-speed train extracted using proper orthogonal decomposition and dynamic mode decomposition. Computers and Fluids, 57:87\u201397, 2012.\n[22] B.R. Noack, K. Afanasiev, M. Morzynski, G. Tadmor, and F. Thiele. A hierarchy of low-dimensional models for the transient and post-transient cylinder wake. Journal of Fluid Mechanics, 497:335\u2013363, 2003.\n[23] P. Van Overschee and B. De Moor. Subspace Identification for Linear Systems: Theory, Implementation, Applications. Kluwer Academic Publishers, 1996.\n[24] C.W. Rowley. Model reduction for fluids using balanced proper orthogonal decomposition. International Journal of Bifurcation and Chaos, 15(3):997\u20131013, 2005.\n[25] C.W. Rowley, I. Mezi\u0107, S. Bagheri, P. Schlatter, and D.S. Henningson. Spectral analysis of nonlinear flows. Journal of Fluid Mechanics, 641:115\u2013127, 2009.\n[26] P.J. Schmid. Dynamic mode decomposition of numerical and experimental data. Journal of Fluid Mechanics, 656:5\u201328, 2010.\n[27] P.J. Schmid and J. Sesterhenn. Dynamic mode decomposition of turbulent cavity flows for self-sustained oscillations. Int\u2019l J. of Heat and Fluid Flow, 32(6):1098\u20131110, 2010.\n[28] B. Sch\u00f6lkopf, A. Smola, and K.-R. M\u00fcller. 
Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299\u20131319, 1998.\n[29] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.\n[30] L. Sirovich. Turbulence and the dynamics of coherent structures. Quarterly of Applied Mathematics, 45:561\u2013590, 1987.\n[31] L. Song, B. Boots, S.M. Siddiqi, G. Gordon, and A. Smola. Hilbert space embeddings of hidden Markov models. In Proc. of the 27th Int\u2019l Conf. on Machine Learning (ICML\u201910), pages 991\u2013998.\n[32] Y. Susuki and I. Mezi\u0107. Nonlinear Koopman modes and power system stability assessment without models. IEEE Trans. on Power Systems, 29:899\u2013907, 2013.\n[33] J.H. Tu, C.W. Rowley, D.M. Luchtenburg, S.L. Brunton, and J.N. Kutz. On dynamic mode decomposition: Theory and applications. Journal of Computational Dynamics, 1(2):391\u2013421, 2014.\n[34] J.M. Wang, D.J. Fleet, and A. Hertzmann. Gaussian process dynamical models. In Advances in Neural Information Processing Systems 18, pages 1441\u20131448, 2006.\n[35] M.O. Williams, I.G. Kevrekidis, and C.W. Rowley. A data-driven approximation of the Koopman operator: Extending dynamic mode decomposition. Journal of Nonlinear Science, 25:1307\u20131346, 2015.\n[36] L. Wolf and A. Shashua. Learning over sets using kernel principal angles. Journal of Machine Learning Research, 4:913\u2013931, 2003.", "award": [], "sourceid": 564, "authors": [{"given_name": "Yoshinobu", "family_name": "Kawahara", "institution": "Osaka University"}]}