{"title": "Convergence-Rate-Matching Discretization of Accelerated Optimization Flows Through Opportunistic State-Triggered Control", "book": "Advances in Neural Information Processing Systems", "page_first": 9770, "page_last": 9779, "abstract": "A recent body of exciting work seeks to shed light on the behavior of accelerated methods in optimization via high-resolution differential equations. These differential equations are continuous counterparts of the discrete-time optimization algorithms, and their convergence properties can be characterized using the powerful tools provided by classical Lyapunov stability analysis. An outstanding question of pivotal importance is how to discretize these continuous flows while maintaining their convergence rates. This paper provides a novel approach through the idea of opportunistic state-triggered control. We take advantage of the Lyapunov functions employed to characterize the rate of convergence of high-resolution differential equations to design variable-stepsize forward-Euler discretizations that preserve the Lyapunov decay of the original dynamics. The philosophy of our approach is not limited to forward-Euler discretizations and may be combined with other integration schemes.", "full_text": "Convergence-Rate-Matching Discretization of\n\nAccelerated Optimization Flows Through\nOpportunistic State-Triggered Control\n\nMechanical and Aerospace Engineering\n\nMechanical and Aerospace Engineering\n\nJorge Cort\u00e9s\n\nUC San Diego\n\nSan Diego, CA 9500\ncortes@ucsd.edu\n\nMiguel Vaquero\n\nUC San Diego\n\nSan Diego, CA 9500\n\nmivaquerovallina@ucsd.edu\n\nAbstract\n\nA recent body of exciting work seeks to shed light on the behavior of acceler-\nated methods in optimization via high-resolution differential equations. 
These differential equations are continuous counterparts of the discrete-time optimization algorithms, and their convergence properties can be characterized using the powerful tools provided by classical Lyapunov stability analysis. An outstanding question of pivotal importance is how to discretize these continuous flows while maintaining their convergence rates. This paper provides a novel approach through the idea of opportunistic state-triggered control. We take advantage of the Lyapunov functions employed to characterize the rate of convergence of high-resolution differential equations to design variable-stepsize forward-Euler discretizations that preserve the Lyapunov decay of the original dynamics. The philosophy of our approach is not limited to forward-Euler discretizations and may be combined with other integration schemes.\n\n1 Introduction\n\nThis paper builds on the current research activity that seeks to characterize the convergence properties of dynamical systems that are continuous-time versions of accelerated algorithms in optimization. This body of work sits at the intersection of various disciplines, most notably nonlinear systems and optimization, and has brought to the understanding of acceleration properties a wealth of powerful techniques from Lyapunov stability analysis, calculus of variations, and geometric methods. This paper takes another step in this direction by further advancing the synergy between stability analysis and the study of optimization algorithms. Here, we propose to employ an opportunistic state-triggered approach to discretize continuous flows in a way that respects the Lyapunov function decay that explains their accelerated convergence rates.\n\nSummary of Results\n\nThe contribution of this paper is the design of a variable-stepsize forward-Euler discretization that preserves the Lyapunov decay of high-resolution differential equations.
A main novelty of our technical approach is to employ, in the context of the discretization of state-of-the-art optimization flows, ideas from opportunistic state-triggered control to develop real-time implementations of closed-loop dynamical systems. We build on the Lyapunov functions employed to characterize the rate of convergence of high-resolution differential equations to identify triggers that help us determine the stepsize of the discretization as a function of the current iterate. By design, these triggers ensure that the discretization retains the decay rate of the Lyapunov function. Since the evaluation of the Lyapunov function relies on knowledge of the problem optimizer, we rely on well-known bounds available for strongly convex functions to synthesize triggers that do not require such knowledge. Various simulations show the superior performance of the proposed method in comparison with recently proposed constant-stepsize discretizations. The flexibility of the proposed framework provides a promising path towards the understanding of the acceleration phenomenon and the design of new adaptive optimization algorithms.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\nRelated Work\n\nState-Triggered Control. The basic idea of opportunistic state-triggered control, see [13, 20] and references therein, is to abandon the paradigm of continuous or periodic sampling/control in exchange for deliberate, opportunistic aperiodic sampling/control to improve efficiency in the use of resources while maintaining stability. Opportunistic state-triggered control can be roughly divided into event-triggered and self-triggered designs.
In event-triggered control, one continuously monitors certain conditions whose violation triggers a desirable action, whereas in self-triggered control the aim is to predict, with the information available at the last triggering time, when the next triggering condition will take place. Beyond stability, many triggered designs also pay attention to guaranteeing a desired performance by, for instance, making sure that the system enjoys a certain convergence rate. This is accomplished through careful analysis of the evolution of a Lyapunov function. The works [6, 12], based on the derivative-based approach, ensure the closed-loop system\u2019s stability by monitoring the time derivative of the Lyapunov function. Other works [7, 26] resort to Lyapunov sampling-type conditions, where triggers are stated in terms of the Lyapunov function by monitoring its decay. Recent work [6, 21] combines the strengths of both types of design. Our technical treatment here follows the derivative-based approach, albeit we believe that other types of design could also be combined with our results here.\n\nAccelerated Methods in Optimization. Steepest gradient descent is a keystone of first-order optimization methods, but can be very slow. The work [22] introduced the so-called heavy-ball method, which aims to speed up the convergence of the gradient descent algorithm by including a momentum term. Later on, [18] designed an algorithm similar in form, the so-called Nesterov\u2019s accelerated gradient, and, using the technique known as estimating sequences, showed that the method achieves black-box oracle bounds, i.e., it is optimal on the class of smooth convex or strongly convex functions. Ever since its appearance, acceleration has remained mysterious, to a great extent due to the elegant but unintuitive algebraic arguments used by Nesterov in his derivations. To clarify the ideas underlying acceleration methods, the literature has explored different viewpoints.
Some work [1] relies on coupling different dynamics, where at each step mirror descent and gradient descent are interpolated. Other approaches are based on dissipativity theory [14], integral quadratic constraints [16], and even geometric arguments [5]. The most relevant line of research for our purposes is the one initiated in [25], which introduces a second-order differential equation that is the continuous limit of Nesterov\u2019s accelerated gradient method. This ODE exhibits approximate equivalence to Nesterov\u2019s scheme and thus can serve as a tool for its analysis. Especially salient is the fact that the analysis (both stability and rate of convergence) of the mentioned ODE is carried out using a Lyapunov function. This work has spurred a lot of activity aimed at uncovering the rationale behind the phenomenon of acceleration by resorting to continuous dynamics, including the variational viewpoint introduced in [27], the connections between Lyapunov theory and estimating sequences in [28], and the Hamiltonian perspective exploited in [8, 17]. The work [15] employs a hybrid systems approach to design a continuous-time dynamics with a feedback regulator of the viscosity of the heavy-ball ODE, guaranteeing arbitrarily fast exponential convergence. Recently, high-resolution ODEs were introduced in [23] as more accurate surrogates for the heavy-ball and Nesterov\u2019s algorithms. The work [3] introduces similar dynamics under the name of inertial systems with Hessian-driven damping. A number of works have also explored the discretization of accelerated continuous models and their stability. The work [27] shows that the forward Euler method can be inefficient and even become unstable after a few iterations. Some experimentation with symplectic integrators, without theoretical guarantees, is given in [4].
The work [29] shows that high-order Runge-Kutta integrators can also be used to retain acceleration when discretizing Nesterov\u2019s method for convex functions. The work [24] analyzes in detail the properties of explicit, implicit, and symplectic integrators when applied to the high-resolution dynamics corresponding to the heavy-ball and Nesterov\u2019s schemes. The methods proposed in this paper can be understood as variable-stepsize discretizations, which are a popular class of methods in numerical analysis. Some examples of their success include line-search methods in optimization [19], the Runge\u2013Kutta\u2013Fehlberg algorithm [11], and adaptive structure-preserving integrators [10].\n\n2 Preliminaries\n\n2.1 Notation and Assumptions\n\nWe denote by $\\mathbb{R}$, $\\mathbb{R}_{>0}$, and $\\mathbb{N}$ the sets of real, positive real, and natural numbers, respectively. All vectors are considered column vectors and we denote their scalar product by $\\langle \\cdot, \\cdot \\rangle$. We use $\\|\\cdot\\|$ to denote the 2-norm in Euclidean space. A function $f : \\mathbb{R}^n \\to \\mathbb{R}$ is convex if $f(kx + (1-k)y) \\le k f(x) + (1-k) f(y)$ for $x, y \\in \\mathbb{R}^n$ and $k \\in [0,1]$. Given $\\mu \\in \\mathbb{R}_{>0}$, a continuously differentiable function $f$ is $\\mu$-strongly convex if $f(y) - f(x) \\ge \\langle \\nabla f(x), y - x \\rangle + \\frac{\\mu}{2}\\|x - y\\|^2$ for $x, y \\in \\mathbb{R}^n$. Given $L \\in \\mathbb{R}_{>0}$ and a function $f : X \\to Y$ between two normed spaces $(X, \\|\\cdot\\|_X)$ and $(Y, \\|\\cdot\\|_Y)$, $f$ is $L$-Lipschitz if $\\|f(x) - f(x')\\|_Y \\le L \\|x - x'\\|_X$ for $x, x' \\in X$. We endow the space of $\\mathbb{R}^{n \\times m}$ matrices with the induced matrix norm, namely $\\|A\\| = \\max_{\\|x\\|=1} \\|Ax\\|$. We denote by $S^1_{\\mu,L}(\\mathbb{R}^n)$ the set of continuously differentiable, $\\mu$-strongly convex functions on $\\mathbb{R}^n$ that have $L$-Lipschitz continuous gradient.
The function class $S^2_{\\mu,L}(\\mathbb{R}^n)$ is the subclass of $S^1_{\\mu,L}(\\mathbb{R}^n)$ of twice differentiable functions with Lipschitz Hessian. A function $f : \\mathbb{R}^n \\to \\mathbb{R}$ is positive definite relative to $x^*$ if $f(x^*) = 0$ and $f(x) > 0$ for $x \\in \\mathbb{R}^n \\setminus \\{x^*\\}$.\n\n2.2 Opportunistic State-Triggered Control\n\nHere we provide a basic account of how real-time implementations of continuous-time controlled dynamical systems can be developed using opportunistic state-triggered control. We refer to [6, 13] for more complete expositions. We build on these ideas later to develop discretizations of high-resolution differential equations. Consider the controlled dynamical system on $\\mathbb{R}^n$\n\n$\\dot p = X(p, u)$, (1)\n\nwhere $X : \\mathbb{R}^n \\times \\mathbb{R}^m \\to \\mathbb{R}^n$ and $X(p^*, 0) = 0$. Assume we are given a stabilizing feedback law $u = k(p)$ along with a Lyapunov function $V : \\mathbb{R}^n \\to \\mathbb{R}$ that serves as a certificate of the asymptotic stability of the equilibrium $p^* \\in \\mathbb{R}^n$ under the closed-loop system. Formally,\n\n$\\dot V = \\langle \\nabla V(p), X(p, k(p)) \\rangle \\le -F(p)$, (2)\n\nwith $F$ a positive definite function relative to $p^*$. For simplicity, we restrict ourselves to the case $F(p) = \\alpha V(p)$, with $\\alpha \\in \\mathbb{R}_{>0}$ (in this case, the convergence of $V$ is exponential). The controller $u = k(p)$ cannot be implemented in real time, because it requires both continuous sampling and actuation. The real-time implementation of the closed-loop system can be tackled by considering a sample-and-hold implementation of (1) of the form\n\n$\\dot p = X(p, k(\\hat p))$, (3)\n\nwith $p(0) = \\hat p$, where $\\hat p$ is a sampled version of the state $p$. The most common approach consists of periodically sampling the state, selecting a stepsize small enough to ensure that the function $V$ remains monotonically decreasing for the resulting system. However, constant stepsizes are generally conservative, as they need to deal with worst-case scenarios.
Instead, opportunistic state-triggered control seeks to adjust the stepsize as determined by the current system state. Formally, let $\\{t_1, t_2, \\dots\\}$ be a sequence of triggering times and denote $p_i = p(t_i)$, for $i \\in \\mathbb{N}$. Consider\n\n$\\dot p = X(p, k(p_i))$, for $t \\in [t_i, t_{i+1}]$ and $i \\in \\mathbb{N}$. (4)\n\nThe objective is then to identify a criterion to select the sequence of triggering times in a way that ensures that (i) the triggered dynamics (4) retains the guarantees on the evolution of the Lyapunov function and (ii) the inter-sampling times are lower bounded. Condition (ii) ensures feasibility and rules out the possibility of Zeno behavior, cf. [9], whereas condition (i) ensures that the triggered dynamics has the same convergence properties as the original dynamics.\n\nInterestingly, both conditions can be met with designs that involve the Lyapunov function $V$ itself. Event-triggered designs compute the sequence of triggering times by monitoring the evolution of a certain function until a condition is violated. More precisely, assume that we have access to a continuous function $g : \\mathbb{R}^n \\times \\mathbb{R} \\to \\mathbb{R}$ that satisfies $g(p, 0) < 0$ for all $p \\in \\mathbb{R}^n \\setminus \\{p^*\\}$ and such that\n\n$\\dot V(p(t)) + \\alpha V(p(t)) \\le g(\\hat p, t)$\n\nholds along the solutions of (3). Then, for each $i \\in \\mathbb{N}$, the next triggering time can be determined by\n\n$t_{i+1} = \\min\\{t \\mid t > t_i \\text{ such that } g(p_i, t) = 0\\}$.\n\nNote that, by design, this choice ensures that $\\dot V(p(t)) \\le -\\alpha V(p(t))$ along the dynamics (4).
If $g$ is such that $t_{i+1}$, as defined above, can be determined explicitly with knowledge of $p_i$ only, one refers to this design as self-triggered (because it does not require the continuous monitoring of the evolution of the state under (3) in order to identify it).\n\n2.3 Adaptive-Stepsize Forward-Euler Discretization of Continuous-Time Dynamics via Opportunistic State Triggering\n\nThe ideas described in Section 2.2 can also be applied in the context of discretization of asymptotically stable continuous-time dynamical systems, as we explain next. Consider a dynamical system on $\\mathbb{R}^n$,\n\n$\\dot p = Y(p)$, (5)\n\nwhere $Y : \\mathbb{R}^n \\to \\mathbb{R}^n$. Assume $p^*$ is a globally asymptotically stable equilibrium point under this dynamics, and a certificate, in the form of a Lyapunov function $V : \\mathbb{R}^n \\to \\mathbb{R}$, is available, meaning that $\\dot V = \\langle \\nabla V(p), Y(p) \\rangle \\le -\\alpha V(p)$ for all $p \\in \\mathbb{R}^n$. Following the state-triggered approach described above, consider the sampled implementation of the dynamics described by\n\n$\\dot p = Y(\\hat p)$, (6)\n\nwith $p(0) = \\hat p$. Note that, the right-hand side being constant, this is equivalent to writing\n\n$p(t) = \\hat p + t Y(\\hat p)$, (7)\n\nwhich exactly corresponds to a forward-Euler discretization of stepsize $t$. Therefore, a successful opportunistic state-triggered design would ensure that the monotonic behavior of the Lyapunov function is respected, in turn guaranteeing convergence to the equilibrium at the same rate as the original dynamics (5). Given the connection noted above with the Euler discretization, such a state-triggered implementation admits an interesting interpretation from a numerical viewpoint, cf. Figure 1. In fact, the state-triggered implementation exactly corresponds to a variable-stepsize Euler discretization where, at each iterate, the trigger criterion helps us determine the stepsize according to the decay criterion specified by the Lyapunov function.
Before this decay condition is violated, the state is re-sampled, and the process is repeated. By design, the resulting variable-stepsize Euler discretization retains the convergence rate of the original dynamics.\n\nFigure 1: Equivalence between the opportunistic state-triggered implementation and a variable-stepsize forward-Euler discretization. The black lines correspond to the trajectories of the original dynamics (5). The red lines are trajectories of the family of sampled dynamical systems (6), which are the same as the iterates of forward-Euler methods with different stepsizes.\n\nWe finish this section by pointing out that continuous models of accelerated optimization algorithms, particularly high-resolution ODEs, fit the profile described above (i.e., they are globally asymptotically stable and their convergence can be characterized via suitable Lyapunov functions). Furthermore, their acceleration properties are explained as a consequence of the decay rate of the Lyapunov function. This matches perfectly with our state-triggered approach, which seeks to conserve the decay rate of the Lyapunov function, consequently ensuring the acceleration properties in the resulting discrete-time algorithm. An interesting challenge arises from the fact that the Lyapunov function typically relies on knowledge of the optimizer, thereby complicating the evaluation of trigger designs based on it.
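To make the mechanism of Section 2.3 concrete, the following is a minimal sketch (ours, not the paper's implementation) of the resulting variable-stepsize forward-Euler loop. The generic dynamics `Y`, Lyapunov function `V`, the bisection root-finder, the cap `t_max`, and the toy example `Y(p) = -p`, `V(p) = ||p||^2` are all illustrative assumptions; Section 3 replaces the numerical root-finding with closed-form triggers.

```python
import numpy as np

def triggered_euler(Y, V, grad_V, p0, alpha, n_steps, t_max=1.0):
    """Variable-stepsize forward Euler: from each iterate p, freeze Y(p) and take
    the largest step t for which d/dt V(p + t Y(p)) + alpha V(p + t Y(p)) <= 0."""
    def g(p, t):  # trigger function along the sample-and-hold trajectory (7)
        q = p + t * Y(p)
        return grad_V(q) @ Y(p) + alpha * V(q)
    traj = [np.asarray(p0, dtype=float)]
    for _ in range(n_steps):
        p = traj[-1]
        lo, hi = 0.0, t_max  # t_max caps the stepsize; a capped step still preserves decay
        for _ in range(60):  # bisection for the first zero of t -> g(p, t)
            mid = 0.5 * (lo + hi)
            if g(p, mid) < 0:
                lo = mid  # decay condition still satisfied at mid
            else:
                hi = mid
        traj.append(p + lo * Y(p))  # stepsize = time at which decay would be violated
    return traj

# Toy example: gradient flow Y(p) = -p with V(p) = ||p||^2, so dV/dt = -2V <= -alpha V.
Y = lambda p: -p
V = lambda p: p @ p
grad_V = lambda p: 2 * p
traj = triggered_euler(Y, V, grad_V, np.array([1.0, -2.0]), alpha=1.0, n_steps=5)
```

Since every step satisfies the decay condition on the whole sampled segment, the Lyapunov function is guaranteed to satisfy $V(p_{k+1}) \le e^{-\alpha t_k} V(p_k)$ along the iterates.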
The rest of this paper shows how we tackle this problem to synthesize trigger designs that do not rely on such knowledge and still guarantee the desired decay rate of the Lyapunov function.\n\n3 Triggered Discretization of Heavy Ball and Nesterov\u2019s Continuous Models\n\nHere we present discretizations, using the methodology described in Section 2.3, of the high-resolution ordinary differential equations (heavy-ball and Nesterov for strongly convex functions) proposed in [23] for optimization. Due to space constraints, we discuss in detail the heavy-ball case and refer the reader to the supplementary material for an analogous discretization of Nesterov\u2019s accelerated gradient for strongly convex functions.\n\nLet $f$ belong to $S^1_{\\mu,L}(\\mathbb{R}^n)$ and let $x^*$ be its unique minimizer. Given $s \\in \\mathbb{R}_{>0}$, consider the following $s$-dependent family of second-order differential equations,\n\n$\\begin{bmatrix} \\dot x \\\\ \\dot v \\end{bmatrix} = \\begin{bmatrix} v \\\\ -2\\sqrt{\\mu}\\, v - (1 + \\sqrt{\\mu s}) \\nabla f(x) \\end{bmatrix}$, (8)\n\nwith initial conditions $x(0) = x_0$, $v(0) = -\\frac{2\\sqrt{s}\\, \\nabla f(x_0)}{1 + \\sqrt{\\mu s}}$. When convenient, we refer to this dynamics as $X_{hb}$. The following result characterizes the convergence properties of this dynamics.\n\nTheorem 3.1 ([23]). Let $V : \\mathbb{R}^n \\times \\mathbb{R}^n \\to \\mathbb{R}$ be the positive definite function relative to $[x^*, 0]^T$,\n\n$V(x, v) = (1 + \\sqrt{\\mu s})(f(x) - f(x^*)) + \\frac{1}{4}\\|v\\|^2 + \\frac{1}{4}\\|v + 2\\sqrt{\\mu}(x - x^*)\\|^2$.\n\nThen $\\dot V \\le -\\frac{\\sqrt{\\mu}}{4} V$ along the dynamics (8) and, as a consequence, $[x^*, 0]^T$ is globally asymptotically stable.
Moreover, for $s \\le 1/L$, the decay rate of the Lyapunov function $V$ implies\n\n$f(x(t)) - f(x^*) \\le \\frac{7\\|x(0) - x^*\\|^2}{2s}\\, e^{-\\frac{\\sqrt{\\mu}}{4} t}$. (9)\n\nGiven our discussion in Section 2.2, Theorem 3.1 provides all the necessary ingredients to develop a discretization that respects the convergence rate, and hence inherits the guarantee (9). For simplicity, we use the shorthand notation $p = [x, v]^T$. Observe that the Lyapunov function $V$ depends on the minimizer $x^*$, which is unknown. To circumvent this issue, we resort to tight estimates provided by the convexity properties of the function $f$.\n\nConsider the sample-and-hold implementation of (8) given by $\\dot p = X_{hb}(\\hat p)$, $p(0) = \\hat p$ or, equivalently, $p(t) = \\hat p + t X_{hb}(\\hat p)$. Our goal is to determine a stepsize $t$, as large as possible, that guarantees $\\frac{d}{dt} V(p(t)) + \\frac{\\sqrt{\\mu}}{4} V(p(t)) \\le 0$ along the sampled dynamics. The following result provides us with a particularly useful upper bound to ensure this. The proof is provided in the supplementary material.\n\nProposition 3.2. Consider the sample-and-hold dynamics $\\dot p = X_{hb}(\\hat p)$, $p(0) = \\hat p$, and let $0 \\le s$ and $0 \\le \\alpha \\le \\sqrt{\\mu}/4$. Write\n\n$\\frac{d}{dt} V(p(t)) + \\alpha V(p(t)) = \\langle \\nabla V(\\hat p + t X_{hb}(\\hat p)), X_{hb}(\\hat p) \\rangle + \\alpha V(\\hat p + t X_{hb}(\\hat p))$\n$= \\underbrace{\\langle \\nabla V(\\hat p + t X_{hb}(\\hat p)) - \\nabla V(\\hat p), X_{hb}(\\hat p) \\rangle}_{\\mathrm{I}} + \\underbrace{\\alpha\\big(V(\\hat p + t X_{hb}(\\hat p)) - V(\\hat p)\\big)}_{\\mathrm{II}} + \\underbrace{\\langle \\nabla V(\\hat p), X_{hb}(\\hat p) \\rangle + \\alpha V(\\hat p)}_{\\mathrm{III}}$.\n\nThen, the following bounds hold:\n\n1.
Term I $\\le A_{ET}(\\hat p, t) \\le A_{ST}(\\hat p)\\, t$;\n2. Term II $\\le BC_{ET}(\\hat p, t) \\le B_{ST}(\\hat p)\\, t + C_{ST}(\\hat p)\\, t^2$;\n3. Term III $\\le D_{ET}(\\hat p, t) = D_{ST}(\\hat p)$,\n\nwhere\n\n$A_{ET}(\\hat p, t) = (1 + \\sqrt{\\mu s}) \\langle \\nabla f(\\hat x + t \\hat v) - \\nabla f(\\hat x), \\hat v \\rangle + 2 t \\sqrt{\\mu}(1 + \\sqrt{\\mu s}) \\langle \\nabla f(\\hat x), \\hat v \\rangle + 2 t \\mu \\|\\hat v\\|^2 + t (1 + \\sqrt{\\mu s})^2 \\|\\nabla f(\\hat x)\\|^2$,\n\n$BC_{ET}(\\hat p, t) = \\alpha\\big( (1 + \\sqrt{\\mu s})(f(\\hat x + t \\hat v) - f(\\hat x)) - t (1 + \\sqrt{\\mu s}) \\langle \\hat v, \\nabla f(\\hat x) \\rangle - t \\sqrt{\\mu} \\|\\hat v\\|^2 - t \\frac{\\sqrt{\\mu}(1 + \\sqrt{\\mu s})}{L} \\|\\nabla f(\\hat x)\\|^2 + \\frac{t^2}{4} \\|{-2\\sqrt{\\mu}\\, \\hat v - (1 + \\sqrt{\\mu s}) \\nabla f(\\hat x)}\\|^2 + \\frac{t^2}{4} \\|(1 + \\sqrt{\\mu s}) \\nabla f(\\hat x)\\|^2 \\big)$,\n\n$D_{ET}(\\hat p, t) = D_{ST}(\\hat p) = \\big( \\tfrac{3\\alpha}{4} - \\sqrt{\\mu} \\big) \\|\\hat v\\|^2 + \\Big( \\frac{(1 + \\sqrt{\\mu s})(\\alpha - \\sqrt{\\mu})}{2L} + \\big( 2\\alpha\\mu - \\tfrac{\\mu^{3/2}(1 + \\sqrt{\\mu s})}{2} \\big) \\frac{1}{L^2} \\Big) \\|\\nabla f(\\hat x)\\|^2$,\n\n$A_{ST}(\\hat p) = (1 + \\sqrt{\\mu s}) L \\|\\hat v\\|^2 + 2\\sqrt{\\mu}(1 + \\sqrt{\\mu s}) \\langle \\nabla f(\\hat x), \\hat v \\rangle + 2\\mu \\|\\hat v\\|^2 + (1 + \\sqrt{\\mu s})^2 \\|\\nabla f(\\hat x)\\|^2$,\n\n$B_{ST}(\\hat p) = \\alpha\\big( {-\\sqrt{\\mu}} \\|\\hat v\\|^2 - \\frac{\\sqrt{\\mu}(1 + \\sqrt{\\mu s})}{L} \\|\\nabla f(\\hat x)\\|^2 \\big)$,\n\n$C_{ST}(\\hat p) = \\alpha\\big( (1 + \\sqrt{\\mu s}) \\tfrac{L}{2} \\|\\hat v\\|^2 + \\tfrac{1}{4} \\|{-2\\sqrt{\\mu}\\, \\hat v - (1 + \\sqrt{\\mu s}) \\nabla f(\\hat x)}\\|^2 + \\tfrac{1}{4} \\|(1 + \\sqrt{\\mu s}) \\nabla f(\\hat x)\\|^2 \\big)$.\n\nWe define, with a slight abuse of notation,\n\n$g_{ET}(\\hat p, t) = A_{ET}(\\hat p, t) + BC_{ET}(\\hat p, t) + D_{ET}(\\hat p, t)$,\n$g_{ST}(\\hat p, t) = C_{ST}(\\hat p)\\, t^2 + (A_{ST}(\\hat p) + B_{ST}(\\hat p))\\, t + D_{ST}(\\hat p)$\n\n(the reason for the subindexes ET, for event-triggered, and ST, for self-triggered, becomes clear below). With these functions in place, it follows from Proposition 3.2 that\n\n$\\frac{d}{dt} V(p(t)) + \\alpha V(p(t)) \\le g_{ET}(\\hat p, t) \\le g_{ST}(\\hat p, t)$. (10)\n\nThis is all we need to determine the stepsize starting from $\\hat p$. Formally, we set\n\n$\\mathrm{step}_{\\#}(\\hat p) = \\min_t \\{ t > 0 \\text{ such that } g_{\\#}(\\hat p, t) = 0 \\}$, (11)\n\nwhere $\\# \\in \\{ET, ST\\}$. Note that, when $\\# = ET$, then $g_{ET}(\\hat p, t) = 0$ is an implicit equation in $t$. Instead, when $\\# = ST$, the solution to the quadratic equation $g_{ST}(\\hat p, t) = 0$ can be obtained explicitly (i.e., determined only with the information about the current state $\\hat p$), since $C_{ST}(\\hat p) > 0$ and $D_{ST}(\\hat p) < 0$ when $\\alpha \\le \\sqrt{\\mu}/4$. In fact, we have\n\n$\\mathrm{step}_{ST}(\\hat p) = \\frac{-(A_{ST}(\\hat p) + B_{ST}(\\hat p)) + \\sqrt{(A_{ST}(\\hat p) + B_{ST}(\\hat p))^2 - 4\\, C_{ST}(\\hat p) D_{ST}(\\hat p)}}{2\\, C_{ST}(\\hat p)}$.\n\nAlgorithm 1 describes in pseudocode the resulting variable-stepsize integrator.\n\nAlgorithm 1: Triggered Forward-Euler algorithm\nInitialization: initial point ($p_0$), convergence rate ($\\alpha$), objective function ($f$), tolerance ($\\epsilon$);\nSet: $k = 0$;\nwhile $\\|\\nabla f(x)\\| \\ge \\epsilon$ do\n  Compute stepsize $t_k$ at the current point according to (11);\n  Compute next iterate $p_{k+1} = p_k + t_k X_{hb}(p_k)$;\n  Set $k = k + 1$\nend\n\nTheorem 3.3. For $0 < \\alpha \\le \\sqrt{\\mu}/4$ and $\\# \\in \\{ET, ST\\}$, Algorithm 1 is a variable-stepsize integrator with the following properties:\n\n(i) the stepsize is uniformly lower bounded by a positive constant.
Namely,\n\n$-c_2 + \\sqrt{c_2^2 + c_1} \\le \\mathrm{step}_{ST}(p)$,\n\nwhere\n\n$c_1 = \\min\\Big\\{ \\frac{4\\sqrt{\\mu} - 3\\alpha}{2\\alpha(4\\mu + L(\\sqrt{\\mu s} + 1))},\\; \\frac{2\\big({-4\\alpha\\mu} + L(\\sqrt{\\mu} - \\alpha)(\\sqrt{\\mu s} + 1) + \\mu^{3/2}(\\sqrt{\\mu s} + 1)\\big)}{3\\alpha L^2 (\\sqrt{\\mu s} + 1)^2} \\Big\\}$,\n\n$c_2 = \\max\\Big\\{ \\frac{2\\mu + 2(\\sqrt{\\mu} + L)(\\sqrt{\\mu s} + 1)}{\\alpha(4\\mu + L(\\sqrt{\\mu s} + 1))},\\; \\frac{4(\\sqrt{\\mu} + L)(\\sqrt{\\mu s} + 1)}{3\\alpha(\\sqrt{\\mu s} + L)} \\Big\\}$;\n\n(ii) $\\frac{d}{dt} V(p_k + t X_{hb}(p_k)) \\le -\\alpha V(p_k + t X_{hb}(p_k))$ for all $t \\in [0, t_k]$ and all $k \\in \\{0\\} \\cup \\mathbb{N}$.\n\nAs a consequence, it follows that $f(x_{k+1}) - f(x^*) \\le \\frac{7\\|x(0) - x^*\\|^2}{2s}\\, e^{-\\alpha \\sum_{i=0}^{k} t_i}$ for all $k \\in \\{0\\} \\cup \\mathbb{N}$.\n\nProof. Since $g_{ET}(p, t) \\le g_{ST}(p, t)$, we have $\\mathrm{step}_{ST}(p) \\le \\mathrm{step}_{ET}(p)$, and therefore it is enough to prove the first claim for the ST case.
We rewrite\n\n$\\mathrm{step}_{ST}(p) = \\frac{-(A_{ST}(p) + B_{ST}(p))}{2 C_{ST}(p)} + \\sqrt{\\Big( \\frac{A_{ST}(p) + B_{ST}(p)}{2 C_{ST}(p)} \\Big)^2 - \\frac{D_{ST}(p)}{C_{ST}(p)}}$.\n\nWe bound, using $\\|a + b\\|^2 \\le 2\\|a\\|^2 + 2\\|b\\|^2$,\n\n$C_{ST}(p) \\le \\alpha\\Big( \\big( (1 + \\sqrt{\\mu s}) \\tfrac{L}{2} + 2\\mu \\big) \\|v\\|^2 + \\tfrac{3}{4} (1 + \\sqrt{\\mu s})^2 \\|\\nabla f(x)\\|^2 \\Big)$\n\nand therefore\n\n$\\frac{-D_{ST}(p)}{C_{ST}(p)} \\ge \\frac{ \\big( \\sqrt{\\mu} - \\tfrac{3\\alpha}{4} \\big) \\|v\\|^2 + \\Big( \\frac{(1 + \\sqrt{\\mu s})(\\sqrt{\\mu} - \\alpha)}{2L} + \\big( \\tfrac{\\mu^{3/2}(1 + \\sqrt{\\mu s})}{2} - 2\\alpha\\mu \\big) \\frac{1}{L^2} \\Big) \\|\\nabla f(x)\\|^2 }{ \\alpha\\Big( \\big( (1 + \\sqrt{\\mu s}) \\tfrac{L}{2} + 2\\mu \\big) \\|v\\|^2 + \\tfrac{3}{4} (1 + \\sqrt{\\mu s})^2 \\|\\nabla f(x)\\|^2 \\Big) }$.\n\nWe observe that if we rename $\\|\\nabla f(x)\\| = z_1$ and $\\|v\\| = z_2$, then the last expression has the form\n\n$\\frac{\\beta_1 z_1^2 + \\beta_2 z_2^2}{\\beta_3 z_1^2 + \\beta_4 z_2^2}$. (12)\n\nWe show in the supplementary material that such an expression is upper and lower bounded by positive constants, i.e., there exist (explicit) $c_1$ and $c_2 \\in \\mathbb{R}_{>0}$ such that\n\n$0 < c_1 \\le \\frac{\\beta_1 z_1^2 + \\beta_2 z_2^2}{\\beta_3 z_1^2 + \\beta_4 z_2^2} \\le c_2$ for all $z_1, z_2 \\in \\mathbb{R} \\setminus \\{0\\}$.\n\nUsing this observation, we have\n\n$\\frac{-(A_{ST}(p) + B_{ST}(p))}{2 C_{ST}(p)} + \\sqrt{\\Big( \\frac{A_{ST}(p) + B_{ST}(p)}{2 C_{ST}(p)} \\Big)^2 + c_1} \\le \\mathrm{step}_{ST}(p)$.\n\nIt is easy to see that the function $f(z) = -z + \\sqrt{z^2 + c_1}$ is monotonically decreasing and positive everywhere. So, if $z$ is upper bounded, then $f(z)$ is lower bounded by a positive constant.
With this observation, and the form of the last expression, it is clear that if we upper bound $z = \\frac{A_{ST}(p) + B_{ST}(p)}{2 C_{ST}(p)}$, we are done. To achieve this goal, let us use\n\n$C_{ST}(p) \\ge \\alpha\\Big( (1 + \\sqrt{\\mu s}) \\tfrac{L}{2} \\|v\\|^2 + \\tfrac{1}{4} (1 + \\sqrt{\\mu s})^2 \\|\\nabla f(x)\\|^2 \\Big)$\n\nand\n\n$A_{ST}(p) + B_{ST}(p) \\le A_{ST}(p) \\le (1 + \\sqrt{\\mu s}) L \\|v\\|^2 + \\sqrt{\\mu}(1 + \\sqrt{\\mu s}) \\|v\\|^2 + \\sqrt{\\mu}(1 + \\sqrt{\\mu s}) \\|\\nabla f(x)\\|^2 + 2\\mu \\|v\\|^2 + (1 + \\sqrt{\\mu s})^2 \\|\\nabla f(x)\\|^2$,\n\nwhere we have used the Cauchy-Schwarz and Young inequalities in the last estimate. Now, the fraction $\\frac{A_{ST}(p) + B_{ST}(p)}{2 C_{ST}(p)}$ has the form (12), and so we can conclude the existence of $c_2$ such that $\\frac{A_{ST}(p) + B_{ST}(p)}{2 C_{ST}(p)} \\le c_2$. To finish the proof of the first part of Theorem 3.3, it is only necessary to use the explicit expressions of $c_1$ and $c_2$ provided in the supplementary material. The second part follows from Proposition 3.2 and the algorithm design. $\\square$\n\nWe compare the performance of Algorithm 1 for the event-triggered (ET) and self-triggered (ST) cases with the explicit and symplectic integrators proposed in [24] in a logistic regression example. Figure 2 illustrates the evolution of the stepsize, objective, and Lyapunov functions. We set $\\alpha = \\sqrt{\\mu}/4$ and $s = \\mu/(36 L^2)$, following the values in [24]. The objective function corresponds to the regularized logistic regression cost function, namely $\\sum_{i=1}^{10} \\log(1 + e^{-y_i \\langle v_i, x \\rangle}) + \\frac{1}{2}\\|x\\|^2$, where $x \\in \\mathbb{R}^4$ and we have generated the sampled points $(v_i, y_i)$ randomly. This function is 1-strongly convex. The value of $L = 177.49$ can be estimated by straightforward computations.
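To illustrate the triggered integrator on a case where the Lyapunov function is computable, the sketch below (our assumption-laden illustration, not the paper's implementation) runs the heavy-ball field (8) on a small quadratic with minimizer $x^* = 0$, choosing each stepsize by bisection on the integrated decay condition $V(p + tX) \le e^{-\alpha t} V(p)$. This is a Lyapunov-sampling variant of the derivative-based triggers above; the matrix `A`, the horizon, and the cap `t_max` are illustrative choices.

```python
import numpy as np

# Illustrative quadratic f(x) = 0.5 x^T A x with minimizer x* = 0 (so V is computable).
A = np.diag([1.0, 10.0])
mu, L = 1.0, 10.0                  # strong convexity / gradient Lipschitz constants of f
s = mu / (36 * L**2)               # parameter of the high-resolution ODE (8)
alpha = np.sqrt(mu) / 4            # targeted Lyapunov decay rate

grad = lambda x: A @ x
f = lambda x: 0.5 * x @ A @ x

def X_hb(p):
    """Heavy-ball field (8): xdot = v, vdot = -2 sqrt(mu) v - (1 + sqrt(mu s)) grad f(x)."""
    x, v = p[:2], p[2:]
    return np.concatenate([v, -2 * np.sqrt(mu) * v - (1 + np.sqrt(mu * s)) * grad(x)])

def V(p):
    """Lyapunov function of Theorem 3.1 (with x* = 0 for this quadratic)."""
    x, v = p[:2], p[2:]
    return ((1 + np.sqrt(mu * s)) * f(x) + 0.25 * v @ v
            + 0.25 * np.linalg.norm(v + 2 * np.sqrt(mu) * x) ** 2)

def triggered_step(p, t_max=1.0):
    """A stepsize t (capped at t_max) with V(p + t X_hb(p)) <= exp(-alpha t) V(p), by bisection."""
    X = X_hb(p)
    ok = lambda t: V(p + t * X) <= np.exp(-alpha * t) * V(p)
    if ok(t_max):
        return t_max
    lo, hi = 0.0, t_max
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if ok(mid) else (lo, mid)
    return lo  # the returned value always satisfies the decay condition

x0 = np.array([3.0, -2.0])
p = np.concatenate([x0, -2 * np.sqrt(s) * grad(x0) / (1 + np.sqrt(mu * s))])  # v(0) from (8)
Vs = [V(p)]
for _ in range(200):
    p = p + triggered_step(p) * X_hb(p)
    Vs.append(V(p))  # nonincreasing by construction of the step
```

By construction, each iterate contracts the Lyapunov function by at least $e^{-\alpha t_k}$, mirroring the guarantee of Theorem 3.3 for this known-minimizer baseline.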
In the plots, we display the optimal stepsize only for comparison purposes, as the minimizer is in practice unknown. Knowledge of the minimizer $x^*$ would enable the explicit computation of the Lyapunov function, cf. Theorem 3.1, which in turn would allow one to solve $\\dot V + \\alpha V = 0$ for the stepsize by any standard numerical method at any iteration. This is what we refer to as the optimal stepsize.\n\nFigure 2: (Top, left) Comparison of stepsizes along the various discrete-time dynamics. The ET and ST integrators keep a larger stepsize for the first 2000 iterations, approaching the optimizer much faster. (Top, right) Comparison of the evolution of the ET stepsize and optimal stepsize along the ET dynamics. We observe how the stepsize computed by the ET integrator chases the optimal stepsize as the state evolves. (Bottom, left) Comparison of the evolution of the objective function along the different dynamics. (Bottom, right) Comparison of the evolution of the Lyapunov function along the different dynamics.\n\nFigure 3 provides another comparison for a quadratic objective function over $\\mathbb{R}^{50}$ defined by an (ill-conditioned) positive definite $50 \\times 50$ matrix, where $\\mu = 3.5$ and $L = 7006.6$. We plot the evolution of the objective and the logarithm of the Lyapunov functions, comparing the proposed algorithms with forward and symplectic Euler. The supplementary material contains additional comparisons for various 2-dimensional quadratic cases.\n\n4 Conclusions and Future Work\n\nWe have introduced a novel opportunistic state-triggered approach to the discretization of optimization flows. Our approach relies on the key observation that resource-aware control provides a principled way of going from continuous-time control design to real-time implementation with stability and performance guarantees. This is done by opportunistically prescribing when a certain action should occur.
In this case, the action amounts to achieving a certain decay of the Lyapunov function. The presented framework provides a promising path towards the design of adaptive optimization algorithms. We have provided theoretical guarantees that ensure the implementability of the method, along with numerical comparisons with recent discretizations of the heavy-ball dynamics. The supplementary material contains analogous results for Nesterov's accelerated gradient for strongly convex functions.

Figure 3: (Left) Comparison of the evolution of the objective function along the different dynamics. (Right) Comparison of the evolution of the logarithm of the Lyapunov function along the different dynamics.

We have employed a derivative-based approach to trigger design combined with the forward Euler method for its simplicity, but we believe that other powerful schemes can be synthesized in the future by resorting to the following ideas.

Use of more complex integrators. The setting presented here is general enough to incorporate other integrators beyond the forward Euler method that may yield better performance. Additionally, the sampled information employed in our approach corresponds to a zero-order hold, and possibilities exist within the theory of resource-aware control to employ higher-order holds that more accurately approximate the evolution of the continuous-time dynamics.
The direct application of the forward Euler method to Nesterov's continuous model gives a dynamics that includes the second-order term ∇²f(x)v. This is a drawback, since the success of Nesterov's method rests precisely on requiring only first-order information. Two promising approaches to circumvent this issue are to approximate the term √s ∇²f(x)v by ∇f(x_{k+1}) − ∇f(x_k), cf. [19, 24], and to recast the second-order Nesterov ODE as a first-order one, cf. [2, 3], and develop analogous schemes for the resulting dynamics.

Convergence rate as a result of Lyapunov decay and uniform lower bound on stepsize. The result in Theorem 3.3 links the convergence rate of the discrete-time algorithm to the Lyapunov decay and the stepsize of the state-triggered implementation of the continuous-time dynamics. More explicitly, if we lower bound the stepsize by t̂, then f(x_{k+1}) − f(x*) ∈ O(exp(−(√μ/4) t̂)^k). Therefore acceleration can be understood as a consequence of the ability of the state-triggered implementation to maintain a certain Lyapunov decay for a long enough time (i.e., a large stepsize). Although we do not observe acceleration in the numerical studies presented here, i.e., exp(−(√μ/4) t̂) ≥ 1 − √(μ/L), this is probably due to the simplicity of the forward-Euler integrator employed. Nonetheless, the introduced variable-stepsize integrators clearly outperform their fixed-stepsize counterparts, reinforcing the importance of extending our design to more complex integrators with which to achieve the desired convergence rates.

Use of other triggering conditions. Other approaches to trigger design, beyond derivative-based ones, are promising.
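Returning to the Hessian term discussed above, the substitution of ∇f(x_{k+1}) − ∇f(x_k) for √s ∇²f(x)v can be sanity-checked numerically. The sketch below uses a hypothetical quadratic objective f(x) = (1/2)xᵀAx (so ∇f(x) = Ax and ∇²f(x) = A) and the illustrative position update x_{k+1} = x_k + √s v: for quadratics the gradient difference reproduces √s ∇²f(x)v exactly, and in general to O(s).

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(5, 5))
A = M @ M.T + np.eye(5)        # hypothetical positive definite Hessian

grad_f = lambda z: A @ z       # gradient of f(x) = 0.5 * x^T A x
s = 1e-3
x, v = rng.normal(size=5), rng.normal(size=5)

exact = np.sqrt(s) * A @ v                          # second-order term sqrt(s)*Hess f(x) v
surrogate = grad_f(x + np.sqrt(s) * v) - grad_f(x)  # gradients only, cf. [19, 24]

print(np.linalg.norm(exact - surrogate))  # zero up to round-off for a quadratic
```

The surrogate requires only two gradient evaluations, preserving the first-order character of the resulting scheme.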
For instance, [26] introduces a Lyapunov sampling event-triggered approach whose main idea is to continuously sample the Lyapunov function until a certain decay has been reached. The trigger takes the form t_{i+1} = min{t > t_i such that V(x(t)) − ηV(x_i) = 0}, where η ∈ [0, 1] is a design parameter. The expression is similar to the difference V(x_{i+1}) − V(x_i), which is upper bounded in [24] for the iterations of explicit and implicit symplectic integrators, and plays a key role in the convergence analysis. This suggests the use of similar bounds to develop variable-stepsize integrators. Along the same lines, the use of dynamic triggers [6] to keep track of how much the Lyapunov function decreases along the evolution is also appealing.

Extensions to convex functions. The work [23] presents another high-resolution ODE for the case of Nesterov's method applied to convex functions. The sharp bounds on the evolution of the Lyapunov function provided by strong convexity that we employ in the trigger design do not hold anymore. It is therefore challenging and extremely interesting to develop new ideas to tackle this problem.

Acknowledgments

This work was supported by NSF Award CNS-1446891 and AFOSR Award FA9550-15-1-0108.

References

[1] Z. Allen-Zhu and L. Orecchia. Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent. In 8th Innovations in Theoretical Computer Science Conference (ITCS 2017), pages 1–22, Dagstuhl, Germany, 2017.

[2] F. Alvarez, H. Attouch, J. Bolte, and P. Redont. A second-order gradient-like dissipative dynamical system with Hessian-driven damping. Application to optimization and mechanics.
Journal de Mathématiques Pures et Appliquées, 81(8):747–779, 2002.

[3] H. Attouch, Z. Chbani, J. Fadili, and H. Riahi. First-order optimization algorithms via inertial systems with Hessian driven damping. arXiv preprint arXiv:1907.10536, 2019.

[4] M. Betancourt, M. Jordan, and A. C. Wilson. On symplectic optimization. arXiv preprint arXiv:1802.03653, 2018.

[5] S. Bubeck, Y. Lee, and M. Singh. A geometric alternative to Nesterov's accelerated gradient descent. arXiv preprint arXiv:1506.08187, 2015.

[6] V. S. Dolk, D. P. Borgers, and W. P. M. H. Heemels. Output-based and decentralized dynamic event-triggered control with guaranteed Lp-gain performance and Zeno-freeness. IEEE Transactions on Automatic Control, 62(1):34–49, 2017.

[7] S. Durand, N. Marchand, and J. F. Guerrero-Castellanos. Simple Lyapunov sampling for event-driven control. IFAC Proceedings Volumes, 44(1):8724–8730, 2011.

[8] G. França, J. Sulam, D. Robinson, and R. Vidal. Conformal symplectic and relativistic optimization. arXiv preprint arXiv:1903.04100, 2019.

[9] R. Goebel, R. G. Sanfelice, and A. R. Teel. Hybrid Dynamical Systems: Modeling, Stability, and Robustness. Princeton University Press, 2012.

[10] E. Hairer, C. Lubich, and G. Wanner. Geometric Numerical Integration. Springer, Berlin, Heidelberg, 2010.

[11] E. Hairer, S. P. Nørsett, and G. Wanner. Solving Ordinary Differential Equations I (2nd Revised Ed.): Nonstiff Problems. Springer, Berlin, Heidelberg, 1993.

[12] W. P. M. H. Heemels, M. C. F. Donkers, and A. R. Teel. Periodic event-triggered control based on state feedback. In IEEE Conf. on Decision and Control, pages 2571–2576, Dec 2011.

[13] W. P. M. H. Heemels, K. H. Johansson, and P. Tabuada. An introduction to event-triggered and self-triggered control. In IEEE Conf. on Decision and Control, pages 3270–3285, Maui, HI, 2012.

[14] B. Hu and L. Lessard.
Dissipativity theory for Nesterov's accelerated method. In Proceedings of the 34th International Conference on Machine Learning, pages 1549–1557, International Convention Centre, Sydney, Australia, August 2017.

[15] A. S. Kolarijani, P. M. Esfahani, and T. Keviczky. Fast Gradient-Based Methods with Exponential Rate: A Hybrid Control Framework. In Proceedings of the 35th International Conference on Machine Learning, pages 2728–2736, July 2018.

[16] L. Lessard, B. Recht, and A. Packard. Analysis and design of optimization algorithms via integral quadratic constraints. SIAM Journal on Optimization, 26(1):57–95, 2016.

[17] C. J. Maddison, D. Paulin, Y. W. Teh, B. O'Donoghue, and A. Doucet. Hamiltonian descent methods. arXiv preprint arXiv:1809.05042, 2018.

[18] Y. E. Nesterov. A method of solving a convex programming problem with convergence rate O(1/k²). Soviet Mathematics Doklady, 27(2):372–376, 1983.

[19] J. Nocedal and S. Wright. Numerical Optimization. Springer, Berlin, Heidelberg, 2006.

[20] C. Nowzari, E. Garcia, and J. Cortés. Event-triggered control and communication of networked systems for multi-agent consensus. Automatica, 105:1–27, 2019.

[21] P. Ong and J. Cortés. Event-triggered control design with performance barrier. In IEEE Conf. on Decision and Control, pages 951–956, Miami Beach, FL, Dec. 2018.

[22] B. T. Polyak. Some methods of speeding up the convergence of iterative methods. USSR Computational Mathematics and Mathematical Physics, 4(5):1–17, 1964.

[23] B. Shi, S. S. Du, M. I. Jordan, and W. J. Su. Understanding the acceleration phenomenon via high-resolution differential equations. arXiv preprint arXiv:1810.08907, 2018.

[24] B. Shi, S. S. Du, M. I. Jordan, and W. J. Su. Acceleration via symplectic discretization of high-resolution differential equations. arXiv preprint arXiv:1902.03694, 2019.

[25] W. Su, S. Boyd, and E. J. Candès.
A differential equation for modeling Nesterov's accelerated gradient method: theory and insights. Journal of Machine Learning Research, 17:1–43, 2016.

[26] M. Velasco, P. Martí, and E. Bini. On Lyapunov sampling for event-driven controllers. In IEEE Conf. on Decision and Control, pages 6238–6243, Dec 2009.

[27] A. Wibisono, A. C. Wilson, and M. I. Jordan. A variational perspective on accelerated methods in optimization. Proceedings of the National Academy of Sciences, 113(47):E7351–E7358, 2016.

[28] A. C. Wilson, B. Recht, and M. I. Jordan. A Lyapunov analysis of momentum methods in optimization. arXiv preprint arXiv:1611.02635, 2018.

[29] J. Zhang, A. Mokhtari, S. Sra, and A. Jadbabaie. Direct Runge-Kutta discretization achieves acceleration. In Conference on Neural Information Processing Systems, pages 3904–3913, 2018.