Part of Advances in Neural Information Processing Systems 6 (NIPS 1993)
Christopher Atkeson
Dynamic programming provides a methodology to develop planners and controllers for nonlinear systems. However, general dynamic programming is computationally intractable. We have developed procedures that allow more complex planning and control problems to be solved. We use second-order local trajectory optimization to generate locally optimal plans and local models of the value function and its derivatives. We maintain global consistency of the local models of the value function, guaranteeing that our locally optimal plans are actually globally optimal, up to the resolution of our search procedures.
Learning to do the right thing at each instant in situations that evolve over time is difficult, as the future cost of actions chosen now may not be obvious immediately, and may only become clear with time. Value functions are a representational tool that makes the consequences of actions explicit. Value functions are difficult to learn directly, but they can be built up from learned models of the dynamics of the world and the cost function. This paper focuses on how fast optimizers that only produce locally optimal answers can play a useful role in speeding up the process of computing or learning a globally optimal value function. Consider a system with dynamics $x_{k+1} = f(x_k, u_k)$ and a cost function $L(x_k, u_k)$,