Part of Advances in Neural Information Processing Systems 8 (NIPS 1995)
Benjamin Van Roy, John Tsitsiklis
We consider the solution of large stochastic control problems by means of methods that rely on compact representations and a variant of the value iteration algorithm to compute approximate cost-to-go functions. While such methods are known to be unstable in general, we identify a new class of problems for which convergence, as well as graceful error bounds, are guaranteed. This class involves linear parameterizations of the cost-to-go function together with an assumption that the dynamic programming operator is a contraction with respect to the Euclidean norm when applied to functions in the parameterized class. We provide a special case where this assumption is satisfied, which relies on the locality of transitions in a state space. Other cases will be discussed in a full-length version of this paper.
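The scheme described above can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: it assumes a hypothetical toy Markov decision process with randomly generated transition matrices and costs, represents the cost-to-go function as a linear combination of basis functions (columns of a matrix `Phi`), and alternates the dynamic programming operator with Euclidean (least-squares) projection onto the span of the basis. As the abstract notes, this iteration can diverge in general; stability is guaranteed only when the composed operator is a contraction in the Euclidean norm.

```python
import numpy as np

# Hypothetical toy MDP: n states, m actions, random transitions and costs.
rng = np.random.default_rng(0)
n, m, k = 20, 3, 5          # states, actions, number of basis functions
gamma = 0.9                 # discount factor

P = rng.random((m, n, n))
P /= P.sum(axis=2, keepdims=True)   # row-stochastic transition matrices
g = rng.random((m, n))              # per-action cost vectors
Phi = rng.random((n, k))            # linear basis (feature) matrix

def T(J):
    """Dynamic programming (value iteration) operator for the cost-to-go J."""
    return np.min(g + gamma * (P @ J), axis=0)

# Approximate value iteration on the parameter vector r:
# apply T to the current approximation Phi @ r, then project the result
# back onto span(Phi) by least squares (Euclidean projection).
r = np.zeros(k)
for _ in range(200):
    J = Phi @ r
    r, *_ = np.linalg.lstsq(Phi, T(J), rcond=None)

J_approx = Phi @ r   # approximate cost-to-go function
```

The least-squares step is what makes the Euclidean norm the relevant norm in the contraction assumption: projection onto span(`Phi`) is nonexpansive in that norm, so if `T` contracts in it as well, the composed iteration converges.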