Part of Advances in Neural Information Processing Systems 33 (NeurIPS 2020)
Dheeraj Baby, Yu-Xiang Wang
We consider the framework of non-stationary stochastic optimization [Besbes et al., 2015] with squared error losses and noisy gradient feedback, in which the dynamic regret of an online learner against a time-varying comparator sequence is studied. Motivated by the theory of non-parametric regression, we introduce a \emph{new variational constraint} that enforces the comparator sequence to belong to a discrete $k$th order Total Variation ball of radius $C_n$. This variational constraint models comparators with a piece-wise polynomial structure, which has many relevant practical applications [Tibshirani, 2015]. By establishing connections to the theory of wavelet-based non-parametric regression, we design a \emph{polynomial time} algorithm that achieves the nearly \emph{optimal dynamic regret} of $\tilde{O}(n^{\frac{1}{2k+3}} C_n^{\frac{2}{2k+3}})$. The proposed policy is \emph{adaptive to the unknown radius} $C_n$. Further, we show that the same policy is minimax optimal for several other non-parametric families of interest.
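To make the variational constraint concrete, the sketch below computes a discrete $k$th order total variation of a sequence in the style of the trend-filtering literature, i.e., the $\ell_1$ norm of $(k+1)$th order differences. This is an illustrative assumption: the paper's exact definition may include a different normalization (e.g., a power-of-$n$ scaling), and the function name `kth_order_tv` is ours, not the authors'.

```python
import numpy as np

def kth_order_tv(theta: np.ndarray, k: int) -> float:
    """Discrete k-th order total variation of theta_1, ..., theta_n.

    A minimal sketch following the trend-filtering convention:
    the l1 norm of the (k+1)-th order forward differences. The
    normalization used in the paper may differ, so treat this as
    illustrative rather than the paper's exact definition.
    """
    return float(np.abs(np.diff(theta, n=k + 1)).sum())

# Example: a piecewise-linear (tent-shaped) comparator has small
# 1st-order TV (k = 1), since its second differences vanish
# everywhere except at the single kink.
t = np.arange(10, dtype=float)
theta = np.where(t < 5, t, 10 - t)
print(kth_order_tv(theta, k=1))  # all TV mass sits at the kink
```

Under this convention, a comparator sequence lies in the radius-$C_n$ ball when `kth_order_tv(theta, k) <= C_n`, which captures exactly the piece-wise polynomial comparators the abstract describes: a degree-$k$ polynomial segment contributes zero, and only the breakpoints contribute variation.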