Part of Advances in Neural Information Processing Systems 33 (NeurIPS 2020)
Arun Jambulapati, Jerry Li, Kevin Tian
We develop two methods for the following fundamental statistical task: given an \eps-corrupted set of n samples from a d-dimensional sub-Gaussian distribution, return an approximate top eigenvector of the covariance matrix. Our first robust PCA algorithm runs in polynomial time, returns a 1−O(\epslog\eps−1)-approximate top eigenvector, and is based on a simple iterative filtering approach. Our second, which attains a slightly worse approximation factor, runs in nearly-linear time and sample complexity under a mild spectral gap assumption. These are the first polynomial-time algorithms yielding non-trivial information about the covariance of a corrupted sub-Gaussian distribution without requiring additional algebraic structure of moments. As a key technical tool, we develop the first width-independent solvers for Schatten-p norm packing semidefinite programs, giving a (1+\eps)-approximate solution in O(plog(nd\eps)\eps−1) input-sparsity time iterations (where n, d are problem dimensions).