Part of Advances in Neural Information Processing Systems 35 (NeurIPS 2022) Main Conference Track
Sally Dong, Haotian Jiang, Yin Tat Lee, Swati Padmanabhan, Guanghao Ye
Many fundamental problems in machine learning can be formulated by the convex program \min_{\theta \in \mathbb{R}^d} \sum_{i=1}^{n} f_i(\theta), where each f_i is a convex, Lipschitz function supported on a subset of d_i coordinates of \theta. One common approach to this problem, exemplified by stochastic gradient descent, involves sampling one f_i term at every iteration to make progress. This approach crucially relies on a notion of uniformity across the f_i's, formally captured by their condition number. In this work, we give an algorithm that minimizes the above convex formulation to \epsilon-accuracy in \widetilde{O}(\sum_{i=1}^n d_i \log (1 /\epsilon)) gradient computations, with no assumptions on the condition number. The previous best algorithm independent of the condition number is the standard cutting plane method, which requires O(nd \log (1/\epsilon)) gradient computations. As a corollary, we improve upon the evaluation oracle complexity for decomposable submodular minimization by [Axiotis, Karczmarz, Mukherjee, Sankowski and Vladu, ICML 2021]. Our main technical contribution is an adaptive procedure to select an f_i term at every iteration via a novel combination of cutting-plane and interior-point methods.
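To make the problem setting concrete, the following is a minimal sketch of the sampling-based baseline the abstract contrasts against (stochastic subgradient descent that touches one f_i per iteration), not the paper's cutting-plane/interior-point algorithm. The choice of f_i (an \ell_1 fit on a small coordinate support), the problem sizes, and the step-size schedule are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem: n convex terms, each f_i supported on a small
# random subset S_i of the d coordinates of theta (assumed sizes).
d, n = 50, 200
supports = [rng.choice(d, size=rng.integers(2, 6), replace=False) for _ in range(n)]
targets = [rng.normal(size=len(S)) for S in supports]

def f_i(theta, i):
    # Convex, Lipschitz term depending only on coordinates in supports[i].
    return np.linalg.norm(theta[supports[i]] - targets[i], ord=1)

def subgrad_i(theta, i):
    # A subgradient of f_i; zero outside the support of f_i.
    g = np.zeros(d)
    g[supports[i]] = np.sign(theta[supports[i]] - targets[i])
    return g

# Baseline described in the abstract: sample one f_i per iteration.
# Running SGD on the average (1/n) * sum_i f_i has the same minimizer
# as the sum; the 1/sqrt(t) step size is a standard, illustrative choice.
theta = np.zeros(d)
for t in range(1, 20001):
    i = rng.integers(n)
    theta -= (1.0 / np.sqrt(t)) * subgrad_i(theta, i)

print("objective value:", sum(f_i(theta, i) for i in range(n)))
```

This baseline's convergence rate depends on uniformity across the f_i's (their condition number), which is precisely the assumption the paper's adaptive term-selection procedure removes.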