Part of Advances in Neural Information Processing Systems 33 (NeurIPS 2020)
Yuanhao Wang, Jian Li
This paper studies minimax optimization problems $\min_x \max_y f(x,y)$, where $f(x,y)$ is $m_x$-strongly convex with respect to $x$, $m_y$-strongly concave with respect to $y$, and $(L_x, L_{xy}, L_y)$-smooth. Zhang et al. \cite{zhang2019lower} provided the following lower bound on the gradient complexity for any first-order method: $\Omega\bigl(\sqrt{\frac{L_x}{m_x}+\frac{L_{xy}^2}{m_x m_y}+\frac{L_y}{m_y}}\ln(1/\epsilon)\bigr)$. This paper proposes a new algorithm and proves a gradient complexity bound of $\tilde{O}\bigl(\sqrt{\frac{L_x}{m_x}+\frac{L\cdot L_{xy}}{m_x m_y}+\frac{L_y}{m_y}}\ln(1/\epsilon)\bigr)$, where $L=\max\{L_x, L_{xy}, L_y\}$. This improves over the best known upper bound $\tilde{O}\bigl(\sqrt{\frac{L^2}{m_x m_y}}\ln^3(1/\epsilon)\bigr)$ by Lin et al. \cite{lin2020near}. Our algorithm achieves a linear convergence rate and a tighter dependence on the condition numbers, especially when $L_{xy}\ll L$ (i.e., the weak interaction regime). Via a simple reduction, our new bound also implies improved bounds for strongly convex-concave problems and convex-concave problems. When $f$ is quadratic, we can further improve the bound to $O\bigl(\sqrt{\frac{L_x}{m_x}+\frac{L_{xy}^2}{m_x m_y}+\frac{L_y}{m_y}}\bigl(\frac{L^2}{m_x m_y}\bigr)^{o(1)}\ln(1/\epsilon)\bigr)$, which matches the lower bound up to a sub-polynomial factor.
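As a purely illustrative instantiation of the bounds above (the parameter values are hypothetical, chosen to lie in the weak interaction regime $L_{xy}\ll L$), take $L_x = L_y = 10^3$, $L_{xy} = 10$, and $m_x = m_y = 1$, so that $L = 10^3$. Ignoring logarithmic factors, the three condition-number terms evaluate to
\[
\underbrace{\sqrt{\tfrac{L_x}{m_x}+\tfrac{L_{xy}^2}{m_x m_y}+\tfrac{L_y}{m_y}}}_{\text{lower bound \cite{zhang2019lower}}} = \sqrt{2100} \approx 46,
\qquad
\underbrace{\sqrt{\tfrac{L_x}{m_x}+\tfrac{L\cdot L_{xy}}{m_x m_y}+\tfrac{L_y}{m_y}}}_{\text{this paper}} = \sqrt{12000} \approx 110,
\qquad
\underbrace{\sqrt{\tfrac{L^2}{m_x m_y}}}_{\text{\cite{lin2020near}}} = 1000,
\]
so in this regime the new bound is roughly an order of magnitude closer to the lower bound than the previous best upper bound.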