Part of Advances in Neural Information Processing Systems 33 (NeurIPS 2020)
Poorya Mianjy, Raman Arora
We study dropout in two-layer neural networks with rectified linear unit (ReLU) activations. Under mild overparametrization, and assuming that the limiting kernel can separate the data distribution with a positive margin, we show that dropout training with logistic loss achieves ϵ-suboptimality in test error in O(1/ϵ) iterations.
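For readers who want a concrete picture of the setting, the following is a minimal sketch of dropout training for a two-layer ReLU network with logistic loss. It is not the authors' algorithm as stated in the paper. The abstract does not specify these details, so the following are all assumptions made for illustration: the fixed ±1 output layer, the 1/sqrt(m) scaling, the "inverted" dropout rescaling by 1/keep_prob, the single-sample SGD update, the function names `train_dropout`, `forward` and `predict`, and every default hyperparameter.

```python
import math
import random


def neg_sigmoid(z):
    """Return 1 / (1 + exp(z)), computed without overflow."""
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def forward(W, a, x, mask=None):
    """Output (1/sqrt(m)) * sum_j a_j * mask_j * relu(w_j . x), plus pre-activations."""
    m = len(W)
    pre = [sum(w_k * x_k for w_k, x_k in zip(w, x)) for w in W]
    if mask is None:
        mask = [1.0] * m  # no dropout at evaluation time
    out = sum(a[j] * mask[j] * max(pre[j], 0.0) for j in range(m)) / math.sqrt(m)
    return out, pre


def train_dropout(data, m=64, keep_prob=0.5, lr=0.5, steps=2000, seed=0):
    """Single-sample SGD on the logistic loss with dropout on the hidden units.

    Assumed setup (not taken from the source): Gaussian initialization of the
    hidden weights W, a fixed output layer a with entries +1/-1, and inverted
    dropout that rescales kept units by 1/keep_prob.
    """
    rng = random.Random(seed)
    d = len(data[0][0])
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    a = [1.0 if j % 2 == 0 else -1.0 for j in range(m)]
    for _ in range(steps):
        x, y = rng.choice(data)
        # Bernoulli mask over hidden units, rescaled so the output is unbiased.
        mask = [1.0 / keep_prob if rng.random() < keep_prob else 0.0
                for _ in range(m)]
        out, pre = forward(W, a, x, mask)
        # Derivative of log(1 + exp(-y * out)) with respect to out.
        g = -y * neg_sigmoid(y * out)
        for j in range(m):
            # Only kept units with a positive pre-activation receive gradient.
            if mask[j] > 0.0 and pre[j] > 0.0:
                coef = lr * g * a[j] * mask[j] / math.sqrt(m)
                W[j] = [w_k - coef * x_k for w_k, x_k in zip(W[j], x)]
    return W, a


def predict(W, a, x):
    """Predict a label in {-1, +1} using the full network (no dropout)."""
    out, _ = forward(W, a, x)
    return 1 if out >= 0.0 else -1
```

Calling `train_dropout` on a list of `(features, label)` pairs with labels in {-1, +1} returns the trained hidden weights and the fixed output layer, which `predict` then uses without any masking. The sketch only illustrates the training loop. It makes no attempt to reproduce the overparametrization or margin conditions under which the O(1/ϵ) rate is shown.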