
Online Robust Regression via SGD on the l1 loss

Part of Advances in Neural Information Processing Systems 33 (NeurIPS 2020)


Authors

Scott Pesme, Nicolas Flammarion

Abstract

We consider the robust linear regression problem in the online setting where we have access to the data in a streaming manner, one data point after the other. More specifically, for a true parameter $\theta^*$, we consider the corrupted Gaussian linear model $y = \langle x, \theta^* \rangle + \varepsilon + b$ where the adversarial noise $b$ can take any value with probability $\eta$ and equals zero otherwise. We consider this adversary to be oblivious (i.e., $b$ independent of the data) since this is the only contamination model under which consistency is possible. Current algorithms rely on having the whole data at hand in order to identify and remove the outliers. In contrast, we show in this work that stochastic gradient descent on the l1 loss converges to the true parameter vector at a $\tilde{O}(1/((1-\eta)^2 n))$ rate which is independent of the values of the contaminated measurements. Our proof relies on the elegant smoothing of the l1 loss by the Gaussian data and a classical non-asymptotic analysis of Polyak-Ruppert averaged SGD. In addition, we provide experimental evidence of the efficiency of this simple and highly scalable algorithm.
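To make the procedure concrete, below is a minimal NumPy sketch of the streaming algorithm the abstract describes: plain SGD on the l1 loss followed by Polyak-Ruppert averaging of the iterates, run on data simulated from the corrupted Gaussian linear model. The step-size schedule, the corruption magnitude, and the names l1_sgd and stream are illustrative assumptions for this sketch, not specifics taken from the paper.

import numpy as np

def l1_sgd(data_stream, d, c=0.5):
    """SGD on the l1 loss with Polyak-Ruppert averaging.

    data_stream yields (x, y) pairs one at a time. The step size
    c / sqrt(t) is an illustrative choice, not necessarily the
    schedule analyzed in the paper.
    """
    theta = np.zeros(d)      # current iterate
    theta_bar = np.zeros(d)  # running (Polyak-Ruppert) average of the iterates
    for t, (x, y) in enumerate(data_stream, start=1):
        # Subgradient of the l1 loss |y - <x, theta>| with respect to theta.
        grad = -np.sign(y - x @ theta) * x
        theta -= (c / np.sqrt(t)) * grad
        theta_bar += (theta - theta_bar) / t  # online average of the iterates
    return theta_bar

# Simulate the corrupted Gaussian linear model of the abstract:
# y = <x, theta*> + eps + b, where b is nonzero with probability eta
# and is independent of the data (oblivious adversary).
rng = np.random.default_rng(0)
d, n, eta = 10, 100_000, 0.2
theta_star = rng.normal(size=d)

def stream(n):
    for _ in range(n):
        x = rng.normal(size=d)                  # Gaussian covariates
        b = 1e3 if rng.random() < eta else 0.0  # corruption value is arbitrary; 1e3 is illustrative
        yield x, x @ theta_star + rng.normal() + b

estimate = l1_sgd(stream(n), d)
print(np.linalg.norm(estimate - theta_star))  # distance to the true parameter

Because the l1 subgradient has unit magnitude regardless of the residual, each update is bounded no matter how large the corruption $b$ is, which is the informal reason the rate above does not depend on the values of the contaminated measurements.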