Tight Dimensionality Reduction for Sketching Low Degree Polynomial Kernels

Part of Advances in Neural Information Processing Systems 32 (NeurIPS 2019)

Authors

Michela Meister, Tamas Sarlos, David Woodruff

Abstract

We revisit the classic randomized sketch of a tensor product of $q$ vectors $x_i \in \mathbb{R}^n$. The $i$-th coordinate $(Sx)_i$ of the sketch is equal to $\prod_{j=1}^q \langle u^{i,j}, x_j \rangle / \sqrt{m}$, where the $u^{i,j}$ are independent random sign vectors. Kar and Karnick (JMLR, 2012) show that if the sketching dimension $m = \Omega(\epsilon^{-2} C_{\Omega}^2 \log(1/\delta))$, where $C_{\Omega}$ is a certain property of the point set $\Omega$ one wants to sketch, then with probability $1-\delta$, $\|Sx\|_2 = (1 \pm \epsilon)\|x\|_2$ for all $x \in \Omega$. However, in their analysis $C_{\Omega}^2$ can be as large as $\Theta(n^{2q})$, even for a set $\Omega$ of $O(1)$ vectors $x$. We give a new analysis of this sketch, providing nearly optimal bounds. Namely, we show an upper bound of $m = \Theta\left(\epsilon^{-2} \log(n/\delta) + \epsilon^{-1} \log^q(n/\delta)\right)$, which by composing with CountSketch can be improved to $\Theta\left(\epsilon^{-2} \log(1/(\delta\epsilon)) + \epsilon^{-1} \log^q(1/(\delta\epsilon))\right)$. For the important case of $q = 2$ and $\delta = 1/\mathrm{poly}(n)$, this shows that $m = \Theta(\epsilon^{-2} \log(n) + \epsilon^{-1} \log^2(n))$, demonstrating that the $\epsilon^{-2}$ and $\log^2(n)$ terms do not multiply each other. We also show a nearly matching lower bound of $m = \Omega(\epsilon^{-2} \log(1/\delta) + \epsilon^{-1} \log^q(1/\delta))$. In a number of applications, one has $|\Omega| = \mathrm{poly}(n)$, and in this case our bounds are optimal up to a constant factor. This is the first high-probability sketch for tensor products that has optimal sketch size and can be implemented in $m \cdot \sum_{i=1}^q \mathrm{nnz}(x_i)$ time, where $\mathrm{nnz}(x_i)$ is the number of non-zero entries of $x_i$. Lastly, we empirically compare our sketch to other sketches for tensor products, and give a novel application to compressing neural networks.
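For concreteness, the following is a minimal NumPy illustration of the random-sign tensor sketch described above. The function names and parameters are our own, not the authors' code, and this naive version materializes dense sign matrices rather than achieving the $m \cdot \sum_i \mathrm{nnz}(x_i)$ running time discussed in the paper.

```python
import numpy as np

def make_sketch(n, m, q, seed=0):
    """Draw the q independent m-by-n random sign matrices defining the sketch."""
    rng = np.random.default_rng(seed)
    return [rng.choice([-1.0, 1.0], size=(m, n)) for _ in range(q)]

def apply_sketch(Us, xs):
    """Compute (Sx)_i = prod_j <u^{i,j}, x_j> / sqrt(m) for the tensor product of xs."""
    m = Us[0].shape[0]
    out = np.ones(m)
    for U, x in zip(Us, xs):
        out *= U @ x          # inner products <u^{i,j}, x_j> for all i at once
    return out / np.sqrt(m)

# Norm preservation check: ||S(x ⊗ y)||_2^2 should be close to ||x||_2^2 * ||y||_2^2.
rng = np.random.default_rng(1)
n, m = 100, 5000
x, y = rng.standard_normal(n), rng.standard_normal(n)
Us = make_sketch(n, m, q=2)
sx = apply_sketch(Us, [x, y])
print(np.dot(sx, sx), np.dot(x, x) * np.dot(y, y))  # approximately equal
```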