Part of Advances in Neural Information Processing Systems 32 (NeurIPS 2019)
Michela Meister, Tamas Sarlos, David Woodruff
We revisit the classic randomized sketch of a tensor product of $q$ vectors $x_i \in \mathbb{R}^n$. The $i$-th coordinate $(Sx)_i$ of the sketch is equal to $\prod_{j=1}^{q} \langle u^{i,j}, x_j \rangle / \sqrt{m}$, where the $u^{i,j}$ are independent random sign vectors. Kar and Karnick (JMLR, 2012) show that if the sketching dimension $m = \Omega(\epsilon^{-2} C_\Omega^2 \log(1/\delta))$, where $C_\Omega$ is a certain property of the point set $\Omega$ one wants to sketch, then with probability $1-\delta$, $\|Sx\|_2 = (1 \pm \epsilon)\|x\|_2$ for all $x \in \Omega$. However, in their analysis $C_\Omega^2$ can be as large as $\Theta(n^{2q})$, even for a set $\Omega$ of $O(1)$ vectors $x$. We give a new analysis of this sketch, providing nearly optimal bounds. Namely, we show an upper bound of $m = \Theta\left(\epsilon^{-2} \log(n/\delta) + \epsilon^{-1} \log^q(n/\delta)\right)$, which, by composing with CountSketch, can be improved to $\Theta\left(\epsilon^{-2} \log(1/(\delta\epsilon)) + \epsilon^{-1} \log^q(1/(\delta\epsilon))\right)$. For the important case of $q = 2$ and $\delta = 1/\mathrm{poly}(n)$, this shows that $m = \Theta(\epsilon^{-2} \log(n) + \epsilon^{-1} \log^2(n))$, demonstrating that the $\epsilon^{-2}$ and $\log^2(n)$ terms do not multiply each other. We also show a nearly matching lower bound of $m = \Omega(\epsilon^{-2} \log(1/\delta) + \epsilon^{-1} \log^q(1/\delta))$. In a number of applications one has $|\Omega| = \mathrm{poly}(n)$, and in this case our bounds are optimal up to a constant factor. This is the first high-probability sketch for tensor products that has optimal sketch size and can be implemented in $m \cdot \sum_{i=1}^{q} \mathrm{nnz}(x_i)$ time, where $\mathrm{nnz}(x_i)$ is the number of non-zero entries of $x_i$. Lastly, we empirically compare our sketch to other sketches for tensor products, and give a novel application to compressing neural networks.
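A minimal NumPy sketch of the construction described above, under the assumption of dense input vectors and with hypothetical function and variable names (an illustration, not the authors' implementation): coordinate $i$ of the output is the product over $j$ of $\langle u^{i,j}, x_j \rangle$, scaled by $1/\sqrt{m}$.

```python
import numpy as np

def tensor_product_sketch(xs, m, rng=None):
    """Return an m-dimensional sketch of the tensor product of the vectors in xs.

    Coordinate i of the output is prod_j <u^{i,j}, x_j> / sqrt(m), where each
    u^{i,j} is an independent random sign vector (illustrative names only).
    """
    rng = np.random.default_rng() if rng is None else rng
    sketch = np.empty(m)
    for i in range(m):
        coord = 1.0
        for x in xs:
            # Independent random sign vector u^{i,j}, one per (coordinate, vector) pair.
            u = rng.choice([-1.0, 1.0], size=x.shape[0])
            coord *= u @ x
        sketch[i] = coord / np.sqrt(m)
    return sketch

# Sanity check: ||Sx||_2^2 should concentrate around the squared norm of the
# tensor product, which equals the product of the individual squared norms.
rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal(50), rng.standard_normal(50)
sx = tensor_product_sketch([x1, x2], m=5000, rng=rng)
print(np.sum(sx ** 2), np.dot(x1, x1) * np.dot(x2, x2))
```

The loop above runs in $m \cdot \sum_j n_j$ time for dense inputs; with sparse vectors the inner products would be taken only over the non-zero entries, matching the $m \cdot \sum_{i=1}^{q} \mathrm{nnz}(x_i)$ running time stated in the abstract.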