NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Reviewer 1
This paper studies the problem of privacy-preserving text classification. More specifically, the authors develop an SMC-based method for logistic regression and AdaBoost models. Experiments on a real dataset validate the performance of the proposed method. However, the current paper has some issues that need to be addressed:
1. The proposed method only protects privacy at the inference procedure. Other privacy-preserving frameworks, such as differential privacy (DP), can also achieve this goal. For example, DP can directly add a small amount of noise to protect privacy at inference, which is simple and efficient. In this sense, the contribution of the current paper is not very clear.
2. The proposed method can only be applied to some very specific models, such as unigram/bigram logistic regression and AdaBoost, which are not the state-of-the-art models in either theory or practice for text classification. As a result, the proposed method may not have much impact.
3. The experiments include no baseline result, i.e., the result for the non-private method, and the reported accuracy and running times do not seem strong.
--- After reading the author response, I agree that the proposed SMC framework is a contribution to the secure ML community, and I would like to increase my score.
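For context on the reviewer's first point, a minimal sketch of the Laplace-mechanism alternative being alluded to (the function name, epsilon, and sensitivity values are illustrative assumptions, not anything from the paper; calibrating the sensitivity so that the released score actually protects the client's input is the nontrivial part):

```python
import numpy as np

def dp_release(score, epsilon=1.0, sensitivity=1.0, rng=None):
    """Laplace mechanism: add noise with scale sensitivity/epsilon
    before releasing a bounded classifier score."""
    rng = rng or np.random.default_rng()
    return score + rng.laplace(scale=sensitivity / epsilon)

# e.g. noise the logit of a logistic-regression model at inference:
# label = int(dp_release(w @ x + b, epsilon=0.5) > 0)
```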
Reviewer 2
The secret-sharing method has been used extensively in multi-party computation; Yucel Saygin, for example, wrote several papers on secure clustering using secret sharing. In contrast, the authors force their computations to remain on a ring and can thus map to the 2-adic ring. The mapping and equations are straightforward and not surprising, but the contribution is nice and could stimulate future research. The writing is sound and clear.
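To make the "computations remain on a ring" point concrete, a minimal sketch of two-party additive secret sharing over the ring $\mathbb{Z}_{2^{64}}$ (a generic textbook construction, not the paper's exact protocol):

```python
import secrets

RING = 2 ** 64  # all arithmetic stays in the ring Z_{2^64}

def share(x):
    """Split x into two additive shares; each share alone is uniform."""
    r = secrets.randbelow(RING)
    return r, (x - r) % RING

def reconstruct(a, b):
    return (a + b) % RING

# Addition of secrets is local: each party adds its own shares.
x0, x1 = share(7)
y0, y1 = share(35)
assert reconstruct((x0 + y0) % RING, (x1 + y1) % RING) == 42
```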
Reviewer 3
The authors present a privacy-preserving protocol for learning text classifiers on short texts using secure multiparty computation (SMC). Unlike differential privacy under the central model, currently a more popular framework for making it difficult to distinguish the presence or absence of individuals in a model's training data, this protocol aims to ensure that a pretrained classifier may be used on new text data without leaking that data to the classifier's owner. Though the underlying classifier is not a SOTA solution to the text classification problem at hand, hate speech detection, it is a nontrivial classifier of text and can classify a single example in a matter of seconds, substantially improving over the performance of approaches using homomorphic encryption. The authors test their approach on a collection of 10,000 tweets with binary labels indicating whether they are hate speech, demonstrating the effectiveness of this tool in aiding automatic moderation of sensitive content.

I want to be open that I am not an expert on SMC; my primary knowledge of privacy-preserving ML is through differential privacy and natural language processing. However, I was impressed by how effectively the authors traded off the detail necessary to present their technical contribution, in terms of secret-sharing protocols and corresponding proofs of correctness, against clarity in their explanation of the role of SMC. While the end result is, at face value, a relatively specific classification algorithm (AdaBoost for short texts, applied to a small hate speech dataset), the building blocks presented here should provide some inspiration for more substantial efforts of this kind, and I'm excited to see where that leads.

My biggest questions about the technical contribution of this paper related to the feature representations and the transmission of information related to feature vectors, but I think I've mostly worked them out; I just want to state them explicitly here to make sure I understand correctly:
- The fact that feature vectors may be redundant (e.g., knowing which bigrams are present uniquely determines which unigrams are present if features have not been pruned) does not affect SMC privacy guarantees. I believe this guarantee comes from the way polynomials for secret sharing are assembled with respect to the total possible domain of values available for a vector.
- While the total feature set available in the text collection is much larger than would typically be considered in a text classification setting, at any time only the lexicon of an individual tweet plus the lexicon of the features used in Bob's classifier enters the lexicon-merging phase, so $\pi_{FE}$ will only be considering hundreds of features at a time, not 123,482. This seems to be confirmed in Section 4.1; a toy sketch of my understanding follows below.

The chief obstacle to using this sort of method more regularly is that, while vastly faster than HE, several seconds per single classification with a simple classifier is still prohibitively slow for many industry applications, and given the size of the machines used, it seems like scaling up the servers for this sort of application might not solve the problem. One thing I wondered about was which specific operations are most time- and memory-consuming when applying a classifier: is it possible to get timing information for the individual operations rather than for full classifications?
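Here is the toy sketch of the lexicon-merging point as I understand it (function and variable names are mine, not the paper's): the active feature space is only the union of the tweet's n-grams and the classifier's features, so the vectors the protocol handles stay small.

```python
def merge_lexicons(tweet_ngrams, model_features):
    """Union of the tweet's n-grams and the classifier's features;
    typically hundreds of entries, not the full 123,482-term vocabulary."""
    return sorted(set(tweet_ngrams) | set(model_features))

def feature_vector(tweet_ngrams, merged):
    # Binary indicator vector over the merged lexicon only.
    present = set(tweet_ngrams)
    return [1 if f in present else 0 for f in merged]

merged = merge_lexicons({"you", "are", "you are"}, {"are", "hate", "you are"})
print(feature_vector({"you", "are", "you are"}, merged))
```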
Would information about the amount of time taken to merge the lexicons to produce the feature vector be susceptible to a timing attack?

A less important but still valuable gap this paper should consider filling concerns its claims about related DP work in the field. While I concur that most work has focused on hiding training instances rather than on securely transmitting novel classification data, I think the work toward locally private online learning with DP done by e.g. Apple and Google is still relevant to this domain and deserves slightly more development as a discussion topic with which to draw contrast.

Smaller notes not affecting my score:
- Some attention to phrasing in the first paragraph might be good: given that invading children's privacy through moderation is not an uncontentious topic when, e.g., they are dealing with toxic home situations or abusive relationships, it might not be best to frame it as a "clear benefit".
- I think the citation in line 45 might be intended to be [14], not [33]?
- The fact that a hashing technique could apply here is actually extremely promising, given that hashed text tends to still work well for classifiers and, if features were hashed in the learned model, the small amount of noise added might even regularize it slightly. It might be worth selling this more (a toy sketch of the hashing trick follows at the end of this review).

Overall, I was really excited to read this paper, as it's polished work, and I am hopeful for its future!
--- Based on the author response, I am leaving my review as-is; I am grateful for the authors' response to the high-level points about DP in interaction with SMC, but I definitely want to keep their attention on the pieces above to clarify and explain further.
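The promised sketch of the feature-hashing idea from the last bullet, using a generic signed hash (the hash choice and dimensionality are illustrative assumptions, not anything from the paper):

```python
import hashlib

def hashed_features(ngrams, dim=2**10):
    """Hashing trick: map n-grams into a fixed-size signed-count
    vector, so the model never stores an explicit lexicon."""
    vec = [0] * dim
    for ng in ngrams:
        h = int(hashlib.md5(ng.encode()).hexdigest(), 16)
        index, sign = h % dim, 1 if (h >> 64) % 2 == 0 else -1
        vec[index] += sign
    return vec

print(sum(abs(v) for v in hashed_features(["you", "are", "you are"])))
```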