NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:751
Title:Finding Friend and Foe in Multi-Agent Games

Reviewer 1


		
Originality: To the best of my knowledge, this is the first method that surpasses human performance in a multi-player game where actions can be hidden during the game. Most of the components are extensions of DeepStack [1], but new formalizations (e.g. deductive reasoning for inferring the private actions) and model architecture adjustments (e.g. the win probability layer) were necessary to address the challenges of the game under consideration.
Quality & Clarity: Both the experimental setup and the proposed algorithm are clearly presented. The paper is well written and, for the most part, self-contained.
Significance: The extension of CFR to hidden-action games can be very impactful, and the steps to achieve this were non-trivial!

Reviewer 2


		
Originality is sufficient. The authors propose a novel combination of existing methods: counterfactual regret minimization, a value network trained during self-play, and reasoning. The quality of this paper is good. The authors are knowledgeable about this area. The evaluation and discussion are careful and insightful. The paper is clearly written.

Reviewer 3


		
The paper builds on well-known methods (CFR) and provides novel improvements and modifications that extend the approach to a multiplayer, hidden-role setting. This is original, novel, and creative, though the crucial role of CFR should not be understated. Related work appears to be adequately cited.
The empirical results provide the main validation for the soundness and quality of the proposed algorithm; this is reasonable and is explained well in the paper. I have not spotted any obvious logical flaws or mistakes. The paper is mostly well written and logically organized, and I think that with the supplement it would be possible to reproduce the results with some effort. Moreover, as part of their response the authors have made their code and data publicly available.
The authors have also provided satisfactory responses to the following issues raised in the first version of this review: There is some unexplained notation: sigma_{I->a}^t on line 117, z[I] on line 118. Figure 5 and line 230 talk about the gradient of the "replicator dynamic," which is not explained in the text; while the figure is intuitively easy to interpret, its precise meaning is not adequately explained in the paper. On a higher level, I would have liked to see a brief explanation of CFR(+), as I was not very familiar with its workings, but I understand that there is limited space available and the approach is well explained in the cited literature.
I think the results are important both methodologically and empirically. Multiplayer games with hidden information are a very difficult area, and this work presents significant advances with very good empirical results. While the architecture does have some inherent scalability issues (due to the large number of distinct neural networks needed), these can possibly be addressed in future work, and in any case the authors obtain their results with surprisingly small networks.
I believe that this work is directly applicable to certain other games and scenarios and that this will be a good foundation for future research.