NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Reviewer 1
Originality: To our knowledge, the idea of reformulation of the problem and the proposed benchmarks is new. Quality: While a contribution is the speedup, there is no result about the time complexity. Clarity: The paper is well-organized and easy to understand the problem setting, but the descriptions of the reformulation and the proposed algorithm are not so easy to read. Significant: While the problem setting is for a limited setting, two-player and no chance move, the proposed algorithm and benchmarks are important. The results, however, seem to be suitable for the game theory community rather than the machine learning community. In what follows, I have the detailed comments w.r.t the significant, the quality, and the clarity: -My main concern is that the machine learning community may not be interested in the work. The proposed benchmarks are interesting for the game theory community. The author should describe the reasons why finding an EFCE is interesting for the machine learning community, or proposes other benchmarks that is suitable for machine learning tasks. -The main difference of the proposed algorithm from the previous work [Von Stengel and Forges, 2008] is the efficiency. Thus, the author should describe about the time complexity of the proposed algorithm. -Is there any theorem about the convergence property or the convergence rate of the proposed algorithm? -The authors should denote the setting of two-player and no chance move in the title. -The readers may easily understand the proposed algorithm if pseudocodes are given.
Reviewer 2
This was an overall nice and fun paper to read. The problem was well-motivated and the background well-covered. Possibly some important definitions were skipped (such as sequence form) but with space constraints I feel the authors chose the right level of abstraction for presentation (leaving the right amount of detail for the appendix). One (fairly minor) concern is that the scope is a bit narrow. There is a small community working on these notions of equilibria, and addressing only the 2-player case without chance feels like a bit restrictive, given previous simple algorithms that solve the n-player setting such as Dudik & Gordon. Another concern is that the algorithm is fairly complicated, more so than von Stengel and Forges, which is already fairly complex. Implementing this may be quite involved. If accepted, I would encourage the authors to publish source code; this may help promote the adoption of the benchmarks as well. The benchmark games are indeed interesting. I wonder if there might be any n-player variants where an EFCE would demonstrate a similar pattern of sequential codes (even if they cannot be computed by this algorithm). Any thoughts? Another few suggestions, questions, and comments: - "Contributions: Our primary objective with this paper is to spark more interest in the community towards a deeper understanding of the behavioral and computational aspects of EFCE." This was a refreshing honest statement of the goals, thank you! It set a nice tone for the paper, which I felt delivered on this front. - Is it possible to use a form of online convex optimization, as proposed in Zinkevich 2003 or one of the many in Hazan '16? I seem to recall several methods for projecting back into the feasible sets (ZInkevich uses L2 as well, so possibly the greedy algorithm or its adversarial variant GIGA could be applied here?) - Along similar lines, there was work done on early Poker AI from Gilpin and Hoda and later Kroer, that proposed using Nesterov's excessive gap for two-player zero-sum games (a similar saddle-point formulation). In particular, they develop "treeplex" constraint sets (similar in structure to \chi_1 and \chi_2) and define prox functions that enable the smoothened gradient descent to be efficiently expressed. I wonder if these ideas could now be carried over to solve EFCEs given the reformulation. - I wonder if it another possibility is to use iterative learning, such as fictitious play (which has efficient sequential variants now, see Fictitious Self-Player of Heinrich, Lanctot and Silver '15 and Heinrich & Silver '16, which essentially uses the sequence-form to compute the updates correctly and compactly) - Finally, if approximate solutions are acceptable then it seems another potential route is open up that is even simpler: end-to-end learning with neural networks. The feasibility constraints could be softly implemented by additional L2 loss on the optimization criteria. Might guarantees still be possible with linear models? Even if not, the simplicity of this is very attractive, and there has been a lot of new ideas for training in this saddle point optimization from the adversarial training and generative models community over the last few years.
Reviewer 3
***Originality and Significance*** It is the first time I heard of the concept of EFCE, and it strikes me as a reasonable solution concept which deserves more study. The proposed formulation as bilinear saddle-point problem should be a novel contribution, and even without the experimental simulations I can easily convince myself that it will lead to faster computations of EFCE. I think this contribution itself is sufficient to get the paper accepted in NeurIPS. ***Clarity*** The paper is nicely written. ***Other Major Comments*** As a theoretical researcher, I think the authors should provide a comparison between the LP formulation and the saddle-point formulation. For instance, I would love to see the LP formulation of von Stengel and Forges presented here, so that I can see the structure (e.g., number of variables and constraints; sparse or not) of the LP, which will provide readers more intuition why the saddle-point formulation is more efficient in practice. Also, a very natural question that the author needs to clarify is: does the LP formulation of von Stengel and Forges permit a purely-mathematical reduction to the saddle-point formulation? I ask this because it is known that Nash equilibrium of two-person zero-sum game permits the von-Neumann LP formulation and a saddle-point formulation, for which the LP duality theory establishes the linkage between them. It appears plausible to me this analog to zero-sum game might hint at something more interesting. I am a bit skeptical about the proposed benchmark games. As we see in the experimental results, the run-time becomes very long even when the Battleship game size is raised slightly (and still nowhere close to the real Battleship game we use to play), suggesting the possibility that this benchmark game can hardly be scalable computationally. ====== *** Comments after Rebuttal *** The discussion about (conversion between) LP and saddle-point formulation sounds more interesting to me than the benchmark games and experiments (which took up 3 pages out of 8), so I suggest moving Appendix A to the main paper --- and if the paper is accepted, there will be an extra page available, so no sacrifice on other materials will be needed. I understood that the game states and actions increase rapidly (probably super-exponentially?) in the benchmark games of Battleship, and this is exactly the (more theoretical) reason why I think the benchmark game can hardly be scalable computationally. Abstraction and approximation techniques coming into play might be plausible; since I am not expert on these techniques, I can't gauge how likely these improvements will happen in the future.