Paper ID: 817
Title: Bayesian optimization under mixed constraints with a slack-variable augmented Lagrangian
This paper borrows established techniques from mathematical optimization to allow for Bayesian optimization under mixed equality and inequality constraints. Specifically, the borrowed techniques revolve around the concept of the augmented Lagrangian (AL), which this paper extends with slack variables. These variables improve on past approaches by making the expected improvement acquisition tractable while handling mixed constraints "natively".
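As a rough sketch of the construction (my own notation, not copied from the paper): each inequality constraint g_j(x) <= 0 is rewritten as an equality g_j(x) + s_j = 0 with a slack variable s_j >= 0, and the augmented Lagrangian of the resulting equality-constrained problem takes the form

L_A(x, s; \lambda, \rho) = f(x) + \sum_j \lambda_j (g_j(x) + s_j) + (1 / (2\rho)) \sum_j (g_j(x) + s_j)^2,

a quadratic in the constraint values with no max operators, which is what makes the expected improvement acquisition tractable under the Gaussian process surrogates.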
This paper attacks a realistic problem falling in a very active research area. It is well written, maintaining a good "flow" until the end while not compromising on giving details, thanks to the careful selection of pieces to be moved to the Appendix. To the best of my knowledge, past work is explained adequately, its drawbacks are spotted, and the solutions are explained well. It is particularly nice to see that the authors took care to "translate" some of the mathematical optimization concepts into a language more easily understood by the mainstream machine learning community. Concerning the contributions of this paper, I find them novel and significant. I was convinced that the proposed method deals well with the "gap" that the authors identified in the literature. Apart from the contributions claimed by the authors, I identify two related aspects which increase the value of the paper: first, that concepts from the mathematical optimization field are brought into machine learning; second, that the assumptions and choices made to derive the solution are almost always justified with short theoretical analyses. The experiments could certainly be more extensive (e.g. more challenging real-world datasets, demonstration of individual parts of the algorithm, identification of the limiting cases, run times, etc.). However, the ones already reported are well executed and convincing. I believe that one can always think of ways to improve a paper; however, the authors seem to have put a lot of effort into this paper, whose current version will definitely benefit the community.
1-Less confident (might not have understood significant parts)
This paper describes a method for Bayesian optimization that can be used to address single-objective problems with several inequality or equality constraints. The proposed approach is based on working with an augmented Lagrangian, so that in the end there is a single objective with several penalty terms depending on Lagrange multipliers. The benefit of this simplified scheme is that it becomes much easier to compute quantities such as the expected improvement using the new objective. The result is a very general Bayesian optimization method, which has been compared in several experiments with other techniques from the literature.
The paper is very well written and the proposed idea is expected to be of practical interest to the Bayesian optimization community. The experiments also seem exhaustive, and the proposed approach is compared with important related methods. These experiments show the improved results that one can obtain by using the proposed method. In summary, I believe that this is a well-written paper that will find the interest of the NIPS community. The authors have not mentioned it explicitly: can their method address problems in which the evaluations made are noisy? It seems that their framework considers noiseless evaluations only. The authors have indicated that they average with respect to 100 realizations of the experiments. If that is the case, I wonder why they do not include error bars in their results.
2-Confident (read it all; understood it all reasonably well)
The authors propose an improvement of the Augmented Lagrangian approach [Gramacy et al. 2016] for constrained Bayesian optimization. The new approach is based on the introduction of slack variables. This eliminates some max operators in the original formulation of the augmented Lagrangian objective and replaces them with quadratic functions. The result is an objective whose distribution under the predictions of the Gaussian processes can be easily obtained. This distribution is a weighted sum of non-central chi-square variates. Computing the expected improvement with respect to this distribution is then very easy, which is not the case without the slack variables. The introduced slack variables also make it possible to handle equality constraints, something that no other BO method can do directly. The experiments performed illustrate the performance of the method with respect to some existing baselines.
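To make the distributional claim concrete, a brief sketch under the standard GP assumptions (my notation, not the paper's): if the posterior of a constraint at x is c_j(x) ~ N(\mu_j(x), \sigma_j^2(x)), then for a fixed slack s_j the squared penalty term satisfies

(c_j(x) + s_j)^2 / \sigma_j^2(x) ~ non-central chi-square with one degree of freedom and non-centrality (\mu_j(x) + s_j)^2 / \sigma_j^2(x),

so the quadratic penalty in the augmented Lagrangian is a weighted sum of such variates, and the expected improvement can be evaluated from this distribution by quadrature rather than Monte Carlo sampling.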
Quality: The proposed approach is technically sound and the experiments performed illustrate its advantages with respect to existing baselines. One baseline that I miss is PESC implemented in the same way as EFI, that is, with the -h, h heuristic described in the introduction. Since the authors use an epsilon tolerance for satisfying the equality constraints, the same epsilon tolerance could be applied to the -h, h heuristic. PESC has been shown to be significantly better than EFI with inequality constraints. Since the authors compare with PESC in the first toy problem, it would have been easy for them to include results for PESC in the other experiments as well. The authors should also clarify that their approach requires the constraint functions to be observed without noise; otherwise step 4 cannot be applied. The authors should discuss possible modifications to their method to address constraints with observation noise.

Clarity: The paper is clearly written and the proposed method is well described. It is not clear what the double-weight problem on the equalities (line 36) is. Why is this a problem at all? If an epsilon tolerance is included, as they do in their method in step 4 (line 178), this should not be a problem. The authors should clarify this.

Originality: The proposed method is original in focusing on optimization problems with equality constraints, which have not been addressed before in the literature. The authors also improve on previous methods by proposing the use of slack variables.

Significance: The experimental results show that the proposed technique is better than the previous version of the Augmented Lagrangian approach and than other existing techniques. The results would be more significant if they also included comparisons with PESC, in the same way as they do with EFI, since PESC has been shown to be significantly better than EFI.
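For concreteness, a sketch of the -h, h heuristic with an epsilon tolerance as I understand it (notation mine, not the paper's): each equality constraint h(x) = 0 is replaced by the pair of inequality constraints

h(x) - \epsilon <= 0   and   -h(x) - \epsilon <= 0,

i.e. |h(x)| <= \epsilon, so that an inequality-only method such as EFI or PESC can be run on the relaxed problem.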
3-Expert (read the paper in detail, know the area, quite certain of my opinion)
This paper proposes an extension of Bayesian optimization to handle mixed constraints, namely equality and inequality constraints. The main idea is to reformulate the inequality constraints using slack variables and to solve the resulting equality-constrained problem through an augmented Lagrangian formulation.
The paper proposes a solution to an interesting problem; however, I think that reporting results on a problem with several mixed constraints, where other approaches fail, would have strengthened the paper. The novelty of the ideas in the paper is perhaps limited (mostly following on from reference [10] and standard Lagrangian augmentation), but I liked the analysis and the resulting possibilities that the proposed method enables. Also, the possibility of integrating this into standard software packages definitely strengthens the paper. The authors argue that turning equality constraints into "double-sided" inequality constraints may put too much weight on these and ultimately perform badly. I believe this should be investigated in the experiments, and I wonder how PESC would perform in these cases, given that it seems to work really well for inequality constraints. Overall, the problems considered in the experiments are rather low dimensional and have few constraints. I wonder how this impacts the relevance of this work, as it is currently not clear what the limitations of the proposed approach are when the number of dimensions and the number of constraints grow.
2-Confident (read it all; understood it all reasonably well)
The authors present an approach for handling hard optimization problems that include both inequality and equality constraints. They combine the ideas of the augmented Lagrangian and slack variables, which leads to an approach that can handle both types of constraints and to an expected improvement criterion that does not require computationally intensive sampling.
The paper is very well presented. I was not familiar with large parts of the background used in the paper, but everything is well described, making the paper easy to follow. The supplementary material is also a very nice addition to the paper. Excellent work.
2-Confident (read it all; understood it all reasonably well)
This manuscript proposes a Bayesian optimization approach to problems with mixed inequality and equality constraints. The main tool used in the approach is a slack-variable version of the augmented Lagrangian, which allows the incorporation of mixed constraints and has some advantages over the use of the augmented Lagrangian without slack variables. A detailed description of the proposed method is given, including guidance on how to efficiently solve various subproblems that arise via standard library routines (in this case in R). A set of experiments on various difficult problems demonstrates that the proposed method works better than several alternatives, including one proposed only earlier this year.
This manuscript is very interesting, extremely well written, and solves an important problem, namely Bayesian optimization with mixed constraints. The main contribution of the paper is an extension of the recently published ALBO technique for incorporating inequality constraints into Bayesian optimization problems to handle equality constraints as well. The approach is to incorporate the equality constraints via the addition of slack variables into the augmented Lagrangian. Remarkably, the addition of these variables leads to some computational niceties even in the case that there are no equality constraints at all. Specifically, the calculation of expected improvement in this setting is simplified and may be resolved via quadrature rather than via Monte Carlo sampling. This is a nice result and is discussed at length, including specific recommendations for implementation. Further, the authors provide guidance on how to solve for the slack variables, which removes the only real barrier to adopting the proposed slack-variable formulation wholesale. A set of well-designed and insightful experiments establishes that the proposed method is superior (in some cases markedly so) to alternatives, in terms of performance as a function of the number of function evaluations. The paper is well written and easy to read, with only a few minor issues, which I list below. The paper is also remarkably complete, providing a nice level of detail in all parts of the discussion, and helpful, more-detailed pointers in the supplementary material.

I only have a few suggestions for how the paper could be strengthened. One is that I think some brief discussion regarding the wall-clock time spent computing the EI would be of interest. The authors do a good job of convincing me that this is readily implemented and presumably quite fast, although I don't have a good intuition for how slow the "library routines" are. It would also be helpful to put this time in perspective against the time required by the Monte Carlo approximation used in ALBO. Again, I think this comparison would only be in the authors' favor. Also regarding wall-clock time, it would be interesting to see how this compares end-to-end across the tested methods, perhaps on only one of the experiments.

I also have a minor complaint about the computation of EI in (2). The EI in that expression is computed against f_min, defined to be min {y_1 ... y_n}. This is not really correct due to the noise in the observations. Rather, f_min, if I am to interpret its meaning from the notation, is a random variable with an admittedly annoying distribution. In fact, I would hazard a guess that y_min is usually an optimistic estimate of f_min in Bayesian optimization settings (and things get even more troublesome if the noise is heteroskedastic!). The result is that the computation in (2) measures the value of points against an inflated benchmark, which I believe would lead to over-enthusiastic exploration. This is a well-known fact that is almost always swept under the rug, and indeed the formulation in (2) is widespread. Nonetheless, I feel compelled to complain on philosophical grounds; see the sketch after the minor corrections below.

Minor corrections:
- Line 45: I find "Here we dub that method ALBO" somewhat awkward. I might recast as "We will refer to this method as ALBO."
- Line 137: I find "re-develop" a bit awkward as well. Perhaps "rewrite"?
- Under line 153: Please typeset (13) using the amsmath align environment rather than eqnarray, which is giving you unsightly spacing around the ='s.
- I might suggest adding a little negative space to the _f subscripts used on Y and \sigma; they're typeset with too much space to my eyes (inspect 6\rho\sigma_f(x) on line 166 for an example).
- Line 170: "how \lambda and and \rho updated" -> "how \lambda and \rho are updated"
- The sans serif font should probably be scaled down a bit. I believe this is a problem with the NIPS style file and will email the proceedings chair.
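For reference, a sketch of the standard EI expression that I believe (2) instantiates (my notation, not the paper's): EI(x) = E[max(0, f_min - Y(x))], which for a Gaussian posterior Y(x) ~ N(\mu(x), \sigma^2(x)) has the closed form

EI(x) = (f_min - \mu(x)) \Phi(z) + \sigma(x) \phi(z),   z = (f_min - \mu(x)) / \sigma(x),

with f_min plugged in as y_min = min{y_1, ..., y_n}. Under observation noise, y_min is only a point estimate (and, as noted above, usually an optimistic one) of the true best objective value among the evaluated points, which is the source of the over-enthusiastic exploration discussed above.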
3-Expert (read the paper in detail, know the area, quite certain of my opinion)