Part of Advances in Neural Information Processing Systems 34 (NeurIPS 2021)
Sattar Vakili, Nacime Bouziani, Sepehr Jalali, Alberto Bernacchia, Da-shan Shiu
Consider the sequential optimization of a continuous, possibly non-convex, and expensive to evaluate objective function f. The problem can be cast as a Gaussian Process (GP) bandit where f lives in a reproducing kernel Hilbert space (RKHS). The state of the art analysis of several learning algorithms shows a significant gap between the lower and upper bounds on the simple regret performance. When N is the number of exploration trials and γN is the maximal information gain, we prove an ˜O(√γN/N) bound on the simple regret performance of a pure exploration algorithm that is significantly tighter than the existing bounds. We show that this bound is order optimal up to logarithmic factors for the cases where a lower bound on regret is known. To establish these results, we prove novel and sharp confidence intervals for GP models applicable to RKHS elements which may be of broader interest.