Reviews: Stochastic Bandits with Context Distributions

This paper studies a linear bandit problem in which the feature vectors of the arms are not observed; they are drawn from distributions that are known to the learner. The change to the learning algorithm is relatively straightforward: it is LinUCB in which each arm's feature vector is replaced by its expected feature vector. Then, provided the optimal arm is the best arm on average with respect to the feature distribution, the algebra from linear bandits is expected to carry over, and the authors indeed obtain similar results. This paper was discussed, and all reviewers agree that it is borderline. A closely related setting was studied in prior work. The algorithm and its analysis are not surprising and are rather standard.
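To make the algorithmic change concrete, the following is a minimal sketch of LinUCB run on expected feature vectors, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the class name `ExpectedFeatureLinUCB`, the helper `run`, the constant exploration weight `alpha` (used here in place of a theoretically derived confidence radius), the Gaussian context and reward noise, and the specific arm means and parameter vector are all hypothetical choices made for the example.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [dot(row, v) for row in M]


class ExpectedFeatureLinUCB:
    """LinUCB that only ever sees the expected feature vector of each arm."""

    def __init__(self, dim, alpha=1.0, reg=1.0):
        self.alpha = alpha  # assumption: fixed exploration weight
        # Inverse of the regularized design matrix V = reg * I + sum_t x_t x_t^T.
        self.V_inv = [[(1.0 / reg if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * expected feature

    def theta_hat(self):
        """Ridge-regression estimate of the unknown parameter."""
        return mat_vec(self.V_inv, self.b)

    def select(self, mean_features):
        """Return the index of the arm with the largest upper confidence bound."""
        theta = self.theta_hat()

        def ucb(x):
            width = math.sqrt(max(dot(x, mat_vec(self.V_inv, x)), 0.0))
            return dot(theta, x) + self.alpha * width

        return max(range(len(mean_features)), key=lambda a: ucb(mean_features[a]))

    def update(self, x, reward):
        """Rank-one update with the expected feature x (Sherman-Morrison)."""
        Vx = mat_vec(self.V_inv, x)  # V_inv is symmetric, so x^T V_inv = Vx^T
        denom = 1.0 + dot(x, Vx)
        for i in range(len(x)):
            for j in range(len(x)):
                self.V_inv[i][j] -= Vx[i] * Vx[j] / denom
            self.b[i] += reward * x[i]


def run(rounds=2000, seed=0):
    """Simulate a toy instance; all numbers below are illustrative assumptions."""
    rng = random.Random(seed)
    theta_star = [1.0, -0.5]                          # hypothetical true parameter
    arm_means = [[0.2, 0.1], [0.8, 0.2], [0.5, 0.9]]  # known expected features
    agent = ExpectedFeatureLinUCB(dim=2)
    pulls = [0] * len(arm_means)
    for _ in range(rounds):
        arm = agent.select(arm_means)
        # The realized feature is drawn from the arm's distribution but is
        # never shown to the learner; it only affects the reward.
        realized = [m + rng.gauss(0.0, 0.1) for m in arm_means[arm]]
        reward = dot(theta_star, realized) + rng.gauss(0.0, 0.1)
        agent.update(arm_means[arm], reward)
        pulls[arm] += 1
    return pulls, agent.theta_hat()


if __name__ == "__main__":
    pulls, theta = run()
    print("pull counts per arm:", pulls)
    print("estimated parameter:", theta)
```

In this toy instance the second arm has the highest expected reward under the hypothetical `theta_star`, so the learner should concentrate its pulls there. The deviation of the realized feature from its mean acts as extra zero-mean noise on the reward, which is the intuition behind the standard linear bandit analysis carrying over.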

Paper ID:	7862
Title:	Stochastic Bandits with Context Distributions