NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
This paper studies a linear bandit problem in which the feature vectors of the arms are unknown and are drawn from known distributions. The change to the learning algorithm is relatively straightforward: it is LinUCB with each arm's feature vector replaced by its average (expected) feature vector. Then, if the optimal arm is the best arm on average with respect to the feature distribution, all the algebra from linear bandits is expected to generalize, and the authors indeed obtain similar results. This paper was discussed, and all reviewers agree that it is borderline. A closely related setting was studied in prior work. The algorithm and its analysis are not surprising and are rather standard.
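The algorithmic change described above can be sketched as follows. This is a hypothetical illustration, not the paper's code. It assumes the learner never observes the realized feature vectors, so it plugs each arm's known mean feature into both the LinUCB index and the ridge-regression update. The class name `MeanFeatureLinUCB`, the exploration weight `alpha`, the regularizer `lam`, and the simulated means, parameter, and noise levels are all illustrative choices.

```python
import numpy as np


class MeanFeatureLinUCB:
    """Hypothetical sketch: LinUCB run on each arm's known mean feature vector."""

    def __init__(self, mean_features, alpha=1.0, lam=1.0):
        self.mu = np.asarray(mean_features, dtype=float)  # shape (K, d), known means
        _, d = self.mu.shape
        self.alpha = alpha
        self.A = lam * np.eye(d)  # regularized Gram matrix of the mean features
        self.b = np.zeros(d)      # sum of reward-weighted mean features

    def select(self):
        A_inv = np.linalg.inv(self.A)
        theta_hat = A_inv @ self.b  # ridge-regression estimate of theta
        est = self.mu @ theta_hat   # estimated expected reward of each arm
        # Confidence width ||mu_a||_{A^{-1}} for every arm a.
        width = np.sqrt(np.einsum("ij,jk,ik->i", self.mu, A_inv, self.mu))
        return int(np.argmax(est + self.alpha * width))

    def update(self, arm, reward):
        # Assumption: the realized feature is unobserved, so regress on the mean.
        x = self.mu[arm]
        self.A += np.outer(x, x)
        self.b += reward * x


def simulate(T=2000, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.array([1.0, 0.2])                          # illustrative true parameter
    mu = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])   # illustrative known means
    agent = MeanFeatureLinUCB(mu, alpha=1.0)
    counts = np.zeros(len(mu), dtype=int)
    for _ in range(T):
        a = agent.select()
        x = mu[a] + rng.normal(scale=0.3, size=2)  # realized feature, hidden from agent
        r = x @ theta + rng.normal(scale=0.1)      # linear reward plus noise
        agent.update(a, r)
        counts[a] += 1
    best = int(np.argmax(mu @ theta))  # best arm on average
    return counts, best


if __name__ == "__main__":
    counts, best = simulate()
    print("pull counts:", counts, "best arm on average:", best)
```

Since the realized feature has mean `mu[a]`, the expected reward of arm `a` is `mu[a] @ theta`, so regressing rewards on the means targets the same parameter as ordinary LinUCB, with the feature randomness folded into the reward noise. This is the sense in which the standard linear-bandit algebra carries over when the target is the best arm on average.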