Part of Advances in Neural Information Processing Systems 35 (NeurIPS 2022) Main Conference Track
Emmanuel Esposito, Federico Fusco, Dirk van der Hoeven, Nicolò Cesa-Bianchi
The framework of feedback graphs is a generalization of sequential decision-making with bandit or full information feedback. In this work, we study an extension where the directed feedback graph is stochastic, following a distribution similar to the classical Erdős-Rényi model. Specifically, in each round every edge in the graph is either realized or not with a distinct probability for each edge. We prove nearly optimal regret bounds of order min (ignoring logarithmic factors), where \alpha_{\varepsilon} and \delta_{\varepsilon} are graph-theoretic quantities measured on the support of the stochastic feedback graph \mathcal{G} with edge probabilities thresholded at \varepsilon. Our result, which holds without any preliminary knowledge about \mathcal{G}, requires the learner to observe only the realized out-neighborhood of the chosen action. When the learner is allowed to observe the realization of the entire graph (but only the losses in the out-neighborhood of the chosen action), we derive a more efficient algorithm featuring a dependence on weighted versions of the independence and weak domination numbers that exhibits improved bounds for some special cases.