We advance the study of incentivized bandit exploration, in which arm choices are viewed as recommendations and are required to be Bayesian incentive compatible. Recent work has shown that, under certain independence assumptions, the popular Thompson sampling algorithm becomes incentive compatible once enough initial samples have been collected. We give an analog of this result for linear bandits, where the independence of the prior is replaced by a natural convexity condition. This opens up the possibility of efficient and regret-optimal incentivized exploration in high-dimensional action spaces. In the semibandit model, we also improve the sample complexity of the initial data-collection phase that precedes Thompson sampling.