In this paper, we investigate stochastic contextual bandits with a general function space and graph feedback. We propose an algorithm that adapts to both the underlying graph structure and the reward gaps. To the best of our knowledge, our algorithm is the first to provide a gap-dependent upper bound in this stochastic setting, closing the research gap left by [35]. Compared with [31,33,35], our method achieves improved regret upper bounds and does not require knowledge of graphical quantities. Numerical experiments demonstrate the computational efficiency of our approach and its effectiveness with respect to the regret upper bounds. These findings highlight the contribution of our algorithm to stochastic contextual bandits with graph feedback and open avenues for practical applications in various domains.