We study the problem of selecting a subset of a large action space shared by a family of bandits, with the goal of achieving performance close to that obtained with the full action space. In many natural settings the nominal set of actions is large, but the rewards of different actions are significantly correlated. We propose an algorithm that can significantly reduce the action space when such correlations are present, without requiring prior knowledge of the correlation structure. We provide theoretical guarantees on the performance of the algorithm and demonstrate its practical effectiveness through empirical comparisons with Thompson Sampling and Upper Confidence Bound methods.