Various approaches have emerged for multi-armed bandits in distributed systems. The multiplayer dueling bandit problem, which arises in scenarios where only preference-based information such as human feedback is available, poses the additional challenge of controlling collaborative exploration of non-informative arm pairs, yet it has received little attention. To fill this gap, we show that a direct, black-box use of the Follow Your Leader approach matches the lower bound for this setting when known dueling bandit algorithms serve as the foundation. We also analyze a fully distributed message-passing approach with a novel Condorcet-winner recommendation protocol, which speeds up exploration in many cases. In our experiments, the multiplayer algorithms outperform single-player benchmark algorithms, which supports their effectiveness in the multiplayer dueling bandit setting.
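To make the Follow Your Leader idea concrete, the following is a minimal sketch under stated assumptions, not the algorithm analyzed in this work. It assumes that one player (the leader) runs a base dueling bandit routine while the remaining players duel the leader's current Condorcet-winner candidate against itself, and that a Condorcet winner exists. The class `NaiveLeader`, the function `run_follow_your_leader`, the uniform pair selection, and the preference matrix `P_EXAMPLE` are all hypothetical stand-ins introduced for illustration.

```python
import random


def condorcet_winner(P):
    """Return the arm that beats every other arm with probability > 1/2, or None."""
    K = len(P)
    for i in range(K):
        if all(P[i][j] > 0.5 for j in range(K) if j != i):
            return i
    return None


def duel(P, i, j, rng):
    """Simulate one duel; returns True if arm i beats arm j."""
    return rng.random() < P[i][j]


class NaiveLeader:
    """Hypothetical stand-in for a known dueling bandit algorithm (assumption).

    It explores pairs uniformly at random, which is NOT regret-optimal;
    a real base algorithm would replace select_pair.
    """

    def __init__(self, K, rng):
        self.K = K
        self.rng = rng
        self.wins = [[0] * K for _ in range(K)]  # wins[i][j]: times i beat j

    def select_pair(self):
        i, j = self.rng.sample(range(self.K), 2)
        return i, j

    def update(self, i, j, i_won):
        if i_won:
            self.wins[i][j] += 1
        else:
            self.wins[j][i] += 1

    def candidate(self):
        """Arm holding the most empirical pairwise majorities so far."""
        def score(i):
            return sum(
                1 for j in range(self.K)
                if j != i and self.wins[i][j] > self.wins[j][i]
            )
        return max(range(self.K), key=score)


def run_follow_your_leader(P, n_players, horizon, seed=0):
    """Simulate the assumed scheme; returns (final candidate, cumulative regret)."""
    rng = random.Random(seed)
    w = condorcet_winner(P)  # assumes a Condorcet winner exists
    leader = NaiveLeader(len(P), rng)
    regret = 0.0
    for _ in range(horizon):
        # The leader explores one pair and observes the duel outcome.
        i, j = leader.select_pair()
        leader.update(i, j, duel(P, i, j, rng))
        regret += ((P[w][i] - 0.5) + (P[w][j] - 0.5)) / 2
        # Each follower plays the pair (c, c) for the leader's candidate c.
        c = leader.candidate()
        regret += (n_players - 1) * (P[w][c] - 0.5)
    return leader.candidate(), regret


# Illustrative preference matrix: P_EXAMPLE[i][j] = Pr(arm i beats arm j).
P_EXAMPLE = [
    [0.5, 0.9, 0.9],
    [0.1, 0.5, 0.6],
    [0.1, 0.4, 0.5],
]

if __name__ == "__main__":
    cand, reg = run_follow_your_leader(P_EXAMPLE, n_players=4, horizon=3000)
    print("candidate:", cand, "cumulative regret:", round(reg, 2))
```

In this sketch the followers stop incurring regret once the leader's candidate coincides with the Condorcet winner, since dueling the winner against itself costs nothing; only the leader keeps paying for exploration. The fully distributed message-passing variant is not modeled here.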