The multi-agent multi-armed bandit problem has been studied extensively because it arises in many real-world applications, such as online recommendation systems and wireless networking. We consider a setting in which agents collaborate over a given graph, via some communication protocol, to minimize their group regret, and in which each agent has access to a different set of arms. Previous work on this problem has considered only one of these two features at a time: either agents with the same arm set communicate over a general graph, or agents with different arm sets communicate over a fully connected graph. In this work, we introduce a more general problem setting that captures both features. For this new setting, we first provide a rigorous regret analysis of the standard flooding protocol combined with the UCB policy. Then, to mitigate the high communication cost incurred by flooding, we propose a new protocol called Flooding with Absorption (FWA). We provide a theoretical analysis of its regret bound, together with intuition for the advantages of FWA over flooding. Lastly, we verify empirically that FWA achieves significantly lower communication costs than flooding, with only a minimal loss in regret.
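To make the flooding-plus-UCB baseline concrete, here is a minimal simulation sketch. It is an illustration under stated assumptions, not the authors' algorithm: rewards are Bernoulli, each agent uses a UCB1 index over its own arm set, every observation is flooded for at most `ttl` hops (one hop per round), duplicate copies are dropped, and communication cost is counted as the number of single-hop message transmissions. The function names `ucb_index` and `simulate_flooding_ucb` and the parameters `ttl`, `arm_sets`, and `means` are hypothetical. The absorption rule of FWA is not specified here, so only plain flooding is shown.

```python
import math
import random
from collections import defaultdict


def ucb_index(total, count, t):
    """UCB1 index: empirical mean plus exploration bonus; unobserved arms come first."""
    if count == 0:
        return math.inf
    return total / count + math.sqrt(2 * math.log(t) / count)


def simulate_flooding_ucb(graph, arm_sets, means, horizon, ttl, seed=0):
    """Hypothetical sketch: every agent runs UCB on its own arm set and floods
    each of its observations to the network for at most `ttl` hops.

    graph    -- adjacency lists, agent -> list of neighbours
    arm_sets -- agent -> set of arms that agent may pull
    means    -- arm -> Bernoulli mean (assumed reward model)
    Returns (group regret, number of single-hop message transmissions).
    """
    rng = random.Random(seed)
    agents = sorted(graph)
    sums = {a: defaultdict(float) for a in agents}   # reward totals per arm
    counts = {a: defaultdict(int) for a in agents}   # observations per arm
    seen = {a: set() for a in agents}                # message ids already handled
    inbox = {a: [] for a in agents}                  # messages arriving this round
    best = {a: max(means[k] for k in arm_sets[a]) for a in agents}
    regret, messages_sent = 0.0, 0

    for t in range(1, horizon + 1):
        outbox = {a: [] for a in agents}
        # 1) Use each new incoming observation and re-forward it if hops remain.
        for a in agents:
            for msg_id, arm, reward, hops_left, sender in inbox[a]:
                if msg_id in seen[a]:
                    continue  # duplicate copy that arrived over another path
                seen[a].add(msg_id)
                if arm in arm_sets[a]:  # only arms this agent can pull update its statistics
                    sums[a][arm] += reward
                    counts[a][arm] += 1
                if hops_left > 0:
                    outbox[a].append((msg_id, arm, reward, hops_left - 1, sender))
        # 2) Each agent pulls the arm with the highest UCB index in its own arm set.
        for a in agents:
            arm = max(sorted(arm_sets[a]),
                      key=lambda k: ucb_index(sums[a][k], counts[a][k], t))
            reward = 1.0 if rng.random() < means[arm] else 0.0
            sums[a][arm] += reward
            counts[a][arm] += 1
            regret += best[a] - means[arm]
            msg_id = (a, t)
            seen[a].add(msg_id)
            if ttl > 0:
                outbox[a].append((msg_id, arm, reward, ttl - 1, None))
        # 3) Flood: send every outgoing message to all neighbours except the one it came from.
        inbox = {a: [] for a in agents}
        for a in agents:
            for msg_id, arm, reward, hops_left, came_from in outbox[a]:
                for nb in graph[a]:
                    if nb != came_from:
                        inbox[nb].append((msg_id, arm, reward, hops_left, a))
                        messages_sent += 1
    return regret, messages_sent


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1]}            # three agents on a path graph
    arm_sets = {0: {0, 1}, 1: {1, 2}, 2: {0, 2}}  # each agent sees different arms
    means = {0: 0.9, 1: 0.5, 2: 0.2}
    for ttl in (0, 1, 2):
        print(ttl, simulate_flooding_ucb(path, arm_sets, means, horizon=10, ttl=ttl))
```

On this three-agent path, the message count depends only on the graph and `ttl`, not on the arms chosen. With `ttl=1`, each round costs 4 transmissions; with `ttl=2`, the middle agent also relays each neighbour's observation to the other end, adding 2 transmissions per round after the first. Raising `ttl` spreads observations further at a higher communication cost, which is the overhead that motivates FWA.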