Multi-armed bandits are widely used to model sequential decision-making, with applications such as online recommender systems and wireless networking. We consider a multi-agent setting in which each agent solves its own bandit instance, endowed with a different set of arms. The agents aim to minimize their group regret while collaborating via a communication protocol over a given network. Previous literature on this problem has considered arm heterogeneity and networked agents only separately. In this work, we introduce a setting that encompasses both features. For this novel setting, we first provide a rigorous regret analysis for a standard flooding protocol combined with the classic UCB policy. Then, to mitigate the high communication costs incurred by flooding in complex networks, we propose a new protocol called Flooding with Absorption (FwA). We provide a theoretical analysis of the resulting regret bound and discuss the advantages of using FwA over flooding. Lastly, we verify experimentally in various scenarios, including dynamic networks, that FwA incurs significantly lower communication costs than other network protocols, with only a minimal loss in regret performance.
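For readers unfamiliar with the UCB policy mentioned above, the following is a minimal sketch of the textbook UCB1 rule restricted to one agent's own arm set. It is not the paper's algorithm: the names `ucb_index`, `select_arm`, and `record` are hypothetical, the exploration constant is the standard UCB1 choice, and folding observations received from other agents into the same counters via `record` is an assumption made only for illustration.

```python
import math


def ucb_index(mean_reward, pulls, t):
    """Textbook UCB1 index: empirical mean plus an exploration bonus."""
    return mean_reward + math.sqrt(2.0 * math.log(t) / pulls)


def select_arm(arm_set, pulls, reward_sums, t):
    """Pick an arm from this agent's own arm set at round t >= 1."""
    # Arms with no observations yet are tried first, in the order given.
    for arm in arm_set:
        if pulls.get(arm, 0) == 0:
            return arm
    # Otherwise choose the arm with the largest UCB1 index.
    return max(
        arm_set,
        key=lambda a: ucb_index(reward_sums[a] / pulls[a], pulls[a], t),
    )


def record(pulls, reward_sums, arm, reward):
    """Update the counters with one observed (arm, reward) pair.

    Assumption: the same update is used for the agent's own pulls and for
    observations relayed by other agents over the network.
    """
    pulls[arm] = pulls.get(arm, 0) + 1
    reward_sums[arm] = reward_sums.get(arm, 0.0) + reward
```

In a networked version, each agent would call `select_arm` only on the arms it actually owns, while `record` could be fed any relayed observation about those arms. How messages are forwarded or absorbed is the job of the communication protocol and is not modeled in this sketch.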