This paper addresses a particular instance of the multi-agent linear stochastic bandit problem, called clustered multi-agent linear bandits. In this setting, we propose a novel algorithm that leverages efficient collaboration between agents to accelerate the overall optimization. A network controller is responsible for estimating the underlying cluster structure of the network and for optimizing experience sharing among agents within the same group. We provide a theoretical analysis of both the regret minimization problem and the clustering quality. Through empirical evaluation against state-of-the-art algorithms on both synthetic and real data, we demonstrate the effectiveness of our approach: our algorithm significantly reduces regret while recovering the true underlying cluster partition.