This paper addresses a particular instance of the multi-agent linear stochastic bandit problem, called clustered multi-agent linear bandits. In this setting, we propose a novel algorithm that leverages efficient collaboration between agents to accelerate the overall optimization. A network controller is responsible for estimating the underlying cluster structure of the network and for optimizing the sharing of experience among agents within the same cluster. We provide a theoretical analysis of both the regret minimization problem and the clustering quality. Through empirical evaluation against state-of-the-art algorithms on both synthetic and real data, we demonstrate the effectiveness of our approach: our algorithm significantly reduces regret while recovering the true underlying cluster partition.
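To make the controller's two roles concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes that each agent reports a parameter estimate, that the controller links two agents whenever their estimates are closer than a fixed threshold, and that clusters are the connected components of the resulting graph. The names `estimate_clusters`, `pool_statistics`, and `threshold` are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import dist


def estimate_clusters(theta_hat, threshold):
    """Group agents by closeness of their parameter estimates.

    Assumed rule (not taken from the paper): link two agents when the
    Euclidean distance between their estimates is below `threshold`,
    then return the connected components as clusters.
    """
    n = len(theta_hat)
    parent = list(range(n))  # union-find forest over agent indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if dist(theta_hat[i], theta_hat[j]) < threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def pool_statistics(gram, moment, cluster):
    """Sum per-agent Gram matrices and reward-weighted feature vectors.

    The pooled pair is what a ridge-regression estimate for the whole
    cluster would be computed from.
    """
    d = len(moment[cluster[0]])
    pooled_gram = [
        [sum(gram[a][r][c] for a in cluster) for c in range(d)]
        for r in range(d)
    ]
    pooled_moment = [sum(moment[a][r] for a in cluster) for r in range(d)]
    return pooled_gram, pooled_moment


# Four agents whose estimates fall into two well-separated groups.
theta_hat = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
clusters = estimate_clusters(theta_hat, threshold=0.5)
print(clusters)
```

Pooling sufficient statistics rather than raw observations is one plausible way to share experience within a cluster, since each agent then only needs to transmit a matrix and a vector of fixed size.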