We introduce an approach to improving team performance in a Multi-Agent Multi-Armed Bandit (MAMAB) framework using the Fastest Mixing Markov Chain (FMMC) and Fastest Distributed Linear Averaging (FDLA) optimization algorithms. The multi-agent team is represented as a fixed relational network and simulated with the Coop-UCB2 algorithm. Because the edge weights of the communication network directly affect the time taken to reach distributed consensus, our goal is to shorten the timescale on which consensus converges, and thereby achieve optimal team performance and maximize reward. Our experiments show that convergence to team consensus is slightly faster in large constrained networks.
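To illustrate why edge weights matter for consensus time, the following is a minimal sketch, not the FMMC or FDLA optimization itself and not the Coop-UCB2 simulation. It assumes a hypothetical 5-node path graph, a single constant edge weight `alpha`, and the standard linear averaging update x(t+1) = W x(t) with W = I - alpha * L, where L is the graph Laplacian. The function names, the graph, the two weight values, and the tolerance are all illustrative choices rather than anything taken from the experiments.

```python
def weight_matrix(n, edges, alpha):
    """Build W = I - alpha * L for an undirected graph with one constant edge weight."""
    W = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        W[i][j] += alpha  # weight an agent places on its neighbor's value
        W[j][i] += alpha
        W[i][i] -= alpha  # keep every row (and column) summing to 1
        W[j][j] -= alpha
    return W


def consensus_steps(W, x0, tol=1e-6, max_steps=10_000):
    """Iterate x <- W x until every entry is within tol of the initial mean."""
    x = list(x0)
    n = len(x)
    mean = sum(x) / n  # preserved by the update because W is doubly stochastic
    for step in range(max_steps):
        if max(abs(v - mean) for v in x) < tol:
            return step, x
        x = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
    return max_steps, x


# Hypothetical example: 5 agents on a path graph, one agent holds all the information.
n = 5
path_edges = [(i, i + 1) for i in range(n - 1)]
x0 = [1.0, 0.0, 0.0, 0.0, 0.0]

# Two illustrative (not optimized) constant edge weights.
steps_slow, x_slow = consensus_steps(weight_matrix(n, path_edges, 0.1), x0)
steps_fast, x_fast = consensus_steps(weight_matrix(n, path_edges, 0.3), x0)

print("steps with alpha=0.1:", steps_slow)
print("steps with alpha=0.3:", steps_fast)
```

In this sketch the larger weight reaches the tolerance in fewer iterations because the second-largest eigenvalue modulus of W is smaller. FMMC and FDLA search for weights that minimize that quantity, subject to their respective constraints, instead of relying on a hand-picked constant.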