Recent research efforts have theoretically established the beneficial effect of cooperation in multi-agent reinforcement learning (MARL). In a setting involving $N$ agents, this benefit usually comes in the form of an $N$-fold linear convergence speedup, i.e., a reduction, proportional to $N$, in the number of iterations required to reach a given convergence precision. In this paper, we show for the first time that this speedup property also holds for a MARL framework subject to asynchronous delays in the local agents' updates. In particular, we consider a policy evaluation problem in which multiple agents cooperate to evaluate a common policy by communicating with a central aggregator. In this setting, we study the finite-time convergence of \texttt{AsyncMATD}, an asynchronous multi-agent temporal difference (TD) learning algorithm in which agents' local TD update directions are subject to asynchronous bounded delays. Our main contribution is a finite-time analysis of \texttt{AsyncMATD}, which establishes a linear convergence speedup while highlighting the effect of time-varying asynchronous delays on the resulting convergence rate.
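To make the setting concrete, the following is a minimal sketch of an \texttt{AsyncMATD}-style update loop. It assumes details not given in the abstract: linear value approximation $V(s) = \phi(s)^\top \theta$, synthetic on-policy transitions, and uniformly random delays bounded by a constant; all names (\texttt{N\_AGENTS}, \texttt{TAU\_MAX}, \texttt{td\_direction}, etc.) are illustrative placeholders rather than notation from the paper.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 8    # number of cooperating agents
D = 4           # feature dimension
GAMMA = 0.9     # discount factor
ALPHA = 0.05    # step size
TAU_MAX = 5     # uniform bound on the asynchronous delays
T = 200         # number of server iterations

def sample_transition():
    """Toy stand-in for one on-policy transition (phi(s), r, phi(s'))."""
    phi_s = rng.standard_normal(D)
    phi_next = rng.standard_normal(D)
    reward = rng.standard_normal()
    return phi_s, reward, phi_next

def td_direction(theta, phi_s, reward, phi_next):
    """Local TD(0) update direction under linear value approximation."""
    td_error = reward + GAMMA * phi_next @ theta - phi_s @ theta
    return td_error * phi_s

theta = np.zeros(D)
history = [theta.copy()]  # past iterates, used to model staleness

for t in range(T):
    # Each agent's direction is evaluated at a stale iterate
    # theta_{t - tau_i(t)}, with tau_i(t) <= TAU_MAX.
    directions = []
    for _ in range(N_AGENTS):
        tau = rng.integers(0, min(TAU_MAX, t) + 1)  # bounded delay
        stale_theta = history[t - tau]
        directions.append(td_direction(stale_theta, *sample_transition()))
    # The central aggregator averages the N delayed TD directions;
    # averaging over N agents is the source of the linear speedup.
    theta = theta + ALPHA * np.mean(directions, axis=0)
    history.append(theta.copy())
\end{verbatim}

The averaging step over $N$ delayed directions is where the variance reduction, and hence the $N$-fold speedup, would arise, while the bounded staleness \texttt{TAU\_MAX} plays the role of the asynchronous delay whose effect on the convergence rate the analysis quantifies.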