We study distributed reinforcement learning (RL) with policy gradient methods under asynchronous and parallel computation and communication. While non-distributed methods are well understood theoretically and have achieved remarkable empirical success, their distributed counterparts remain far less explored, particularly in the presence of heterogeneous asynchronous computations and communication bottlenecks. We introduce two new algorithms, Rennala NIGT and Malenia NIGT, which implement asynchronous policy gradient aggregation and achieve state-of-the-art efficiency. In the homogeneous setting, Rennala NIGT provably improves the total computational and communication complexity while supporting the AllReduce operation. In the heterogeneous setting, Malenia NIGT simultaneously handles asynchronous computations and heterogeneous environments with strictly better theoretical guarantees. Experiments further corroborate these results, showing that our methods significantly outperform prior approaches.
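As a rough illustration of the asynchronous-aggregation pattern the abstract describes (a minimal sketch, not the paper's actual Rennala NIGT or Malenia NIGT algorithms), the following Python snippet has a server accept the first B stochastic policy gradients to arrive from workers of varying speeds, average them, and take a normalized momentum step. All names, constants, and the toy gradient oracle are hypothetical stand-ins.

```python
import threading, queue, time, random
import numpy as np

# Hypothetical constants for the sketch: parameter dimension, batch size B,
# number of server steps, step size, and momentum coefficient.
DIM, B, STEPS, LR, MOMENTUM = 10, 4, 20, 0.1, 0.9

def stochastic_policy_gradient(theta):
    # Toy stand-in for a rollout plus policy-gradient estimate:
    # the (noisy) gradient of the concave objective -||theta||^2 / 2.
    return -theta + np.random.normal(scale=0.5, size=theta.shape)

def worker(theta_box, grad_queue, stop):
    # Each worker repeatedly reads the (possibly stale) parameters,
    # simulates a heterogeneous compute delay, and sends back a gradient.
    while not stop.is_set():
        time.sleep(random.uniform(0.001, 0.01))  # heterogeneous compute time
        grad_queue.put(stochastic_policy_gradient(theta_box[0]))

theta_box = [np.zeros(DIM)]          # shared parameters (read by workers)
grad_queue, stop = queue.Queue(), threading.Event()
threads = [threading.Thread(target=worker, args=(theta_box, grad_queue, stop))
           for _ in range(3)]
for t in threads:
    t.start()

m = np.zeros(DIM)
for step in range(STEPS):
    # Server waits only for the first B gradients to arrive, regardless of
    # which workers they come from: slow workers never block the step.
    batch = [grad_queue.get() for _ in range(B)]
    g = np.mean(batch, axis=0)
    m = MOMENTUM * m + (1 - MOMENTUM) * g                 # momentum estimate
    # Normalized ascent step in the direction of the momentum estimate.
    theta_box[0] = theta_box[0] + LR * m / (np.linalg.norm(m) + 1e-12)

stop.set()
for t in threads:
    t.join()
print("final ||theta||:", np.linalg.norm(theta_box[0]))
```

Because the server consumes whichever gradients arrive first, fast workers contribute more gradients per step while slow workers are never waited on; the gradients may be computed at stale parameters, which is the asynchrony the abstract's methods are designed to tolerate.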