We study distributed reinforcement learning (RL) with policy gradient methods under asynchronous and parallel computations and communications. While non-distributed methods are well understood theoretically and have achieved remarkable empirical success, their distributed counterparts remain less explored, particularly in the presence of heterogeneous asynchronous computations and communication bottlenecks. We introduce two new algorithms, Rennala NIGT and Malenia NIGT, which implement asynchronous policy gradient aggregation and achieve state-of-the-art efficiency. In the homogeneous setting, Rennala NIGT provably improves the total computational and communication complexity while supporting the AllReduce operation. In the heterogeneous setting, Malenia NIGT simultaneously handles asynchronous computations and heterogeneous environments with strictly better theoretical guarantees. Our results are further corroborated by experiments, showing that our methods significantly outperform prior approaches.