Because reinforcement learning has high sample complexity, simulation remains critical to its successful application. Many real-world problems, however, exhibit dynamics so complex that their full-scale simulation is computationally slow. In this paper, we show how to decompose large networked systems of many agents into multiple local components, so that we can build separate simulators that run independently and in parallel. To monitor the influence that the different local components exert on one another, each of these simulators is equipped with a learned model that is periodically trained on real trajectories. Our empirical results reveal that distributing the simulation among different processes not only makes it possible to train large multi-agent systems in just a few hours but also helps mitigate the negative effects of simultaneous learning.
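The decomposition described above can be illustrated with a minimal sketch. It is not the paper's implementation: the ring topology, the binary states, the `local_step` transition, and the count-based `InfluenceModel` are all hypothetical choices made only to keep the example small. A slow global simulator produces real trajectories, each agent's influence model is fit on them, and each `LocalSimulator` then steps on its own by sampling the neighbour's influence from its model instead of querying the other agents.

```python
import random

N_AGENTS = 4  # assumed toy system: agents arranged on a ring


def local_step(state, influence, action):
    """Assumed local transition: depends only on the agent's own state,
    its action, and the influence received from its neighbour."""
    return (state + action + influence) % 2


def global_rollout(steps, rng):
    """Slow 'real' simulator: steps all agents jointly and records,
    per agent, the (local state, influence) pairs it observed."""
    states = [rng.randint(0, 1) for _ in range(N_AGENTS)]
    data = [[] for _ in range(N_AGENTS)]
    for _ in range(steps):
        actions = [rng.randint(0, 1) for _ in range(N_AGENTS)]
        # Influence on agent i is the state of its left neighbour.
        influences = [states[i - 1] for i in range(N_AGENTS)]
        for i in range(N_AGENTS):
            data[i].append((states[i], influences[i]))
        states = [local_step(states[i], influences[i], actions[i])
                  for i in range(N_AGENTS)]
    return data


class InfluenceModel:
    """Count-based estimate of P(influence = 1 | local state).
    A stand-in for the learned model; any predictor could be used."""

    def __init__(self):
        # Laplace prior so probabilities are never exactly 0 or 1.
        self.counts = {0: [1, 1], 1: [1, 1]}

    def fit(self, pairs):
        # Incremental, so it can be called again on fresh trajectories.
        for state, influence in pairs:
            self.counts[state][influence] += 1

    def prob_one(self, state):
        c0, c1 = self.counts[state]
        return c1 / (c0 + c1)

    def sample(self, state, rng):
        return 1 if rng.random() < self.prob_one(state) else 0


class LocalSimulator:
    """Simulates one agent in isolation, using the influence model
    in place of the rest of the system."""

    def __init__(self, model, seed):
        self.model = model
        self.rng = random.Random(seed)
        self.state = 0

    def step(self, action):
        influence = self.model.sample(self.state, self.rng)
        self.state = local_step(self.state, influence, action)
        return self.state


def build_local_simulators(steps=500, seed=0):
    """Collect real trajectories once and fit one model per agent."""
    rng = random.Random(seed)
    data = global_rollout(steps, rng)
    sims = []
    for i in range(N_AGENTS):
        model = InfluenceModel()
        model.fit(data[i])
        sims.append(LocalSimulator(model, seed=seed + 1 + i))
    return sims
```

Because each `LocalSimulator` holds its own state, model, and random generator, the instances share nothing at step time and could be placed in separate processes. Periodic retraining would amount to calling `global_rollout` again and passing the new pairs to each model's `fit`.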