Research in model-based reinforcement learning has made significant progress in recent years. Compared to single-agent settings, the dimension of the joint state-action space in multi-agent systems grows exponentially with the number of agents, which dramatically increases the complexity of the environment dynamics. This makes it infeasible to learn an accurate global model and necessitates the use of agent-wise local models. However, during multi-step model rollouts, the prediction of one local model can affect the predictions of other local models at the next step. As a result, local prediction errors can propagate to other localities and eventually give rise to large global errors. Furthermore, since the models are generally used to predict multiple steps ahead, simply minimizing one-step prediction errors without regard to their long-term effect on other models may further aggravate the propagation of local errors. To this end, we propose Models as AGents (MAG), a multi-agent model optimization framework that reverses the usual roles: it treats the local models as multi-step decision-making agents and the current policies as the dynamics during the model rollout process. In this way, the local models are able to consider their multi-step mutual effects on each other before making predictions. Theoretically, we show that the objective of MAG is approximately equivalent to maximizing a lower bound of the true environment return. Experiments on the challenging StarCraft II benchmark demonstrate the effectiveness of MAG.
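The error-propagation effect described above can be illustrated with a toy sketch. This is not the MAG algorithm or the paper's experimental setup: the two-agent linear dynamics, the coupling constants `A` and `B`, and the function names `true_step`, `model_step`, and `rollout_errors` are all hypothetical choices made for illustration. The sketch assumes that agent 0's local model has a small one-step bias while agent 1's local model is exact, and it tracks how far each predicted local state drifts from the true one over a rollout.

```python
# Toy illustration (assumed setup, not the paper's method): two agents whose
# local states are coupled, each predicted by its own local model.

A = 0.9  # hypothetical self-transition coefficient
B = 0.2  # hypothetical coupling to the other agent's state


def true_step(s):
    """True dynamics: each local state depends on both agents' states."""
    return [A * s[0] + B * s[1], A * s[1] + B * s[0]]


def model_step(s, bias):
    """Local models: same form as the true dynamics plus a per-agent bias."""
    return [A * s[0] + B * s[1] + bias[0], A * s[1] + B * s[0] + bias[1]]


def rollout_errors(horizon, bias, s0=(1.0, -0.5)):
    """Roll both systems forward and return per-step absolute errors."""
    s_true = list(s0)
    s_model = list(s0)
    errors = []
    for _ in range(horizon):
        s_true = true_step(s_true)
        # The models consume their own previous predictions, so an error
        # made by one local model feeds into the other at the next step.
        s_model = model_step(s_model, bias)
        errors.append([abs(s_model[i] - s_true[i]) for i in range(2)])
    return errors


if __name__ == "__main__":
    # Only agent 0's model is biased; agent 1's model is exact in one step.
    for t, (e0, e1) in enumerate(rollout_errors(5, [0.01, 0.0]), start=1):
        print(f"step {t}: agent 0 error = {e0:.4f}, agent 1 error = {e1:.4f}")
```

In this sketch, agent 1's prediction is error-free at the first step but picks up error from the second step onward, purely because it consumes agent 0's biased prediction. That is the kind of cross-model effect a one-step training loss does not account for.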