The identification and analysis of symmetric patterns in the natural world have led to significant discoveries across scientific fields, such as the formulation of gravitational laws in physics and advances in the study of chemical structures. In this paper, we focus on exploiting Euclidean symmetries that are inherent in certain cooperative multi-agent reinforcement learning (MARL) problems and prevalent in many applications. We begin by formally characterizing a subclass of Markov games with a general notion of symmetry that admits the existence of symmetric optimal values and policies. Motivated by these properties, we design neural network architectures with symmetry constraints embedded as an inductive bias for multi-agent actor-critic methods. This inductive bias yields superior performance on various cooperative MARL benchmarks, together with strong generalization capabilities such as zero-shot learning and transfer learning in unseen scenarios with repeated symmetric patterns. The code is available at: https://github.com/dchen48/E3AC.
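To make the notion of symmetric values and policies concrete, the following is a minimal sketch of what such a constraint can look like for agents in the plane. It is not the architecture from the linked repository: the function names (`transform`, `invariant_value`, `equivariant_actions`) and the toy functional forms are assumptions chosen only for illustration. The toy critic depends solely on pairwise distances, so its output should not change under rotations and translations, while the toy actor combines relative position vectors with distance-based weights, so its outputs should rotate with the agents.

```python
import math


def transform(points, theta, shift=(0.0, 0.0)):
    """Rotate each 2D point by angle theta, then translate it by shift."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1])
            for x, y in points]


def invariant_value(points):
    """Toy critic (illustrative assumption): uses only pairwise distances,
    which rotations and translations leave unchanged."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        for xj, yj in points[i + 1:]:
            total += math.tanh(math.hypot(xj - xi, yj - yi))
    return total


def equivariant_actions(points):
    """Toy actor (illustrative assumption): each agent's action is a sum of
    relative vectors to the other agents, weighted by a function of distance.
    Relative vectors rotate with the input and ignore translations."""
    actions = []
    for i, (xi, yi) in enumerate(points):
        ax = ay = 0.0
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            dx, dy = xj - xi, yj - yi
            w = math.exp(-math.hypot(dx, dy))  # invariant scalar weight
            ax += w * dx
            ay += w * dy
        actions.append((ax, ay))
    return actions
```

Under these assumptions, moving every agent by the same rotation and translation leaves `invariant_value` unchanged and rotates the output of `equivariant_actions` by the same angle. This is the kind of constraint that an inductive bias of this sort builds into the network rather than leaving it to be learned from data.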