The identification and analysis of symmetric patterns in nature have led to significant discoveries across scientific fields, such as the formulation of gravitational laws in physics and advances in the study of chemical structures. In this paper, we focus on exploiting the Euclidean symmetries that are inherent in certain cooperative multi-agent reinforcement learning (MARL) problems and prevalent in many applications. We begin by formally characterizing a subclass of Markov games with a general notion of symmetries that admits the existence of symmetric optimal values and policies. Motivated by these properties, we design neural network architectures with symmetric constraints embedded as an inductive bias for multi-agent actor-critic methods. This inductive bias yields superior performance on various cooperative MARL benchmarks and strong generalization, including zero-shot learning and transfer learning in unseen scenarios with repeated symmetric patterns. The code is available at: https://github.com/dchen48/E3AC.
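To make the notion of symmetric values and policies concrete, the following minimal sketch checks both properties numerically on a toy problem. It is not the E3AC architecture from the repository above: the functions `toy_policy`, `toy_value`, and `rotation` are hypothetical names introduced for illustration, and the 2D setting with three agents is an assumption. The policy moves each agent toward the team centroid, so rotating and translating all agent positions should rotate the actions by the same angle (equivariance) and leave the value unchanged (invariance).

```python
import numpy as np


def rotation(theta):
    """2D rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def toy_policy(positions):
    """Hypothetical policy: each agent steps toward the team centroid.

    positions has shape (n_agents, 2); the result has the same shape.
    """
    return positions.mean(axis=0, keepdims=True) - positions


def toy_value(positions):
    """Hypothetical value: negative mean squared distance to the centroid."""
    centered = positions - positions.mean(axis=0, keepdims=True)
    return -float(np.mean(np.sum(centered ** 2, axis=1)))


rng = np.random.default_rng(0)
positions = rng.normal(size=(3, 2))  # three agents in the plane (assumed setup)

# Apply one Euclidean transformation (rotation plus translation) to all agents.
R = rotation(0.7)
shift = np.array([1.5, -2.0])
transformed = positions @ R.T + shift

# Equivariance: actions are direction vectors, so they rotate with the scene
# but are unaffected by the translation.
policy_is_equivariant = np.allclose(
    toy_policy(transformed), toy_policy(positions) @ R.T
)

# Invariance: the value does not change under the transformation.
value_is_invariant = np.isclose(toy_value(transformed), toy_value(positions))
```

A network whose layers satisfy these two checks by construction never has to learn the symmetry from data, which is the sense in which the symmetric constraint acts as an inductive bias.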