The identification and analysis of symmetrical patterns in the natural world have led to significant discoveries across scientific fields, such as the formulation of gravitational laws in physics and advances in the study of chemical structures. In this paper, we focus on exploiting the Euclidean symmetries that are inherent in certain cooperative multi-agent reinforcement learning (MARL) problems and prevalent in many applications. We begin by formally characterizing a subclass of Markov games with a general notion of symmetries that admits the existence of symmetric optimal values and policies. Motivated by these properties, we design neural network architectures that embed the symmetry constraints as an inductive bias for multi-agent actor-critic methods. This inductive bias yields superior performance on various cooperative MARL benchmarks and strong generalization, including zero-shot learning and transfer learning in unseen scenarios with repeated symmetric patterns. The code is available at: https://github.com/dchen48/E3AC.
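To make the notion of symmetric values and policies concrete, the following is a minimal toy sketch, not the architecture from the paper or its repository. It assumes agents are points in 3D space and uses hypothetical helper names (`invariant_value`, `equivariant_policy`, `rotate_z`, `translate`). The toy critic depends only on pairwise distances, so its output should not change under a rotation and translation of all agents. The toy actor sums relative position vectors weighted by distance-based scalars, so its output actions should rotate together with the input.

```python
import math
import random


def rotate_z(p, theta):
    """Rotate a 3D vector about the z-axis by theta radians."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)


def translate(p, t):
    """Shift a 3D point by the offset t."""
    return tuple(a + b for a, b in zip(p, t))


def invariant_value(positions):
    """Toy critic (hypothetical): a function of pairwise distances only.

    Distances are preserved by rotations and translations, so the value
    is the same for a scene and its transformed copy.
    """
    total = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            total += math.dist(positions[i], positions[j])
    return -total


def equivariant_policy(positions):
    """Toy actor (hypothetical): one 3D action vector per agent.

    Each action is a sum of relative position vectors scaled by an
    invariant weight, so rotating the scene rotates every action.
    """
    actions = []
    for i, p in enumerate(positions):
        action = [0.0, 0.0, 0.0]
        for j, q in enumerate(positions):
            if i == j:
                continue
            weight = math.exp(-math.dist(p, q))  # depends on distance only
            for k in range(3):
                action[k] += weight * (q[k] - p[k])
        actions.append(tuple(action))
    return actions


if __name__ == "__main__":
    random.seed(0)
    scene = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(4)]
    theta, offset = 0.7, (0.3, -1.2, 2.0)
    moved = [translate(rotate_z(p, theta), offset) for p in scene]

    # Invariance: the value is unchanged by the transformation.
    print(math.isclose(invariant_value(scene), invariant_value(moved), abs_tol=1e-9))

    # Equivariance: acting on the moved scene equals rotating the original actions.
    rotated_actions = [rotate_z(a, theta) for a in equivariant_policy(scene)]
    print(all(
        math.isclose(x, y, abs_tol=1e-9)
        for ra, ma in zip(rotated_actions, equivariant_policy(moved))
        for x, y in zip(ra, ma)
    ))
```

A network constrained in this way never has to learn separately that a rotated or shifted copy of a scenario calls for the correspondingly rotated actions, which is one intuition for why such an inductive bias may help generalization to unseen but symmetric scenarios.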