Symmetry is an important inductive bias that can improve model robustness and generalization across many deep learning domains. In multi-agent settings, a priori known symmetries have been shown to address a fundamental coordination failure mode known as mutually incompatible symmetry breaking. For example, consider a game in which two independent agents each choose to move "left" or "right" and receive a reward of +1 when they choose the same action and -1 when they choose different actions; agents trained separately can settle on opposite conventions and then fail to coordinate when paired. However, the efficient and automatic discovery of environment symmetries, in particular for decentralized partially observable Markov decision processes, remains an open problem. Furthermore, environmental symmetry breaking constitutes only one type of coordination failure, which motivates the search for a more accessible and broader symmetry class. In this paper, we introduce such a broader group of previously unexplored symmetries, called expected return symmetries, that contains the environment symmetries as a subgroup. We show that agents trained to be compatible under the group of expected return symmetries achieve better zero-shot coordination results than those using environment symmetries. As an additional benefit, our method makes minimal a priori assumptions about the structure of the environment and does not require access to ground truth symmetries.
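The two-action game used as the example in the abstract can be worked through in a few lines. The following is only an illustrative sketch, not the paper's method: the names `reward`, `swap`, `conventions`, and `cross_play` are hypothetical, and the game is assumed to be a one-shot matrix game with deterministic policies. It shows that both conventions are optimal in self-play, that the left/right relabeling leaves the reward unchanged and maps one convention onto the other, and that pairing agents drawn from independently chosen conventions averages out to zero return.

```python
import itertools

ACTIONS = ("left", "right")


def reward(a1, a2):
    """Return +1 if both agents pick the same action, -1 otherwise."""
    return 1 if a1 == a2 else -1


def swap(action):
    """Assumed environment symmetry: relabel left <-> right."""
    return "right" if action == "left" else "left"


# The two optimal deterministic joint policies ("conventions").
# Each entry is (agent-1 action, agent-2 action).
conventions = [("left", "left"), ("right", "right")]


def cross_play(conv_a, conv_b):
    """Pair agent 1 from conv_a with agent 2 from conv_b."""
    return reward(conv_a[0], conv_b[1])


# The relabeling preserves the reward for every joint action.
is_symmetry = all(
    reward(swap(a1), swap(a2)) == reward(a1, a2)
    for a1, a2 in itertools.product(ACTIONS, repeat=2)
)

# Applying the relabeling to one convention yields the other one.
swapped = tuple(swap(a) for a in conventions[0])

# Self-play: each convention paired with itself.
self_play = [cross_play(c, c) for c in conventions]

# Cross-play: every pairing of independently chosen conventions.
cross = [cross_play(a, b) for a, b in itertools.product(conventions, repeat=2)]
mean_cross = sum(cross) / len(cross)

print("swap is a symmetry:", is_symmetry)
print("self-play returns:", self_play)
print("cross-play returns:", cross, "mean:", mean_cross)
```

Each convention is individually optimal, yet the choice between them is arbitrary, so partners that broke the symmetry differently receive -1. This is the failure that symmetry-aware training is meant to avoid.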