Symmetry is an important inductive bias that can improve model robustness and generalization across many deep learning domains. In multi-agent settings, a priori known symmetries have been shown to address a fundamental coordination failure mode known as mutually incompatible symmetry breaking. This failure arises, for example, in a game where two independent agents each choose to move "left" or "right" and receive a reward of +1 when they choose the same action and -1 when they choose different actions. However, the efficient and automatic discovery of environment symmetries, in particular for decentralized partially observable Markov decision processes, remains an open problem. Furthermore, environmental symmetry breaking constitutes only one type of coordination failure, which motivates the search for a more accessible and broader symmetry class. In this paper, we introduce such a broader group of previously unexplored symmetries, which we call expected return symmetries and which contains the environment symmetries as a subgroup. We show that agents trained to be compatible under the group of expected return symmetries achieve better zero-shot coordination results than those using environment symmetries. As an additional benefit, our method makes minimal a priori assumptions about the structure of the environment and does not require access to ground truth symmetries.
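The two-action game mentioned in the abstract can be written out as a minimal sketch. The function and variable names here (`reward`, `swap`, `policy_a`, `policy_b`) are illustrative assumptions, not taken from the paper, and the deterministic policies stand in for whatever independently trained agents might converge to. The sketch checks that relabelling "left" and "right" leaves the reward unchanged, and that two policies which each score +1 with themselves score -1 when paired with each other.

```python
import itertools

ACTIONS = ("left", "right")


def reward(a1, a2):
    """Shared reward: +1 if both agents pick the same action, -1 otherwise."""
    return 1 if a1 == a2 else -1


def swap(action):
    """Relabel the actions: left <-> right (a hypothetical symmetry map)."""
    return "right" if action == "left" else "left"


# Applying the relabelling to both agents leaves the reward unchanged,
# so the relabelling is a symmetry of this toy game.
symmetry_holds = all(
    reward(a1, a2) == reward(swap(a1), swap(a2))
    for a1, a2 in itertools.product(ACTIONS, repeat=2)
)


# Two hypothetical conventions that separate training runs might settle on.
def policy_a():
    return "left"


def policy_b():
    return "right"


# Each convention is optimal when paired with itself (self-play) ...
self_play_a = reward(policy_a(), policy_a())
self_play_b = reward(policy_b(), policy_b())

# ... but the two conventions fail when paired with each other (cross-play).
cross_play = reward(policy_a(), policy_b())

print(symmetry_holds, self_play_a, self_play_b, cross_play)
```

Both conventions are equally good in isolation; the failure only appears when agents that broke the symmetry in different ways are asked to coordinate without prior agreement, which is the zero-shot setting the abstract refers to.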