Symmetry is an important inductive bias that can improve model robustness and generalization across many deep learning domains. In multi-agent settings, a priori known symmetries have been shown to address a fundamental coordination failure mode known as mutually incompatible symmetry breaking. For example, consider a game in which two independent agents each choose to move "left" or "right" and receive a reward of +1 if they choose the same action and -1 if they choose different actions; independently trained agents can settle on opposite conventions and then fail to coordinate when paired. However, the efficient and automatic discovery of environment symmetries, in particular for decentralized partially observable Markov decision processes, remains an open problem. Furthermore, environment symmetry breaking constitutes only one type of coordination failure, which motivates the search for a more accessible and broader symmetry class. In this paper, we introduce such a broader group of previously unexplored symmetries, which we call expected return symmetries and which contains the environment symmetries as a subgroup. We show that agents trained to be compatible under the group of expected return symmetries achieve better zero-shot coordination results than those using environment symmetries. As an additional benefit, our method makes minimal a priori assumptions about the structure of the environment and does not require access to ground-truth symmetries.
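The left/right game above can be made concrete with a small sketch. This is only an illustration of the coordination failure, not the method proposed in the paper; the names `reward`, `swap`, and `expected_return` are hypothetical, and each agent's policy is assumed to be a single probability of choosing "left".

```python
import itertools

ACTIONS = ("left", "right")


def reward(a1, a2):
    """+1 if both agents pick the same action, -1 otherwise."""
    return 1 if a1 == a2 else -1


def swap(action):
    """Environment symmetry of this toy game: relabel left <-> right."""
    return "right" if action == "left" else "left"


def expected_return(p1, p2):
    """Expected reward when agent i plays 'left' with probability p_i."""
    probs1 = {"left": p1, "right": 1 - p1}
    probs2 = {"left": p2, "right": 1 - p2}
    return sum(
        probs1[a] * probs2[b] * reward(a, b)
        for a, b in itertools.product(ACTIONS, repeat=2)
    )


if __name__ == "__main__":
    # The reward is unchanged when both agents' actions are relabeled.
    assert all(
        reward(swap(a), swap(b)) == reward(a, b)
        for a, b in itertools.product(ACTIONS, repeat=2)
    )
    # Self-play: both agents follow the "always left" convention.
    print("self-play:", expected_return(1.0, 1.0))
    # Cross-play: one agent learned "always left", the other "always right".
    print("cross-play:", expected_return(1.0, 0.0))
    # A policy invariant under the relabeling (uniform over actions).
    print("symmetric:", expected_return(0.5, 0.5))
```

Under these assumptions, both conventions are optimal in self-play, yet pairing agents that broke the symmetry in opposite ways gives the worst possible return. In this toy game the only policy invariant under the relabeling is the uniform one.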