In many collaborative settings, artificial intelligence (AI) agents must be able to adapt to new teammates that use unknown or previously unobserved strategies. While often simple for humans, this can be challenging for AI agents. For example, if an AI agent learns to drive alongside a training set of drivers who only drive on one side of the road, it may struggle to adapt this experience to coordinate with drivers on the opposite side, even if their behaviours are simply flipped along the left-right axis of symmetry. To address this, we introduce symmetry-breaking augmentations (SBA), which increase diversity in the behaviour of training teammates by applying a symmetry-flipping operation. By learning a best response to the augmented set of teammates, our agent is exposed to a wider range of behavioural conventions, improving performance when deployed with novel teammates. We demonstrate this experimentally in two settings and show that our approach improves upon previous ad hoc teamwork results in the challenging card game Hanabi. We also propose a general metric for estimating symmetry dependency amongst a given set of policies.
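The symmetry-flipping idea can be illustrated with a toy version of the driving example. The sketch below is not the authors' implementation: the names `mirror`, `flip_policy`, `augment`, `keep_left`, and `stay_in_lane` are hypothetical, observations and actions are reduced to a single left/right value, and the symmetry is assumed to be self-inverse. It only shows how flipping each training teammate can add conventions that the original set lacks.

```python
from typing import Callable, List

LEFT, RIGHT = 0, 1

# Assumption: a policy maps an observed side (LEFT/RIGHT) to a chosen side.
Policy = Callable[[int], int]


def mirror(side: int) -> int:
    """Left-right flip. It is its own inverse (mirror(mirror(x)) == x)."""
    return 1 - side


def flip_policy(policy: Policy) -> Policy:
    """Return the policy as seen through the mirror.

    The observation is flipped, the original policy acts on it,
    and the resulting action is flipped back.
    """
    def flipped(obs: int) -> int:
        return mirror(policy(mirror(obs)))
    return flipped


def augment(teammates: List[Policy]) -> List[Policy]:
    """Extend the teammate set with the flipped copy of every policy."""
    return teammates + [flip_policy(p) for p in teammates]


def keep_left(obs: int) -> int:
    """Convention-dependent teammate: always drives on the left."""
    return LEFT


def stay_in_lane(obs: int) -> int:
    """Symmetric teammate: stays on whichever side it observes."""
    return obs


pool = augment([keep_left, stay_in_lane])

# Action of every policy in the pool for each possible observation.
actions = [[p(obs) for obs in (LEFT, RIGHT)] for p in pool]
print(actions)
```

In this toy setting, flipping `keep_left` yields a teammate that always drives on the right, a convention absent from the original set, while flipping `stay_in_lane` leaves its behaviour unchanged because that policy does not depend on the left-right convention. A best-response learner trained against `pool` would therefore encounter both driving conventions.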