Noisy Zero-Shot Coordination: Breaking The Common Knowledge Assumption In Zero-Shot Coordination Games

Zero-shot coordination (ZSC) is a popular setting for studying the ability of reinforcement learning (RL) agents to coordinate with novel partners. Prior ZSC formulations assume the $\textit{problem setting}$ is common knowledge: each agent knows the underlying Dec-POMDP, knows that the others have this knowledge, and so on ad infinitum. However, this assumption rarely holds in complex real-world settings, which are often difficult to specify fully and correctly. Hence, in settings where the common knowledge assumption is invalid, agents trained using ZSC methods may not be able to coordinate well. To address this limitation, we formulate the $\textit{noisy zero-shot coordination}$ (NZSC) problem. In NZSC, agents observe different noisy versions of the ground-truth Dec-POMDP, which are assumed to be distributed according to a fixed noise model. Only the distribution of ground-truth Dec-POMDPs and the noise model are common knowledge. We show that an NZSC problem can be reduced to a ZSC problem by designing a meta-Dec-POMDP with an augmented state space consisting of all the ground-truth Dec-POMDPs. For solving NZSC problems, we propose a simple and flexible meta-learning method called NZSC training, in which the agents are trained across a distribution of coordination problems of which they observe only noisy versions. We show that with NZSC training, RL agents can be trained to coordinate well with novel partners even when the exact problem setting is not common knowledge.
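
As a rough illustration of the sampling structure described above, the following minimal sketch draws a ground-truth problem from a commonly known prior and then gives each agent its own noisy view of it. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy one-shot lever game, the uniform prior over payoffs, the Gaussian noise model, and all function names (`sample_ground_truth`, `noisy_view`, `sample_episode`, `team_reward`) are hypothetical.

```python
import random


def sample_ground_truth(rng, n_levers=3):
    """Draw a ground-truth problem from the commonly known prior.

    Assumption: a problem is just one payoff per lever, uniform in [0, 1].
    """
    return [rng.uniform(0.0, 1.0) for _ in range(n_levers)]


def noisy_view(rng, payoffs, sigma=0.1):
    """Apply the commonly known noise model to the true payoffs.

    Assumption: independent Gaussian noise on each payoff.
    """
    return [p + rng.gauss(0.0, sigma) for p in payoffs]


def sample_episode(rng, n_agents=2, n_levers=3, sigma=0.1):
    """Sample one ground-truth problem and one private noisy view per agent."""
    truth = sample_ground_truth(rng, n_levers)
    views = [noisy_view(rng, truth, sigma) for _ in range(n_agents)]
    return truth, views


def team_reward(truth, actions):
    """Pay the true payoff of the chosen lever only if all agents agree."""
    if len(set(actions)) == 1:
        return truth[actions[0]]
    return 0.0


rng = random.Random(0)
truth, views = sample_episode(rng)

# Naive baseline (not a trained policy): each agent greedily picks the
# lever that looks best in its own noisy view.
actions = [max(range(len(v)), key=v.__getitem__) for v in views]
reward = team_reward(truth, actions)
```

In this toy setup, the hidden `truth` plays the role of the extra state component in the meta-problem: agents never see it directly, only their own entry of `views`, while the prior and the noise model are shared. A training loop in the spirit of NZSC training would repeatedly call `sample_episode` and update policies that condition on the noisy views; that loop is omitted here.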
