Discovering Coordinated Joint Options via Inter-Agent Relative Dynamics

Temporally extended actions improve the ability to explore and plan in single-agent settings. In multi-agent settings, the exponential growth of the joint state space with the number of agents makes coordinated behaviours even more valuable. Yet this same exponential growth makes the design of multi-agent options particularly challenging. Existing multi-agent option discovery methods often sacrifice coordination by producing loosely coupled or fully independent behaviours. To address these limitations, we introduce a novel approach to multi-agent option discovery. Specifically, we propose a joint-state abstraction that compresses the state space while preserving the information necessary to discover strongly coordinated behaviours. Our approach builds on the inductive bias that synchronisation over agent states provides a natural foundation for coordination in the absence of explicit objectives. We first approximate a fictitious state of maximal alignment with the team, the \textit{Fermat} state, and use it to define a measure of \textit{spreadness}, capturing team-level misalignment on each individual state dimension. Building on this representation, we then employ a neural graph Laplacian estimator to derive options that capture state synchronisation patterns between agents. We evaluate the resulting options across multiple scenarios in two multi-agent domains, showing that they yield stronger downstream coordination capabilities than alternative option discovery methods.
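The abstract does not give formulas for the Fermat state or for spreadness. As a minimal sketch under stated assumptions, the code takes the Fermat state to be the geometric median of the agents' state vectors (the point minimising the sum of Euclidean distances to all agents, i.e. the Fermat point generalised to n points), approximated with Weiszfeld's algorithm. It takes spreadness to be the per-dimension mean absolute deviation of the agents from that point. Both definitions, and the names `fermat_state` and `spreadness`, are hypothetical illustrations and not the paper's own definitions.

```python
import numpy as np


def fermat_state(states, iters=100, eps=1e-8):
    """Approximate the geometric median of agent states with Weiszfeld's algorithm.

    ASSUMPTION: the 'Fermat state' is modelled here as the geometric median.
    states: array of shape (n_agents, state_dim).
    """
    x = states.mean(axis=0)  # initialise at the centroid of the team
    for _ in range(iters):
        dists = np.linalg.norm(states - x, axis=1)
        # Clamp distances so an iterate landing exactly on an agent state
        # does not cause a division by zero.
        dists = np.maximum(dists, eps)
        weights = 1.0 / dists  # closer agents pull harder
        x_new = (weights[:, None] * states).sum(axis=0) / weights.sum()
        if np.linalg.norm(x_new - x) < eps:  # converged
            return x_new
        x = x_new
    return x


def spreadness(states, fermat):
    """Hypothetical per-dimension misalignment: mean |s_i - fermat| over agents."""
    return np.abs(states - fermat).mean(axis=0)


# Three agents on a line: the geometric median is the middle agent.
team = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
centre = fermat_state(team)
print(centre, spreadness(team, centre))
```

Here the median sits near (1, 0) rather than at the centroid (about 3.67, 0), so the outlying agent does not drag the reference state. The spreadness is about 10/3 on the first dimension and 0 on the second, where the team is already synchronised. A team that shares a single state has zero spreadness on every dimension.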

翻译：时间扩展动作增强了单智能体场景下的探索与规划能力。在多智能体场景中，联合状态空间随智能体数量呈指数级增长，使得协调行为更具价值。然而，这种指数增长特性也使多智能体选项的设计变得尤为困难。现有的多智能体选项发现方法往往通过生成松散耦合或完全独立的行为来牺牲协调性。为应对这些局限性，我们提出了一种新颖的多智能体选项发现方法。具体而言，我们设计了一种联合状态抽象方法，在压缩状态空间的同时保留发现强协调行为所需的信息。我们的方法基于这样的归纳偏置：在缺乏显式目标的情况下，智能体状态间的同步为协调提供了自然基础。我们首先近似计算与团队保持最大对齐的虚构状态——\textit{Fermat}状态，并利用它定义\textit{扩散度}度量，以捕捉每个独立状态维度上团队层面的失准情况。基于此表征，我们随后采用神经图拉普拉斯估计器来推导能够捕获智能体间状态同步模式的选项。我们在两个多智能体领域的多种场景中对所得选项进行评估，结果表明相较于其他选项发现方法，该方法能产生更强的下游协调能力。