Ad hoc teamwork requires an agent to cooperate with unknown teammates without prior coordination. Many works abstract teammate instances into high-level type representations and then pre-train a best response for each type. However, most of them do not consider the distribution of teammate instances within a type, which can expose the agent to the hidden risk of \emph{type confounding}. In the worst case, the best response to an abstract teammate type can be the worst response to every specific instance of that type. This work addresses the issue through the lens of causal inference. We first show theoretically that this phenomenon arises from the spurious correlation introduced by an uncontrolled teammate distribution. We then propose CTCAT, which disentangles this correlation through instance-wise teammate feedback rectification. This operation reweights the interactions of teammate instances within a shared type to reduce the influence of type confounding. We evaluate CTCAT in multiple domains, including classic ad hoc teamwork tasks and real-world scenarios. Results show that CTCAT is robust to type confounding, a practical issue that directly threatens the robustness of trained agents but has gone unnoticed in previous work.
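The worst-case claim above can be made concrete with a small numerical sketch. The rewards, interaction counts, and the names `REWARD`, `COUNTS`, `naive_value`, and `adjusted_value` are hypothetical and chosen only for illustration. The reweighting shown is a generic stratified (backdoor-style) adjustment over teammate instances, not CTCAT's actual rectification procedure, which the abstract does not specify.

```python
# Toy illustration of type confounding. All numbers are hypothetical
# and are not taken from the paper.

# Expected team reward for each (teammate instance, agent response) pair.
# Response r2 is better than r1 against BOTH instances of the type.
REWARD = {
    ("A", "r1"): 0.8, ("A", "r2"): 0.9,
    ("B", "r1"): 0.2, ("B", "r2"): 0.3,
}

# Uncontrolled interaction counts: r1 was mostly tried with the easy
# instance A, while r2 was mostly tried with the hard instance B.
COUNTS = {
    ("A", "r1"): 90, ("B", "r1"): 10,
    ("A", "r2"): 10, ("B", "r2"): 90,
}

INSTANCES = ("A", "B")
RESPONSES = ("r1", "r2")


def naive_value(response):
    """Type-level average reward, ignoring which instance produced it."""
    total = sum(COUNTS[(i, response)] for i in INSTANCES)
    weighted = sum(COUNTS[(i, response)] * REWARD[(i, response)]
                   for i in INSTANCES)
    return weighted / total


def adjusted_value(response):
    """Instance-wise reweighting: average per-instance reward under the
    marginal instance distribution, independent of the response taken."""
    n_all = sum(COUNTS.values())
    value = 0.0
    for i in INSTANCES:
        p_instance = sum(COUNTS[(i, r)] for r in RESPONSES) / n_all
        value += p_instance * REWARD[(i, response)]
    return value


naive_best = max(RESPONSES, key=naive_value)        # picks r1
adjusted_best = max(RESPONSES, key=adjusted_value)  # picks r2

print("naive:   ", {r: round(naive_value(r), 2) for r in RESPONSES}, naive_best)
print("adjusted:", {r: round(adjusted_value(r), 2) for r in RESPONSES}, adjusted_best)
```

With these numbers the type-level averages are 0.74 for `r1` and 0.36 for `r2`, so the naive estimate selects `r1` even though `r1` is the worse response against both instances. After reweighting by the instance distribution, the values become 0.5 and 0.6 and `r2` is selected.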