In this paper we present ToMCAT (Theory-of-Mind for Cooperative Agents in Teams), a new framework for generating ToM-conditioned trajectories. It combines a meta-learning mechanism that performs ToM reasoning over teammates' underlying goals and future behavior with a multiagent denoising-diffusion model that generates plans for an agent and its teammates, conditioned on both the agent's goals and its teammates' characteristics as computed via ToM. We implemented an online planning system that dynamically samples new trajectories (replans) from the diffusion model whenever it detects a divergence between a previously generated plan and the current state of the world. We conducted several experiments using ToMCAT in a simulated cooking domain. Our results highlight the importance of the dynamic replanning mechanism in reducing resource usage without sacrificing team performance. We also show that recent observations about the world and teammates' behavior, collected by an agent over the course of an episode and combined with ToM inferences, are crucial for generating team-aware plans that adapt dynamically to teammates, especially when no prior information about them is provided.
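To make the replanning idea concrete, the following is a minimal sketch of a divergence-triggered replanning loop. It is not the ToMCAT implementation: `sample_plan` is a hypothetical stand-in for sampling from the diffusion model, `step_env` is a hypothetical environment transition, and the Euclidean divergence measure and fixed `threshold` are assumptions chosen only for illustration. The loop counts how many times a plan is sampled, which is one simple proxy for the resource usage that replanning on demand is meant to reduce.

```python
import math
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
Plan = List[State]  # plan[0] is the state the plan was sampled from


def divergence(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Euclidean distance between a planned and an observed state.

    Assumption: the abstract does not say how divergence is measured.
    """
    return math.dist(predicted, observed)


def run_with_replanning(
    sample_plan: Callable[[State, int], Plan],  # hypothetical plan sampler
    step_env: Callable[[State, State], State],  # hypothetical environment step
    start: State,
    horizon: int,
    n_steps: int,
    threshold: float,
) -> Tuple[State, int]:
    """Follow a plan, resampling only when it is exhausted or has diverged.

    Returns the final state and the number of plans sampled.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    state = start
    plan = sample_plan(state, horizon)
    n_samples = 1
    cursor = 0  # index of the plan entry that should match the current state
    for _ in range(n_steps):
        exhausted = cursor + 1 >= len(plan)
        if exhausted or divergence(plan[cursor], state) > threshold:
            # Replan from the observed state instead of the stale prediction.
            plan = sample_plan(state, horizon)
            n_samples += 1
            cursor = 0
        target = plan[cursor + 1]
        state = step_env(state, target)
        cursor += 1
    return state, n_samples


def straight_line_plan(state: State, horizon: int) -> Plan:
    """Toy sampler: advance the single coordinate by 1.0 per step."""
    return [(state[0] + k,) for k in range(horizon + 1)]
```

When the environment follows the plan exactly, this loop samples a new plan only once the previous one runs out; when every step is perturbed beyond the threshold, it resamples at every step, which is the behavior a fixed replan-every-step policy would always pay for.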