As robots become more integrated into society, their ability to coordinate with other robots and humans on multi-modal tasks (those with multiple valid solutions) is crucial. We propose to learn such behaviors from expert demonstrations via imitation learning (IL). However, when expert demonstrations are multi-modal, standard IL approaches can struggle to capture the diverse strategies, hindering effective coordination. Diffusion models are known to handle complex multi-modal trajectory distributions effectively in single-agent systems, and they have also excelled in multi-agent scenarios, where multi-modality is more common and crucial to learning coordinated behaviors. Typically, however, diffusion-based approaches require a centralized planner or explicit communication among agents, an assumption that can fail in real-world settings where robots must operate independently or alongside agents, such as humans, with whom they cannot directly communicate. Therefore, we propose MIMIC-D, a Centralized Training, Decentralized Execution (CTDE) paradigm for multi-modal multi-agent imitation learning using diffusion policies. Agents are trained jointly with full information but execute their policies using only local information to achieve implicit coordination. We demonstrate in both simulation and hardware experiments that our method recovers multi-modal coordination behavior among agents across a variety of tasks and environments, while improving upon state-of-the-art baselines.