Successful collaboration requires team members to stay aligned, especially in complex sequential tasks. Team members must dynamically coordinate which subtasks to perform and in what order. However, real-world constraints like partial observability and limited communication bandwidth often lead to suboptimal collaboration. Even among expert teams, the same task can be executed in multiple ways. To develop multi-agent systems and human-AI teams for such tasks, we are interested in data-driven learning of multimodal team behaviors. Multi-Agent Imitation Learning (MAIL) provides a promising framework for data-driven learning of team behavior from demonstrations, but existing methods struggle with heterogeneous demonstrations, as they assume that all demonstrations originate from a single team policy. Hence, in this work, we introduce DTIL: a hierarchical MAIL algorithm designed to learn multimodal team behaviors in complex sequential tasks. DTIL represents each team member with a hierarchical policy and learns these policies from heterogeneous team demonstrations in a factored manner. By employing a distribution-matching approach, DTIL mitigates compounding errors and scales effectively to long horizons and continuous state representations. Experimental results show that DTIL outperforms MAIL baselines and accurately models team behavior across a variety of collaborative scenarios.
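As a rough illustration of the structure described above, the following sketch represents each team member with a two-level policy: a high-level policy samples a subtask, and a low-level policy chooses an action given that subtask. It is not DTIL's implementation and contains no learning step. The class `HierarchicalPolicy`, the function `team_step`, and the subtask, action, and observation names are all hypothetical, and the tabular form is an assumption made only to keep the example small.

```python
import random
from dataclasses import dataclass


@dataclass
class HierarchicalPolicy:
    """One team member (hypothetical, tabular): subtask choice, then action choice."""
    subtask_probs: dict  # observation -> {subtask: probability}
    action_table: dict   # (observation, subtask) -> action

    def act(self, observation, rng):
        # High level: sample which subtask to pursue from this observation.
        dist = self.subtask_probs[observation]
        subtasks = list(dist)
        subtask = rng.choices(subtasks, weights=[dist[s] for s in subtasks])[0]
        # Low level: pick the action conditioned on the sampled subtask.
        action = self.action_table[(observation, subtask)]
        return subtask, action


def team_step(policies, observations, rng):
    """Factored team step: each agent acts from its own observation only."""
    return [p.act(o, rng) for p, o in zip(policies, observations)]


# Illustrative data only: two agents that can each fetch or deliver.
action_table = {("start", "fetch"): "move_to_shelf",
                ("start", "deliver"): "move_to_door"}
policies = [
    HierarchicalPolicy({"start": {"fetch": 0.5, "deliver": 0.5}}, action_table),
    HierarchicalPolicy({"start": {"fetch": 0.5, "deliver": 0.5}}, action_table),
]

# Different random draws give different subtask assignments for the same task,
# which is the sense in which the team behavior here is multimodal.
print(team_step(policies, ["start", "start"], random.Random(0)))
```

Keeping the subtask as an explicit intermediate variable is what lets a single set of policies produce several distinct ways of executing the same task. A model with one flat policy per agent would have to average over those modes.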