Despite significant advances in multi-agent navigation, agents still lack the sophistication and intelligence that humans exhibit in multi-agent settings. In this paper, we propose a framework for learning a human-like, general collision avoidance policy for agent-agent interactions in fully decentralized multi-agent environments. Our approach combines knowledge distillation with reinforcement learning: the reward function is shaped by expert policies that are extracted from human trajectory demonstrations through behavior cloning. We show that agents trained with our approach produce human-like trajectories in collision avoidance and goal-directed steering tasks not covered by the demonstrations, outperforming both the experts and learning-based agents trained without knowledge distillation.
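The abstract does not specify the exact form of the shaped reward. As a minimal sketch, assuming that both the behavior-cloned expert and the learning agent output diagonal Gaussian distributions over a 2-D velocity action, one common way to inject the expert's knowledge is to subtract a KL-divergence penalty from the task reward. The class `GaussianPolicyOutput`, the functions `gaussian_kl` and `shaped_reward`, the weight `beta`, and the choice of KL direction are hypothetical illustrations, not the paper's method.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class GaussianPolicyOutput:
    """Hypothetical policy output: mean and std of a diagonal Gaussian over actions."""
    mean: Sequence[float]
    std: Sequence[float]


def gaussian_kl(p: GaussianPolicyOutput, q: GaussianPolicyOutput) -> float:
    """Closed-form KL(p || q) between two diagonal Gaussians, summed over action dims."""
    kl = 0.0
    for mp, sp, mq, sq in zip(p.mean, p.std, q.mean, q.std):
        kl += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2 * sq ** 2) - 0.5
    return kl


def shaped_reward(task_reward: float,
                  agent_dist: GaussianPolicyOutput,
                  expert_dist: GaussianPolicyOutput,
                  beta: float = 0.1) -> float:
    """Assumed shaping: task reward minus beta * KL(expert || agent).

    The penalty is zero when the agent matches the behavior-cloned expert
    and grows as the agent's action distribution drifts away from it.
    """
    return task_reward - beta * gaussian_kl(expert_dist, agent_dist)


if __name__ == "__main__":
    expert = GaussianPolicyOutput(mean=(1.0, 0.0), std=(0.5, 0.5))
    agent = GaussianPolicyOutput(mean=(0.8, 0.1), std=(0.5, 0.5))
    print(shaped_reward(1.0, agent, expert, beta=0.1))
```

With equal standard deviations the KL term reduces to a scaled squared distance between the means (0.1 in the example above), so the shaped reward is about 0.99. Raising `beta` pushes the agent harder toward the expert's behavior at the cost of task progress; the paper may balance these terms differently.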