Multi-task imitation learning (MTIL) has shown significant potential in robotic manipulation by enabling agents to perform various tasks with a unified policy. This simplifies policy deployment and enhances the agent's adaptability across different contexts. However, key challenges remain, such as maintaining action reliability (e.g., avoiding abnormal action sequences that deviate from nominal task trajectories), distinguishing between similar tasks, and generalizing to unseen scenarios. To address these challenges, we introduce the Foresight-Augmented Manipulation Policy (FoAM), a novel MTIL framework. FoAM not only learns to mimic expert actions but also predicts the visual outcomes of those actions to enhance decision-making. Additionally, it integrates multi-modal goal inputs, such as visual and language prompts, overcoming the limitations of single-conditioned policies. We evaluated FoAM on over 100 tasks in both simulation and real-world settings, demonstrating that it significantly improves IL policy performance, outperforming current state-of-the-art IL baselines by up to 41% in success rate. Furthermore, we release a simulation benchmark for robotic manipulation, featuring 10 task suites and over 80 challenging tasks designed for multi-task policy training and evaluation. See the project homepage https://projFoAM.github.io/ for details.