Imitation learning is a powerful paradigm for training robotic policies, yet its performance is limited by compounding errors: minor policy inaccuracies can drive the robot into out-of-distribution (OOD) states that are absent from the training set, where the policy produces even larger errors and eventually fails. While the Dataset Aggregation (DAgger) framework addresses this issue, its reliance on continuous human involvement severely limits scalability. In this paper, we propose WM-DAgger, an efficient data aggregation framework that leverages World Models to synthesize OOD recovery data without human involvement. Specifically, we focus on manipulation tasks with an eye-in-hand robotic arm and only a few demonstrations. To avoid synthesizing misleading data and to overcome the hallucination issues inherent to World Models, our framework introduces two key mechanisms: (1) a Corrective Action Synthesis Module that generates task-oriented recovery actions to prevent misleading supervision, and (2) a Consistency-Guided Filtering Module that discards physically implausible trajectories by anchoring the terminal synthesized frames to the corresponding real frames in expert demonstrations. We extensively validate WM-DAgger on multiple real-world robotic tasks. Results show that our method significantly improves success rates, achieving a 93.3\% success rate in soft bag pushing with only five demonstrations. The source code is publicly available at https://github.com/czs12354-xxdbd/WM-Dagger.
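To make the idea behind the Consistency-Guided Filtering Module concrete, the following is a minimal sketch and not the authors' implementation. The abstract states only that terminal synthesized frames are anchored to the corresponding real frames in expert demonstrations; the distance metric (mean squared pixel error), the `threshold` parameter, and the function names `frame_distance` and `filter_by_terminal_consistency` are all assumptions introduced here for illustration.

```python
import numpy as np


def frame_distance(a, b):
    """Mean squared pixel error between two frames (assumed metric, not from the paper)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("frames must have the same shape")
    return float(np.mean((a - b) ** 2))


def filter_by_terminal_consistency(synth_trajs, anchor_frames, threshold):
    """Keep synthesized trajectories whose last frame stays close to its real anchor frame.

    synth_trajs:   list of trajectories, each a non-empty sequence of frames
    anchor_frames: one real expert frame per trajectory (hypothetical pairing)
    threshold:     maximum allowed distance (hypothetical hyperparameter)
    """
    if len(synth_trajs) != len(anchor_frames):
        raise ValueError("need exactly one anchor frame per trajectory")
    kept = []
    for traj, anchor in zip(synth_trajs, anchor_frames):
        # Only the terminal frame is compared against the real anchor frame.
        if frame_distance(traj[-1], anchor) <= threshold:
            kept.append(traj)
    return kept


# Toy usage with 4x4 RGB frames: one trajectory ends near the anchor, one does not.
anchor = np.zeros((4, 4, 3))
good_traj = [np.full((4, 4, 3), 0.5), np.full((4, 4, 3), 0.01)]
bad_traj = [np.zeros((4, 4, 3)), np.ones((4, 4, 3))]
kept = filter_by_terminal_consistency([good_traj, bad_traj], [anchor, anchor], threshold=0.01)
```

In this toy setup only `good_traj` survives, since its terminal frame is nearly identical to the anchor. A pixel-space metric is used purely for simplicity; a perceptual or feature-space distance could be substituted without changing the structure of the filter.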