Behavior cloning methods for robot learning suffer from poor generalization because their data support is limited to expert demonstrations. Recent approaches leveraging video prediction models have shown promising results by learning rich spatiotemporal representations from large-scale datasets. However, these models learn action-agnostic dynamics that cannot distinguish between different control inputs, which limits their utility for precise manipulation tasks and requires large pretraining datasets. We propose a Dynamics-Aligned Flow Matching Policy (DAP) that integrates dynamics prediction into policy learning. Our method introduces a novel architecture in which the policy and dynamics models provide mutual corrective feedback during action generation, enabling self-correction and improved generalization. Empirical validation demonstrates generalization performance superior to baseline methods on real-world robotic manipulation tasks, with particular robustness in out-of-distribution (OOD) scenarios including visual distractions and lighting variations.
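The mutual corrective feedback between policy and dynamics models can be sketched as a guided flow-matching sampling loop. The toy example below is purely illustrative, not the paper's method: the identity dynamics, the straight-line (rectified-flow-style) policy field, and the `guidance` weight are all assumptions introduced here, and the dynamics correction is simplified to an analytic gradient of a prediction-error term.

```python
import numpy as np

# Illustrative sketch of dynamics-guided flow-matching action sampling.
# All names and the correction rule are assumptions, not the paper's API.

rng = np.random.default_rng(0)

EXPERT_ACTION = np.array([0.8, -0.2])   # hypothetical demonstrated action
GOAL_STATE = np.array([1.0, 0.0])       # hypothetical desired next state

def policy_velocity(a, t):
    """Flow-matching vector field: straight-line flow toward the expert action."""
    return EXPERT_ACTION - a

def dynamics_error_grad(a):
    """Gradient of a toy dynamics-prediction error w.r.t. the action.
    With identity dynamics (next_state = a) the gradient is analytic."""
    predicted_next_state = a
    return predicted_next_state - GOAL_STATE

def sample_action(steps=50, guidance=0.5):
    """Euler-integrate the policy flow, corrected each step by the dynamics model."""
    a = rng.normal(size=2)              # start from noise, as in flow matching
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = policy_velocity(a, t)
        # The dynamics model "corrects" the flow by pushing the action toward
        # one whose predicted outcome matches the goal state.
        a = a + dt * (v - guidance * dynamics_error_grad(a))
    return a

print(sample_action())
```

In this sketch the integration converges toward a compromise between the policy's target action and the action whose predicted outcome matches the goal, which is the intuition behind letting the two models correct each other during generation.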