Inferring object motion representations from observations improves performance on robotic manipulation tasks. This paper introduces a new paradigm for robot imitation learning that generates action sequences by reasoning about object motion from visual observations. We propose Motion Before Action (MBA), a novel module that cascades two diffusion processes: one for object motion generation, and one for robot action generation under object motion guidance. MBA first predicts the future pose sequence of the object from observations, then uses this sequence as a condition to guide robot action generation. Designed as a plug-and-play component, MBA integrates flexibly into existing robotic manipulation policies with diffusion-based action heads. Extensive experiments in both simulated and real-world environments demonstrate that our approach substantially improves the performance of existing policies across a wide range of manipulation tasks.
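The cascaded structure described above can be sketched at inference time as two chained DDPM-style reverse processes, where the pose sequence sampled by the first stage conditions the second. This is only an illustrative toy, not the paper's implementation: the denoisers here are fixed random linear maps standing in for trained networks, and all dimensions, names (`mba_inference`, `ddpm_sample`), and the noise schedule are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

T_STEPS = 50                                   # diffusion steps (assumed)
HORIZON, POSE_DIM, ACT_DIM, OBS_DIM = 8, 7, 7, 16   # toy sizes (assumed)
betas = np.linspace(1e-4, 0.02, T_STEPS)       # linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def ddpm_sample(denoise_fn, shape):
    """Generic DDPM reverse process driven by a noise-prediction function."""
    x = rng.standard_normal(shape)
    for t in reversed(range(T_STEPS)):
        eps_hat = denoise_fn(x, t)
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps_hat) / np.sqrt(alphas[t])
        if t > 0:  # add noise at every step except the last
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

# Placeholder "networks": fixed random projections of (sample, condition).
W_motion = rng.standard_normal((POSE_DIM + OBS_DIM, POSE_DIM)) * 0.01
W_action = rng.standard_normal((ACT_DIM + OBS_DIM + POSE_DIM, ACT_DIM)) * 0.01

def mba_inference(obs):
    # Stage 1: sample a future object-pose sequence conditioned on the observation.
    def motion_eps(x, t):
        cond = np.broadcast_to(obs, (HORIZON, OBS_DIM))
        return np.concatenate([x, cond], axis=-1) @ W_motion
    poses = ddpm_sample(motion_eps, (HORIZON, POSE_DIM))

    # Stage 2: sample actions conditioned on the observation AND the predicted poses.
    def action_eps(x, t):
        cond = np.broadcast_to(obs, (HORIZON, OBS_DIM))
        return np.concatenate([x, cond, poses], axis=-1) @ W_action
    actions = ddpm_sample(action_eps, (HORIZON, ACT_DIM))
    return poses, actions

poses, actions = mba_inference(rng.standard_normal(OBS_DIM))
print(poses.shape, actions.shape)  # → (8, 7) (8, 7)
```

The key design point this sketch preserves is that action generation never sees the future directly: it sees only the object motion that the first diffusion stage predicts, which is what lets the module wrap any diffusion action head as a plug-in conditioning signal.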