Robots can acquire complex manipulation skills by learning policies from expert demonstrations, an approach commonly known as vision-based imitation learning. Policy generation based on diffusion and flow matching models has proven effective, particularly in robotic manipulation tasks. However, such iterative approaches are often inefficient at inference, as they must traverse from a noise distribution to the policy distribution over many steps, posing a challenging trade-off between efficiency and quality. This motivates us to propose FlowPolicy, a novel framework for fast policy generation based on consistency flow matching and 3D vision. Our approach refines the flow dynamics by normalizing the self-consistency of the velocity field, enabling the model to derive task execution policies in a single inference step. Specifically, FlowPolicy conditions on the observed 3D point cloud; consistency flow matching directly defines straight-line flows from different time states to the same action space while simultaneously constraining their velocity values. That is, we approximate the trajectories from noise to robot actions by normalizing the self-consistency of the velocity field within the action space, thereby improving inference efficiency. We validate the effectiveness of FlowPolicy on Adroit and Metaworld, demonstrating a 7$\times$ increase in inference speed while maintaining competitive average success rates compared to state-of-the-art policy models. Code will be made publicly available.
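The single-step inference property rests on a simple geometric fact: along a straight-line flow from noise to an action, the velocity is constant, so a self-consistent velocity field lets the model jump from any intermediate time state directly to the action endpoint. A minimal NumPy sketch of this idea follows (an illustration under our own assumptions, not the authors' implementation; the 7-dimensional action, function names, and the use of the exact straight-line velocity in place of a learned network are all hypothetical):

```python
import numpy as np

def straight_line_state(x0, x1, t):
    """Point on the straight probability path from noise x0 to action x1."""
    return (1.0 - t) * x0 + t * x1

def ideal_velocity(x0, x1):
    """Along a straight-line flow the true velocity is constant: x1 - x0.
    In FlowPolicy this role is played by a learned, point-cloud-conditioned
    velocity field trained to be self-consistent (assumption for this sketch)."""
    return x1 - x0

def one_step_action(x_t, t, v):
    """Single-step inference: jump from time t directly to t = 1."""
    return x_t + (1.0 - t) * v

rng = np.random.default_rng(0)
x0 = rng.standard_normal(7)   # noise sample (7-DoF action vector, assumption)
x1 = rng.standard_normal(7)   # target robot action

# Self-consistency: the one-step map lands on the same action from any t.
recovered = [
    one_step_action(straight_line_state(x0, x1, t), t, ideal_velocity(x0, x1))
    for t in (0.0, 0.3, 0.8)
]
```

Because every intermediate state maps to the same endpoint, inference needs no iterative denoising loop, which is the source of the reported speedup over multi-step diffusion policies.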