As one of the simplest non-prehensile manipulation skills, pushing has been widely studied as an effective means of rearranging objects. Existing approaches, however, typically rely on multi-step push plans composed of pre-defined pushing primitives, each with a limited scope of application, which restricts their efficiency and versatility across different scenarios. In this work, we propose a unified pushing policy that incorporates a lightweight visual prompting mechanism into a flow matching policy to guide the generation of reactive, multimodal pushing actions. The visual prompt can be specified by a high-level planner, enabling the pushing policy to be reused across a wide range of planning problems. Experimental results demonstrate that the proposed unified pushing policy not only outperforms existing baselines but also serves effectively as a low-level primitive within a VLM-guided planning framework to solve table-cleaning tasks efficiently.