Robot manipulation has increasingly adopted data-driven generative policy frameworks, yet the field faces a persistent trade-off: diffusion models suffer from high inference latency, while flow-based methods often require complex architectural constraints. Although the MeanFlow paradigm offers a path to single-step inference in the image generation domain, its direct application to robotics is impeded by critical theoretical pathologies, specifically spectral bias and gradient starvation in low-velocity regimes. To overcome these limitations, we propose the One-step MeanFlow Policy (OMP), a novel framework designed for high-fidelity, real-time manipulation. We introduce a lightweight directional alignment mechanism to explicitly synchronize predicted velocities with true mean velocities. Furthermore, we implement a Differential Derivation Equation (DDE) to approximate the Jacobian-Vector Product (JVP) operator, which decouples the forward and backward passes and significantly reduces memory complexity. Extensive experiments on the Adroit and Meta-World benchmarks demonstrate that OMP outperforms state-of-the-art methods in success rate and trajectory accuracy, particularly in high-precision tasks, while retaining the efficiency of single-step generation.
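The two mechanisms named above can be illustrated generically. The sketch below is an assumption-laden toy, not the paper's implementation: the directional alignment term is rendered as a standard `1 - cosine similarity` penalty between predicted and target mean velocities, and the JVP approximation is rendered as a central finite difference, which needs only two forward evaluations and no retained autograd graph (one common way to decouple forward and backward passes). The exact DDE and alignment forms used by OMP may differ.

```python
import numpy as np

def directional_alignment_loss(u_pred, u_mean, eps=1e-8):
    # Hypothetical alignment term: penalize the angle between predicted
    # and target mean velocities via 1 - cosine similarity.
    num = np.sum(u_pred * u_mean, axis=-1)
    den = np.linalg.norm(u_pred, axis=-1) * np.linalg.norm(u_mean, axis=-1) + eps
    return np.mean(1.0 - num / den)

def finite_difference_jvp(fn, x, v, eps=1e-4):
    # Approximate the Jacobian-vector product J_fn(x) @ v with a central
    # difference: two forward passes, no backward pass required.
    return (fn(x + eps * v) - fn(x - eps * v)) / (2.0 * eps)

# Toy checks on an elementwise function f(x) = tanh(x) * x,
# whose analytic elementwise derivative is tanh(x) + x * (1 - tanh(x)**2).
f = lambda x: np.tanh(x) * x
rng = np.random.default_rng(0)
x, v = rng.standard_normal(4), rng.standard_normal(4)
jvp_fd = finite_difference_jvp(f, x, v)
jvp_exact = (np.tanh(x) + x * (1.0 - np.tanh(x) ** 2)) * v
# Velocities pointing the same way (here, a positive rescaling)
# incur near-zero alignment loss.
aligned = directional_alignment_loss(x[None, :], (2.0 * x)[None, :])
```

In this toy, `jvp_fd` matches the analytic JVP to within the O(eps²) truncation error of the central difference, and `aligned` is near zero because magnitude differences do not affect the cosine term.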