Flow matching has emerged as a competitive framework for learning high-quality generative policies in robotics; however, we find that generalisation arises and saturates early along the flow trajectory, in accordance with recent findings in the literature. We further observe that increasing the number of Euler integration steps during inference counter-intuitively and universally degrades policy performance. We attribute this to two factors: (i) additional uniformly spaced integration steps oversample the late-time region, constraining actions towards the training trajectories and reducing generalisation; and (ii) the learned velocity field becomes non-Lipschitz as integration time approaches 1, causing instability. To address these issues, we propose a novel policy that utilises non-uniform time scheduling (e.g., U-shaped) during training, which emphasises both early and late temporal stages to regularise policy training, and a dense-jump integration schedule at inference, which replaces the multi-step integration beyond a jump point with a single-step integration, avoiding the unstable region near time 1. Essentially, our policy is an efficient one-step learner that still pushes forward performance through multi-step integration, yielding up to 23.7% performance gains over state-of-the-art baselines across diverse robotic tasks.
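The two mechanisms described above can be sketched in a few lines. The following is a minimal, hypothetical illustration, not the authors' implementation: it assumes the U-shaped training-time distribution can be realised with a symmetric Beta(α, α) density (α < 1, which peaks near 0 and 1), and that the dense-jump schedule takes uniform Euler steps up to a jump point and then one final step to time 1. All function and parameter names are illustrative.

```python
import numpy as np

def sample_u_shaped_time(batch_size, alpha=0.5, rng=None):
    """Sample flow-matching times t in [0, 1] from a U-shaped density.

    A symmetric Beta(alpha, alpha) with alpha < 1 places most mass near
    t = 0 and t = 1, emphasising both early and late temporal stages
    during training. (Illustrative choice; the paper's exact schedule
    may differ.)
    """
    rng = np.random.default_rng() if rng is None else rng
    return rng.beta(alpha, alpha, size=batch_size)

def dense_jump_euler(velocity, x0, n_steps=8, jump_point=0.8):
    """Integrate dx/dt = velocity(x, t) with a dense-jump schedule.

    Uniform Euler steps cover [0, jump_point]; a single final step
    covers [jump_point, 1], skipping the region near t = 1 where the
    learned velocity field may be non-Lipschitz.
    """
    ts = np.linspace(0.0, jump_point, n_steps + 1)
    x = x0
    for t, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t) * velocity(x, t)
    # Single jump step from the jump point to t = 1.
    x = x + (1.0 - jump_point) * velocity(x, jump_point)
    return x
```

With a constant velocity field the schedule still integrates to the exact endpoint, since the step sizes sum to 1 regardless of how they are spaced; the benefit of the jump only shows up when the field is ill-behaved near t = 1.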