Imitation learning has emerged as an effective approach for bootstrapping sequential decision-making in robotics, achieving strong performance even in high-dimensional dexterous manipulation tasks. Recent behavior cloning methods further leverage expressive generative models, such as diffusion models and flow matching, to represent multimodal action distributions. However, policies pretrained in this manner often exhibit limited generalization and require additional fine-tuning to achieve robust performance at deployment time. Such adaptation must preserve the global exploration benefits of pretraining while enabling rapid correction of local execution errors. We propose \emph{Residual Flow Steering} (RFS), a data-efficient reinforcement learning framework for adapting pretrained generative policies. RFS steers a pretrained flow-matching policy by jointly optimizing a residual action and a latent noise distribution, enabling complementary forms of exploration: local refinement through residual corrections and global exploration through latent-space modulation. This design allows efficient adaptation while retaining the expressive structure of the pretrained policy. We demonstrate the effectiveness of RFS on dexterous manipulation tasks, showing efficient fine-tuning both in simulation and in real-world settings when adapting pretrained base policies. Project website: https://weirdlabuw.github.io/rfs.
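The two steering mechanisms described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the velocity field is a toy stand-in for a learned network, and all names, dimensions, and parameters (`base_flow_policy`, `latent_mean`, `residual`, the 2-D action space) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def base_flow_policy(obs, z, num_steps=8):
    # Hypothetical pretrained flow-matching policy: Euler-integrate a
    # velocity field from latent noise z to an action over unit time.
    # A real policy would use a learned, observation-conditioned network.
    a = z.copy()
    for _ in range(num_steps):
        v = obs - a                # toy stand-in for the learned velocity field
        a = a + v / num_steps
    return a

# Learnable steering parameters (illustrative; RFS would optimize these with RL).
latent_mean = np.zeros(2)  # shifts the latent noise distribution (global exploration)
residual = np.zeros(2)     # additive action correction (local refinement)

def rfs_policy(obs):
    # Sample from the steered latent distribution, run the frozen base
    # policy, then apply the residual correction on top of its output.
    z = latent_mean + rng.standard_normal(2)
    a_base = base_flow_policy(obs, z)
    return a_base + residual

action = rfs_policy(np.array([0.5, -0.2]))
print(action.shape)  # (2,)
```

Because the base policy is frozen, adaptation only touches the low-dimensional steering parameters, which is what makes the fine-tuning data-efficient.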