Recent advances in behavior cloning (BC), such as action chunking and diffusion policies, have led to impressive progress. Still, imitation alone remains insufficient for tasks that demand reliable, precise movements, such as aligning and inserting objects. Our key insight is that chunked BC policies function as trajectory planners, enabling long-horizon tasks. However, because they execute action chunks open-loop, they lack the fine-grained reactivity necessary for reliable execution. Further, we find that the performance of BC policies saturates despite increasing amounts of data. Reinforcement learning (RL) is a natural way to overcome this, but it is not straightforward to apply directly to action-chunked models such as diffusion policies. We present a simple yet effective method, ResiP (Residual for Precise Manipulation), that sidesteps these challenges by augmenting a frozen, chunked BC model with a fully closed-loop residual policy trained with RL. The residual policy is trained via on-policy RL, addressing distribution shift and introducing reactivity without altering the BC trajectory planner. Evaluations on high-precision manipulation tasks demonstrate strong gains of ResiP over both BC methods and direct RL fine-tuning. Videos, code, and data are available at \url{https://residual-assembly.github.io}.
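The core idea above, a frozen chunked BC planner whose open-loop actions are corrected each step by a closed-loop residual, can be sketched as follows. This is a minimal illustration, not the paper's implementation: `chunked_bc_policy` and `residual_policy` are hypothetical placeholders standing in for the trained models, and the chunk horizon and action dimension are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
H, ACTION_DIM = 8, 7  # illustrative chunk horizon and action dimension

def chunked_bc_policy(obs):
    """Frozen BC model: maps one observation to a whole chunk of H actions,
    which would otherwise be executed open-loop (placeholder output)."""
    return np.zeros((H, ACTION_DIM))

def residual_policy(obs, bc_action):
    """Closed-loop residual trained with on-policy RL: a small per-step
    correction conditioned on the current observation (placeholder output)."""
    return 0.05 * rng.standard_normal(bc_action.shape)

def rollout_chunk(observations):
    """Execute one action chunk: the BC plan is fixed for the chunk, while
    the residual re-observes the environment at every step and corrects."""
    chunk = chunked_bc_policy(observations[0])
    executed = []
    for t, obs in enumerate(observations[: len(chunk)]):
        executed.append(chunk[t] + residual_policy(obs, chunk[t]))
    return np.stack(executed)

actions = rollout_chunk([None] * H)  # dummy per-step observations
print(actions.shape)  # (8, 7)
```

The design point this illustrates is that only the residual needs access to fresh observations inside the chunk; the BC planner is queried once per chunk and never updated, so its long-horizon trajectory structure is preserved.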