Robotic assembly presents a long-standing challenge due to its requirement for precise, contact-rich manipulation. While simulation-based learning has enabled the development of robust assembly policies, their performance often degrades when deployed in real-world settings due to the sim-to-real gap. Conversely, real-world reinforcement learning (RL) methods avoid the sim-to-real gap, but rely heavily on human supervision and generalize poorly to environmental changes. In this work, we propose a hybrid approach that combines a simulation-trained base policy with a real-world residual policy to adapt efficiently to real-world variations. The base policy, trained in simulation using low-level state observations and dense rewards, provides strong priors for initial behavior. The residual policy, learned in the real world using visual observations and sparse rewards, compensates for discrepancies in dynamics and sensor noise. Extensive real-world experiments demonstrate that our method, SPARR, achieves near-perfect success rates across diverse two-part assembly tasks. Compared with state-of-the-art zero-shot sim-to-real methods, SPARR improves success rates by 38.4% while reducing cycle time by 29.7%. Moreover, SPARR requires no human expertise, in contrast to state-of-the-art real-world RL approaches that depend heavily on human supervision.
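The base-plus-residual composition described above can be sketched as an additive correction on the base policy's action. This is a minimal illustration, not SPARR's implementation: the policy internals, the action dimension, the residual scale, and the clipping range are all assumptions for the sake of a runnable example.

```python
import numpy as np

ACTION_DIM = 6        # 6-DoF end-effector delta (assumed)
RESIDUAL_SCALE = 0.1  # residual corrections kept small (assumed)

def base_policy(state: np.ndarray) -> np.ndarray:
    """Stand-in for the simulation-trained base policy (hypothetical):
    maps low-level state to a bounded action."""
    return np.tanh(state[:ACTION_DIM])

def residual_policy(obs: np.ndarray) -> np.ndarray:
    """Stand-in for the real-world residual policy (hypothetical):
    maps visual observation features to a bounded correction."""
    return np.tanh(obs[:ACTION_DIM])

def combined_action(state: np.ndarray, obs: np.ndarray) -> np.ndarray:
    """Base action plus a scaled residual correction, clipped to the
    action limits so the residual can only perturb, not override."""
    a_base = base_policy(state)
    a_res = RESIDUAL_SCALE * residual_policy(obs)
    return np.clip(a_base + a_res, -1.0, 1.0)
```

Bounding the residual this way keeps the simulation-trained prior dominant early in real-world training, while still letting the residual absorb dynamics and sensing discrepancies.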