We propose an object-centric recovery policy framework to address the challenges of out-of-distribution (OOD) scenarios in visuomotor policy learning. Previous behavior cloning (BC) methods rely heavily on a large amount of labeled data coverage, failing in unfamiliar spatial states. Without relying on extra data collection, our approach learns a recovery policy constructed by an inverse policy inferred from object keypoint manifold gradient in the original training data. The recovery policy serves as a simple add-on to any base visuomotor BC policy, agnostic to a specific method, guiding the system back towards the training distribution to ensure task success even in OOD situations. We demonstrate the effectiveness of our object-centric framework in both simulation and real robot experiments, achieving an improvement of 77.7% over the base policy in OOD. Project Website: https://sites.google.com/view/ocr-penn
翻译:我们提出了一种物体中心恢复策略框架,以应对视觉运动策略学习中分布外(OOD)场景带来的挑战。先前的行为克隆(BC)方法严重依赖大量标注数据的覆盖范围,在陌生的空间状态下会失效。我们的方法无需依赖额外的数据收集,而是通过从原始训练数据中的物体关键点流形梯度推断出的逆向策略来构建恢复策略。该恢复策略可作为任何基础视觉运动BC策略的简单附加组件,与具体方法无关,能够引导系统返回训练分布,从而确保即使在OOD情况下也能成功完成任务。我们在仿真和真实机器人实验中验证了所提出的物体中心框架的有效性,在OOD场景下相比基础策略实现了77.7%的性能提升。项目网站:https://sites.google.com/view/ocr-penn