Manipulating unseen articulated objects through visual feedback is a critical but challenging task for real robots. Existing learning-based solutions mainly rely on visual affordance learning or other pre-trained visual models to guide manipulation policies, and they struggle with novel instances in real-world scenarios. In this paper, we propose a novel part-guided 3D RL framework that learns to manipulate articulated objects without demonstrations. We combine the strengths of 2D segmentation and 3D RL to improve the efficiency of RL policy training. To improve the stability of the policy on real robots, we design a Frame-consistent Uncertainty-aware Sampling (FUS) strategy that yields a condensed and hierarchical 3D representation. In addition, a single versatile RL policy can be trained on multiple articulated object manipulation tasks simultaneously in simulation, and it generalizes well to novel categories and instances. Experimental results demonstrate the effectiveness of our framework in both simulation and real-world settings. Our code is available at https://github.com/THU-VCLab/Part-Guided-3D-RL-for-Sim2Real-Articulated-Object-Manipulation.