A long-standing challenge for robotic manipulation systems operating in real-world scenarios is adapting and generalizing acquired motor skills to unseen environments. We tackle this challenge by employing hybrid skill models that integrate the imitation and reinforcement learning paradigms, and we explore how learning and adapting a skill, together with grounding its core in the scene through a learned keypoint, can facilitate such generalization. To that end, we develop the Keypoint Integrated Soft Actor-Critic Gaussian Mixture Models (KIS-GMM) approach, which learns to predict the reference of a dynamical system within the scene as a 3D keypoint, leveraging visual observations obtained through the robot's physical interactions during skill learning. Through comprehensive evaluations in both simulated and real-world environments, we show that our method enables a robot to achieve significant zero-shot generalization to novel environments and to refine skills in the target environments faster than learning from scratch. Importantly, this is achieved without requiring new ground-truth data. Moreover, our method effectively copes with scene displacements.
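The central idea above is that a skill is represented as a dynamical system defined relative to a reference point in the scene. A minimal toy sketch can illustrate why this helps with scene displacements. It assumes a simplified mixture model of our own choosing: isotropic Gaussian components over the position expressed in the keypoint frame, each paired with a scalar attractor gain. This is a hypothetical stand-in, not KIS-GMM's actual GMM parameterization, its SAC-based refinement, or its visual keypoint predictor. In the sketch, the keypoint is simply passed in as an argument.

```python
import math

# Hypothetical toy model (not the paper's parameterization): each component is
# (pi_k, mu_k, sigma_k, a_k) with prior weight pi_k, isotropic Gaussian mean mu_k
# and std sigma_k over the position relative to the keypoint, and gain a_k > 0.


def responsibilities(x_rel, components):
    """Posterior weight h_k(x_rel) of each mixture component."""
    d = len(x_rel)
    scores = []
    for pi_k, mu, sigma, _ in components:
        sq = sum((xi - mi) ** 2 for xi, mi in zip(x_rel, mu))
        norm = (2.0 * math.pi * sigma ** 2) ** (-d / 2.0)
        scores.append(pi_k * norm * math.exp(-0.5 * sq / sigma ** 2))
    total = sum(scores)
    if total == 0.0:  # numerically far from every component: use uniform weights
        return [1.0 / len(components)] * len(components)
    return [s / total for s in scores]


def velocity(x, keypoint, components):
    """Velocity at world position x for a skill anchored at `keypoint`."""
    x_rel = [xi - ki for xi, ki in zip(x, keypoint)]  # express x in the keypoint frame
    h = responsibilities(x_rel, components)
    # A convex mix of positive gains stays positive, so -gain * x_rel always
    # points toward the keypoint, which acts as the attractor.
    gain = sum(hk * c[3] for hk, c in zip(h, components))
    return [-gain * r for r in x_rel]


def rollout(x0, keypoint, components, dt=0.05, steps=200):
    """Euler-integrate the dynamical system starting from x0."""
    x = list(x0)
    for _ in range(steps):
        v = velocity(x, keypoint, components)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    comps = [
        (0.5, [0.3, 0.0, 0.0], 0.2, 1.0),
        (0.5, [0.0, 0.3, 0.1], 0.3, 2.0),
    ]
    keypoint = [0.5, 0.2, 0.1]
    shift = [0.3, -0.1, 0.0]  # the scene (and start) displaced together
    print(rollout([1.0, -0.3, 0.6], keypoint, comps))
    print(rollout([1.3, -0.4, 0.6], [k + s for k, s in zip(keypoint, shift)], comps))
```

Because the model only ever sees positions relative to the keypoint, shifting the keypoint shifts the whole trajectory by the same amount. In this toy setting, a correctly predicted keypoint is therefore enough to transfer the skill to a displaced scene without changing the mixture parameters.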