Task-Parameterized Gaussian Mixture Models (TP-GMMs) are a sample-efficient method for learning object-centric robot manipulation tasks. However, several open challenges remain in applying TP-GMMs in the wild. In this work, we tackle three crucial challenges synergistically. First, end-effector velocities are non-Euclidean and therefore hard to model with standard GMMs. We propose to factorize the robot's end-effector velocity into its direction and magnitude, and to model them using Riemannian GMMs. Second, we leverage the factorized velocities to segment and sequence skills from complex demonstration trajectories. This segmentation further lets us align skill trajectories and thus exploit time as a powerful inductive bias. Third, we present a method to automatically detect the relevant task parameters per skill from visual observations. Our approach enables learning complex manipulation tasks from just five demonstrations while using only RGB-D observations. Extensive experimental evaluations on RLBench show that our approach achieves state-of-the-art performance with a 20-fold improvement in sample efficiency. Our policies generalize across different environments, object instances, and object positions, while the learned skills are reusable.
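To make the first contribution concrete, here is a minimal sketch of the velocity factorization the abstract describes: the unit direction lives on the sphere S^2, a Riemannian manifold, while the magnitude is a scalar that a standard (Euclidean) GMM handles. The function name, time step, and finite-difference scheme are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch, not the authors' implementation: factorize
# end-effector velocities into unit directions (points on the
# sphere S^2, suited to a Riemannian GMM) and scalar magnitudes
# (Euclidean, suited to a standard GMM).
import numpy as np

def factorize_velocities(positions, dt=0.05, eps=1e-8):
    """positions: (T, 3) array of end-effector positions."""
    velocities = np.diff(positions, axis=0) / dt            # (T-1, 3)
    magnitudes = np.linalg.norm(velocities, axis=1)         # (T-1,)
    directions = velocities / (magnitudes[:, None] + eps)   # unit vectors
    return directions, magnitudes

# Toy trajectory: a helix traced by 100 positions in R^3.
t = np.linspace(0.0, 1.0, 100)
positions = np.stack([np.sin(t), np.cos(t), t], axis=1)
dirs, mags = factorize_velocities(positions)
assert np.allclose(np.linalg.norm(dirs, axis=1), 1.0, atol=1e-6)
```

Separating the two components sidesteps the problem that raw velocity vectors mix a directional quantity (non-Euclidean) with a speed (Euclidean), which a single standard GMM models poorly.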
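The abstract does not specify the segmentation criterion used for the second contribution. One common heuristic that is at least consistent with "leveraging the factorized velocities" is to place candidate skill boundaries at pronounced dips in the speed profile, where the end-effector nearly stops between skills. The sketch below implements that heuristic; the function, threshold, and toy data are assumptions for illustration, not the paper's method.

```python
# Hypothetical heuristic, not the paper's method: candidate skill
# boundaries are strict local minima of the speed profile that also
# fall below an absolute speed threshold.
import numpy as np

def segment_by_speed_dips(magnitudes, threshold=0.05):
    m = magnitudes
    # Interior points smaller than both neighbors are local minima.
    local_min = (m[1:-1] < m[:-2]) & (m[1:-1] < m[2:])
    idx = np.flatnonzero(local_min) + 1
    return [int(i) for i in idx if m[i] < threshold]

# Toy speed profile: two motion bursts separated by a near-stop.
t = np.linspace(0.0, 1.0, 201)
mags = np.abs(np.sin(2 * np.pi * t))   # dips to 0 at t = 0.5
print(segment_by_speed_dips(mags))     # -> [100]
```

Given such boundaries, `np.split` yields one sub-trajectory per candidate skill, which can then be time-aligned across demonstrations as the abstract describes.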