Task-Parameterized Gaussian Mixture Models (TP-GMM) are a sample-efficient method for learning object-centric robot manipulation tasks. However, several open challenges remain in applying TP-GMMs in the wild. In this work, we tackle three crucial challenges synergistically. First, end-effector velocities are non-Euclidean and therefore hard to model using standard GMMs. We propose to factorize the robot's end-effector velocity into its direction and magnitude, and to model each with a Riemannian GMM. Second, we leverage the factorized velocities to segment and sequence skills from complex demonstration trajectories. This segmentation further lets us align skill trajectories and thereby exploit time as a powerful inductive bias. Third, we present a method to automatically detect the relevant task parameters per skill from visual observations. Our approach enables learning complex manipulation tasks from just five demonstrations while using only RGB-D observations. Extensive experimental evaluations on RLBench demonstrate that our approach achieves state-of-the-art performance with 20-fold improved sample efficiency. Our policies generalize across different environments, object instances, and object positions, and the learned skills are reusable.
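The velocity factorization above can be illustrated with a minimal sketch: a Cartesian end-effector velocity is split into a unit direction (a point on the sphere S², where a Riemannian GMM would operate) and a non-negative magnitude. The function name and the zero-velocity fallback are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def factorize_velocity(v, eps=1e-8):
    """Split a 3D end-effector velocity into a unit direction on S^2
    and a scalar magnitude, so each factor can be modeled separately.

    Illustrative sketch only; the fallback direction for near-zero
    velocities is an assumption, not part of the original method.
    """
    v = np.asarray(v, dtype=float)
    mag = float(np.linalg.norm(v))
    if mag < eps:
        # Direction is undefined for a (near-)zero velocity; pick a fixed axis.
        return np.array([1.0, 0.0, 0.0]), 0.0
    return v / mag, mag

direction, magnitude = factorize_velocity([0.3, 0.0, 0.4])
reconstructed = direction * magnitude  # recovers the original velocity
```

The direction lives on the unit sphere (a Riemannian manifold), while the magnitude is a plain scalar, which is what makes each factor tractable for its respective GMM.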