Conditional Neural Expert Processes for Learning Movement Primitives from Demonstration

From arXiv. This work has been submitted to IEEE RA-L for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. Submitted to Robotics and Automation Letters on July 5, 2024.

Learning from Demonstration (LfD) is a widely used technique for skill acquisition in robotics. However, demonstrations of the same skill may exhibit significant variance, or a learning system may attempt to acquire different ways of performing the same skill simultaneously, which makes it challenging to encode these motions into movement primitives. To address these challenges, we propose an LfD framework, Conditional Neural Expert Processes (CNEP), that learns to assign demonstrations from different modes to distinct expert networks, using the information in the latent space to match experts with the encoded representations. CNEP does not require supervision on which mode a trajectory belongs to. We compare the performance of CNEP against widely used and powerful LfD methods such as Gaussian Mixture Models, Probabilistic Movement Primitives, and Stable Movement Primitives, and show that our method outperforms these baselines on multimodal trajectory datasets. The results show improved modeling performance for movement primitives, so that the synthesized trajectories more accurately reflect those demonstrated by experts, particularly when the demonstrations of a skill contain points where different trajectories intersect. We evaluate CNEP on two real-robot tasks, obstacle avoidance and pick-and-place, that require the robot to learn multimodal motion trajectories and execute the correct primitive given the target environment conditions. We also show that our system can adapt on the fly to environmental changes via an online conditioning mechanism. Lastly, we believe that CNEP offers improved explainability and interpretability, because it autonomously finds discrete behavior primitives and provides probability values for its expert selection decisions.
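To make the expert-assignment idea concrete, the following is a minimal sketch of one way such a model could be wired together: observed trajectory points are encoded into a latent vector, a gate turns that vector into a probability per expert, and the most probable expert produces the trajectory. This is not the authors' implementation. The network sizes, the untrained random weights, the mean-pooling encoder, the argmax selection, and all names (`mlp_params`, `mlp`, `predict`, `LATENT`, `N_EXPERTS`) are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_params(sizes):
    """Random weights for a small MLP; a stand-in for trained parameters."""
    return [(rng.normal(0, 0.5, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Apply the MLP: tanh on hidden layers, linear output layer."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical sizes, chosen only for this sketch.
LATENT, N_EXPERTS = 16, 2
encoder = mlp_params([2, 32, LATENT])        # (t, y) observation -> latent
gate = mlp_params([LATENT, N_EXPERTS])       # latent -> expert logits
experts = [mlp_params([LATENT + 1, 32, 2])   # (latent, t) -> (mean, raw std)
           for _ in range(N_EXPERTS)]

def predict(context, t_query):
    """context: (n, 2) array of observed (t, y); t_query: (m,) query times."""
    # Averaging per-point encodings gives an order-independent summary.
    r = mlp(encoder, context).mean(axis=0)
    probs = softmax(mlp(gate, r))            # probability assigned to each expert
    k = int(np.argmax(probs))                # pick the most probable expert
    inp = np.concatenate(
        [np.tile(r, (len(t_query), 1)), t_query[:, None]], axis=1)
    out = mlp(experts[k], inp)
    mean = out[:, 0]
    std = np.log1p(np.exp(out[:, 1])) + 1e-3  # softplus keeps std positive
    return probs, k, mean, std

# Condition on two observed points and query five time steps.
context = np.array([[0.0, 0.1], [1.0, 0.9]])
probs, k, mean, std = predict(context, np.linspace(0.0, 1.0, 5))
print("expert probabilities:", probs, "selected expert:", k)
```

Because the weights here are random, the printed probabilities carry no meaning. In a trained model of this kind, the gate output is the quantity one would inspect to see which expert was chosen and how confident that choice was.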
