Shared dynamics models are important for capturing the complexity and variability inherent in Human-Robot Interaction (HRI). Therefore, learning such shared dynamics models can enhance coordination and adaptability, enabling successful reactive interactions with a human partner. In this work, we propose a novel approach for learning a shared latent space representation for HRIs from demonstrations in a Mixture of Experts fashion, reactively generating robot actions from human observations. We train a Variational Autoencoder (VAE) to learn robot motions, regularized using an informative latent space prior that captures the multimodality of the human observations via a Mixture Density Network (MDN). We show how our formulation derives from a Gaussian Mixture Regression formulation that is typical of approaches for learning HRI from demonstrations, such as using an HMM/GMM to learn a joint distribution over the actions of the human and the robot. We further incorporate an additional regularization to prevent "mode collapse", a common phenomenon when using latent space mixture models with VAEs. We find that our approach of using an informative MDN prior from human observations for a VAE generates more accurate robot motions compared to previous HMM-based or recurrent approaches to learning shared latent representations, which we validate on various HRI datasets involving interactions such as handshakes, fistbumps, waving, and handovers. Further experiments in a real-world human-to-robot handover scenario show the efficacy of our approach in generating successful interactions with four different human interaction partners.
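To make the Gaussian Mixture Regression formulation referenced above concrete, the following is a minimal illustrative sketch (not the paper's implementation) of GMR over a joint human-robot distribution: a GMM is fit over concatenated human and robot observations, and robot actions are generated by conditioning on the human partner's observation. The toy two-component GMM, the function name `gmr`, and all variable names are hypothetical.

```python
import numpy as np

def gmr(x_h, weights, means, covs, dim_h):
    """Condition a joint GMM p(x_h, x_r) on the human observation x_h
    to compute the expected robot action E[x_r | x_h]."""
    resp = []        # per-component responsibilities p(k | x_h)
    cond_means = []  # per-component conditional means E[x_r | x_h, k]
    for pi_k, mu, sigma in zip(weights, means, covs):
        mu_h, mu_r = mu[:dim_h], mu[dim_h:]
        s_hh = sigma[:dim_h, :dim_h]       # human-human covariance block
        s_rh = sigma[dim_h:, :dim_h]       # robot-human cross-covariance
        inv_s_hh = np.linalg.inv(s_hh)
        diff = x_h - mu_h
        # marginal likelihood of x_h under component k
        norm = np.sqrt((2 * np.pi) ** dim_h * np.linalg.det(s_hh))
        lik = np.exp(-0.5 * diff @ inv_s_hh @ diff) / norm
        resp.append(pi_k * lik)
        # conditional mean of the robot dimensions given x_h
        cond_means.append(mu_r + s_rh @ inv_s_hh @ diff)
    resp = np.array(resp) / np.sum(resp)
    return sum(r * m for r, m in zip(resp, cond_means))

# Toy 1D-human / 1D-robot joint distribution with two modes.
weights = [0.5, 0.5]
means = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
covs = [np.eye(2), np.eye(2)]
action = gmr(np.array([5.0]), weights, means, covs, dim_h=1)
```

Conditioning on a human observation near one mode (here, `x_h = 5.0`) assigns nearly all responsibility to the corresponding component, so the predicted robot action follows that mode; the MDN prior in the proposed approach plays the analogous role of producing such observation-dependent mixture parameters in the latent space.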