This paper presents XBG (eXteroceptive Behaviour Generation), a multimodal end-to-end Imitation Learning (IL) system for a whole-body autonomous humanoid robot used in real-world Human-Robot Interaction (HRI) scenarios. The main contribution of this paper is an architecture for learning HRI behaviours using a data-driven approach. Through teleoperation, a diverse dataset is collected, comprising demonstrations across multiple HRI scenarios, including handshaking, handwaving, payload reception, walking, and walking with a payload. After synchronizing, filtering, and transforming the data, different Deep Neural Network (DNN) models are trained. The final system integrates exteroceptive and proprioceptive sources of information to provide the robot with an understanding of both its environment and its own actions. The robot receives sequences of images (RGB and depth) and joint-state information during the interactions and reacts accordingly, demonstrating the learned behaviours. By fusing multimodal signals over time, we encode new autonomous capabilities into the robotic platform, allowing it to understand how context changes over time. The models are deployed on ergoCub, a real-world humanoid robot, and their performance is measured by the success rate of the robot's behaviour in the scenarios mentioned above.
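To make the multimodal fusion concrete, the sketch below shows one plausible way to combine RGB-D frames and joint states over a temporal window and regress behaviour commands. This is only an illustrative assumption, not the paper's actual XBG architecture: the encoder sizes, the LSTM fusion layer, the 64x64 input resolution, and the joint and output dimensions are all hypothetical choices for the example.

```python
# Illustrative sketch (not the paper's exact XBG architecture): fuse RGB-D image
# features with proprioceptive joint states over time and regress behaviour commands.
import torch
import torch.nn as nn

class MultimodalBehaviourNet(nn.Module):
    def __init__(self, n_joints=32, n_outputs=32, hidden=256):
        super().__init__()
        # Convolutional encoder for 4-channel RGB-D frames (assumed 64x64 input).
        self.visual = nn.Sequential(
            nn.Conv2d(4, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(hidden), nn.ReLU(),
        )
        # Encoder for proprioceptive joint-state vectors.
        self.proprio = nn.Sequential(nn.Linear(n_joints, 64), nn.ReLU())
        # Recurrent layer fuses both modalities across the observation sequence.
        self.rnn = nn.LSTM(hidden + 64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_outputs)

    def forward(self, rgbd, joints):
        # rgbd: (B, T, 4, H, W), joints: (B, T, n_joints)
        b, t = rgbd.shape[:2]
        v = self.visual(rgbd.flatten(0, 1)).view(b, t, -1)  # per-frame visual features
        p = self.proprio(joints)                             # per-frame joint features
        h, _ = self.rnn(torch.cat([v, p], dim=-1))           # temporal fusion
        return self.head(h[:, -1])                           # command for the latest step

# Example: a batch of 2 observation sequences, 8 frames each.
model = MultimodalBehaviourNet()
out = model(torch.randn(2, 8, 4, 64, 64), torch.randn(2, 8, 32))
print(out.shape)  # torch.Size([2, 32])
```

In an imitation-learning setup such as the one described, a model of this kind would be trained to regress the teleoperator's commands from the synchronized observation streams; the actual network design and training details are given in the body of the paper.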