Multiperspective Teaching of Unknown Objects via Shared-gaze-based Multimodal Human-Robot Interaction

For robots to be deployed successfully in diverse situations, they must understand their environment. As state-of-the-art object detectors improve, so does the ability of robots to detect objects within their interaction domain. However, such detectors bind the robot to the few classes they were trained on and prevent it from adapting to unfamiliar surroundings beyond predefined scenarios. In such situations, humans can act as teachers: they help the robot cope with the overwhelming number of entities it may interact with and impart the required expertise. We propose a novel pipeline that harnesses human gaze and augmented reality in a human-robot collaboration setting to teach a robot novel objects in its surroundings. By combining gaze (to guide the robot's attention to an object of interest) with augmented reality (to convey the corresponding class information), we enable the robot to quickly acquire a large amount of automatically labeled training data on its own. Using transfer learning, we demonstrate the robot's ability to detect the newly learned objects and evaluate the influence of different machine learning models and learning procedures, as well as of the amount of training data. Our multimodal approach proves to be an efficient and natural way to teach the robot novel objects from a few instances, and it allows the robot to detect classes for which no training dataset is available. In addition, we make our dataset publicly available to the research community; it consists of RGB and depth data, intrinsic and extrinsic camera parameters, and regions of interest.
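To illustrate how a gaze fixation and an AR-supplied class name could together yield one automatically labeled sample, the sketch below projects a 3D gaze point into the robot's camera image with a standard pinhole model (using intrinsic and extrinsic parameters of the kind included in the dataset) and wraps a square region of interest around it. This is a minimal sketch under stated assumptions, not the authors' method: it assumes the gaze fixation is already available as a 3D point in the world frame, that the extrinsics map world to camera coordinates as p_c = R p_w + t, and that a fixed-size ROI is acceptable. The names `LabeledROI`, `project_point`, `gaze_to_labeled_roi`, and the `half_size` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LabeledROI:
    """Hypothetical container for one automatically labeled region."""
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def project_point(point_world, rotation, translation, fx, fy, cx, cy):
    """Project a 3D world point to pixel coordinates (pinhole model).

    rotation is a 3x3 nested list and translation a 3-vector; together they
    are assumed to map world coordinates into the camera frame.
    """
    # World -> camera frame: p_c = R * p_w + t
    pc = [
        sum(rotation[i][j] * point_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if pc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division followed by the intrinsic mapping
    u = fx * pc[0] / pc[2] + cx
    v = fy * pc[1] / pc[2] + cy
    return u, v


def gaze_to_labeled_roi(gaze_world, label, rotation, translation,
                        fx, fy, cx, cy, width, height, half_size=50):
    """Turn a gaze fixation plus a class name into a labeled ROI.

    Assumption: a fixed-size square box centred on the projected gaze
    point, clipped to the image bounds (width x height pixels).
    """
    u, v = project_point(gaze_world, rotation, translation, fx, fy, cx, cy)
    x_min = max(0, int(round(u - half_size)))
    y_min = max(0, int(round(v - half_size)))
    x_max = min(width - 1, int(round(u + half_size)))
    y_max = min(height - 1, int(round(v + half_size)))
    return LabeledROI(label, x_min, y_min, x_max, y_max)


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    roi = gaze_to_labeled_roi((0.1, 0.0, 1.0), "mug", identity, (0, 0, 0),
                              fx=500, fy=500, cx=320, cy=240,
                              width=640, height=480)
    print(roi)
```

With an identity extrinsic and the example intrinsics, a fixation 0.1 m to the right of the optical axis at 1 m depth projects to pixel (370, 240), giving a box from (320, 190) to (420, 290) labeled with the hypothetical class "mug". In the pipeline described above, the class name would come from the user's augmented-reality input, and repeating this for many robot viewpoints is what would let the robot accumulate labeled training data without manual annotation.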

翻译：为了实现机器人在复杂环境中的成功部署，其对环境的理解至关重要。随着先进目标检测器性能的提升，机器人在其交互域内检测物体的能力也在不断增强。然而，这使机器人局限于少数训练好的类别，并阻碍其适应预定义场景之外的新环境。在这种情况下，人类可以在海量交互实体中协助机器人，通过扮演教师的角色传授必要的专业知识。我们提出了一种新颖的流程，在人机协作场景中有效利用人类目光和增强现实技术，教会机器人识别其周围环境中的新物体。通过将目光（引导机器人关注感兴趣物体）与增强现实（传达相应的类别信息）相结合，使机器人能够自主快速获取大量自动标注的训练数据。采用迁移学习方式进行训练后，我们展示了机器人检测新近学习物体的能力，并评估了不同机器学习模型、学习流程以及训练数据量对性能的影响。我们的多模态方法被证明是一种高效且自然的方式，能够基于少量实例教会机器人识别新物体，并使其能够检测尚无训练数据集的类别。此外，我们将包含RGB与深度数据、相机内参和外参以及感兴趣区域的公开数据集提供给研究社区。