A service robot can provide a smoother interaction experience if it can proactively detect whether a nearby user intends to interact, so that it can adapt its behavior, e.g., by explicitly showing that it is available to provide a service. In this work, we propose a learning-based approach to predict, before the interaction actually begins, the probability that a human user will interact with a robot. The approach is self-supervised: after each encounter with a human, the robot can automatically label that encounter according to whether or not it resulted in an interaction. We explore different classification approaches, using different sets of features that describe the pose and the motion of the user. We validate and deploy the approach in three scenarios. The first comprises $3442$ natural sequences (both interacting and non-interacting) of employees in an office break area: a challenging real-world setting, in which a coffee machine takes the place of a service robot. The other two scenarios involve researchers interacting with service robots ($200$ and $72$ sequences, respectively). Results show that, even in challenging real-world settings, our approach can learn without external supervision and can accurately classify the user's intention to interact (i.e., AUROC greater than $0.9$) more than $3$ s before the interaction actually occurs.
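The self-supervised labelling and the AUROC metric mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Frame` type, the `label_frames` and `auroc` helpers, the toy scores, and the choice to give every frame of an encounter the label of its final outcome are hypothetical and are not taken from the paper, which does not specify its features or classifier in this abstract.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Frame:
    """One observation of a tracked user (hypothetical structure)."""
    t: float      # timestamp in seconds
    score: float  # hypothetical classifier output in [0, 1]


def label_frames(frames: List[Frame], interacted: bool) -> List[Tuple[float, int]]:
    """Label a finished encounter with its outcome.

    Assumption: every frame inherits the outcome of the whole encounter
    (1 = the user ended up interacting, 0 = the user did not).
    """
    label = 1 if interacted else 0
    return [(f.score, label) for f in frames]


def auroc(pairs: List[Tuple[float, int]]) -> float:
    """AUROC as the probability that a positive sample outscores a negative one.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for s, y in pairs if y == 1]
    neg = [s for s, y in pairs if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Two toy encounters: the first ended in an interaction, the second did not.
encounter_a = [Frame(0.0, 0.6), Frame(0.5, 0.8), Frame(1.0, 0.9)]
encounter_b = [Frame(0.0, 0.2), Frame(0.5, 0.4), Frame(1.0, 0.7)]

dataset = label_frames(encounter_a, interacted=True) + label_frames(
    encounter_b, interacted=False
)
result = auroc(dataset)
```

No human annotator appears anywhere in this loop: the label comes only from observing how each encounter ended, which is what makes the scheme self-supervised.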