Human-robot interaction (HRI) research is progressively addressing multi-party scenarios, in which a robot interacts with more than one human user at the same time. In human-robot collaboration (HRC), by contrast, such research is still at an early stage. Applying machine learning techniques to this type of collaboration requires data that are harder to produce than in a typical HRC setup. This work outlines scenarios of concurrent tasks for non-dyadic HRC applications. Building on these concepts, it also proposes an alternative way of gathering multi-user activity data: recording single users and merging the recordings in post-processing, which reduces the effort involved in recording pair settings. To validate this approach, 3D skeleton poses of single users performing activities were collected and merged in pairs. The resulting data points were then used to separately train a long short-term memory (LSTM) network and a variational autoencoder (VAE) composed of spatio-temporal graph convolutional networks (STGCN) to recognise the joint activities of the pairs of people. The results show that data collected in this way can be used for pair HRC settings and yield performance similar to that obtained with training data from groups of users recorded under the same settings, which avoids the technical difficulties involved in producing such data. The related code and collected data are publicly available.
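As a rough illustration of the merging step described in the abstract, the sketch below combines two single-user skeleton recordings into one synthetic pair recording. It is not the authors' implementation. The function name `merge_pair`, the `x_offset` parameter, the truncation to the shorter recording, and the activity labels are all hypothetical choices made here for illustration. The abstract does not specify how recordings are aligned or positioned.

```python
from typing import List, Tuple

Joint = Tuple[float, float, float]
Frame = List[Joint]          # one skeleton pose: a list of 3D joints
Sequence = List[Frame]       # one recording: a list of frames


def merge_pair(
    seq_a: Sequence,
    seq_b: Sequence,
    label_a: str,
    label_b: str,
    x_offset: float = 1.0,
) -> Tuple[Sequence, Tuple[str, str]]:
    """Merge two single-user recordings into one synthetic pair recording.

    Assumptions (not from the paper): both recordings share a frame rate,
    the shorter one sets the merged length, and user B is shifted along x
    by `x_offset` so the two skeletons do not overlap.
    """
    n_frames = min(len(seq_a), len(seq_b))
    merged: Sequence = []
    for t in range(n_frames):
        # Shift user B sideways, then append B's joints after A's joints.
        shifted_b = [(x + x_offset, y, z) for (x, y, z) in seq_b[t]]
        merged.append(list(seq_a[t]) + shifted_b)
    # The pair label is simply the two individual activity labels.
    return merged, (label_a, label_b)


# Toy data: 2 joints per skeleton, 3 and 2 frames respectively.
user_a = [[(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)] for _ in range(3)]
user_b = [[(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)] for _ in range(2)]

pair_seq, pair_label = merge_pair(user_a, user_b, "pick", "place")
print(len(pair_seq), len(pair_seq[0]), pair_label)
```

Each merged frame holds the joints of both users, so a sequence model such as an LSTM could consume it as a single input per time step. A graph-based model would additionally need an adjacency structure covering both skeletons, which this sketch does not build.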