Human-robot interaction (HRI) research is progressively addressing multi-party scenarios, in which a robot interacts with more than one human user at the same time. In contrast, the corresponding research on human-robot collaboration (HRC) is still at an early stage. Applying machine learning techniques to this type of collaboration requires data that are harder to produce than in a typical HRC setup. This work outlines scenarios of concurrent tasks for non-dyadic HRC applications. Building on these concepts, this study also proposes an alternative way of gathering multi-user activity data: collecting data from single users and merging them in post-processing, which reduces the effort involved in recording pair settings. To validate this approach, 3D skeleton poses of single users' activities were collected and merged in pairs. These data points were then used to separately train a long short-term memory (LSTM) network and a variational autoencoder (VAE) composed of spatio-temporal graph convolutional networks (STGCN) to recognise the joint activities of the pairs of people. The results showed that data collected in this way can be used for pair HRC settings, with performance similar to that obtained with training data from groups of users recorded under the same settings, while avoiding the technical difficulties involved in producing such data. The related code and collected data are publicly available.
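The post-processing merge described in the abstract can be illustrated with a minimal sketch. It is not the authors' released code: the function name `merge_single_user_sequences`, the array layout `(frames, joints, 3)`, the truncation to the shorter recording, and the `offset_b` translation that places the second skeleton beside the first are all assumptions made for illustration.

```python
import numpy as np


def merge_single_user_sequences(seq_a, seq_b, offset_b=(1.0, 0.0, 0.0)):
    """Combine two single-user skeleton sequences into one synthetic pair sample.

    seq_a, seq_b: arrays of shape (frames, joints, 3) holding 3D joint positions.
    offset_b: hypothetical translation applied to user B so that the two
              skeletons do not occupy the same space (an assumption, not a
              detail taken from the paper).
    Returns an array of shape (frames, 2, joints, 3).
    """
    seq_a = np.asarray(seq_a, dtype=float)
    seq_b = np.asarray(seq_b, dtype=float)

    if seq_a.ndim != 3 or seq_b.ndim != 3:
        raise ValueError("each sequence must have shape (frames, joints, 3)")
    if seq_a.shape[1:] != seq_b.shape[1:]:
        raise ValueError("both sequences must use the same skeleton layout")

    # Assumed alignment strategy: keep only the frames both recordings share.
    n_frames = min(len(seq_a), len(seq_b))

    # Shift user B by a constant offset, broadcast over frames and joints.
    shifted_b = seq_b[:n_frames] + np.asarray(offset_b, dtype=float)

    # Stack along a new "person" axis: (frames, 2, joints, 3).
    return np.stack([seq_a[:n_frames], shifted_b], axis=1)


# Usage with placeholder data (not real recordings): 4 joints per skeleton.
user_a = np.zeros((5, 4, 3))
user_b = np.ones((7, 4, 3))
pair = merge_single_user_sequences(user_a, user_b)
print(pair.shape)
```

A merged sample of this shape could then be paired with a joint activity label, for example the tuple of the two single-user labels, before being passed to a sequence model. How the recordings are actually aligned and labelled in the study is not specified in the abstract.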