Human-robot interaction (HRI) research is progressively addressing multi-party scenarios, in which a robot interacts with more than one human user at the same time. By contrast, the corresponding research on human-robot collaboration (HRC) is still at an early stage. Applying machine learning techniques to this type of collaboration requires data that are harder to produce than in a typical HRC setup. This work outlines scenarios of concurrent tasks for non-dyadic HRC applications. Building on these concepts, it also proposes an alternative way of gathering data on multi-user activity: data are collected from single users and merged in post-processing, which reduces the effort involved in recording pair settings. To validate this approach, 3D skeleton poses of single users' activity were collected and merged in pairs. These data points were then used to separately train a long short-term memory (LSTM) network and a variational autoencoder (VAE) composed of spatio-temporal graph convolutional networks (STGCN) to recognise the joint activities of the pairs of people. The results show that data collected in this way can be used for pair HRC settings with performance similar to that obtained with training data from groups of users recorded under the same settings, avoiding the technical difficulties involved in producing such data. The related code and collected data are publicly available.
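The merging step described above could look roughly like the following sketch. It is an illustration only, not the authors' released code: the array layout `(frames, joints, 3)`, the truncation to the shorter recording, the tuple-style joint label, and the function names `merge_pair` and `build_pairs` are all assumptions made for this example.

```python
import itertools

import numpy as np


def merge_pair(seq_a, seq_b):
    """Stack two single-user pose sequences into one two-person sample.

    Assumption: each input has shape (frames, joints, 3).
    """
    seq_a = np.asarray(seq_a, dtype=float)
    seq_b = np.asarray(seq_b, dtype=float)
    if seq_a.shape[1:] != seq_b.shape[1:]:
        raise ValueError("both sequences need the same joints and coordinates")
    # Assumption: align the recordings by truncating to the shorter one.
    frames = min(len(seq_a), len(seq_b))
    # Result shape: (frames, 2 people, joints, 3).
    return np.stack([seq_a[:frames], seq_b[:frames]], axis=1)


def build_pairs(recordings):
    """Combine every unordered pair of single-user recordings.

    recordings: list of (sequence, activity_label) tuples.
    Returns a list of (pair_sequence, (label_a, label_b)) tuples.
    """
    samples = []
    for (seq_a, label_a), (seq_b, label_b) in itertools.combinations(recordings, 2):
        samples.append((merge_pair(seq_a, seq_b), (label_a, label_b)))
    return samples


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three hypothetical single-user recordings with 4 joints each.
    recordings = [
        (rng.normal(size=(5, 4, 3)), "pick"),
        (rng.normal(size=(7, 4, 3)), "place"),
        (rng.normal(size=(6, 4, 3)), "idle"),
    ]
    for pair, label in build_pairs(recordings):
        print(pair.shape, label)
```

Each merged sample keeps the two skeletons on a separate axis, so a sequence model can consume it flattened per frame and a graph-based model can treat it as two skeleton graphs. How the two users are placed relative to each other in space is not addressed in this sketch.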