With the emergence of collaborative robots (cobots), human-robot collaboration in industrial manufacturing is coming into focus. For a cobot to act autonomously and as an assistant, it must understand human actions during assembly. To train models effectively for this task, a dataset containing suitable assembly actions in a realistic setting is crucial. For this purpose, we present the ATTACH dataset, which contains 51.6 hours of assembly with 95.2k annotated fine-grained actions, monitored by three cameras that represent potential viewpoints of a cobot. Since workers in an assembly context tend to perform different actions simultaneously with their two hands, we annotated the performed actions for each hand separately. As a result, more than 68% of the annotations in the ATTACH dataset overlap with other annotations, many times more than in related datasets, which typically feature simpler assembly tasks. For better generalization with respect to the background of the working area, we not only recorded color and depth images but also used the Azure Kinect body tracking SDK to estimate 3D skeletons of the worker. To create a first baseline, we report the performance of state-of-the-art methods for action recognition as well as action detection on video and skeleton-sequence inputs. The dataset is available at https://www.tu-ilmenau.de/neurob/data-sets-code/attach-dataset.
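To make the overlap statistic concrete, the following minimal sketch shows one way such a figure could be computed from per-hand interval annotations. It is not the authors' evaluation code: the `Annotation` record, its field names, and the example labels are hypothetical, and the abstract does not specify the actual annotation file format or the exact overlap definition used. Here an annotation counts as overlapping if its time span intersects that of at least one other annotation.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    # Hypothetical record layout; the real ATTACH annotation format may differ.
    hand: str     # "left" or "right"
    label: str    # action label (illustrative only)
    start: float  # start time in seconds
    end: float    # end time in seconds (exclusive)


def overlap_fraction(annotations: List[Annotation]) -> float:
    """Fraction of annotations whose time span intersects at least one other."""
    ordered = sorted(annotations, key=lambda a: a.start)
    overlapping = [False] * len(ordered)
    for i, a in enumerate(ordered):
        for j in range(i + 1, len(ordered)):
            b = ordered[j]
            if b.start >= a.end:
                # Sorted by start time, so no later annotation can overlap `a`.
                break
            overlapping[i] = overlapping[j] = True
    return sum(overlapping) / len(ordered) if ordered else 0.0


# Illustrative data: the left hand holds a part while the right hand works.
example = [
    Annotation("left", "hold part", 0.0, 4.0),
    Annotation("right", "pick up screw", 1.0, 2.0),
    Annotation("right", "screw in", 2.0, 3.5),
    Annotation("left", "release part", 4.0, 5.0),
]
print(overlap_fraction(example))  # 0.75
```

In this toy sequence three of the four annotations overlap with another one, which illustrates how annotating each hand separately naturally produces a high overlap ratio.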