Manual assembly workers face increasing complexity in their work. Human-centered assistance systems could help, but relying on object recognition as the enabling technology hinders sophisticated human-centered design of these systems. At the same time, activity recognition based on hand poses suffers from poor pose estimation in complex usage scenarios, such as when workers wear gloves. This paper presents a self-supervised pipeline for adapting hand pose estimation to specific use cases with minimal human interaction, which enables low-cost and robust hand pose-based activity recognition. The pipeline consists of three parts: a general machine learning model for hand pose estimation trained on a generalized dataset, spatial and temporal filtering that accounts for the anatomical constraints of the hand, and a retraining step that improves the model. Different parameter combinations are evaluated on a publicly available, annotated dataset. The best combination of parameters and model is then applied to unlabelled videos from a manual assembly scenario. The effectiveness of the pipeline is demonstrated by training an activity recognition model as a downstream task in the manual assembly scenario.
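The filtering stage of such a pipeline can be illustrated with a minimal sketch. It is not the authors' implementation: the 21-keypoint hand layout, the function names (`spatial_ok`, `temporal_ok`, `select_pseudo_labels`), the bone-length plausibility test, and the thresholds `tol` and `max_jump` are all assumptions made for illustration. The idea is only that predictions from the general model are kept as pseudo-labels for retraining when they look anatomically plausible (spatial check) and do not jump implausibly between consecutive frames (temporal check).

```python
import numpy as np

# Assumed layout: keypoint 0 is the wrist, followed by 4 joints per finger.
# Each pair (i, j) is one "bone" connecting two keypoints.
BONES = [(0, 1), (1, 2), (2, 3), (3, 4),
         (0, 5), (5, 6), (6, 7), (7, 8),
         (0, 9), (9, 10), (10, 11), (11, 12),
         (0, 13), (13, 14), (14, 15), (15, 16),
         (0, 17), (17, 18), (18, 19), (19, 20)]
_START = [i for i, _ in BONES]
_END = [j for _, j in BONES]


def bone_lengths(pose):
    """Return the 20 bone lengths of a (21, 2) keypoint array."""
    return np.linalg.norm(pose[_START] - pose[_END], axis=1)


def spatial_ok(pose, ref_lengths, tol=0.5):
    """Hypothetical spatial filter: after normalizing for hand size,
    every bone must lie within a relative tolerance of a reference hand."""
    lengths = bone_lengths(pose)
    scale = np.median(lengths) / np.median(ref_lengths)
    if scale <= 0:
        return False  # degenerate prediction, e.g. all keypoints collapsed
    rel_err = np.abs(lengths / scale - ref_lengths) / ref_lengths
    return bool(np.all(rel_err <= tol))


def temporal_ok(prev_pose, pose, max_jump=20.0):
    """Hypothetical temporal filter: no keypoint may move more than
    max_jump (in pixels) between two consecutive frames."""
    return bool(np.max(np.linalg.norm(pose - prev_pose, axis=1)) <= max_jump)


def select_pseudo_labels(poses, ref_lengths, tol=0.5, max_jump=20.0):
    """Return indices of frames in a (T, 21, 2) sequence that pass both
    filters. Frame 0 has no predecessor and is judged spatially only."""
    keep = []
    for t, pose in enumerate(poses):
        if not spatial_ok(pose, ref_lengths, tol):
            continue
        if t > 0 and not temporal_ok(poses[t - 1], pose, max_jump):
            continue
        keep.append(t)
    return keep
```

In a full pipeline, the frames returned by `select_pseudo_labels` would be paired with their predicted poses and used to retrain the general model on the target scenario. The retraining step itself depends on the chosen model and is omitted here.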