As collaborative robots (cobots) gain popularity in industrial manufacturing, effective human-robot collaboration becomes crucial. To assist with assembly tasks and act autonomously, cobots must be able to recognize human actions. Skeleton-based approaches are often used for this purpose because they generalize across different people and environments. Although body skeletons are widely used for action recognition, they may not be accurate enough for assembly actions, in which the worker's fingers and hands play a significant role. To address this limitation, we propose a method that combines less detailed body skeletons with highly detailed hand skeletons. We investigate CNNs and transformers; the latter are particularly adept at extracting and combining important information from both skeleton types using attention. We demonstrate that the proposed approach improves action recognition in assembly scenarios.
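To make the idea of fusing both skeleton types with attention more concrete, the following is a minimal sketch and not the architecture evaluated in this paper. It assumes a single frame, 3D joint coordinates, 17 body joints and 2 x 21 hand joints, one self-attention head, and random matrices standing in for learned weights; the function name `fuse_skeletons` and all parameters are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_skeletons(body, hands, d_model=32, seed=0):
    """Fuse body joints (J_b, 3) and hand joints (J_h, 3) of one frame.

    Returns a pooled feature vector (d_model,) and the attention
    matrix (J_b + J_h, J_b + J_h). All weights are random placeholders
    for parameters that would normally be learned.
    """
    rng = np.random.default_rng(seed)

    # Separate linear embeddings per skeleton type (assumption).
    w_body = rng.normal(scale=0.1, size=(3, d_model))
    w_hand = rng.normal(scale=0.1, size=(3, d_model))
    # Type embeddings let attention tell body tokens from hand tokens.
    type_emb = rng.normal(scale=0.1, size=(2, d_model))

    body_tokens = body @ w_body + type_emb[0]
    hand_tokens = hands @ w_hand + type_emb[1]
    tokens = np.concatenate([body_tokens, hand_tokens], axis=0)

    # Single-head scaled dot-product self-attention over all joints,
    # so every body joint can attend to every hand joint and vice versa.
    w_q = rng.normal(scale=0.1, size=(d_model, d_model))
    w_k = rng.normal(scale=0.1, size=(d_model, d_model))
    w_v = rng.normal(scale=0.1, size=(d_model, d_model))
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(d_model), axis=-1)
    fused = attn @ v

    # Mean-pool the joint tokens into one frame-level feature.
    return fused.mean(axis=0), attn


# Example with synthetic joints (placeholder data, not a real recording).
demo_rng = np.random.default_rng(1)
body_joints = demo_rng.normal(size=(17, 3))   # coarse body skeleton
hand_joints = demo_rng.normal(size=(42, 3))   # two detailed hand skeletons
feature, attention = fuse_skeletons(body_joints, hand_joints)
print(feature.shape, attention.shape)
```

In a full model, a feature of this kind would be computed per frame, aggregated over time, and passed to a classifier over the assembly actions; those stages are omitted here.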