Egocentric action recognition is essential for healthcare and assistive technologies that rely on egocentric cameras, because it enables automatic, continuous monitoring of activities of daily living (ADLs) without any conscious effort from the user. This study explores the feasibility of using 2D hand and object pose information for egocentric action recognition. While the current literature focuses on 3D hand pose information, our work shows that 2D skeleton data is a promising basis for hand-based action classification, may enhance privacy, and could be less computationally demanding. The study uses a state-of-the-art transformer-based method to classify pose sequences and achieves a validation accuracy of 94%, outperforming other existing solutions. Accuracy on the test subset drops to 76%, indicating that generalization needs further improvement. This research highlights the potential of 2D hand and object pose information for action recognition tasks and offers a promising alternative to 3D-based methods.