Egocentric Human-Object Interaction (EHOI) analysis is crucial for industrial safety, yet the development of robust models is hindered by the scarcity of annotated, domain-specific data. We address this challenge with a data generation framework that combines synthetic data with a diffusion-based process to augment real-world images with realistic Personal Protective Equipment (PPE). We present GlovEgo-HOI, a new benchmark dataset for industrial EHOI, and GlovEgo-Net, a model that integrates Glove-Head and Keypoint-Head modules to leverage hand pose information for improved interaction detection. Extensive experiments demonstrate the effectiveness of both the proposed data generation framework and GlovEgo-Net. To foster further research, we release the GlovEgo-HOI dataset, the augmentation pipeline, and pre-trained models in the project's GitHub repository.