In this paper, we tackle the problem of Egocentric Human-Object Interaction (EHOI) detection in an industrial setting. To overcome the lack of public datasets in this context, we propose a pipeline and a tool for generating synthetic images of EHOIs paired with several annotations and data signals (e.g., depth maps or segmentation masks). Using the proposed pipeline, we present EgoISM-HOI, a new multimodal dataset of synthetic EHOI images in an industrial environment with rich annotations of hands and objects. To demonstrate the utility and effectiveness of the synthetic EHOI data produced by the proposed tool, we design a new method that predicts and combines different multimodal signals to detect EHOIs in RGB images. Our study shows that pre-training the proposed method on synthetic data significantly improves its performance on real-world test data. Moreover, to fully assess the usefulness of our method, we conduct an in-depth analysis comparing the proposed approach with several state-of-the-art class-agnostic methods, which highlights its superiority. To support research in this field, we publicly release the datasets, source code, and pre-trained models at https://iplab.dmi.unict.it/egoism-hoi.