In this paper, we tackle the problem of Egocentric Human-Object Interaction (EHOI) detection in an industrial setting. To overcome the lack of public datasets in this context, we propose a pipeline and a tool for generating synthetic images of EHOIs paired with several annotations and data signals (e.g., depth maps and instance segmentation masks). Using the proposed pipeline, we present EgoISM-HOI, a new multimodal dataset composed of synthetic EHOI images in an industrial environment with rich annotations of hands and objects. To demonstrate the utility and effectiveness of the synthetic EHOI data produced by the proposed tool, we design a new method that predicts and combines different multimodal signals to detect EHOIs in RGB images. Our study shows that pre-training the proposed method on synthetic data significantly improves its performance on real-world data. Moreover, the proposed approach outperforms state-of-the-art class-agnostic methods. To support research in this field, we publicly release the datasets, source code, and pre-trained models at https://iplab.dmi.unict.it/egoism-hoi.