Detecting human-object interactions (HOIs) is an intricate challenge in computer vision. Existing HOI detection methods rely heavily on appearance-based features, which may not capture all the characteristics needed for accurate detection. To address this limitation, we propose a graph-based approach called TMGHOI (Translational Model for Human-Object Interaction Detection). Our method captures the sentiment representation of HOIs by integrating both spatial and semantic knowledge. We represent HOIs as a graph in which the interaction components serve as nodes and their spatial relationships as edges. To extract spatial and semantic information, TMGHOI employs separate spatial and semantic encoders. These encodings are then combined to construct a knowledge graph that captures the sentiment representation of HOIs. In addition, the ability to incorporate prior knowledge improves the understanding of interactions and further boosts detection accuracy. We conducted extensive evaluations on the widely used HICO-DET dataset to demonstrate the effectiveness of TMGHOI. Our approach outperformed existing state-of-the-art graph-based methods by a significant margin. We believe that TMGHOI has the potential to improve both the accuracy and the efficiency of HOI detection. Its integration of spatial and semantic knowledge, together with its computational efficiency and practicality, makes it a useful tool for researchers and practitioners in the computer vision community. As with any research, further exploration and evaluation on other datasets are needed to establish the generalizability and robustness of the proposed method.
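As a rough illustration of the graph representation described above, the following sketch builds a small HOI graph whose nodes are the detected human and objects and whose edges carry spatial relationships. It is not the TMGHOI implementation: the names `Node`, `spatial_edge`, and `build_hoi_graph`, and the choice of edge features (box IoU and a normalized center offset), are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass(frozen=True)
class Node:
    """Hypothetical graph node: one interaction component (human or object)."""
    label: str  # semantic label, e.g. "person" or "bicycle"
    box: Box    # bounding box of the detection


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def spatial_edge(human: Node, obj: Node) -> Dict[str, float]:
    """Assumed edge features: box overlap plus the center offset from the
    human to the object, scaled by the human box width and height."""
    hx, hy = (human.box[0] + human.box[2]) / 2, (human.box[1] + human.box[3]) / 2
    ox, oy = (obj.box[0] + obj.box[2]) / 2, (obj.box[1] + obj.box[3]) / 2
    hw = max(human.box[2] - human.box[0], 1e-6)  # guard against degenerate boxes
    hh = max(human.box[3] - human.box[1], 1e-6)
    return {
        "iou": iou(human.box, obj.box),
        "dx": (ox - hx) / hw,
        "dy": (oy - hy) / hh,
    }


def build_hoi_graph(human: Node, objects: List[Node]):
    """Return (nodes, edges); node 0 is the human, and each edge (0, i)
    links the human to object i with its spatial features."""
    nodes = [human, *objects]
    edges = {(0, i + 1): spatial_edge(human, o) for i, o in enumerate(objects)}
    return nodes, edges


# Example with made-up detections
person = Node("person", (0.0, 0.0, 10.0, 20.0))
bicycle = Node("bicycle", (5.0, 10.0, 15.0, 20.0))
nodes, edges = build_hoi_graph(person, [bicycle])
```

In a fuller pipeline, features like these would presumably feed the spatial encoder, while the node labels would feed the semantic encoder; how TMGHOI actually combines the two is not specified in this abstract.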