Humans have a remarkable ability to perceive and reason about the world around them by understanding the relationships between objects. In this paper, we investigate how effectively such relationships can be used for object detection and instance segmentation. To this end, we propose the Relational Prior-based Feature Enhancement Model (RP-FEM), a graph transformer that enhances object proposal features using relational priors. The proposed architecture operates on scene graphs obtained from initial proposals and jointly learns relational context modeling for object detection and instance segmentation. Experimental evaluations on COCO show that scene graphs augmented with relational priors benefit both object detection and instance segmentation. RP-FEM suppresses improbable class predictions within an image and prevents the model from generating duplicate predictions, leading to improvements over the baseline model on which it is built.
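To make the idea of enhancing proposal features with relational priors more concrete, the following is a minimal sketch of one prior-biased attention step over a fully connected proposal graph. It is not the RP-FEM architecture, whose details are not given here. The function names (`edge_prior`, `enhance`), the use of a class co-occurrence matrix as the relational prior, the additive log-prior bias on the attention scores, and the absence of learned projections are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def edge_prior(p_i, p_j, cooccurrence):
    """Expected co-occurrence prior for a pair of proposals.

    p_i, p_j: class probability vectors of the two proposals (assumed inputs).
    cooccurrence: hypothetical class-by-class prior matrix.
    """
    return sum(
        p_i[a] * cooccurrence[a][b] * p_j[b]
        for a in range(len(p_i))
        for b in range(len(p_j))
    )


def enhance(features, class_probs, cooccurrence, eps=1e-6):
    """One attention step whose scores are biased by the relational prior.

    Each proposal attends to every proposal (including itself); the attended
    context is added back to the original feature as a residual.
    """
    n = len(features)
    d = len(features[0])
    enhanced = []
    for i in range(n):
        scores = []
        for j in range(n):
            # Scaled dot-product similarity between the two proposal features.
            similarity = sum(x * y for x, y in zip(features[i], features[j]))
            similarity /= math.sqrt(d)
            # Pairs with a low prior receive a strongly negative bias.
            prior = edge_prior(class_probs[i], class_probs[j], cooccurrence)
            scores.append(similarity + math.log(prior + eps))
        weights = softmax(scores)
        context = [
            sum(weights[j] * features[j][k] for j in range(n)) for k in range(d)
        ]
        enhanced.append([f + c for f, c in zip(features[i], context)])
    return enhanced


# Toy example: three proposals, two classes, two-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
class_probs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
cooccurrence = [[0.7, 0.1], [0.1, 0.6]]
enhanced = enhance(features, class_probs, cooccurrence)
```

In this sketch, a proposal whose likely class rarely co-occurs with the classes of its neighbours draws little context from them, which is one plausible mechanism by which a relational prior could down-weight improbable predictions. A trained model would use learned query, key, and value projections and a prior estimated from data rather than the hand-written matrix above.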