The goal of this paper is to detect objects by exploiting their interrelationships. Unlike existing methods, which learn objects and relations separately, our key idea is to learn the object-relation distribution jointly. We first propose a novel way of building a graphical representation of an image from inter-object relation priors and initial class predictions, which we call a context-likelihood graph. We then learn the joint distribution with an energy-based modeling technique that allows us to sample and iteratively refine the context-likelihood graph for a given image. Learning the distribution jointly enables us to generate a more accurate graph representation of an image, which in turn leads to better object detection performance. We demonstrate the benefits of the context-likelihood graph formulation and the energy-based graph refinement through experiments on the Visual Genome and MS-COCO datasets, where we achieve consistent improvements over object detectors such as DETR and Faster R-CNN, as well as over alternative methods that model object interrelationships separately. Our method is detector-agnostic, end-to-end trainable, and especially beneficial for rare object classes.
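To make the idea of "sampling and iteratively refining" a context-likelihood graph more concrete, here is a minimal sketch under strong simplifying assumptions. It is not the paper's method. Each node is one detected object, and its state is a class-probability vector initialized from the detector's predictions `P`. The relation prior is reduced to a single hypothetical class-compatibility matrix `R`, and the energy is a hand-written quadratic rather than a learned network. Refinement uses Langevin-style noisy gradient steps, each followed by a clip-and-renormalize heuristic that maps every row back to a valid distribution. All names (`energy`, `energy_grad`, `refine`, `lam`, `step`, `noise_scale`) are illustrative assumptions.

```python
import numpy as np


def energy(Y, P, R, lam):
    """Toy energy of a context-likelihood graph (hypothetical, not learned).

    Y: (N, C) current class-probability vectors, one row per detected object.
    P: (N, C) initial class predictions from the detector.
    R: (C, C) hypothetical relation prior; R[a, b] > 0 means classes a and b
       are compatible as co-occurring objects in an image.
    """
    # Unary term: stay close to the detector's initial predictions.
    unary = 0.5 * np.sum((Y - P) ** 2)
    # Pairwise term: S[i, j] = Y_i^T R Y_j, summed over ordered pairs i != j.
    S = Y @ R @ Y.T
    pairwise = np.sum(S) - np.trace(S)
    # Lower energy means a more plausible joint assignment of classes.
    return unary - lam * pairwise


def energy_grad(Y, P, R, lam):
    """Analytic gradient of `energy` with respect to Y."""
    # Row i holds sum_{j != i} Y_j, i.e. the context seen by object i.
    others = Y.sum(axis=0, keepdims=True) - Y
    # d/dY_i of sum_{i != j} Y_i^T R Y_j is (R + R^T) applied to that context.
    return (Y - P) - lam * others @ (R + R.T)


def refine(P, R, lam=0.5, step=0.1, n_steps=50, noise_scale=0.01, seed=0):
    """Iteratively refine class probabilities with Langevin-style updates."""
    rng = np.random.default_rng(seed)
    Y = P.astype(float).copy()
    for _ in range(n_steps):
        noise = rng.standard_normal(Y.shape)
        # Gradient step on the energy plus scaled Gaussian noise.
        Y = Y - 0.5 * step * energy_grad(Y, P, R, lam)
        Y = Y + noise_scale * np.sqrt(step) * noise
        # Map back to valid distributions (clip + renormalize, a simple
        # heuristic rather than an exact projection onto the simplex).
        Y = np.clip(Y, 1e-6, None)
        Y = Y / Y.sum(axis=1, keepdims=True)
    return Y


# Classes: 0 = car, 1 = road, 2 = boat.
R = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])  # hypothetical prior: cars co-occur with roads
P = np.array([[0.05, 0.90, 0.05],   # object 0: confidently "road"
              [0.45, 0.10, 0.45]])  # object 1: ambiguous between car and boat
Y = refine(P, R, noise_scale=0.0)
print(Y.round(3))
```

In this toy setup, the confident "road" node pulls the ambiguous node toward "car" because only car-road pairs are rewarded by `R`. With `noise_scale=0` the loop reduces to plain gradient descent on the energy; a nonzero value adds the stochasticity that lets an energy-based model draw samples instead of only converging to a single mode.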