The goal of this paper is to detect objects by exploiting their interrelationships. Rather than relying on predefined, labeled graph structures, we infer a graph prior from object co-occurrence statistics. The key idea is to model object relations as a function of initial class predictions and co-occurrence priors, producing a graph representation of the image that improves both classification and bounding box regression. We additionally learn the joint distribution over objects and their relations via energy-based modeling. Sampling from this distribution yields a refined graph representation of the image, which in turn improves detection performance. Experiments on the Visual Genome and MS-COCO datasets demonstrate that our method is detector-agnostic, end-to-end trainable, and especially beneficial for rare object classes. Moreover, it yields consistent improvements over object detectors such as DETR and Faster R-CNN, as well as over state-of-the-art methods that model object interrelationships.
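As a rough illustration of the key idea in the abstract (object relations as a function of initial class predictions and co-occurrence priors), the Python sketch below estimates a conditional co-occurrence prior from per-image label sets. It then scores each pair of detections by a bilinear form of their predicted class distributions. The function names `cooccurrence_prior` and `relation_graph`, the conditional-probability normalization, and the bilinear edge weight are all hypothetical choices made for illustration. They are not the paper's actual formulation, and the energy-based refinement step is not shown.

```python
from itertools import combinations
from typing import List, Sequence

Matrix = List[List[float]]


def cooccurrence_prior(image_labels: Sequence[Sequence[int]], num_classes: int,
                       eps: float = 1e-6) -> Matrix:
    """Hypothetical prior: prior[a][b] approximates P(class b in image | class a in image)."""
    pair_counts = [[0.0] * num_classes for _ in range(num_classes)]
    class_counts = [0.0] * num_classes
    for labels in image_labels:
        present = sorted(set(labels))  # count each class at most once per image
        for a in present:
            class_counts[a] += 1
        for a, b in combinations(present, 2):
            pair_counts[a][b] += 1
            pair_counts[b][a] += 1
    # Row-normalize by how often class a appears; eps avoids division by zero.
    return [[pair_counts[a][b] / (class_counts[a] + eps) for b in range(num_classes)]
            for a in range(num_classes)]


def relation_graph(class_probs: Sequence[Sequence[float]], prior: Matrix) -> Matrix:
    """Hypothetical edge weight w_ij = sum_{a,b} p_i[a] * prior[a][b] * p_j[b], with w_ii = 0."""
    n, k = len(class_probs), len(prior)
    graph = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                graph[i][j] = sum(class_probs[i][a] * prior[a][b] * class_probs[j][b]
                                  for a in range(k) for b in range(k))
    return graph


# Toy example: three training images over three classes, two confident detections.
prior = cooccurrence_prior([[0, 1], [0, 1], [0, 2]], num_classes=3)
graph = relation_graph([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], prior)
print(graph)  # approximately [[0.0, 0.667], [1.0, 0.0]]
```

Because this prior is conditional, the resulting graph is asymmetric. In the toy data, a detected class-1 object always co-occurs with class 0 (weight near 1.0), while class 0 implies class 1 only about two thirds of the time. With soft class predictions, each edge blends the prior over all class pairs in proportion to the detector's confidence.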