Panoptic Scene Graph (PSG) is a challenging task in Scene Graph Generation (SGG) that aims to create a more comprehensive scene graph representation using panoptic segmentation instead of bounding boxes. However, current PSG methods have limited performance, which can hinder the development of downstream tasks. To improve PSG methods, we conducted an in-depth analysis to identify the bottleneck of current PSG models, and found that inter-object pair-wise recall is a crucial factor that previous PSG methods ignored. Based on this finding, we present a novel framework, Pair then Relation (Pair-Net), which uses a Pair Proposal Network (PPN) to learn and filter sparse pair-wise relationships between subjects and objects. We also observed that object pairs are sparse, and used this insight to design a lightweight Matrix Learner within the PPN. Extensive ablation and analysis show that our approach significantly improves upon a strong segmenter baseline. Notably, our approach achieves new state-of-the-art results on the PSG benchmark, with over 10% absolute gains compared to PSGFormer. The code of this paper is publicly available at https://github.com/king159/Pair-Net.
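To make the notion of pair-wise recall concrete, the following is a minimal sketch rather than the Pair-Net implementation. It assumes a hypothetical N×N score matrix in which entry (i, j) scores object i as subject and object j as object, keeps the k highest-scoring pairs, and measures what fraction of the ground-truth pairs survive that filtering. The function names `top_k_pairs` and `pair_recall`, the toy scores, and the exclusion of self-pairs are all assumptions made for illustration.

```python
import numpy as np

def top_k_pairs(pair_scores, k):
    """Return the k highest-scoring (subject, object) index pairs."""
    scores = pair_scores.astype(float)  # astype returns a copy, so the input is untouched
    # Assumption for this sketch: an object is never paired with itself.
    np.fill_diagonal(scores, -np.inf)
    # Sort all entries of the flattened matrix in descending order and keep k.
    flat = np.argsort(scores, axis=None)[::-1][:k]
    subj, obj = np.unravel_index(flat, scores.shape)
    return list(zip(subj.tolist(), obj.tolist()))

def pair_recall(proposed, ground_truth):
    """Fraction of ground-truth (subject, object) pairs present in the proposals."""
    gt = set(ground_truth)
    if not gt:
        return 0.0
    return len(gt & set(proposed)) / len(gt)

# Toy example with three objects (values are made up).
scores = np.array([[0.0, 0.9, 0.1],
                   [0.2, 0.0, 0.8],
                   [0.7, 0.3, 0.0]])
pairs = top_k_pairs(scores, k=2)
recall = pair_recall(pairs, [(0, 1), (2, 0)])
print(pairs, recall)
```

In this toy case only one of the two ground-truth pairs is among the top two proposals, so any relation attached to the missing pair cannot be recovered by a later relation classifier, which is the kind of bottleneck the analysis above points to.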