Panoptic Scene Graph (PSG) generation is a challenging task within Scene Graph Generation (SGG) that aims to build a more comprehensive scene graph representation using panoptic segmentation instead of bounding boxes. Compared to SGG, PSG poses additional challenges: pixel-level segment outputs and full relationship exploration (it also considers relations between things and stuff). As a result, current PSG methods have limited performance, which hinders downstream tasks and applications. This work aims to design a novel and strong baseline for PSG. To that end, we first conduct an in-depth analysis to identify the bottleneck of current PSG models, finding that inter-object pairwise recall is a crucial factor that previous PSG methods ignored. Based on this finding and recent query-based frameworks, we present a novel framework, Pair then Relation (Pair-Net), which uses a Pair Proposal Network (PPN) to learn and filter sparse pairwise relationships between subjects and objects. Moreover, we observe the sparse nature of object pairs. Motivated by this, we design a lightweight Matrix Learner within the PPN, which directly learns pairwise relationships for pair proposal generation. Through extensive ablation and analysis, our approach yields significant improvements by building on a solid segmenter baseline. Notably, our method achieves over 10\% absolute gains compared to our baseline, PSGFormer. The code is publicly available at https://github.com/king159/Pair-Net.
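To make the notions of pair proposal and pairwise recall concrete, the following is a minimal sketch and not the Pair-Net implementation. It assumes each segment has a subject embedding and an object embedding, and it uses a plain dot product as a stand-in for the learned Matrix Learner. The function names `pair_scores`, `top_k_pairs`, and `pair_recall`, as well as the toy embeddings, are hypothetical and chosen only for illustration.

```python
from itertools import product


def pair_scores(subject_emb, object_emb):
    """Dense N x N matrix; entry [i][j] scores segment i as subject, j as object.

    Assumption: a dot product stands in for a learned pairwise scorer.
    """
    return [
        [sum(a * b for a, b in zip(s, o)) for o in object_emb]
        for s in subject_emb
    ]


def top_k_pairs(scores, k):
    """Keep the k highest-scoring (subject, object) index pairs, skipping self-pairs."""
    n = len(scores)
    candidates = [(i, j) for i, j in product(range(n), repeat=2) if i != j]
    # Python's sort is stable, so ties keep their original (row-major) order.
    candidates.sort(key=lambda p: scores[p[0]][p[1]], reverse=True)
    return candidates[:k]


def pair_recall(proposals, gt_pairs):
    """Fraction of ground-truth (subject, object) pairs found among the proposals."""
    gt = set(gt_pairs)
    if not gt:
        return 0.0
    return len(set(proposals) & gt) / len(gt)


# Toy data (hypothetical): three segments with 2-D embeddings.
subject_emb = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
object_emb = [[0.0, 1.0], [1.0, 0.0], [0.2, 0.1]]
gt_pairs = [(0, 1), (2, 0)]  # sparse: 2 of the 6 possible ordered pairs

scores = pair_scores(subject_emb, object_emb)
proposals = top_k_pairs(scores, k=2)
recall = pair_recall(proposals, gt_pairs)
print(proposals, recall)
```

In this toy setting only a few of the possible ordered pairs carry a relation, so the number of proposals kept directly bounds the recall that any later relation classifier can reach: a ground-truth pair dropped at this stage cannot be recovered downstream.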