Scene Graph Generation (SGG) suffers from a long-tailed distribution, where a few predicate classes dominate while many others are underrepresented, leading to biased models that underperform on rare relations. Unbiased-SGG methods address this issue with debiasing strategies, but often at the cost of spatial understanding, resulting in an over-reliance on semantic priors. We introduce Salience-SGG, a novel framework featuring an Iterative Salience Decoder (ISD) that emphasizes triplets with salient spatial structures. To support this, we propose semantic-agnostic salience labels that guide the ISD. Evaluations on Visual Genome, Open Images V6, and GQA-200 show that Salience-SGG achieves state-of-the-art performance and improves the spatial understanding of existing Unbiased-SGG methods, as demonstrated by the Pairwise Localization Average Precision metric.