Panoptic Scene Graph Generation (PSG) parses objects and predicts the relationships (predicates) between them, connecting human language and visual scenes. However, annotators' differing language preferences and semantic overlaps between predicates lead to biased predicate annotations in the dataset, i.e., different predicates for the same object pairs. Biased predicate annotations make it hard for PSG models to construct a clear decision plane among predicates, which greatly hinders the real-world application of PSG models. To address this intrinsic bias, we propose a novel framework named ADTrans that adaptively transfers biased predicate annotations to informative and unified ones. To ensure consistency and accuracy during the transfer process, we propose to measure the invariance of representations in each predicate class and to learn unbiased prototypes of predicates with different intensities. Meanwhile, we continuously measure the distribution changes between each representation and its prototype, and constantly screen potentially biased data. Finally, with the unbiased predicate-prototype representation embedding space, biased annotations are easily identified. Experiments show that ADTrans significantly improves the performance of benchmark models, achieving new state-of-the-art performance, and generalizes effectively across multiple datasets.
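To make the idea of screening annotations against predicate prototypes concrete, the following is a minimal sketch and not the ADTrans method itself. It assumes that each relation instance already has a fixed embedding vector, uses the per-class mean as a simple stand-in for a learned unbiased prototype, and flags an annotation as suspect when its embedding is closer in cosine similarity to another class's prototype than to its own. The function names `class_prototypes` and `flag_suspect_annotations` and the `margin` parameter are hypothetical, introduced only for illustration.

```python
import numpy as np


def class_prototypes(embeddings, labels, num_classes):
    """Mean embedding per predicate class.

    Assumption: a plain class mean stands in for the learned prototypes
    described in the abstract.
    """
    dim = embeddings.shape[1]
    protos = np.zeros((num_classes, dim))
    for c in range(num_classes):
        members = embeddings[labels == c]
        if len(members) > 0:
            protos[c] = members.mean(axis=0)
    return protos


def flag_suspect_annotations(embeddings, labels, protos, margin=0.0):
    """Flag samples that sit closer to another class prototype than their own.

    Returns a boolean mask of suspect samples and, for every sample, the
    index of the most similar prototype among the other classes.
    """
    eps = 1e-12  # avoids division by zero for all-zero vectors
    e = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + eps)
    p = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + eps)
    sims = e @ p.T  # cosine similarities, shape (n_samples, n_classes)

    rows = np.arange(len(labels))
    own = sims[rows, labels]

    # Mask out each sample's own class to find the best competing class.
    masked = sims.copy()
    masked[rows, labels] = -np.inf
    best_other = masked.argmax(axis=1)
    best_other_sim = masked.max(axis=1)

    suspect = best_other_sim > own + margin
    return suspect, best_other
```

Calling `flag_suspect_annotations` on a set of embeddings and integer labels returns a mask of candidates for relabeling together with the competing predicate class for each sample. A positive `margin` makes the screen more conservative, so that only annotations clearly closer to another prototype are flagged.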