Panoptic Scene Graph Generation (PSG) involves detecting objects and predicting the relationships (predicates) between them. However, biased predicate annotations pose a significant challenge for PSG models because they prevent the models from establishing clear decision boundaries among predicates, which substantially limits the practical utility of PSG models in real-world applications. To address this intrinsic bias, we propose a novel framework that infers potentially biased annotations by measuring the predicate prediction risks within each subject-object pair (domain), and adaptively transfers the biased annotations to consistent ones by learning invariant predicate representation embeddings. Experiments show that our method significantly improves the performance of benchmark models, achieves new state-of-the-art performance, and demonstrates strong generalization and effectiveness on the PSG dataset.
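As a rough illustration of what measuring prediction risk within each subject-object domain could look like, the sketch below groups annotations by (subject, object) pair, takes the negative log-likelihood of the annotated predicate as the risk, and flags annotations whose risk is well above their domain's mean. The abstract does not specify the risk function or the flagging rule, so the choice of negative log-likelihood, the `margin` threshold, the function name `flag_risky_annotations`, and the toy data are all assumptions made for illustration, not the proposed method.

```python
import math
from collections import defaultdict


def flag_risky_annotations(samples, margin=1.0):
    """Return indices of annotations whose risk is unusually high for their domain.

    Each sample is a dict with keys 'subject', 'object', 'label' (the annotated
    predicate) and 'probs' (a model's predicted probability per predicate).
    Risk is assumed to be the negative log-likelihood of the annotated label;
    an annotation is flagged when its risk exceeds the domain mean by `margin`.
    """
    risks = []
    by_domain = defaultdict(list)
    for i, s in enumerate(samples):
        # Clamp the probability so a missing or zero entry does not break log().
        p = max(s["probs"].get(s["label"], 0.0), 1e-12)
        risks.append(-math.log(p))
        by_domain[(s["subject"], s["object"])].append(i)

    flagged = []
    for idxs in by_domain.values():
        mean_risk = sum(risks[i] for i in idxs) / len(idxs)
        flagged.extend(i for i in idxs if risks[i] > mean_risk + margin)
    return sorted(flagged)


# Toy data (hypothetical): in the (person, horse) domain the model strongly
# prefers "riding", so the generic "on" annotation carries a high risk.
samples = [
    {"subject": "person", "object": "horse", "label": "riding",
     "probs": {"riding": 0.90, "on": 0.10}},
    {"subject": "person", "object": "horse", "label": "riding",
     "probs": {"riding": 0.80, "on": 0.20}},
    {"subject": "person", "object": "horse", "label": "riding",
     "probs": {"riding": 0.85, "on": 0.15}},
    {"subject": "person", "object": "horse", "label": "on",
     "probs": {"riding": 0.95, "on": 0.05}},
    {"subject": "person", "object": "table", "label": "beside",
     "probs": {"beside": 0.60, "on": 0.40}},
]

flagged = flag_risky_annotations(samples)
print(flagged)  # [3]
```

Computing the threshold per domain rather than globally keeps a predicate that is rare overall, but typical for a particular subject-object pair, from being flagged merely for its rarity. How the flagged annotations are then transferred to consistent ones through invariant predicate embeddings is not sketched here.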