Panoptic Scene Graph Generation (PSG) involves detecting objects and predicting the relationships (predicates) between them. However, biased predicate annotations pose a significant challenge for PSG models, because they hinder the models' ability to establish clear decision boundaries between different predicates. This issue substantially limits the practical utility and real-world applicability of PSG models. To address this intrinsic bias, we propose a novel framework that infers potentially biased annotations by measuring predicate prediction risks within each subject-object pair (domain), and adaptively transfers the biased annotations to consistent ones by learning invariant predicate representation embeddings. Experiments show that our method significantly improves the performance of benchmark models, achieves new state-of-the-art performance, and demonstrates strong generalization and effectiveness on the PSG dataset.
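The abstract does not define how the per-domain "predicate prediction risk" is computed. One possible reading is sketched below, under stated assumptions. The risk of an annotation is taken to be the negative log-likelihood of its annotated predicate under a model's predicted distribution. A domain is the (subject class, object class) pair. An annotation is flagged as potentially biased when its risk exceeds its domain's mean risk by a hypothetical `margin`. The names `Annotation`, `predicate_risk`, `flag_biased`, and `transfer_labels` are all illustrative. The transfer step here simply relabels flagged samples with the model's top-scoring predicate. This is a deliberately simplified stand-in, not the learned invariant-embedding transfer the framework describes.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Annotation:
    subject: str   # subject class label
    obj: str       # object class label
    predicate: str # annotated predicate (possibly biased)
    probs: dict    # hypothetical model distribution over predicates for this pair


def predicate_risk(ann):
    # Assumed risk: negative log-likelihood of the annotated predicate.
    p = ann.probs.get(ann.predicate, 0.0)
    return -math.log(max(p, 1e-12))


def flag_biased(annotations, margin=0.5):
    # Group annotation indices by domain = (subject class, object class).
    domains = defaultdict(list)
    for i, ann in enumerate(annotations):
        domains[(ann.subject, ann.obj)].append(i)
    risks = [predicate_risk(a) for a in annotations]
    flagged = set()
    for idxs in domains.values():
        mean_risk = sum(risks[i] for i in idxs) / len(idxs)
        # Flag annotations that are much riskier than their domain's average.
        for i in idxs:
            if risks[i] > mean_risk + margin:
                flagged.add(i)
    return flagged


def transfer_labels(annotations, flagged):
    # Simplified transfer: replace a flagged predicate with the model's argmax.
    out = []
    for i, ann in enumerate(annotations):
        if i in flagged:
            new_pred = max(ann.probs, key=ann.probs.get)
            out.append(Annotation(ann.subject, ann.obj, new_pred, ann.probs))
        else:
            out.append(ann)
    return out


# Toy example: one (person, horse) domain with an outlier "on" annotation.
annotations = [
    Annotation("person", "horse", "riding", {"riding": 0.8, "on": 0.2}),
    Annotation("person", "horse", "riding", {"riding": 0.8, "on": 0.2}),
    Annotation("person", "horse", "on", {"riding": 0.7, "on": 0.3}),
]
flagged = flag_biased(annotations)
transferred = transfer_labels(annotations, flagged)
print(flagged)                              # {2}
print([a.predicate for a in transferred])   # ['riding', 'riding', 'riding']
```

Comparing each sample against its own domain's mean, rather than a global mean, reflects the abstract's emphasis on measuring risk *within* each subject-object pair. A domain containing a single annotation is never flagged under this rule.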