This paper shows that exploiting the hierarchical structure among relationship and object labels can substantially improve the performance of scene graph generation systems. The work focuses on building an informative hierarchy that systematically divides object and relationship categories into disjoint super-categories. Specifically, we introduce a Bayesian prediction head that jointly predicts the super-category of the relationship between a pair of object instances and the detailed relationship within that super-category, yielding more informative predictions. The resulting model can produce a richer set of predicates than the dataset annotations provide, and it helps mitigate the prevalent problem of low annotation quality. Although our findings are preliminary, experiments on the Visual Genome dataset show strong performance, particularly in predicate classification and zero-shot settings, demonstrating the promise of the approach.
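The joint prediction described above can be illustrated with a small sketch. This is only one plausible reading of a hierarchical (Bayesian) head, not the paper's implementation: it assumes the joint probability factorizes as P(super-category) × P(predicate | super-category). The super-category names, the predicates, the logits, and the helper functions `hierarchical_predict` and `best_per_super` are all hypothetical and chosen for illustration.

```python
import math

# Hypothetical super-categories and predicates (illustrative assumption,
# not taken from the paper or from the dataset annotations).
HIERARCHY = {
    "geometric": ["above", "behind", "under"],
    "possessive": ["has", "wearing"],
    "semantic": ["eating", "riding"],
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_predict(super_logits, fine_logits):
    """Return P(super) and the joint P(super, predicate).

    Assumes the factorization
        P(super, predicate) = P(super) * P(predicate | super).
    """
    supers = list(HIERARCHY)
    p_super = dict(zip(supers, softmax([super_logits[s] for s in supers])))
    joint = {}
    for s in supers:
        p_fine = softmax(fine_logits[s])  # conditional on super-category s
        for pred, p in zip(HIERARCHY[s], p_fine):
            joint[(s, pred)] = p_super[s] * p
    return p_super, joint


def best_per_super(joint):
    """Keep the most likely predicate under each super-category."""
    best = {}
    for (s, pred), p in joint.items():
        if s not in best or p > best[s][1]:
            best[s] = (pred, p)
    return best


# Made-up logits for a single subject-object pair.
super_logits = {"geometric": 1.2, "possessive": 0.3, "semantic": -0.5}
fine_logits = {
    "geometric": [2.0, 0.1, -1.0],
    "possessive": [0.5, 1.5],
    "semantic": [0.0, 0.0],
}

p_super, joint = hierarchical_predict(super_logits, fine_logits)
top = best_per_super(joint)
for s, (pred, p) in top.items():
    print(f"{s}: {pred} ({p:.3f})")
```

Keeping one candidate per super-category, rather than a single global argmax, is one way such a head could surface several plausible predicates for the same object pair, including ones absent from the annotation.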