This paper presents a novel approach for inferring relationships between objects in visual scenes. It explicitly exploits an informative hierarchical structure that divides the object and relationship categories into disjoint super-categories. Specifically, the proposed method incorporates a Bayes prediction head that jointly predicts the super-category, i.e., the type of relationship between two objects, and the detailed relationship within that super-category. This design reduces the impact of class imbalance. We also adapt supervised contrastive learning to the hierarchical classification scheme. Experimental evaluations on the Visual Genome and OpenImage V6 datasets demonstrate that this factorized approach allows a relatively simple model to achieve competitive performance, particularly in predicate classification and zero-shot tasks.
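The factorized prediction described above can be illustrated with a minimal sketch. It assumes the joint probability is computed as P(super-category) times P(relationship | super-category), which is one natural reading of a "Bayes prediction head" and not necessarily the exact formulation used in the paper. The super-category names, the predicates, the logits, and the function names `bayes_head` and `predict` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical hierarchy (an assumption, not taken from the paper):
# each super-category owns a disjoint set of fine-grained predicates.
HIERARCHY = {
    "geometric": ["above", "behind", "under"],
    "possessive": ["has", "wearing"],
    "semantic": ["eating", "riding"],
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bayes_head(super_logits, sub_logits):
    """Return joint probabilities P(super, rel) = P(super) * P(rel | super).

    super_logits: one logit per super-category, in HIERARCHY order.
    sub_logits:   dict mapping each super-category to the logits of its
                  own predicates.
    """
    joint = {}
    p_super = softmax(super_logits)
    for name, ps in zip(HIERARCHY, p_super):
        p_cond = softmax(sub_logits[name])  # distribution within one branch
        for rel, pc in zip(HIERARCHY[name], p_cond):
            joint[(name, rel)] = ps * pc
    return joint


def predict(super_logits, sub_logits):
    """Pick the (super-category, relationship) pair with the highest joint probability."""
    joint = bayes_head(super_logits, sub_logits)
    return max(joint, key=joint.get)


# Made-up logits for a single subject-object pair.
example_super = [0.2, 1.5, 0.1]
example_sub = {
    "geometric": [2.0, 0.0, 0.0],
    "possessive": [0.3, 1.2],
    "semantic": [0.0, 0.0],
}
joint = bayes_head(example_super, example_sub)
best = predict(example_super, example_sub)
```

Because each conditional softmax is normalized only over the predicates of its own super-category, a rare predicate competes with its few siblings and not with every frequent predicate in the dataset. This is one plausible way such a factorization could soften class imbalance.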