In scene graph generation (SGG), training with cross-entropy loss yields biased predictions because the relationship labels in the dataset are severely imbalanced. This study therefore proposes a method for generating scene graphs that uses optimal transport as a measure for comparing two probability distributions. We apply an optimal transport loss, which reflects the similarity between labels through the transportation cost, to predicate classification in SGG. In the proposed approach, the transportation cost is defined using word similarities obtained from a pre-trained model. Experimental evaluation shows that the proposed method outperforms existing methods in terms of mean Recall@50 and mean Recall@100. Furthermore, it improves recall for relationship labels that are scarce in the dataset.
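To make the idea of a similarity-aware loss concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-hot target, in which case the optimal transport cost has a closed form: all predicted mass must move to the true label, so the loss is the expected cost under the predicted distribution. The function names, the toy embeddings, and the choice of cost (one minus cosine similarity) are illustrative assumptions; the abstract only states that the cost is derived from word similarity in a pre-trained model.

```python
import numpy as np

def cosine_cost_matrix(embeddings):
    """Cost C[i, j] = 1 - cosine similarity of label embeddings i and j (assumed cost)."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    cost = 1.0 - unit @ unit.T
    np.fill_diagonal(cost, 0.0)          # moving mass to the same label is free
    return np.clip(cost, 0.0, None)      # guard against tiny negative values

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def ot_loss_one_hot(logits, targets, cost):
    """Optimal transport loss for one-hot targets.

    With a one-hot target at label y, the only feasible transport plan
    sends each predicted mass p_j to y, so the cost is sum_j p_j * C[j, y].
    """
    probs = softmax(logits)                      # (batch, num_labels)
    cost_to_target = cost[:, targets].T          # (batch, num_labels), entry C[j, y_b]
    return (probs * cost_to_target).sum(axis=1).mean()

# Hypothetical 2-D embeddings for three predicate labels (illustration only).
labels = ["on", "sitting on", "holding"]
toy_embeddings = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])
toy_cost = cosine_cost_matrix(toy_embeddings)

target = np.array([0])                            # true label: "on"
loss_near = ot_loss_one_hot(np.array([[0.0, 5.0, 0.0]]), target, toy_cost)
loss_far = ot_loss_one_hot(np.array([[0.0, 0.0, 5.0]]), target, toy_cost)
```

In this toy setting, predicting "sitting on" when the target is "on" is penalized less than predicting "holding", because the assumed embeddings place the first two labels closer together. Cross-entropy would penalize both mistakes equally for the same predicted probability on the true label.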