Scene graph generation (SGG) models have suffered from inherent problems in the benchmark datasets, such as the long-tailed predicate distribution and missing annotations. In this work, we aim to alleviate the long-tailed problem of SGG by utilizing unannotated triplets. To this end, we introduce a Self-Training framework for SGG (ST-SGG) that assigns pseudo-labels to unannotated triplets, on which the SGG models are then trained. While there has been significant progress in self-training for image recognition, designing a self-training framework for the SGG task is more challenging due to its inherent characteristics, such as the semantic ambiguity and the long-tailed distribution of predicate classes. Hence, we propose a novel pseudo-labeling technique for SGG, called Class-specific Adaptive Thresholding with Momentum (CATM), a model-agnostic framework that can be applied to any existing SGG model. Furthermore, we devise a graph structure learner (GSL) that is beneficial when adopting our proposed self-training framework to state-of-the-art message-passing neural network (MPNN)-based SGG models. Our extensive experiments verify the effectiveness of ST-SGG on various SGG models, particularly in enhancing the performance on fine-grained predicate classes.
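To illustrate the general idea behind class-specific adaptive thresholding with momentum, the following is a minimal sketch, not the paper's actual CATM algorithm (whose exact update rule is not stated here): each predicate class keeps its own confidence threshold, updated as an exponential moving average (momentum) of accepted confidences, and a pseudo-label is assigned only when the model's top prediction clears its class's threshold. All names (`MomentumThresholds`, `pseudo_label`) are hypothetical.

```python
# Hypothetical sketch of per-class momentum-updated thresholds for pseudo-labeling.
# This is an assumption-laden illustration, not the CATM algorithm from the paper.
import numpy as np

class MomentumThresholds:
    def __init__(self, num_classes, init=0.5, momentum=0.9):
        self.tau = np.full(num_classes, init)  # one threshold per predicate class
        self.m = momentum

    def pseudo_label(self, probs):
        """Return a pseudo-label for an unannotated triplet, or None."""
        cls = int(np.argmax(probs))
        conf = float(probs[cls])
        if conf >= self.tau[cls]:
            # EMA (momentum) update: the class threshold tracks recent
            # accepted confidences, so frequent/easy classes raise their bar
            # while rare classes keep a lower one.
            self.tau[cls] = self.m * self.tau[cls] + (1 - self.m) * conf
            return cls
        return None  # confidence below the class threshold: leave unannotated
```

The class-specific thresholds matter for the long-tailed setting: a single global threshold would be dominated by head predicates, rarely admitting pseudo-labels for fine-grained tail classes.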