Scene graph generation (SGG) models suffer from problems inherent in the benchmark datasets, such as the long-tailed predicate distribution and missing annotations. In this work, we aim to alleviate the long-tailed problem of SGG by utilizing unannotated triplets. To this end, we introduce a Self-Training framework for SGG (ST-SGG) that assigns pseudo-labels to unannotated triplets, on which the SGG models are then trained. While there has been significant progress in self-training for image recognition, designing a self-training framework for the SGG task is more challenging because of properties inherent to the task, such as the semantic ambiguity and the long-tailed distribution of predicate classes. Hence, we propose a novel pseudo-labeling technique for SGG, called Class-specific Adaptive Thresholding with Momentum (CATM), which is model-agnostic and can be applied to any existing SGG model. Furthermore, we devise a graph structure learner (GSL) that is beneficial when applying our proposed self-training framework to state-of-the-art message-passing neural network (MPNN)-based SGG models. Our extensive experiments verify the effectiveness of ST-SGG on various SGG models, particularly in enhancing performance on fine-grained predicate classes.
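To make the idea of class-specific thresholds with momentum concrete, the following is a minimal Python sketch. It is not the CATM update rule itself, which the abstract does not specify. It assumes that each predicate class keeps its own confidence threshold, that an unannotated triplet receives a pseudo-label only when its top predicted probability reaches the threshold of the predicted class, and that each threshold is moved by an exponential moving average toward the mean confidence observed for that class. The class name `ClassAdaptiveThreshold`, its methods, and the parameters `init` and `momentum` are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Optional, Sequence


def _top_class(p: Sequence[float]) -> int:
    """Index of the most probable predicate class."""
    return max(range(len(p)), key=p.__getitem__)


class ClassAdaptiveThreshold:
    """Hypothetical helper: one confidence threshold per predicate class.

    Assumption: thresholds follow an exponential moving average (EMA) of the
    model's confidence per class. This is an illustrative rule, not the
    update rule of the paper.
    """

    def __init__(self, num_classes: int, init: float = 0.9,
                 momentum: float = 0.9) -> None:
        self.thresholds: List[float] = [init] * num_classes
        self.momentum = momentum

    def assign(self, probs: Sequence[Sequence[float]]) -> List[Optional[int]]:
        """Return a pseudo-label per triplet, or None if confidence is too low."""
        labels: List[Optional[int]] = []
        for p in probs:
            cls = _top_class(p)
            labels.append(cls if p[cls] >= self.thresholds[cls] else None)
        return labels

    def update(self, probs: Sequence[Sequence[float]]) -> None:
        """Move each predicted class's threshold toward its mean confidence."""
        per_class: Dict[int, List[float]] = {}
        for p in probs:
            cls = _top_class(p)
            per_class.setdefault(cls, []).append(p[cls])
        for cls, confs in per_class.items():
            mean_conf = sum(confs) / len(confs)
            self.thresholds[cls] = (self.momentum * self.thresholds[cls]
                                    + (1.0 - self.momentum) * mean_conf)


# Toy batch: predicted predicate distributions for three unannotated triplets.
catm = ClassAdaptiveThreshold(num_classes=3, init=0.5, momentum=0.5)
batch = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.4, 0.35, 0.25]]
labels = catm.assign(batch)  # third triplet is below the threshold of class 0
catm.update(batch)           # classes 0 and 2 adapt; class 1 is unchanged
```

Keeping a separate threshold per class is what lets rare, fine-grained predicates be pseudo-labeled at a lower confidence than frequent ones, while the momentum term damps batch-to-batch fluctuation in the thresholds.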